All of lore.kernel.org
 help / color / mirror / Atom feed
* [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec
@ 2015-12-10  1:46 Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Borislav Petkov, Masami Hiramatsu

When an HA clustering software or administrator detects unresponsiveness
of a host, they issue an NMI to the host to completely stop current
works and take a crash dump.  If the kernel has already panicked
or is capturing a crash dump at that time, further NMI can cause
a crash dump failure.

Also, crash_kexec() called from oops context and panic() can
cause race conditions.

To solve these issues, this patch set does following things:

- Don't call panic() on NMI if the kernel has already panicked
- Extend exclusion control currently done by panic_lock to crash_kexec
- Introduce "apic_extnmi=none" boot option which masks external NMI
  NMI at the boot time

Additionally, "apic_extnmi=all" is provieded.  This option unmasks
external NMI for all CPUs.  This would help cause kernel panic even if
CPU 0 can't handle an external NMI due to hang-up in NMI context
or being handled by other NMI handlers.

This patch set can be applied to current -tip tree.

V6:
- Update comments and patch descriptions all over the patch series
- Add documentation for kernel.panic_on_io_nmi sysctl (PATCH 6/6)
- Separate PATCH 5/6 from PATCH 2/6 because the portion is actually
  needed for "apic_extnmi=all" case introduced by PATCH 4/6
- ...and various fixes (please see the change logs in each patch
  description for details)

V5: https://lkml.org/lkml/2015/11/20/228
- Use WRITE_ONCE() for crash_ipi_done to keep the instruction order
  (PATCH 2/4)
- Address concurrent unknown/external NMI case, too (PATCH 2/4)
- Fix build errors (PATCH 3/4)
- Rename "noextnmi" boot option to "apic_extnmi" and expand its
  feature (PATCH 4/4)

V4: https://lkml.org/lkml/2015/9/25/193
- Improve comments and descriptions (PATCH 1/4 to 3/4)
- Use new __crash_kexec(), no exclusion check version of crash_kexec(),
  instead of checking if panic_cpu is the current cpu or not
  (PATCH 3/4)

V3: https://lkml.org/lkml/2015/8/6/39
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another cpu already panicked

V2: https://lkml.org/lkml/2015/7/27/31
- Use atomic_cmpxchg() instead of current spin_trylock() to exclude
  concurrent accesses to panic() and crash_kexec()
- Don't introduce no-lock version of panic() and crash_kexec()

V1: https://lkml.org/lkml/2015/7/22/81

---

Hidehiro Kawai (6):
      panic/x86: Fix re-entrance problem due to panic on NMI
      panic/x86: Allow CPUs to save registers even if they are looping in NMI context
      kexec: Fix race between panic() and crash_kexec() called directly
      x86/apic: Introduce apic_extnmi boot option
      x86/nmi: Fix to save registers for crash dump on external NMI broadcast
      Documentation: Add documentation for kernel.panic_on_io_nmi sysctl


 Documentation/kernel-parameters.txt |    9 +++++++++
 Documentation/sysctl/kernel.txt     |   15 +++++++++++++++
 arch/x86/include/asm/apic.h         |    5 +++++
 arch/x86/include/asm/reboot.h       |    1 +
 arch/x86/kernel/apic/apic.c         |   35 +++++++++++++++++++++++++++++++++--
 arch/x86/kernel/nmi.c               |   27 ++++++++++++++++++++++-----
 arch/x86/kernel/reboot.c            |   28 ++++++++++++++++++++++++++++
 include/linux/kernel.h              |   29 +++++++++++++++++++++++++++++
 include/linux/kexec.h               |    2 ++
 kernel/kexec_core.c                 |   30 +++++++++++++++++++++++++++++-
 kernel/panic.c                      |   29 ++++++++++++++++++++++++-----
 kernel/watchdog.c                   |    2 +-
 12 files changed, 198 insertions(+), 14 deletions(-)


-- 
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-10 15:41   ` Borislav Petkov
  2015-12-19 10:12   ` [tip:x86/apic] panic, x86: " tip-bot for Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 2/6] panic/x86: Allow CPUs to save registers even if they are looping in NMI context Hidehiro Kawai
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Borislav Petkov, Masami Hiramatsu

If panic on NMI happens just after panic() on the same CPU, panic()
is recursively called.  As the result, it stalls after failing to
acquire panic_lock.

To avoid this problem, don't call panic() in NMI context if
we've already entered panic().

V6:
- Add a comment about panic_cpu
- Replace the magic number -1 for panic_cpu with a macro

V4:
- Improve comments in io_check_error() and panic()

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another CPU already panicked

V2:
- Use atomic_cmpxchg() instead of current spin_trylock() to
  exclude concurrent accesses to the panic routines
- Don't introduce no-lock version of panic()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/kernel/nmi.c  |   16 ++++++++++++----
 include/linux/kernel.h |   21 +++++++++++++++++++++
 kernel/panic.c         |   15 ++++++++++++---
 kernel/watchdog.c      |    2 +-
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 697f90d..5131714 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 		 reason, smp_processor_id());
 	show_regs(regs);
 
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
+	if (panic_on_io_nmi) {
+		nmi_panic("NMI IOCK error: Not continuing");
+
+		/*
+		 * If we return from nmi_panic(), it means we have received
+		 * NMI while processing panic().  So, simply return without
+		 * a delay and re-enabling NMI.
+		 */
+		return;
+	}
 
 	/* Re-enable the IOCK line, wait for a few seconds */
 	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 350dfb0..db66867 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -446,6 +446,27 @@ extern int sysctl_panic_on_stackoverflow;
 extern bool crash_kexec_post_notifiers;
 
 /*
+ * panic_cpu holds a panicking CPU number and is used for exclusive
+ * execution of panic and crash_kexec routines. If the value is
+ * PANIC_CPU_INVALID, it means that none of CPU has entered panic or
+ * crash_kexec.
+ */
+extern atomic_t panic_cpu;
+#define PANIC_CPU_INVALID	-1
+
+/*
+ * A variant of panic() called from NMI context.
+ * If we've already panicked on this CPU, return from here.
+ */
+#define nmi_panic(fmt, ...)						\
+	do {								\
+		int this_cpu = raw_smp_processor_id();			\
+		if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu) \
+		    != this_cpu)					\
+			panic(fmt, ##__VA_ARGS__);			\
+	} while (0)
+
+/*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
  */
diff --git a/kernel/panic.c b/kernel/panic.c
index 4b150bc..3261e2d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
  */
 void panic(const char *fmt, ...)
 {
-	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
 	int state = 0;
+	int old_cpu, this_cpu;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
 	 * from deadlocking the first cpu that invokes the panic, since
 	 * there is nothing to prevent an interrupt handler (that runs
-	 * after the panic_lock is acquired) from invoking panic again.
+	 * after setting panic_cpu) from invoking panic again.
 	 */
 	local_irq_disable();
 
@@ -94,8 +96,15 @@ void panic(const char *fmt, ...)
 	 * multiple parallel invocations of panic, all other CPUs either
 	 * stop themself or will wait until they are stopped by the 1st CPU
 	 * with smp_send_stop().
+	 *
+	 * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which
+	 * comes here, so go ahead.
+	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
+	 * panic_cpu to this CPU.  In this case, this is also the 1st CPU.
 	 */
-	if (!spin_trylock(&panic_lock))
+	this_cpu = raw_smp_processor_id();
+	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
+	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
 	console_verbose();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 18f34cf..b9be18f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
-			panic("Hard LOCKUP");
+			nmi_panic("Hard LOCKUP");
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 2/6] panic/x86: Allow CPUs to save registers even if they are looping in NMI context
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-19 10:13   ` [tip:x86/apic] panic, x86: Allow CPUs to save registers even if " tip-bot for Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 3/6] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Borislav Petkov, Masami Hiramatsu

Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends NMI IPI to non-panic CPUs to stop them and save their register
information and doing some cleanups for crash dumping.  However, if
a non-panic CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows (before applying this patch series):

  CPU 0                             CPU 1
  ===========================       ==========================
  receive an unknown NMI
  unknown_nmi_error()
    panic()                         receive an unknown NMI
      spin_trylock(&panic_lock)     unknown_nmi_error()
      crash_kexec()                   panic()
                                        spin_trylock(&panic_lock)
                                        panic_smp_self_stop()
                                          infinite loop
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                          infinite loop...

  Here, since CPU 1 is in NMI context, additional NMI from CPU 0
  is blocked until CPU 1 executes IRET.  However, CPU 1 never
  executes IRET, so the NMI is not handled and the callback function
  to save registers is never called.

  Actually, this can happen on some servers which broadcast NMIs to
  all CPUs when the dump button is pushed.

To save registers in this case, we need to:

  a) Return from NMI handler instead of looping infinitely
  or
  b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices.  So, we chose b).

This patch does following things:

1. Move the timing of `infinite loop in NMI context' (actually
   done by panic_smp_self_stop()) outside of panic() to enable us to
   refer pt_regs.  Please note that panic_smp_self_stop() is still
   used for normal context
2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
   registers and do some cleanups after setting waiting_for_crash_ipi
   which is used for counting down the number of CPUs which handled
   the callback

V6:
- Revise the commit description and many comments
- Move changes involved with nmi_reason_lock to a later patch because
  it turned out there is no problem at this point
- Rename crash_ipi_done to crash_ipi_issued to clarify its meaning

V5:
- Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
  compiler doesn't change the instruction order
- Support the case of b in the above description
- Add poll_crash_ipi_and_callback()

V4:
- Rewrite the patch description

V3:
- Newly introduced

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/kernel/nmi.c    |    6 +++---
 arch/x86/kernel/reboot.c |   20 ++++++++++++++++++++
 include/linux/kernel.h   |   14 +++++++++++---
 kernel/panic.c           |   10 ++++++++++
 kernel/watchdog.c        |    2 +-
 5 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5131714..5e00de7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -256,7 +256,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 	show_regs(regs);
 
 	if (panic_on_io_nmi) {
-		nmi_panic("NMI IOCK error: Not continuing");
+		nmi_panic(regs, "NMI IOCK error: Not continuing");
 
 		/*
 		 * If we return from nmi_panic(), it means we have received
@@ -305,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 02693dd..1da1302 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -718,6 +718,7 @@ static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
+static int crash_ipi_issued;
 
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
@@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	smp_send_nmi_allbutself();
 
+	/* Kick CPUs looping in NMI context. */
+	WRITE_ONCE(crash_ipi_issued, 1);
+
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
 		mdelay(1);
@@ -788,6 +792,22 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	/* Leave the nmi callback set */
 }
+
+/* Override the weak function in kernel/panic.c */
+void nmi_panic_self_stop(struct pt_regs *regs)
+{
+	while (1) {
+		/*
+		 * Wait for the crash dumping IPI to be issued, and then
+		 * call its callback directly.
+		 */
+		if (READ_ONCE(crash_ipi_issued))
+			crash_nmi_callback(0, regs); /* Don't return */
+
+		cpu_relax();
+	}
+}
+
 #else /* !CONFIG_SMP */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index db66867..f28eebb 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -255,6 +255,7 @@ extern long (*panic_blink)(int state);
 __printf(1, 2)
 void panic(const char *fmt, ...)
 	__noreturn __cold;
+void nmi_panic_self_stop(struct pt_regs *);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
@@ -457,13 +458,20 @@ extern atomic_t panic_cpu;
 /*
  * A variant of panic() called from NMI context.
  * If we've already panicked on this CPU, return from here.
+ * If another CPU already panicked, loop in nmi_panic_self_stop() which
+ * can provide architecture dependent code such as saving register states
+ * for crash dump.
  */
-#define nmi_panic(fmt, ...)						\
+#define nmi_panic(regs, fmt, ...)					\
 	do {								\
+		int old_cpu;						\
 		int this_cpu = raw_smp_processor_id();			\
-		if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu) \
-		    != this_cpu)					\
+		old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID,	\
+					 this_cpu);			\
+		if (old_cpu == PANIC_CPU_INVALID)			\
 			panic(fmt, ##__VA_ARGS__);			\
+		else if (old_cpu != this_cpu)				\
+			nmi_panic_self_stop(regs);			\
 	} while (0)
 
 /*
diff --git a/kernel/panic.c b/kernel/panic.c
index 3261e2d..3d6c3f1 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -61,6 +61,16 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+/*
+ * Stop ourselves in NMI context if another cpu has already panicked.
+ * Architecture code may override this to prepare for crash dumping
+ * (e.g. save register information).
+ */
+void __weak nmi_panic_self_stop(struct pt_regs *regs)
+{
+	panic_smp_self_stop();
+}
+
 atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b9be18f..84b5035 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
-			nmi_panic("Hard LOCKUP");
+			nmi_panic(regs, "Hard LOCKUP");
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 3/6] kexec: Fix race between panic() and crash_kexec() called directly
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 2/6] panic/x86: Allow CPUs to save registers even if they are looping in NMI context Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-19 10:13   ` [tip:x86/apic] kexec: Fix race between panic() and crash_kexec() tip-bot for Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 4/6] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Borislav Petkov, Masami Hiramatsu

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        nmi_shootdown_cpus() // stop other CPUs

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other CPUs
    infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        <NMI>
        io_check_error()
          panic()
            crash_kexec()
              mutex_trylock() // failed to acquire
            infinite loop

Clearly, this is an undesirable result.

To fix this problem, this patch changes crash_kexec() to exclude
others by using atomic_t panic_cpu.

V6:
- Use PANIC_CPU_INVALID instead of hard-coded -1
- Add comments about __crash_kexec()

V5:
- Add missing dummy __crash_kexec() for !CONFIG_KEXEC_CORE case
- Replace atomic_xchg() with atomic_set() in crash_kexec() because
  it is used as a release operation and there is no need of memory
  barrier effect.  This change also removes an unused value warning

V4:
- Use new __crash_kexec(), no exclusion check version of crash_kexec(),
  instead of checking if panic_cpu is the current CPU or not

V2:
- Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
  to exclude concurrent accesses
- Don't introduce no-lock version of crash_kexec()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 include/linux/kexec.h |    2 ++
 kernel/kexec_core.c   |   30 +++++++++++++++++++++++++++++-
 kernel/panic.c        |    4 ++--
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d140b1e..7b68d27 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
 					  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 					     const char *name);
+extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
@@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
 #define kexec_in_progress false
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 11b64a6..c823f30 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -853,7 +853,12 @@ struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
 int kexec_load_disabled;
 
-void crash_kexec(struct pt_regs *regs)
+/*
+ * No panic_cpu check version of crash_kexec().  This function is called
+ * only when panic_cpu holds the current CPU number; this is the only CPU
+ * which processes crash_kexec routines.
+ */
+void __crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
@@ -876,6 +881,29 @@ void crash_kexec(struct pt_regs *regs)
 	}
 }
 
+void crash_kexec(struct pt_regs *regs)
+{
+	int old_cpu, this_cpu;
+
+	/*
+	 * Only one CPU is allowed to execute the crash_kexec() code as with
+	 * panic().  Otherwise parallel calls of panic() and crash_kexec()
+	 * may stop each other.  To exclude them, we use panic_cpu here too.
+	 */
+	this_cpu = raw_smp_processor_id();
+	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
+	if (old_cpu == PANIC_CPU_INVALID) {
+		/* This is the 1st CPU which comes here, so go ahead. */
+		__crash_kexec(regs);
+
+		/*
+		 * Reset panic_cpu to allow another panic()/crash_kexec()
+		 * call.
+		 */
+		atomic_set(&panic_cpu, PANIC_CPU_INVALID);
+	}
+}
+
 size_t crash_get_memory_size(void)
 {
 	size_t size = 0;
diff --git a/kernel/panic.c b/kernel/panic.c
index 3d6c3f1..81a0a3d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -138,7 +138,7 @@ void panic(const char *fmt, ...)
 	 * the "crash_kexec_post_notifiers" option to the kernel.
 	 */
 	if (!crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL); /* bypass panic_cpu check */
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -163,7 +163,7 @@ void panic(const char *fmt, ...)
 	 * more unstable, it can increase risks of the kdump failure too.
 	 */
 	if (crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL); /* bypass panic_cpu check */
 
 	bust_spinlocks(0);
 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 4/6] x86/apic: Introduce apic_extnmi boot option
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
                   ` (2 preceding siblings ...)
  2015-12-10  1:46 ` [V6 PATCH 3/6] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-19 10:13   ` [tip:x86/apic] x86/apic: Introduce apic_extnmi command line parameter tip-bot for Hidehiro Kawai
  2015-12-10  1:46 ` [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast Hidehiro Kawai
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Borislav Petkov, Masami Hiramatsu

This patch introduces new boot option, apic_extnmi:

 apic_extnmi={ bsp | all | none}

The default value is "bsp" and this is the current behavior; only
BSP receives external NMI.  "all" allows external NMIs to be
broadcast to all CPUs.  This would raise the success rate of panic
on NMI when BSP hangs up in NMI context or the external NMI is
swallowed by other NMI handlers on BSP.  If you specified "none",
any CPUs don't receive external NMIs.  This is useful for dump
capture kernel so that it wouldn't be shot down while saving a
crash dump.

V6:
- Fix comments
- Change to use strncmp to parse the value of "apic_extnmi="
- Make apic_set_extnmi() return -EINVAL if the given value is invalid

V5:
- Rename the option from "noextnmi" to "apic_extnmi"
- Add apic_extnmi=all feature
- Fix the wrong documentation about "noextnmi" (apic_extnmi=none)

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kernel-parameters.txt |    9 +++++++++
 arch/x86/include/asm/apic.h         |    5 +++++
 arch/x86/kernel/apic/apic.c         |   35 +++++++++++++++++++++++++++++++++--
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..74acea5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -472,6 +472,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Change the amount of debugging information output
 			when initialising the APIC and IO-APIC components.
 
+	apic_extnmi=	[APIC,X86] External NMI delivery setting
+			Format: { bsp (default) | all | none }
+			bsp:  External NMI is delivered only to CPU 0
+			all:  External NMIs are broadcast to all CPUs as a
+			      backup of CPU 0
+			none: External NMI is masked for all CPUs. This is
+			      useful so that a dump capture kernel won't be
+			      shot down by NMI
+
 	autoconf=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 7f62ad4..c80f6b6 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -23,6 +23,11 @@
 #define APIC_VERBOSE 1
 #define APIC_DEBUG   2
 
+/* Macros for apic_extnmi which controls external NMI masking */
+#define APIC_EXTNMI_BSP		0 /* Default */
+#define APIC_EXTNMI_ALL		1
+#define APIC_EXTNMI_NONE	2
+
 /*
  * Define the default level of output to be very little
  * This can be turned up by using apic=verbose for more
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8d7df74..162d2bd 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -82,6 +82,12 @@ physid_mask_t phys_cpu_present_map;
 static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
 
 /*
+ * This variable controls which CPUs receive external NMIs.  By default,
+ * external NMIs are delivered only to the BSP.
+ */
+static int apic_extnmi = APIC_EXTNMI_BSP;
+
+/*
  * Map cpu index to physical APIC ID
  */
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
 	value = APIC_DM_NMI;
 	if (!lapic_is_integrated())		/* 82489DX */
 		value |= APIC_LVT_LEVEL_TRIGGER;
+	if (apic_extnmi == APIC_EXTNMI_NONE)
+		value |= APIC_LVT_MASKED;
 	apic_write(APIC_LVT1, value);
 }
 
@@ -1378,9 +1386,11 @@ void setup_local_APIC(void)
 	apic_write(APIC_LVT0, value);
 
 	/*
-	 * only the BP should see the LINT1 NMI signal, obviously.
+	 * Only the BP sees the LINT1 NMI signal by default.  This can be
+	 * modified by apic_extnmi= boot option.
 	 */
-	if (!cpu)
+	if ((!cpu && apic_extnmi != APIC_EXTNMI_NONE) ||
+	    apic_extnmi == APIC_EXTNMI_ALL)
 		value = APIC_DM_NMI;
 	else
 		value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -2557,3 +2567,24 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
 	return 0;
 }
 early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
+
+static int __init apic_set_extnmi(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	if (!strncmp("all", arg, 3))
+		apic_extnmi = APIC_EXTNMI_ALL;
+	else if (!strncmp("none", arg, 4))
+		apic_extnmi = APIC_EXTNMI_NONE;
+	else if (!strncmp("bsp", arg, 3))
+		apic_extnmi = APIC_EXTNMI_BSP;
+	else {
+		pr_warn("Unknown external NMI delivery mode `%s' ignored\n",
+			arg);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+early_param("apic_extnmi", apic_set_extnmi);



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
                   ` (3 preceding siblings ...)
  2015-12-10  1:46 ` [V6 PATCH 4/6] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-10  3:57   ` kbuild test robot
  2015-12-10  1:46 ` [V6 PATCH 6/6] Documentation: Add documentation for kernel.panic_on_io_nmi sysctl Hidehiro Kawai
  2015-12-12 11:17 ` [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Borislav Petkov
  6 siblings, 1 reply; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Borislav Petkov, Masami Hiramatsu

Now, multiple CPUs can receive external NMI simultaneously by
specifying "apic_extnmi=all" as an boot option.  When we take a
crash dump by using external NMI with this option, we fail to save
register values into the crash dump.  This happens as follows:

  CPU 0                              CPU 1
  ================================   =============================
  receive an external NMI
  default_do_nmi()                   receive an external NMI
    spin_lock(&nmi_reason_lock)      default_do_nmi()
    io_check_error()                   spin_lock(&nmi_reason_lock)
      panic()                            busy loop
      ...
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                         busy loop...

  Here, since CPU 1 is in NMI context, additional NMI from CPU 0
  is blocked until CPU 1 executes IRET.  However, CPU 1 never
  executes IRET, so the NMI is not handled and the callback function
  to save registers is never called.

To solve this issue, we check if the IPI for crash dumping was
issued while waiting for nmi_reason_lock to be released, and if so,
call its callback function directly.  If the IPI is not issued (e.g.
kdump is disabled), the actual behavior doesn't change.

V6:
- Separated from the former patch `panic/x86: Allow cpus to save
  registers even if they are looping in NMI context'
- Fix comments
- Remove unneeded UP version of poll_crash_ipi_and_calback
- Rename poll_crash_ipi_and_callback to run_crash_ipi_callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/include/asm/reboot.h |    1 +
 arch/x86/kernel/nmi.c         |   11 ++++++++++-
 arch/x86/kernel/reboot.c      |   18 +++++++++++++-----
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index a82c4f1..2cb1cc2 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
 
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+void run_crash_ipi_callback(struct pt_regs *regs);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5e00de7..cbfa0b5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -29,6 +29,7 @@
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
+#include <asm/reboot.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
 	}
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
+	/*
+	 * Another CPU may be processing panic routines while holding
+	 * nmi_reason_lock.  Check if the CPU issued the IPI for crash
+	 * dumping, and if so, call its callback directly.  If there is
+	 * no CPU preparing crash dump, we simply loop here without doing
+	 * special things.
+	 */
+	while (!raw_spin_trylock(&nmi_reason_lock))
+		run_crash_ipi_callback(regs);
 	reason = x86_platform.get_nmi_reason();
 
 	if (reason & NMI_REASON_MASK) {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 1da1302..60a216b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -793,17 +793,25 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	/* Leave the nmi callback set */
 }
 
+/*
+ * Wait for the crash dumping IPI to be issued, and then call its callback
+ * directly.  This function is used when we have already been in NMI handler.
+ */
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+	if (crash_ipi_issued)
+		crash_nmi_callback(0, regs); /* Don't return */
+}
+
 /* Override the weak function in kernel/panic.c */
 void nmi_panic_self_stop(struct pt_regs *regs)
 {
 	while (1) {
 		/*
-		 * Wait for the crash dumping IPI to be issued, and then
-		 * call its callback directly.
+		 * If there is no CPU preparing crash dump, we simply loop
+		 * here without doing special things.
 		 */
-		if (READ_ONCE(crash_ipi_issued))
-			crash_nmi_callback(0, regs); /* Don't return */
-
+		run_crash_ipi_callback(regs);
 		cpu_relax();
 	}
 }



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6 PATCH 6/6] Documentation: Add documentation for kernel.panic_on_io_nmi sysctl
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
                   ` (4 preceding siblings ...)
  2015-12-10  1:46 ` [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast Hidehiro Kawai
@ 2015-12-10  1:46 ` Hidehiro Kawai
  2015-12-19 10:14   ` [tip:x86/apic] Documentation: Document " tip-bot for Hidehiro Kawai
  2015-12-12 11:17 ` [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Borislav Petkov
  6 siblings, 1 reply; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  1:46 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Borislav Petkov, Masami Hiramatsu

kernel.panic_on_io_nmi sysctl was introduced by commit 5211a242d0cb
("x86: Add sysctl to allow panic on IOCK NMI error"), but its
documentation is missing. So, add it.

V6:
- Newly added

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 Documentation/sysctl/kernel.txt |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index af70d15..235f804 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -551,6 +551,21 @@ the recommended setting is 60.
 
 ==============================================================
 
+panic_on_io_nmi:
+
+Controls the kernel's behavior when a CPU receives an NMI caused by
+an IO error.
+
+0: try to continue operation (default)
+
+1: panic immediately. The IO error triggered NMI indicates a serious
+   system condition, which could result in IO data corruption. Rather
+   than continuing, panicking might be a better choice. Some servers
+   issue this sort of NMI when the dump button is pushed, and you
+   can use this option to take a crash dump
+
+==============================================================
+
 panic_on_oops:
 
 Controls the kernel's behaviour when an oops or BUG is encountered.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast
  2015-12-10  1:46 ` [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast Hidehiro Kawai
@ 2015-12-10  3:57   ` kbuild test robot
  2015-12-10  6:36     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 20+ messages in thread
From: kbuild test robot @ 2015-12-10  3:57 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: kbuild-all, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Eric W. Biederman, H. Peter Anvin, Andrew Morton,
	Thomas Gleixner, Vivek Goyal, Baoquan He, linux-doc, x86, kexec,
	linux-kernel, Steven Rostedt, Michal Hocko, Ingo Molnar,
	Borislav Petkov, Masami Hiramatsu

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

Hi Hidehiro,

[auto build test ERROR on v4.4-rc4]
[also build test ERROR on next-20151209]
[cannot apply to tip/x86/core]

url:    https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095254
config: x86_64-randconfig-s4-12101030 (attached as .config)
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   arch/x86/built-in.o: In function `do_nmi':
>> (.text+0x7339): undefined reference to `run_crash_ipi_callback'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 25886 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Re: [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast
  2015-12-10  3:57   ` kbuild test robot
@ 2015-12-10  6:36     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-12-10  6:52       ` [V6.1 " Hidehiro Kawai
  0 siblings, 1 reply; 20+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-12-10  6:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: kbuild-all, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Eric W. Biederman, H. Peter Anvin, Andrew Morton,
	Thomas Gleixner, Vivek Goyal, Baoquan He, linux-doc, x86, kexec,
	linux-kernel, Michal Hocko, Ingo Molnar, Borislav Petkov,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2017 bytes --]

Hi Steven,

> From: Steven Rostedt [mailto:rostedt@goodmis.org]
> On Tue, Nov 24, 2015 at 11:48:53AM +0100, Borislav Petkov wrote:
> > > +	 */
> > > +	while (!raw_spin_trylock(&nmi_reason_lock))
> > > +		poll_crash_ipi_and_callback(regs);
> >
> > Waaait a minute: so if we're getting NMIs broadcasted on every core but
> > we're *not* crash dumping, we will run into here too. This can't be
> > right. :-\
> 
> This only does something if crash_ipi_done is set, which means you are killing
> the box. But perhaps a comment that states that here would be useful, or maybe
> just put in the check here. There's no need to make it depend on SMP, as
> raw_spin_trylock() will turn to just ({1}) for UP, and that code wont even be
> hit.

It seems that poll_crash_ipi_and_callback (now renamed to run_crash_ipi_callback)
is referred for UP if CONFIG_DEBUG_SPINLOCK=y, and it causes a build error
as below.  run_crash_ipi_callback refers crash_ipi_issued and crash_nmi_callback,
which are defined only if CONFIG_SMP=y.  So we need to defined it separately
for SMP and UP.

I'll resend this patch later.

> Hi Hidehiro,
> 
> [auto build test ERROR on v4.4-rc4]
> [also build test ERROR on next-20151209]
> [cannot apply to tip/x86/core]
> 
> url:
> https://github.com/0day-ci/linux/commits/Hidehiro-Kawai/Fix-race-issues-among-panic-NMI-and-crash_kexec/20151210-095
> 254
> config: x86_64-randconfig-s4-12101030 (attached as .config)
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64
> 
> All errors (new ones prefixed by >>):
> 
>    arch/x86/built-in.o: In function `do_nmi':
> >> (.text+0x7339): undefined reference to `run_crash_ipi_callback'
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [V6.1 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast
  2015-12-10  6:36     ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-12-10  6:52       ` Hidehiro Kawai
  2015-12-11 18:04         ` Borislav Petkov
  2015-12-19 10:14         ` [tip:x86/apic] x86/nmi: Save regs in crash dump on external NMI tip-bot for Hidehiro Kawai
  0 siblings, 2 replies; 20+ messages in thread
From: Hidehiro Kawai @ 2015-12-10  6:52 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Borislav Petkov, Masami Hiramatsu

Now, multiple CPUs can receive external NMI simultaneously by
specifying "apic_extnmi=all" as an boot option.  When we take a
crash dump by using external NMI with this option, we fail to save
register values into the crash dump.  This happens as follows:

  CPU 0                              CPU 1
  ================================   =============================
  receive an external NMI
  default_do_nmi()                   receive an external NMI
    spin_lock(&nmi_reason_lock)      default_do_nmi()
    io_check_error()                   spin_lock(&nmi_reason_lock)
      panic()                            busy loop
      ...
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                         busy loop...

  Here, since CPU 1 is in NMI context, additional NMI from CPU 0
  is blocked until CPU 1 executes IRET.  However, CPU 1 never
  executes IRET, so the NMI is not handled and the callback function
  to save registers is never called.

To solve this issue, we check if the IPI for crash dumping was
issued while waiting for nmi_reason_lock to be released, and if so,
call its callback function directly.  If the IPI is not issued (e.g.
kdump is disabled), the actual behavior doesn't change.

V6.1:
- Reintroduce the UP version of run_crash_ipi_callback to fix build
  error for CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y case

V6:
- Separated from the former patch `panic/x86: Allow cpus to save
  registers even if they are looping in NMI context'
- Fix comments
- Remove unneeded UP version of poll_crash_ipi_and_calback
- Rename poll_crash_ipi_and_callback to run_crash_ipi_callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/include/asm/reboot.h |    1 +
 arch/x86/kernel/nmi.c         |   11 ++++++++++-
 arch/x86/kernel/reboot.c      |   22 +++++++++++++++++-----
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index a82c4f1..2cb1cc2 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
 
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+void run_crash_ipi_callback(struct pt_regs *regs);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5e00de7..cbfa0b5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -29,6 +29,7 @@
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
+#include <asm/reboot.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
 	}
 
 	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
+	/*
+	 * Another CPU may be processing panic routines while holding
+	 * nmi_reason_lock.  Check if the CPU issued the IPI for crash
+	 * dumping, and if so, call its callback directly.  If there is
+	 * no CPU preparing crash dump, we simply loop here without doing
+	 * special things.
+	 */
+	while (!raw_spin_trylock(&nmi_reason_lock))
+		run_crash_ipi_callback(regs);
 	reason = x86_platform.get_nmi_reason();
 
 	if (reason & NMI_REASON_MASK) {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 1da1302..8a184e3 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -793,17 +793,25 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	/* Leave the nmi callback set */
 }
 
+/*
+ * Wait for the crash dumping IPI to be issued, and then call its callback
+ * directly.  This function is used when we have already been in NMI handler.
+ */
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+	if (crash_ipi_issued)
+		crash_nmi_callback(0, regs); /* Don't return */
+}
+
 /* Override the weak function in kernel/panic.c */
 void nmi_panic_self_stop(struct pt_regs *regs)
 {
 	while (1) {
 		/*
-		 * Wait for the crash dumping IPI to be issued, and then
-		 * call its callback directly.
+		 * If there is no CPU preparing crash dump, we simply loop
+		 * here without doing special things.
 		 */
-		if (READ_ONCE(crash_ipi_issued))
-			crash_nmi_callback(0, regs); /* Don't return */
-
+		run_crash_ipi_callback(regs);
 		cpu_relax();
 	}
 }
@@ -813,4 +821,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
 	/* No other CPUs to shoot down */
 }
+
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+}
 #endif



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
@ 2015-12-10 15:41   ` Borislav Petkov
  2015-12-11  0:23     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-12-19 10:12   ` [tip:x86/apic] panic, x86: " tip-bot for Hidehiro Kawai
  1 sibling, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2015-12-10 15:41 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Masami Hiramatsu

On Thu, Dec 10, 2015 at 10:46:26AM +0900, Hidehiro Kawai wrote:
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V6:
> - Add a comment about panic_cpu
> - Replace the magic number -1 for panic_cpu with a macro
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another CPU already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  arch/x86/kernel/nmi.c  |   16 ++++++++++++----
>  include/linux/kernel.h |   21 +++++++++++++++++++++
>  kernel/panic.c         |   15 ++++++++++++---
>  kernel/watchdog.c      |    2 +-
>  4 files changed, 46 insertions(+), 8 deletions(-)

Looks better.

Did some comments cleanup and nmi_panic() macro reformatting so that it
is more readable and ended up applying this:

---
From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Date: Thu, 10 Dec 2015 10:46:26 +0900
Subject: [PATCH] panic, x86: Fix re-entrance problem due to panic on NMI

If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/nmi.c  | 16 ++++++++++++----
 include/linux/kernel.h | 20 ++++++++++++++++++++
 kernel/panic.c         | 16 +++++++++++++---
 kernel/watchdog.c      |  2 +-
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 697f90db0e37..292a24bd0553 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 		 reason, smp_processor_id());
 	show_regs(regs);
 
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
+	if (panic_on_io_nmi) {
+		nmi_panic("NMI IOCK error: Not continuing");
+
+		/*
+		 * If end up here, it means we have received an NMI while
+		 * processing panic(). Simply return without delaying and
+		 * re-enabling NMI.
+		 */
+		return;
+	}
 
 	/* Re-enable the IOCK line, wait for a few seconds */
 	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 350dfb08aee3..750cc5c7c999 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
 extern bool crash_kexec_post_notifiers;
 
 /*
+ * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
+ * holds a CPU number which is executing panic() currently. A value of
+ * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
+ */
+extern atomic_t panic_cpu;
+#define PANIC_CPU_INVALID	-1
+
+/*
+ * A variant of panic() called from NMI context. We return if we've already
+ * panicked on this CPU.
+ */
+#define nmi_panic(fmt, ...)						\
+do {									\
+	int cpu = raw_smp_processor_id();				\
+									\
+	if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu) != cpu)	\
+		panic(fmt, ##__VA_ARGS__);				\
+} while (0)
+
+/*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
  */
diff --git a/kernel/panic.c b/kernel/panic.c
index 4b150bc0c6c1..3344524cf6ff 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
  */
 void panic(const char *fmt, ...)
 {
-	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
 	int state = 0;
+	int old_cpu, this_cpu;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
 	 * from deadlocking the first cpu that invokes the panic, since
 	 * there is nothing to prevent an interrupt handler (that runs
-	 * after the panic_lock is acquired) from invoking panic again.
+	 * after setting panic_cpu) from invoking panic() again.
 	 */
 	local_irq_disable();
 
@@ -94,8 +96,16 @@ void panic(const char *fmt, ...)
 	 * multiple parallel invocations of panic, all other CPUs either
 	 * stop themself or will wait until they are stopped by the 1st CPU
 	 * with smp_send_stop().
+	 *
+	 * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which
+	 * comes here, so go ahead.
+	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
+	 * panic_cpu to this CPU.  In this case, this is also the 1st CPU.
 	 */
-	if (!spin_trylock(&panic_lock))
+	this_cpu = raw_smp_processor_id();
+	old_cpu  = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
+
+	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
 	console_verbose();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 18f34cf75f74..b9be18fae154 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
-			panic("Hard LOCKUP");
+			nmi_panic("Hard LOCKUP");
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;
-- 
2.3.5

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: Re: [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-12-10 15:41   ` Borislav Petkov
@ 2015-12-11  0:23     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 20+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-12-11  0:23 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 7642 bytes --]

> From: Borislav Petkov [mailto:bp@alien8.de]
[...]
> Looks better.
> 
> Did some comments cleanup and nmi_panic() macro reformatting so that it
> is more readable and ended up applying this:

Thanks a lot!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> ---
> From: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Date: Thu, 10 Dec 2015 10:46:26 +0900
> Subject: [PATCH] panic, x86: Fix re-entrance problem due to panic on NMI
> 
> If panic on NMI happens just after panic() on the same CPU, panic() is
> recursively called. Kernel stalls, as a result, after failing to acquire
> panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if we've
> already entered panic().
> 
> For that, introduce nmi_panic() macro to reduce code duplication. In
> the case of panic on NMI, don't return from NMI handlers if another CPU
> already panicked.
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Cc: Aaron Tomlin <atomlin@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Javi Merino <javi.merino@arm.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: kexec@lists.infradead.org
> Cc: linux-doc@vger.kernel.org
> Cc: lkml <linux-kernel@vger.kernel.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
> [ Cleanup comments, fixup formatting. ]
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/kernel/nmi.c  | 16 ++++++++++++----
>  include/linux/kernel.h | 20 ++++++++++++++++++++
>  kernel/panic.c         | 16 +++++++++++++---
>  kernel/watchdog.c      |  2 +-
>  4 files changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90db0e37..292a24bd0553 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
>  #endif
> 
>  	if (panic_on_unrecovered_nmi)
> -		panic("NMI: Not continuing");
> +		nmi_panic("NMI: Not continuing");
> 
>  	pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
>  		 reason, smp_processor_id());
>  	show_regs(regs);
> 
> -	if (panic_on_io_nmi)
> -		panic("NMI IOCK error: Not continuing");
> +	if (panic_on_io_nmi) {
> +		nmi_panic("NMI IOCK error: Not continuing");
> +
> +		/*
> +		 * If end up here, it means we have received an NMI while
> +		 * processing panic(). Simply return without delaying and
> +		 * re-enabling NMI.
> +		 */
> +		return;
> +	}
> 
>  	/* Re-enable the IOCK line, wait for a few seconds */
>  	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> 
>  	pr_emerg("Do you have a strange power saving mode enabled?\n");
>  	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> -		panic("NMI: Not continuing");
> +		nmi_panic("NMI: Not continuing");
> 
>  	pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 350dfb08aee3..750cc5c7c999 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
>  extern bool crash_kexec_post_notifiers;
> 
>  /*
> + * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
> + * holds a CPU number which is executing panic() currently. A value of
> + * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
> + */
> +extern atomic_t panic_cpu;
> +#define PANIC_CPU_INVALID	-1
> +
> +/*
> + * A variant of panic() called from NMI context. We return if we've already
> + * panicked on this CPU.
> + */
> +#define nmi_panic(fmt, ...)						\
> +do {									\
> +	int cpu = raw_smp_processor_id();				\
> +									\
> +	if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu) != cpu)	\
> +		panic(fmt, ##__VA_ARGS__);				\
> +} while (0)
> +
> +/*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
>   */
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 4b150bc0c6c1..3344524cf6ff 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
>  		cpu_relax();
>  }
> 
> +atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> -	static DEFINE_SPINLOCK(panic_lock);
>  	static char buf[1024];
>  	va_list args;
>  	long i, i_next = 0;
>  	int state = 0;
> +	int old_cpu, this_cpu;
> 
>  	/*
>  	 * Disable local interrupts. This will prevent panic_smp_self_stop
>  	 * from deadlocking the first cpu that invokes the panic, since
>  	 * there is nothing to prevent an interrupt handler (that runs
> -	 * after the panic_lock is acquired) from invoking panic again.
> +	 * after setting panic_cpu) from invoking panic() again.
>  	 */
>  	local_irq_disable();
> 
> @@ -94,8 +96,16 @@ void panic(const char *fmt, ...)
>  	 * multiple parallel invocations of panic, all other CPUs either
>  	 * stop themself or will wait until they are stopped by the 1st CPU
>  	 * with smp_send_stop().
> +	 *
> +	 * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which
> +	 * comes here, so go ahead.
> +	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
> +	 * panic_cpu to this CPU.  In this case, this is also the 1st CPU.
>  	 */
> -	if (!spin_trylock(&panic_lock))
> +	this_cpu = raw_smp_processor_id();
> +	old_cpu  = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
> +
> +	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
>  		panic_smp_self_stop();
> 
>  	console_verbose();
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 18f34cf75f74..b9be18fae154 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  			trigger_allbutself_cpu_backtrace();
> 
>  		if (hardlockup_panic)
> -			panic("Hard LOCKUP");
> +			nmi_panic("Hard LOCKUP");
> 
>  		__this_cpu_write(hard_watchdog_warn, true);
>  		return;
> --
> 2.3.5
> 
> --
> Regards/Gruss,
>     Boris.
> 
> ECO tip #101: Trim your mails when you reply.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [V6.1 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast
  2015-12-10  6:52       ` [V6.1 " Hidehiro Kawai
@ 2015-12-11 18:04         ` Borislav Petkov
  2015-12-19 10:14         ` [tip:x86/apic] x86/nmi: Save regs in crash dump on external NMI tip-bot for Hidehiro Kawai
  1 sibling, 0 replies; 20+ messages in thread
From: Borislav Petkov @ 2015-12-11 18:04 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Ingo Molnar, Masami Hiramatsu

On Thu, Dec 10, 2015 at 03:52:46PM +0900, Hidehiro Kawai wrote:
> Now, multiple CPUs can receive external NMI simultaneously by
> specifying "apic_extnmi=all" as an boot option.  When we take a
> crash dump by using external NMI with this option, we fail to save
> register values into the crash dump.  This happens as follows:
> 
>   CPU 0                              CPU 1
>   ================================   =============================
>   receive an external NMI
>   default_do_nmi()                   receive an external NMI
>     spin_lock(&nmi_reason_lock)      default_do_nmi()
>     io_check_error()                   spin_lock(&nmi_reason_lock)
>       panic()                            busy loop
>       ...
>         kdump_nmi_shootdown_cpus()
>           issue NMI IPI -----------> blocked until IRET
>                                          busy loop...
> 
>   Here, since CPU 1 is in NMI context, additional NMI from CPU 0
>   is blocked until CPU 1 executes IRET.  However, CPU 1 never
>   executes IRET, so the NMI is not handled and the callback function
>   to save registers is never called.
> 
> To solve this issue, we check if the IPI for crash dumping was
> issued while waiting for nmi_reason_lock to be released, and if so,
> call its callback function directly.  If the IPI is not issued (e.g.
> kdump is disabled), the actual behavior doesn't change.
> 
> V6.1:
> - Reintroduce the UP version of run_crash_ipi_callback to fix build
>   error for CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y case
> 
> V6:
> - Separated from the former patch `panic/x86: Allow cpus to save
>   registers even if they are looping in NMI context'
> - Fix comments
> - Remove unneeded UP version of poll_crash_ipi_and_calback
> - Rename poll_crash_ipi_and_callback to run_crash_ipi_callback
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  arch/x86/include/asm/reboot.h |    1 +
>  arch/x86/kernel/nmi.c         |   11 ++++++++++-
>  arch/x86/kernel/reboot.c      |   22 +++++++++++++++++-----
>  3 files changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> index a82c4f1..2cb1cc2 100644
> --- a/arch/x86/include/asm/reboot.h
> +++ b/arch/x86/include/asm/reboot.h
> @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
>  
>  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
>  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> +void run_crash_ipi_callback(struct pt_regs *regs);
>  
>  #endif /* _ASM_X86_REBOOT_H */
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 5e00de7..cbfa0b5 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -29,6 +29,7 @@
>  #include <asm/mach_traps.h>
>  #include <asm/nmi.h>
>  #include <asm/x86_init.h>
> +#include <asm/reboot.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/nmi.h>
> @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
>  	}
>  
>  	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> -	raw_spin_lock(&nmi_reason_lock);
> +	/*
> +	 * Another CPU may be processing panic routines while holding
> +	 * nmi_reason_lock.  Check if the CPU issued the IPI for crash
> +	 * dumping, and if so, call its callback directly.  If there is
> +	 * no CPU preparing crash dump, we simply loop here without doing
> +	 * special things.
> +	 */
> +	while (!raw_spin_trylock(&nmi_reason_lock))
> +		run_crash_ipi_callback(regs);

Added cpu_relax() here too:

+       while (!raw_spin_trylock(&nmi_reason_lock)) {
+               run_crash_ipi_callback(regs);
+               cpu_relax();
+       }

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec
  2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
                   ` (5 preceding siblings ...)
  2015-12-10  1:46 ` [V6 PATCH 6/6] Documentation: Add documentation for kernel.panic_on_io_nmi sysctl Hidehiro Kawai
@ 2015-12-12 11:17 ` Borislav Petkov
  6 siblings, 0 replies; 20+ messages in thread
From: Borislav Petkov @ 2015-12-12 11:17 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	Baoquan He, linux-doc, x86, kexec, linux-kernel, Steven Rostedt,
	Michal Hocko, Masami Hiramatsu

On Thu, Dec 10, 2015 at 10:46:24AM +0900, Hidehiro Kawai wrote:
> Hidehiro Kawai (6):
>       panic/x86: Fix re-entrance problem due to panic on NMI
>       panic/x86: Allow CPUs to save registers even if they are looping in NMI context
>       kexec: Fix race between panic() and crash_kexec() called directly
>       x86/apic: Introduce apic_extnmi boot option
>       x86/nmi: Fix to save registers for crash dump on external NMI broadcast
>       Documentation: Add documentation for kernel.panic_on_io_nmi sysctl
> 
> 
>  Documentation/kernel-parameters.txt |    9 +++++++++
>  Documentation/sysctl/kernel.txt     |   15 +++++++++++++++
>  arch/x86/include/asm/apic.h         |    5 +++++
>  arch/x86/include/asm/reboot.h       |    1 +
>  arch/x86/kernel/apic/apic.c         |   35 +++++++++++++++++++++++++++++++++--
>  arch/x86/kernel/nmi.c               |   27 ++++++++++++++++++++++-----
>  arch/x86/kernel/reboot.c            |   28 ++++++++++++++++++++++++++++
>  include/linux/kernel.h              |   29 +++++++++++++++++++++++++++++
>  include/linux/kexec.h               |    2 ++
>  kernel/kexec_core.c                 |   30 +++++++++++++++++++++++++++++-
>  kernel/panic.c                      |   29 ++++++++++++++++++++++++-----
>  kernel/watchdog.c                   |    2 +-
>  12 files changed, 198 insertions(+), 14 deletions(-)

All applied,
thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] panic, x86: Fix re-entrance problem due to panic on NMI
  2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
  2015-12-10 15:41   ` Borislav Petkov
@ 2015-12-19 10:12   ` tip-bot for Hidehiro Kawai
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: masami.hiramatsu.pt, hpa, rusty, mingo, mhocko, prarit, dzickus,
	d.hatayama, peterz, sjenning, fweisbec, nicolas.iooss_linux,
	mina86, bp, vkuznets, atomlin, cmetcalf, linux-kernel, bhe,
	linux, corbet, javi.merino, dahi, rostedt, luto, uobergfe,
	hidehiro.kawai.ez, gobinda.cemk07, akpm, tglx, ebiederm, vgoyal

Commit-ID:  1717f2096b543cede7a380c858c765c41936bc35
Gitweb:     http://git.kernel.org/tip/1717f2096b543cede7a380c858c765c41936bc35
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:09 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:00 +0100

panic, x86: Fix re-entrance problem due to panic on NMI

If panic on NMI happens just after panic() on the same CPU, panic() is
recursively called. Kernel stalls, as a result, after failing to acquire
panic_lock.

To avoid this problem, don't call panic() in NMI context if we've
already entered panic().

For that, introduce nmi_panic() macro to reduce code duplication. In
the case of panic on NMI, don't return from NMI handlers if another CPU
already panicked.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/nmi.c  | 16 ++++++++++++----
 include/linux/kernel.h | 20 ++++++++++++++++++++
 kernel/panic.c         | 16 +++++++++++++---
 kernel/watchdog.c      |  2 +-
 4 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 697f90d..fca8793 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -255,8 +255,16 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 		 reason, smp_processor_id());
 	show_regs(regs);
 
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
+	if (panic_on_io_nmi) {
+		nmi_panic("NMI IOCK error: Not continuing");
+
+		/*
+		 * If we end up here, it means we have received an NMI while
+		 * processing panic(). Simply return without delaying and
+		 * re-enabling NMIs.
+		 */
+		return;
+	}
 
 	/* Re-enable the IOCK line, wait for a few seconds */
 	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -297,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 350dfb0..750cc5c 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -446,6 +446,26 @@ extern int sysctl_panic_on_stackoverflow;
 extern bool crash_kexec_post_notifiers;
 
 /*
+ * panic_cpu is used for synchronizing panic() and crash_kexec() execution. It
+ * holds a CPU number which is executing panic() currently. A value of
+ * PANIC_CPU_INVALID means no CPU has entered panic() or crash_kexec().
+ */
+extern atomic_t panic_cpu;
+#define PANIC_CPU_INVALID	-1
+
+/*
+ * A variant of panic() called from NMI context. We return if we've already
+ * panicked on this CPU.
+ */
+#define nmi_panic(fmt, ...)						\
+do {									\
+	int cpu = raw_smp_processor_id();				\
+									\
+	if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu) != cpu)	\
+		panic(fmt, ##__VA_ARGS__);				\
+} while (0)
+
+/*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
  */
diff --git a/kernel/panic.c b/kernel/panic.c
index 4b150bc..3344524 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -61,6 +61,8 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -71,17 +73,17 @@ void __weak panic_smp_self_stop(void)
  */
 void panic(const char *fmt, ...)
 {
-	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
 	int state = 0;
+	int old_cpu, this_cpu;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
 	 * from deadlocking the first cpu that invokes the panic, since
 	 * there is nothing to prevent an interrupt handler (that runs
-	 * after the panic_lock is acquired) from invoking panic again.
+	 * after setting panic_cpu) from invoking panic() again.
 	 */
 	local_irq_disable();
 
@@ -94,8 +96,16 @@ void panic(const char *fmt, ...)
 	 * multiple parallel invocations of panic, all other CPUs either
 	 * stop themself or will wait until they are stopped by the 1st CPU
 	 * with smp_send_stop().
+	 *
+	 * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which
+	 * comes here, so go ahead.
+	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
+	 * panic_cpu to this CPU.  In this case, this is also the 1st CPU.
 	 */
-	if (!spin_trylock(&panic_lock))
+	this_cpu = raw_smp_processor_id();
+	old_cpu  = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
+
+	if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
 	console_verbose();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 18f34cf..b9be18f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
-			panic("Hard LOCKUP");
+			nmi_panic("Hard LOCKUP");
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] panic, x86: Allow CPUs to save registers even if looping in NMI context
  2015-12-10  1:46 ` [V6 PATCH 2/6] panic/x86: Allow CPUs to save registers even if they are looping in NMI context Hidehiro Kawai
@ 2015-12-19 10:13   ` tip-bot for Hidehiro Kawai
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux, hpa, gobinda.cemk07, jiang.liu, sjenning, mingo, cmetcalf,
	vkuznets, linux-kernel, bhe, mina86, fweisbec, dyoung, dzickus,
	oleg, isimatu.yasuaki, tglx, peterz, bp, mhocko,
	masami.hiramatsu.pt, akpm, javi.merino, atomlin, prarit, corbet,
	dahi, nicolas.iooss_linux, vgoyal, ebiederm, luto, uobergfe,
	d.hatayama, rostedt, s.l-h, hidehiro.kawai.ez

Commit-ID:  58c5661f2144c089bbc2e5d87c9ec1dc1d2964fe
Gitweb:     http://git.kernel.org/tip/58c5661f2144c089bbc2e5d87c9ec1dc1d2964fe
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:10 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:01 +0100

panic, x86: Allow CPUs to save registers even if looping in NMI context

Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
sends an NMI IPI to CPUs which haven't called panic() to stop them,
save their register information and do some cleanups for crash dumping.
However, if such a CPU is infinitely looping in NMI context, we fail to
save its register information into the crash dump.

For example, this can happen when unknown NMIs are broadcast to all
CPUs as follows:

  CPU 0                             CPU 1
  ===========================       ==========================
  receive an unknown NMI
  unknown_nmi_error()
    panic()                         receive an unknown NMI
      spin_trylock(&panic_lock)     unknown_nmi_error()
      crash_kexec()                   panic()
                                        spin_trylock(&panic_lock)
                                        panic_smp_self_stop()
                                          infinite loop
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                          infinite loop...

Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
so the NMI is not handled and the callback function to save registers is
never called.

In practice, this can happen on some servers which broadcast NMIs to all
CPUs when the NMI button is pushed.

To save registers in this case, we need to:

  a) Return from NMI handler instead of looping infinitely
  or
  b) Call the callback function directly from the infinite loop

Inherently, a) is risky because NMI is also used to prevent corrupted
data from being propagated to devices.  So, we chose b).

This patch does the following:

1. Move the infinite looping of CPUs which haven't called panic() in NMI
   context (actually done by panic_smp_self_stop()) outside of panic() to
   enable us to refer pt_regs. Please note that panic_smp_self_stop() is
   still used for normal context.

2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
   registers and do some cleanups after setting waiting_for_crash_ipi which
   is used for counting down the number of CPUs which handled the callback

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: lkml <linux-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
[ Cleanup comments, fixup formatting. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/nmi.c    |  6 +++---
 arch/x86/kernel/reboot.c | 20 ++++++++++++++++++++
 include/linux/kernel.h   | 16 ++++++++++++----
 kernel/panic.c           |  9 +++++++++
 kernel/watchdog.c        |  2 +-
 5 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index fca8793..424aec4 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -256,7 +256,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
 	show_regs(regs);
 
 	if (panic_on_io_nmi) {
-		nmi_panic("NMI IOCK error: Not continuing");
+		nmi_panic(regs, "NMI IOCK error: Not continuing");
 
 		/*
 		 * If we end up here, it means we have received an NMI while
@@ -305,7 +305,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 02693dd..1da1302 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -718,6 +718,7 @@ static int crashing_cpu;
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
+static int crash_ipi_issued;
 
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
@@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	smp_send_nmi_allbutself();
 
+	/* Kick CPUs looping in NMI context. */
+	WRITE_ONCE(crash_ipi_issued, 1);
+
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
 		mdelay(1);
@@ -788,6 +792,22 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	/* Leave the nmi callback set */
 }
+
+/* Override the weak function in kernel/panic.c */
+void nmi_panic_self_stop(struct pt_regs *regs)
+{
+	while (1) {
+		/*
+		 * Wait for the crash dumping IPI to be issued, and then
+		 * call its callback directly.
+		 */
+		if (READ_ONCE(crash_ipi_issued))
+			crash_nmi_callback(0, regs); /* Don't return */
+
+		cpu_relax();
+	}
+}
+
 #else /* !CONFIG_SMP */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 750cc5c..7311c32 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -255,6 +255,7 @@ extern long (*panic_blink)(int state);
 __printf(1, 2)
 void panic(const char *fmt, ...)
 	__noreturn __cold;
+void nmi_panic_self_stop(struct pt_regs *);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
@@ -455,14 +456,21 @@ extern atomic_t panic_cpu;
 
 /*
  * A variant of panic() called from NMI context. We return if we've already
- * panicked on this CPU.
+ * panicked on this CPU. If another CPU already panicked, loop in
+ * nmi_panic_self_stop() which can provide architecture dependent code such
+ * as saving register state for crash dump.
  */
-#define nmi_panic(fmt, ...)						\
+#define nmi_panic(regs, fmt, ...)					\
 do {									\
-	int cpu = raw_smp_processor_id();				\
+	int old_cpu, cpu;						\
 									\
-	if (atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu) != cpu)	\
+	cpu = raw_smp_processor_id();					\
+	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);	\
+									\
+	if (old_cpu == PANIC_CPU_INVALID)				\
 		panic(fmt, ##__VA_ARGS__);				\
+	else if (old_cpu != cpu)					\
+		nmi_panic_self_stop(regs);				\
 } while (0)
 
 /*
diff --git a/kernel/panic.c b/kernel/panic.c
index 3344524..06f31b49 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -61,6 +61,15 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+/*
+ * Stop ourselves in NMI context if another CPU has already panicked. Arch code
+ * may override this to prepare for crash dumping, e.g. save regs info.
+ */
+void __weak nmi_panic_self_stop(struct pt_regs *regs)
+{
+	panic_smp_self_stop();
+}
+
 atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b9be18f..84b5035 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			trigger_allbutself_cpu_backtrace();
 
 		if (hardlockup_panic)
-			nmi_panic("Hard LOCKUP");
+			nmi_panic(regs, "Hard LOCKUP");
 
 		__this_cpu_write(hard_watchdog_warn, true);
 		return;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] kexec: Fix race between panic() and crash_kexec()
  2015-12-10  1:46 ` [V6 PATCH 3/6] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
@ 2015-12-19 10:13   ` tip-bot for Hidehiro Kawai
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: d.hatayama, linux-kernel, schwidefsky, tglx, bhe, mhocko,
	sjenning, vgoyal, mnfhuang, hpa, peterz, vkuznets, mingo, bp,
	x86, hidehiro.kawai.ez, ebiederm, dyoung, masami.hiramatsu.pt,
	corbet, akpm, rostedt

Commit-ID:  7bbee5ca3896f69f09c68be549cb8997abe6bca6
Gitweb:     http://git.kernel.org/tip/7bbee5ca3896f69f09c68be549cb8997abe6bca6
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:11 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:01 +0100

kexec: Fix race between panic() and crash_kexec()

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        nmi_shootdown_cpus() // stop other CPUs

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other CPUs
    infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        <NMI>
        io_check_error()
          panic()
            crash_kexec()
              mutex_trylock() // failed to acquire
            infinite loop

Clearly, this is an undesirable result.

To fix this problem, this patch changes crash_kexec() to exclude others
by using the panic_cpu atomic.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/kexec.h |  2 ++
 kernel/kexec_core.c   | 30 +++++++++++++++++++++++++++++-
 kernel/panic.c        |  8 ++++++--
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d140b1e..7b68d27 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
 					  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 					     const char *name);
+extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
@@ -332,6 +333,7 @@ int __weak arch_kexec_apply_relocations(const Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
 #define kexec_in_progress false
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 11b64a6..c823f30 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -853,7 +853,12 @@ struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
 int kexec_load_disabled;
 
-void crash_kexec(struct pt_regs *regs)
+/*
+ * No panic_cpu check version of crash_kexec().  This function is called
+ * only when panic_cpu holds the current CPU number; this is the only CPU
+ * which processes crash_kexec routines.
+ */
+void __crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
@@ -876,6 +881,29 @@ void crash_kexec(struct pt_regs *regs)
 	}
 }
 
+void crash_kexec(struct pt_regs *regs)
+{
+	int old_cpu, this_cpu;
+
+	/*
+	 * Only one CPU is allowed to execute the crash_kexec() code as with
+	 * panic().  Otherwise parallel calls of panic() and crash_kexec()
+	 * may stop each other.  To exclude them, we use panic_cpu here too.
+	 */
+	this_cpu = raw_smp_processor_id();
+	old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
+	if (old_cpu == PANIC_CPU_INVALID) {
+		/* This is the 1st CPU which comes here, so go ahead. */
+		__crash_kexec(regs);
+
+		/*
+		 * Reset panic_cpu to allow another panic()/crash_kexec()
+		 * call.
+		 */
+		atomic_set(&panic_cpu, PANIC_CPU_INVALID);
+	}
+}
+
 size_t crash_get_memory_size(void)
 {
 	size_t size = 0;
diff --git a/kernel/panic.c b/kernel/panic.c
index 06f31b49..b333380 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -136,9 +136,11 @@ void panic(const char *fmt, ...)
 	 * everything else.
 	 * If we want to run this after calling panic_notifiers, pass
 	 * the "crash_kexec_post_notifiers" option to the kernel.
+	 *
+	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (!crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL);
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -161,9 +163,11 @@ void panic(const char *fmt, ...)
 	 * panic_notifiers and dumping kmsg before kdump.
 	 * Note: since some panic_notifiers can make crashed kernel
 	 * more unstable, it can increase risks of the kdump failure too.
+	 *
+	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
 	if (crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL);
 
 	bust_spinlocks(0);
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] x86/apic: Introduce apic_extnmi command line parameter
  2015-12-10  1:46 ` [V6 PATCH 4/6] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
@ 2015-12-19 10:13   ` tip-bot for Hidehiro Kawai
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mhocko, akpm, luto, x86, macro, hpa, tglx, ricardo.ribalda, bp,
	vgoyal, jiang.liu, linux-kernel, pbonzini, mingo, peterz,
	viresh.kumar, bsd, masami.hiramatsu.pt, joro, rostedt, bhe,
	corbet, ebiederm, hidehiro.kawai.ez

Commit-ID:  b7c4948e9881fb38b048269f376fb4bf194ce24a
Gitweb:     http://git.kernel.org/tip/b7c4948e9881fb38b048269f376fb4bf194ce24a
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:12 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:01 +0100

x86/apic: Introduce apic_extnmi command line parameter

This patch introduces a command line parameter apic_extnmi:

 apic_extnmi=( bsp|all|none )

The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.

"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.

If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bandan Das <bsd@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/kernel-parameters.txt |  9 +++++++++
 arch/x86/include/asm/apic.h         |  5 +++++
 arch/x86/kernel/apic/apic.c         | 34 ++++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..74acea5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -472,6 +472,15 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			Change the amount of debugging information output
 			when initialising the APIC and IO-APIC components.
 
+	apic_extnmi=	[APIC,X86] External NMI delivery setting
+			Format: { bsp (default) | all | none }
+			bsp:  External NMI is delivered only to CPU 0
+			all:  External NMIs are broadcast to all CPUs as a
+			      backup of CPU 0
+			none: External NMI is masked for all CPUs. This is
+			      useful so that a dump capture kernel won't be
+			      shot down by NMI
+
 	autoconf=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 7f62ad4..c80f6b6 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -23,6 +23,11 @@
 #define APIC_VERBOSE 1
 #define APIC_DEBUG   2
 
+/* Macros for apic_extnmi which controls external NMI masking */
+#define APIC_EXTNMI_BSP		0 /* Default */
+#define APIC_EXTNMI_ALL		1
+#define APIC_EXTNMI_NONE	2
+
 /*
  * Define the default level of output to be very little
  * This can be turned up by using apic=verbose for more
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 8d7df74..8a5cdda 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -82,6 +82,12 @@ physid_mask_t phys_cpu_present_map;
 static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
 
 /*
+ * This variable controls which CPUs receive external NMIs.  By default,
+ * external NMIs are delivered only to the BSP.
+ */
+static int apic_extnmi = APIC_EXTNMI_BSP;
+
+/*
  * Map cpu index to physical APIC ID
  */
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
 	value = APIC_DM_NMI;
 	if (!lapic_is_integrated())		/* 82489DX */
 		value |= APIC_LVT_LEVEL_TRIGGER;
+	if (apic_extnmi == APIC_EXTNMI_NONE)
+		value |= APIC_LVT_MASKED;
 	apic_write(APIC_LVT1, value);
 }
 
@@ -1378,9 +1386,11 @@ void setup_local_APIC(void)
 	apic_write(APIC_LVT0, value);
 
 	/*
-	 * only the BP should see the LINT1 NMI signal, obviously.
+	 * Only the BSP sees the LINT1 NMI signal by default. This can be
+	 * modified by apic_extnmi= boot option.
 	 */
-	if (!cpu)
+	if ((!cpu && apic_extnmi != APIC_EXTNMI_NONE) ||
+	    apic_extnmi == APIC_EXTNMI_ALL)
 		value = APIC_DM_NMI;
 	else
 		value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -2557,3 +2567,23 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
 	return 0;
 }
 early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
+
+static int __init apic_set_extnmi(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	if (!strncmp("all", arg, 3))
+		apic_extnmi = APIC_EXTNMI_ALL;
+	else if (!strncmp("none", arg, 4))
+		apic_extnmi = APIC_EXTNMI_NONE;
+	else if (!strncmp("bsp", arg, 3))
+		apic_extnmi = APIC_EXTNMI_BSP;
+	else {
+		pr_warn("Unknown external NMI delivery mode `%s' ignored\n", arg);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+early_param("apic_extnmi", apic_set_extnmi);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] x86/nmi: Save regs in crash dump on external NMI
  2015-12-10  6:52       ` [V6.1 " Hidehiro Kawai
  2015-12-11 18:04         ` Borislav Petkov
@ 2015-12-19 10:14         ` tip-bot for Hidehiro Kawai
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mhocko, dyoung, ebiederm, bhe, hidehiro.kawai.ez, mingo, rostedt,
	hpa, vgoyal, peterz, x86, akpm, corbet, masami.hiramatsu.pt,
	linux-kernel, s.l-h, luto, jiang.liu, tglx, bp

Commit-ID:  b279d67df88a49c6ca32b3eebd195660254be394
Gitweb:     http://git.kernel.org/tip/b279d67df88a49c6ca32b3eebd195660254be394
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:13 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:01 +0100

x86/nmi: Save regs in crash dump on external NMI

Now, multiple CPUs can receive an external NMI simultaneously by
specifying the "apic_extnmi=all" command line parameter. When we take
a crash dump by using external NMI with this option, we fail to save
registers into the crash dump. This happens as follows:

  CPU 0                              CPU 1
  ================================   =============================
  receive an external NMI
  default_do_nmi()                   receive an external NMI
    spin_lock(&nmi_reason_lock)      default_do_nmi()
    io_check_error()                   spin_lock(&nmi_reason_lock)
      panic()                            busy loop
      ...
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                         busy loop...

Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
IRET so the NMI is not handled and the callback function to save
registers is never called.

To solve this issue, we check if the IPI for crash dumping was issued
while waiting for nmi_reason_lock to be released, and if so, call its
callback function directly. If the IPI is not issued (e.g. kdump is
disabled), the actual behavior doesn't change.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/reboot.h |  1 +
 arch/x86/kernel/nmi.c         | 16 ++++++++++++++--
 arch/x86/kernel/reboot.c      | 24 +++++++++++++++++-------
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index a82c4f1..2cb1cc2 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
 
 typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
 void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+void run_crash_ipi_callback(struct pt_regs *regs);
 
 #endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 424aec4..8a2cdd7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -29,6 +29,7 @@
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
+#include <asm/reboot.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -356,8 +357,19 @@ static void default_do_nmi(struct pt_regs *regs)
 		return;
 	}
 
-	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
-	raw_spin_lock(&nmi_reason_lock);
+	/*
+	 * Non-CPU-specific NMI: NMI sources can be processed on any CPU.
+	 *
+	 * Another CPU may be processing panic routines while holding
+	 * nmi_reason_lock. Check if the CPU issued the IPI for crash dumping,
+	 * and if so, call its callback directly.  If there is no CPU preparing
+	 * crash dump, we simply loop here.
+	 */
+	while (!raw_spin_trylock(&nmi_reason_lock)) {
+		run_crash_ipi_callback(regs);
+		cpu_relax();
+	}
+
 	reason = x86_platform.get_nmi_reason();
 
 	if (reason & NMI_REASON_MASK) {
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 1da1302..d64889a 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -793,17 +793,23 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	/* Leave the nmi callback set */
 }
 
+/*
+ * Check if the crash dumping IPI got issued and if so, call its callback
+ * directly. This function is used when we have already been in NMI handler.
+ * It doesn't return.
+ */
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+	if (crash_ipi_issued)
+		crash_nmi_callback(0, regs);
+}
+
 /* Override the weak function in kernel/panic.c */
 void nmi_panic_self_stop(struct pt_regs *regs)
 {
 	while (1) {
-		/*
-		 * Wait for the crash dumping IPI to be issued, and then
-		 * call its callback directly.
-		 */
-		if (READ_ONCE(crash_ipi_issued))
-			crash_nmi_callback(0, regs); /* Don't return */
-
+		/* If no CPU is preparing crash dump, we simply loop here. */
+		run_crash_ipi_callback(regs);
 		cpu_relax();
 	}
 }
@@ -813,4 +819,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
 	/* No other CPUs to shoot down */
 }
+
+void run_crash_ipi_callback(struct pt_regs *regs)
+{
+}
 #endif

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip:x86/apic] Documentation: Document kernel.panic_on_io_nmi sysctl
  2015-12-10  1:46 ` [V6 PATCH 6/6] Documentation: Add documentation for kernel.panic_on_io_nmi sysctl Hidehiro Kawai
@ 2015-12-19 10:14   ` tip-bot for Hidehiro Kawai
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot for Hidehiro Kawai @ 2015-12-19 10:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vgoyal, linux-kernel, bp, uobergfe, bp, x86, rostedt, hpa,
	nicolas.iooss_linux, masami.hiramatsu.pt, xypron.glpk, peterz,
	cmetcalf, mhocko, sjenning, corbet, manfred, hidehiro.kawai.ez,
	tglx, bhe, dzickus, ebiederm, mingo, jkosina, akpm

Commit-ID:  9f318e3fcb1d4c48c26e8ca2ff2a459b82f36a23
Gitweb:     http://git.kernel.org/tip/9f318e3fcb1d4c48c26e8ca2ff2a459b82f36a23
Author:     Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
AuthorDate: Mon, 14 Dec 2015 11:19:14 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 19 Dec 2015 11:07:01 +0100

Documentation: Document kernel.panic_on_io_nmi sysctl

kernel.panic_on_io_nmi sysctl was introduced by commit

  5211a242d0cb ("x86: Add sysctl to allow panic on IOCK NMI error")

but its documentation is missing. So, add it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Requested-by: Borislav Petkov <bp@alien8.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014637.25437.71903.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/sysctl/kernel.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index af70d15..73c6b1e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -551,6 +551,21 @@ the recommended setting is 60.
 
 ==============================================================
 
+panic_on_io_nmi:
+
+Controls the kernel's behavior when a CPU receives an NMI caused by
+an IO error.
+
+0: try to continue operation (default)
+
+1: panic immediately. The IO error triggered an NMI. This indicates a
+   serious system condition which could result in IO data corruption.
+   Rather than continuing, panicking might be a better choice. Some
+   servers issue this sort of NMI when the dump button is pushed,
+   and you can use this option to take a crash dump.
+
+==============================================================
+
 panic_on_oops:
 
 Controls the kernel's behaviour when an oops or BUG is encountered.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-12-19 10:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-10  1:46 [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 1/6] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
2015-12-10 15:41   ` Borislav Petkov
2015-12-11  0:23     ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-19 10:12   ` [tip:x86/apic] panic, x86: " tip-bot for Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 2/6] panic/x86: Allow CPUs to save registers even if they are looping in NMI context Hidehiro Kawai
2015-12-19 10:13   ` [tip:x86/apic] panic, x86: Allow CPUs to save registers even if " tip-bot for Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 3/6] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
2015-12-19 10:13   ` [tip:x86/apic] kexec: Fix race between panic() and crash_kexec() tip-bot for Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 4/6] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
2015-12-19 10:13   ` [tip:x86/apic] x86/apic: Introduce apic_extnmi command line parameter tip-bot for Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 5/6] x86/nmi: Fix to save registers for crash dump on external NMI broadcast Hidehiro Kawai
2015-12-10  3:57   ` kbuild test robot
2015-12-10  6:36     ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-10  6:52       ` [V6.1 " Hidehiro Kawai
2015-12-11 18:04         ` Borislav Petkov
2015-12-19 10:14         ` [tip:x86/apic] x86/nmi: Save regs in crash dump on external NMI tip-bot for Hidehiro Kawai
2015-12-10  1:46 ` [V6 PATCH 6/6] Documentation: Add documentation for kernel.panic_on_io_nmi sysctl Hidehiro Kawai
2015-12-19 10:14   ` [tip:x86/apic] Documentation: Document " tip-bot for Hidehiro Kawai
2015-12-12 11:17 ` [V6 PATCH 0/6] Fix race issues among panic, NMI and crash_kexec Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.