linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
@ 2015-09-25 11:28 Hidehiro Kawai
  2015-09-25 11:28 ` [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Hidehiro Kawai @ 2015-09-25 11:28 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Masami Hiramatsu

When an HA clustering software or administrator detects unresponsivenes
of a host, they issue an NMI to the host to completely stop current
works and take a crash dump.  If the kernel has already panicked
or is capturing a crash dump at that time, further NMI can cause
a crash dump failure.

Also, crash_kexec() called from oops context and panic() can
cause race conditions.

To solve these issues, this patch set does following things:

- Don't call panic() on NMI if the kernel has already panicked
- Extend exclusion control currently done by panic_lock to crash_kexec
- Introduce "noextnmi" boot option which masks external NMI at the
  boot time (supported only for x86)

This patch set can be applied to current -tip tree.

V4:
- Improve comments and descriptions (PATCH 1/4 to 3/4)
- Use new __crash_kexec(), no exclusion check version of crash_kexec(),
  instead of checking if panic_cpu is the current cpu or not
  (PATCH 3/4)

V3: https://lkml.org/lkml/2015/8/6/39
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another cpu already panicked

V2: https://lkml.org/lkml/2015/7/27/31
- Use atomic_cmpxchg() instead of current spin_trylock() to exclude
  concurrent accesses to panic() and crash_kexec()
- Don't introduce no-lock version of panic() and crash_kexec()

V1: https://lkml.org/lkml/2015/7/22/81

---

Hidehiro Kawai (4):
      panic/x86: Fix re-entrance problem due to panic on NMI
      panic/x86: Allow cpus to save registers even if they are looping in NMI context
      kexec: Fix race between panic() and crash_kexec() called directly
      x86/apic: Introduce noextnmi boot option


 Documentation/kernel-parameters.txt |    4 ++++
 arch/x86/kernel/apic/apic.c         |   17 ++++++++++++++++-
 arch/x86/kernel/nmi.c               |   16 ++++++++++++----
 arch/x86/kernel/reboot.c            |   11 +++++++++++
 include/linux/kernel.h              |   21 +++++++++++++++++++++
 include/linux/kexec.h               |    1 +
 kernel/kexec_core.c                 |   26 +++++++++++++++++++++++++-
 kernel/panic.c                      |   29 ++++++++++++++++++++++++-----
 kernel/watchdog.c                   |    5 +++--
 9 files changed, 117 insertions(+), 13 deletions(-)


-- 
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-09-25 11:28 [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
@ 2015-09-25 11:28 ` Hidehiro Kawai
  2015-09-25 12:13   ` 河合英宏 / KAWAI,HIDEHIRO
  2015-09-25 11:28 ` [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Hidehiro Kawai @ 2015-09-25 11:28 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	Masami Hiramatsu

If panic on NMI happens just after panic() on the same CPU, panic()
is recursively called.  As the result, it stalls after failing to
acquire panic_lock.

To avoid this problem, don't call panic() in NMI context if
we've already entered panic().

V4:
- Improve comments in io_check_error() and panic()

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
  if another cpu already panicked

V2:
- Use atomic_cmpxchg() instead of current spin_trylock() to
  exclude concurrent accesses to the panic routines
- Don't introduce no-lock version of panic()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/kernel/nmi.c  |   16 ++++++++++++----
 include/linux/kernel.h |   13 +++++++++++++
 kernel/panic.c         |   15 ++++++++++++---
 kernel/watchdog.c      |    2 +-
 4 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 697f90d..5131714 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -255,8 +255,16 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 		 reason, smp_processor_id());
 	show_regs(regs);
 
-	if (panic_on_io_nmi)
-		panic("NMI IOCK error: Not continuing");
+	if (panic_on_io_nmi) {
+		nmi_panic("NMI IOCK error: Not continuing");
+
+		/*
+		 * If we return from nmi_panic(), it means we have received
+		 * NMI while processing panic().  So, simply return without
+		 * a delay and re-enabling NMI.
+		 */
+		return;
+	}
 
 	/* Re-enable the IOCK line, wait for a few seconds */
 	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -297,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		panic("NMI: Not continuing");
+		nmi_panic("NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410..57c33da 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -443,6 +443,19 @@ extern __scanf(2, 0)
 
 extern bool crash_kexec_post_notifiers;
 
+extern atomic_t panic_cpu;
+
+/*
+ * A variant of panic() called from NMI context.
+ * If we've already panicked on this cpu, return from here.
+ */
+#define nmi_panic(fmt, ...)						\
+	do {								\
+		int this_cpu = raw_smp_processor_id();			\
+		if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
+			panic(fmt, ##__VA_ARGS__);			\
+	} while (0)
+
 /*
  * Only to be used by arch init code. If the user over-wrote the default
  * CONFIG_PANIC_TIMEOUT, honor it.
diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff..a105e67 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+atomic_t panic_cpu = ATOMIC_INIT(-1);
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
  */
 void panic(const char *fmt, ...)
 {
-	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
 	int state = 0;
+	int old_cpu, this_cpu;
 
 	/*
 	 * Disable local interrupts. This will prevent panic_smp_self_stop
 	 * from deadlocking the first cpu that invokes the panic, since
 	 * there is nothing to prevent an interrupt handler (that runs
-	 * after the panic_lock is acquired) from invoking panic again.
+	 * after setting panic_cpu) from invoking panic again.
 	 */
 	local_irq_disable();
 
@@ -93,8 +95,15 @@ void panic(const char *fmt, ...)
 	 * multiple parallel invocations of panic, all other CPUs either
 	 * stop themself or will wait until they are stopped by the 1st CPU
 	 * with smp_send_stop().
+	 *
+	 * `old_cpu == -1' means this is the 1st CPU which comes here, so
+	 * go ahead.
+	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
+	 * panic_cpu to this cpu.  In this case, this is also the 1st CPU.
 	 */
-	if (!spin_trylock(&panic_lock))
+	this_cpu = raw_smp_processor_id();
+	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
+	if (old_cpu != -1 && old_cpu != this_cpu)
 		panic_smp_self_stop();
 
 	console_verbose();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 64ed1c3..00fbaa29 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -324,7 +324,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			return;
 
 		if (hardlockup_panic)
-			panic("Watchdog detected hard LOCKUP on cpu %d",
+			nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
 			      this_cpu);
 		else
 			WARN(1, "Watchdog detected hard LOCKUP on cpu %d",



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
  2015-09-25 11:28 [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
  2015-09-25 11:28 ` [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
@ 2015-09-25 11:28 ` Hidehiro Kawai
  2015-09-30 11:50   ` Peter Zijlstra
  2015-09-25 11:28 ` [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
  2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
  3 siblings, 1 reply; 38+ messages in thread
From: Hidehiro Kawai @ 2015-09-25 11:28 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	Masami Hiramatsu

nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
to non-panic cpus to stop them while saving their register
information and doing some cleanups for crash dumping.  So if a
non-panic cpus is infinitely looping in NMI context, we fail to
save its register information and lose the information from the
crash dump.

`Infinite loop in NMI context' can happen when panic on NMI happens
while another cpu has already been processing panic().  To save
registers in that case too, this patch does following two things:

1. Move the timing of `infinite loop in NMI context' (actually
   done by panic_smp_self_stop()) outside of panic() to enable us to
   refer pt_regs
2. call a callback of nmi_shootdown_cpus() directly to save
   registers and do some cleanups after setting waiting_for_crash_ipi
   which is used for counting down the number of cpus which handled
   the callback

V4:
- Rewrite the patch description

V3:
- Newly introduced

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
---
 arch/x86/kernel/nmi.c    |    6 +++---
 arch/x86/kernel/reboot.c |   11 +++++++++++
 include/linux/kernel.h   |   12 ++++++++++--
 kernel/panic.c           |   10 ++++++++++
 kernel/watchdog.c        |    5 +++--
 5 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5131714..5e00de7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 #endif
 
 	if (panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 
@@ -256,7 +256,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 	show_regs(regs);
 
 	if (panic_on_io_nmi) {
-		nmi_panic("NMI IOCK error: Not continuing");
+		nmi_panic(regs, "NMI IOCK error: Not continuing");
 
 		/*
 		 * If we return from nmi_panic(), it means we have received
@@ -305,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 
 	pr_emerg("Do you have a strange power saving mode enabled?\n");
 	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
-		nmi_panic("NMI: Not continuing");
+		nmi_panic(regs, "NMI: Not continuing");
 
 	pr_emerg("Dazed and confused, but trying to continue\n");
 }
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 02693dd..d82259b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 static nmi_shootdown_cb shootdown_callback;
 
 static atomic_t waiting_for_crash_ipi;
+static int crash_ipi_done;
 
 static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
 {
@@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	wmb();
 
 	smp_send_nmi_allbutself();
+	crash_ipi_done = 1; /* Kick cpus looping in nmi context */
 
 	msecs = 1000; /* Wait at most a second for the other cpus to stop */
 	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
@@ -788,6 +790,15 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 
 	/* Leave the nmi callback set */
 }
+
+void nmi_panic_self_stop(struct pt_regs *regs)
+{
+	while (crash_ipi_done == 0)
+		cpu_relax();
+
+	crash_nmi_callback(0, regs); /* Shouldn't return */
+}
+
 #else /* !CONFIG_SMP */
 void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 {
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 57c33da..9fe9961 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -255,6 +255,7 @@ static inline void might_fault(void) { }
 __printf(1, 2)
 void panic(const char *fmt, ...)
 	__noreturn __cold;
+void nmi_panic_self_stop(struct pt_regs *);
 extern void oops_enter(void);
 extern void oops_exit(void);
 void print_oops_end_marker(void);
@@ -448,12 +449,19 @@ extern __scanf(2, 0)
 /*
  * A variant of panic() called from NMI context.
  * If we've already panicked on this cpu, return from here.
+ * If another cpu already panicked, loop in nmi_panic_self_stop() which
+ * can provide architecture dependent code such as saving register states
+ * for crash dump.
  */
-#define nmi_panic(fmt, ...)						\
+#define nmi_panic(regs, fmt, ...)					\
 	do {								\
+		int old_cpu;						\
 		int this_cpu = raw_smp_processor_id();			\
-		if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
+		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);	\
+		if (old_cpu == -1)					\
 			panic(fmt, ##__VA_ARGS__);			\
+		else if (old_cpu != this_cpu)				\
+			nmi_panic_self_stop(regs);			\
 	} while (0)
 
 /*
diff --git a/kernel/panic.c b/kernel/panic.c
index a105e67..cddbfe0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,16 @@ void __weak panic_smp_self_stop(void)
 		cpu_relax();
 }
 
+/*
+ * Stop ourself in NMI context if another cpu has already panicked.
+ * Architecture code may override this to prepare for crash dumping
+ * (e.g. save register information).
+ */
+void __weak nmi_panic_self_stop(struct pt_regs *regs)
+{
+	panic_smp_self_stop();
+}
+
 atomic_t panic_cpu = ATOMIC_INIT(-1);
 
 /**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 00fbaa29..0074e5d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -324,8 +324,9 @@ static void watchdog_overflow_callback(struct perf_event *event,
 			return;
 
 		if (hardlockup_panic)
-			nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
-			      this_cpu);
+			nmi_panic(regs,
+				  "Watchdog detected hard LOCKUP on cpu %d",
+				  this_cpu);
 		else
 			WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
 			     this_cpu);



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-25 11:28 [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
  2015-09-25 11:28 ` [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
  2015-09-25 11:28 ` [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
@ 2015-09-25 11:28 ` Hidehiro Kawai
  2015-09-28  3:53   ` kbuild test robot
  2015-09-28  4:02   ` kbuild test robot
  2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
  3 siblings, 2 replies; 38+ messages in thread
From: Hidehiro Kawai @ 2015-09-25 11:28 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Masami Hiramatsu

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        nmi_shootdown_cpus() // stop other cpus

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other cpus
    infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        <NMI>
        io_check_error()
          panic()
            crash_kexec()
              mutex_trylock() // failed to acquire
            infinite loop

Clearly, this is an undesirable result.

To fix this problem, this patch changes crash_kexec() to exclude
others by using atomic_t panic_cpu.

V4:
- Use new __crash_kexec(), no exclusion check version of crash_kexec(),
  instead of checking if panic_cpu is the current cpu or not

V2:
- Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
  to exclude concurrent accesses
- Don't introduce no-lock version of crash_kexec()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 include/linux/kexec.h |    1 +
 kernel/kexec_core.c   |   26 +++++++++++++++++++++++++-
 kernel/panic.c        |    4 ++--
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d140b1e..f0cd2fa 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -237,6 +237,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
 					  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 					     const char *name);
+extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 201b453..4edb20a 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -853,7 +853,8 @@ int kimage_load_segment(struct kimage *image,
 struct kimage *kexec_crash_image;
 int kexec_load_disabled;
 
-void crash_kexec(struct pt_regs *regs)
+/* No panic_cpu check version of crash_kexec */
+void __crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
@@ -876,6 +877,29 @@ void crash_kexec(struct pt_regs *regs)
 	}
 }
 
+void crash_kexec(struct pt_regs *regs)
+{
+	int old_cpu, this_cpu;
+
+	/*
+	 * Only one CPU is allowed to execute the crash_kexec() code as with
+	 * panic().  Otherwise parallel calls of panic() and crash_kexec()
+	 * may stop each other.  To exclude them, we use panic_cpu here too.
+	 */
+	this_cpu = raw_smp_processor_id();
+	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
+	if (old_cpu == -1) {
+		/* This is the 1st CPU which comes here, so go ahead. */
+		__crash_kexec(regs);
+
+		/*
+		 * Reset panic_cpu to allow another panic()/crash_kexec()
+		 * call.
+		 */
+		atomic_xchg(&panic_cpu, -1);
+	}
+}
+
 size_t crash_get_memory_size(void)
 {
 	size_t size = 0;
diff --git a/kernel/panic.c b/kernel/panic.c
index cddbfe0..994be45 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -137,7 +137,7 @@ void panic(const char *fmt, ...)
 	 * the "crash_kexec_post_notifiers" option to the kernel.
 	 */
 	if (!crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL);
 
 	/*
 	 * Note smp_send_stop is the usual smp shutdown function, which
@@ -162,7 +162,7 @@ void panic(const char *fmt, ...)
 	 * more unstable, it can increase risks of the kdump failure too.
 	 */
 	if (crash_kexec_post_notifiers)
-		crash_kexec(NULL);
+		__crash_kexec(NULL);
 
 	bust_spinlocks(0);
 



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-09-25 11:28 [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
                   ` (2 preceding siblings ...)
  2015-09-25 11:28 ` [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
@ 2015-09-25 11:28 ` Hidehiro Kawai
  2015-09-30 11:55   ` Peter Zijlstra
                     ` (2 more replies)
  3 siblings, 3 replies; 38+ messages in thread
From: Hidehiro Kawai @ 2015-09-25 11:28 UTC (permalink / raw)
  To: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	Masami Hiramatsu

This patch introduces new boot option "noextnmi" which disables
external NMI.  This option is useful for the dump capture kernel
so that an HA application or administrator wouldn't mistakenly
shoot down the kernel by NMI.

Currently, only x86 supports this option.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/kernel-parameters.txt |    4 ++++
 arch/x86/kernel/apic/apic.c         |   17 ++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 22a4b68..8bcaccd 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			noexec=on: enable non-executable mappings (default)
 			noexec=off: disable non-executable mappings
 
+	noextnmi	[X86]
+			Mask external NMI.  This option is useful for a
+			dump capture kernel to be shot down by NMI.
+
 	nosmap		[X86]
 			Disable SMAP (Supervisor Mode Access Prevention)
 			even if it is supported by processor.
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 24e94ce..fd47128 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -82,6 +82,12 @@
 static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
 
 /*
+ * Don't enable external NMI via LINT1 on BSP.  This is useful for
+ * the dump capture kernel.
+ */
+static bool apic_noextnmi;
+
+/*
  * Map cpu index to physical APIC ID
  */
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
 	value = APIC_DM_NMI;
 	if (!lapic_is_integrated())		/* 82489DX */
 		value |= APIC_LVT_LEVEL_TRIGGER;
+	if (apic_noextnmi)
+		value |= APIC_LVT_MASKED;
 	apic_write(APIC_LVT1, value);
 }
 
@@ -1380,7 +1388,7 @@ void setup_local_APIC(void)
 	/*
 	 * only the BP should see the LINT1 NMI signal, obviously.
 	 */
-	if (!cpu)
+	if (!cpu && !apic_noextnmi)
 		value = APIC_DM_NMI;
 	else
 		value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -2548,3 +2556,10 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
 	return 0;
 }
 early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
+
+static int __init apic_set_noextnmi(char *arg)
+{
+	apic_noextnmi = true;
+	return 0;
+}
+early_param("noextnmi", apic_set_noextnmi);



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-09-25 11:28 ` [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
@ 2015-09-25 12:13   ` 河合英宏 / KAWAI,HIDEHIRO
  2015-09-30 11:26     ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-09-25 12:13 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal
  Cc: linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6012 bytes --]

Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
previoius discussion, but it still exists.  So, I didn't change
anything about panic_on_unrecovered_nmi.

Thanks,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

> From: Hidehiro Kawai [mailto:hidehiro.kawai.ez@hitachi.com]
> 
> If panic on NMI happens just after panic() on the same CPU, panic()
> is recursively called.  As the result, it stalls after failing to
> acquire panic_lock.
> 
> To avoid this problem, don't call panic() in NMI context if
> we've already entered panic().
> 
> V4:
> - Improve comments in io_check_error() and panic()
> 
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
>   if another cpu already panicked
> 
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to
>   exclude concurrent accesses to the panic routines
> - Don't introduce no-lock version of panic()
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> ---
>  arch/x86/kernel/nmi.c  |   16 ++++++++++++----
>  include/linux/kernel.h |   13 +++++++++++++
>  kernel/panic.c         |   15 ++++++++++++---
>  kernel/watchdog.c      |    2 +-
>  4 files changed, 38 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 697f90d..5131714 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
>  #endif
> 
>  	if (panic_on_unrecovered_nmi)
> -		panic("NMI: Not continuing");
> +		nmi_panic("NMI: Not continuing");
> 
>  	pr_emerg("Dazed and confused, but trying to continue\n");
> 
> @@ -255,8 +255,16 @@ void unregister_nmi_handler(unsigned int type, const char *name)
>  		 reason, smp_processor_id());
>  	show_regs(regs);
> 
> -	if (panic_on_io_nmi)
> -		panic("NMI IOCK error: Not continuing");
> +	if (panic_on_io_nmi) {
> +		nmi_panic("NMI IOCK error: Not continuing");
> +
> +		/*
> +		 * If we return from nmi_panic(), it means we have received
> +		 * NMI while processing panic().  So, simply return without
> +		 * a delay and re-enabling NMI.
> +		 */
> +		return;
> +	}
> 
>  	/* Re-enable the IOCK line, wait for a few seconds */
>  	reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
> @@ -297,7 +305,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
> 
>  	pr_emerg("Do you have a strange power saving mode enabled?\n");
>  	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> -		panic("NMI: Not continuing");
> +		nmi_panic("NMI: Not continuing");
> 
>  	pr_emerg("Dazed and confused, but trying to continue\n");
>  }
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 5582410..57c33da 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -443,6 +443,19 @@ extern __scanf(2, 0)
> 
>  extern bool crash_kexec_post_notifiers;
> 
> +extern atomic_t panic_cpu;
> +
> +/*
> + * A variant of panic() called from NMI context.
> + * If we've already panicked on this cpu, return from here.
> + */
> +#define nmi_panic(fmt, ...)						\
> +	do {								\
> +		int this_cpu = raw_smp_processor_id();			\
> +		if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> +			panic(fmt, ##__VA_ARGS__);			\
> +	} while (0)
> +
>  /*
>   * Only to be used by arch init code. If the user over-wrote the default
>   * CONFIG_PANIC_TIMEOUT, honor it.
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 04e91ff..a105e67 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
>  		cpu_relax();
>  }
> 
> +atomic_t panic_cpu = ATOMIC_INIT(-1);
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
>   */
>  void panic(const char *fmt, ...)
>  {
> -	static DEFINE_SPINLOCK(panic_lock);
>  	static char buf[1024];
>  	va_list args;
>  	long i, i_next = 0;
>  	int state = 0;
> +	int old_cpu, this_cpu;
> 
>  	/*
>  	 * Disable local interrupts. This will prevent panic_smp_self_stop
>  	 * from deadlocking the first cpu that invokes the panic, since
>  	 * there is nothing to prevent an interrupt handler (that runs
> -	 * after the panic_lock is acquired) from invoking panic again.
> +	 * after setting panic_cpu) from invoking panic again.
>  	 */
>  	local_irq_disable();
> 
> @@ -93,8 +95,15 @@ void panic(const char *fmt, ...)
>  	 * multiple parallel invocations of panic, all other CPUs either
>  	 * stop themself or will wait until they are stopped by the 1st CPU
>  	 * with smp_send_stop().
> +	 *
> +	 * `old_cpu == -1' means this is the 1st CPU which comes here, so
> +	 * go ahead.
> +	 * `old_cpu == this_cpu' means we came from nmi_panic() which sets
> +	 * panic_cpu to this cpu.  In this case, this is also the 1st CPU.
>  	 */
> -	if (!spin_trylock(&panic_lock))
> +	this_cpu = raw_smp_processor_id();
> +	old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> +	if (old_cpu != -1 && old_cpu != this_cpu)
>  		panic_smp_self_stop();
> 
>  	console_verbose();
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 64ed1c3..00fbaa29 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -324,7 +324,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  			return;
> 
>  		if (hardlockup_panic)
> -			panic("Watchdog detected hard LOCKUP on cpu %d",
> +			nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
>  			      this_cpu);
>  		else
>  			WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
> 

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-25 11:28 ` [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
@ 2015-09-28  3:53   ` kbuild test robot
  2015-09-28  7:08     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-09-28  4:02   ` kbuild test robot
  1 sibling, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2015-09-28  3:53 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: kbuild-all, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Eric W. Biederman, H. Peter Anvin, Andrew Morton,
	Thomas Gleixner, Vivek Goyal, linux-doc, x86, kexec,
	linux-kernel, Michal Hocko, Masami Hiramatsu

[-- Attachment #1: Type: text/plain, Size: 2637 bytes --]

Hi Hidehiro,

[auto build test results on v4.3-rc2 -- if it's inappropriate base, please ignore]

config: ia64-allyesconfig (attached as .config)
reproduce:
  wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
  chmod +x ~/bin/make.cross
  git checkout 0077681103150af584e5e592c0238fd010654c26
  # save the attached .config to linux build tree
  make.cross ARCH=ia64 

All warnings (new ones prefixed by >>):

   In file included from arch/ia64/include/uapi/asm/intrinsics.h:21:0,
                    from arch/ia64/include/asm/intrinsics.h:10,
                    from arch/ia64/include/asm/bitops.h:18,
                    from include/linux/bitops.h:36,
                    from include/linux/kernel.h:10,
                    from include/linux/list.h:8,
                    from include/linux/preempt.h:10,
                    from include/linux/spinlock.h:50,
                    from include/linux/mmzone.h:7,
                    from include/linux/gfp.h:5,
                    from include/linux/mm.h:9,
                    from kernel/kexec_core.c:12:
   kernel/kexec_core.c: In function 'crash_kexec':
   arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not used [-Wunused-value]
    ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
     ^
   arch/ia64/include/asm/atomic.h:135:30: note: in expansion of macro 'xchg'
    #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
                                 ^
>> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
      atomic_xchg(&panic_cpu, -1);
      ^

vim +/atomic_xchg +899 kernel/kexec_core.c

   883	
   884		/*
   885		 * Only one CPU is allowed to execute the crash_kexec() code as with
   886		 * panic().  Otherwise parallel calls of panic() and crash_kexec()
   887		 * may stop each other.  To exclude them, we use panic_cpu here too.
   888		 */
   889		this_cpu = raw_smp_processor_id();
   890		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
   891		if (old_cpu == -1) {
   892			/* This is the 1st CPU which comes here, so go ahead. */
   893			__crash_kexec(regs);
   894	
   895			/*
   896			 * Reset panic_cpu to allow another panic()/crash_kexec()
   897			 * call.
   898			 */
 > 899			atomic_xchg(&panic_cpu, -1);
   900		}
   901	}
   902	
   903	size_t crash_get_memory_size(void)
   904	{
   905		size_t size = 0;
   906	
   907		mutex_lock(&kexec_mutex);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 43669 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-25 11:28 ` [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
  2015-09-28  3:53   ` kbuild test robot
@ 2015-09-28  4:02   ` kbuild test robot
  2015-09-28  4:46     ` 河合英宏 / KAWAI,HIDEHIRO
  1 sibling, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2015-09-28  4:02 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: kbuild-all, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Eric W. Biederman, H. Peter Anvin, Andrew Morton,
	Thomas Gleixner, Vivek Goyal, linux-doc, x86, kexec,
	linux-kernel, Michal Hocko, Masami Hiramatsu

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

Hi Hidehiro,

[auto build test results on v4.3-rc2 -- if it's inappropriate base, please ignore]

config: x86_64-allnoconfig (attached as .config)
reproduce:
  git checkout 0077681103150af584e5e592c0238fd010654c26
  # save the attached .config to linux build tree
  make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   kernel/panic.c: In function 'panic':
>> kernel/panic.c:140:3: error: implicit declaration of function '__crash_kexec' [-Werror=implicit-function-declaration]
      __crash_kexec(NULL);
      ^
   cc1: some warnings being treated as errors

vim +/__crash_kexec +140 kernel/panic.c

   134		 * If we have crashed and we have a crash kernel loaded let it handle
   135		 * everything else.
   136		 * If we want to run this after calling panic_notifiers, pass
   137		 * the "crash_kexec_post_notifiers" option to the kernel.
   138		 */
   139		if (!crash_kexec_post_notifiers)
 > 140			__crash_kexec(NULL);
   141	
   142		/*
   143		 * Note smp_send_stop is the usual smp shutdown function, which

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 6030 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-28  4:02   ` kbuild test robot
@ 2015-09-28  4:46     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-09-28  4:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, Michal Hocko,
	平松雅巳 / HIRAMATU,MASAMI,
	kbuild-all

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1856 bytes --]

> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please ignore]
> 
> config: x86_64-allnoconfig (attached as .config)
> reproduce:
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make ARCH=x86_64
> 
> All error/warnings (new ones prefixed by >>):
> 
>    kernel/panic.c: In function 'panic':
> >> kernel/panic.c:140:3: error: implicit declaration of function '__crash_kexec' [-Werror=implicit-function-declaration]
>       __crash_kexec(NULL);
>       ^

Sorry, I missed to take into account the case of !CONFIG_KEXEC_CORE.

 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
+static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }

I'll resend the revised version later.

>    cc1: some warnings being treated as errors
> 
> vim +/__crash_kexec +140 kernel/panic.c
> 
>    134		 * If we have crashed and we have a crash kernel loaded let it handle
>    135		 * everything else.
>    136		 * If we want to run this after calling panic_notifiers, pass
>    137		 * the "crash_kexec_post_notifiers" option to the kernel.
>    138		 */
>    139		if (!crash_kexec_post_notifiers)
>  > 140			__crash_kexec(NULL);
>    141
>    142		/*
>    143		 * Note smp_send_stop is the usual smp shutdown function, which
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-28  3:53   ` kbuild test robot
@ 2015-09-28  7:08     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-09-30 11:53       ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-09-28  7:08 UTC (permalink / raw)
  To: kbuild-all
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2578 bytes --]

> Hi Hidehiro,
> 
> [auto build test results on v4.3-rc2 -- if it's inappropriate base, please ignore]
> 
> config: ia64-allyesconfig (attached as .config)
> reproduce:
>   wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>   chmod +x ~/bin/make.cross
>   git checkout 0077681103150af584e5e592c0238fd010654c26
>   # save the attached .config to linux build tree
>   make.cross ARCH=ia64
[snip]
>    arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is not used [-Wunused-value]
>     ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
>      ^
>    arch/ia64/include/asm/atomic.h:135:30: note: in expansion of macro 'xchg'
>     #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
>                                  ^
> >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
>       atomic_xchg(&panic_cpu, -1);
>       ^

I changed to use atomic_xchg() instead of atomic_set() in V3
because atomic_set() doesn't mean memory barrier.  However,
I thought again and there is no need of barrier; there is no
problem if a competitor sees old value of panic_cpu or new one.
So, atomic_set() is sufficient and using it will remove this warning.

I will resend the fixed version later.

> vim +/atomic_xchg +899 kernel/kexec_core.c
> 
>    883
>    884		/*
>    885		 * Only one CPU is allowed to execute the crash_kexec() code as with
>    886		 * panic().  Otherwise parallel calls of panic() and crash_kexec()
>    887		 * may stop each other.  To exclude them, we use panic_cpu here too.
>    888		 */
>    889		this_cpu = raw_smp_processor_id();
>    890		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
>    891		if (old_cpu == -1) {
>    892			/* This is the 1st CPU which comes here, so go ahead. */
>    893			__crash_kexec(regs);
>    894
>    895			/*
>    896			 * Reset panic_cpu to allow another panic()/crash_kexec()
>    897			 * call.
>    898			 */
>  > 899			atomic_xchg(&panic_cpu, -1);
>    900		}
>    901	}
>    902
>    903	size_t crash_get_memory_size(void)
>    904	{
>    905		size_t size = 0;
>    906
>    907		mutex_lock(&kexec_mutex);
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-09-25 12:13   ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-09-30 11:26     ` Peter Zijlstra
  2015-10-01  1:02       ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:26 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Fri, Sep 25, 2015 at 12:13:55PM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> previoius discussion, but it still exists.  So, I didn't change
> anything about panic_on_unrecovered_nmi.
> 

> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
> >  #endif
> > 
> >  	if (panic_on_unrecovered_nmi)
> > -		panic("NMI: Not continuing");
> > +		nmi_panic("NMI: Not continuing");
> > 
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> > 

I was looking at unregister_nmi_handler() because that's the function
the diff points to. That still doesn't have panic_on_unrecovered_nmi.

It looks like your diff tool is 'broken' and generates nonsense function
data.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
  2015-09-25 11:28 ` [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
@ 2015-09-30 11:50   ` Peter Zijlstra
  2015-10-01  1:43     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:50 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar, Masami Hiramatsu

On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
>  static nmi_shootdown_cb shootdown_callback;
>  
>  static atomic_t waiting_for_crash_ipi;
> +static int crash_ipi_done;
>  
>  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
>  {
> @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
>  	wmb();
>  
>  	smp_send_nmi_allbutself();
> +	crash_ipi_done = 1; /* Kick cpus looping in nmi context */

I would suggest using WRITE_ONCE() for that, because without the
volatile the compiler need not actually emit the store until after the
whole waiting thing _IF_ it can inline the whole thing.

Currently udelay() will end up being a function call and will therefore
force the store to be emitted, but I'd rather not rely on that.

>  
>  	msecs = 1000; /* Wait at most a second for the other cpus to stop */
>  	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-28  7:08     ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-09-30 11:53       ` Peter Zijlstra
  2015-10-01  2:04         ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:53 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: kbuild-all, Jonathan Corbet, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko,
	平松雅巳 / HIRAMATU,MASAMI

On Mon, Sep 28, 2015 at 07:08:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> >       atomic_xchg(&panic_cpu, -1);
> >       ^
> 
> I changed to use atomic_xchg() instead of atomic_set() in V3
> because atomic_set() doesn't mean memory barrier.  However,
> I thought again and there is no need of barrier; there is no
> problem if a competitor sees old value of panic_cpu or new one.
> So, atomic_set() is sufficient and using it will remove this warning.
> 
> I will resend the fixed version later.

So if you rely on the memory barrier; you should have also put a comment
on explaining the ordering requirements.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
@ 2015-09-30 11:55   ` Peter Zijlstra
  2015-10-01  2:33     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-13 20:22   ` Thomas Gleixner
  2015-10-27  8:46   ` Baoquan He
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-09-30 11:55 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar, Masami Hiramatsu

On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> This patch introduces new boot option "noextnmi" which disables
> external NMI.  This option is useful for the dump capture kernel
> so that an HA application or administrator wouldn't mistakenly
> shoot down the kernel by NMI.

So that they can get really stuck when the crash kernel crashes, right?
;-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
  2015-09-30 11:26     ` Peter Zijlstra
@ 2015-10-01  1:02       ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01  1:02 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1271 bytes --]

> On Fri, Sep 25, 2015 at 12:13:55PM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Peter saids -tip tree doesn't have panic_on_unrecovered_nmi in the
> > previoius discussion, but it still exists.  So, I didn't change
> > anything about panic_on_unrecovered_nmi.
> >
> 
> > > --- a/arch/x86/kernel/nmi.c
> > > +++ b/arch/x86/kernel/nmi.c
> > > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
> > >  #endif
> > >
> > >  	if (panic_on_unrecovered_nmi)
> > > -		panic("NMI: Not continuing");
> > > +		nmi_panic("NMI: Not continuing");
> > >
> > >  	pr_emerg("Dazed and confused, but trying to continue\n");
> > >
> 
> I was looking at unregister_nmi_handler() because that's the function
> the diff points to. That still doesn't have panic_on_unrecovered_nmi.
> 
> It looks like your diff tool is 'broken' and generates nonsense function
> data.

I had noticed the function name is wrong, but I didn't know how
do I fix that, sorry.  Now, I updated git to the latest version
and the issue disappeared.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
  2015-09-30 11:50   ` Peter Zijlstra
@ 2015-10-01  1:43     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01  1:43 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1362 bytes --]

> On Fri, Sep 25, 2015 at 08:28:07PM +0900, Hidehiro Kawai wrote:
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >  	wmb();
> >
> >  	smp_send_nmi_allbutself();
> > +	crash_ipi_done = 1; /* Kick cpus looping in nmi context */
> 
> I would suggest using WRITE_ONCE() for that, because without the
> volatile the compiler need not actually emit the store until after the
> whole waiting thing _IF_ it can inline the whole thing.
> 
> Currently udelay() will end up being a function call and will therefore
> force the store to be emitted, but I'd rather not rely on that.

OK, I use WRITE_ONCE().
Thanks!

> >
> >  	msecs = 1000; /* Wait at most a second for the other cpus to stop */
> >  	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Re: [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
  2015-09-30 11:53       ` Peter Zijlstra
@ 2015-10-01  2:04         ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01  2:04 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: kbuild-all, Jonathan Corbet, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1352 bytes --]

> On Mon, Sep 28, 2015 at 07:08:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > >> kernel/kexec_core.c:899:3: note: in expansion of macro 'atomic_xchg'
> > >       atomic_xchg(&panic_cpu, -1);
> > >       ^
> >
> > I changed to use atomic_xchg() instead of atomic_set() in V3
> > because atomic_set() doesn't mean memory barrier.  However,
> > I thought again and there is no need of barrier; there is no
> > problem if a competitor sees old value of panic_cpu or new one.
> > So, atomic_set() is sufficient and using it will remove this warning.
> >
> > I will resend the fixed version later.
> 
> So if you rely on the memory barrier; you should have also put a comment
> on explaining the ordering requirements.

I don't intend to use an explicit memory barrier.  There is no
memory ordering requirement here.  Also, atomic_set() which will be
used instead of atomic_xchg() is used as a RELEASE operation, so
I believe there is no problem.

Documentation/memory-barriers.txt:
> The following operations are potential problems as they do _not_ imply memory
> barriers, but might be used for implementing such things as RELEASE-class
> operations:
> 
>         atomic_set();
> ...

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-09-30 11:55   ` Peter Zijlstra
@ 2015-10-01  2:33     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-01  6:27       ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01  2:33 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1163 bytes --]

> On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> 
> So that they can get really stuck when the crash kernel crashes, right?
> ;-)

No, it is different from my intention.

`mistakenly' in the above means; they issue NMI due to a misconception
that the monitored host is stuck in the 1st kernel while it is actually
taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
there is a tool such as fence_kdump which notifies "I'm taking a crash
dump, so don't send NMI" to the HA clustering software.  However, there
is a time window between kernel panic and the notification.

"noextnmi" allows users to avoid this kind of accident all the time of
2nd kernel.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01  2:33     ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-01  6:27       ` Peter Zijlstra
  2015-10-01  7:01         ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-10-01  6:27 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Thu, Oct 01, 2015 at 02:33:18AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > This patch introduces new boot option "noextnmi" which disables
> > > external NMI.  This option is useful for the dump capture kernel
> > > so that an HA application or administrator wouldn't mistakenly
> > > shoot down the kernel by NMI.
> > 
> > So that they can get really stuck when the crash kernel crashes, right?
> > ;-)
> 
> No, it is different from my intention.
> 
> `mistakenly' in the above means; they issue NMI due to a misconception
> that the monitored host is stuck in the 1st kernel while it is actually
> taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> there is a tool such as fence_kdump which notifies "I'm taking a crash
> dump, so don't send NMI" to the HA clustering software.  However, there
> is a time window between kernel panic and the notification.
> 
> "noextnmi" allows users to avoid this kind of accident all the time of
> 2nd kernel.

Yes yes, I understand. But if the crash kernel also gets stuck they have
no means of recovery, right? (other than power cycling the hardware)

Just playing devils advocate here, I don't actually object to the patch.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01  6:27       ` Peter Zijlstra
@ 2015-10-01  7:01         ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-01  8:43           ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01  7:01 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1881 bytes --]

> On Thu, Oct 01, 2015 at 02:33:18AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI.  This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software.  However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
> 
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I think it's not a big problem.

I suppose that a sever which uses this feature will equip a BMC
and BMC mandatorily supports hard reset command for the server.
If the HA clustering software detects no response from the server
after relatively long timeout, it might want to insert hard reset
to the server by IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01  7:01         ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-01  8:43           ` Borislav Petkov
  2015-10-01 10:24             ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-10-01  8:43 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Thu, Oct 01, 2015 at 07:01:50AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> I suppose that a sever which uses this feature will equip a BMC
> and BMC mandatorily supports hard reset command for the server.
> If the HA clustering software detects no response from the server
> after relatively long timeout, it might want to insert hard reset
> to the server by IPMI over LAN.

So why doesn't the capture kernel *automatically* ignore external NMIs,
without a cmdline option?

Before it starts capturing, it says: "I'm starting capturing and am
ignoring external NMIs from now on."

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01  8:43           ` Borislav Petkov
@ 2015-10-01 10:24             ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-01 11:01               ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-01 10:24 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1048 bytes --]

> On Thu, Oct 01, 2015 at 07:01:50AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I suppose that a sever which uses this feature will equip a BMC
> > and BMC mandatorily supports hard reset command for the server.
> > If the HA clustering software detects no response from the server
> > after relatively long timeout, it might want to insert hard reset
> > to the server by IPMI over LAN.
> 
> So why doesn't the capture kernel *automatically* ignore external NMIs,
> without a cmdline option?
> 
> Before it starts capturing, it says: "I'm starting capturing and am
> ignoring external NMIs from now on."

But how do we check if the starting kernel is a dump capture kernel?
I think using cmdline option is the simplest way.  You just have to
add "noextnmi" to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01 10:24             ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-01 11:01               ` Borislav Petkov
  2015-10-02  0:58                 ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-10-01 11:01 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> But how do we check if the starting kernel is a dump capture kernel?

How does that first kernel pass info to the capture kernel?

> I think using cmdline option is the simplest way.

More often than not, simplest != correct.

What happens if I pass this option to the first kernel? All of a sudden
my *first* kernel doesn't get external NMIs.

Do you catch my drift?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-01 11:01               ` Borislav Petkov
@ 2015-10-02  0:58                 ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-02  7:47                   ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-02  0:58 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1121 bytes --]

> On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > But how do we check if the starting kernel is a dump capture kernel?
> 
> How does that first kernel pass info to the capture kernel?

As I described in the previous mail, You just have to add "noextnmi"
to KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump.  Then, "noextnmi"
option is passed to the capture kernel by the action of kexec command.

Cmdline option gives users flexibility.  I'm not sure all users
want to disable external NMIs in the 2nd kernel.

> > I think using cmdline option is the simplest way.
> 
> More often than not, simplest != correct.
> 
> What happens if I pass this option to the first kernel? All of a sudden
> my *first* kernel doesn't get external NMIs.

Yes, your first kernel doesn't get external NMIs, but basically
you don't have to set "noextnmi" option to the first kernel.


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-02  0:58                 ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-02  7:47                   ` Borislav Petkov
  2015-10-05  2:03                     ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-10-02  7:47 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Fri, Oct 02, 2015 at 12:58:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > But how do we check if the starting kernel is a dump capture kernel?
> > 
> > How does that first kernel pass info to the capture kernel?
> 
> As I described in the previous mail,

I meant: "How does the first kernel pass info to the capture kernel by
*not* using the kernel command line"?

The kernel command line is not the channel to pass data to the kdump
kernel.

> Yes, your first kernel doesn't get external NMIs, but basically
> you don't have to set "noextnmi" option to the first kernel.

So it doesn't belong there as a kernel command line parameter in the
first place.

IOW, you need a different method to pass data to the second kernel. Be
it an ELF header, be it a shared page, whatever.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-02  7:47                   ` Borislav Petkov
@ 2015-10-05  2:03                     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-05  8:27                       ` Borislav Petkov
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-05  2:03 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2405 bytes --]

> On Fri, Oct 02, 2015 at 12:58:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > On Thu, Oct 01, 2015 at 10:24:19AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > But how do we check if the starting kernel is a dump capture kernel?
> > >
> > > How does that first kernel pass info to the capture kernel?
> >
> > As I described in the previous mail,
> 
> I meant: "How does the first kernel pass info to the capture kernel by
> *not* using the kernel command line"?
> 
> The kernel command line is not the channel to pass data to the kdump
> kernel.

That's different from my point of view.  I'm not going to pass
some data from the first kernel to the second kernel. I'm just going to
provide a configurable option for the second kernel to users.

We souldn't enable this feature silently.  Some users wouldn't like
to enable this feature.  For example, a user enables a watchdog timer
which raises an external NMI when the counter is not reset for a
specific duration.  Then, the second kernel hangs up while saving
crash dump, and NMI is delivered to the CPU.  The kernel gets panic
due to the NMI, prints some information to the display and serial
console, and then automatically reboot.  In this case, users don't
want to block external NMIs.

So, making this feature configurable by command line option is
reasonable.

> > Yes, your first kernel doesn't get external NMIs, but basically
> > you don't have to set "noextnmi" option to the first kernel.
> 
> So it doesn't belong there as a kernel command line parameter in the
> first place.
> 
> IOW, you need a different method to pass data to the second kernel. Be
> it an ELF header, be it a shared page, whatever.

I think we should use the ELF header only if the passed information
is saved to a crash dump.

Also, we wouldn't want to introduce new shared page for that purpose.
A memory segment provided by kexec syscall is not usable because
the second kernel doesn't know what there is in a segment without a
command line option.  Please note that "elfcorehdr" command line option
prepared by kexec command is used to inform the second kernel about
the address of the ELF header memory segment.

Regards,


Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-05  2:03                     ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-05  8:27                       ` Borislav Petkov
  2015-10-05  9:21                         ` 河合英宏 / KAWAI,HIDEHIRO
       [not found]                         ` <alpine.DEB.2.11.1510132210590.25029@nanos>
  0 siblings, 2 replies; 38+ messages in thread
From: Borislav Petkov @ 2015-10-05  8:27 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Mon, Oct 05, 2015 at 02:03:58AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> That's different from my point of view.  I'm not going to pass
> some data from the first kernel to the second kernel. I'm just going to
> provide a configurable option for the second kernel to users.

Dude, WTF?! You're adding a kernel command line which is supposed to
be used *only* by the kdump kernel. But nooo, it is there in the open
and visible to people. And anyone can type it in during boot. AND THAT
SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!

This information is strictly for the kdump kernel - it shouldn't be a
generic command line option. How hard it is to understand that simple
fact?!

<snip obvious use case>

> I think we should use the ELF header only if the passed information
> is saved to a crash dump.

So what?! ELF header will contain the additional bit of information that
the second kernel wasn't reacting to NMIs. But that's fine, that *is*
the desired behavior anyway.

All I'm saying is, this is a strict kdump kernel "command", so to speak,
and it doesn't belong with the generic kernel command line parameters.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-05  8:27                       ` Borislav Petkov
@ 2015-10-05  9:21                         ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-05 10:14                           ` Borislav Petkov
       [not found]                         ` <alpine.DEB.2.11.1510132210590.25029@nanos>
  1 sibling, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-05  9:21 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2261 bytes --]

> On Mon, Oct 05, 2015 at 02:03:58AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > That's different from my point of view.  I'm not going to pass
> > some data from the first kernel to the second kernel. I'm just going to
> > provide a configurable option for the second kernel to users.
> 
> Dude, WTF?! You're adding a kernel command line which is supposed to
> be used *only* by the kdump kernel. But nooo, it is there in the open
> and visible to people. And anyone can type it in during boot. AND THAT
> SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> 
> This information is strictly for the kdump kernel - it shouldn't be a
> generic command line option. How hard it is to understand that simple
> fact?!

So, the problem for you is that "noextnmi" option is visible and effective
in the first kernel, isn't it?  If so, we can ignore "noextnmi" option
if we are in the first kernel and remove it from the documentation.
"elfcorehdr" cmdline option prepared by kexec command is passed to only
the second kernel, and it is also used to check if the booted kernel is
a kdump kernel.  Thus, if "elfcorehdr" is NOT specified, then ignore
"noextnmi".

Documentation/kernel-parameters.txt:
>         elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
>                         Specifies physical address of start of kernel core
>                         image elf header and optionally the size. Generally
>                         kexec loader will pass this option to capture kernel.
>                         See Documentation/kdump/kdump.txt for details.

> <snip obvious use case>
> 
> > I think we should use the ELF header only if the passed information
> > is saved to a crash dump.
> 
> So what?! ELF header will contain the additional bit of information that
> the second kernel wasn't reacting to NMIs. But that's fine, that *is*
> the desired behavior anyway.
> 
> All I'm saying is, this is a strict kdump kernel "command", so to speak,
> and it doesn't belong with the generic kernel command line parameters.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-05  9:21                         ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-05 10:14                           ` Borislav Petkov
  2015-10-13 11:55                             ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Borislav Petkov @ 2015-10-05 10:14 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On Mon, Oct 05, 2015 at 09:21:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> So, the problem for you is that "noextnmi" option is visible and effective
> in the first kernel, isn't it?

No, such an option shouldn't exist at all. You should be passing
information *in* *a* *different* *manner* to the kdump kernel - not with
a kernel command line option.

I get the feeling I'm starting to sound like a broken record on this
mail thread... :-(

One other thing we could probably try to do is use boot_params which
is, IIUC, passed to the second kernel. So we can add another bit to
boot_params.hdr.loadflags or so and use that. Or something similar.

Not particularly crazy about it but it is still much better than a
command line param...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-05 10:14                           ` Borislav Petkov
@ 2015-10-13 11:55                             ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-13 11:55 UTC (permalink / raw)
  To: 'Borislav Petkov'
  Cc: 'Peter Zijlstra',
	Jonathan Corbet, Ingo Molnar, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Thomas Gleixner, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1527 bytes --]

Hello, Boris

Sorry for the late reply.

> On Mon, Oct 05, 2015 at 09:21:02AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > So, the problem for you is that "noextnmi" option is visible and effective
> > in the first kernel, isn't it?
> 
> No, such an option shouldn't exist at all. You should be passing
> information *in* *a* *different* *manner* to the kdump kernel - not with
> a kernel command line option.

Sorry, I couldn't find out the reason why I shouldn't use cmdline option.
It doesn't need new user I/F to inform the 1st kernel about masking/unmasking
external NMI in the 2nd kernel, doesn't need new data passing infrastructure,
and is easy to configure that.  Also, "elfcorehdr" and "disable_cpu_apicid"
have already been introduced as cmdline options for dump capture kernel.
This means the cmdline option approach would be mostly acceptable.

> I get the feeling I'm starting to sound like a broken record on this
> mail thread... :-(
> 
> One other thing we could probably try to do is use boot_params which
> is, IIUC, passed to the second kernel. So we can add another bit to
> boot_params.hdr.loadflags or so and use that. Or something similar.

I think using boot_params would be worse than ELF header approach.
It needs to reserve boot_params bits for all boot loaders.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
  2015-09-30 11:55   ` Peter Zijlstra
@ 2015-10-13 20:22   ` Thomas Gleixner
  2015-10-14  3:39     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-27  8:46   ` Baoquan He
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-10-13 20:22 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar, Masami Hiramatsu

On Fri, 25 Sep 2015, Hidehiro Kawai wrote:

> This patch introduces new boot option "noextnmi" which disables
> external NMI.  This option is useful for the dump capture kernel
> so that an HA application or administrator wouldn't mistakenly
> shoot down the kernel by NMI.
> 
> Currently, only x86 supports this option.

You might add that is can be used for debugging purposes as
well. External NMIs can be their own source of trouble. :)
 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/kernel-parameters.txt |    4 ++++
>  arch/x86/kernel/apic/apic.c         |   17 ++++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 22a4b68..8bcaccd 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			noexec=on: enable non-executable mappings (default)
>  			noexec=off: disable non-executable mappings
>  
> +	noextnmi	[X86]
> +			Mask external NMI.  This option is useful for a
> +			dump capture kernel to be shot down by NMI.

That should read: "...not to be shot down", right?

Other than that.

Acked-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-13 20:22   ` Thomas Gleixner
@ 2015-10-14  3:39     ` 河合英宏 / KAWAI,HIDEHIRO
       [not found]       ` <alpine.DEB.2.11.1510140924370.25029@nanos>
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-14  3:39 UTC (permalink / raw)
  To: 'Thomas Gleixner'
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2513 bytes --]

> On Fri, 25 Sep 2015, Hidehiro Kawai wrote:
> 
> > This patch introduces new boot option "noextnmi" which disables
> > external NMI.  This option is useful for the dump capture kernel
> > so that an HA application or administrator wouldn't mistakenly
> > shoot down the kernel by NMI.
> >
> > Currently, only x86 supports this option.
> 
> You might add that is can be used for debugging purposes as
> well. External NMIs can be their own source of trouble. :)

Thanks for your comments!  I'll do that.

By the way, I have a pending patch which expands this option like
this:

	apic_extnmi={ bsp | all | none }

If apic_extnmi=all is specified, external NMIs are broadcast to
all CPUs.  This raises the successful rate of kernel panic in the case
where an external NMI to CPU 0 is swallowed by other NMI handlers or
blocked due to hang-up in NMI context.  The patch works without any
problems, but I'm going to drop the feature if it will cause long
discussion.  I'd like to settle this patch set down once.  At least,
I'm going to change this option to apic_extnmi={bsp|none} style for
the future expansion.

How do you think about this?

> > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > ---
> >  Documentation/kernel-parameters.txt |    4 ++++
> >  arch/x86/kernel/apic/apic.c         |   17 ++++++++++++++++-
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 22a4b68..8bcaccd 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			noexec=on: enable non-executable mappings (default)
> >  			noexec=off: disable non-executable mappings
> >
> > +	noextnmi	[X86]
> > +			Mask external NMI.  This option is useful for a
> > +			dump capture kernel to be shot down by NMI.
> 
> That should read: "...not to be shot down", right?

Yes, you are right.  I'll fix it.

> Other than that.
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
       [not found]                         ` <alpine.DEB.2.11.1510132210590.25029@nanos>
@ 2015-10-14 13:54                           ` Ingo Molnar
  2015-10-16  1:58                             ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 1 reply; 38+ messages in thread
From: Ingo Molnar @ 2015-10-14 13:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Borislav Petkov,
	河合英宏 / KAWAI,HIDEHIRO,
	'Peter Zijlstra',
	Jonathan Corbet, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Vivek Goyal, linux-doc, x86, kexec, linux-kernel,
	Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI


* Thomas Gleixner <tglx@linutronix.de> wrote:

> Borislav,
> 
> On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > On Mon, Oct 05, 2015 at 02:03:58AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > That's different from my point of view.  I'm not going to pass
> > > some data from the first kernel to the second kernel. I'm just going to
> > > provide a configurable option for the second kernel to users.
> > 
> > Dude, WTF?! You're adding a kernel command line which is supposed to
> > be used *only* by the kdump kernel. But nooo, it is there in the open
> > and visible to people. And anyone can type it in during boot. AND THAT
> > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > 
> > This information is strictly for the kdump kernel - it shouldn't be a
> > generic command line option. How hard it is to understand that simple
> > fact?!
> 
> Calm down!
> 
> Disabling that NMI in the first kernel is not going to make the world
> explode. We have enough command line options a user can type in, which
> are way worse than that one. If some "expert" types nonsense on the
> first kernel command line, then it's none of our problems, really.
> 
> If Kawai would have marked that option as a debug feature, this
> discussion would have probably never happened.
> 
> Aside of that, the best way to hand in options for the kdump kernel is
> the command line. We have an existing interface for that.
> 
> Let's move on. Nothing to see here.

So Boris kind of has a point: there are numerous problems with boot options as 
kexec parameter interface:

 - boot option strings are not a well defined programmatic interface:
    - failures are not obvious (they are often ignored)
    - inserting/setting parameters is awkward as well.

 - boot options are not an ABI, so when options have dual use with kexec it's easy
   to break things inadvertently: without that failure being apparent other than
   reintroducing obscure kexec failure modes again.

 - in the booted up kexec kernel it's not really obvious which options got set by
   kexec and which got set by the user - as there's no separation of namespaces.

 - likewise, if the user specifies an option in conflict with a kexec requirement
   it's not really obvious what's happening: the user setting should probably
   dominate - I'm not sure that's the case with the current kexec code.

Boot options are basically a user interface.

On the other hand the hack of (ab-)using boot parameters as kexec parameter 
passing is an existing, many years old practice and we cannot just stop it without 
offering an alternative (and better!) interface first.

We could improve things by either adding a separate kexec-only parameter passing 
facility (like programmatic boot parameters are) - or by creating some sort of 
boot parameter alias that clearly identifies kexec parameters.

So for example when introducing 'noextnmi' we'd also add a facility to add a 
'kexec_noextnmi' alias - which clearly identifies this boot parameter as a kexec 
related one.

Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the 
separation of namespaces (and the prioritization of user vs. kexec requirements) 
becomes well defined as well..

We should also probably print a warning if a kexec_* parameter is passed in that 
has no matching handler in the kexec()-ed kernel.

I do concur that this patch is probably OK given existing practices.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-14 13:54                           ` Ingo Molnar
@ 2015-10-16  1:58                             ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-16  1:58 UTC (permalink / raw)
  To: 'Ingo Molnar', Thomas Gleixner
  Cc: Borislav Petkov, 'Peter Zijlstra',
	Jonathan Corbet, Eric W. Biederman, H. Peter Anvin,
	Andrew Morton, Vivek Goyal, linux-doc, x86, kexec, linux-kernel,
	Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4293 bytes --]

> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > Borislav,
> >
> > On Mon, 5 Oct 2015, Borislav Petkov wrote:
> > > On Mon, Oct 05, 2015 at 02:03:58AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > That's different from my point of view.  I'm not going to pass
> > > > some data from the first kernel to the second kernel. I'm just going to
> > > > provide a configurable option for the second kernel to users.
> > >
> > > Dude, WTF?! You're adding a kernel command line which is supposed to
> > > be used *only* by the kdump kernel. But nooo, it is there in the open
> > > and visible to people. And anyone can type it in during boot. AND THAT
> > > SHOULDN'T BE POSSIBLE IN THE FIRST PLACE!
> > >
> > > This information is strictly for the kdump kernel - it shouldn't be a
> > > generic command line option. How hard it is to understand that simple
> > > fact?!
> >
> > Calm down!
> >
> > Disabling that NMI in the first kernel is not going to make the world
> > explode. We have enough command line options a user can type in, which
> > are way worse than that one. If some "expert" types nonsense on the
> > first kernel command line, then it's none of our problems, really.
> >
> > If Kawai would have marked that option as a debug feature, this
> > discussion would have probably never happened.
> >
> > Aside of that, the best way to hand in options for the kdump kernel is
> > the command line. We have an existing interface for that.
> >
> > Let's move on. Nothing to see here.
> 
> So Boris kind of has a point: there are numerous problems with boot options as
> kexec parameter interface:
> 
>  - boot option strings are not a well defined programmatic interface:
>     - failures are not obvious (they are often ignored)
>     - inserting/setting parameters is awkward as well.
> 
>  - boot options are not an ABI, so when options have dual use with kexec it's easy
>    to break things inadvertently: without that failure being apparent other than
>    reintroducing obscure kexec failure modes again.
> 
>  - in the booted up kexec kernel it's not really obvious which options got set by
>    kexec and which got set by the user - as there's no separation of namespaces.
> 
>  - likewise, if the user specifies an option in conflict with a kexec requirement
>    it's not really obvious what's happening: the user setting should probably
>    dominate - I'm not sure that's the case with the current kexec code.

Thanks for the detailed explanation.  I can understand the disadvantages
of using boot option.  So these are essential problems with boot options
rather than new boot option added for kexec'ed kernel.
 
> Boot options are basically a user interface.
> 
> On the other hand the hack of (ab-)using boot parameters as kexec parameter
> passing is an existing, many years old practice and we cannot just stop it without
> offering an alternative (and better!) interface first.
> 
> We could improve things by either adding a separate kexec-only parameter passing
> facility (like programmatic boot parameters are) - or by creating some sort of
> boot parameter alias that clearly identifies kexec parameters.
> 
> So for example when introducing 'noextnmi' we'd also add a facility to add a
> 'kexec_noextnmi' alias - which clearly identifies this boot parameter as a kexec
> related one.
> 
> Every 'kexec' inserted parameter would be prefixed by kexec_ - and then the
> separation of namespaces (and the prioritization of user vs. kexec requirements)
> becomes well defined as well..
> 
> We should also probably print a warning if a kexec_* parameter is passed in that
> has no matching handler in the kexec()-ed kernel.

It would be reasonable.  Or we might improve kexec command so that
it removes conflict boot options with warnings.

As I stated in another mail, I'm going to change "noextnmi" to
"apic_extnmi={bsp|all|none}", which can be used both the first and
second kernels.  So, I'll add this option without "kexec_" prefix
at this point.

> I do concur that this patch is probably OK given existing practices.

Thanks!

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
       [not found]       ` <alpine.DEB.2.11.1510140924370.25029@nanos>
@ 2015-10-16  2:02         ` 河合英宏 / KAWAI,HIDEHIRO
  0 siblings, 0 replies; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-16  2:02 UTC (permalink / raw)
  To: 'Thomas Gleixner'
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Vivek Goyal, linux-doc, x86,
	kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1027 bytes --]

> > By the way, I have a pending patch which expands this option like
> > this:
> >
> > 	apic_extnmi={ bsp | all | none }
> >
> > If apic_extnmi=all is specified, external NMIs are broadcast to
> > all CPUs.  This raises the successful rate of kernel panic in the case
> > where an external NMI to CPU 0 is swallowed by other NMI handlers or
> > blocked due to hang-up in NMI context.  The patch works without any
> > problems, but I'm going to drop the feature if it will cause long
> > discussion.  I'd like to settle this patch set down once.  At least,
> > I'm going to change this option to apic_extnmi={bsp|none} style for
> > the future expansion.
> >
> > How do you think about this?
> 
> Do it right away with all three variants. They make a lot of sense to
> me.

OK. I'll do that.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
  2015-09-30 11:55   ` Peter Zijlstra
  2015-10-13 20:22   ` Thomas Gleixner
@ 2015-10-27  8:46   ` Baoquan He
  2015-10-27  9:01     ` 河合英宏 / KAWAI,HIDEHIRO
  2 siblings, 1 reply; 38+ messages in thread
From: Baoquan He @ 2015-10-27  8:46 UTC (permalink / raw)
  To: Hidehiro Kawai
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	Masami Hiramatsu

Hi,

I just have a look at this thread. I am wondering why we don't use
existing is_kdump_kernel() directly to disable external NMI if it's
in kdump kernel. Then no need to introduce another boot option "noextnmi"
which is used only for kdump kernel.

Thanks
Baoquan

On 09/25/15 at 08:28pm, Hidehiro Kawai wrote:
> This patch introduces new boot option "noextnmi" which disables
> external NMI.  This option is useful for the dump capture kernel
> so that an HA application or administrator wouldn't mistakenly
> shoot down the kernel by NMI.
> 
> Currently, only x86 supports this option.
> 
> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---
>  Documentation/kernel-parameters.txt |    4 ++++
>  arch/x86/kernel/apic/apic.c         |   17 ++++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 22a4b68..8bcaccd 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2379,6 +2379,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			noexec=on: enable non-executable mappings (default)
>  			noexec=off: disable non-executable mappings
>  
> +	noextnmi	[X86]
> +			Mask external NMI.  This option is useful for a
> +			dump capture kernel to be shot down by NMI.
> +
>  	nosmap		[X86]
>  			Disable SMAP (Supervisor Mode Access Prevention)
>  			even if it is supported by processor.
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 24e94ce..fd47128 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -82,6 +82,12 @@
>  static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;
>  
>  /*
> + * Don't enable external NMI via LINT1 on BSP.  This is useful for
> + * the dump capture kernel.
> + */
> +static bool apic_noextnmi;
> +
> +/*
>   * Map cpu index to physical APIC ID
>   */
>  DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
> @@ -1161,6 +1167,8 @@ void __init init_bsp_APIC(void)
>  	value = APIC_DM_NMI;
>  	if (!lapic_is_integrated())		/* 82489DX */
>  		value |= APIC_LVT_LEVEL_TRIGGER;
> +	if (apic_noextnmi)
> +		value |= APIC_LVT_MASKED;
>  	apic_write(APIC_LVT1, value);
>  }
>  
> @@ -1380,7 +1388,7 @@ void setup_local_APIC(void)
>  	/*
>  	 * only the BP should see the LINT1 NMI signal, obviously.
>  	 */
> -	if (!cpu)
> +	if (!cpu && !apic_noextnmi)
>  		value = APIC_DM_NMI;
>  	else
>  		value = APIC_DM_NMI | APIC_LVT_MASKED;
> @@ -2548,3 +2556,10 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
>  	return 0;
>  }
>  early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
> +
> +static int __init apic_set_noextnmi(char *arg)
> +{
> +	apic_noextnmi = true;
> +	return 0;
> +}
> +early_param("noextnmi", apic_set_noextnmi);
> 
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-27  8:46   ` Baoquan He
@ 2015-10-27  9:01     ` 河合英宏 / KAWAI,HIDEHIRO
  2015-10-27  9:06       ` 'Baoquan He'
  0 siblings, 1 reply; 38+ messages in thread
From: 河合英宏 / KAWAI,HIDEHIRO @ 2015-10-27  9:01 UTC (permalink / raw)
  To: 'Baoquan He'
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1189 bytes --]

Hi,

> I just have a look at this thread. I am wondering why we don't use
> existing is_kdump_kernel() directly to disable external NMI if it's
> in kdump kernel. Then no need to introduce another boot option "noextnmi"
> which is used only for kdump kernel.

As I stated in another mail, there is a case where we don't want to
mask external NMIs in the second kernel.  So, we need some
configurable way.  Please see the following quotation.

> We souldn't enable this feature silently.  Some users wouldn't like
> to enable this feature.  For example, a user enables a watchdog timer
> which raises an external NMI when the counter is not reset for a
> specific duration.  Then, the second kernel hangs up while saving
> crash dump, and NMI is delivered to the CPU.  The kernel gets panic
> due to the NMI, prints some information to the display and serial
> console, and then automatically reboot.  In this case, users don't
> want to block external NMIs.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option
  2015-10-27  9:01     ` 河合英宏 / KAWAI,HIDEHIRO
@ 2015-10-27  9:06       ` 'Baoquan He'
  0 siblings, 0 replies; 38+ messages in thread
From: 'Baoquan He' @ 2015-10-27  9:06 UTC (permalink / raw)
  To: 河合英宏 / KAWAI,HIDEHIRO
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar, Eric W. Biederman,
	H. Peter Anvin, Andrew Morton, Thomas Gleixner, Vivek Goyal,
	linux-doc, x86, kexec, linux-kernel, Michal Hocko, Ingo Molnar,
	平松雅巳 / HIRAMATU,MASAMI

On 10/27/15 at 09:01am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hi,
> 
> > I just have a look at this thread. I am wondering why we don't use
> > existing is_kdump_kernel() directly to disable external NMI if it's
> > in kdump kernel. Then no need to introduce another boot option "noextnmi"
> > which is used only for kdump kernel.
> 
> As I stated in another mail, there is a case where we don't want to
> mask external NMIs in the second kernel.  So, we need some
> configurable way.  Please see the following quotation.

Got it. Thanks for telling.

> 
> > We souldn't enable this feature silently.  Some users wouldn't like
> > to enable this feature.  For example, a user enables a watchdog timer
> > which raises an external NMI when the counter is not reset for a
> > specific duration.  Then, the second kernel hangs up while saving
> > crash dump, and NMI is delivered to the CPU.  The kernel gets panic
> > due to the NMI, prints some information to the display and serial
> > console, and then automatically reboot.  In this case, users don't
> > want to block external NMIs.
> 
> Regards,
> 
> Hidehiro Kawai
> Hitachi, Ltd. Research & Development Group
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2015-10-27  9:06 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-25 11:28 [V4 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
2015-09-25 11:28 ` [V4 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
2015-09-25 12:13   ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-30 11:26     ` Peter Zijlstra
2015-10-01  1:02       ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-25 11:28 ` [V4 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
2015-09-30 11:50   ` Peter Zijlstra
2015-10-01  1:43     ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-25 11:28 ` [V4 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
2015-09-28  3:53   ` kbuild test robot
2015-09-28  7:08     ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-30 11:53       ` Peter Zijlstra
2015-10-01  2:04         ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-28  4:02   ` kbuild test robot
2015-09-28  4:46     ` 河合英宏 / KAWAI,HIDEHIRO
2015-09-25 11:28 ` [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option Hidehiro Kawai
2015-09-30 11:55   ` Peter Zijlstra
2015-10-01  2:33     ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-01  6:27       ` Peter Zijlstra
2015-10-01  7:01         ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-01  8:43           ` Borislav Petkov
2015-10-01 10:24             ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-01 11:01               ` Borislav Petkov
2015-10-02  0:58                 ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-02  7:47                   ` Borislav Petkov
2015-10-05  2:03                     ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-05  8:27                       ` Borislav Petkov
2015-10-05  9:21                         ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-05 10:14                           ` Borislav Petkov
2015-10-13 11:55                             ` 河合英宏 / KAWAI,HIDEHIRO
     [not found]                         ` <alpine.DEB.2.11.1510132210590.25029@nanos>
2015-10-14 13:54                           ` Ingo Molnar
2015-10-16  1:58                             ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-13 20:22   ` Thomas Gleixner
2015-10-14  3:39     ` 河合英宏 / KAWAI,HIDEHIRO
     [not found]       ` <alpine.DEB.2.11.1510140924370.25029@nanos>
2015-10-16  2:02         ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-27  8:46   ` Baoquan He
2015-10-27  9:01     ` 河合英宏 / KAWAI,HIDEHIRO
2015-10-27  9:06       ` 'Baoquan He'

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).