From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753676AbbKXM6f (ORCPT ); Tue, 24 Nov 2015 07:58:35 -0500 Received: from mail-wm0-f54.google.com ([74.125.82.54]:36661 "EHLO mail-wm0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753540AbbKXM6d (ORCPT ); Tue, 24 Nov 2015 07:58:33 -0500 Date: Tue, 24 Nov 2015 13:58:30 +0100 From: Michal Hocko To: Hidehiro Kawai Cc: Jonathan Corbet , Peter Zijlstra , Ingo Molnar , "Eric W. Biederman" , "H. Peter Anvin" , Andrew Morton , Thomas Gleixner , Vivek Goyal , Baoquan He , linux-doc@vger.kernel.org, x86@kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Ingo Molnar , Borislav Petkov , Masami Hiramatsu Subject: Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Message-ID: <20151124125830.GH29472@dhcp22.suse.cz> References: <20151120093641.4285.97253.stgit@softrs> <20151120093646.4285.62259.stgit@softrs> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151120093646.4285.62259.stgit@softrs> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 20-11-15 18:36:46, Hidehiro Kawai wrote: > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI > to non-panic cpus to stop them while saving their register > information and doing some cleanups for crash dumping. So if a > non-panic cpus is infinitely looping in NMI context, we fail to > save its register information and lose the information from the > crash dump. > > `Infinite loop in NMI context' can happen: > > a. when a cpu panics on NMI while another cpu is processing panic > b. when a cpu received an external or unknown NMI while another > cpu is processing panic on NMI > > In the case of a, it loops in panic_smp_self_stop(). In the case > of b, it loops in raw_spin_lock() of nmi_reason_lock. This can > happen on some servers which broadcasts NMIs to all CPUs when a dump > button is pushed. > > To save registers in these case too, this patch does following things: > > 1. Move the timing of `infinite loop in NMI context' (actually > done by panic_smp_self_stop()) outside of panic() to enable us to > refer pt_regs > 2. call a callback of nmi_shootdown_cpus() directly to save > registers and do some cleanups after setting waiting_for_crash_ipi > which is used for counting down the number of cpus which handled > the callback > > V5: > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the > compiler doesn't change the instruction order > - Support the case of b in the above description > - Add poll_crash_ipi_and_callback() > > V4: > - Rewrite the patch description > > V3: > - Newly introduced > > Signed-off-by: Hidehiro Kawai > Cc: Andrew Morton > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: Peter Zijlstra > Cc: Eric Biederman > Cc: Vivek Goyal > Cc: Michal Hocko Yes this seems correct Acked-by: Michal Hocko > --- > arch/x86/include/asm/reboot.h | 1 + > arch/x86/kernel/nmi.c | 17 +++++++++++++---- > arch/x86/kernel/reboot.c | 28 ++++++++++++++++++++++++++++ > include/linux/kernel.h | 12 ++++++++++-- > kernel/panic.c | 10 ++++++++++ > kernel/watchdog.c | 2 +- > 6 files changed, 63 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h > index a82c4f1..964e82f 100644 > --- a/arch/x86/include/asm/reboot.h > +++ b/arch/x86/include/asm/reboot.h > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type); > > typedef void (*nmi_shootdown_cb)(int, struct pt_regs*); > void nmi_shootdown_cpus(nmi_shootdown_cb callback); > +void poll_crash_ipi_and_callback(struct pt_regs *regs); > > #endif /* _ASM_X86_REBOOT_H */ > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c > index 5131714..74a1434 100644 > --- a/arch/x86/kernel/nmi.c > +++ b/arch/x86/kernel/nmi.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #define CREATE_TRACE_POINTS > #include > @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs) > #endif > > if (panic_on_unrecovered_nmi) > - nmi_panic("NMI: Not continuing"); > + nmi_panic(regs, "NMI: Not continuing"); > > pr_emerg("Dazed and confused, but trying to continue\n"); > > @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs) > show_regs(regs); > > if (panic_on_io_nmi) { > - nmi_panic("NMI IOCK error: Not continuing"); > + nmi_panic(regs, "NMI IOCK error: Not continuing"); > > /* > * If we return from nmi_panic(), it means we have received > @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs) > > pr_emerg("Do you have a strange power saving mode enabled?\n"); > if (unknown_nmi_panic || panic_on_unrecovered_nmi) > - nmi_panic("NMI: Not continuing"); > + nmi_panic(regs, "NMI: Not continuing"); > > pr_emerg("Dazed and confused, but trying to continue\n"); > } > @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs) > } > > /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */ > - raw_spin_lock(&nmi_reason_lock); > + > + /* > + * Another CPU may be processing panic routines with holding > + * nmi_reason_lock. Check IPI issuance from the panicking CPU > + * and call the callback directly. > + */ > + while (!raw_spin_trylock(&nmi_reason_lock)) > + poll_crash_ipi_and_callback(regs); > + > reason = x86_platform.get_nmi_reason(); > > if (reason & NMI_REASON_MASK) { > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > index 02693dd..44c5f5b 100644 > --- a/arch/x86/kernel/reboot.c > +++ b/arch/x86/kernel/reboot.c > @@ -718,6 +718,7 @@ static int crashing_cpu; > static nmi_shootdown_cb shootdown_callback; > > static atomic_t waiting_for_crash_ipi; > +static int crash_ipi_done; > > static int crash_nmi_callback(unsigned int val, struct pt_regs *regs) > { > @@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback) > > smp_send_nmi_allbutself(); > > + /* Kick cpus looping in nmi context. */ > + WRITE_ONCE(crash_ipi_done, 1); > + > msecs = 1000; /* Wait at most a second for the other cpus to stop */ > while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { > mdelay(1); > @@ -788,9 +792,33 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback) > > /* Leave the nmi callback set */ > } > + > +/* > + * Wait for the timing of IPI for crash dumping, and then call its callback > + * directly. This function is used when we have already been in NMI handler. > + */ > +void poll_crash_ipi_and_callback(struct pt_regs *regs) > +{ > + if (crash_ipi_done) > + crash_nmi_callback(0, regs); /* Shouldn't return */ > +} > + > +/* Override the weak function in kernel/panic.c */ > +void nmi_panic_self_stop(struct pt_regs *regs) > +{ > + while (1) { > + poll_crash_ipi_and_callback(regs); > + cpu_relax(); > + } > +} > + > #else /* !CONFIG_SMP */ > void nmi_shootdown_cpus(nmi_shootdown_cb callback) > { > /* No other CPUs to shoot down */ > } > + > +void poll_crash_ipi_and_callback(struct pt_regs *regs) > +{ > +} > #endif > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 480a4fd..728a31b 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -255,6 +255,7 @@ extern long (*panic_blink)(int state); > __printf(1, 2) > void panic(const char *fmt, ...) > __noreturn __cold; > +void nmi_panic_self_stop(struct pt_regs *); > extern void oops_enter(void); > extern void oops_exit(void); > void print_oops_end_marker(void); > @@ -450,12 +451,19 @@ extern atomic_t panic_cpu; > /* > * A variant of panic() called from NMI context. > * If we've already panicked on this cpu, return from here. > + * If another cpu already panicked, loop in nmi_panic_self_stop() which > + * can provide architecture dependent code such as saving register states > + * for crash dump. > */ > -#define nmi_panic(fmt, ...) \ > +#define nmi_panic(regs, fmt, ...) \ > do { \ > + int old_cpu; \ > int this_cpu = raw_smp_processor_id(); \ > - if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \ > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu); \ > + if (old_cpu == -1) \ > panic(fmt, ##__VA_ARGS__); \ > + else if (old_cpu != this_cpu) \ > + nmi_panic_self_stop(regs); \ > } while (0) > > /* > diff --git a/kernel/panic.c b/kernel/panic.c > index 24ee2ea..4fce2be 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -61,6 +61,16 @@ void __weak panic_smp_self_stop(void) > cpu_relax(); > } > > +/* > + * Stop ourself in NMI context if another cpu has already panicked. > + * Architecture code may override this to prepare for crash dumping > + * (e.g. save register information). > + */ > +void __weak nmi_panic_self_stop(struct pt_regs *regs) > +{ > + panic_smp_self_stop(); > +} > + > atomic_t panic_cpu = ATOMIC_INIT(-1); > > /** > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index b9be18f..84b5035 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event, > trigger_allbutself_cpu_backtrace(); > > if (hardlockup_panic) > - nmi_panic("Hard LOCKUP"); > + nmi_panic(regs, "Hard LOCKUP"); > > __this_cpu_write(hard_watchdog_warn, true); > return; > -- Michal Hocko SUSE Labs From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from mail-wm0-f53.google.com ([74.125.82.53]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1a1DBN-0003zf-W5 for kexec@lists.infradead.org; Tue, 24 Nov 2015 12:58:55 +0000 Received: by wmvv187 with SMTP id v187so208041580wmv.1 for ; Tue, 24 Nov 2015 04:58:31 -0800 (PST) Date: Tue, 24 Nov 2015 13:58:30 +0100 From: Michal Hocko Subject: Re: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Message-ID: <20151124125830.GH29472@dhcp22.suse.cz> References: <20151120093641.4285.97253.stgit@softrs> <20151120093646.4285.62259.stgit@softrs> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20151120093646.4285.62259.stgit@softrs> List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: Hidehiro Kawai Cc: x86@kernel.org, Baoquan He , Jonathan Corbet , Peter Zijlstra , linux-doc@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Ingo Molnar , Thomas Gleixner , "Eric W. Biederman" , "H. Peter Anvin" , Masami Hiramatsu , Borislav Petkov , Andrew Morton , Ingo Molnar , Vivek Goyal On Fri 20-11-15 18:36:46, Hidehiro Kawai wrote: > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI > to non-panic cpus to stop them while saving their register > information and doing some cleanups for crash dumping. So if a > non-panic cpus is infinitely looping in NMI context, we fail to > save its register information and lose the information from the > crash dump. > > `Infinite loop in NMI context' can happen: > > a. when a cpu panics on NMI while another cpu is processing panic > b. when a cpu received an external or unknown NMI while another > cpu is processing panic on NMI > > In the case of a, it loops in panic_smp_self_stop(). In the case > of b, it loops in raw_spin_lock() of nmi_reason_lock. This can > happen on some servers which broadcasts NMIs to all CPUs when a dump > button is pushed. > > To save registers in these case too, this patch does following things: > > 1. Move the timing of `infinite loop in NMI context' (actually > done by panic_smp_self_stop()) outside of panic() to enable us to > refer pt_regs > 2. call a callback of nmi_shootdown_cpus() directly to save > registers and do some cleanups after setting waiting_for_crash_ipi > which is used for counting down the number of cpus which handled > the callback > > V5: > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the > compiler doesn't change the instruction order > - Support the case of b in the above description > - Add poll_crash_ipi_and_callback() > > V4: > - Rewrite the patch description > > V3: > - Newly introduced > > Signed-off-by: Hidehiro Kawai > Cc: Andrew Morton > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: "H. Peter Anvin" > Cc: Peter Zijlstra > Cc: Eric Biederman > Cc: Vivek Goyal > Cc: Michal Hocko Yes this seems correct Acked-by: Michal Hocko > --- > arch/x86/include/asm/reboot.h | 1 + > arch/x86/kernel/nmi.c | 17 +++++++++++++---- > arch/x86/kernel/reboot.c | 28 ++++++++++++++++++++++++++++ > include/linux/kernel.h | 12 ++++++++++-- > kernel/panic.c | 10 ++++++++++ > kernel/watchdog.c | 2 +- > 6 files changed, 63 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h > index a82c4f1..964e82f 100644 > --- a/arch/x86/include/asm/reboot.h > +++ b/arch/x86/include/asm/reboot.h > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type); > > typedef void (*nmi_shootdown_cb)(int, struct pt_regs*); > void nmi_shootdown_cpus(nmi_shootdown_cb callback); > +void poll_crash_ipi_and_callback(struct pt_regs *regs); > > #endif /* _ASM_X86_REBOOT_H */ > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c > index 5131714..74a1434 100644 > --- a/arch/x86/kernel/nmi.c > +++ b/arch/x86/kernel/nmi.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #define CREATE_TRACE_POINTS > #include > @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs) > #endif > > if (panic_on_unrecovered_nmi) > - nmi_panic("NMI: Not continuing"); > + nmi_panic(regs, "NMI: Not continuing"); > > pr_emerg("Dazed and confused, but trying to continue\n"); > > @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs) > show_regs(regs); > > if (panic_on_io_nmi) { > - nmi_panic("NMI IOCK error: Not continuing"); > + nmi_panic(regs, "NMI IOCK error: Not continuing"); > > /* > * If we return from nmi_panic(), it means we have received > @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs) > > pr_emerg("Do you have a strange power saving mode enabled?\n"); > if (unknown_nmi_panic || panic_on_unrecovered_nmi) > - nmi_panic("NMI: Not continuing"); > + nmi_panic(regs, "NMI: Not continuing"); > > pr_emerg("Dazed and confused, but trying to continue\n"); > } > @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs) > } > > /* Non-CPU-specific NMI: NMI sources can be processed on any CPU */ > - raw_spin_lock(&nmi_reason_lock); > + > + /* > + * Another CPU may be processing panic routines with holding > + * nmi_reason_lock. Check IPI issuance from the panicking CPU > + * and call the callback directly. > + */ > + while (!raw_spin_trylock(&nmi_reason_lock)) > + poll_crash_ipi_and_callback(regs); > + > reason = x86_platform.get_nmi_reason(); > > if (reason & NMI_REASON_MASK) { > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > index 02693dd..44c5f5b 100644 > --- a/arch/x86/kernel/reboot.c > +++ b/arch/x86/kernel/reboot.c > @@ -718,6 +718,7 @@ static int crashing_cpu; > static nmi_shootdown_cb shootdown_callback; > > static atomic_t waiting_for_crash_ipi; > +static int crash_ipi_done; > > static int crash_nmi_callback(unsigned int val, struct pt_regs *regs) > { > @@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback) > > smp_send_nmi_allbutself(); > > + /* Kick cpus looping in nmi context. */ > + WRITE_ONCE(crash_ipi_done, 1); > + > msecs = 1000; /* Wait at most a second for the other cpus to stop */ > while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) { > mdelay(1); > @@ -788,9 +792,33 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback) > > /* Leave the nmi callback set */ > } > + > +/* > + * Wait for the timing of IPI for crash dumping, and then call its callback > + * directly. This function is used when we have already been in NMI handler. > + */ > +void poll_crash_ipi_and_callback(struct pt_regs *regs) > +{ > + if (crash_ipi_done) > + crash_nmi_callback(0, regs); /* Shouldn't return */ > +} > + > +/* Override the weak function in kernel/panic.c */ > +void nmi_panic_self_stop(struct pt_regs *regs) > +{ > + while (1) { > + poll_crash_ipi_and_callback(regs); > + cpu_relax(); > + } > +} > + > #else /* !CONFIG_SMP */ > void nmi_shootdown_cpus(nmi_shootdown_cb callback) > { > /* No other CPUs to shoot down */ > } > + > +void poll_crash_ipi_and_callback(struct pt_regs *regs) > +{ > +} > #endif > diff --git a/include/linux/kernel.h b/include/linux/kernel.h > index 480a4fd..728a31b 100644 > --- a/include/linux/kernel.h > +++ b/include/linux/kernel.h > @@ -255,6 +255,7 @@ extern long (*panic_blink)(int state); > __printf(1, 2) > void panic(const char *fmt, ...) > __noreturn __cold; > +void nmi_panic_self_stop(struct pt_regs *); > extern void oops_enter(void); > extern void oops_exit(void); > void print_oops_end_marker(void); > @@ -450,12 +451,19 @@ extern atomic_t panic_cpu; > /* > * A variant of panic() called from NMI context. > * If we've already panicked on this cpu, return from here. > + * If another cpu already panicked, loop in nmi_panic_self_stop() which > + * can provide architecture dependent code such as saving register states > + * for crash dump. > */ > -#define nmi_panic(fmt, ...) \ > +#define nmi_panic(regs, fmt, ...) \ > do { \ > + int old_cpu; \ > int this_cpu = raw_smp_processor_id(); \ > - if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \ > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu); \ > + if (old_cpu == -1) \ > panic(fmt, ##__VA_ARGS__); \ > + else if (old_cpu != this_cpu) \ > + nmi_panic_self_stop(regs); \ > } while (0) > > /* > diff --git a/kernel/panic.c b/kernel/panic.c > index 24ee2ea..4fce2be 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -61,6 +61,16 @@ void __weak panic_smp_self_stop(void) > cpu_relax(); > } > > +/* > + * Stop ourself in NMI context if another cpu has already panicked. > + * Architecture code may override this to prepare for crash dumping > + * (e.g. save register information). > + */ > +void __weak nmi_panic_self_stop(struct pt_regs *regs) > +{ > + panic_smp_self_stop(); > +} > + > atomic_t panic_cpu = ATOMIC_INIT(-1); > > /** > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index b9be18f..84b5035 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -351,7 +351,7 @@ static void watchdog_overflow_callback(struct perf_event *event, > trigger_allbutself_cpu_backtrace(); > > if (hardlockup_panic) > - nmi_panic("Hard LOCKUP"); > + nmi_panic(regs, "Hard LOCKUP"); > > __this_cpu_write(hard_watchdog_warn, true); > return; > -- Michal Hocko SUSE Labs _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec