This is the update version of the patch which I posted one year ago (https://lkml.org/lkml/2015/7/23/864). Because I couldn't received positive opinions against the previous version, I had changed the approach to another one which utilizes kexec purgatory. Then I'm back here. PATCH 1/2 is a bugfix for crash_kexec_post_notifiers boot option which allows users to call panic notifiers and kmsg dumpers before kdump. When this option is enabled, smp_send_stop() is used to stop other CPUs instead of a function provided by kdump. smp_send_stop() just stops other CPUs and doesn't other things needed for kdump. So, fix it by replacing smp_send_stop() with new appropriate one, panic_smp_send_stop(). Since panic_smp_send_stop() is architecture specific and I can't simply copy from kdump routines, only x86-specific version is provided. PATCH 2/2 makes crash_kexec_post_notifiers boot option modifiable after boot. We don't need to limit it to being modifiable at boot time. Changes in V3: - Revise comments, description, and symbol names (the logic doesn't change) - Make crash_kexec_post_notifiers boot option modifiable after boot Changes in V2: https://lkml.org/lkml/2015/7/23/864 - Replace smp_send_stop() call with crash_kexec version which saves cpu states and does cleanups instead of changing execution flow - Drop a fix for Problem 1 - Drop other patches because they aren't needed anymore V1: https://lkml.org/lkml/2015/7/10/316 --- Hidehiro Kawai (2): x86/panic: Replace smp_send_stop() with kdump friendly version kexec: Use core_param for crash_kexec_post_notifiers boot option arch/x86/kernel/crash.c | 14 ++++++++---- kernel/panic.c | 56 ++++++++++++++++++++++++++++++----------------- 2 files changed, 46 insertions(+), 24 deletions(-) -- Hidehiro Kawai Hitachi, Ltd. Research & Development Group
This patch fixes one of the problems reported by Daniel Walker (https://lkml.org/lkml/2015/6/24/44). If crash_kexec_post_notifiers boot option is specified, other CPUs are stopped by smp_send_stop() instead of machine_crash_shutdown() in crash_kexec() path. This behavior change leads two problems. Problem 1: octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are still online and try to stop their watchdog timer. If smp_send_stop() is called before octeon_generic_shutdown(), stopping watchdog timer will fail because other CPUs have been offlined by smp_send_stop(). panic() if crash_kexec_post_notifiers == 1 smp_send_stop() atomic_notifier_call_chain() kmsg_dump() crash_kexec() machine_crash_shutdown() octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs Problem 2: Most of architectures stop other CPUs in machine_crash_shutdown() path, and they also do something needed for kdump. For example, they save registers, disable virtualization extensions, and so on. However, if smp_send_stop() stops other CPUs before machine_crash_shutdown(), we miss those operations. How do we fix these problems? In the first place, we should stop other CPUs as soon as possible when panic() was called, otherwise other CPUs may wipe out a clue to the cause of the failure. So, we replace smp_send_stop() with more suitable one for kdump. This patch solves Problem 2 by replacing smp_send_stop() in panic() with panic_smp_send_stop(). This is a weak function which calls smp_send_stop(), and architecture dependent code may override this with appropriate one. This patch only provides x86-specific version. Changes in V3: - Revise comments, description, and symbol names Changes in V2: - Replace smp_send_stop() call with crash_kexec version which saves cpu states and cleans up VMX/SVM - Drop a fix for Problem 1 at this moment Reported-by: Daniel Walker <dwalker@fifo99.com> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> Cc: Minfei Huang <mnfhuang@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> --- arch/x86/kernel/crash.c | 14 ++++++++++---- kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- 2 files changed, 42 insertions(+), 15 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 9ef978d..3305433 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) disable_local_APIC(); } -static void kdump_nmi_shootdown_cpus(void) +/* Override the weak function in kernel/panic.c */ +void panic_smp_send_stop(void) { - nmi_shootdown_cpus(kdump_nmi_callback); + static int cpus_stopped; + + if (cpus_stopped) + return; + nmi_shootdown_cpus(kdump_nmi_callback); disable_local_APIC(); + cpus_stopped = 1; } #else -static void kdump_nmi_shootdown_cpus(void) +void panic_smp_send_stop(void) { /* There are no cpus to shootdown */ } @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) /* The kernel is broken so disable interrupts */ local_irq_disable(); - kdump_nmi_shootdown_cpus(); + panic_smp_send_stop(); /* * VMCLEAR VMCSs loaded on this cpu if needed. diff --git a/kernel/panic.c b/kernel/panic.c index 8aa7449..da8062d2 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) panic_smp_self_stop(); } +/* + * Stop other CPUs in panic. Architecture dependent code may override this + * with more suitable version. For example, if the architecture supports + * crash dump, it should save registers of each stopped CPU and disable + * per-CPU features such as virtualization extensions. + */ +void __weak panic_smp_send_stop(void) +{ + static int cpus_stopped; + + /* + * This function can be called twice in panic path, but obviously + * we execute this only once. + */ + if (cpus_stopped) + return; + + /* + * Note smp_send_stop is the usual smp shutdown function, which + * unfortunately means it may not be hardened to work in a panic + * situation. + */ + smp_send_stop(); + cpus_stopped = 1; +} + atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); /* @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) * Only one CPU is allowed to execute the panic code from here. For * multiple parallel invocations of panic, all other CPUs either * stop themself or will wait until they are stopped by the 1st CPU - * with smp_send_stop(). + * with panic_smp_send_stop(). * * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which * comes here, so go ahead. @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) __crash_kexec(NULL); } - /* - * Note smp_send_stop is the usual smp shutdown function, which - * unfortunately means it may not be hardened to work in a panic - * situation. - */ - smp_send_stop(); + panic_smp_send_stop(); /* * Run any panic handlers, including those that might need to @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) /* * We may have ended up stopping the CPU holding the lock (in - * smp_send_stop()) while still having some valuable data in the console - * buffer. Try to acquire the lock then release it regardless of the - * result. The release will also print the buffers out. Locks debug - * should be disabled to avoid reporting bad unlock balance when + * panic_smp_send_stop()) while still having some valuable data in the + * console buffer. Try to acquire the lock then release it regardless + * of the result. The release will also print the buffers out. Locks + * debug should be disabled to avoid reporting bad unlock balance when * panic() is not being callled from OOPS. */ debug_locks_off();
crash_kexec_post_notifiers ia a boot option which controls whether the 1st kernel calls panic notifiers or not before booting the 2nd kernel. However, there is no need to limit it to being modifiable only at boot time. So, use core_param instead of early_param. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> --- kernel/panic.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/kernel/panic.c b/kernel/panic.c index da8062d2..0d0edeb 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -134,6 +134,7 @@ void panic(const char *fmt, ...) long i, i_next = 0; int state = 0; int old_cpu, this_cpu; + bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers; /* * Disable local interrupts. This will prevent panic_smp_self_stop @@ -186,7 +187,7 @@ void panic(const char *fmt, ...) * * Bypass the panic_cpu check and call __crash_kexec directly. */ - if (!crash_kexec_post_notifiers) { + if (!_crash_kexec_post_notifiers) { printk_nmi_flush_on_panic(); __crash_kexec(NULL); } @@ -212,7 +213,7 @@ void panic(const char *fmt, ...) * * Bypass the panic_cpu check and call __crash_kexec directly. */ - if (crash_kexec_post_notifiers) + if (_crash_kexec_post_notifiers) __crash_kexec(NULL); bust_spinlocks(0); @@ -592,13 +593,7 @@ EXPORT_SYMBOL(__stack_chk_fail); core_param(panic, panic_timeout, int, 0644); core_param(pause_on_oops, pause_on_oops, int, 0644); core_param(panic_on_warn, panic_on_warn, int, 0644); - -static int __init setup_crash_kexec_post_notifiers(char *s) -{ - crash_kexec_post_notifiers = true; - return 0; -} -early_param("crash_kexec_post_notifiers", setup_crash_kexec_post_notifiers); +core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644); static int __init oops_setup(char *s) {
On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > This patch fixes one of the problems reported by Daniel Walker > (https://lkml.org/lkml/2015/6/24/44). > > If crash_kexec_post_notifiers boot option is specified, other CPUs > are stopped by smp_send_stop() instead of machine_crash_shutdown() > in crash_kexec() path. This behavior change leads two problems. > > Problem 1: > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > still online and try to stop their watchdog timer. If > smp_send_stop() is called before octeon_generic_shutdown(), stopping > watchdog timer will fail because other CPUs have been offlined by > smp_send_stop(). > > panic() > if crash_kexec_post_notifiers == 1 > smp_send_stop() > atomic_notifier_call_chain() > kmsg_dump() > crash_kexec() > machine_crash_shutdown() > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > Problem 2: > Most of architectures stop other CPUs in machine_crash_shutdown() > path, and they also do something needed for kdump. For example, > they save registers, disable virtualization extensions, and so on. > However, if smp_send_stop() stops other CPUs before > machine_crash_shutdown(), we miss those operations. > > How do we fix these problems? In the first place, we should stop > other CPUs as soon as possible when panic() was called, otherwise > other CPUs may wipe out a clue to the cause of the failure. So, we > replace smp_send_stop() with more suitable one for kdump. We have been avoiding extra things in panic path, but unfortunately crash_kexec_post_notifiers were added. I tend to agree the best place for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. As for this patch I'm not sure it is safe to replace the smp_send_stop with the kdump friendly function. I'm also not sure if the kdump friendly function is safe for kdump. Will glad to hear opinions from other arch experts. BTW, if one want to use crash_kexec_post_notifiers he should take the risk of unreliable kdump. How about only call smp_send_stop in case no crash_kexec_post_notifiers being used. > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > with panic_smp_send_stop(). This is a weak function which calls > smp_send_stop(), and architecture dependent code may override this > with appropriate one. This patch only provides x86-specific version. It does not fix the Problem 1, it seem not possible to fix it? > > Changes in V3: > - Revise comments, description, and symbol names > > Changes in V2: > - Replace smp_send_stop() call with crash_kexec version which > saves cpu states and cleans up VMX/SVM > - Drop a fix for Problem 1 at this moment > > Reported-by: Daniel Walker <dwalker@fifo99.com> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> > Cc: Dave Young <dyoung@redhat.com> > Cc: Baoquan He <bhe@redhat.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Cc: Eric Biederman <ebiederm@xmission.com> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Borislav Petkov <bp@suse.de> > Cc: Toshi Kani <toshi.kani@hpe.com> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > Cc: Takao Indoh <indou.takao@jp.fujitsu.com> > Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> > Cc: Minfei Huang <mnfhuang@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Tejun Heo <tj@kernel.org> > Cc: Josh Poimboeuf <jpoimboe@redhat.com> > --- > arch/x86/kernel/crash.c | 14 ++++++++++---- > kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- > 2 files changed, 42 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > index 9ef978d..3305433 100644 > --- a/arch/x86/kernel/crash.c > +++ b/arch/x86/kernel/crash.c > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) > disable_local_APIC(); > } > > -static void kdump_nmi_shootdown_cpus(void) > +/* Override the weak function in kernel/panic.c */ > +void panic_smp_send_stop(void) > { > - nmi_shootdown_cpus(kdump_nmi_callback); > + static int cpus_stopped; > + > + if (cpus_stopped) > + return; > > + nmi_shootdown_cpus(kdump_nmi_callback); > disable_local_APIC(); > + cpus_stopped = 1; > } > > #else > -static void kdump_nmi_shootdown_cpus(void) > +void panic_smp_send_stop(void) > { > /* There are no cpus to shootdown */ > } > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > /* The kernel is broken so disable interrupts */ > local_irq_disable(); > > - kdump_nmi_shootdown_cpus(); > + panic_smp_send_stop(); > > /* > * VMCLEAR VMCSs loaded on this cpu if needed. > diff --git a/kernel/panic.c b/kernel/panic.c > index 8aa7449..da8062d2 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) > panic_smp_self_stop(); > } > > +/* > + * Stop other CPUs in panic. Architecture dependent code may override this > + * with more suitable version. For example, if the architecture supports > + * crash dump, it should save registers of each stopped CPU and disable > + * per-CPU features such as virtualization extensions. > + */ > +void __weak panic_smp_send_stop(void) > +{ > + static int cpus_stopped; > + > + /* > + * This function can be called twice in panic path, but obviously > + * we execute this only once. > + */ > + if (cpus_stopped) > + return; > + > + /* > + * Note smp_send_stop is the usual smp shutdown function, which > + * unfortunately means it may not be hardened to work in a panic > + * situation. > + */ > + smp_send_stop(); > + cpus_stopped = 1; > +} > + > atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); > > /* > @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) > * Only one CPU is allowed to execute the panic code from here. For > * multiple parallel invocations of panic, all other CPUs either > * stop themself or will wait until they are stopped by the 1st CPU > - * with smp_send_stop(). > + * with panic_smp_send_stop(). > * > * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which > * comes here, so go ahead. > @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) > __crash_kexec(NULL); > } > > - /* > - * Note smp_send_stop is the usual smp shutdown function, which > - * unfortunately means it may not be hardened to work in a panic > - * situation. > - */ > - smp_send_stop(); > + panic_smp_send_stop(); > > /* > * Run any panic handlers, including those that might need to > @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) > > /* > * We may have ended up stopping the CPU holding the lock (in > - * smp_send_stop()) while still having some valuable data in the console > - * buffer. Try to acquire the lock then release it regardless of the > - * result. The release will also print the buffers out. Locks debug > - * should be disabled to avoid reporting bad unlock balance when > + * panic_smp_send_stop()) while still having some valuable data in the > + * console buffer. Try to acquire the lock then release it regardless > + * of the result. The release will also print the buffers out. Locks > + * debug should be disabled to avoid reporting bad unlock balance when > * panic() is not being callled from OOPS. > */ > debug_locks_off(); > > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec Thanks Dave
Hi Dave, Thanks for the comments. > From: Dave Young [mailto:dyoung@redhat.com] > Sent: Monday, July 11, 2016 5:35 PM > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > This patch fixes one of the problems reported by Daniel Walker > > (https://lkml.org/lkml/2015/6/24/44). > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > in crash_kexec() path. This behavior change leads two problems. > > > > Problem 1: > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > > still online and try to stop their watchdog timer. If > > smp_send_stop() is called before octeon_generic_shutdown(), stopping > > watchdog timer will fail because other CPUs have been offlined by > > smp_send_stop(). > > > > panic() > > if crash_kexec_post_notifiers == 1 > > smp_send_stop() > > atomic_notifier_call_chain() > > kmsg_dump() > > crash_kexec() > > machine_crash_shutdown() > > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > > > Problem 2: > > Most of architectures stop other CPUs in machine_crash_shutdown() > > path, and they also do something needed for kdump. For example, > > they save registers, disable virtualization extensions, and so on. > > However, if smp_send_stop() stops other CPUs before > > machine_crash_shutdown(), we miss those operations. > > > > How do we fix these problems? In the first place, we should stop > > other CPUs as soon as possible when panic() was called, otherwise > > other CPUs may wipe out a clue to the cause of the failure. So, we > > replace smp_send_stop() with more suitable one for kdump. > > We have been avoiding extra things in panic path, but unfortunately > crash_kexec_post_notifiers were added. I tend to agree the best place > for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. Several months ago, I posted a patch set which writes regs to SEL, generate an event to send SNMP message, and start/stop BMC's watchdog timer in purgatory. This feature requires BMC with KCS (Keyboard Controller Style) I/F, but the most of enterprise grade server would have it. (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) Doing kmsg_dump things in purgatory wouldn't be suitable (should be done in the 2nd kernel before enabling devices and IRQs?) > As for this patch I'm not sure it is safe to replace the smp_send_stop > with the kdump friendly function. I'm also not sure if the kdump friendly > function is safe for kdump. Will glad to hear opinions from other > arch experts. This stuff depends on architectures, so I speak only about x86 (the logic doesn't change on other architectures at this time). kdump path with crash_kexec_post_notifiers disabled: panic() __crash_kexec() crash_setup_regs() crash_save_vmcoreinfo() machine_crash_shutdown() native_machine_crash_shutdown() panic_smp_send_stop() /* mostly same as original * kdump_nmi_shootdown_cpus() */ kdump path with crash_kexec_post_notifiers enabled: panic() panic_smp_send_stop() __crash_kexec() crash_setup_regs() crash_save_vmcoreinfo() machine_crash_shutdown() native_machine_crash_shutdown() panic_smp_send_stop() // do nothing The difference is that stopping other CPUs before crash_setup_regs() and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and crash_save_vmcoreinfo() just save information to some memory area, they wouldn't be affected by panic_smp_send_stop(). This means placing panic_smp_send_stop before __crash_kexec is safe. BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next version. > BTW, if one want to use crash_kexec_post_notifiers he should take the > risk of unreliable kdump. How about only call smp_send_stop in case no > crash_kexec_post_notifiers being used. Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop() for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. This would be because NMI IPI has a risk of deadlock. We checked if the kdump path has a risk of deadlock in the case of NMI panic and fixed it. But I'm not sure about normal panic path. I agree with that use smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > > with panic_smp_send_stop(). This is a weak function which calls > > smp_send_stop(), and architecture dependent code may override this > > with appropriate one. This patch only provides x86-specific version. > > It does not fix the Problem 1, it seem not possible to fix it? Problem 1 depends on architectures, and at least it doesn't happen on x86. I can try to fix the Problem 1 for MIPS, but I can't test it. Possible solution will be to use an smp_send_stop variant which stop the CPU without offlining. Best regards, Hidehiro Kawai Hitachi, Ltd. Research & Development Group > > Changes in V3: > > - Revise comments, description, and symbol names > > > > Changes in V2: > > - Replace smp_send_stop() call with crash_kexec version which > > saves cpu states and cleans up VMX/SVM > > - Drop a fix for Problem 1 at this moment > > > > Reported-by: Daniel Walker <dwalker@fifo99.com> > > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) > > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> > > Cc: Dave Young <dyoung@redhat.com> > > Cc: Baoquan He <bhe@redhat.com> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Cc: Eric Biederman <ebiederm@xmission.com> > > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: "H. Peter Anvin" <hpa@zytor.com> > > Cc: Borislav Petkov <bp@suse.de> > > Cc: Toshi Kani <toshi.kani@hpe.com> > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > > Cc: Takao Indoh <indou.takao@jp.fujitsu.com> > > Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> > > Cc: Minfei Huang <mnfhuang@gmail.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Cc: Petr Mladek <pmladek@suse.com> > > Cc: Tejun Heo <tj@kernel.org> > > Cc: Josh Poimboeuf <jpoimboe@redhat.com> > > --- > > arch/x86/kernel/crash.c | 14 ++++++++++---- > > kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- > > 2 files changed, 42 insertions(+), 15 deletions(-) > > > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > > index 9ef978d..3305433 100644 > > --- a/arch/x86/kernel/crash.c > > +++ b/arch/x86/kernel/crash.c > > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) > > disable_local_APIC(); > > } > > > > -static void kdump_nmi_shootdown_cpus(void) > > +/* Override the weak function in kernel/panic.c */ > > +void panic_smp_send_stop(void) > > { > > - nmi_shootdown_cpus(kdump_nmi_callback); > > + static int cpus_stopped; > > + > > + if (cpus_stopped) > > + return; > > > > + nmi_shootdown_cpus(kdump_nmi_callback); > > disable_local_APIC(); > > + cpus_stopped = 1; > > } > > > > #else > > -static void kdump_nmi_shootdown_cpus(void) > > +void panic_smp_send_stop(void) > > { > > /* There are no cpus to shootdown */ > > } > > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > > /* The kernel is broken so disable interrupts */ > > local_irq_disable(); > > > > - kdump_nmi_shootdown_cpus(); > > + panic_smp_send_stop(); > > > > /* > > * VMCLEAR VMCSs loaded on this cpu if needed. > > diff --git a/kernel/panic.c b/kernel/panic.c > > index 8aa7449..da8062d2 100644 > > --- a/kernel/panic.c > > +++ b/kernel/panic.c > > @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) > > panic_smp_self_stop(); > > } > > > > +/* > > + * Stop other CPUs in panic. Architecture dependent code may override this > > + * with more suitable version. For example, if the architecture supports > > + * crash dump, it should save registers of each stopped CPU and disable > > + * per-CPU features such as virtualization extensions. > > + */ > > +void __weak panic_smp_send_stop(void) > > +{ > > + static int cpus_stopped; > > + > > + /* > > + * This function can be called twice in panic path, but obviously > > + * we execute this only once. > > + */ > > + if (cpus_stopped) > > + return; > > + > > + /* > > + * Note smp_send_stop is the usual smp shutdown function, which > > + * unfortunately means it may not be hardened to work in a panic > > + * situation. > > + */ > > + smp_send_stop(); > > + cpus_stopped = 1; > > +} > > + > > atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); > > > > /* > > @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) > > * Only one CPU is allowed to execute the panic code from here. For > > * multiple parallel invocations of panic, all other CPUs either > > * stop themself or will wait until they are stopped by the 1st CPU > > - * with smp_send_stop(). > > + * with panic_smp_send_stop(). > > * > > * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which > > * comes here, so go ahead. > > @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) > > __crash_kexec(NULL); > > } > > > > - /* > > - * Note smp_send_stop is the usual smp shutdown function, which > > - * unfortunately means it may not be hardened to work in a panic > > - * situation. > > - */ > > - smp_send_stop(); > > + panic_smp_send_stop(); > > > > /* > > * Run any panic handlers, including those that might need to > > @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) > > > > /* > > * We may have ended up stopping the CPU holding the lock (in > > - * smp_send_stop()) while still having some valuable data in the console > > - * buffer. Try to acquire the lock then release it regardless of the > > - * result. The release will also print the buffers out. Locks debug > > - * should be disabled to avoid reporting bad unlock balance when > > + * panic_smp_send_stop()) while still having some valuable data in the > > + * console buffer. Try to acquire the lock then release it regardless > > + * of the result. The release will also print the buffers out. Locks > > + * debug should be disabled to avoid reporting bad unlock balance when > > * panic() is not being callled from OOPS. > > */ > > debug_locks_off(); > > > > > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec > > Thanks > Dave
On 2016/07/05 at 19:33, Hidehiro Kawai wrote: > This patch fixes one of the problems reported by Daniel Walker > (https://lkml.org/lkml/2015/6/24/44). > > If crash_kexec_post_notifiers boot option is specified, other CPUs > are stopped by smp_send_stop() instead of machine_crash_shutdown() > in crash_kexec() path. This behavior change leads two problems. > > Problem 1: > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > still online and try to stop their watchdog timer. If > smp_send_stop() is called before octeon_generic_shutdown(), stopping > watchdog timer will fail because other CPUs have been offlined by > smp_send_stop(). > > panic() > if crash_kexec_post_notifiers == 1 > smp_send_stop() > atomic_notifier_call_chain() > kmsg_dump() > crash_kexec() > machine_crash_shutdown() > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > Problem 2: > Most of architectures stop other CPUs in machine_crash_shutdown() > path, and they also do something needed for kdump. For example, > they save registers, disable virtualization extensions, and so on. > However, if smp_send_stop() stops other CPUs before > machine_crash_shutdown(), we miss those operations. > > How do we fix these problems? In the first place, we should stop > other CPUs as soon as possible when panic() was called, otherwise > other CPUs may wipe out a clue to the cause of the failure. So, we > replace smp_send_stop() with more suitable one for kdump. > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > with panic_smp_send_stop(). This is a weak function which calls > smp_send_stop(), and architecture dependent code may override this > with appropriate one. This patch only provides x86-specific version. > > Changes in V3: > - Revise comments, description, and symbol names > > Changes in V2: > - Replace smp_send_stop() call with crash_kexec version which > saves cpu states and cleans up VMX/SVM > - Drop a fix for Problem 1 at this moment > > Reported-by: Daniel Walker <dwalker@fifo99.com> > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> > Cc: Dave Young <dyoung@redhat.com> > Cc: Baoquan He <bhe@redhat.com> > Cc: Vivek Goyal <vgoyal@redhat.com> > Cc: Eric Biederman <ebiederm@xmission.com> > Cc: Masami Hiramatsu <mhiramat@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: Borislav Petkov <bp@suse.de> > Cc: Toshi Kani <toshi.kani@hpe.com> > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > Cc: Takao Indoh <indou.takao@jp.fujitsu.com> > Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> > Cc: Minfei Huang <mnfhuang@gmail.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > Cc: Petr Mladek <pmladek@suse.com> > Cc: Tejun Heo <tj@kernel.org> > Cc: Josh Poimboeuf <jpoimboe@redhat.com> > --- > arch/x86/kernel/crash.c | 14 ++++++++++---- > kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- > 2 files changed, 42 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > index 9ef978d..3305433 100644 > --- a/arch/x86/kernel/crash.c > +++ b/arch/x86/kernel/crash.c > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) > disable_local_APIC(); > } > > -static void kdump_nmi_shootdown_cpus(void) > +/* Override the weak function in kernel/panic.c */ > +void panic_smp_send_stop(void) > { > - nmi_shootdown_cpus(kdump_nmi_callback); > + static int cpus_stopped; Should be atomic_t type? > + > + if (cpus_stopped) > + return; > > + nmi_shootdown_cpus(kdump_nmi_callback); > disable_local_APIC(); > + cpus_stopped = 1; > } > > #else > -static void kdump_nmi_shootdown_cpus(void) > +void panic_smp_send_stop(void) > { > /* There are no cpus to shootdown */ > } > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > /* The kernel is broken so disable interrupts */ > local_irq_disable(); > > - kdump_nmi_shootdown_cpus(); > + panic_smp_send_stop(); > > /* > * VMCLEAR VMCSs loaded on this cpu if needed. > diff --git a/kernel/panic.c b/kernel/panic.c > index 8aa7449..da8062d2 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) > panic_smp_self_stop(); > } > > +/* > + * Stop other CPUs in panic. Architecture dependent code may override this > + * with more suitable version. For example, if the architecture supports > + * crash dump, it should save registers of each stopped CPU and disable > + * per-CPU features such as virtualization extensions. > + */ > +void __weak panic_smp_send_stop(void) > +{ > + static int cpus_stopped; > + > + /* > + * This function can be called twice in panic path, but obviously > + * we execute this only once. > + */ > + if (cpus_stopped) > + return; > + > + /* > + * Note smp_send_stop is the usual smp shutdown function, which > + * unfortunately means it may not be hardened to work in a panic > + * situation. > + */ > + smp_send_stop(); > + cpus_stopped = 1; > +} > + > atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); > > /* > @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) > * Only one CPU is allowed to execute the panic code from here. For > * multiple parallel invocations of panic, all other CPUs either > * stop themself or will wait until they are stopped by the 1st CPU > - * with smp_send_stop(). > + * with panic_smp_send_stop(). > * > * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which > * comes here, so go ahead. > @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) > __crash_kexec(NULL); > } > > - /* > - * Note smp_send_stop is the usual smp shutdown function, which > - * unfortunately means it may not be hardened to work in a panic > - * situation. > - */ > - smp_send_stop(); > + panic_smp_send_stop(); If crash_kexec_post_notifiers and kdump both are not used, you actually changed the original behaviour, it would be better to make it conditionally like: if (crash_kexec_post_notifiers) panic_smp_send_stop(); // renaming it to crash_smp_send_stop() is better? else smp_send_stop(); Regards, Xunlei > > /* > * Run any panic handlers, including those that might need to > @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) > > /* > * We may have ended up stopping the CPU holding the lock (in > - * smp_send_stop()) while still having some valuable data in the console > - * buffer. Try to acquire the lock then release it regardless of the > - * result. The release will also print the buffers out. Locks debug > - * should be disabled to avoid reporting bad unlock balance when > + * panic_smp_send_stop()) while still having some valuable data in the > + * console buffer. Try to acquire the lock then release it regardless > + * of the result. The release will also print the buffers out. Locks > + * debug should be disabled to avoid reporting bad unlock balance when > * panic() is not being callled from OOPS. > */ > debug_locks_off(); > > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec
Hi Xunlei, Thanks for the review. > From: Xunlei Pang [mailto:xpang@redhat.com] > Sent: Tuesday, July 12, 2016 12:12 PM > On 2016/07/05 at 19:33, Hidehiro Kawai wrote: > > This patch fixes one of the problems reported by Daniel Walker > > (https://lkml.org/lkml/2015/6/24/44). > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > in crash_kexec() path. This behavior change leads two problems. > > > > Problem 1: > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > > still online and try to stop their watchdog timer. If > > smp_send_stop() is called before octeon_generic_shutdown(), stopping > > watchdog timer will fail because other CPUs have been offlined by > > smp_send_stop(). > > > > panic() > > if crash_kexec_post_notifiers == 1 > > smp_send_stop() > > atomic_notifier_call_chain() > > kmsg_dump() > > crash_kexec() > > machine_crash_shutdown() > > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > > > Problem 2: > > Most of architectures stop other CPUs in machine_crash_shutdown() > > path, and they also do something needed for kdump. For example, > > they save registers, disable virtualization extensions, and so on. > > However, if smp_send_stop() stops other CPUs before > > machine_crash_shutdown(), we miss those operations. > > > > How do we fix these problems? In the first place, we should stop > > other CPUs as soon as possible when panic() was called, otherwise > > other CPUs may wipe out a clue to the cause of the failure. So, we > > replace smp_send_stop() with more suitable one for kdump. > > > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > > with panic_smp_send_stop(). This is a weak function which calls > > smp_send_stop(), and architecture dependent code may override this > > with appropriate one. This patch only provides x86-specific version. > > > > Changes in V3: > > - Revise comments, description, and symbol names > > > > Changes in V2: > > - Replace smp_send_stop() call with crash_kexec version which > > saves cpu states and cleans up VMX/SVM > > - Drop a fix for Problem 1 at this moment > > > > Reported-by: Daniel Walker <dwalker@fifo99.com> > > Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) > > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> > > Cc: Dave Young <dyoung@redhat.com> > > Cc: Baoquan He <bhe@redhat.com> > > Cc: Vivek Goyal <vgoyal@redhat.com> > > Cc: Eric Biederman <ebiederm@xmission.com> > > Cc: Masami Hiramatsu <mhiramat@kernel.org> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > Cc: Ingo Molnar <mingo@redhat.com> > > Cc: "H. Peter Anvin" <hpa@zytor.com> > > Cc: Borislav Petkov <bp@suse.de> > > Cc: Toshi Kani <toshi.kani@hpe.com> > > Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> > > Cc: Takao Indoh <indou.takao@jp.fujitsu.com> > > Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> > > Cc: Minfei Huang <mnfhuang@gmail.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Michal Hocko <mhocko@suse.com> > > Cc: Vitaly Kuznetsov <vkuznets@redhat.com> > > Cc: Petr Mladek <pmladek@suse.com> > > Cc: Tejun Heo <tj@kernel.org> > > Cc: Josh Poimboeuf <jpoimboe@redhat.com> > > --- > > arch/x86/kernel/crash.c | 14 ++++++++++---- > > kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- > > 2 files changed, 42 insertions(+), 15 deletions(-) > > > > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > > index 9ef978d..3305433 100644 > > --- a/arch/x86/kernel/crash.c > > +++ b/arch/x86/kernel/crash.c > > @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) > > disable_local_APIC(); > > } > > > > -static void kdump_nmi_shootdown_cpus(void) > > +/* Override the weak function in kernel/panic.c */ > > +void panic_smp_send_stop(void) > > { > > - nmi_shootdown_cpus(kdump_nmi_callback); > > + static int cpus_stopped; > > Should be atomic_t type? panic_smp_send_stop() can be called by only one panicking CPU (but can be called twice). It is sufficient to be normal variable. > > + > > + if (cpus_stopped) > > + return; > > > > + nmi_shootdown_cpus(kdump_nmi_callback); > > disable_local_APIC(); > > + cpus_stopped = 1; > > } > > > > #else > > -static void kdump_nmi_shootdown_cpus(void) > > +void panic_smp_send_stop(void) > > { > > /* There are no cpus to shootdown */ > > } > > @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > > /* The kernel is broken so disable interrupts */ > > local_irq_disable(); > > > > - kdump_nmi_shootdown_cpus(); > > + panic_smp_send_stop(); > > > > /* > > * VMCLEAR VMCSs loaded on this cpu if needed. > > diff --git a/kernel/panic.c b/kernel/panic.c > > index 8aa7449..da8062d2 100644 > > --- a/kernel/panic.c > > +++ b/kernel/panic.c > > @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) > > panic_smp_self_stop(); > > } > > > > +/* > > + * Stop other CPUs in panic. Architecture dependent code may override this > > + * with more suitable version. For example, if the architecture supports > > + * crash dump, it should save registers of each stopped CPU and disable > > + * per-CPU features such as virtualization extensions. > > + */ > > +void __weak panic_smp_send_stop(void) > > +{ > > + static int cpus_stopped; > > + > > + /* > > + * This function can be called twice in panic path, but obviously > > + * we execute this only once. > > + */ > > + if (cpus_stopped) > > + return; > > + > > + /* > > + * Note smp_send_stop is the usual smp shutdown function, which > > + * unfortunately means it may not be hardened to work in a panic > > + * situation. > > + */ > > + smp_send_stop(); > > + cpus_stopped = 1; > > +} > > + > > atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); > > > > /* > > @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) > > * Only one CPU is allowed to execute the panic code from here. For > > * multiple parallel invocations of panic, all other CPUs either > > * stop themself or will wait until they are stopped by the 1st CPU > > - * with smp_send_stop(). > > + * with panic_smp_send_stop(). > > * > > * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which > > * comes here, so go ahead. > > @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) > > __crash_kexec(NULL); > > } > > > > - /* > > - * Note smp_send_stop is the usual smp shutdown function, which > > - * unfortunately means it may not be hardened to work in a panic > > - * situation. > > - */ > > - smp_send_stop(); > > + panic_smp_send_stop(); > > If crash_kexec_post_notifiers and kdump both are not used, you actually changed the original behaviour, > it would be better to make it conditionally like: > if (crash_kexec_post_notifiers) > panic_smp_send_stop(); // renaming it to crash_smp_send_stop() is better? > else > smp_send_stop(); Dave also pointed about this, and I'll fix it. Best regards, Hidehiro Kawai Hitachi, Ltd. Research & Development Group > > /* > > * Run any panic handlers, including those that might need to > > @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) > > > > /* > > * We may have ended up stopping the CPU holding the lock (in > > - * smp_send_stop()) while still having some valuable data in the console > > - * buffer. Try to acquire the lock then release it regardless of the > > - * result. The release will also print the buffers out. Locks debug > > - * should be disabled to avoid reporting bad unlock balance when > > + * panic_smp_send_stop()) while still having some valuable data in the > > + * console buffer. Try to acquire the lock then release it regardless > > + * of the result. The release will also print the buffers out. Locks > > + * debug should be disabled to avoid reporting bad unlock balance when > > * panic() is not being callled from OOPS. > > */ > > debug_locks_off(); > > > > > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec
On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi Xunlei, > > Thanks for the review. > >> From: Xunlei Pang [mailto:xpang@redhat.com] >> Sent: Tuesday, July 12, 2016 12:12 PM >> On 2016/07/05 at 19:33, Hidehiro Kawai wrote: >>> This patch fixes one of the problems reported by Daniel Walker >>> (https://lkml.org/lkml/2015/6/24/44). >>> >>> If crash_kexec_post_notifiers boot option is specified, other CPUs >>> are stopped by smp_send_stop() instead of machine_crash_shutdown() >>> in crash_kexec() path. This behavior change leads two problems. >>> >>> Problem 1: >>> octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are >>> still online and try to stop their watchdog timer. If >>> smp_send_stop() is called before octeon_generic_shutdown(), stopping >>> watchdog timer will fail because other CPUs have been offlined by >>> smp_send_stop(). >>> >>> panic() >>> if crash_kexec_post_notifiers == 1 >>> smp_send_stop() >>> atomic_notifier_call_chain() >>> kmsg_dump() >>> crash_kexec() >>> machine_crash_shutdown() >>> octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs >>> >>> Problem 2: >>> Most of architectures stop other CPUs in machine_crash_shutdown() >>> path, and they also do something needed for kdump. For example, >>> they save registers, disable virtualization extensions, and so on. >>> However, if smp_send_stop() stops other CPUs before >>> machine_crash_shutdown(), we miss those operations. >>> >>> How do we fix these problems? In the first place, we should stop >>> other CPUs as soon as possible when panic() was called, otherwise >>> other CPUs may wipe out a clue to the cause of the failure. So, we >>> replace smp_send_stop() with more suitable one for kdump. >>> >>> This patch solves Problem 2 by replacing smp_send_stop() in panic() >>> with panic_smp_send_stop(). This is a weak function which calls >>> smp_send_stop(), and architecture dependent code may override this >>> with appropriate one. This patch only provides x86-specific version. >>> >>> Changes in V3: >>> - Revise comments, description, and symbol names >>> >>> Changes in V2: >>> - Replace smp_send_stop() call with crash_kexec version which >>> saves cpu states and cleans up VMX/SVM >>> - Drop a fix for Problem 1 at this moment >>> >>> Reported-by: Daniel Walker <dwalker@fifo99.com> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) >>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> >>> Cc: Dave Young <dyoung@redhat.com> >>> Cc: Baoquan He <bhe@redhat.com> >>> Cc: Vivek Goyal <vgoyal@redhat.com> >>> Cc: Eric Biederman <ebiederm@xmission.com> >>> Cc: Masami Hiramatsu <mhiramat@kernel.org> >>> Cc: Thomas Gleixner <tglx@linutronix.de> >>> Cc: Ingo Molnar <mingo@redhat.com> >>> Cc: "H. Peter Anvin" <hpa@zytor.com> >>> Cc: Borislav Petkov <bp@suse.de> >>> Cc: Toshi Kani <toshi.kani@hpe.com> >>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> >>> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> >>> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> >>> Cc: Minfei Huang <mnfhuang@gmail.com> >>> Cc: Andrew Morton <akpm@linux-foundation.org> >>> Cc: Michal Hocko <mhocko@suse.com> >>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> >>> Cc: Petr Mladek <pmladek@suse.com> >>> Cc: Tejun Heo <tj@kernel.org> >>> Cc: Josh Poimboeuf <jpoimboe@redhat.com> >>> --- >>> arch/x86/kernel/crash.c | 14 ++++++++++---- >>> kernel/panic.c | 43 ++++++++++++++++++++++++++++++++----------- >>> 2 files changed, 42 insertions(+), 15 deletions(-) >>> >>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c >>> index 9ef978d..3305433 100644 >>> --- a/arch/x86/kernel/crash.c >>> +++ b/arch/x86/kernel/crash.c >>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs) >>> disable_local_APIC(); >>> } >>> >>> -static void kdump_nmi_shootdown_cpus(void) >>> +/* Override the weak function in kernel/panic.c */ >>> +void panic_smp_send_stop(void) >>> { >>> - nmi_shootdown_cpus(kdump_nmi_callback); >>> + static int cpus_stopped; >> Should be atomic_t type? > panic_smp_send_stop() can be called by only one panicking CPU > (but can be called twice). It is sufficient to be normal variable. There are other call sites of __crash_kexec() for oops cases, which can call panic_smp_send_stop() concurrently on a different cpu. Regards, Xunlei >>> + >>> + if (cpus_stopped) >>> + return; >>> >>> + nmi_shootdown_cpus(kdump_nmi_callback); >>> disable_local_APIC(); >>> + cpus_stopped = 1; >>> } >>> >>> #else >>> -static void kdump_nmi_shootdown_cpus(void) >>> +void panic_smp_send_stop(void) >>> { >>> /* There are no cpus to shootdown */ >>> } >>> @@ -160,7 +166,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs) >>> /* The kernel is broken so disable interrupts */ >>> local_irq_disable(); >>> >>> - kdump_nmi_shootdown_cpus(); >>> + panic_smp_send_stop(); >>> >>> /* >>> * VMCLEAR VMCSs loaded on this cpu if needed. >>> diff --git a/kernel/panic.c b/kernel/panic.c >>> index 8aa7449..da8062d2 100644 >>> --- a/kernel/panic.c >>> +++ b/kernel/panic.c >>> @@ -71,6 +71,32 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs) >>> panic_smp_self_stop(); >>> } >>> >>> +/* >>> + * Stop other CPUs in panic. Architecture dependent code may override this >>> + * with more suitable version. For example, if the architecture supports >>> + * crash dump, it should save registers of each stopped CPU and disable >>> + * per-CPU features such as virtualization extensions. >>> + */ >>> +void __weak panic_smp_send_stop(void) >>> +{ >>> + static int cpus_stopped; >>> + >>> + /* >>> + * This function can be called twice in panic path, but obviously >>> + * we execute this only once. >>> + */ >>> + if (cpus_stopped) >>> + return; >>> + >>> + /* >>> + * Note smp_send_stop is the usual smp shutdown function, which >>> + * unfortunately means it may not be hardened to work in a panic >>> + * situation. >>> + */ >>> + smp_send_stop(); >>> + cpus_stopped = 1; >>> +} >>> + >>> atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); >>> >>> /* >>> @@ -125,7 +151,7 @@ void panic(const char *fmt, ...) >>> * Only one CPU is allowed to execute the panic code from here. For >>> * multiple parallel invocations of panic, all other CPUs either >>> * stop themself or will wait until they are stopped by the 1st CPU >>> - * with smp_send_stop(). >>> + * with panic_smp_send_stop(). >>> * >>> * `old_cpu == PANIC_CPU_INVALID' means this is the 1st CPU which >>> * comes here, so go ahead. >>> @@ -165,12 +191,7 @@ void panic(const char *fmt, ...) >>> __crash_kexec(NULL); >>> } >>> >>> - /* >>> - * Note smp_send_stop is the usual smp shutdown function, which >>> - * unfortunately means it may not be hardened to work in a panic >>> - * situation. >>> - */ >>> - smp_send_stop(); >>> + panic_smp_send_stop(); >> If crash_kexec_post_notifiers and kdump both are not used, you actually changed the original behaviour, >> it would be better to make it conditionally like: >> if (crash_kexec_post_notifiers) >> panic_smp_send_stop(); // renaming it to crash_smp_send_stop() is better? >> else >> smp_send_stop(); > Dave also pointed about this, and I'll fix it. > > Best regards, > > Hidehiro Kawai > Hitachi, Ltd. Research & Development Group > >>> /* >>> * Run any panic handlers, including those that might need to >>> @@ -198,10 +219,10 @@ void panic(const char *fmt, ...) >>> >>> /* >>> * We may have ended up stopping the CPU holding the lock (in >>> - * smp_send_stop()) while still having some valuable data in the console >>> - * buffer. Try to acquire the lock then release it regardless of the >>> - * result. The release will also print the buffers out. Locks debug >>> - * should be disabled to avoid reporting bad unlock balance when >>> + * panic_smp_send_stop()) while still having some valuable data in the >>> + * console buffer. Try to acquire the lock then release it regardless >>> + * of the result. The release will also print the buffers out. Locks >>> + * debug should be disabled to avoid reporting bad unlock balance when >>> * panic() is not being callled from OOPS. >>> */ >>> debug_locks_off(); >>> >>> >>> >>> _______________________________________________ >>> kexec mailing list >>> kexec@lists.infradead.org >>> http://lists.infradead.org/mailman/listinfo/kexec
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Xunlei Pang
> Sent: Tuesday, July 12, 2016 3:57 PM
> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi Xunlei,
> >
> > Thanks for the review.
> >
> >> From: Xunlei Pang [mailto:xpang@redhat.com]
> >> Sent: Tuesday, July 12, 2016 12:12 PM
> >> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
> >>> This patch fixes one of the problems reported by Daniel Walker
> >>> (https://lkml.org/lkml/2015/6/24/44).
> >>>
> >>> If crash_kexec_post_notifiers boot option is specified, other CPUs
> >>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
> >>> in crash_kexec() path. This behavior change leads two problems.
> >>>
> >>> Problem 1:
> >>> octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
> >>> still online and try to stop their watchdog timer. If
> >>> smp_send_stop() is called before octeon_generic_shutdown(), stopping
> >>> watchdog timer will fail because other CPUs have been offlined by
> >>> smp_send_stop().
> >>>
> >>> panic()
> >>> if crash_kexec_post_notifiers == 1
> >>> smp_send_stop()
> >>> atomic_notifier_call_chain()
> >>> kmsg_dump()
> >>> crash_kexec()
> >>> machine_crash_shutdown()
> >>> octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
> >>>
> >>> Problem 2:
> >>> Most of architectures stop other CPUs in machine_crash_shutdown()
> >>> path, and they also do something needed for kdump. For example,
> >>> they save registers, disable virtualization extensions, and so on.
> >>> However, if smp_send_stop() stops other CPUs before
> >>> machine_crash_shutdown(), we miss those operations.
> >>>
> >>> How do we fix these problems? In the first place, we should stop
> >>> other CPUs as soon as possible when panic() was called, otherwise
> >>> other CPUs may wipe out a clue to the cause of the failure. So, we
> >>> replace smp_send_stop() with more suitable one for kdump.
> >>>
> >>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
> >>> with panic_smp_send_stop(). This is a weak function which calls
> >>> smp_send_stop(), and architecture dependent code may override this
> >>> with appropriate one. This patch only provides x86-specific version.
> >>>
> >>> Changes in V3:
> >>> - Revise comments, description, and symbol names
> >>>
> >>> Changes in V2:
> >>> - Replace smp_send_stop() call with crash_kexec version which
> >>> saves cpu states and cleans up VMX/SVM
> >>> - Drop a fix for Problem 1 at this moment
> >>>
> >>> Reported-by: Daniel Walker <dwalker@fifo99.com>
> >>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
> >>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> >>> Cc: Dave Young <dyoung@redhat.com>
> >>> Cc: Baoquan He <bhe@redhat.com>
> >>> Cc: Vivek Goyal <vgoyal@redhat.com>
> >>> Cc: Eric Biederman <ebiederm@xmission.com>
> >>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> >>> Cc: Thomas Gleixner <tglx@linutronix.de>
> >>> Cc: Ingo Molnar <mingo@redhat.com>
> >>> Cc: "H. Peter Anvin" <hpa@zytor.com>
> >>> Cc: Borislav Petkov <bp@suse.de>
> >>> Cc: Toshi Kani <toshi.kani@hpe.com>
> >>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> >>> Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
> >>> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
> >>> Cc: Minfei Huang <mnfhuang@gmail.com>
> >>> Cc: Andrew Morton <akpm@linux-foundation.org>
> >>> Cc: Michal Hocko <mhocko@suse.com>
> >>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> >>> Cc: Petr Mladek <pmladek@suse.com>
> >>> Cc: Tejun Heo <tj@kernel.org>
> >>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> >>> ---
> >>> arch/x86/kernel/crash.c | 14 ++++++++++----
> >>> kernel/panic.c | 43 ++++++++++++++++++++++++++++++++-----------
> >>> 2 files changed, 42 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> >>> index 9ef978d..3305433 100644
> >>> --- a/arch/x86/kernel/crash.c
> >>> +++ b/arch/x86/kernel/crash.c
> >>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
> >>> disable_local_APIC();
> >>> }
> >>>
> >>> -static void kdump_nmi_shootdown_cpus(void)
> >>> +/* Override the weak function in kernel/panic.c */
> >>> +void panic_smp_send_stop(void)
> >>> {
> >>> - nmi_shootdown_cpus(kdump_nmi_callback);
> >>> + static int cpus_stopped;
> >> Should be atomic_t type?
> > panic_smp_send_stop() can be called by only one panicking CPU
> > (but can be called twice). It is sufficient to be normal variable.
>
> There are other call sites of __crash_kexec() for oops cases, which can
> call panic_smp_send_stop() concurrently on a different cpu.
In oops cases, crash_kexec() is called first, then __crash_kexec() is
called. crash_kexec() excludes concurrent execution of panic() and
crash_kexec() via panic_cpu, so panic_smp_send_stop() shouldn't be
called concurrently.
Best regards,
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group
On 2016/07/12 at 15:12, 河合英宏 / KAWAI,HIDEHIRO wrote:
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Xunlei Pang
>> Sent: Tuesday, July 12, 2016 3:57 PM
>> On 2016/07/12 at 11:56, 河合英宏 / KAWAI,HIDEHIRO wrote:
>>> Hi Xunlei,
>>>
>>> Thanks for the review.
>>>
>>>> From: Xunlei Pang [mailto:xpang@redhat.com]
>>>> Sent: Tuesday, July 12, 2016 12:12 PM
>>>> On 2016/07/05 at 19:33, Hidehiro Kawai wrote:
>>>>> This patch fixes one of the problems reported by Daniel Walker
>>>>> (https://lkml.org/lkml/2015/6/24/44).
>>>>>
>>>>> If crash_kexec_post_notifiers boot option is specified, other CPUs
>>>>> are stopped by smp_send_stop() instead of machine_crash_shutdown()
>>>>> in crash_kexec() path. This behavior change leads two problems.
>>>>>
>>>>> Problem 1:
>>>>> octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are
>>>>> still online and try to stop their watchdog timer. If
>>>>> smp_send_stop() is called before octeon_generic_shutdown(), stopping
>>>>> watchdog timer will fail because other CPUs have been offlined by
>>>>> smp_send_stop().
>>>>>
>>>>> panic()
>>>>> if crash_kexec_post_notifiers == 1
>>>>> smp_send_stop()
>>>>> atomic_notifier_call_chain()
>>>>> kmsg_dump()
>>>>> crash_kexec()
>>>>> machine_crash_shutdown()
>>>>> octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs
>>>>>
>>>>> Problem 2:
>>>>> Most of architectures stop other CPUs in machine_crash_shutdown()
>>>>> path, and they also do something needed for kdump. For example,
>>>>> they save registers, disable virtualization extensions, and so on.
>>>>> However, if smp_send_stop() stops other CPUs before
>>>>> machine_crash_shutdown(), we miss those operations.
>>>>>
>>>>> How do we fix these problems? In the first place, we should stop
>>>>> other CPUs as soon as possible when panic() was called, otherwise
>>>>> other CPUs may wipe out a clue to the cause of the failure. So, we
>>>>> replace smp_send_stop() with more suitable one for kdump.
>>>>>
>>>>> This patch solves Problem 2 by replacing smp_send_stop() in panic()
>>>>> with panic_smp_send_stop(). This is a weak function which calls
>>>>> smp_send_stop(), and architecture dependent code may override this
>>>>> with appropriate one. This patch only provides x86-specific version.
>>>>>
>>>>> Changes in V3:
>>>>> - Revise comments, description, and symbol names
>>>>>
>>>>> Changes in V2:
>>>>> - Replace smp_send_stop() call with crash_kexec version which
>>>>> saves cpu states and cleans up VMX/SVM
>>>>> - Drop a fix for Problem 1 at this moment
>>>>>
>>>>> Reported-by: Daniel Walker <dwalker@fifo99.com>
>>>>> Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option)
>>>>> Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
>>>>> Cc: Dave Young <dyoung@redhat.com>
>>>>> Cc: Baoquan He <bhe@redhat.com>
>>>>> Cc: Vivek Goyal <vgoyal@redhat.com>
>>>>> Cc: Eric Biederman <ebiederm@xmission.com>
>>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>> Cc: Borislav Petkov <bp@suse.de>
>>>>> Cc: Toshi Kani <toshi.kani@hpe.com>
>>>>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>>>>> Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
>>>>> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com>
>>>>> Cc: Minfei Huang <mnfhuang@gmail.com>
>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>> Cc: Michal Hocko <mhocko@suse.com>
>>>>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>>>>> Cc: Petr Mladek <pmladek@suse.com>
>>>>> Cc: Tejun Heo <tj@kernel.org>
>>>>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
>>>>> ---
>>>>> arch/x86/kernel/crash.c | 14 ++++++++++----
>>>>> kernel/panic.c | 43 ++++++++++++++++++++++++++++++++-----------
>>>>> 2 files changed, 42 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>>>> index 9ef978d..3305433 100644
>>>>> --- a/arch/x86/kernel/crash.c
>>>>> +++ b/arch/x86/kernel/crash.c
>>>>> @@ -133,15 +133,21 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
>>>>> disable_local_APIC();
>>>>> }
>>>>>
>>>>> -static void kdump_nmi_shootdown_cpus(void)
>>>>> +/* Override the weak function in kernel/panic.c */
>>>>> +void panic_smp_send_stop(void)
>>>>> {
>>>>> - nmi_shootdown_cpus(kdump_nmi_callback);
>>>>> + static int cpus_stopped;
>>>> Should be atomic_t type?
>>> panic_smp_send_stop() can be called by only one panicking CPU
>>> (but can be called twice). It is sufficient to be normal variable.
>> There are other call sites of __crash_kexec() for oops cases, which can
>> call panic_smp_send_stop() concurrently on a different cpu.
> In oops cases, crash_kexec() is called first, then __crash_kexec() is
> called. crash_kexec() excludes concurrent execution of panic() and
> crash_kexec() via panic_cpu, so panic_smp_send_stop() shouldn't be
> called concurrently.
Right, that's why oops calls crash_kexec() not __crash_kexec().
I have no problem on this. Thanks!
Regards,
Xunlei
On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi Dave, > > Thanks for the comments. > > > From: Dave Young [mailto:dyoung@redhat.com] > > Sent: Monday, July 11, 2016 5:35 PM > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > > This patch fixes one of the problems reported by Daniel Walker > > > (https://lkml.org/lkml/2015/6/24/44). > > > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > > in crash_kexec() path. This behavior change leads two problems. > > > > > > Problem 1: > > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > > > still online and try to stop their watchdog timer. If > > > smp_send_stop() is called before octeon_generic_shutdown(), stopping > > > watchdog timer will fail because other CPUs have been offlined by > > > smp_send_stop(). > > > > > > panic() > > > if crash_kexec_post_notifiers == 1 > > > smp_send_stop() > > > atomic_notifier_call_chain() > > > kmsg_dump() > > > crash_kexec() > > > machine_crash_shutdown() > > > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > > > > > Problem 2: > > > Most of architectures stop other CPUs in machine_crash_shutdown() > > > path, and they also do something needed for kdump. For example, > > > they save registers, disable virtualization extensions, and so on. > > > However, if smp_send_stop() stops other CPUs before > > > machine_crash_shutdown(), we miss those operations. > > > > > > How do we fix these problems? In the first place, we should stop > > > other CPUs as soon as possible when panic() was called, otherwise > > > other CPUs may wipe out a clue to the cause of the failure. So, we > > > replace smp_send_stop() with more suitable one for kdump. > > > > We have been avoiding extra things in panic path, but unfortunately > > crash_kexec_post_notifiers were added. I tend to agree the best place > > for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. > > Several months ago, I posted a patch set which writes regs to SEL, generate > an event to send SNMP message, and start/stop BMC's watchdog timer in > purgatory. This feature requires BMC with KCS (Keyboard Controller Style) > I/F, but the most of enterprise grade server would have it. > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be done > in the 2nd kernel before enabling devices and IRQs?) In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing? > > > As for this patch I'm not sure it is safe to replace the smp_send_stop > > with the kdump friendly function. I'm also not sure if the kdump friendly > > function is safe for kdump. Will glad to hear opinions from other > > arch experts. > > This stuff depends on architectures, so I speak only about > x86 (the logic doesn't change on other architectures at this time). > > kdump path with crash_kexec_post_notifiers disabled: > panic() > __crash_kexec() > crash_setup_regs() > crash_save_vmcoreinfo() > machine_crash_shutdown() > native_machine_crash_shutdown() > panic_smp_send_stop() /* mostly same as original > * kdump_nmi_shootdown_cpus() > */ > > kdump path with crash_kexec_post_notifiers enabled: > panic() > panic_smp_send_stop() > __crash_kexec() > crash_setup_regs() > crash_save_vmcoreinfo() > machine_crash_shutdown() > native_machine_crash_shutdown() > panic_smp_send_stop() // do nothing > > The difference is that stopping other CPUs before crash_setup_regs() > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > crash_save_vmcoreinfo() just save information to some memory area, > they wouldn't be affected by panic_smp_send_stop(). This means > placing panic_smp_send_stop before __crash_kexec is safe. > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the > next version. But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. > > > BTW, if one want to use crash_kexec_post_notifiers he should take the > > risk of unreliable kdump. How about only call smp_send_stop in case no > > crash_kexec_post_notifiers being used. > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop() > for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. > This would be because NMI IPI has a risk of deadlock. We checked if > the kdump path has a risk of deadlock in the case of NMI panic and fixed > it. But I'm not sure about normal panic path. I agree with that use > smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable: if (!crash_kexec_post_notifiers) smp_send_stop() > > > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > > > with panic_smp_send_stop(). This is a weak function which calls > > > smp_send_stop(), and architecture dependent code may override this > > > with appropriate one. This patch only provides x86-specific version. > > > > It does not fix the Problem 1, it seem not possible to fix it? > > Problem 1 depends on architectures, and at least it doesn't happen > on x86. I can try to fix the Problem 1 for MIPS, but I can't test it. > Possible solution will be to use an smp_send_stop variant which stop > the CPU without offlining. > > Best regards, > > Hidehiro Kawai > Hitachi, Ltd. Research & Development Group > [snip] Thanks Dave
Hi Dave, Thanks for your reply. > From: 'Dave Young' [mailto:dyoung@redhat.com] > Sent: Wednesday, July 13, 2016 11:04 AM > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > > Hi Dave, > > > > Thanks for the comments. > > > > > From: Dave Young [mailto:dyoung@redhat.com] > > > Sent: Monday, July 11, 2016 5:35 PM > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > > > This patch fixes one of the problems reported by Daniel Walker > > > > (https://lkml.org/lkml/2015/6/24/44). > > > > > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > > > in crash_kexec() path. This behavior change leads two problems. > > > > > > > > Problem 1: > > > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs > > > > are still online and try to stop their watchdog timer. If > > > > smp_send_stop() is called before octeon_generic_shutdown(), > > > > stopping watchdog timer will fail because other CPUs have been > > > > offlined by smp_send_stop(). > > > > > > > > panic() > > > > if crash_kexec_post_notifiers == 1 > > > > smp_send_stop() > > > > atomic_notifier_call_chain() > > > > kmsg_dump() > > > > crash_kexec() > > > > machine_crash_shutdown() > > > > octeon_generic_shutdown() // shutdown watchdog for ONLINE > > > > CPUs > > > > > > > > Problem 2: > > > > Most of architectures stop other CPUs in machine_crash_shutdown() > > > > path, and they also do something needed for kdump. For example, > > > > they save registers, disable virtualization extensions, and so on. > > > > However, if smp_send_stop() stops other CPUs before > > > > machine_crash_shutdown(), we miss those operations. > > > > > > > > How do we fix these problems? In the first place, we should stop > > > > other CPUs as soon as possible when panic() was called, otherwise > > > > other CPUs may wipe out a clue to the cause of the failure. So, > > > > we replace smp_send_stop() with more suitable one for kdump. > > > > > > We have been avoiding extra things in panic path, but unfortunately > > > crash_kexec_post_notifiers were added. I tend to agree the best > > > place for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. > > > > Several months ago, I posted a patch set which writes regs to SEL, > > generate an event to send SNMP message, and start/stop BMC's watchdog > > timer in purgatory. This feature requires BMC with KCS (Keyboard > > Controller Style) I/F, but the most of enterprise grade server would have it. > > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) > > > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be > > done in the 2nd kernel before enabling devices and IRQs?) > > In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing? Hmm, I checked the case of using ACPI ERST as a persistent storage. The feature is initialized by device_initcall(): device_initcall erst_init pstore_register And vmcore is initialized by fs_initcall() which is called before device_initcall(). We may be able to change the sequence, but anyway, these are done in later phase of the kernel initialization. So, it may get less reliable although it would be doable. > > > As for this patch I'm not sure it is safe to replace the > > > smp_send_stop with the kdump friendly function. I'm also not sure if > > > the kdump friendly function is safe for kdump. Will glad to hear > > > opinions from other arch experts. > > > > This stuff depends on architectures, so I speak only about > > x86 (the logic doesn't change on other architectures at this time). > > > > kdump path with crash_kexec_post_notifiers disabled: > > panic() > > __crash_kexec() > > crash_setup_regs() > > crash_save_vmcoreinfo() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > panic_smp_send_stop() /* mostly same as original > > * kdump_nmi_shootdown_cpus() > > */ > > > > kdump path with crash_kexec_post_notifiers enabled: > > panic() > > panic_smp_send_stop() > > __crash_kexec() > > crash_setup_regs() > > crash_save_vmcoreinfo() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > panic_smp_send_stop() // do nothing > > > > The difference is that stopping other CPUs before crash_setup_regs() > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > > crash_save_vmcoreinfo() just save information to some memory area, > > they wouldn't be affected by panic_smp_send_stop(). This means > > placing panic_smp_send_stop before __crash_kexec is safe. > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next > > version. > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. As I mentioned in the description of this patch, we should stop other CPUs ASAP to preserve current state either crash_kexec_post_notifiers is enabled or not. Then, all remaining procedures should work well after stopping other CPUs (but keep the CPU map online). Vivek also mentioned similar things: https://lkml.org/lkml/2015/7/14/433 > > > BTW, if one want to use crash_kexec_post_notifiers he should take > > > the risk of unreliable kdump. How about only call smp_send_stop in > > > case no crash_kexec_post_notifiers being used. > > > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), > > smp_send_stop() for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. > > This would be because NMI IPI has a risk of deadlock. We checked if > > the kdump path has a risk of deadlock in the case of NMI panic and > > fixed it. But I'm not sure about normal panic path. I agree with > > that use smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. > > What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable: > if (!crash_kexec_post_notifiers) > smp_send_stop() Remaining procedures including atomic_notifier_call_chain and kmsg_dump may assume that other CPUs have stopped. Actually, IPMI driver callback assumes so that. Also, other CPUs may change the current state while calling these callbacks and make the dump analysis difficult. [snip] Best regards, Hidehiro Kawai
Hi, On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi Dave, > > Thanks for your reply. > > > From: 'Dave Young' [mailto:dyoung@redhat.com] > > Sent: Wednesday, July 13, 2016 11:04 AM > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > Hi Dave, > > > > > > Thanks for the comments. > > > > > > > From: Dave Young [mailto:dyoung@redhat.com] > > > > Sent: Monday, July 11, 2016 5:35 PM > > > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > > > > This patch fixes one of the problems reported by Daniel Walker > > > > > (https://lkml.org/lkml/2015/6/24/44). > > > > > > > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > > > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > > > > in crash_kexec() path. This behavior change leads two problems. > > > > > > > > > > Problem 1: > > > > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs > > > > > are still online and try to stop their watchdog timer. If > > > > > smp_send_stop() is called before octeon_generic_shutdown(), > > > > > stopping watchdog timer will fail because other CPUs have been > > > > > offlined by smp_send_stop(). > > > > > > > > > > panic() > > > > > if crash_kexec_post_notifiers == 1 > > > > > smp_send_stop() > > > > > atomic_notifier_call_chain() > > > > > kmsg_dump() > > > > > crash_kexec() > > > > > machine_crash_shutdown() > > > > > octeon_generic_shutdown() // shutdown watchdog for ONLINE > > > > > CPUs > > > > > > > > > > Problem 2: > > > > > Most of architectures stop other CPUs in machine_crash_shutdown() > > > > > path, and they also do something needed for kdump. For example, > > > > > they save registers, disable virtualization extensions, and so on. > > > > > However, if smp_send_stop() stops other CPUs before > > > > > machine_crash_shutdown(), we miss those operations. > > > > > > > > > > How do we fix these problems? In the first place, we should stop > > > > > other CPUs as soon as possible when panic() was called, otherwise > > > > > other CPUs may wipe out a clue to the cause of the failure. So, > > > > > we replace smp_send_stop() with more suitable one for kdump. > > > > > > > > We have been avoiding extra things in panic path, but unfortunately > > > > crash_kexec_post_notifiers were added. I tend to agree the best > > > > place for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. > > > > > > Several months ago, I posted a patch set which writes regs to SEL, > > > generate an event to send SNMP message, and start/stop BMC's watchdog > > > timer in purgatory. This feature requires BMC with KCS (Keyboard > > > Controller Style) I/F, but the most of enterprise grade server would have it. > > > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) > > > > > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be > > > done in the 2nd kernel before enabling devices and IRQs?) > > > > In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing? > > Hmm, I checked the case of using ACPI ERST as a persistent > storage. The feature is initialized by device_initcall(): > > device_initcall > erst_init > pstore_register > > And vmcore is initialized by fs_initcall() which is called > before device_initcall(). We may be able to change the sequence, > but anyway, these are done in later phase of the kernel initialization. > So, it may get less reliable although it would be doable. Agreed, it is just an idea, it may need more experiments if you need. > > > > > As for this patch I'm not sure it is safe to replace the > > > > smp_send_stop with the kdump friendly function. I'm also not sure if > > > > the kdump friendly function is safe for kdump. Will glad to hear > > > > opinions from other arch experts. > > > > > > This stuff depends on architectures, so I speak only about > > > x86 (the logic doesn't change on other architectures at this time). > > > > > > kdump path with crash_kexec_post_notifiers disabled: > > > panic() > > > __crash_kexec() > > > crash_setup_regs() > > > crash_save_vmcoreinfo() > > > machine_crash_shutdown() > > > native_machine_crash_shutdown() > > > panic_smp_send_stop() /* mostly same as original > > > * kdump_nmi_shootdown_cpus() > > > */ > > > > > > kdump path with crash_kexec_post_notifiers enabled: > > > panic() > > > panic_smp_send_stop() > > > __crash_kexec() > > > crash_setup_regs() > > > crash_save_vmcoreinfo() > > > machine_crash_shutdown() > > > native_machine_crash_shutdown() > > > panic_smp_send_stop() // do nothing > > > > > > The difference is that stopping other CPUs before crash_setup_regs() > > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > > > crash_save_vmcoreinfo() just save information to some memory area, > > > they wouldn't be affected by panic_smp_send_stop(). This means > > > placing panic_smp_send_stop before __crash_kexec is safe. > > > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next > > > version. > > > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. > > As I mentioned in the description of this patch, we should stop > other CPUs ASAP to preserve current state either > crash_kexec_post_notifiers is enabled or not. > Then, all remaining procedures should work well > after stopping other CPUs (but keep the CPU map online). > > Vivek also mentioned similar things: > https://lkml.org/lkml/2015/7/14/433 The implementation in this patchset is different from suggestion in above link? I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below: stop_cpus_save_register_state; if (!crash_kexec_post_notifiers) crash_kexec() atomic_notifier_call_chain() kmsg_dump() I'm just commenting from code flow point of view, the detail implementation definitely need more comments from Arch experts. Any reason did not move the kdump friendly function to earlier point like before previous __crash_kexec() below? if (!crash_kexec_post_notifiers) { printk_nmi_flush_on_panic(); __crash_kexec(NULL); } > > > > > BTW, if one want to use crash_kexec_post_notifiers he should take > > > > the risk of unreliable kdump. How about only call smp_send_stop in > > > > case no crash_kexec_post_notifiers being used. > > > > > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), > > > smp_send_stop() for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. > > > This would be because NMI IPI has a risk of deadlock. We checked if > > > the kdump path has a risk of deadlock in the case of NMI panic and > > > fixed it. But I'm not sure about normal panic path. I agree with > > > that use smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. > > > > What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable: > > if (!crash_kexec_post_notifiers) > > smp_send_stop() > > Remaining procedures including atomic_notifier_call_chain and > kmsg_dump may assume that other CPUs have stopped. Actually, > IPMI driver callback assumes so that. Also, other CPUs may Ok, if so please ignore my previous suggestion. > change the current state while calling these callbacks and make > the dump analysis difficult. > > [snip] > > Best regards, > > Hidehiro Kawai > Thanks Dave
Hi, > From: 'Dave Young' [mailto:dyoung@redhat.com] > Sent: Monday, July 18, 2016 6:02 PM > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote: > > Hi Dave, > > > > Thanks for your reply. > > > > > From: 'Dave Young' [mailto:dyoung@redhat.com] > > > Sent: Wednesday, July 13, 2016 11:04 AM > > > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > > > > Hi Dave, > > > > > > > > Thanks for the comments. > > > > > > > > > From: Dave Young [mailto:dyoung@redhat.com] > > > > > Sent: Monday, July 11, 2016 5:35 PM > > > > > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: [snip] > > > > > As for this patch I'm not sure it is safe to replace the > > > > > smp_send_stop with the kdump friendly function. I'm also not sure if > > > > > the kdump friendly function is safe for kdump. Will glad to hear > > > > > opinions from other arch experts. > > > > > > > > This stuff depends on architectures, so I speak only about > > > > x86 (the logic doesn't change on other architectures at this time). > > > > > > > > kdump path with crash_kexec_post_notifiers disabled: > > > > panic() > > > > __crash_kexec() > > > > crash_setup_regs() > > > > crash_save_vmcoreinfo() > > > > machine_crash_shutdown() > > > > native_machine_crash_shutdown() > > > > panic_smp_send_stop() /* mostly same as original > > > > * kdump_nmi_shootdown_cpus() > > > > */ > > > > > > > > kdump path with crash_kexec_post_notifiers enabled: > > > > panic() > > > > panic_smp_send_stop() > > > > __crash_kexec() > > > > crash_setup_regs() > > > > crash_save_vmcoreinfo() > > > > machine_crash_shutdown() > > > > native_machine_crash_shutdown() > > > > panic_smp_send_stop() // do nothing > > > > > > > > The difference is that stopping other CPUs before crash_setup_regs() > > > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > > > > crash_save_vmcoreinfo() just save information to some memory area, > > > > they wouldn't be affected by panic_smp_send_stop(). This means > > > > placing panic_smp_send_stop before __crash_kexec is safe. > > > > > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next > > > > version. > > > > > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. > > > > As I mentioned in the description of this patch, we should stop > > other CPUs ASAP to preserve current state either > > crash_kexec_post_notifiers is enabled or not. > > Then, all remaining procedures should work well > > after stopping other CPUs (but keep the CPU map online). > > > > Vivek also mentioned similar things: > > https://lkml.org/lkml/2015/7/14/433 > > The implementation in this patchset is different from suggestion in above link? > > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below: > > stop_cpus_save_register_state; > > if (!crash_kexec_post_notifiers) > crash_kexec() > atomic_notifier_call_chain() > kmsg_dump() > > I'm just commenting from code flow point of view, the detail implementation > definitely need more comments from Arch experts. > > Any reason did not move the kdump friendly function to earlier point like > before previous __crash_kexec() below? > if (!crash_kexec_post_notifiers) { > printk_nmi_flush_on_panic(); > __crash_kexec(NULL); > } The reason why the implementation differs from Vivek's is to keep the current code flow if crash_kexec_post_notifiers is not specified. If we apply Vivek's or your suggestion, it may always cause kdump to fail on MIPS OCTEON due to Problem 1. I don't want to make things any worse. I may post a patch for MIPS OCTEON, but I can't test it. For other architectures, I'm not sure what problems there are. So at first, I want to fix the case where crash_kexec_post_notifiers is specified on x86. Then, if all other architectures support `stop other CPUs before crash_kexec', switch to your or Vivek's suggesting code. Is this acceptable? Best regards, Hidehiro Kawai Hitachi, Ltd. Research & Development Group
Hi,
On 07/19/16 at 05:51am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> Hi,
>
> > From: 'Dave Young' [mailto:dyoung@redhat.com]
> > Sent: Monday, July 18, 2016 6:02 PM
> > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > Hi Dave,
> > >
> > > Thanks for your reply.
> > >
> > > > From: 'Dave Young' [mailto:dyoung@redhat.com]
> > > > Sent: Wednesday, July 13, 2016 11:04 AM
> > > >
> > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > > Hi Dave,
> > > > >
> > > > > Thanks for the comments.
> > > > >
> > > > > > From: Dave Young [mailto:dyoung@redhat.com]
> > > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > > >
> > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> [snip]
> > > > > > As for this patch I'm not sure it is safe to replace the
> > > > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > > opinions from other arch experts.
> > > > >
> > > > > This stuff depends on architectures, so I speak only about
> > > > > x86 (the logic doesn't change on other architectures at this time).
> > > > >
> > > > > kdump path with crash_kexec_post_notifiers disabled:
> > > > > panic()
> > > > > __crash_kexec()
> > > > > crash_setup_regs()
> > > > > crash_save_vmcoreinfo()
> > > > > machine_crash_shutdown()
> > > > > native_machine_crash_shutdown()
> > > > > panic_smp_send_stop() /* mostly same as original
> > > > > * kdump_nmi_shootdown_cpus()
> > > > > */
> > > > >
> > > > > kdump path with crash_kexec_post_notifiers enabled:
> > > > > panic()
> > > > > panic_smp_send_stop()
> > > > > __crash_kexec()
> > > > > crash_setup_regs()
> > > > > crash_save_vmcoreinfo()
> > > > > machine_crash_shutdown()
> > > > > native_machine_crash_shutdown()
> > > > > panic_smp_send_stop() // do nothing
> > > > >
> > > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and
> > > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > > they wouldn't be affected by panic_smp_send_stop(). This means
> > > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > > >
> > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next
> > > > > version.
> > > >
> > > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log.
> > >
> > > As I mentioned in the description of this patch, we should stop
> > > other CPUs ASAP to preserve current state either
> > > crash_kexec_post_notifiers is enabled or not.
> > > Then, all remaining procedures should work well
> > > after stopping other CPUs (but keep the CPU map online).
> > >
> > > Vivek also mentioned similar things:
> > > https://lkml.org/lkml/2015/7/14/433
> >
> > The implementation in this patchset is different from suggestion in above link?
> >
> > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:
> >
> > stop_cpus_save_register_state;
> >
> > if (!crash_kexec_post_notifiers)
> > crash_kexec()
> > atomic_notifier_call_chain()
> > kmsg_dump()
> >
> > I'm just commenting from code flow point of view, the detail implementation
> > definitely need more comments from Arch experts.
> >
> > Any reason did not move the kdump friendly function to earlier point like
> > before previous __crash_kexec() below?
> > if (!crash_kexec_post_notifiers) {
> > printk_nmi_flush_on_panic();
> > __crash_kexec(NULL);
> > }
>
> The reason why the implementation differs from Vivek's is to keep
> the current code flow if crash_kexec_post_notifiers is not specified.
>
> If we apply Vivek's or your suggestion, it may always cause kdump
> to fail on MIPS OCTEON due to Problem 1. I don't want to make things
> any worse. I may post a patch for MIPS OCTEON, but I can't test it.
> For other architectures, I'm not sure what problems there are.
> So at first, I want to fix the case where crash_kexec_post_notifiers is
> specified on x86. Then, if all other architectures support
> `stop other CPUs before crash_kexec', switch to your or Vivek's
> suggesting code.
>
> Is this acceptable?
Maybe you can find someone who can test MIPS OCTEON so that they can give
some thoughts first and maybe test a fix?
[dyoung@localhost linux]$ ./scripts/get_maintainer.pl -f arch/mips/cavium-octeon
Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS,commit_signer:32/35=91%)
David Daney <david.daney@cavium.com> (commit_signer:21/35=60%,authored:8/35=23%)
Aaro Koskinen <aaro.koskinen@iki.fi> (commit_signer:15/35=43%,authored:8/35=23%)
Janne Huttunen <janne.huttunen@nokia.com>
(commit_signer:7/35=20%,authored:7/35=20%)
Thomas Gleixner <tglx@linutronix.de> (commit_signer:4/35=11%,authored:2/35=6%)
linux-mips@linux-mips.org (open list:MIPS)
linux-kernel@vger.kernel.org (open list)
Thanks
Dave
> From: 'Dave Young' [mailto:dyoung@redhat.com]
> Sent: Tuesday, July 19, 2016 3:52 PM
> Hi,
> On 07/19/16 at 05:51am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > Hi,
> >
> > > From: 'Dave Young' [mailto:dyoung@redhat.com]
> > > Sent: Monday, July 18, 2016 6:02 PM
> > > On 07/15/16 at 11:50am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > Hi Dave,
> > > >
> > > > Thanks for your reply.
> > > >
> > > > > From: 'Dave Young' [mailto:dyoung@redhat.com]
> > > > > Sent: Wednesday, July 13, 2016 11:04 AM
> > > > >
> > > > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > > > Hi Dave,
> > > > > >
> > > > > > Thanks for the comments.
> > > > > >
> > > > > > > From: Dave Young [mailto:dyoung@redhat.com]
> > > > > > > Sent: Monday, July 11, 2016 5:35 PM
> > > > > > >
> > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote:
> > [snip]
> > > > > > > As for this patch I'm not sure it is safe to replace the
> > > > > > > smp_send_stop with the kdump friendly function. I'm also not sure if
> > > > > > > the kdump friendly function is safe for kdump. Will glad to hear
> > > > > > > opinions from other arch experts.
> > > > > >
> > > > > > This stuff depends on architectures, so I speak only about
> > > > > > x86 (the logic doesn't change on other architectures at this time).
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers disabled:
> > > > > > panic()
> > > > > > __crash_kexec()
> > > > > > crash_setup_regs()
> > > > > > crash_save_vmcoreinfo()
> > > > > > machine_crash_shutdown()
> > > > > > native_machine_crash_shutdown()
> > > > > > panic_smp_send_stop() /* mostly same as original
> > > > > > * kdump_nmi_shootdown_cpus()
> > > > > > */
> > > > > >
> > > > > > kdump path with crash_kexec_post_notifiers enabled:
> > > > > > panic()
> > > > > > panic_smp_send_stop()
> > > > > > __crash_kexec()
> > > > > > crash_setup_regs()
> > > > > > crash_save_vmcoreinfo()
> > > > > > machine_crash_shutdown()
> > > > > > native_machine_crash_shutdown()
> > > > > > panic_smp_send_stop() // do nothing
> > > > > >
> > > > > > The difference is that stopping other CPUs before crash_setup_regs()
> > > > > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and
> > > > > > crash_save_vmcoreinfo() just save information to some memory area,
> > > > > > they wouldn't be affected by panic_smp_send_stop(). This means
> > > > > > placing panic_smp_send_stop before __crash_kexec is safe.
> > > > > >
> > > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next
> > > > > > version.
> > > > >
> > > > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log.
> > > >
> > > > As I mentioned in the description of this patch, we should stop
> > > > other CPUs ASAP to preserve current state either
> > > > crash_kexec_post_notifiers is enabled or not.
> > > > Then, all remaining procedures should work well
> > > > after stopping other CPUs (but keep the CPU map online).
> > > >
> > > > Vivek also mentioned similar things:
> > > > https://lkml.org/lkml/2015/7/14/433
> > >
> > > The implementation in this patchset is different from suggestion in above link?
> > >
> > > I think Vivek's suggestion is a good idea, to drop smp_send_stop and do below:
> > >
> > > stop_cpus_save_register_state;
> > >
> > > if (!crash_kexec_post_notifiers)
> > > crash_kexec()
> > > atomic_notifier_call_chain()
> > > kmsg_dump()
> > >
> > > I'm just commenting from code flow point of view, the detail implementation
> > > definitely need more comments from Arch experts.
> > >
> > > Any reason did not move the kdump friendly function to earlier point like
> > > before previous __crash_kexec() below?
> > > if (!crash_kexec_post_notifiers) {
> > > printk_nmi_flush_on_panic();
> > > __crash_kexec(NULL);
> > > }
> >
> > The reason why the implementation differs from Vivek's is to keep
> > the current code flow if crash_kexec_post_notifiers is not specified.
> >
> > If we apply Vivek's or your suggestion, it may always cause kdump
> > to fail on MIPS OCTEON due to Problem 1. I don't want to make things
> > any worse. I may post a patch for MIPS OCTEON, but I can't test it.
> > For other architectures, I'm not sure what problems there are.
> > So at first, I want to fix the case where crash_kexec_post_notifiers is
> > specified on x86. Then, if all other architectures support
> > `stop other CPUs before crash_kexec', switch to your or Vivek's
> > suggesting code.
> >
> > Is this acceptable?
>
> Maybe you can find someone who can test MIPS OCTEON so that they can give
> some thoughts first and maybe test a fix?
>
> [dyoung@localhost linux]$ ./scripts/get_maintainer.pl -f arch/mips/cavium-octeon
> Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS,commit_signer:32/35=91%)
> David Daney <david.daney@cavium.com> (commit_signer:21/35=60%,authored:8/35=23%)
> Aaro Koskinen <aaro.koskinen@iki.fi> (commit_signer:15/35=43%,authored:8/35=23%)
> Janne Huttunen <janne.huttunen@nokia.com>
> (commit_signer:7/35=20%,authored:7/35=20%)
> Thomas Gleixner <tglx@linutronix.de> (commit_signer:4/35=11%,authored:2/35=6%)
> linux-mips@linux-mips.org (open list:MIPS)
> linux-kernel@vger.kernel.org (open list)
So I'll try to fix for MIPS OCTEON, but I'm going to keep the current
behavior when crash_kexec_post_notifiers is not specified because
I'm not sure what will happen on other architectures.
Best regards,
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group