All of lore.kernel.org
 help / color / mirror / Atom feed
From: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>
To: "'Borislav Petkov'" <bp@alien8.de>
Cc: "Jonathan Corbet" <corbet@lwn.net>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vivek Goyal" <vgoyal@redhat.com>, "Baoquan He" <bhe@redhat.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"平松雅巳 / HIRAMATU,MASAMI" <masami.hiramatsu.pt@hitachi.com>
Subject: RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
Date: Wed, 25 Nov 2015 05:51:59 +0000	[thread overview]
Message-ID: <04EAB7311EE43145B2D3536183D1A84454A2A425@GSjpTKYDCembx31.service.hitachi.net> (raw)
In-Reply-To: <20151124104853.GC3785@pd.tnic>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 11812 bytes --]

> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
> 		 ...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >      cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
     Ex.
     CPU 0                     CPU 1
     =================         =================
     panic()
       panic_cpu <-- 0
       check panic_cpu
       crash_kexec()
                               receive an unknown NMI
                               unknown_nmi_error()
                                 nmi_panic()
                                   panic()
                                     check panic_cpu
                                     panic_smp_self_stop()
                                       infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
     cpu is processing panic on NMI
     Ex.
     CPU 0                     CPU 1
     ======================    ==================
     receive an unknown NMI
     unknown_nmi_error()
       nmi_panic()             receive an unknown NMI
         panic_cpu <-- 0       unknown_nmi_error()
         panic()                 nmi_panic()
           check panic_cpu         panic
           crash_kexec()             check panic_cpu
                                     panic_smp_self_stop()
                                       infinite loop in NMI context
 
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >    done by panic_smp_self_stop()) outside of panic() to enable us to
> >    refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into infinite loop.  Only when a CPU is in NMI context,
nmi_panic_self_stop() is called and the CPU loops infinitely
without entering panic().

I'll try to revise this sentense.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >    registers and do some cleanups after setting waiting_for_crash_ipi
> >    which is used for counting down the number of cpus which handled
> >    the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > ---
> >  arch/x86/include/asm/reboot.h |    1 +
> >  arch/x86/kernel/nmi.c         |   17 +++++++++++++----
> >  arch/x86/kernel/reboot.c      |   28 ++++++++++++++++++++++++++++
> >  include/linux/kernel.h        |   12 ++++++++++--
> >  kernel/panic.c                |   10 ++++++++++
> >  kernel/watchdog.c             |    2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 5131714..74a1434 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -29,6 +29,7 @@
> >  #include <asm/mach_traps.h>
> >  #include <asm/nmi.h>
> >  #include <asm/x86_init.h>
> > +#include <asm/reboot.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/nmi.h>
> > @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
> >  #endif
> >
> >  	if (panic_on_unrecovered_nmi)
> > -		nmi_panic("NMI: Not continuing");
> > +		nmi_panic(regs, "NMI: Not continuing");
> >
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> >
> > @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
> >  	show_regs(regs);
> >
> >  	if (panic_on_io_nmi) {
> > -		nmi_panic("NMI IOCK error: Not continuing");
> > +		nmi_panic(regs, "NMI IOCK error: Not continuing");
> >
> >  		/*
> >  		 * If we return from nmi_panic(), it means we have received
> > @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >
> >  	pr_emerg("Do you have a strange power saving mode enabled?\n");
> >  	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> > -		nmi_panic("NMI: Not continuing");
> > +		nmi_panic(regs, "NMI: Not continuing");
> >
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> > @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
> >  	}
> >
> >  	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> > -	raw_spin_lock(&nmi_reason_lock);
> > +
> > +	/*
> > +	 * Another CPU may be processing panic routines with holding
> 
> 							while

I'll fix it.

> > +	 * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> > +	 * and call the callback directly.
> 
> This is one strange sentence. Does it mean something like:
> 
> "Check if the panicking CPU issued the IPI and, if so, call the crash
> callback directly."
> 
> ?

Yes.  Thanks for the suggestion.

> > +	 */
> > +	while (!raw_spin_trylock(&nmi_reason_lock))
> > +		poll_crash_ipi_and_callback(regs);
> 
> Waaait a minute: so if we're getting NMIs broadcasted on every core but
> we're *not* crash dumping, we will run into here too. This can't be
> right. :-\

As Steven commented, poll_crash_ipi_and_callback() does nothing
if we're not crash dumping.  But yes, this is confusing.  I'll add
more detailed comment, or change the logic a bit if I come up with
better one.

> > +
> >  	reason = x86_platform.get_nmi_reason();
> >
> >  	if (reason & NMI_REASON_MASK) {
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 02693dd..44c5f5b 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ static int crashing_cpu;
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >
> >  	smp_send_nmi_allbutself();
> >
> > +	/* Kick cpus looping in nmi context. */
> > +	WRITE_ONCE(crash_ipi_done, 1);
> > +
> >  	msecs = 1000; /* Wait at most a second for the other cpus to stop */
> >  	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> >  		mdelay(1);
> > @@ -788,9 +792,33 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >
> >  	/* Leave the nmi callback set */
> >  }
> > +
> > +/*
> > + * Wait for the timing of IPI for crash dumping, and then call its callback
> 
> "Wait for the crash dumping IPI to complete... "

So, I think "Wait for the crash dumping IPI to be issued..." is better.
"complete" would be ambiguous in this context.

> > + * directly.  This function is used when we have already been in NMI handler.
> > + */
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> 
> Why "poll"? We won't return from crash_nmi_callback() if we're not the
> crashing CPU.

This function polls that crash IPI has been issued by checking
crash_ipi_done, then invokes the callback.  This is different
from so-called "poll" functions.  Do you have some good name?

> > +{
> > +	if (crash_ipi_done)
> > +		crash_nmi_callback(0, regs); /* Shouldn't return */
> > +}
> > +
> > +/* Override the weak function in kernel/panic.c */
> > +void nmi_panic_self_stop(struct pt_regs *regs)
> > +{
> > +	while (1) {
> > +		poll_crash_ipi_and_callback(regs);
> > +		cpu_relax();
> > +	}
> > +}
> > +
> >  #else /* !CONFIG_SMP */
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >  {
> >  	/* No other CPUs to shoot down */
> >  }
> > +
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> > +{
> > +}
> >  #endif
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 480a4fd..728a31b 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -255,6 +255,7 @@ extern long (*panic_blink)(int state);
> >  __printf(1, 2)
> >  void panic(const char *fmt, ...)
> >  	__noreturn __cold;
> > +void nmi_panic_self_stop(struct pt_regs *);
> >  extern void oops_enter(void);
> >  extern void oops_exit(void);
> >  void print_oops_end_marker(void);
> > @@ -450,12 +451,19 @@ extern atomic_t panic_cpu;
> >  /*
> >   * A variant of panic() called from NMI context.
> >   * If we've already panicked on this cpu, return from here.
> > + * If another cpu already panicked, loop in nmi_panic_self_stop() which
> > + * can provide architecture dependent code such as saving register states
> > + * for crash dump.
> >   */
> > -#define nmi_panic(fmt, ...)						\
> > +#define nmi_panic(regs, fmt, ...)					\
> >  	do {								\
> > +		int old_cpu;						\
> >  		int this_cpu = raw_smp_processor_id();			\
> > -		if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> > +		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);	\
> > +		if (old_cpu == -1)					\
> >  			panic(fmt, ##__VA_ARGS__);			\
> > +		else if (old_cpu != this_cpu)				\
> > +			nmi_panic_self_stop(regs);			\
> 
> Same here - this is assuming that broadcasting NMIs to all cores
> automatically means there's a crash dump in progress:
> 
> nmi_panic_self_stop() -> poll_crash_ipi_and_callback()
> 
> and this cannot be right.
> 
> >  	} while (0)
> >
> >  /*
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 24ee2ea..4fce2be 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -61,6 +61,16 @@ void __weak panic_smp_self_stop(void)
> >  		cpu_relax();
> >  }
> >
> > +/*
> > + * Stop ourself in NMI context if another cpu has already panicked.
> 
> 	  "ourselves"

Thanks. I'll fix it.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

WARNING: multiple messages have this Message-ID (diff)
From: "河合英宏 / KAWAI,HIDEHIRO" <hidehiro.kawai.ez@hitachi.com>
To: 'Borislav Petkov' <bp@alien8.de>
Cc: "x86@kernel.org" <x86@kernel.org>, "Baoquan He" <bhe@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"平松雅巳 / HIRAMATU,MASAMI" <masami.hiramatsu.pt@hitachi.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Vivek Goyal" <vgoyal@redhat.com>
Subject: RE: [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
Date: Wed, 25 Nov 2015 05:51:59 +0000	[thread overview]
Message-ID: <04EAB7311EE43145B2D3536183D1A84454A2A425@GSjpTKYDCembx31.service.hitachi.net> (raw)
In-Reply-To: <20151124104853.GC3785@pd.tnic>

> On Fri, Nov 20, 2015 at 06:36:46PM +0900, Hidehiro Kawai wrote:
> > nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends NMI IPI
> > to non-panic cpus to stop them while saving their register
> 
> 		 ...to stop them and save their register...

Thanks for the correction.

> > information and doing some cleanups for crash dumping.  So if a
> > non-panic cpus is infinitely looping in NMI context, we fail to
> 
> That should be CPU. Please use "CPU" instead of "cpu" in all your text
> in your next submission.

OK, I'll fix that.

> > save its register information and lose the information from the
> > crash dump.
> >
> > `Infinite loop in NMI context' can happen:
> >
> >   a. when a cpu panics on NMI while another cpu is processing panic
> >   b. when a cpu received an external or unknown NMI while another
> >      cpu is processing panic on NMI
> >
> > In the case of a, it loops in panic_smp_self_stop().  In the case
> > of b, it loops in raw_spin_lock() of nmi_reason_lock.
> 
> Please describe those two cases more verbosely - it takes slow people
> like me a while to figure out what exactly can happen.

  a. when a cpu panics on NMI while another cpu is processing panic
     Ex.
     CPU 0                     CPU 1
     =================         =================
     panic()
       panic_cpu <-- 0
       check panic_cpu
       crash_kexec()
                               receive an unknown NMI
                               unknown_nmi_error()
                                 nmi_panic()
                                   panic()
                                     check panic_cpu
                                     panic_smp_self_stop()
                                       infinite loop in NMI context

  b. when a cpu received an external or unknown NMI while another
     cpu is processing panic on NMI
     Ex.
     CPU 0                     CPU 1
     ======================    ==================
     receive an unknown NMI
     unknown_nmi_error()
       nmi_panic()             receive an unknown NMI
         panic_cpu <-- 0       unknown_nmi_error()
         panic()                 nmi_panic()
           check panic_cpu         panic
           crash_kexec()             check panic_cpu
                                     panic_smp_self_stop()
                                       infinite loop in NMI context
 
> > This can
> > happen on some servers which broadcasts NMIs to all CPUs when a dump
> > button is pushed.
> >
> > To save registers in these case too, this patch does following things:
> >
> > 1. Move the timing of `infinite loop in NMI context' (actually
> >    done by panic_smp_self_stop()) outside of panic() to enable us to
> >    refer pt_regs
> 
> I can't parse that sentence. And I really tried :-\
> panic_smp_self_stop() is still in panic().

panic_smp_self_stop() is still used when a CPU in normal context
should go into infinite loop.  Only when a CPU is in NMI context,
nmi_panic_self_stop() is called and the CPU loops infinitely
without entering panic().

I'll try to revise this sentense.

> > 2. call a callback of nmi_shootdown_cpus() directly to save
> >    registers and do some cleanups after setting waiting_for_crash_ipi
> >    which is used for counting down the number of cpus which handled
> >    the callback
> >
> > V5:
> > - Use WRITE_ONCE() when setting crash_ipi_done to 1 so that the
> >   compiler doesn't change the instruction order
> > - Support the case of b in the above description
> > - Add poll_crash_ipi_and_callback()
> >
> > V4:
> > - Rewrite the patch description
> >
> > V3:
> > - Newly introduced
> >
> > Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Cc: Vivek Goyal <vgoyal@redhat.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > ---
> >  arch/x86/include/asm/reboot.h |    1 +
> >  arch/x86/kernel/nmi.c         |   17 +++++++++++++----
> >  arch/x86/kernel/reboot.c      |   28 ++++++++++++++++++++++++++++
> >  include/linux/kernel.h        |   12 ++++++++++--
> >  kernel/panic.c                |   10 ++++++++++
> >  kernel/watchdog.c             |    2 +-
> >  6 files changed, 63 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
> > index a82c4f1..964e82f 100644
> > --- a/arch/x86/include/asm/reboot.h
> > +++ b/arch/x86/include/asm/reboot.h
> > @@ -25,5 +25,6 @@ void __noreturn machine_real_restart(unsigned int type);
> >
> >  typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback);
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs);
> >
> >  #endif /* _ASM_X86_REBOOT_H */
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 5131714..74a1434 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -29,6 +29,7 @@
> >  #include <asm/mach_traps.h>
> >  #include <asm/nmi.h>
> >  #include <asm/x86_init.h>
> > +#include <asm/reboot.h>
> >
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/nmi.h>
> > @@ -231,7 +232,7 @@ pci_serr_error(unsigned char reason, struct pt_regs *regs)
> >  #endif
> >
> >  	if (panic_on_unrecovered_nmi)
> > -		nmi_panic("NMI: Not continuing");
> > +		nmi_panic(regs, "NMI: Not continuing");
> >
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> >
> > @@ -256,7 +257,7 @@ io_check_error(unsigned char reason, struct pt_regs *regs)
> >  	show_regs(regs);
> >
> >  	if (panic_on_io_nmi) {
> > -		nmi_panic("NMI IOCK error: Not continuing");
> > +		nmi_panic(regs, "NMI IOCK error: Not continuing");
> >
> >  		/*
> >  		 * If we return from nmi_panic(), it means we have received
> > @@ -305,7 +306,7 @@ unknown_nmi_error(unsigned char reason, struct pt_regs *regs)
> >
> >  	pr_emerg("Do you have a strange power saving mode enabled?\n");
> >  	if (unknown_nmi_panic || panic_on_unrecovered_nmi)
> > -		nmi_panic("NMI: Not continuing");
> > +		nmi_panic(regs, "NMI: Not continuing");
> >
> >  	pr_emerg("Dazed and confused, but trying to continue\n");
> >  }
> > @@ -357,7 +358,15 @@ static void default_do_nmi(struct pt_regs *regs)
> >  	}
> >
> >  	/* Non-CPU-specific NMI: NMI sources can be processed on any CPU */
> > -	raw_spin_lock(&nmi_reason_lock);
> > +
> > +	/*
> > +	 * Another CPU may be processing panic routines with holding
> 
> 							while

I'll fix it.

> > +	 * nmi_reason_lock.  Check IPI issuance from the panicking CPU
> > +	 * and call the callback directly.
> 
> This is one strange sentence. Does it mean something like:
> 
> "Check if the panicking CPU issued the IPI and, if so, call the crash
> callback directly."
> 
> ?

Yes.  Thanks for the suggestion.

> > +	 */
> > +	while (!raw_spin_trylock(&nmi_reason_lock))
> > +		poll_crash_ipi_and_callback(regs);
> 
> Waaait a minute: so if we're getting NMIs broadcasted on every core but
> we're *not* crash dumping, we will run into here too. This can't be
> right. :-\

As Steven commented, poll_crash_ipi_and_callback() does nothing
if we're not crash dumping.  But yes, this is confusing.  I'll add
more detailed comment, or change the logic a bit if I come up with
better one.

> > +
> >  	reason = x86_platform.get_nmi_reason();
> >
> >  	if (reason & NMI_REASON_MASK) {
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 02693dd..44c5f5b 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -718,6 +718,7 @@ static int crashing_cpu;
> >  static nmi_shootdown_cb shootdown_callback;
> >
> >  static atomic_t waiting_for_crash_ipi;
> > +static int crash_ipi_done;
> >
> >  static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
> >  {
> > @@ -780,6 +781,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >
> >  	smp_send_nmi_allbutself();
> >
> > +	/* Kick cpus looping in nmi context. */
> > +	WRITE_ONCE(crash_ipi_done, 1);
> > +
> >  	msecs = 1000; /* Wait at most a second for the other cpus to stop */
> >  	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> >  		mdelay(1);
> > @@ -788,9 +792,33 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >
> >  	/* Leave the nmi callback set */
> >  }
> > +
> > +/*
> > + * Wait for the timing of IPI for crash dumping, and then call its callback
> 
> "Wait for the crash dumping IPI to complete... "

So, I think "Wait for the crash dumping IPI to be issued..." is better.
"complete" would be ambiguous in this context.

> > + * directly.  This function is used when we have already been in NMI handler.
> > + */
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> 
> Why "poll"? We won't return from crash_nmi_callback() if we're not the
> crashing CPU.

This function polls that crash IPI has been issued by checking
crash_ipi_done, then invokes the callback.  This is different
from so-called "poll" functions.  Do you have some good name?

> > +{
> > +	if (crash_ipi_done)
> > +		crash_nmi_callback(0, regs); /* Shouldn't return */
> > +}
> > +
> > +/* Override the weak function in kernel/panic.c */
> > +void nmi_panic_self_stop(struct pt_regs *regs)
> > +{
> > +	while (1) {
> > +		poll_crash_ipi_and_callback(regs);
> > +		cpu_relax();
> > +	}
> > +}
> > +
> >  #else /* !CONFIG_SMP */
> >  void nmi_shootdown_cpus(nmi_shootdown_cb callback)
> >  {
> >  	/* No other CPUs to shoot down */
> >  }
> > +
> > +void poll_crash_ipi_and_callback(struct pt_regs *regs)
> > +{
> > +}
> >  #endif
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 480a4fd..728a31b 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -255,6 +255,7 @@ extern long (*panic_blink)(int state);
> >  __printf(1, 2)
> >  void panic(const char *fmt, ...)
> >  	__noreturn __cold;
> > +void nmi_panic_self_stop(struct pt_regs *);
> >  extern void oops_enter(void);
> >  extern void oops_exit(void);
> >  void print_oops_end_marker(void);
> > @@ -450,12 +451,19 @@ extern atomic_t panic_cpu;
> >  /*
> >   * A variant of panic() called from NMI context.
> >   * If we've already panicked on this cpu, return from here.
> > + * If another cpu already panicked, loop in nmi_panic_self_stop() which
> > + * can provide architecture dependent code such as saving register states
> > + * for crash dump.
> >   */
> > -#define nmi_panic(fmt, ...)						\
> > +#define nmi_panic(regs, fmt, ...)					\
> >  	do {								\
> > +		int old_cpu;						\
> >  		int this_cpu = raw_smp_processor_id();			\
> > -		if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
> > +		old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);	\
> > +		if (old_cpu == -1)					\
> >  			panic(fmt, ##__VA_ARGS__);			\
> > +		else if (old_cpu != this_cpu)				\
> > +			nmi_panic_self_stop(regs);			\
> 
> Same here - this is assuming that broadcasting NMIs to all cores
> automatically means there's a crash dump in progress:
> 
> nmi_panic_self_stop() -> poll_crash_ipi_and_callback()
> 
> and this cannot be right.
> 
> >  	} while (0)
> >
> >  /*
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 24ee2ea..4fce2be 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -61,6 +61,16 @@ void __weak panic_smp_self_stop(void)
> >  		cpu_relax();
> >  }
> >
> > +/*
> > + * Stop ourself in NMI context if another cpu has already panicked.
> 
> 	  "ourselves"

Thanks. I'll fix it.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  parent reply	other threads:[~2015-11-25  5:52 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20  9:36 [V5 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec Hidehiro Kawai
2015-11-20  9:36 ` Hidehiro Kawai
2015-11-20  9:36 ` [V5 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI Hidehiro Kawai
2015-11-20  9:36   ` Hidehiro Kawai
2015-11-23 18:49   ` Borislav Petkov
2015-11-23 18:49     ` Borislav Petkov
2015-11-24  4:06     ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-24  4:06       ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-24 12:45   ` Michal Hocko
2015-11-24 12:45     ` Michal Hocko
2015-11-24 15:05   ` Steven Rostedt
2015-11-24 15:05     ` Steven Rostedt
2015-11-24 15:12     ` Steven Rostedt
2015-11-24 15:12       ` Steven Rostedt
2015-11-24 20:27     ` Michal Hocko
2015-11-24 20:27       ` Michal Hocko
2015-11-24 20:45       ` Steven Rostedt
2015-11-24 20:45         ` Steven Rostedt
2015-11-20  9:36 ` [V5 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context Hidehiro Kawai
2015-11-20  9:36   ` Hidehiro Kawai
2015-11-24 10:48   ` Borislav Petkov
2015-11-24 10:48     ` Borislav Petkov
2015-11-24 19:37     ` Steven Rostedt
2015-11-24 19:37       ` Steven Rostedt
2015-11-24 20:16       ` Borislav Petkov
2015-11-24 20:16         ` Borislav Petkov
2015-11-25  5:57       ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  5:57         ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  5:51     ` 河合英宏 / KAWAI,HIDEHIRO [this message]
2015-11-25  5:51       ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  8:56       ` Borislav Petkov
2015-11-25  8:56         ` Borislav Petkov
2015-11-25  9:46         ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  9:46           ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  9:57           ` Borislav Petkov
2015-11-25  9:57             ` Borislav Petkov
2015-11-25 15:11             ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 15:11               ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-24 12:58   ` Michal Hocko
2015-11-24 12:58     ` Michal Hocko
2015-12-03  2:23   ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03  2:23     ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-20  9:36 ` [V5 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly Hidehiro Kawai
2015-11-20  9:36   ` Hidehiro Kawai
2015-11-24 13:05   ` Michal Hocko
2015-11-24 13:05     ` Michal Hocko
2015-11-24 20:35   ` Steven Rostedt
2015-11-24 20:35     ` Steven Rostedt
2015-11-25  6:28     ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  6:28       ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25  9:54   ` Borislav Petkov
2015-11-25  9:54     ` Borislav Petkov
2015-12-02 11:57     ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-02 11:57       ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-02 15:40       ` Borislav Petkov
2015-12-02 15:40         ` Borislav Petkov
2015-12-03  2:01         ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03  2:01           ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03  9:35           ` Borislav Petkov
2015-12-03  9:35             ` Borislav Petkov
2015-12-03 11:29             ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03 11:29               ` 河合英宏 / KAWAI,HIDEHIRO
2015-12-03 12:22               ` Borislav Petkov
2015-12-03 12:22                 ` Borislav Petkov
2015-11-20  9:36 ` [V5 PATCH 4/4] x86/apic: Introduce apic_extnmi boot option Hidehiro Kawai
2015-11-20  9:36   ` Hidehiro Kawai
2015-11-25 11:49   ` Borislav Petkov
2015-11-25 11:49     ` Borislav Petkov
2015-11-25 15:29     ` 河合英宏 / KAWAI,HIDEHIRO
2015-11-25 15:29       ` 河合英宏 / KAWAI,HIDEHIRO

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=04EAB7311EE43145B2D3536183D1A84454A2A425@GSjpTKYDCembx31.service.hitachi.net \
    --to=hidehiro.kawai.ez@hitachi.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=ebiederm@xmission.com \
    --cc=hpa@zytor.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=masami.hiramatsu.pt@hitachi.com \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vgoyal@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.