* [PATCH 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback
@ 2017-12-13  8:08 Balbir Singh
  2017-12-13  8:08 ` [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's Balbir Singh
From: Balbir Singh @ 2017-12-13  8:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: npiggin, mpe, Balbir Singh

Our check was extra cautious: we've audited crash_send_ipi
and it sends an IPI only to online CPUs. Removing this
check should have no functional impact on crash kdump.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/kernel/crash.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index cbabb5adccd9..29c56ca2ddfd 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -69,9 +69,6 @@ static void crash_ipi_callback(struct pt_regs *regs)
 
 	int cpu = smp_processor_id();
 
-	if (!cpu_online(cpu))
-		return;
-
 	hard_irq_disable();
 	if (!cpumask_test_cpu(cpu, &cpus_state_saved)) {
 		crash_save_cpu(regs, cpu);
-- 
2.13.6

* [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-13  8:08 [PATCH 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback Balbir Singh
@ 2017-12-13  8:08 ` Balbir Singh
  2017-12-13 10:51   ` Nicholas Piggin
From: Balbir Singh @ 2017-12-13  8:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: npiggin, mpe, Balbir Singh

Certain HMIs, such as malfunction errors, propagate to
all threads/cores on the system. If a thread was offline
prior to us crashing the system and jumping to the kdump
kernel, bad things happen when it wakes up due to an HMI
in the kdump kernel.

There are several possible ways to solve this problem:

1. Put the offline cores in a state such that they are
not woken up for machine check and HMI errors. This
does not work, since we might need to wake up offline
threads to handle TB errors.
2. Ignore HMI errors: set up HMEER to mask HMI errors.
This still leaves the window open for any MCEs, and
masking them for the duration of the dump might be
a concern.
3. Wake up offline CPUs, as in send them to
crash_ipi_callback (not wake them up as in mark them
online as seen by the hotplug code). kexec does a
wake_offline_cpus() call; this patch does something
similar, but instead sends an IPI and forces them into
crash_ipi_callback().

This patch takes approach #3.

Care is taken to enable this only for powernv platforms
via crash_wake_offline (a global value set at setup
time). The crash code sends out IPIs to all CPUs, which
then move to crash_ipi_callback and kexec_smp_wait().

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---

Changelog
 - Michael Ellerman recommended refactoring
   the code so as to not modify major portions of the
   NMI handling/IPI code.

 arch/powerpc/include/asm/kexec.h     |  2 ++
 arch/powerpc/kernel/crash.c          | 13 ++++++++++++-
 arch/powerpc/kernel/smp.c            | 18 ++++++++++++++++++
 arch/powerpc/platforms/powernv/smp.c | 17 +++++++++++++++++
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 4419d435639a..9dcbfa6bbb91 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -73,6 +73,8 @@ extern void kexec_smp_wait(void);	/* get and clear naca physid, wait for
 					  master to copy new code to 0 */
 extern int crashing_cpu;
 extern void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *));
+extern void crash_ipi_callback(struct pt_regs *);
+extern int crash_wake_offline;
 
 struct kimage;
 struct pt_regs;
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index 29c56ca2ddfd..00b215125d3e 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -44,6 +44,14 @@
 #define REAL_MODE_TIMEOUT	10000
 
 static int time_to_dump;
+/*
+ * crash_wake_offline should be set to 1 by platforms that intend to wake
+ * up offline cpus prior to jumping to a kdump kernel. Currently powernv
+ * sets it to 1, since we want to avoid things from happening when an
+ * offline CPU wakes up due to something like an HMI (malfunction error),
+ * which propagates to all threads.
+ */
+int crash_wake_offline;
 
 #define CRASH_HANDLER_MAX 3
 /* List of shutdown handles */
@@ -63,7 +71,7 @@ static int handle_fault(struct pt_regs *regs)
 #ifdef CONFIG_SMP
 
 static atomic_t cpus_in_crash;
-static void crash_ipi_callback(struct pt_regs *regs)
+void crash_ipi_callback(struct pt_regs *regs)
 {
 	static cpumask_t cpus_state_saved = CPU_MASK_NONE;
 
@@ -106,6 +114,9 @@ static void crash_kexec_prepare_cpus(int cpu)
 
 	printk(KERN_EMERG "Sending IPI to other CPUs\n");
 
+	if (crash_wake_offline)
+		ncpus = num_present_cpus() - 1;
+
 	crash_send_ipi(crash_ipi_callback);
 	smp_wmb();
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e0a4c1f82e25..bbe7634b3a43 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
 #ifdef CONFIG_KEXEC_CORE
 void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 {
+	int cpu;
+
 	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
+	if (kdump_in_progress() && crash_wake_offline) {
+		for_each_present_cpu(cpu) {
+			if (cpu_online(cpu))
+				continue;
+			/*
+			 * crash_ipi_callback will wait for
+			 * all cpus, including offline CPUs.
+			 * We don't care about nmi_ipi_function.
+			 * Offline cpus will jump straight into
+			 * crash_ipi_callback, we can skip the
+			 * entire NMI dance and waiting for
+			 * cpus to clear pending mask, etc.
+			 */
+			do_smp_send_nmi_ipi(cpu);
+		}
+	}
 }
 #endif
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index ba030669eca1..49a23f9db5f0 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -37,6 +37,8 @@
 #include <asm/kvm_ppc.h>
 #include <asm/ppc-opcode.h>
 #include <asm/cpuidle.h>
+#include <asm/kexec.h>
+#include <asm/reg.h>
 
 #include "powernv.h"
 
@@ -212,6 +214,18 @@ static void pnv_smp_cpu_kill_self(void)
 		}
 		smp_mb();
 
+		/*
+		 * For kdump kernels, we process the ipi and jump to
+		 * crash_ipi_callback. For more details see the description
+		 * at crash_wake_offline
+		 */
+		if (kdump_in_progress()) {
+			struct pt_regs regs;
+
+			ppc_save_regs(&regs);
+			crash_ipi_callback(&regs);
+		}
+
 		if (cpu_core_split_required())
 			continue;
 
@@ -371,5 +385,8 @@ void __init pnv_smp_init(void)
 
 #ifdef CONFIG_HOTPLUG_CPU
 	ppc_md.cpu_die	= pnv_smp_cpu_kill_self;
+#ifdef CONFIG_KEXEC_CORE
+	crash_wake_offline = 1;
+#endif
 #endif
 }
-- 
2.13.6

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-13  8:08 ` [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's Balbir Singh
@ 2017-12-13 10:51   ` Nicholas Piggin
  2017-12-14  0:12     ` Balbir Singh
From: Nicholas Piggin @ 2017-12-13 10:51 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linuxppc-dev, mpe

This is looking pretty nice now...

On Wed, 13 Dec 2017 19:08:28 +1100
Balbir Singh <bsingharora@gmail.com> wrote:

> @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
>  #ifdef CONFIG_KEXEC_CORE
>  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
>  {
> +	int cpu;
> +
>  	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> +	if (kdump_in_progress() && crash_wake_offline) {
> +		for_each_present_cpu(cpu) {
> +			if (cpu_online(cpu))
> +				continue;
> +			/*
> +			 * crash_ipi_callback will wait for
> +			 * all cpus, including offline CPUs.
> +			 * We don't care about nmi_ipi_function.
> +			 * Offline cpus will jump straight into
> +			 * crash_ipi_callback, we can skip the
> +			 * entire NMI dance and waiting for
> +			 * cpus to clear pending mask, etc.
> +			 */
> +			do_smp_send_nmi_ipi(cpu);

Still a little bit concerned about using NMI IPI for this.

If you take an NMI IPI from stop, the idle code should do the
right thing and we would just return the system reset wakeup
reason in SRR1 here (which does not need to be cleared).

If you take the system reset anywhere else in the loop, it's
going to go out via system_reset_exception. I guess that
would end up doing the right thing, it probably gets to
crash_ipi_callback from crash_kexec_secondary?

It's just going to be a very untested code path :( What we
gain I suppose is better ability to handle a CPU that's locked
up somewhere in the cpu offline path. Assuming the uncommon
case works...

Actually, if you *always* go via the system reset exception
handler, then code paths will be shared. That might be the
way to go. So I would check for system reset wakeup SRR1 reason
and call replay_system_reset() for it. What do you think?
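
For reference, a rough sketch of what that could look like in the offline
loop of pnv_smp_cpu_kill_self() -- illustrative only, not the patch under
review. The helper name is made up, and it assumes the srr1/wmask values
already computed there, plus the SRR1_WAKERESET wake reason and the
existing replay_system_reset() helper:

static void kdump_sreset_wake_sketch(unsigned long srr1, unsigned long wmask)
{
	/*
	 * Hypothetical sketch: if the offline CPU woke from stop because
	 * of a system reset (e.g. the crash NMI IPI), replay it via the
	 * normal system_reset_exception() path so the online and offline
	 * cases share code.  During a crash this should end up in
	 * crash_ipi_callback().
	 */
	if ((srr1 & wmask) == SRR1_WAKERESET)
		replay_system_reset();
}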

Thanks,
Nick

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-13 10:51   ` Nicholas Piggin
@ 2017-12-14  0:12     ` Balbir Singh
  2017-12-14  1:51       ` Nicholas Piggin
From: Balbir Singh @ 2017-12-14  0:12 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, mpe

On Wed, 13 Dec 2017 20:51:01 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> This is looking pretty nice now...
> 
> On Wed, 13 Dec 2017 19:08:28 +1100
> Balbir Singh <bsingharora@gmail.com> wrote:
> 
> > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
> >  #ifdef CONFIG_KEXEC_CORE
> >  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> >  {
> > +	int cpu;
> > +
> >  	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> > +	if (kdump_in_progress() && crash_wake_offline) {
> > +		for_each_present_cpu(cpu) {
> > +			if (cpu_online(cpu))
> > +				continue;
> > +			/*
> > +			 * crash_ipi_callback will wait for
> > +			 * all cpus, including offline CPUs.
> > +			 * We don't care about nmi_ipi_function.
> > +			 * Offline cpus will jump straight into
> > +			 * crash_ipi_callback, we can skip the
> > +			 * entire NMI dance and waiting for
> > +			 * cpus to clear pending mask, etc.
> > +			 */
> > +			do_smp_send_nmi_ipi(cpu);  
> 
> Still a little bit concerned about using NMI IPI for this.
>

OK -- for offline CPUs you mean?

> If you take an NMI IPI from stop, the idle code should do the
> right thing and we would just return the system reset wakeup
> reason in SRR1 here (which does not need to be cleared).
> 
> If you take the system reset anywhere else in the loop, it's
> going to go out via system_reset_exception. I guess that
> would end up doing the right thing, it probably gets to
> crash_ipi_callback from crash_kexec_secondary?

You mean like if we are online at the time of NMI'ing? If so
the original loop will NMI us back into crash_ipi_callback
anyway. We don't expect this to occur for offline CPUs

> 
> It's just going to be a very untested code path :( What we
> gain I suppose is better ability to handle a CPU that's locked
> up somewhere in the cpu offline path. Assuming the uncommon
> case works...
> 
> Actually, if you *always* go via the system reset exception
> handler, then code paths will be shared. That might be the
> way to go. So I would check for system reset wakeup SRR1 reason
> and call replay_system_reset() for it. What do you think?
> 

We could do that, but that would call pnv_system_reset_exception
and try to call the NMI function, but we've not used that path
to initiate the NMI, so it should call the stale nmi_ipi_function
which is crash_ipi_callback and not go via the crash_kexec path.


I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
I'm worried about calling a stale nmi_ipi_function via the
system_reset_exception path, if we are OK with it, I can revisit
the code path

Thanks,
Balbir Singh.

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-14  0:12     ` Balbir Singh
@ 2017-12-14  1:51       ` Nicholas Piggin
  2017-12-14 12:16         ` Balbir Singh
From: Nicholas Piggin @ 2017-12-14  1:51 UTC (permalink / raw)
  To: Balbir Singh; +Cc: linuxppc-dev, mpe

On Thu, 14 Dec 2017 11:12:13 +1100
Balbir Singh <bsingharora@gmail.com> wrote:

> On Wed, 13 Dec 2017 20:51:01 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> > This is looking pretty nice now...
> > 
> > On Wed, 13 Dec 2017 19:08:28 +1100
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >   
> > > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
> > >  #ifdef CONFIG_KEXEC_CORE
> > >  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> > >  {
> > > +	int cpu;
> > > +
> > >  	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> > > +	if (kdump_in_progress() && crash_wake_offline) {
> > > +		for_each_present_cpu(cpu) {
> > > +			if (cpu_online(cpu))
> > > +				continue;
> > > +			/*
> > > +			 * crash_ipi_callback will wait for
> > > +			 * all cpus, including offline CPUs.
> > > +			 * We don't care about nmi_ipi_function.
> > > +			 * Offline cpus will jump straight into
> > > +			 * crash_ipi_callback, we can skip the
> > > +			 * entire NMI dance and waiting for
> > > +			 * cpus to clear pending mask, etc.
> > > +			 */
> > > +			do_smp_send_nmi_ipi(cpu);    
> > 
> > Still a little bit concerned about using NMI IPI for this.
> >  
> 
> OK -- for offline CPUs you mean?

Yes.

> > If you take an NMI IPI from stop, the idle code should do the
> > right thing and we would just return the system reset wakeup
> > reason in SRR1 here (which does not need to be cleared).
> > 
> > If you take the system reset anywhere else in the loop, it's
> > going to go out via system_reset_exception. I guess that
> > would end up doing the right thing, it probably gets to
> > crash_ipi_callback from crash_kexec_secondary?  
> 
> You mean like if we are online at the time of NMI'ing? If so
> the original loop will NMI us back into crash_ipi_callback
> anyway. We don't expect this to occur for offline CPUs

No, if the offline CPU is executing any instruction except for
stop when the crash occurs.

> 
> > 
> > It's just going to be a very untested code path :( What we
> > gain I suppose is better ability to handle a CPU that's locked
> > up somewhere in the cpu offline path. Assuming the uncommon
> > case works...
> > 
> > Actually, if you *always* go via the system reset exception
> > handler, then code paths will be shared. That might be the
> > way to go. So I would check for system reset wakeup SRR1 reason
> > and call replay_system_reset() for it. What do you think?
> >   
> 
> We could do that, but that would call pnv_system_reset_exception
> and try to call the NMI function, but we've not used that path
> to initiate the NMI, so it should call the stale nmi_ipi_function
> which is crash_ipi_callback and not go via the crash_kexec path.

It shouldn't, if the CPU is not set in the NMI bitmask, I think
it should go back out and do the rest of the system_reset_exception
handler.
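
The dispatch being described is roughly this (a simplified sketch, not the
actual smp.c code; the function name is a stand-in and the locking around
nmi_ipi_pending_mask is omitted):

static int nmi_ipi_dispatch_sketch(struct pt_regs *regs)
{
	int me = raw_smp_processor_id();
	void (*fn)(struct pt_regs *) = NULL;

	if (cpumask_test_cpu(me, &nmi_ipi_pending_mask)) {
		/* This CPU was targeted by smp_send_nmi_ipi() */
		cpumask_clear_cpu(me, &nmi_ipi_pending_mask);
		fn = nmi_ipi_function;
	}

	if (!fn)
		return 0;	/* not ours: fall through to the rest of
				 * the system_reset_exception handler */

	fn(regs);		/* e.g. crash_ipi_callback() during a crash */
	return 1;
}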

Anyway we have to get this case right, because it can already hit
as I said if the offline CPU takes the NMI when it is not stopped.
This is why I want to try to use a unified code path.

> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
> I'm worried about calling a stale nmi_ipi_function via the
> system_reset_exception path, if we are OK with it, I can revisit
> the code path

You shouldn't get a stale one, that would also be a bug -- we
have to cope with NMIs coming in at any time that are triggered
externally (not by smp_send_nmi_ipi), so if you see any bugs
there those need to be fixed separately.

Thanks,
Nick

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-14  1:51       ` Nicholas Piggin
@ 2017-12-14 12:16         ` Balbir Singh
  2017-12-14 13:32           ` Nicholas Piggin
From: Balbir Singh @ 2017-12-14 12:16 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Michael Ellerman

On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Thu, 14 Dec 2017 11:12:13 +1100
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> On Wed, 13 Dec 2017 20:51:01 +1000
>> Nicholas Piggin <npiggin@gmail.com> wrote:
>>
>> > This is looking pretty nice now...
>> >
>> > On Wed, 13 Dec 2017 19:08:28 +1100
>> > Balbir Singh <bsingharora@gmail.com> wrote:
>> >
>> > > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
>> > >  #ifdef CONFIG_KEXEC_CORE
>> > >  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
>> > >  {
>> > > + int cpu;
>> > > +
>> > >   smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
>> > > + if (kdump_in_progress() && crash_wake_offline) {
>> > > +         for_each_present_cpu(cpu) {
>> > > +                 if (cpu_online(cpu))
>> > > +                         continue;
>> > > +                 /*
>> > > +                  * crash_ipi_callback will wait for
>> > > +                  * all cpus, including offline CPUs.
>> > > +                  * We don't care about nmi_ipi_function.
>> > > +                  * Offline cpus will jump straight into
>> > > +                  * crash_ipi_callback, we can skip the
>> > > +                  * entire NMI dance and waiting for
>> > > +                  * cpus to clear pending mask, etc.
>> > > +                  */
>> > > +                 do_smp_send_nmi_ipi(cpu);
>> >
>> > Still a little bit concerned about using NMI IPI for this.
>> >
>>
>> OK -- for offline CPUs you mean?
>
> Yes.
>
>> > If you take an NMI IPI from stop, the idle code should do the
>> > right thing and we would just return the system reset wakeup
>> > reason in SRR1 here (which does not need to be cleared).
>> >
>> > If you take the system reset anywhere else in the loop, it's
>> > going to go out via system_reset_exception. I guess that
>> > would end up doing the right thing, it probably gets to
>> > crash_ipi_callback from crash_kexec_secondary?
>>
>> You mean like if we are online at the time of NMI'ing? If so
>> the original loop will NMI us back into crash_ipi_callback
>> anyway. We don't expect this to occur for offline CPUs
>
> No, if the offline CPU is executing any instruction except for
> stop when the crash occurs.
>

OK, yeah

>>
>> >
>> > It's just going to be a very untested code path :( What we
>> > gain I suppose is better ability to handle a CPU that's locked
>> > up somewhere in the cpu offline path. Assuming the uncommon
>> > case works...
>> >
>> > Actually, if you *always* go via the system reset exception
>> > handler, then code paths will be shared. That might be the
>> > way to go. So I would check for system reset wakeup SRR1 reason
>> > and call replay_system_reset() for it. What do you think?
>> >
>>
>> We could do that, but that would call pnv_system_reset_exception
>> and try to call the NMI function, but we've not used that path
>> to initiate the NMI, so it should call the stale nmi_ipi_function
>> which is crash_ipi_callback and not go via the crash_kexec path.
>
> It shouldn't, if the CPU is not set in the NMI bitmask, I think
> it should go back out and do the rest of the system_reset_exception
> handler.
>
> Anyway we have to get this case right, because it can already hit
> as I said if the offline CPU takes the NMI when it is not stopped.
> This is why I want to try to use a unified code path.
>

OK

>> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
>> I'm worried about calling a stale nmi_ipi_function via the
>> system_reset_exception path, if we are OK with it, I can revisit
>> the code path
>
> You shouldn't get a stale one, that would also be a bug -- we
> have to cope with NMIs coming in at any time that are triggered
> externally (not by smp_send_nmi_ipi), so if you see any bugs
> there those need to be fixed separately.
>

Yes, I think it's a bug, nothing clears nmi_ipi_function (from what
I can see), so when the next NMI comes in and goes into
pnv_system_reset_exception
it'll execute the stale handler. I'll respin things based on the
suggestion above
and deal with any bugs as well.

Balbir Singh.

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-14 12:16         ` Balbir Singh
@ 2017-12-14 13:32           ` Nicholas Piggin
  2017-12-15  2:55             ` Balbir Singh
From: Nicholas Piggin @ 2017-12-14 13:32 UTC (permalink / raw)
  To: Balbir Singh
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Michael Ellerman

On Thu, 14 Dec 2017 23:16:26 +1100
Balbir Singh <bsingharora@gmail.com> wrote:

> On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:


> >> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
> >> I'm worried about calling a stale nmi_ipi_function via the
> >> system_reset_exception path, if we are OK with it, I can revisit
> >> the code path  
> >
> > You shouldn't get a stale one, that would also be a bug -- we
> > have to cope with NMIs coming in at any time that are triggered
> > externally (not by smp_send_nmi_ipi), so if you see any bugs
> > there those need to be fixed separately.
> >  
> 
> Yes, I think it's a bug, nothing clears nmi_ipi_function (from what
> I can see), so when the next NMI comes in and goes into
> pnv_system_reset_exception
> it'll execute the stale handler.

The CPU won't be in the nmi_ipi_pending_mask though, so it shouldn't
get that far. You could add a bit of paranoia to clear the function
pointer I suppose, but AFAIKS it's not needed.

> I'll respin things based on the
> suggestion above
> and deal with any bugs as well.

Thanks,
Nick

* Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
  2017-12-14 13:32           ` Nicholas Piggin
@ 2017-12-15  2:55             ` Balbir Singh
From: Balbir Singh @ 2017-12-15  2:55 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Michael Ellerman

On Fri, Dec 15, 2017 at 12:32 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Thu, 14 Dec 2017 23:16:26 +1100
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> On Thu, Dec 14, 2017 at 12:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
>
>> >> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
>> >> I'm worried about calling a stale nmi_ipi_function via the
>> >> system_reset_exception path, if we are OK with it, I can revisit
>> >> the code path
>> >
>> > You shouldn't get a stale one, that would also be a bug -- we
>> > have to cope with NMIs coming in at any time that are triggered
>> > externally (not by smp_send_nmi_ipi), so if you see any bugs
>> > there those need to be fixed separately.
>> >
>>
>> Yes, I think it's a bug, nothing clears nmi_ipi_function (from what
>> I can see), so when the next NMI comes in and goes into
>> pnv_system_reset_exception
>> it'll execute the stale handler.
>
> The CPU won't be in the nmi_ipi_pending_mask though, so it shouldn't
> get that far. You could add a bit of paranoia to clear the function
> pointer I suppose, but AFAIKS it's not needed.
>

Yep, you're right, but these things are so subtle :) I'll clean up
nmi_ipi_function as a bit of paranoia, but I'll add that as a TODO.

Balbir Singh.
