[RFC][PATCH] x86, reboot: use NMI instead of REBOOT_VECTOR to stop cpus

* [RFC][PATCH] x86, reboot:  use NMI instead of REBOOT_VECTOR to stop cpus
@ 2011-09-28 18:55 Don Zickus
  2011-10-06 20:50 ` Don Zickus
  2011-10-10  6:53 ` Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: Don Zickus @ 2011-09-28 18:55 UTC (permalink / raw)
  To: Ingo Molnar, Andi Kleen, x86
  Cc: LKML, Peter Zijlstra, Robert Richter, Andrew Morton,
	seiji.aguchi, vgoyal, mjg, tony.luck, gong.chen, satoru.moriya,
	avi, Don Zickus

A recent discussion started talking about the locking on the pstore fs
and how it relates to the kmsg infrastructure.  We noticed it was possible
for userspace to r/w to the pstore fs (grabbing the locks in the process)
and block the panic path from r/w to the same fs.

The reason was the cpu with the lock could be doing work while the crashing
cpu is panic'ing.  Busting those spinlocks might cause those cpus to step
on each other's data.  Fine, fair enough.

It was suggested it would be nice to serialize the panic path (ie stop
the other cpus) and have only one cpu running.  This would allow us to
bust the spinlocks and not worry about another cpu stepping on the data.

Of course, smp_send_stop() does this in the panic case.  kmsg_dump() would
have to be moved to be called after it.  Easy enough.

The only problem is on x86 the smp_send_stop() function calls the
REBOOT_VECTOR.  Any cpu with irqs disabled (which pstore and its backend
ERST would do), block this IPI and thus do not stop.  This makes it
difficult to reliably log data to the pstore fs.

The patch below switches from the REBOOT_VECTOR to NMI (and mimics what
kdump does).  Switching to NMI allows us to deliver the IPI when irqs are
disabled, increasing the reliability of this function.

However, Andi carefully noted that on some machines this approach does not
work because of broken BIOSes or whatever.

I was hoping to get feedback on how much of a problem this really is.  Are
there that many machines?  I assume most modern machines have a reliable NMI
IPI mechanism (well on x86).  Is this just a problem on 32-bit machines?
Early SMP machines?

One idea I had was to create a blacklist of machines and have those machines
fallback to the original native_stop_other_cpus() that Andi wrote originally.
The hope was that list was small.

Does anyone have any feedback whether this is a good idea or not?  Perhaps I am
missing something?  Perhaps I should approach this problem differently?

[note] this patch sits on top of another NMI infrastructure change I have
submitted, so the nmi registeration might not apply cleanly without that patch.
However, for discussion purposes, I don't think that change is relevant, it is
more the idea/philosophy of this patch that I am worried about.

Thanks,
Don
---
 arch/x86/kernel/smp.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 013e7eb..e98f0a1 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -28,6 +28,7 @@
 #include <asm/mmu_context.h>
 #include <asm/proto.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 /*
  *	Some notes on x86 processor bugs affecting SMP operation:
  *
@@ -147,6 +148,57 @@ void native_send_call_func_ipi(const struct cpumask *mask)
 	free_cpumask_var(allbutself);
 }
 
+static int stopping_cpu;
+
+static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
+{
+	/* We are registerd on stopping cpu too, avoid spurious NMI */
+	if (raw_smp_processor_id() == stopping_cpu)
+		return NMI_HANDLED;
+
+	stop_this_cpu(NULL);
+
+	return NMI_HANDLED;
+}
+
+static void native_nmi_stop_other_cpus(int wait)
+{
+	unsigned long flags;
+	unsigned long timeout;
+
+	if (reboot_force)
+		return;
+
+	/*
+	 * Use an own vector here because smp_call_function
+	 * does lots of things not suitable in a panic situation.
+	 */
+	if (num_online_cpus() > 1) {
+		stopping_cpu = safe_smp_processor_id();
+
+		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
+					 NMI_FLAG_FIRST, "smp_stop"))
+			return;		/* return what? */
+
+		/* sync above data before sending NMI */
+		wmb();
+
+		apic->send_IPI_allbutself(NMI_VECTOR);
+
+		/*
+		 * Don't wait longer than a second if the caller
+		 * didn't ask us to wait.
+		 */
+		timeout = USEC_PER_SEC;
+		while (num_online_cpus() > 1 && (wait || timeout--))
+			udelay(1);
+	}
+
+	local_irq_save(flags);
+	disable_local_APIC();
+	local_irq_restore(flags);
+}
+
 /*
  * this function calls the 'stop' function on all other CPUs in the system.
  */
@@ -159,7 +211,7 @@ asmlinkage void smp_reboot_interrupt(void)
 	irq_exit();
 }
 
-static void native_stop_other_cpus(int wait)
+static void native_irq_stop_other_cpus(int wait)
 {
 	unsigned long flags;
 	unsigned long timeout;
@@ -229,7 +281,7 @@ struct smp_ops smp_ops = {
 	.smp_prepare_cpus	= native_smp_prepare_cpus,
 	.smp_cpus_done		= native_smp_cpus_done,
 
-	.stop_other_cpus	= native_stop_other_cpus,
+	.stop_other_cpus	= native_nmi_stop_other_cpus,
 	.smp_send_reschedule	= native_smp_send_reschedule,
 
 	.cpu_up			= native_cpu_up,
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread