* [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV
@ 2013-09-05 22:50 Mike Travis
  2013-09-05 22:50 ` [PATCH 1/9] x86/UV: Move NMI support Mike Travis
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel


V2:  Split KDB updates from NMI updates.  Broke up the big patch to
     uv_nmi.c into smaller patches.  Updated to the latest linux
     kernel version.

The current UV NMI handler has not been updated for the changes in the
system NMI handler and the perf operations.  The UV NMI handler reads
an MMR in the UV Hub to check whether the NMI event was caused by
the external 'system NMI' that the operator can initiate on the System
Mgmt Controller.

The problem arises when the perf tools are running, causing millions of
perf events per second on very large CPU count systems.  Previously this
was okay because the perf NMI handler ran at a higher priority on the
NMI call chain; if the NMI was a perf event, the remaining handlers on
the chain were not called.

Now the system NMI handler calls all the handlers on the NMI call
chain including the UV NMI handler.  This causes the UV NMI handler
to read the MMRs at the same millions per second rate.  This can lead
to significant performance loss and possible system failures.  It also
can cause thousands of 'Dazed and Confused' messages to be sent to the
system console.  This effectively makes perf tools unusable on UV systems.

This patch set addresses this problem and allows the perf tools to run on
UV without impacting performance and causing system failures.

-- 


* [PATCH 1/9] x86/UV: Move NMI support
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 2/9] x86/UV: Update UV support for external NMI signals Mike Travis
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-move-nmi-support.patch --]
[-- Type: text/plain, Size: 7556 bytes --]

This patch moves the UV NMI support from the x2apic file to a new
separate uv_nmi.c file in preparation for the next sequence of patches.
It avoids further bloat of the x2apic file, and has the added benefit
of placing the upcoming /sys/module parameters under the name 'uv_nmi'
instead of the more obscure 'x2apic_uv_x'.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/include/asm/uv/uv.h       |    2 
 arch/x86/kernel/apic/x2apic_uv_x.c |   69 -------------------------
 arch/x86/platform/uv/Makefile      |    2 
 arch/x86/platform/uv/uv_nmi.c      |  102 +++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+), 70 deletions(-)

--- linux.orig/arch/x86/include/asm/uv/uv.h
+++ linux/arch/x86/include/asm/uv/uv.h
@@ -12,6 +12,7 @@ extern enum uv_system_type get_uv_system
 extern int is_uv_system(void);
 extern void uv_cpu_init(void);
 extern void uv_nmi_init(void);
+extern void uv_register_nmi_notifier(void);
 extern void uv_system_init(void);
 extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
 						 struct mm_struct *mm,
@@ -25,6 +26,7 @@ static inline enum uv_system_type get_uv
 static inline int is_uv_system(void)	{ return 0; }
 static inline void uv_cpu_init(void)	{ }
 static inline void uv_system_init(void)	{ }
+static inline void uv_register_nmi_notifier(void) { }
 static inline const struct cpumask *
 uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
 		    unsigned long start, unsigned long end, unsigned int cpu)
--- linux.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -39,12 +39,6 @@
 #include <asm/x86_init.h>
 #include <asm/nmi.h>
 
-/* BMC sets a bit this MMR non-zero before sending an NMI */
-#define UVH_NMI_MMR				UVH_SCRATCH5
-#define UVH_NMI_MMR_CLEAR			(UVH_NMI_MMR + 8)
-#define UV_NMI_PENDING_MASK			(1UL << 63)
-DEFINE_PER_CPU(unsigned long, cpu_last_nmi_count);
-
 DEFINE_PER_CPU(int, x2apic_extra_bits);
 
 #define PR_DEVEL(fmt, args...)	pr_devel("%s: " fmt, __func__, args)
@@ -58,7 +52,6 @@ int uv_min_hub_revision_id;
 EXPORT_SYMBOL_GPL(uv_min_hub_revision_id);
 unsigned int uv_apicid_hibits;
 EXPORT_SYMBOL_GPL(uv_apicid_hibits);
-static DEFINE_SPINLOCK(uv_nmi_lock);
 
 static struct apic apic_x2apic_uv_x;
 
@@ -854,68 +847,6 @@ void uv_cpu_init(void)
 		set_x2apic_extra_bits(uv_hub_info->pnode);
 }
 
-/*
- * When NMI is received, print a stack trace.
- */
-int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
-{
-	unsigned long real_uv_nmi;
-	int bid;
-
-	/*
-	 * Each blade has an MMR that indicates when an NMI has been sent
-	 * to cpus on the blade. If an NMI is detected, atomically
-	 * clear the MMR and update a per-blade NMI count used to
-	 * cause each cpu on the blade to notice a new NMI.
-	 */
-	bid = uv_numa_blade_id();
-	real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
-
-	if (unlikely(real_uv_nmi)) {
-		spin_lock(&uv_blade_info[bid].nmi_lock);
-		real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
-		if (real_uv_nmi) {
-			uv_blade_info[bid].nmi_count++;
-			uv_write_local_mmr(UVH_NMI_MMR_CLEAR, UV_NMI_PENDING_MASK);
-		}
-		spin_unlock(&uv_blade_info[bid].nmi_lock);
-	}
-
-	if (likely(__get_cpu_var(cpu_last_nmi_count) == uv_blade_info[bid].nmi_count))
-		return NMI_DONE;
-
-	__get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
-
-	/*
-	 * Use a lock so only one cpu prints at a time.
-	 * This prevents intermixed output.
-	 */
-	spin_lock(&uv_nmi_lock);
-	pr_info("UV NMI stack dump cpu %u:\n", smp_processor_id());
-	dump_stack();
-	spin_unlock(&uv_nmi_lock);
-
-	return NMI_HANDLED;
-}
-
-void uv_register_nmi_notifier(void)
-{
-	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
-		printk(KERN_WARNING "UV NMI handler failed to register\n");
-}
-
-void uv_nmi_init(void)
-{
-	unsigned int value;
-
-	/*
-	 * Unmask NMI on all cpus
-	 */
-	value = apic_read(APIC_LVT1) | APIC_DM_NMI;
-	value &= ~APIC_LVT_MASKED;
-	apic_write(APIC_LVT1, value);
-}
-
 void __init uv_system_init(void)
 {
 	union uvh_rh_gam_config_mmr_u  m_n_config;
--- linux.orig/arch/x86/platform/uv/Makefile
+++ linux/arch/x86/platform/uv/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_X86_UV)		+= tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o uv_time.o
+obj-$(CONFIG_X86_UV)		+= tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o uv_time.o uv_nmi.o
--- /dev/null
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -0,0 +1,102 @@
+/*
+ * SGI NMI support routines
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
+ *
+ *  Copyright (c) 2009-2013 Silicon Graphics, Inc.  All Rights Reserved.
+ *  Copyright (c) Mike Travis
+ */
+
+#include <linux/cpu.h>
+#include <linux/nmi.h>
+
+#include <asm/apic.h>
+#include <asm/nmi.h>
+#include <asm/uv/uv.h>
+#include <asm/uv/uv_hub.h>
+#include <asm/uv/uv_mmrs.h>
+
+/* BMC sets a bit this MMR non-zero before sending an NMI */
+#define UVH_NMI_MMR				UVH_SCRATCH5
+#define UVH_NMI_MMR_CLEAR			(UVH_NMI_MMR + 8)
+#define UV_NMI_PENDING_MASK			(1UL << 63)
+DEFINE_PER_CPU(unsigned long, cpu_last_nmi_count);
+static DEFINE_SPINLOCK(uv_nmi_lock);
+
+/*
+ * When NMI is received, print a stack trace.
+ */
+int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
+{
+	unsigned long real_uv_nmi;
+	int bid;
+
+	/*
+	 * Each blade has an MMR that indicates when an NMI has been sent
+	 * to cpus on the blade. If an NMI is detected, atomically
+	 * clear the MMR and update a per-blade NMI count used to
+	 * cause each cpu on the blade to notice a new NMI.
+	 */
+	bid = uv_numa_blade_id();
+	real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
+
+	if (unlikely(real_uv_nmi)) {
+		spin_lock(&uv_blade_info[bid].nmi_lock);
+		real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) &
+				UV_NMI_PENDING_MASK);
+		if (real_uv_nmi) {
+			uv_blade_info[bid].nmi_count++;
+			uv_write_local_mmr(UVH_NMI_MMR_CLEAR,
+						UV_NMI_PENDING_MASK);
+		}
+		spin_unlock(&uv_blade_info[bid].nmi_lock);
+	}
+
+	if (likely(__get_cpu_var(cpu_last_nmi_count) ==
+			uv_blade_info[bid].nmi_count))
+		return NMI_DONE;
+
+	__get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
+
+	/*
+	 * Use a lock so only one cpu prints at a time.
+	 * This prevents intermixed output.
+	 */
+	spin_lock(&uv_nmi_lock);
+	pr_info("UV NMI stack dump cpu %u:\n", smp_processor_id());
+	dump_stack();
+	spin_unlock(&uv_nmi_lock);
+
+	return NMI_HANDLED;
+}
+
+void uv_register_nmi_notifier(void)
+{
+	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
+		pr_warn("UV NMI handler failed to register\n");
+}
+
+void uv_nmi_init(void)
+{
+	unsigned int value;
+
+	/*
+	 * Unmask NMI on all cpus
+	 */
+	value = apic_read(APIC_LVT1) | APIC_DM_NMI;
+	value &= ~APIC_LVT_MASKED;
+	apic_write(APIC_LVT1, value);
+}
+

-- 


* [PATCH 2/9] x86/UV: Update UV support for external NMI signals
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
  2013-09-05 22:50 ` [PATCH 1/9] x86/UV: Move NMI support Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 3/9] x86/UV: Add summary of cpu activity to UV NMI handler Mike Travis
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-update-nmi-support.patch --]
[-- Type: text/plain, Size: 23889 bytes --]

The current UV NMI handler has not been updated for the changes in the
system NMI handler and the perf operations.  The UV NMI handler reads
an MMR in the UV Hub to check whether the NMI event was caused by
the external 'system NMI' that the operator can initiate on the System
Mgmt Controller.

The problem arises when the perf tools are running, causing millions of
perf events per second on very large CPU count systems.  Previously this
was okay because the perf NMI handler ran at a higher priority on the
NMI call chain; if the NMI was a perf event, the remaining handlers on
the chain were not called.

Now the system NMI handler calls all the handlers on the NMI call
chain including the UV NMI handler.  This causes the UV NMI handler
to read the MMRs at the same millions per second rate.  This can lead
to significant performance loss and possible system failures.  It also
can cause thousands of 'Dazed and Confused' messages to be sent to the
system console.  This effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running, this code
has been optimized to minimize reading of the MMRs as much as possible,
by moving to the NMI_UNKNOWN notifier chain.  This chain is called only
when all the users on the standard NMI_LOCAL call chain have been called
and none of them have claimed this NMI.

There is one exception where the NMI_LOCAL notifier chain is used.  When
the perf tools are in use, it's possible that the UV NMI was captured by
some other NMI handler and then either ignored or mistakenly processed as
a perf event.  We set a per-cpu ('ping') flag for those CPUs that ignored
the initial NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR; instead it checks the
in-memory flag indicating it was pinged.  There are two module variables:
'ping_count', indicating how many requested NMI events occurred, and
'ping_misses', indicating how many stray NMI events were seen.  The strays
are most likely perf events, so the counts show the overhead of the perf
NMI interrupts and how many MMR reads were avoided.

This patch also minimizes MMR reads by having the first cpu to enter
the NMI handler on each node set a per-HUB in-memory atomic value.
(Having a per-HUB value avoids sending lock traffic over NumaLink.)
Both types of UV NMIs from the SMI layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/include/asm/uv/uv_hub.h   |   57 +++
 arch/x86/include/asm/uv/uv_mmrs.h  |   31 ++
 arch/x86/kernel/apic/x2apic_uv_x.c |    1 
 arch/x86/platform/uv/uv_nmi.c      |  551 ++++++++++++++++++++++++++++++++++---
 4 files changed, 599 insertions(+), 41 deletions(-)

--- linux.orig/arch/x86/include/asm/uv/uv_hub.h
+++ linux/arch/x86/include/asm/uv/uv_hub.h
@@ -502,8 +502,8 @@ struct uv_blade_info {
 	unsigned short	nr_online_cpus;
 	unsigned short	pnode;
 	short		memory_nid;
-	spinlock_t	nmi_lock;
-	unsigned long	nmi_count;
+	spinlock_t	nmi_lock;	/* obsolete, see uv_hub_nmi */
+	unsigned long	nmi_count;	/* obsolete, see uv_hub_nmi */
 };
 extern struct uv_blade_info *uv_blade_info;
 extern short *uv_node_to_blade;
@@ -576,6 +576,59 @@ static inline int uv_num_possible_blades
 	return uv_possible_blades;
 }
 
+/* Per Hub NMI support */
+extern void uv_nmi_setup(void);
+
+/* BMC sets a bit this MMR non-zero before sending an NMI */
+#define UVH_NMI_MMR		UVH_SCRATCH5
+#define UVH_NMI_MMR_CLEAR	UVH_SCRATCH5_ALIAS
+#define UVH_NMI_MMR_SHIFT	63
+#define	UVH_NMI_MMR_TYPE	"SCRATCH5"
+
+/* Newer SMM NMI handler, not present in all systems */
+#define UVH_NMI_MMRX		UVH_EVENT_OCCURRED0
+#define UVH_NMI_MMRX_CLEAR	UVH_EVENT_OCCURRED0_ALIAS
+#define UVH_NMI_MMRX_SHIFT	(is_uv1_hub() ? \
+					UV1H_EVENT_OCCURRED0_EXTIO_INT0_SHFT :\
+					UVXH_EVENT_OCCURRED0_EXTIO_INT0_SHFT)
+#define	UVH_NMI_MMRX_TYPE	"EXTIO_INT0"
+
+/* Non-zero indicates newer SMM NMI handler present */
+#define UVH_NMI_MMRX_SUPPORTED	UVH_EXTIO_INT0_BROADCAST
+
+/* Indicates to BIOS that we want to use the newer SMM NMI handler */
+#define UVH_NMI_MMRX_REQ	UVH_SCRATCH5_ALIAS_2
+#define UVH_NMI_MMRX_REQ_SHIFT	62
+
+struct uv_hub_nmi_s {
+	raw_spinlock_t	nmi_lock;
+	atomic_t	in_nmi;		/* flag this node in UV NMI IRQ */
+	atomic_t	cpu_owner;	/* last locker of this struct */
+	atomic_t	read_mmr_count;	/* count of MMR reads */
+	atomic_t	nmi_count;	/* count of true UV NMIs */
+	unsigned long	nmi_value;	/* last value read from NMI MMR */
+};
+
+struct uv_cpu_nmi_s {
+	struct uv_hub_nmi_s	*hub;
+	atomic_t		state;
+	atomic_t		pinging;
+	int			queries;
+	int			pings;
+};
+
+DECLARE_PER_CPU(struct uv_cpu_nmi_s, __uv_cpu_nmi);
+#define uv_cpu_nmi			(__get_cpu_var(__uv_cpu_nmi))
+#define uv_hub_nmi			(uv_cpu_nmi.hub)
+#define uv_cpu_nmi_per(cpu)		(per_cpu(__uv_cpu_nmi, cpu))
+#define uv_hub_nmi_per(cpu)		(uv_cpu_nmi_per(cpu).hub)
+
+/* uv_cpu_nmi_states */
+#define	UV_NMI_STATE_OUT		0
+#define	UV_NMI_STATE_IN			1
+#define	UV_NMI_STATE_DUMP		2
+#define	UV_NMI_STATE_DUMP_DONE		3
+
 /* Update SCIR state */
 static inline void uv_set_scir_bits(unsigned char value)
 {
--- linux.orig/arch/x86/include/asm/uv/uv_mmrs.h
+++ linux/arch/x86/include/asm/uv/uv_mmrs.h
@@ -461,6 +461,23 @@ union uvh_event_occurred0_u {
 
 
 /* ========================================================================= */
+/*                         UVH_EXTIO_INT0_BROADCAST                          */
+/* ========================================================================= */
+#define UVH_EXTIO_INT0_BROADCAST 0x61448UL
+#define UVH_EXTIO_INT0_BROADCAST_32 0x3f0
+
+#define UVH_EXTIO_INT0_BROADCAST_ENABLE_SHFT		0
+#define UVH_EXTIO_INT0_BROADCAST_ENABLE_MASK		0x0000000000000001UL
+
+union uvh_extio_int0_broadcast_u {
+	unsigned long	v;
+	struct uvh_extio_int0_broadcast_s {
+		unsigned long	enable:1;			/* RW */
+		unsigned long	rsvd_1_63:63;
+	} s;
+};
+
+/* ========================================================================= */
 /*                         UVH_GR0_TLB_INT0_CONFIG                           */
 /* ========================================================================= */
 #define UVH_GR0_TLB_INT0_CONFIG 0x61b00UL
@@ -2606,6 +2623,20 @@ union uvh_scratch5_u {
 };
 
 /* ========================================================================= */
+/*                            UVH_SCRATCH5_ALIAS                             */
+/* ========================================================================= */
+#define UVH_SCRATCH5_ALIAS 0x2d0208UL
+#define UVH_SCRATCH5_ALIAS_32 0x780
+
+
+/* ========================================================================= */
+/*                           UVH_SCRATCH5_ALIAS_2                            */
+/* ========================================================================= */
+#define UVH_SCRATCH5_ALIAS_2 0x2d0210UL
+#define UVH_SCRATCH5_ALIAS_2_32 0x788
+
+
+/* ========================================================================= */
 /*                          UVXH_EVENT_OCCURRED2                             */
 /* ========================================================================= */
 #define UVXH_EVENT_OCCURRED2 0x70100UL
--- linux.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -984,6 +984,7 @@ void __init uv_system_init(void)
 	map_mmr_high(max_pnode);
 	map_mmioh_high(min_pnode, max_pnode);
 
+	uv_nmi_setup();
 	uv_cpu_init();
 	uv_scir_register_cpu_notifier();
 	uv_register_nmi_notifier();
--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -20,72 +20,518 @@
  */
 
 #include <linux/cpu.h>
+#include <linux/delay.h>
+#include <linux/module.h>
 #include <linux/nmi.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
 
 #include <asm/apic.h>
+#include <asm/current.h>
+#include <asm/kdebug.h>
+#include <asm/local64.h>
 #include <asm/nmi.h>
 #include <asm/uv/uv.h>
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_mmrs.h>
 
-/* BMC sets a bit this MMR non-zero before sending an NMI */
-#define UVH_NMI_MMR				UVH_SCRATCH5
-#define UVH_NMI_MMR_CLEAR			(UVH_NMI_MMR + 8)
-#define UV_NMI_PENDING_MASK			(1UL << 63)
-DEFINE_PER_CPU(unsigned long, cpu_last_nmi_count);
-static DEFINE_SPINLOCK(uv_nmi_lock);
+/*
+ * UV handler for NMI
+ *
+ * Handle system-wide NMI events generated by the global 'power nmi' command.
+ *
+ * Basic operation is to field the NMI interrupt on each cpu and wait
+ * until all cpus have entered the nmi handler.  If some cpus do not
+ * make it into the handler, try to force them in with the IPI(NMI) signal.
+ *
+ * We also have to lessen UV Hub MMR accesses as much as possible as this
+ * disrupts the UV Hub's primary mission of directing NumaLink traffic and
+ * can cause system problems to occur.
+ *
+ * To do this we register our primary NMI notifier on the NMI_UNKNOWN
+ * chain.  This reduces the number of false NMI calls when the perf
+ * tools are running which generate an enormous number of NMIs per
+ * second (~4M/s for 1024 cpu threads).  Our secondary NMI handler is
+ * very short, as it only checks whether it has been "pinged" with the
+ * IPI(NMI) signal mentioned above; it does not read the UV Hub's MMR.
+ *
+ */
+
+static struct uv_hub_nmi_s **uv_hub_nmi_list;
+
+DEFINE_PER_CPU(struct uv_cpu_nmi_s, __uv_cpu_nmi);
+EXPORT_PER_CPU_SYMBOL_GPL(__uv_cpu_nmi);
+
+static unsigned long nmi_mmr;
+static unsigned long nmi_mmr_clear;
+static unsigned long nmi_mmr_pending;
+
+static atomic_t	uv_in_nmi;
+static atomic_t uv_nmi_cpu = ATOMIC_INIT(-1);
+static atomic_t uv_nmi_cpus_in_nmi = ATOMIC_INIT(-1);
+static atomic_t uv_nmi_slave_continue;
+static cpumask_var_t uv_nmi_cpu_mask;
+
+/* Values for uv_nmi_slave_continue */
+#define SLAVE_CLEAR	0
+#define SLAVE_CONTINUE	1
+#define SLAVE_EXIT	2
 
 /*
- * When NMI is received, print a stack trace.
+ * Default is all stack dumps go to the console and buffer.
+ * Lower level to send to log buffer only.
  */
-int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
+static int uv_nmi_loglevel = 7;
+module_param_named(dump_loglevel, uv_nmi_loglevel, int, 0644);
+
+/*
+ * The following values show statistics on how perf events are affecting
+ * this system.
+ */
+static int param_get_local64(char *buffer, const struct kernel_param *kp)
 {
-	unsigned long real_uv_nmi;
-	int bid;
+	return sprintf(buffer, "%lu\n", local64_read((local64_t *)kp->arg));
+}
 
-	/*
-	 * Each blade has an MMR that indicates when an NMI has been sent
-	 * to cpus on the blade. If an NMI is detected, atomically
-	 * clear the MMR and update a per-blade NMI count used to
-	 * cause each cpu on the blade to notice a new NMI.
-	 */
-	bid = uv_numa_blade_id();
-	real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK);
+static int param_set_local64(const char *val, const struct kernel_param *kp)
+{
+	/* clear on any write */
+	local64_set((local64_t *)kp->arg, 0);
+	return 0;
+}
+
+static struct kernel_param_ops param_ops_local64 = {
+	.get = param_get_local64,
+	.set = param_set_local64,
+};
+#define param_check_local64(name, p) __param_check(name, p, local64_t)
+
+static local64_t uv_nmi_count;
+module_param_named(nmi_count, uv_nmi_count, local64, 0644);
+
+static local64_t uv_nmi_misses;
+module_param_named(nmi_misses, uv_nmi_misses, local64, 0644);
+
+static local64_t uv_nmi_ping_count;
+module_param_named(ping_count, uv_nmi_ping_count, local64, 0644);
+
+static local64_t uv_nmi_ping_misses;
+module_param_named(ping_misses, uv_nmi_ping_misses, local64, 0644);
 
-	if (unlikely(real_uv_nmi)) {
-		spin_lock(&uv_blade_info[bid].nmi_lock);
-		real_uv_nmi = (uv_read_local_mmr(UVH_NMI_MMR) &
-				UV_NMI_PENDING_MASK);
-		if (real_uv_nmi) {
-			uv_blade_info[bid].nmi_count++;
-			uv_write_local_mmr(UVH_NMI_MMR_CLEAR,
-						UV_NMI_PENDING_MASK);
+/*
+ * Following values allow tuning for large systems under heavy loading
+ */
+static int uv_nmi_initial_delay = 100;
+module_param_named(initial_delay, uv_nmi_initial_delay, int, 0644);
+
+static int uv_nmi_slave_delay = 100;
+module_param_named(slave_delay, uv_nmi_slave_delay, int, 0644);
+
+static int uv_nmi_loop_delay = 100;
+module_param_named(loop_delay, uv_nmi_loop_delay, int, 0644);
+
+static int uv_nmi_trigger_delay = 10000;
+module_param_named(trigger_delay, uv_nmi_trigger_delay, int, 0644);
+
+static int uv_nmi_wait_count = 100;
+module_param_named(wait_count, uv_nmi_wait_count, int, 0644);
+
+static int uv_nmi_retry_count = 500;
+module_param_named(retry_count, uv_nmi_retry_count, int, 0644);
+
+/* Setup which NMI support is present in system */
+static void uv_nmi_setup_mmrs(void)
+{
+	if (uv_read_local_mmr(UVH_NMI_MMRX_SUPPORTED)) {
+		uv_write_local_mmr(UVH_NMI_MMRX_REQ,
+					1UL << UVH_NMI_MMRX_REQ_SHIFT);
+		nmi_mmr = UVH_NMI_MMRX;
+		nmi_mmr_clear = UVH_NMI_MMRX_CLEAR;
+		nmi_mmr_pending = 1UL << UVH_NMI_MMRX_SHIFT;
+		pr_info("UV: SMI NMI support: %s\n", UVH_NMI_MMRX_TYPE);
+	} else {
+		nmi_mmr = UVH_NMI_MMR;
+		nmi_mmr_clear = UVH_NMI_MMR_CLEAR;
+		nmi_mmr_pending = 1UL << UVH_NMI_MMR_SHIFT;
+		pr_info("UV: SMI NMI support: %s\n", UVH_NMI_MMR_TYPE);
+	}
+}
+
+/* Read NMI MMR and check if NMI flag was set by BMC. */
+static inline int uv_nmi_test_mmr(struct uv_hub_nmi_s *hub_nmi)
+{
+	hub_nmi->nmi_value = uv_read_local_mmr(nmi_mmr);
+	atomic_inc(&hub_nmi->read_mmr_count);
+	return !!(hub_nmi->nmi_value & nmi_mmr_pending);
+}
+
+static inline void uv_local_mmr_clear_nmi(void)
+{
+	uv_write_local_mmr(nmi_mmr_clear, nmi_mmr_pending);
+}
+
+/*
+ * If first cpu in on this hub, set hub_nmi "in_nmi" and "owner" values and
+ * return true.  If first cpu in on the system, set global "in_nmi" flag.
+ */
+static int uv_set_in_nmi(int cpu, struct uv_hub_nmi_s *hub_nmi)
+{
+	int first = atomic_add_unless(&hub_nmi->in_nmi, 1, 1);
+
+	if (first) {
+		atomic_set(&hub_nmi->cpu_owner, cpu);
+		if (atomic_add_unless(&uv_in_nmi, 1, 1))
+			atomic_set(&uv_nmi_cpu, cpu);
+
+		atomic_inc(&hub_nmi->nmi_count);
+	}
+	return first;
+}
+
+/* Check if this is a system NMI event */
+static int uv_check_nmi(struct uv_hub_nmi_s *hub_nmi)
+{
+	int cpu = smp_processor_id();
+	int nmi = 0;
+
+	local64_inc(&uv_nmi_count);
+	uv_cpu_nmi.queries++;
+
+	do {
+		nmi = atomic_read(&hub_nmi->in_nmi);
+		if (nmi)
+			break;
+
+		if (raw_spin_trylock(&hub_nmi->nmi_lock)) {
+
+			/* check hub MMR NMI flag */
+			if (uv_nmi_test_mmr(hub_nmi)) {
+				uv_set_in_nmi(cpu, hub_nmi);
+				nmi = 1;
+				break;
+			}
+
+			/* MMR NMI flag is clear */
+			raw_spin_unlock(&hub_nmi->nmi_lock);
+
+		} else {
+			/* wait a moment for the hub nmi locker to set flag */
+			cpu_relax();
+			udelay(uv_nmi_slave_delay);
+
+			/* re-check hub in_nmi flag */
+			nmi = atomic_read(&hub_nmi->in_nmi);
+			if (nmi)
+				break;
+		}
+
+		/* check if this BMC missed setting the MMR NMI flag */
+		if (!nmi) {
+			nmi = atomic_read(&uv_in_nmi);
+			if (nmi)
+				uv_set_in_nmi(cpu, hub_nmi);
 		}
-		spin_unlock(&uv_blade_info[bid].nmi_lock);
+
+	} while (0);
+
+	if (!nmi)
+		local64_inc(&uv_nmi_misses);
+
+	return nmi;
+}
+
+/* Need to reset the NMI MMR register, but only once per hub. */
+static inline void uv_clear_nmi(int cpu)
+{
+	struct uv_hub_nmi_s *hub_nmi = uv_hub_nmi;
+
+	if (cpu == atomic_read(&hub_nmi->cpu_owner)) {
+		atomic_set(&hub_nmi->cpu_owner, -1);
+		atomic_set(&hub_nmi->in_nmi, 0);
+		uv_local_mmr_clear_nmi();
+		raw_spin_unlock(&hub_nmi->nmi_lock);
 	}
+}
+
+/* Print non-responding cpus */
+static void uv_nmi_nr_cpus_pr(char *fmt)
+{
+	static char cpu_list[1024];
+	int len = sizeof(cpu_list);
+	int c = cpumask_weight(uv_nmi_cpu_mask);
+	int n = cpulist_scnprintf(cpu_list, len, uv_nmi_cpu_mask);
+
+	if (n >= len-1)
+		strcpy(&cpu_list[len - 6], "...\n");
+
+	printk(fmt, c, cpu_list);
+}
+
+/* Ping non-responding cpus attempting to force them into the NMI handler */
+static void uv_nmi_nr_cpus_ping(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, uv_nmi_cpu_mask)
+		atomic_set(&uv_cpu_nmi_per(cpu).pinging, 1);
+
+	apic->send_IPI_mask(uv_nmi_cpu_mask, APIC_DM_NMI);
+}
+
+/* Clean up flags for cpus that ignored both NMI and ping */
+static void uv_nmi_cleanup_mask(void)
+{
+	int cpu;
 
-	if (likely(__get_cpu_var(cpu_last_nmi_count) ==
-			uv_blade_info[bid].nmi_count))
+	for_each_cpu(cpu, uv_nmi_cpu_mask) {
+		atomic_set(&uv_cpu_nmi_per(cpu).pinging, 0);
+		atomic_set(&uv_cpu_nmi_per(cpu).state, UV_NMI_STATE_OUT);
+		cpumask_clear_cpu(cpu, uv_nmi_cpu_mask);
+	}
+}
+
+/* Loop waiting as cpus enter nmi handler */
+static int uv_nmi_wait_cpus(int first)
+{
+	int i, j, k, n = num_online_cpus();
+	int last_k = 0, waiting = 0;
+
+	if (first) {
+		cpumask_copy(uv_nmi_cpu_mask, cpu_online_mask);
+		k = 0;
+	} else {
+		k = n - cpumask_weight(uv_nmi_cpu_mask);
+	}
+
+	udelay(uv_nmi_initial_delay);
+	for (i = 0; i < uv_nmi_retry_count; i++) {
+		int loop_delay = uv_nmi_loop_delay;
+
+		for_each_cpu(j, uv_nmi_cpu_mask) {
+			if (atomic_read(&uv_cpu_nmi_per(j).state)) {
+				cpumask_clear_cpu(j, uv_nmi_cpu_mask);
+				if (++k >= n)
+					break;
+			}
+		}
+		if (k >= n) {		/* all in? */
+			k = n;
+			break;
+		}
+		if (last_k != k) {	/* abort if no new cpus coming in */
+			last_k = k;
+			waiting = 0;
+		} else if (++waiting > uv_nmi_wait_count)
+			break;
+
+		/* extend delay if waiting only for cpu 0 */
+		if (waiting && (n - k) == 1 &&
+		    cpumask_test_cpu(0, uv_nmi_cpu_mask))
+			loop_delay *= 100;
+
+		udelay(loop_delay);
+	}
+	atomic_set(&uv_nmi_cpus_in_nmi, k);
+	return n - k;
+}
+
+/* Wait until all slave cpus have entered UV NMI handler */
+static void uv_nmi_wait(int master)
+{
+	/* indicate this cpu is in */
+	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_IN);
+
+	/* if not the first cpu in (the master), then we are a slave cpu */
+	if (!master)
+		return;
+
+	do {
+		/* wait for all other cpus to gather here */
+		if (!uv_nmi_wait_cpus(1))
+			break;
+
+		/* if not all made it in, send IPI NMI to them */
+		uv_nmi_nr_cpus_pr(KERN_ALERT
+			"UV: Sending NMI IPI to %d non-responding CPUs: %s\n");
+		uv_nmi_nr_cpus_ping();
+
+		/* if all cpus are in, then done */
+		if (!uv_nmi_wait_cpus(0))
+			break;
+
+		uv_nmi_nr_cpus_pr(KERN_ALERT
+			"UV: %d CPUs not in NMI loop: %s\n");
+	} while (0);
+
+	pr_alert("UV: %d of %d CPUs in NMI\n",
+		atomic_read(&uv_nmi_cpus_in_nmi), num_online_cpus());
+}
+
+/* Dump this cpu's state */
+static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
+{
+	const char *dots = " ................................. ";
+
+	printk(KERN_DEFAULT "UV:%sNMI process trace for CPU %d\n", dots, cpu);
+	show_regs(regs);
+	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_DUMP_DONE);
+}
+
+/* Trigger a slave cpu to dump its state */
+static void uv_nmi_trigger_dump(int cpu)
+{
+	int retry = uv_nmi_trigger_delay;
+
+	if (atomic_read(&uv_cpu_nmi_per(cpu).state) != UV_NMI_STATE_IN)
+		return;
+
+	atomic_set(&uv_cpu_nmi_per(cpu).state, UV_NMI_STATE_DUMP);
+	do {
+		cpu_relax();
+		udelay(10);
+		if (atomic_read(&uv_cpu_nmi_per(cpu).state)
+				!= UV_NMI_STATE_DUMP)
+			return;
+	} while (--retry > 0);
+
+	pr_crit("UV: CPU %d stuck in process dump function\n", cpu);
+	atomic_set(&uv_cpu_nmi_per(cpu).state, UV_NMI_STATE_DUMP_DONE);
+}
+
+/* Wait until all cpus ready to exit */
+static void uv_nmi_sync_exit(int master)
+{
+	atomic_dec(&uv_nmi_cpus_in_nmi);
+	if (master) {
+		while (atomic_read(&uv_nmi_cpus_in_nmi) > 0)
+			cpu_relax();
+		atomic_set(&uv_nmi_slave_continue, SLAVE_CLEAR);
+	} else {
+		while (atomic_read(&uv_nmi_slave_continue))
+			cpu_relax();
+	}
+}
+
+/* Walk through cpu list and dump state of each */
+static void uv_nmi_dump_state(int cpu, struct pt_regs *regs, int master)
+{
+	if (master) {
+		int tcpu;
+		int ignored = 0;
+		int saved_console_loglevel = console_loglevel;
+
+		pr_alert("UV: tracing processes for %d CPUs from CPU %d\n",
+			atomic_read(&uv_nmi_cpus_in_nmi), cpu);
+
+		console_loglevel = uv_nmi_loglevel;
+		atomic_set(&uv_nmi_slave_continue, SLAVE_EXIT);
+		for_each_online_cpu(tcpu) {
+			if (cpumask_test_cpu(tcpu, uv_nmi_cpu_mask))
+				ignored++;
+			else if (tcpu == cpu)
+				uv_nmi_dump_state_cpu(tcpu, regs);
+			else
+				uv_nmi_trigger_dump(tcpu);
+		}
+		if (ignored)
+			printk(KERN_DEFAULT "UV: %d CPUs ignored NMI\n",
+				ignored);
+
+		console_loglevel = saved_console_loglevel;
+		pr_alert("UV: process trace complete\n");
+	} else {
+		while (!atomic_read(&uv_nmi_slave_continue))
+			cpu_relax();
+		while (atomic_read(&uv_cpu_nmi.state) != UV_NMI_STATE_DUMP)
+			cpu_relax();
+		uv_nmi_dump_state_cpu(cpu, regs);
+	}
+	uv_nmi_sync_exit(master);
+}
+
+static void uv_nmi_touch_watchdogs(void)
+{
+	touch_softlockup_watchdog_sync();
+	clocksource_touch_watchdog();
+	rcu_cpu_stall_reset();
+	touch_nmi_watchdog();
+}
+
+/*
+ * UV NMI handler
+ */
+int uv_handle_nmi(unsigned int reason, struct pt_regs *regs)
+{
+	struct uv_hub_nmi_s *hub_nmi = uv_hub_nmi;
+	int cpu = smp_processor_id();
+	int master = 0;
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	/* If not a UV System NMI, ignore */
+	if (!atomic_read(&uv_cpu_nmi.pinging) && !uv_check_nmi(hub_nmi)) {
+		local_irq_restore(flags);
 		return NMI_DONE;
+	}
 
-	__get_cpu_var(cpu_last_nmi_count) = uv_blade_info[bid].nmi_count;
+	/* Indicate we are the first CPU into the NMI handler */
+	master = (atomic_read(&uv_nmi_cpu) == cpu);
 
-	/*
-	 * Use a lock so only one cpu prints at a time.
-	 * This prevents intermixed output.
-	 */
-	spin_lock(&uv_nmi_lock);
-	pr_info("UV NMI stack dump cpu %u:\n", smp_processor_id());
-	dump_stack();
-	spin_unlock(&uv_nmi_lock);
+	/* Pause as all cpus enter the NMI handler */
+	uv_nmi_wait(master);
+
+	/* Dump state of each cpu */
+	uv_nmi_dump_state(cpu, regs, master);
+
+	/* Clear per_cpu "in nmi" flag */
+	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_OUT);
+
+	/* Clear MMR NMI flag on each hub */
+	uv_clear_nmi(cpu);
+
+	/* Clear global flags */
+	if (master) {
+		if (cpumask_weight(uv_nmi_cpu_mask))
+			uv_nmi_cleanup_mask();
+		atomic_set(&uv_nmi_cpus_in_nmi, -1);
+		atomic_set(&uv_nmi_cpu, -1);
+		atomic_set(&uv_in_nmi, 0);
+	}
+
+	uv_nmi_touch_watchdogs();
+	local_irq_restore(flags);
 
 	return NMI_HANDLED;
 }
 
+/*
+ * NMI handler for pulling in CPUs when perf events are grabbing our NMI
+ */
+int uv_handle_nmi_ping(unsigned int reason, struct pt_regs *regs)
+{
+	int ret;
+
+	uv_cpu_nmi.queries++;
+	if (!atomic_read(&uv_cpu_nmi.pinging)) {
+		local64_inc(&uv_nmi_ping_misses);
+		return NMI_DONE;
+	}
+
+	uv_cpu_nmi.pings++;
+	local64_inc(&uv_nmi_ping_count);
+	ret = uv_handle_nmi(reason, regs);
+	atomic_set(&uv_cpu_nmi.pinging, 0);
+	return ret;
+}
+
 void uv_register_nmi_notifier(void)
 {
 	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
-		pr_warn("UV NMI handler failed to register\n");
+		pr_warn("UV: NMI handler failed to register\n");
+
+	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping"))
+		pr_warn("UV: PING NMI handler failed to register\n");
 }
 
 void uv_nmi_init(void)
@@ -100,3 +546,30 @@ void uv_nmi_init(void)
 	apic_write(APIC_LVT1, value);
 }
 
+void uv_nmi_setup(void)
+{
+	int size = sizeof(void *) * (1 << NODES_SHIFT);
+	int cpu, nid;
+
+	/* Setup hub nmi info */
+	uv_nmi_setup_mmrs();
+	uv_hub_nmi_list = kzalloc(size, GFP_KERNEL);
+	pr_info("UV: NMI hub list @ 0x%p (%d)\n", uv_hub_nmi_list, size);
+	BUG_ON(!uv_hub_nmi_list);
+	size = sizeof(struct uv_hub_nmi_s);
+	for_each_present_cpu(cpu) {
+		nid = cpu_to_node(cpu);
+		if (uv_hub_nmi_list[nid] == NULL) {
+			uv_hub_nmi_list[nid] = kzalloc_node(size,
+							    GFP_KERNEL, nid);
+			BUG_ON(!uv_hub_nmi_list[nid]);
+			raw_spin_lock_init(&(uv_hub_nmi_list[nid]->nmi_lock));
+			atomic_set(&uv_hub_nmi_list[nid]->cpu_owner, -1);
+		}
+		uv_hub_nmi_per(cpu) = uv_hub_nmi_list[nid];
+	}
+	alloc_cpumask_var(&uv_nmi_cpu_mask, GFP_KERNEL);
+	BUG_ON(!uv_nmi_cpu_mask);
+}
+
+

-- 


* [PATCH 3/9] x86/UV: Add summary of cpu activity to UV NMI handler
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
  2013-09-05 22:50 ` [PATCH 1/9] x86/UV: Move NMI support Mike Travis
  2013-09-05 22:50 ` [PATCH 2/9] x86/UV: Update UV support for external NMI signals Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 4/9] x86/UV: Add kdump " Mike Travis
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-dump-ips-on-nmi.patch --]
[-- Type: text/plain, Size: 3197 bytes --]

The standard NMI handler dumps the state of all the cpus, including
a full register dump and stack trace.  This can be far more information
than is needed.  This patch adds a "summary" dump that is basically
a form of the "ps" command.  It includes the symbolic IP address as well
as the command field and basic process information.

It is enabled when the nmi action is changed to "ips".

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/platform/uv/uv_nmi.c |   48 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -139,6 +139,19 @@ module_param_named(wait_count, uv_nmi_wa
 static int uv_nmi_retry_count = 500;
 module_param_named(retry_count, uv_nmi_retry_count, int, 0644);
 
+/*
+ * Valid NMI Actions:
+ *  "dump"	- dump process stack for each cpu
+ *  "ips"	- dump IP info for each cpu
+ */
+static char uv_nmi_action[8] = "dump";
+module_param_string(action, uv_nmi_action, sizeof(uv_nmi_action), 0644);
+
+static inline bool uv_nmi_action_is(const char *action)
+{
+	return (strncmp(uv_nmi_action, action, strlen(action)) == 0);
+}
+
 /* Setup which NMI support is present in system */
 static void uv_nmi_setup_mmrs(void)
 {
@@ -367,13 +380,38 @@ static void uv_nmi_wait(int master)
 		atomic_read(&uv_nmi_cpus_in_nmi), num_online_cpus());
 }
 
+static void uv_nmi_dump_cpu_ip_hdr(void)
+{
+	printk(KERN_DEFAULT
+		"\nUV: %4s %6s %-32s %s   (Note: PID 0 not listed)\n",
+		"CPU", "PID", "COMMAND", "IP");
+}
+
+static void uv_nmi_dump_cpu_ip(int cpu, struct pt_regs *regs)
+{
+	printk(KERN_DEFAULT "UV: %4d %6d %-32.32s ",
+		cpu, current->pid, current->comm);
+
+	printk_address(regs->ip, 1);
+}
+
 /* Dump this cpu's state */
 static void uv_nmi_dump_state_cpu(int cpu, struct pt_regs *regs)
 {
 	const char *dots = " ................................. ";
 
-	printk(KERN_DEFAULT "UV:%sNMI process trace for CPU %d\n", dots, cpu);
-	show_regs(regs);
+	if (uv_nmi_action_is("ips")) {
+		if (cpu == 0)
+			uv_nmi_dump_cpu_ip_hdr();
+
+		if (current->pid != 0)
+			uv_nmi_dump_cpu_ip(cpu, regs);
+
+	} else if (uv_nmi_action_is("dump")) {
+		printk(KERN_DEFAULT
+			"UV:%sNMI process trace for CPU %d\n", dots, cpu);
+		show_regs(regs);
+	}
 	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_DUMP_DONE);
 }
 
@@ -420,7 +458,8 @@ static void uv_nmi_dump_state(int cpu, s
 		int ignored = 0;
 		int saved_console_loglevel = console_loglevel;
 
-		pr_alert("UV: tracing processes for %d CPUs from CPU %d\n",
+		pr_alert("UV: tracing %s for %d CPUs from CPU %d\n",
+			uv_nmi_action_is("ips") ? "IPs" : "processes",
 			atomic_read(&uv_nmi_cpus_in_nmi), cpu);
 
 		console_loglevel = uv_nmi_loglevel;
@@ -482,7 +521,8 @@ int uv_handle_nmi(unsigned int reason, s
 	uv_nmi_wait(master);
 
 	/* Dump state of each cpu */
-	uv_nmi_dump_state(cpu, regs, master);
+	if (uv_nmi_action_is("ips") || uv_nmi_action_is("dump"))
+		uv_nmi_dump_state(cpu, regs, master);
 
 	/* Clear per_cpu "in nmi" flag */
 	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_OUT);

-- 


* [PATCH 4/9] x86/UV: Add kdump to UV NMI handler
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (2 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 3/9] x86/UV: Add summary of cpu activity to UV NMI handler Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB Mike Travis
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-add-kdump-on-nmi.patch --]
[-- Type: text/plain, Size: 2896 bytes --]

If a system has hung and no longer responds to external events, this
patch adds the capability to do a standard kdump and system reboot,
triggered by the system NMI command.

It is enabled when the nmi action is changed to "kdump" and the
kernel is built with CONFIG_KEXEC enabled.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/platform/uv/uv_nmi.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -21,6 +21,7 @@
 
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/kexec.h>
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sched.h>
@@ -70,6 +71,7 @@ static atomic_t	uv_in_nmi;
 static atomic_t uv_nmi_cpu = ATOMIC_INIT(-1);
 static atomic_t uv_nmi_cpus_in_nmi = ATOMIC_INIT(-1);
 static atomic_t uv_nmi_slave_continue;
+static atomic_t uv_nmi_kexec_failed;
 static cpumask_var_t uv_nmi_cpu_mask;
 
 /* Values for uv_nmi_slave_continue */
@@ -143,6 +145,7 @@ module_param_named(retry_count, uv_nmi_r
  * Valid NMI Actions:
  *  "dump"	- dump process stack for each cpu
  *  "ips"	- dump IP info for each cpu
+ *  "kdump"	- do crash dump
  */
 static char uv_nmi_action[8] = "dump";
 module_param_string(action, uv_nmi_action, sizeof(uv_nmi_action), 0644);
@@ -496,6 +499,40 @@ static void uv_nmi_touch_watchdogs(void)
 	touch_nmi_watchdog();
 }
 
+#if defined(CONFIG_KEXEC)
+static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
+{
+	/* Call crash to dump system state */
+	if (master) {
+		pr_emerg("UV: NMI executing crash_kexec on CPU%d\n", cpu);
+		crash_kexec(regs);
+
+		pr_emerg("UV: crash_kexec unexpectedly returned, ");
+		if (!kexec_crash_image) {
+			pr_cont("crash kernel not loaded\n");
+			atomic_set(&uv_nmi_kexec_failed, 1);
+			uv_nmi_sync_exit(1);
+			return;
+		}
+		pr_cont("kexec busy, stalling cpus while waiting\n");
+	}
+
+	/* If crash exec fails the slaves should return, otherwise stall */
+	while (atomic_read(&uv_nmi_kexec_failed) == 0)
+		mdelay(10);
+
+	/* Crash kernel most likely not loaded, return in an orderly fashion */
+	uv_nmi_sync_exit(0);
+}
+
+#else /* !CONFIG_KEXEC */
+static inline void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
+{
+	if (master)
+		pr_err("UV: NMI kdump: KEXEC not supported in this kernel\n");
+}
+#endif /* !CONFIG_KEXEC */
+
 /*
  * UV NMI handler
  */
@@ -517,6 +554,10 @@ int uv_handle_nmi(unsigned int reason, s
 	/* Indicate we are the first CPU into the NMI handler */
 	master = (atomic_read(&uv_nmi_cpu) == cpu);
 
+	/* If NMI action is "kdump", then attempt to do it */
+	if (uv_nmi_action_is("kdump"))
+		uv_nmi_kdump(cpu, master, regs);
+
 	/* Pause as all cpus enter the NMI handler */
 	uv_nmi_wait(master);
 

-- 


* [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (3 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 4/9] x86/UV: Add kdump " Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-06  4:36   ` Jason Wessel
  2013-09-05 22:50 ` [PATCH 6/9] x86/UV: Add call to KGDB/KDB from NMI handler Mike Travis
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: kgdb-add-nmi-callin.patch --]
[-- Type: text/plain, Size: 3166 bytes --]

This patch adds a kgdb_nmicallin() interface that can be used by
external NMI handlers to call the KGDB/KDB handler.  The primary need
for this is for those types of NMI interrupts where all the CPUs
have already received the NMI signal.  Therefore no send_IPI(NMI)
is required, and in fact it will cause a 2nd unhandled NMI to occur.
This generates the "Dazed and Confused" messages.

Since all the CPUs are getting the NMI at roughly the same time, it's not
guaranteed that the first CPU that hits the NMI handler will manage to
enter KGDB and set the dbg_master_lock before the slaves start entering.
The new argument "send_ready" was added for KGDB to signal the NMI handler
to release the slave CPUs for entry into KGDB.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 include/linux/kgdb.h      |    1 +
 kernel/debug/debug_core.c |   41 +++++++++++++++++++++++++++++++++++++++++
 kernel/debug/debug_core.h |    1 +
 3 files changed, 43 insertions(+)

--- linux.orig/include/linux/kgdb.h
+++ linux/include/linux/kgdb.h
@@ -310,6 +310,7 @@ extern int
 kgdb_handle_exception(int ex_vector, int signo, int err_code,
 		      struct pt_regs *regs);
 extern int kgdb_nmicallback(int cpu, void *regs);
+extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *snd_rdy);
 extern void gdbstub_exit(int status);
 
 extern int			kgdb_single_step;
--- linux.orig/kernel/debug/debug_core.c
+++ linux/kernel/debug/debug_core.c
@@ -578,6 +578,10 @@ return_normal:
 	/* Signal the other CPUs to enter kgdb_wait() */
 	if ((!kgdb_single_step) && kgdb_do_roundup)
 		kgdb_roundup_cpus(flags);
+
+	/* If optional send ready pointer, signal CPUs to proceed */
+	if (kgdb_info[cpu].send_ready)
+		atomic_set(kgdb_info[cpu].send_ready, 1);
 #endif
 
 	/*
@@ -729,6 +733,43 @@ int kgdb_nmicallback(int cpu, void *regs
 		return 0;
 	}
 #endif
+	return 1;
+}
+
+int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
+{
+#ifdef CONFIG_SMP
+	if (!kgdb_io_ready(0))
+		return 1;
+
+	if (kgdb_info[cpu].enter_kgdb == 0) {
+		struct kgdb_state kgdb_var;
+		struct kgdb_state *ks = &kgdb_var;
+		int save_kgdb_do_roundup = kgdb_do_roundup;
+
+		memset(ks, 0, sizeof(struct kgdb_state));
+		ks->cpu			= cpu;
+		ks->ex_vector		= trapnr;
+		ks->signo		= SIGTRAP;
+		ks->err_code		= 0;
+		ks->kgdb_usethreadid	= 0;
+		ks->linux_regs		= regs;
+
+		/* Do not broadcast NMI */
+		kgdb_do_roundup = 0;
+
+		/* Indicate there are slaves waiting */
+		kgdb_info[cpu].send_ready = send_ready;
+		kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
+		kgdb_do_roundup = save_kgdb_do_roundup;
+		kgdb_info[cpu].send_ready = NULL;
+
+		/* Wait till all the CPUs have quit from the debugger. */
+		while (atomic_read(&slaves_in_kgdb))
+			cpu_relax();
+		return 0;
+	}
+#endif
 	return 1;
 }
 
--- linux.orig/kernel/debug/debug_core.h
+++ linux/kernel/debug/debug_core.h
@@ -37,6 +37,7 @@ struct kgdb_state {
 struct debuggerinfo_struct {
 	void			*debuggerinfo;
 	struct task_struct	*task;
+	atomic_t		*send_ready;
 	int			exception_state;
 	int			ret_state;
 	int			irq_depth;

-- 


* [PATCH 6/9] x86/UV: Add call to KGDB/KDB from NMI handler
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (4 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB Mike Travis
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-add-nmi-call-kdb.patch --]
[-- Type: text/plain, Size: 3530 bytes --]

This patch restores the capability to enter KDB (and KGDB) from the UV NMI
handler.  This is needed because the UV system console is not capable of
sending the 'break' signal to the serial console port.  It is also useful
when the kernel is hung in such a way that it isn't responding to normal
external I/O, so sending 'g' to /proc/sysrq-trigger does not work either.

Another benefit of the external NMI command is that all the cpus receive
the NMI signal at roughly the same time so they are more closely aligned
timewise.

It utilizes the newly added kgdb_nmicallin function to gain entry
to KGDB/KDB by the master.  The slaves still enter via the standard
kgdb_nmicallback function.  It also uses the new 'send_ready' pointer
to tell KGDB/KDB to signal the slaves when to proceed into the KGDB
slave loop.

It is enabled when the nmi action is set to "kdb" and the kernel is
built with CONFIG_KGDB_KDB enabled.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/platform/uv/uv_nmi.c |   47 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -21,7 +21,9 @@
 
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/kdb.h>
 #include <linux/kexec.h>
+#include <linux/kgdb.h>
 #include <linux/module.h>
 #include <linux/nmi.h>
 #include <linux/sched.h>
@@ -32,6 +34,7 @@
 #include <asm/kdebug.h>
 #include <asm/local64.h>
 #include <asm/nmi.h>
+#include <asm/traps.h>
 #include <asm/uv/uv.h>
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_mmrs.h>
@@ -146,8 +149,9 @@ module_param_named(retry_count, uv_nmi_r
  *  "dump"	- dump process stack for each cpu
  *  "ips"	- dump IP info for each cpu
  *  "kdump"	- do crash dump
+ *  "kdb"	- enter KDB/KGDB (default)
  */
-static char uv_nmi_action[8] = "dump";
+static char uv_nmi_action[8] = "kdb";
 module_param_string(action, uv_nmi_action, sizeof(uv_nmi_action), 0644);
 
 static inline bool uv_nmi_action_is(const char *action)
@@ -533,6 +537,43 @@ static inline void uv_nmi_kdump(int cpu,
 }
 #endif /* !CONFIG_KEXEC */
 
+#ifdef CONFIG_KGDB_KDB
+/* Call KDB from NMI handler */
+static void uv_call_kdb(int cpu, struct pt_regs *regs, int master)
+{
+	int ret;
+
+	if (master) {
+		/* call KGDB NMI handler as MASTER */
+		ret = kgdb_nmicallin(cpu, X86_TRAP_NMI, regs,
+					&uv_nmi_slave_continue);
+		if (ret) {
+			pr_alert("KDB returned error, is kgdboc set?\n");
+			atomic_set(&uv_nmi_slave_continue, SLAVE_EXIT);
+		}
+	} else {
+		/* wait for KGDB signal that it's ready for slaves to enter */
+		int sig;
+
+		do {
+			cpu_relax();
+			sig = atomic_read(&uv_nmi_slave_continue);
+		} while (!sig);
+
+		/* call KGDB as slave */
+		if (sig == SLAVE_CONTINUE)
+			ret = kgdb_nmicallback(cpu, regs);
+	}
+	uv_nmi_sync_exit(master);
+}
+
+#else /* !CONFIG_KGDB_KDB */
+static inline void uv_call_kdb(int cpu, struct pt_regs *regs, int master)
+{
+	pr_err("UV: NMI error: KGDB/KDB is not enabled in this kernel\n");
+}
+#endif /* !CONFIG_KGDB_KDB */
+
 /*
  * UV NMI handler
  */
@@ -565,6 +606,10 @@ int uv_handle_nmi(unsigned int reason, s
 	if (uv_nmi_action_is("ips") || uv_nmi_action_is("dump"))
 		uv_nmi_dump_state(cpu, regs, master);
 
+	/* Call KDB if enabled */
+	else if (uv_nmi_action_is("kdb"))
+		uv_call_kdb(cpu, regs, master);
+
 	/* Clear per_cpu "in nmi" flag */
 	atomic_set(&uv_cpu_nmi.state, UV_NMI_STATE_OUT);
 

-- 


* [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (5 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 6/9] x86/UV: Add call to KGDB/KDB from NMI handler Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-06  5:00   ` Jason Wessel
  2013-09-05 22:50 ` [PATCH 8/9] x86/UV: Add uvtrace support Mike Travis
  2013-09-05 22:50 ` [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler Mike Travis
  8 siblings, 1 reply; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: kdb-add-system-nmi.patch --]
[-- Type: text/plain, Size: 3015 bytes --]

This patch adds a new "KDB_REASON" code (KDB_REASON_SYSTEM_NMI).  This
is purely cosmetic, to distinguish it from the various other reasons an
NMI may occur, which usually indicate an error.  Also, registers are not
dumped, to more closely match what is displayed when KDB is entered
manually via the sysrq 'g' key.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 include/linux/kdb.h             |    1 +
 include/linux/kgdb.h            |    1 +
 kernel/debug/debug_core.c       |    5 +++++
 kernel/debug/kdb/kdb_debugger.c |    5 ++++-
 kernel/debug/kdb/kdb_main.c     |    3 +++
 5 files changed, 14 insertions(+), 1 deletion(-)

--- linux.orig/include/linux/kdb.h
+++ linux/include/linux/kdb.h
@@ -109,6 +109,7 @@ typedef enum {
 	KDB_REASON_RECURSE,	/* Recursive entry to kdb;
 				 * regs probably valid */
 	KDB_REASON_SSTEP,	/* Single Step trap. - regs valid */
+	KDB_REASON_SYSTEM_NMI,	/* In NMI due to SYSTEM cmd; regs valid */
 } kdb_reason_t;
 
 extern int kdb_trap_printk;
--- linux.orig/include/linux/kgdb.h
+++ linux/include/linux/kgdb.h
@@ -52,6 +52,7 @@ extern int kgdb_connected;
 extern int kgdb_io_module_registered;
 
 extern atomic_t			kgdb_setting_breakpoint;
+extern atomic_t			kgdb_system_nmi;
 extern atomic_t			kgdb_cpu_doing_single_step;
 
 extern struct task_struct	*kgdb_usethread;
--- linux.orig/kernel/debug/debug_core.c
+++ linux/kernel/debug/debug_core.c
@@ -125,6 +125,7 @@ static atomic_t			masters_in_kgdb;
 static atomic_t			slaves_in_kgdb;
 static atomic_t			kgdb_break_tasklet_var;
 atomic_t			kgdb_setting_breakpoint;
+atomic_t			kgdb_system_nmi;
 
 struct task_struct		*kgdb_usethread;
 struct task_struct		*kgdb_contthread;
@@ -760,7 +761,11 @@ int kgdb_nmicallin(int cpu, int trapnr,
 
 		/* Indicate there are slaves waiting */
 		kgdb_info[cpu].send_ready = send_ready;
+
+		/* Use new reason code "SYSTEM_NMI" */
+		atomic_inc(&kgdb_system_nmi);
 		kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
+		atomic_dec(&kgdb_system_nmi);
 		kgdb_do_roundup = save_kgdb_do_roundup;
 		kgdb_info[cpu].send_ready = NULL;
 
--- linux.orig/kernel/debug/kdb/kdb_debugger.c
+++ linux/kernel/debug/kdb/kdb_debugger.c
@@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
 	if (atomic_read(&kgdb_setting_breakpoint))
 		reason = KDB_REASON_KEYBOARD;
 
-	if (in_nmi())
+	if (atomic_read(&kgdb_system_nmi))
+		reason = KDB_REASON_SYSTEM_NMI;
+
+	else if (in_nmi())
 		reason = KDB_REASON_NMI;
 
 	for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
--- linux.orig/kernel/debug/kdb/kdb_main.c
+++ linux/kernel/debug/kdb/kdb_main.c
@@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
 			   instruction_pointer(regs));
 		kdb_dumpregs(regs);
 		break;
+	case KDB_REASON_SYSTEM_NMI:
+		kdb_printf("due to System NonMaskable Interrupt\n");
+		break;
 	case KDB_REASON_NMI:
 		kdb_printf("due to NonMaskable Interrupt @ "
 			   kdb_machreg_fmt "\n",

-- 


* [PATCH 8/9] x86/UV: Add uvtrace support
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (6 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-05 22:50 ` [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler Mike Travis
  8 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-add-trace-support.patch --]
[-- Type: text/plain, Size: 2593 bytes --]

This patch adds support for the uvtrace module by providing a skeleton
call to the registered trace function.  It also provides a separate
'NMI' tracer that is triggered by the system-wide 'power nmi' command.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/include/asm/uv/uv.h  |    8 ++++++++
 arch/x86/platform/uv/uv_nmi.c |   13 ++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

--- linux.orig/arch/x86/include/asm/uv/uv.h
+++ linux/arch/x86/include/asm/uv/uv.h
@@ -14,6 +14,13 @@ extern void uv_cpu_init(void);
 extern void uv_nmi_init(void);
 extern void uv_register_nmi_notifier(void);
 extern void uv_system_init(void);
+extern void (*uv_trace_nmi_func)(unsigned int reason, struct pt_regs *regs);
+extern void (*uv_trace_func)(const char *f, const int l, const char *fmt, ...);
+#define uv_trace(fmt, ...)						\
+do {									\
+	if (unlikely(uv_trace_func))					\
+		(uv_trace_func)(__func__, __LINE__, fmt, ##__VA_ARGS__);\
+} while (0)
 extern const struct cpumask *uv_flush_tlb_others(const struct cpumask *cpumask,
 						 struct mm_struct *mm,
 						 unsigned long start,
@@ -26,6 +33,7 @@ static inline enum uv_system_type get_uv
 static inline int is_uv_system(void)	{ return 0; }
 static inline void uv_cpu_init(void)	{ }
 static inline void uv_system_init(void)	{ }
+static inline void uv_trace(void *fmt, ...)	{ }
 static inline void uv_register_nmi_notifier(void) { }
 static inline const struct cpumask *
 uv_flush_tlb_others(const struct cpumask *cpumask, struct mm_struct *mm,
--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -1,5 +1,5 @@
 /*
- * SGI NMI support routines
+ * SGI NMI/TRACE support routines
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -39,6 +39,13 @@
 #include <asm/uv/uv_hub.h>
 #include <asm/uv/uv_mmrs.h>
 
+void (*uv_trace_func)(const char *f, const int l, const char *fmt, ...);
+EXPORT_SYMBOL(uv_trace_func);
+
+void (*uv_trace_nmi_func)(unsigned int reason, struct pt_regs *regs);
+EXPORT_SYMBOL(uv_trace_nmi_func);
+
+
 /*
  * UV handler for NMI
  *
@@ -592,6 +599,10 @@ int uv_handle_nmi(unsigned int reason, s
 		return NMI_DONE;
 	}
 
+	/* Call possible NMI trace function */
+	if (unlikely(uv_trace_nmi_func))
+		(uv_trace_nmi_func)(reason, regs);
+
 	/* Indicate we are the first CPU into the NMI handler */
 	master = (atomic_read(&uv_nmi_cpu) == cpu);
 

-- 


* [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
                   ` (7 preceding siblings ...)
  2013-09-05 22:50 ` [PATCH 8/9] x86/UV: Add uvtrace support Mike Travis
@ 2013-09-05 22:50 ` Mike Travis
  2013-09-09 12:43   ` Peter Zijlstra
  8 siblings, 1 reply; 23+ messages in thread
From: Mike Travis @ 2013-09-05 22:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton
  Cc: Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

[-- Attachment #1: uv-add-nmi-disable.patch --]
[-- Type: text/plain, Size: 3757 bytes --]

For performance reasons, the NMI handler may be disabled to lessen the
performance impact caused by multiple perf tools running concurrently.
If the system nmi command is issued while the UV NMI handler is disabled,
the "Dazed and Confused" messages occur on all cpus.  The NMI handler is
disabled by setting the 'disabled' module parameter to '1'.  Setting it
back to '0' re-enables the NMI handler.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hberrich@sgi.com>
---
 arch/x86/platform/uv/uv_nmi.c |   69 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

--- linux.orig/arch/x86/platform/uv/uv_nmi.c
+++ linux/arch/x86/platform/uv/uv_nmi.c
@@ -73,6 +73,7 @@ static struct uv_hub_nmi_s **uv_hub_nmi_
 DEFINE_PER_CPU(struct uv_cpu_nmi_s, __uv_cpu_nmi);
 EXPORT_PER_CPU_SYMBOL_GPL(__uv_cpu_nmi);
 
+static int uv_nmi_registered;
 static unsigned long nmi_mmr;
 static unsigned long nmi_mmr_clear;
 static unsigned long nmi_mmr_pending;
@@ -130,6 +131,31 @@ module_param_named(ping_count, uv_nmi_pi
 static local64_t uv_nmi_ping_misses;
 module_param_named(ping_misses, uv_nmi_ping_misses, local64, 0644);
 
+static int uv_nmi_disabled;
+static int param_get_disabled(char *buffer, const struct kernel_param *kp)
+{
+	return sprintf(buffer, "%u\n", uv_nmi_disabled);
+}
+
+static void uv_nmi_notify_disabled(void);
+static int param_set_disabled(const char *val, const struct kernel_param *kp)
+{
+	int ret = param_set_bint(val, kp);
+
+	if (ret)
+		return ret;
+
+	uv_nmi_notify_disabled();
+	return 0;
+}
+
+static struct kernel_param_ops param_ops_disabled = {
+	.get = param_get_disabled,
+	.set = param_set_disabled,
+};
+#define param_check_disabled(name, p) __param_check(name, p, int)
+module_param_named(disabled, uv_nmi_disabled, disabled, 0644);
+
 /*
  * Following values allow tuning for large systems under heavy loading
  */
@@ -634,6 +660,8 @@ int uv_handle_nmi(unsigned int reason, s
 		atomic_set(&uv_nmi_cpus_in_nmi, -1);
 		atomic_set(&uv_nmi_cpu, -1);
 		atomic_set(&uv_in_nmi, 0);
+		if (uv_nmi_disabled)
+			uv_nmi_notify_disabled();
 	}
 
 	uv_nmi_touch_watchdogs();
@@ -664,11 +692,30 @@ int uv_handle_nmi_ping(unsigned int reas
 
 void uv_register_nmi_notifier(void)
 {
+	if (uv_nmi_registered || uv_nmi_disabled)
+		return;
+
 	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
 		pr_warn("UV: NMI handler failed to register\n");
 
 	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping"))
 		pr_warn("UV: PING NMI handler failed to register\n");
+
+	uv_nmi_registered = 1;
+	pr_info("UV: NMI handler registered\n");
+}
+
+static void uv_nmi_disabled_msg(void)
+{
+	pr_err("UV: NMI handler disabled, power nmi command will be ignored\n");
+}
+
+static void uv_unregister_nmi_notifier(void)
+{
+	unregister_nmi_handler(NMI_UNKNOWN, "uv");
+	unregister_nmi_handler(NMI_LOCAL, "uvping");
+	uv_nmi_registered = 0;
+	uv_nmi_disabled_msg();
 }
 
 void uv_nmi_init(void)
@@ -688,6 +735,11 @@ void uv_nmi_setup(void)
 	int size = sizeof(void *) * (1 << NODES_SHIFT);
 	int cpu, nid;
 
+	if (uv_nmi_disabled) {
+		uv_nmi_disabled_msg();
+		return;
+	}
+
 	/* Setup hub nmi info */
 	uv_nmi_setup_mmrs();
 	uv_hub_nmi_list = kzalloc(size, GFP_KERNEL);
@@ -709,4 +761,21 @@ void uv_nmi_setup(void)
 	BUG_ON(!uv_nmi_cpu_mask);
 }
 
+static void uv_nmi_notify_disabled(void)
+{
+	if (uv_nmi_disabled) {
+		/* if in nmi, handler will disable when finished */
+		if (atomic_read(&uv_in_nmi))
+			return;
 
+		if (uv_nmi_registered)
+			uv_unregister_nmi_notifier();
+
+	} else {
+		/* nmi control lists not yet allocated? */
+		if (!uv_hub_nmi_list)
+			uv_nmi_setup();
+
+		uv_register_nmi_notifier();
+	}
+}

-- 


* Re: [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB.
  2013-09-05 22:50 ` [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB Mike Travis
@ 2013-09-06  4:36   ` Jason Wessel
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Wessel @ 2013-09-06  4:36 UTC (permalink / raw)
  To: Mike Travis
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Thomas Gleixner,
	Andrew Morton, Dimitri Sivanich, Hedi Berriche, x86,
	linux-kernel

On 09/05/2013 05:50 PM, Mike Travis wrote:
> This patch adds a kgdb_nmicallin() interface that can be used by
> external NMI handlers to call the KGDB/KDB handler.  The primary need
> for this is for those types of NMI interrupts where all the CPUs
> have already received the NMI signal.  Therefore no send_IPI(NMI)
> is required, and in fact it will cause a 2nd unhandled NMI to occur.
> This generates the "Dazed and Confused" messages.
>
> Since all the CPUs are getting the NMI at roughly the same time, it's not
> guaranteed that the first CPU that hits the NMI handler will manage to
> enter KGDB and set the dbg_master_lock before the slaves start entering.

It should have been ok to have more than one master if this was some kind of watchdog.  The raw spin lock for the dbg_master_lock should have ensured that only a single CPU is in fact the master.  If it is the case that we cannot send a nested IPI at this point, the UV machine type should have replaced the kgdb_roundup_cpus() routine with something that will work, such as looking at the exception type on the way in and perhaps skipping the IPI send.

Also, if there is no possibility of restarting the machine from this state, it would have been possible to simply turn off kgdb_do_roundup in the custom kgdb_roundup_cpus().

The patch you created looks like it will work, but it comes at the cost of some complexity, because you are also checking the state of "kgdb_info[cpu].send_ready" in some other location in the NMI handler.  It might be better to consider not sending a nested NMI if all the CPUs are going to enter anyway in the master state.


>
> The new argument "send_ready" was added for KGDB to signal the NMI handler
> to release the slave CPUs for entry into KGDB.
>
> Signed-off-by: Mike Travis <travis@sgi.com>
> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
> Reviewed-by: Hedi Berriche <hberrich@sgi.com>
> ---
>  include/linux/kgdb.h      |    1 +
>  kernel/debug/debug_core.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  kernel/debug/debug_core.h |    1 +
>  3 files changed, 43 insertions(+)
>
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -310,6 +310,7 @@ extern int
>  kgdb_handle_exception(int ex_vector, int signo, int err_code,
>                struct pt_regs *regs);
>  extern int kgdb_nmicallback(int cpu, void *regs);
> +extern int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *snd_rdy);
>  extern void gdbstub_exit(int status);
> 
>  extern int            kgdb_single_step;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -578,6 +578,10 @@ return_normal:
>      /* Signal the other CPUs to enter kgdb_wait() */
>      if ((!kgdb_single_step) && kgdb_do_roundup)
>          kgdb_roundup_cpus(flags);
> +
> +    /* If optional send ready pointer, signal CPUs to proceed */
> +    if (kgdb_info[cpu].send_ready)
> +        atomic_set(kgdb_info[cpu].send_ready, 1);
>  #endif
> 
>      /*
> @@ -729,6 +733,43 @@ int kgdb_nmicallback(int cpu, void *regs
>          return 0;
>      }
>  #endif
> +    return 1;
> +}
> +
> +int kgdb_nmicallin(int cpu, int trapnr, void *regs, atomic_t *send_ready)
> +{
> +#ifdef CONFIG_SMP
> +    if (!kgdb_io_ready(0))
> +        return 1;
> +
> +    if (kgdb_info[cpu].enter_kgdb == 0) {
> +        struct kgdb_state kgdb_var;
> +        struct kgdb_state *ks = &kgdb_var;
> +        int save_kgdb_do_roundup = kgdb_do_roundup;
> +
> +        memset(ks, 0, sizeof(struct kgdb_state));
> +        ks->cpu            = cpu;
> +        ks->ex_vector        = trapnr;
> +        ks->signo        = SIGTRAP;
> +        ks->err_code        = 0;
> +        ks->kgdb_usethreadid    = 0;
> +        ks->linux_regs        = regs;
> +
> +        /* Do not broadcast NMI */
> +        kgdb_do_roundup = 0;
> +
> +        /* Indicate there are slaves waiting */
> +        kgdb_info[cpu].send_ready = send_ready;
> +        kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);

This is the one part of the patch I don't quite understand.  Why does kgdb_nmicallin() want to be the master core?

The circumstances under which this is called were not obvious.  Is it some kind of watchdog where you really do want to enter the debugger, or is it more to deal with nested slave interrupts where the roundup would have possibly hung on this hardware?  If it is the latter, I would have thought this should be a slave and not the master.

Perhaps a comment in the code can clear this up?

Thanks,
Jason.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB
  2013-09-05 22:50 ` [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB Mike Travis
@ 2013-09-06  5:00   ` Jason Wessel
  2013-09-06 16:48     ` Mike Travis
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Wessel @ 2013-09-06  5:00 UTC (permalink / raw)
  To: Mike Travis
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Thomas Gleixner,
	Andrew Morton, Dimitri Sivanich, Hedi Berriche, x86,
	linux-kernel

On 09/05/2013 05:50 PM, Mike Travis wrote:
> This patch adds a new "KDB_REASON" code (KDB_REASON_SYSTEM_NMI).  This
> is purely cosmetic to distinguish it from the other various reasons that
> NMI may occur and are usually after an error occurred.  Also the dumping
> of registers is not done to more closely match what is displayed when KDB
> is entered manually via the sysreq 'g' key.


This patch is not quite right.   See below.


> 
> Signed-off-by: Mike Travis <travis@sgi.com>
> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
> Reviewed-by: Hedi Berriche <hberrich@sgi.com>
> ---
>  include/linux/kdb.h             |    1 +
>  include/linux/kgdb.h            |    1 +
>  kernel/debug/debug_core.c       |    5 +++++
>  kernel/debug/kdb/kdb_debugger.c |    5 ++++-
>  kernel/debug/kdb/kdb_main.c     |    3 +++
>  5 files changed, 14 insertions(+), 1 deletion(-)
> 
> --- linux.orig/include/linux/kdb.h
> +++ linux/include/linux/kdb.h
> @@ -109,6 +109,7 @@ typedef enum {
>  	KDB_REASON_RECURSE,	/* Recursive entry to kdb;
>  				 * regs probably valid */
>  	KDB_REASON_SSTEP,	/* Single Step trap. - regs valid */
> +	KDB_REASON_SYSTEM_NMI,	/* In NMI due to SYSTEM cmd; regs valid */
>  } kdb_reason_t;
>  
>  extern int kdb_trap_printk;
> --- linux.orig/include/linux/kgdb.h
> +++ linux/include/linux/kgdb.h
> @@ -52,6 +52,7 @@ extern int kgdb_connected;
>  extern int kgdb_io_module_registered;
>  
>  extern atomic_t			kgdb_setting_breakpoint;
> +extern atomic_t			kgdb_system_nmi;


We don't need extra atomics.  You should add another variable to the kgdb_state which is processor specific in this case.

Better yet, just set the ks->err_code properly in your kgdb_nmicallin() or in the origination call to kgdb_nmicallback() from your nmi handler (remember I still have the question pending whether we actually need kgdb_nmicallin() in the first place).  You already did the work of adding another NMI type to the enum.  We just need to use the ks->err_code variable as well.


>  extern atomic_t			kgdb_cpu_doing_single_step;
>  
>  extern struct task_struct	*kgdb_usethread;
> --- linux.orig/kernel/debug/debug_core.c
> +++ linux/kernel/debug/debug_core.c
> @@ -125,6 +125,7 @@ static atomic_t			masters_in_kgdb;
>  static atomic_t			slaves_in_kgdb;
>  static atomic_t			kgdb_break_tasklet_var;
>  atomic_t			kgdb_setting_breakpoint;
> +atomic_t			kgdb_system_nmi;
>  
>  struct task_struct		*kgdb_usethread;
>  struct task_struct		*kgdb_contthread;
> @@ -760,7 +761,11 @@ int kgdb_nmicallin(int cpu, int trapnr,
>  
>  		/* Indicate there are slaves waiting */
>  		kgdb_info[cpu].send_ready = send_ready;
> +
> +		/* Use new reason code "SYSTEM_NMI" */
> +		atomic_inc(&kgdb_system_nmi);
>  		kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
> +		atomic_dec(&kgdb_system_nmi);
>  		kgdb_do_roundup = save_kgdb_do_roundup;
>  		kgdb_info[cpu].send_ready = NULL;
>  
> --- linux.orig/kernel/debug/kdb/kdb_debugger.c
> +++ linux/kernel/debug/kdb/kdb_debugger.c
> @@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
>  	if (atomic_read(&kgdb_setting_breakpoint))
>  		reason = KDB_REASON_KEYBOARD;
>  
> -	if (in_nmi())
> +	if (atomic_read(&kgdb_system_nmi))
> +		reason = KDB_REASON_SYSTEM_NMI;


This would get changed to if (ks->err == KDB_REASON_SYSNMI && ks->signo == SIGTRAP) ....

Cheers,
Jason.

> +
> +	else if (in_nmi())
>  		reason = KDB_REASON_NMI;
>  
>  	for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
> --- linux.orig/kernel/debug/kdb/kdb_main.c
> +++ linux/kernel/debug/kdb/kdb_main.c
> @@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
>  			   instruction_pointer(regs));
>  		kdb_dumpregs(regs);
>  		break;
> +	case KDB_REASON_SYSTEM_NMI:
> +		kdb_printf("due to System NonMaskable Interrupt\n");
> +		break;
>  	case KDB_REASON_NMI:
>  		kdb_printf("due to NonMaskable Interrupt @ "
>  			   kdb_machreg_fmt "\n",
> 
> -- 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB
  2013-09-06  5:00   ` Jason Wessel
@ 2013-09-06 16:48     ` Mike Travis
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-06 16:48 UTC (permalink / raw)
  To: Jason Wessel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Thomas Gleixner,
	Andrew Morton, Dimitri Sivanich, Hedi Berriche, x86,
	linux-kernel



On 9/5/2013 10:00 PM, Jason Wessel wrote:
> On 09/05/2013 05:50 PM, Mike Travis wrote:
>> This patch adds a new "KDB_REASON" code (KDB_REASON_SYSTEM_NMI).  This
>> is purely cosmetic to distinguish it from the other various reasons that
>> NMI may occur and are usually after an error occurred.  Also the dumping
>> of registers is not done to more closely match what is displayed when KDB
>> is entered manually via the sysreq 'g' key.
> 
> 
> This patch is not quite right.   See below.
> 
> 
>>
>> Signed-off-by: Mike Travis <travis@sgi.com>
>> Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
>> Reviewed-by: Hedi Berriche <hberrich@sgi.com>
>> ---
>>  include/linux/kdb.h             |    1 +
>>  include/linux/kgdb.h            |    1 +
>>  kernel/debug/debug_core.c       |    5 +++++
>>  kernel/debug/kdb/kdb_debugger.c |    5 ++++-
>>  kernel/debug/kdb/kdb_main.c     |    3 +++
>>  5 files changed, 14 insertions(+), 1 deletion(-)
>>
>> --- linux.orig/include/linux/kdb.h
>> +++ linux/include/linux/kdb.h
>> @@ -109,6 +109,7 @@ typedef enum {
>>  	KDB_REASON_RECURSE,	/* Recursive entry to kdb;
>>  				 * regs probably valid */
>>  	KDB_REASON_SSTEP,	/* Single Step trap. - regs valid */
>> +	KDB_REASON_SYSTEM_NMI,	/* In NMI due to SYSTEM cmd; regs valid */
>>  } kdb_reason_t;
>>  
>>  extern int kdb_trap_printk;
>> --- linux.orig/include/linux/kgdb.h
>> +++ linux/include/linux/kgdb.h
>> @@ -52,6 +52,7 @@ extern int kgdb_connected;
>>  extern int kgdb_io_module_registered;
>>  
>>  extern atomic_t			kgdb_setting_breakpoint;
>> +extern atomic_t			kgdb_system_nmi;
> 
> 
> We don't need extra atomics.  You should add another variable to the
> kgdb_state which is processor specific in this case.
> 
> Better yet, just set the ks->err_code properly in your
> kgdb_nmicallin() or in the origination call to kgdb_nmicallback() from
> your nmi handler (remember I still have the question pending whether we
> actually need kgdb_nmicallin() in the first place).  You already did the
> work of adding another NMI type to the enum.  We just need to use the
> ks->err_code variable as well.

Good idea, I hadn't thought of using that field.  In fact, it
simplified the patch enough that I just folded into the other.

I'll address your other question separately.

Thanks!
Mike
> 
> 
>>  extern atomic_t			kgdb_cpu_doing_single_step;
>>  
>>  extern struct task_struct	*kgdb_usethread;
>> --- linux.orig/kernel/debug/debug_core.c
>> +++ linux/kernel/debug/debug_core.c
>> @@ -125,6 +125,7 @@ static atomic_t			masters_in_kgdb;
>>  static atomic_t			slaves_in_kgdb;
>>  static atomic_t			kgdb_break_tasklet_var;
>>  atomic_t			kgdb_setting_breakpoint;
>> +atomic_t			kgdb_system_nmi;
>>  
>>  struct task_struct		*kgdb_usethread;
>>  struct task_struct		*kgdb_contthread;
>> @@ -760,7 +761,11 @@ int kgdb_nmicallin(int cpu, int trapnr,
>>  
>>  		/* Indicate there are slaves waiting */
>>  		kgdb_info[cpu].send_ready = send_ready;
>> +
>> +		/* Use new reason code "SYSTEM_NMI" */
>> +		atomic_inc(&kgdb_system_nmi);
>>  		kgdb_cpu_enter(ks, regs, DCPU_WANT_MASTER);
>> +		atomic_dec(&kgdb_system_nmi);
>>  		kgdb_do_roundup = save_kgdb_do_roundup;
>>  		kgdb_info[cpu].send_ready = NULL;
>>  
>> --- linux.orig/kernel/debug/kdb/kdb_debugger.c
>> +++ linux/kernel/debug/kdb/kdb_debugger.c
>> @@ -69,7 +69,10 @@ int kdb_stub(struct kgdb_state *ks)
>>  	if (atomic_read(&kgdb_setting_breakpoint))
>>  		reason = KDB_REASON_KEYBOARD;
>>  
>> -	if (in_nmi())
>> +	if (atomic_read(&kgdb_system_nmi))
>> +		reason = KDB_REASON_SYSTEM_NMI;
> 
> 
> This would get changed to if (ks->err == KDB_REASON_SYSNMI &&
> ks->signo == SIGTRAP) ....
> 
> Cheers,
> Jason.
> 
>> +
>> +	else if (in_nmi())
>>  		reason = KDB_REASON_NMI;
>>  
>>  	for (i = 0, bp = kdb_breakpoints; i < KDB_MAXBPT; i++, bp++) {
>> --- linux.orig/kernel/debug/kdb/kdb_main.c
>> +++ linux/kernel/debug/kdb/kdb_main.c
>> @@ -1200,6 +1200,9 @@ static int kdb_local(kdb_reason_t reason
>>  			   instruction_pointer(regs));
>>  		kdb_dumpregs(regs);
>>  		break;
>> +	case KDB_REASON_SYSTEM_NMI:
>> +		kdb_printf("due to System NonMaskable Interrupt\n");
>> +		break;
>>  	case KDB_REASON_NMI:
>>  		kdb_printf("due to NonMaskable Interrupt @ "
>>  			   kdb_machreg_fmt "\n",
>>
>> -- 
>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-05 22:50 ` [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler Mike Travis
@ 2013-09-09 12:43   ` Peter Zijlstra
  2013-09-09 17:07     ` Mike Travis
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2013-09-09 12:43 UTC (permalink / raw)
  To: Mike Travis
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jason Wessel, H. Peter Anvin, Thomas Gleixner, Andrew Morton,
	Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> For performance reasons, the NMI handler may be disabled to lessen the
> performance impact caused by the multiple perf tools running concurently.
> If the system nmi command is issued when the UV NMI handler is disabled,
> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> '0' will re-enable the NMI handler.

I'm not entirely sure why this is still needed now that you've moved all
really expensive bits into the UNKNOWN handler.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-09 12:43   ` Peter Zijlstra
@ 2013-09-09 17:07     ` Mike Travis
  2013-09-10  9:03       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Travis @ 2013-09-09 17:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jason Wessel, H. Peter Anvin, Thomas Gleixner, Andrew Morton,
	Dimitri Sivanich, Hedi Berriche, x86, linux-kernel



On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
>> For performance reasons, the NMI handler may be disabled to lessen the
>> performance impact caused by the multiple perf tools running concurently.
>> If the system nmi command is issued when the UV NMI handler is disabled,
>> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
>> disabled by setting the nmi disabled variable to '1'.  Setting it back to
>> '0' will re-enable the NMI handler.
> 
> I'm not entirely sure why this is still needed now that you've moved all
> really expensive bits into the UNKNOWN handler.
> 

Yes, it could be considered optional.  My primary use was to isolate
new bugs I found to see if my NMI changes were causing them.  But it
appears that they are not since the problems occur with or without
using the NMI entry into KDB.  So it can be safely removed.

(The basic problem is that if you hang out in KDB too long the machine
locks up.  Another problem: the rcu stall detector does not have a
means to be "touched" like the nmi_watchdog_timer, so it fires off
anywhere from a few to many, many messages.  Also, any network
connections will time out if you are in KDB for more than, say, 20
or 30 seconds.)

One other problem is with the perf tool.  It seems running more than
about 2 or 3 perf top instances on a medium (1k cpu threads) sized
system, they start behaving badly with a bunch of NMI stackdumps
appearing on the console.  Eventually the system becomes unusable.

On a large system (4k), the perf tools get an error message (sorry
don't have it handy at the moment) the basically implies that the
perf config option is not set.  Again, I wanted to remove the new
NMI handler to insure that it wasn't doing something weird, and
it wasn't.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-09 17:07     ` Mike Travis
@ 2013-09-10  9:03       ` Peter Zijlstra
  2013-09-12 17:27         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2013-09-10  9:03 UTC (permalink / raw)
  To: Mike Travis
  Cc: Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo,
	Jason Wessel, H. Peter Anvin, Thomas Gleixner, Andrew Morton,
	Dimitri Sivanich, Hedi Berriche, x86, linux-kernel

On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
> 
> 
> On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> > On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> >> For performance reasons, the NMI handler may be disabled to lessen the
> >> performance impact caused by the multiple perf tools running concurently.
> >> If the system nmi command is issued when the UV NMI handler is disabled,
> >> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> >> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> >> '0' will re-enable the NMI handler.
> > 
> > I'm not entirely sure why this is still needed now that you've moved all
> > really expensive bits into the UNKNOWN handler.
> > 
> 
> Yes, it could be considered optional.  My primary use was to isolate
> new bugs I found to see if my NMI changes were causing them.  But it
> appears that they are not since the problems occur with or without
> using the NMI entry into KDB.  So it can be safely removed.

OK, as a debug option it might make sense, but removing it is (of course)
fine with me ;-)

> (The basic problem is that if you hang out in KDB too long the machine
> locks up.  

Yeah, known issue. Not much you can do about it either I suspect. The
system generally isn't built for things like that.

> Other problems like the rcu stall detector does not have a
> means to be "touched" like the nmi_watchdog_timer so it fires off a
> few to many, many messages.  

That however might be easily cured if you ask Paul nicely ;-)

> Another, any network connections will time
> out if you are in KDB more than say 20 or 30 seconds.)
> 
> One other problem is with the perf tool.  It seems running more than
> about 2 or 3 perf top instances on a medium (1k cpu threads) sized
> system, they start behaving badly with a bunch of NMI stackdumps
> appearing on the console.  Eventually the system become unusable.

Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/

> On a large system (4k), the perf tools get an error message (sorry
> don't have it handy at the moment) the basically implies that the
> perf config option is not set.  Again, I wanted to remove the new
> NMI handler to insure that it wasn't doing something weird, and
> it wasn't.

Cute.. 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-10  9:03       ` Peter Zijlstra
@ 2013-09-12 17:27         ` Paul E. McKenney
  2013-09-12 18:35           ` Paul E. McKenney
  2013-09-12 18:59           ` Mike Travis
  0 siblings, 2 replies; 23+ messages in thread
From: Paul E. McKenney @ 2013-09-12 17:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Travis, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, Hedi Berriche,
	x86, linux-kernel

On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
> On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
> > On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> > > On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> > >> For performance reasons, the NMI handler may be disabled to lessen the
> > >> performance impact caused by the multiple perf tools running concurently.
> > >> If the system nmi command is issued when the UV NMI handler is disabled,
> > >> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> > >> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> > >> '0' will re-enable the NMI handler.
> > > 
> > > I'm not entirely sure why this is still needed now that you've moved all
> > > really expensive bits into the UNKNOWN handler.
> > > 
> > 
> > Yes, it could be considered optional.  My primary use was to isolate
> > new bugs I found to see if my NMI changes were causing them.  But it
> > appears that they are not since the problems occur with or without
> > using the NMI entry into KDB.  So it can be safely removed.
> 
> OK, as a debug option it might make sense, but removing it is (of course)
> fine with me ;-)
> 
> > (The basic problem is that if you hang out in KDB too long the machine
> > locks up.  
> 
> Yeah, known issue. Not much you can do about it either I suspect. The
> system generally isn't build for things like that.
> 
> > Other problems like the rcu stall detector does not have a
> > means to be "touched" like the nmi_watchdog_timer so it fires off a
> > few to many, many messages.  
> 
> That however might be easily cured if you ask Paul nicely ;-)

RCU's grace-period mechanism is supposed to be what touches it.  ;-)

But what is it that you are looking for?  If you want to silence it
completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
you want to use.

> > Another, any network connections will time
> > out if you are in KDB more than say 20 or 30 seconds.)

Ah, you are looking for RCU to refrain from complaining about grace
periods that have been delayed by breakpoints in the kernel?  Is there
some way that RCU can learn that a breakpoint has happened?  If so,
this should not be hard.

If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
earlier.

> > One other problem is with the perf tool.  It seems running more than
> > about 2 or 3 perf top instances on a medium (1k cpu threads) sized
> > system, they start behaving badly with a bunch of NMI stackdumps
> > appearing on the console.  Eventually the system become unusable.
> 
> Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/

Indeed, with that definition of "medium", large must be truly impressive!

							Thanx, Paul

> > On a large system (4k), the perf tools get an error message (sorry
> > don't have it handy at the moment) the basically implies that the
> > perf config option is not set.  Again, I wanted to remove the new
> > NMI handler to insure that it wasn't doing something weird, and
> > it wasn't.
> 
> Cute.. 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 17:27         ` Paul E. McKenney
@ 2013-09-12 18:35           ` Paul E. McKenney
  2013-09-12 19:08             ` Mike Travis
  2013-09-12 18:59           ` Mike Travis
  1 sibling, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2013-09-12 18:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Travis, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, Hedi Berriche,
	x86, linux-kernel

On Thu, Sep 12, 2013 at 10:27:31AM -0700, Paul E. McKenney wrote:
> On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
> > > On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> > > > On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> > > >> For performance reasons, the NMI handler may be disabled to lessen the
> > > >> performance impact caused by the multiple perf tools running concurently.
> > > >> If the system nmi command is issued when the UV NMI handler is disabled,
> > > >> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> > > >> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> > > >> '0' will re-enable the NMI handler.
> > > > 
> > > > I'm not entirely sure why this is still needed now that you've moved all
> > > > really expensive bits into the UNKNOWN handler.
> > > > 
> > > 
> > > Yes, it could be considered optional.  My primary use was to isolate
> > > new bugs I found to see if my NMI changes were causing them.  But it
> > > appears that they are not since the problems occur with or without
> > > using the NMI entry into KDB.  So it can be safely removed.
> > 
> > OK, as a debug option it might make sense, but removing it is (of course)
> > fine with me ;-)
> > 
> > > (The basic problem is that if you hang out in KDB too long the machine
> > > locks up.  
> > 
> > Yeah, known issue. Not much you can do about it either I suspect. The
> > system generally isn't build for things like that.
> > 
> > > Other problems like the rcu stall detector does not have a
> > > means to be "touched" like the nmi_watchdog_timer so it fires off a
> > > few to many, many messages.  
> > 
> > That however might be easily cured if you ask Paul nicely ;-)
> 
> RCU's grace-period mechanism is supposed to be what touches it.  ;-)
> 
> But what is it that you are looking for?  If you want to silence it
> completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
> you want to use.
> 
> > > Another, any network connections will time
> > > out if you are in KDB more than say 20 or 30 seconds.)
> 
> Ah, you are looking for RCU to refrain from complaining about grace
> periods that have been delayed by breakpoints in the kernel?  Is there
> some way that RCU can learn that a breakpoint has happened?  If so,
> this should not be hard.

But wait...  RCU relies on the jiffies counter for RCU CPU stall warnings.
Doesn't the jiffies counter stop during breakpoints?

							Thanx, Paul

> If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
> earlier.
> 
> > > One other problem is with the perf tool.  It seems running more than
> > > about 2 or 3 perf top instances on a medium (1k cpu threads) sized
> > > system, they start behaving badly with a bunch of NMI stackdumps
> > > appearing on the console.  Eventually the system become unusable.
> > 
> > Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/
> 
> Indeed, with that definition of "medium", large must be truly impressive!
> 
> 							Thanx, Paul
> 
> > > On a large system (4k), the perf tools get an error message (sorry
> > > don't have it handy at the moment) the basically implies that the
> > > perf config option is not set.  Again, I wanted to remove the new
> > > NMI handler to insure that it wasn't doing something weird, and
> > > it wasn't.
> > 
> > Cute.. 
> > --
> > 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 17:27         ` Paul E. McKenney
  2013-09-12 18:35           ` Paul E. McKenney
@ 2013-09-12 18:59           ` Mike Travis
  2013-09-12 19:48             ` Hedi Berriche
  2013-09-12 20:16             ` Paul E. McKenney
  1 sibling, 2 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-12 18:59 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, Hedi Berriche,
	x86, linux-kernel



On 9/12/2013 10:27 AM, Paul E. McKenney wrote:
> On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
>> On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
>>> On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
>>>> On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
>>>>> For performance reasons, the NMI handler may be disabled to lessen the
>>>>> performance impact caused by the multiple perf tools running concurently.
>>>>> If the system nmi command is issued when the UV NMI handler is disabled,
>>>>> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
>>>>> disabled by setting the nmi disabled variable to '1'.  Setting it back to
>>>>> '0' will re-enable the NMI handler.
>>>>
>>>> I'm not entirely sure why this is still needed now that you've moved all
>>>> really expensive bits into the UNKNOWN handler.
>>>>
>>>
>>> Yes, it could be considered optional.  My primary use was to isolate
>>> new bugs I found to see if my NMI changes were causing them.  But it
>>> appears that they are not since the problems occur with or without
>>> using the NMI entry into KDB.  So it can be safely removed.
>>
>> OK, as a debug option it might make sense, but removing it is (of course)
>> fine with me ;-)
>>
>>> (The basic problem is that if you hang out in KDB too long the machine
>>> locks up.  
>>
>> Yeah, known issue. Not much you can do about it either I suspect. The
>> system generally isn't build for things like that.
>>
>>> Other problems like the rcu stall detector does not have a
>>> means to be "touched" like the nmi_watchdog_timer so it fires off a
>>> few to many, many messages.  
>>
>> That however might be easily cured if you ask Paul nicely ;-)
> 
> RCU's grace-period mechanism is supposed to be what touches it.  ;-)
> 
> But what is it that you are looking for?  If you want to silence it
> completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
> you want to use.

We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel
cmdline.  I'll double check if it was set during my testing.

> 
>>> Another, any network connections will time
>>> out if you are in KDB more than say 20 or 30 seconds.)
> 
> Ah, you are looking for RCU to refrain from complaining about grace
> periods that have been delayed by breakpoints in the kernel?  Is there
> some way that RCU can learn that a breakpoint has happened?  If so,
> this should not be hard.

Yes, exactly.  After a UV NMI event which might or might not call KDB,
but definitely can consume some time with the system stopped, I have
these notifications:

static void uv_nmi_touch_watchdogs(void)
{
        touch_softlockup_watchdog_sync();
        clocksource_touch_watchdog();
        rcu_cpu_stall_reset();
        touch_nmi_watchdog();
}


In all the cases I checked, I had all the cpus in the NMI event so
I don't think it was a straggler who triggered the problem.  One
question though, the above is called by all cpus exiting the NMI
event.  Should I limit that to only one cpu?

Note btw, that this also happens when KGDB/KDB is entered via the
sysrq-trigger 'g' event.

Perhaps there is some other timer that is going off?

> If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
> earlier.
> 
>>> One other problem is with the perf tool.  It seems that when more than
>>> about 2 or 3 perf top instances run on a medium (1k cpu threads) sized
>>> system, they start behaving badly, with a bunch of NMI stackdumps
>>> appearing on the console.  Eventually the system becomes unusable.
>>
>> Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/
> 
> Indeed, with that definition of "medium", large must be truly impressive!

I say medium because it's only one rack w/~4TB of memory (and quite
popular).  Large would be 4k cpus/64TB.  Not sure yet what is "huge",
at least in terms of an SSI system.

> 
> 							Thanx, Paul
> 
>>> On a large system (4k), the perf tools get an error message (sorry
>>> don't have it handy at the moment) that basically implies that the
>>> perf config option is not set.  Again, I wanted to remove the new
>>> NMI handler to ensure that it wasn't doing something weird, and
>>> it wasn't.
>>
>> Cute.. 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 18:35           ` Paul E. McKenney
@ 2013-09-12 19:08             ` Mike Travis
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Travis @ 2013-09-12 19:08 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, Hedi Berriche,
	x86, linux-kernel



On 9/12/2013 11:35 AM, Paul E. McKenney wrote:
> On Thu, Sep 12, 2013 at 10:27:31AM -0700, Paul E. McKenney wrote:
>> On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
>>> On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
>>>> On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
>>>>> On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
>>>>>> For performance reasons, the NMI handler may be disabled to lessen the
>>>>>> performance impact caused by the multiple perf tools running concurrently.
>>>>>> If the system nmi command is issued when the UV NMI handler is disabled,
>>>>>> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
>>>>>> disabled by setting the nmi disabled variable to '1'.  Setting it back to
>>>>>> '0' will re-enable the NMI handler.
>>>>>
>>>>> I'm not entirely sure why this is still needed now that you've moved all
>>>>> really expensive bits into the UNKNOWN handler.
>>>>>
>>>>
>>>> Yes, it could be considered optional.  My primary use was to isolate
>>>> new bugs I found to see if my NMI changes were causing them.  But it
>>>> appears that they are not since the problems occur with or without
>>>> using the NMI entry into KDB.  So it can be safely removed.
>>>
>>> OK, as a debug option it might make sense, but removing it is (of course)
>>> fine with me ;-)
>>>
>>>> (The basic problem is that if you hang out in KDB too long the machine
>>>> locks up.  
>>>
>>> Yeah, known issue. Not much you can do about it either I suspect. The
>>> system generally isn't built for things like that.
>>>
>>>> Other problems: the rcu stall detector does not have a
>>>> means to be "touched" like the nmi_watchdog_timer, so it fires off
>>>> anywhere from a few to many, many messages.  
>>>
>>> That however might be easily cured if you ask Paul nicely ;-)
>>
>> RCU's grace-period mechanism is supposed to be what touches it.  ;-)
>>
>> But what is it that you are looking for?  If you want to silence it
>> completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
>> you want to use.
>>
>>>> Another: any network connections will time
>>>> out if you are in KDB for more than, say, 20 or 30 seconds.)
>>
>> Ah, you are looking for RCU to refrain from complaining about grace
>> periods that have been delayed by breakpoints in the kernel?  Is there
>> some way that RCU can learn that a breakpoint has happened?  If so,
>> this should not be hard.
> 
> But wait...  RCU relies on the jiffies counter for RCU CPU stall warnings.
> Doesn't the jiffies counter stop during breakpoints?
> 
> 							Thanx, Paul

All cpus entering the UV NMI event use local_irq_save (as does the
entry into KGDB/KDB).  So the question becomes: what happens
after all the cpus do the local_irq_restore?  The hardware clocks
are of course still running.

> 
>> If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
>> earlier.
>>
>>>> One other problem is with the perf tool.  It seems that when more than
>>>> about 2 or 3 perf top instances run on a medium (1k cpu threads) sized
>>>> system, they start behaving badly, with a bunch of NMI stackdumps
>>>> appearing on the console.  Eventually the system becomes unusable.
>>>
>>> Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/
>>
>> Indeed, with that definition of "medium", large must be truly impressive!
>>
>> 							Thanx, Paul
>>
>>>> On a large system (4k), the perf tools get an error message (sorry
>>>> don't have it handy at the moment) that basically implies that the
>>>> perf config option is not set.  Again, I wanted to remove the new
>>>> NMI handler to ensure that it wasn't doing something weird, and
>>>> it wasn't.
>>>
>>> Cute.. 
>>>
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 18:59           ` Mike Travis
@ 2013-09-12 19:48             ` Hedi Berriche
  2013-09-12 20:17               ` Paul E. McKenney
  2013-09-12 20:16             ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Hedi Berriche @ 2013-09-12 19:48 UTC (permalink / raw)
  To: Mike Travis
  Cc: paulmck, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, x86,
	linux-kernel

On Thu, Sep 12, 2013 at 19:59 Mike Travis wrote:
| On 9/12/2013 10:27 AM, Paul E. McKenney wrote:
|
| > But what is it that you are looking for?  If you want to silence it
| > completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
| > you want to use.
| 
| We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel
| cmdline.  I'll double check if it was set during my testing.

FWIW, for recent enough kernels the correct boot parameter is
rcupdate.rcu_cpu_stall_suppress.

It used to be rcutree.rcu_cpu_stall_suppress, but that has changed after
commit 6bfc09e.
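
For reference, the two spellings on the kernel command line (the first
for kernels before that commit, the second for later ones):

```shell
# Before commit 6bfc09e (stall-warning code lived in kernel/rcutree.c):
rcutree.rcu_cpu_stall_suppress=1

# After commit 6bfc09e (stall warnings moved to the rcupdate module):
rcupdate.rcu_cpu_stall_suppress=1
```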

Cheers,
Hedi.
-- 
Be careful of reading health books, you might die of a misprint.
	-- Mark Twain

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 18:59           ` Mike Travis
  2013-09-12 19:48             ` Hedi Berriche
@ 2013-09-12 20:16             ` Paul E. McKenney
  1 sibling, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2013-09-12 20:16 UTC (permalink / raw)
  To: Mike Travis
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, Hedi Berriche,
	x86, linux-kernel

On Thu, Sep 12, 2013 at 11:59:36AM -0700, Mike Travis wrote:
> On 9/12/2013 10:27 AM, Paul E. McKenney wrote:
> > On Tue, Sep 10, 2013 at 11:03:49AM +0200, Peter Zijlstra wrote:
> >> On Mon, Sep 09, 2013 at 10:07:03AM -0700, Mike Travis wrote:
> >>> On 9/9/2013 5:43 AM, Peter Zijlstra wrote:
> >>>> On Thu, Sep 05, 2013 at 05:50:41PM -0500, Mike Travis wrote:
> >>>>> For performance reasons, the NMI handler may be disabled to lessen the
> >>>>> performance impact caused by the multiple perf tools running concurrently.
> >>>>> If the system nmi command is issued when the UV NMI handler is disabled,
> >>>>> the "Dazed and Confused" messages occur for all cpus.  The NMI handler is
> >>>>> disabled by setting the nmi disabled variable to '1'.  Setting it back to
> >>>>> '0' will re-enable the NMI handler.
> >>>>
> >>>> I'm not entirely sure why this is still needed now that you've moved all
> >>>> really expensive bits into the UNKNOWN handler.
> >>>>
> >>>
> >>> Yes, it could be considered optional.  My primary use was to isolate
> >>> new bugs I found to see if my NMI changes were causing them.  But it
> >>> appears that they are not since the problems occur with or without
> >>> using the NMI entry into KDB.  So it can be safely removed.
> >>
> >> OK, as a debug option it might make sense, but removing it is (of course)
> >> fine with me ;-)
> >>
> >>> (The basic problem is that if you hang out in KDB too long the machine
> >>> locks up.  
> >>
> >> Yeah, known issue. Not much you can do about it either I suspect. The
> >> system generally isn't built for things like that.
> >>
> >>> Other problems: the rcu stall detector does not have a
> >>> means to be "touched" like the nmi_watchdog_timer, so it fires off
> >>> anywhere from a few to many, many messages.  
> >>
> >> That however might be easily cured if you ask Paul nicely ;-)
> > 
> > RCU's grace-period mechanism is supposed to be what touches it.  ;-)
> > 
> > But what is it that you are looking for?  If you want to silence it
> > completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
> > you want to use.
> 
> We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel
> cmdline.  I'll double check if it was set during my testing.
> 
> > 
> >>> Another: any network connections will time
> >>> out if you are in KDB for more than, say, 20 or 30 seconds.)
> > 
> > Ah, you are looking for RCU to refrain from complaining about grace
> > periods that have been delayed by breakpoints in the kernel?  Is there
> > some way that RCU can learn that a breakpoint has happened?  If so,
> > this should not be hard.
> 
> Yes, exactly.  After a UV NMI event which might or might not call KDB,
> but definitely can consume some time with the system stopped, I have
> these notifications:
> 
> static void uv_nmi_touch_watchdogs(void)
> {
>         touch_softlockup_watchdog_sync();
>         clocksource_touch_watchdog();
>         rcu_cpu_stall_reset();

This function effectively disables RCU CPU stall warnings for the current
set of grace periods.  Or is supposed to do so, anyway.  I won't guarantee
that this avoids false positives in the face of all possible races
between grace-period initialization, calls to rcu_cpu_stall_reset(),
and stall warnings.

So how often are you seeing RCU CPU stall warnings?

>         touch_nmi_watchdog();
> }
> 
> 
> In all the cases I checked, I had all the cpus in the NMI event so
> I don't think it was a straggler who triggered the problem.  One
> question though, the above is called by all cpus exiting the NMI
> event.  Should I limit that to only one cpu?

You should only need to invoke rcu_cpu_stall_reset() from a single CPU.
That said, I would not expect problems from concurrent invocations,
unless your compiler stores to a long with a pair of smaller stores
or something.
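
As an illustration (not the actual UV code), the "only one CPU does the
touch" idea can be sketched with a shared flag.  This is a userspace
sketch using C11 atomics, with the watchdog-touch helpers stubbed out;
in the kernel it would be an atomic_cmpxchg() on a per-event flag that
is cleared when the NMI event completes:

```c
#include <stdatomic.h>

/* Flag shared by all CPUs exiting the NMI event; whoever sets up the
 * next NMI round would clear it again (not shown here). */
static atomic_int uv_nmi_touched;

static int touch_count;              /* stub: counts actual touch rounds */

static void do_touch_watchdogs(void) /* stand-in for the touch_*() calls */
{
        touch_count++;
}

/* Every CPU calls this on the way out of the NMI; only the first one
 * to win the compare-and-swap performs the one-time watchdog touches. */
static void uv_nmi_touch_watchdogs_once(void)
{
        int expected = 0;

        if (atomic_compare_exchange_strong(&uv_nmi_touched, &expected, 1))
                do_touch_watchdogs();
}
```

All the other CPUs fall straight through, so the touches happen exactly
once per NMI event regardless of how many CPUs exit it.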

> Note btw, that this also happens when KGDB/KDB is entered via the
> sysrq-trigger 'g' event.
> 
> Perhaps there is some other timer that is going off?

Is uv_nmi_touch_watchdogs() invoked on the way in to the breakpoint?
On the way out?  Both?  Either way, what software environment does it
run in?  The only environment completely safe against races on the way
in would be stop_machine() -- otherwise, a grace period might start just
after uv_nmi_touch_watchdogs() returned, which would cause a normal RCU
CPU stall timeout to be in effect.

> > If not, I must fall back on the rcu_cpu_stall_suppress that I mentioned
> > earlier.
> > 
> >>> One other problem is with the perf tool.  It seems that when more than
> >>> about 2 or 3 perf top instances run on a medium (1k cpu threads) sized
> >>> system, they start behaving badly, with a bunch of NMI stackdumps
> >>> appearing on the console.  Eventually the system becomes unusable.
> >>
> >> Yuck.. I haven't seen anything like that on the 'tiny' systems I have :/
> > 
> > Indeed, with that definition of "medium", large must be truly impressive!
> 
> I say medium because it's only one rack w/~4TB of memory (and quite
> popular).  Large would be 4k cpus/64TB.  Not sure yet what is "huge",
> at least in terms of an SSI system.

Well, when I tell people that someone reported a bug running on a
4K-CPU system, they look at me funny.  ;-)

							Thanx, Paul

> >>> On a large system (4k), the perf tools get an error message (sorry
> >>> don't have it handy at the moment) that basically implies that the
> >>> perf config option is not set.  Again, I wanted to remove the new
> >>> NMI handler to ensure that it wasn't doing something weird, and
> >>> it wasn't.
> >>
> >> Cute.. 
> >>
> > 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler
  2013-09-12 19:48             ` Hedi Berriche
@ 2013-09-12 20:17               ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2013-09-12 20:17 UTC (permalink / raw)
  To: Mike Travis, Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Jason Wessel, H. Peter Anvin,
	Thomas Gleixner, Andrew Morton, Dimitri Sivanich, x86,
	linux-kernel

On Thu, Sep 12, 2013 at 08:48:33PM +0100, Hedi Berriche wrote:
> On Thu, Sep 12, 2013 at 19:59 Mike Travis wrote:
> | On 9/12/2013 10:27 AM, Paul E. McKenney wrote:
> |
> | > But what is it that you are looking for?  If you want to silence it
> | > completely, the rcu_cpu_stall_suppress boot/sysfs parameter is what
> | > you want to use.
> | 
> | We have by default rcutree.rcu_cpu_stall_suppress=1 on the kernel
> | cmdline.  I'll double check if it was set during my testing.
> 
> FWIW, for recent enough kernels the correct boot parameter is
> rcupdate.rcu_cpu_stall_suppress.
> 
> It used to be rcutree.rcu_cpu_stall_suppress, but that has changed after
> commit 6bfc09e.

Good point, Hedi!  That change happened when rcutiny gained RCU CPU
stall warning capability.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2013-09-12 20:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-05 22:50 [PATCH 0/9] x86/UV/KDB/NMI: Updates for NMI/KDB handler for SGI UV Mike Travis
2013-09-05 22:50 ` [PATCH 1/9] x86/UV: Move NMI support Mike Travis
2013-09-05 22:50 ` [PATCH 2/9] x86/UV: Update UV support for external NMI signals Mike Travis
2013-09-05 22:50 ` [PATCH 3/9] x86/UV: Add summary of cpu activity to UV NMI handler Mike Travis
2013-09-05 22:50 ` [PATCH 4/9] x86/UV: Add kdump " Mike Travis
2013-09-05 22:50 ` [PATCH 5/9] KGDB/KDB: add support for external NMI handler to call KGDB/KDB Mike Travis
2013-09-06  4:36   ` Jason Wessel
2013-09-05 22:50 ` [PATCH 6/9] x86/UV: Add call to KGDB/KDB from NMI handler Mike Travis
2013-09-05 22:50 ` [PATCH 7/9] KGDB/KDB: add new system NMI entry code to KDB Mike Travis
2013-09-06  5:00   ` Jason Wessel
2013-09-06 16:48     ` Mike Travis
2013-09-05 22:50 ` [PATCH 8/9] x86/UV: Add uvtrace support Mike Travis
2013-09-05 22:50 ` [PATCH 9/9] x86/UV: Add ability to disable UV NMI handler Mike Travis
2013-09-09 12:43   ` Peter Zijlstra
2013-09-09 17:07     ` Mike Travis
2013-09-10  9:03       ` Peter Zijlstra
2013-09-12 17:27         ` Paul E. McKenney
2013-09-12 18:35           ` Paul E. McKenney
2013-09-12 19:08             ` Mike Travis
2013-09-12 18:59           ` Mike Travis
2013-09-12 19:48             ` Hedi Berriche
2013-09-12 20:17               ` Paul E. McKenney
2013-09-12 20:16             ` Paul E. McKenney
