All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4][V3] Improve watchdog config for arch watchdogs
@ 2017-05-30  1:26 Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 1/4] watchdog: remove unused declaration Nicholas Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Nicholas Piggin @ 2017-05-30  1:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: Nicholas Piggin, linux-kernel, linux-arch

Since last time:

- Have the perf based hardlockup detector use arch_touch_nmi_watchdog()
  rather than hld_touch_nmi_watchdog(). This changes direction slightly
  to make the perf-based hard lockup detector an alternative that an
  arch may select, rather than standalone. This better reflects how the
  code works in practice).

- Hopefully fixed the Kconfig options. There's still a bit of ugliness
  that will require another pass or two over interfaces and config
  scheme, but the idea is to make a minimal change to get the powerpc
  HLD in, which gives a reasonable starting point to improve things
  further.

Nicholas Piggin (4):
  watchdog: remove unused declaration
  watchdog: Introduce arch_touch_nmi_watchdog()
  watchdog: Split up config options
  watchdog: Provide watchdog_reconfigure() for arch watchdogs

 arch/blackfin/include/asm/nmi.h            |   2 +
 arch/blackfin/kernel/nmi.c                 |   2 +-
 arch/mn10300/include/asm/nmi.h             |   2 +
 arch/mn10300/kernel/mn10300-watchdog-low.S |   8 +-
 arch/mn10300/kernel/mn10300-watchdog.c     |   2 +-
 arch/powerpc/kernel/setup_64.c             |   2 +-
 arch/sparc/include/asm/nmi.h               |   1 +
 arch/sparc/kernel/nmi.c                    |   6 +-
 arch/x86/kernel/apic/hw_nmi.c              |   2 +-
 include/linux/nmi.h                        |  57 ++++---
 kernel/Makefile                            |   2 +-
 kernel/sysctl.c                            |  18 +-
 kernel/watchdog.c                          | 263 +++++++++++++++++++----------
 kernel/watchdog_hld.c                      |  37 +---
 lib/Kconfig.debug                          |  29 +++-
 15 files changed, 263 insertions(+), 170 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/4] watchdog: remove unused declaration
  2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
@ 2017-05-30  1:26 ` Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 2/4] watchdog: Introduce arch_touch_nmi_watchdog() Nicholas Piggin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Nicholas Piggin @ 2017-05-30  1:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: Nicholas Piggin, linux-kernel, linux-arch

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 include/linux/nmi.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index aa3cd0878270..5e2e57536d98 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -12,9 +12,6 @@ extern void touch_softlockup_watchdog_sched(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
-extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
-				  void __user *buffer,
-				  size_t *lenp, loff_t *ppos);
 extern unsigned int  softlockup_panic;
 extern unsigned int  hardlockup_panic;
 void lockup_detector_init(void);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/4] watchdog: Introduce arch_touch_nmi_watchdog()
  2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 1/4] watchdog: remove unused declaration Nicholas Piggin
@ 2017-05-30  1:26 ` Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 3/4] watchdog: Split up config options Nicholas Piggin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Nicholas Piggin @ 2017-05-30  1:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: Nicholas Piggin, linux-kernel, linux-arch

For architectures that define HAVE_NMI_WATCHDOG, instead of having
them provide the complete touch_nmi_watchdog() function, just have
them provide arch_touch_nmi_watchdog().

This gives the generic code more flexibility in implementing this
function, and arch implementations don't miss out on touching the
softlockup watchdog or other generic details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/blackfin/include/asm/nmi.h            |  2 ++
 arch/blackfin/kernel/nmi.c                 |  2 +-
 arch/mn10300/include/asm/nmi.h             |  2 ++
 arch/mn10300/kernel/mn10300-watchdog-low.S |  8 ++++----
 arch/mn10300/kernel/mn10300-watchdog.c     |  2 +-
 arch/sparc/include/asm/nmi.h               |  1 +
 arch/sparc/kernel/nmi.c                    |  6 ++----
 include/linux/nmi.h                        | 27 ++++++++++++++++-----------
 kernel/watchdog_hld.c                      |  5 ++---
 9 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/arch/blackfin/include/asm/nmi.h b/arch/blackfin/include/asm/nmi.h
index b9caac4fcfd8..107d23705f46 100644
--- a/arch/blackfin/include/asm/nmi.h
+++ b/arch/blackfin/include/asm/nmi.h
@@ -9,4 +9,6 @@
 
 #include <linux/nmi.h>
 
+extern void arch_touch_nmi_watchdog(void);
+
 #endif
diff --git a/arch/blackfin/kernel/nmi.c b/arch/blackfin/kernel/nmi.c
index 633c37083e87..1e714329fe8a 100644
--- a/arch/blackfin/kernel/nmi.c
+++ b/arch/blackfin/kernel/nmi.c
@@ -190,7 +190,7 @@ static int __init init_nmi_wdt(void)
 }
 device_initcall(init_nmi_wdt);
 
-void touch_nmi_watchdog(void)
+void arch_touch_nmi_watchdog(void)
 {
 	atomic_set(&nmi_touched[smp_processor_id()], 1);
 }
diff --git a/arch/mn10300/include/asm/nmi.h b/arch/mn10300/include/asm/nmi.h
index f3671cbbc117..b05627597b1b 100644
--- a/arch/mn10300/include/asm/nmi.h
+++ b/arch/mn10300/include/asm/nmi.h
@@ -11,4 +11,6 @@
 #ifndef _ASM_NMI_H
 #define _ASM_NMI_H
 
+extern void arch_touch_nmi_watchdog(void);
+
 #endif /* _ASM_NMI_H */
diff --git a/arch/mn10300/kernel/mn10300-watchdog-low.S b/arch/mn10300/kernel/mn10300-watchdog-low.S
index f2f5c9cfaabd..34f8773de7d0 100644
--- a/arch/mn10300/kernel/mn10300-watchdog-low.S
+++ b/arch/mn10300/kernel/mn10300-watchdog-low.S
@@ -50,9 +50,9 @@ watchdog_handler:
 #   we can't inline it)
 #
 ###############################################################################
-	.globl	touch_nmi_watchdog
-	.type	touch_nmi_watchdog,@function
-touch_nmi_watchdog:
+	.globl	arch_touch_nmi_watchdog
+	.type	arch_touch_nmi_watchdog,@function
+arch_touch_nmi_watchdog:
 	clr	d0
 	clr	d1
 	mov	watchdog_alert_counter, a0
@@ -63,4 +63,4 @@ touch_nmi_watchdog:
 	lne
 	ret	[],0
 
-	.size	touch_nmi_watchdog,.-touch_nmi_watchdog
+	.size	arch_touch_nmi_watchdog,.-arch_touch_nmi_watchdog
diff --git a/arch/mn10300/kernel/mn10300-watchdog.c b/arch/mn10300/kernel/mn10300-watchdog.c
index a2d8e6938d67..0d5641beadf5 100644
--- a/arch/mn10300/kernel/mn10300-watchdog.c
+++ b/arch/mn10300/kernel/mn10300-watchdog.c
@@ -31,7 +31,7 @@ static unsigned int watchdog;
 static unsigned int watchdog_hz = 1;
 unsigned int watchdog_alert_counter[NR_CPUS];
 
-EXPORT_SYMBOL(touch_nmi_watchdog);
+EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
 /*
  * the best way to detect whether a CPU has a 'hard lockup' problem
diff --git a/arch/sparc/include/asm/nmi.h b/arch/sparc/include/asm/nmi.h
index 26ad2b2607c6..284eac3ffaf2 100644
--- a/arch/sparc/include/asm/nmi.h
+++ b/arch/sparc/include/asm/nmi.h
@@ -7,6 +7,7 @@ void nmi_adjust_hz(unsigned int new_hz);
 
 extern atomic_t nmi_active;
 
+void arch_touch_nmi_watchdog(void);
 void start_nmi_watchdog(void *unused);
 void stop_nmi_watchdog(void *unused);
 
diff --git a/arch/sparc/kernel/nmi.c b/arch/sparc/kernel/nmi.c
index 95e73c63c99d..048ad783ea3f 100644
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -51,7 +51,7 @@ static DEFINE_PER_CPU(unsigned int, last_irq_sum);
 static DEFINE_PER_CPU(long, alert_counter);
 static DEFINE_PER_CPU(int, nmi_touch);
 
-void touch_nmi_watchdog(void)
+void arch_touch_nmi_watchdog(void)
 {
 	if (atomic_read(&nmi_active)) {
 		int cpu;
@@ -61,10 +61,8 @@ void touch_nmi_watchdog(void)
 				per_cpu(nmi_touch, cpu) = 1;
 		}
 	}
-
-	touch_softlockup_watchdog();
 }
-EXPORT_SYMBOL(touch_nmi_watchdog);
+EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
 static void die_nmi(const char *str, struct pt_regs *regs, int do_panic)
 {
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 5e2e57536d98..bd387ef8bccd 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -6,6 +6,9 @@
 
 #include <linux/sched.h>
 #include <asm/irq.h>
+#if defined(CONFIG_HAVE_NMI_WATCHDOG)
+#include <asm/nmi.h>
+#endif
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 extern void touch_softlockup_watchdog_sched(void);
@@ -58,6 +61,18 @@ static inline void reset_hung_task_detector(void)
 #define NMI_WATCHDOG_ENABLED      (1 << NMI_WATCHDOG_ENABLED_BIT)
 #define SOFT_WATCHDOG_ENABLED     (1 << SOFT_WATCHDOG_ENABLED_BIT)
 
+#if defined(CONFIG_HARDLOCKUP_DETECTOR)
+extern void hardlockup_detector_disable(void);
+#else
+static inline void hardlockup_detector_disable(void) {}
+#endif
+
+#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
+extern void arch_touch_nmi_watchdog(void);
+#else
+static inline void arch_touch_nmi_watchdog(void) {}
+#endif
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  * 
@@ -65,21 +80,11 @@ static inline void reset_hung_task_detector(void)
  * may be used to reset the timeout - for code which intentionally
  * disables interrupts for a long time. This call is stateless.
  */
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
-#include <asm/nmi.h>
-extern void touch_nmi_watchdog(void);
-#else
 static inline void touch_nmi_watchdog(void)
 {
+	arch_touch_nmi_watchdog();
 	touch_softlockup_watchdog();
 }
-#endif
-
-#if defined(CONFIG_HARDLOCKUP_DETECTOR)
-extern void hardlockup_detector_disable(void);
-#else
-static inline void hardlockup_detector_disable(void) {}
-#endif
 
 /*
  * Create trigger_all_cpu_backtrace() out of the arch-provided
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 54a427d1f344..90d688df6ce1 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -56,7 +56,7 @@ static int __init hardlockup_panic_setup(char *str)
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
-void touch_nmi_watchdog(void)
+void arch_touch_nmi_watchdog(void)
 {
 	/*
 	 * Using __raw here because some code paths have
@@ -66,9 +66,8 @@ void touch_nmi_watchdog(void)
 	 * going off.
 	 */
 	raw_cpu_write(watchdog_nmi_touch, true);
-	touch_softlockup_watchdog();
 }
-EXPORT_SYMBOL(touch_nmi_watchdog);
+EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/4] watchdog: Split up config options
  2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 1/4] watchdog: remove unused declaration Nicholas Piggin
  2017-05-30  1:26 ` [PATCH 2/4] watchdog: Introduce arch_touch_nmi_watchdog() Nicholas Piggin
@ 2017-05-30  1:26 ` Nicholas Piggin
  2017-06-02 20:15   ` Don Zickus
  2017-05-30  1:26 ` [PATCH 4/4] watchdog: Provide watchdog_reconfigure() for arch watchdogs Nicholas Piggin
  2017-06-06 16:08 ` [PATCH 0/4][V3] Improve watchdog config " Don Zickus
  4 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-05-30  1:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: Nicholas Piggin, linux-kernel, linux-arch

Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
for lockup detectors. An architecture that defines HAVE_NMI_WATCHDOG
need not use this this if it has a very basic watchdog or uses its own
options and interfaces (e.g., sparc). touch_nmi_watchdog() will
continue to call their arch_touch_nmi_watchdog().

HARDLOCKUP_DETECTOR_PERF is the perf-based lockup detector.

HARDLOCKUP_DETECTOR is the framework for arch NMI_WATCHDOG hard lockup
detectors that conform to the LOCKUP_DETECTOR interfaces.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/setup_64.c |   2 +-
 arch/x86/kernel/apic/hw_nmi.c  |   2 +-
 include/linux/nmi.h            |  31 ++++--
 kernel/Makefile                |   2 +-
 kernel/sysctl.c                |  18 ++--
 kernel/watchdog.c              | 238 ++++++++++++++++++++++++++---------------
 kernel/watchdog_hld.c          |  32 ------
 lib/Kconfig.debug              |  29 +++--
 8 files changed, 211 insertions(+), 143 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f35ff9dea4fb..ab650905f75a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
 EXPORT_SYMBOL(ppc_pci_io);
 #endif
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return ppc_proc_freq * watchdog_thresh;
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index c73c9fb281e1..d6f387780849 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,7 +19,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index bd387ef8bccd..257e6d7a9e6a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -11,13 +11,21 @@
 #endif
 
 #ifdef CONFIG_LOCKUP_DETECTOR
+void lockup_detector_init(void);
+#else
+static inline void lockup_detector_init(void)
+{
+}
+#endif
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 extern void touch_softlockup_watchdog_sched(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
-extern unsigned int  hardlockup_panic;
-void lockup_detector_init(void);
+extern int soft_watchdog_enabled;
+extern atomic_t watchdog_park_in_progress;
 #else
 static inline void touch_softlockup_watchdog_sched(void)
 {
@@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
 static inline void touch_all_softlockup_watchdogs(void)
 {
 }
-static inline void lockup_detector_init(void)
-{
-}
 #endif
 
 #ifdef CONFIG_DETECT_HUNG_TASK
@@ -63,13 +68,16 @@ static inline void reset_hung_task_detector(void)
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 extern void arch_touch_nmi_watchdog(void);
-#else
+#endif
+
+#if !defined(CONFIG_HARDLOCKUP_DETECTOR) && !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
 
@@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
 }
 #endif
 
-#ifdef CONFIG_LOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+#endif
+
+#ifdef CONFIG_LOCKUP_DETECTOR
 extern int nmi_watchdog_enabled;
-extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
+extern struct cpumask watchdog_cpumask;
 extern unsigned long *watchdog_cpumask_bits;
-extern atomic_t watchdog_park_in_progress;
+extern int __read_mostly watchdog_suspended;
 #ifdef CONFIG_SMP
 extern int sysctl_softlockup_all_cpu_backtrace;
 extern int sysctl_hardlockup_all_cpu_backtrace;
diff --git a/kernel/Makefile b/kernel/Makefile
index 72aa080f91f0..4cb8e8b23c6e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a76cc3..fa37faa89143 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -873,13 +873,21 @@ static struct ctl_table kern_table[] = {
 		.mode           = 0644,
 		.proc_handler   = proc_nmi_watchdog,
 		.extra1		= &zero,
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 		.extra2		= &one,
 #else
 		.extra2		= &zero,
 #endif
 	},
 	{
+		.procname	= "watchdog_cpumask",
+		.data		= &watchdog_cpumask_bits,
+		.maxlen		= NR_CPUS,
+		.mode		= 0644,
+		.proc_handler	= proc_watchdog_cpumask,
+	},
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+	{
 		.procname       = "soft_watchdog",
 		.data           = &soft_watchdog_enabled,
 		.maxlen         = sizeof (int),
@@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 	{
-		.procname	= "watchdog_cpumask",
-		.data		= &watchdog_cpumask_bits,
-		.maxlen		= NR_CPUS,
-		.mode		= 0644,
-		.proc_handler	= proc_watchdog_cpumask,
-	},
-	{
 		.procname	= "softlockup_panic",
 		.data		= &softlockup_panic,
 		.maxlen		= sizeof(int),
@@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+#endif
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 	{
 		.procname	= "hardlockup_panic",
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 03e0b69bb5bf..deb010505646 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,15 +29,55 @@
 #include <linux/kvm_para.h>
 #include <linux/kthread.h>
 
+/* Watchdog configuration */
 static DEFINE_MUTEX(watchdog_proc_mutex);
 
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
+
+/* boot commands */
+/*
+ * Should we panic when a soft-lockup or hard-lockup occurs:
+ */
+unsigned int __read_mostly hardlockup_panic =
+			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
+/*
+ * We may not want to enable hard lockup detection by default in all cases,
+ * for example when running the kernel as a guest on a hypervisor. In these
+ * cases this function can be called to disable hard lockup detection. This
+ * function should only be executed once by the boot processor before the
+ * kernel command line parameters are parsed, because otherwise it is not
+ * possible to override this in hardlockup_panic_setup().
+ */
+void hardlockup_detector_disable(void)
+{
+	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+}
+
+static int __init hardlockup_panic_setup(char *str)
+{
+	if (!strncmp(str, "panic", 5))
+		hardlockup_panic = 1;
+	else if (!strncmp(str, "nopanic", 7))
+		hardlockup_panic = 0;
+	else if (!strncmp(str, "0", 1))
+		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+	else if (!strncmp(str, "1", 1))
+		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+	return 1;
+}
+__setup("nmi_watchdog=", hardlockup_panic_setup);
+
 #else
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
 #endif
-int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 int __read_mostly soft_watchdog_enabled;
+#endif
+
 int __read_mostly watchdog_user_enabled;
 int __read_mostly watchdog_thresh = 10;
 
@@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
 int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
 #endif
-static struct cpumask watchdog_cpumask __read_mostly;
+struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
-/* Helper for online, unparked cpus. */
-#define for_each_watchdog_cpu(cpu) \
-	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
-
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
 /*
  * The 'watchdog_running' variable is set to 1 when the watchdog threads
  * are registered/started and is set to 0 when the watchdog threads are
@@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
  * of 'watchdog_running' cannot change while the watchdog is deactivated
  * temporarily (see related code in 'proc' handlers).
  */
-static int __read_mostly watchdog_suspended;
+int __read_mostly watchdog_suspended;
+
+/*
+ * These functions can be overridden if an architecture implements its
+ * own hardlockup detector.
+ */
+int __weak watchdog_nmi_enable(unsigned int cpu)
+{
+	return 0;
+}
+void __weak watchdog_nmi_disable(unsigned int cpu)
+{
+}
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+
+/* Helper for online, unparked cpus. */
+#define for_each_watchdog_cpu(cpu) \
+	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
+
+atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
 
 static u64 __read_mostly sample_period;
 
@@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
 	return 1;
 }
 __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 {
 	sysctl_hardlockup_all_cpu_backtrace =
@@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 #endif
+#endif
 
 /*
  * Hard-lockup warnings should be triggered after just a few seconds. Soft-
@@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
 	__this_cpu_write(watchdog_touch_ts, 0);
 }
 
-/* watchdog detector functions */
-bool is_hardlockup(void)
-{
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
-
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
-		return true;
-
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
-	return false;
-}
-
 static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp();
@@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
 	return 0;
 }
 
-static void watchdog_interrupt_count(void)
+/* watchdog detector functions */
+bool is_hardlockup(void)
 {
-	__this_cpu_inc(hrtimer_interrupts);
-}
+	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
 
-/*
- * These two functions are mostly architecture specific
- * defining them as weak here.
- */
-int __weak watchdog_nmi_enable(unsigned int cpu)
-{
-	return 0;
+	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+		return true;
+
+	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	return false;
 }
-void __weak watchdog_nmi_disable(unsigned int cpu)
+
+static void watchdog_interrupt_count(void)
 {
+	__this_cpu_inc(hrtimer_interrupts);
 }
 
 static int watchdog_enable_all_cpus(void);
@@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
 		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
 }
 
-/*
- * Suspend the hard and soft lockup detector by parking the watchdog threads.
- */
-int lockup_detector_suspend(void)
-{
-	int ret = 0;
-
-	get_online_cpus();
-	mutex_lock(&watchdog_proc_mutex);
-	/*
-	 * Multiple suspend requests can be active in parallel (counted by
-	 * the 'watchdog_suspended' variable). If the watchdog threads are
-	 * running, the first caller takes care that they will be parked.
-	 * The state of 'watchdog_running' cannot change while a suspend
-	 * request is active (see related code in 'proc' handlers).
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		ret = watchdog_park_threads();
-
-	if (ret == 0)
-		watchdog_suspended++;
-	else {
-		watchdog_disable_all_cpus();
-		pr_err("Failed to suspend lockup detectors, disabled\n");
-		watchdog_enabled = 0;
-	}
-
-	mutex_unlock(&watchdog_proc_mutex);
-
-	return ret;
-}
-
-/*
- * Resume the hard and soft lockup detector by unparking the watchdog threads.
- */
-void lockup_detector_resume(void)
-{
-	mutex_lock(&watchdog_proc_mutex);
-
-	watchdog_suspended--;
-	/*
-	 * The watchdog threads are unparked if they were previously running
-	 * and if there is no more active suspend request.
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		watchdog_unpark_threads();
-
-	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
-}
-
 static int update_watchdog_all_cpus(void)
 {
 	int ret;
@@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
 	}
 }
 
+#else /* SOFTLOCKUP */
+static int watchdog_park_threads(void)
+{
+	return 0;
+}
+
+static void watchdog_unpark_threads(void)
+{
+}
+
+static int watchdog_enable_all_cpus(void)
+{
+	return 0;
+}
+
+static void watchdog_disable_all_cpus(void)
+{
+}
+
+static void set_sample_period(void)
+{
+}
+#endif /* SOFTLOCKUP */
+
+/*
+ * Suspend the hard and soft lockup detector by parking the watchdog threads.
+ */
+int lockup_detector_suspend(void)
+{
+	int ret = 0;
+
+	get_online_cpus();
+	mutex_lock(&watchdog_proc_mutex);
+	/*
+	 * Multiple suspend requests can be active in parallel (counted by
+	 * the 'watchdog_suspended' variable). If the watchdog threads are
+	 * running, the first caller takes care that they will be parked.
+	 * The state of 'watchdog_running' cannot change while a suspend
+	 * request is active (see related code in 'proc' handlers).
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		ret = watchdog_park_threads();
+
+	if (ret == 0)
+		watchdog_suspended++;
+	else {
+		watchdog_disable_all_cpus();
+		pr_err("Failed to suspend lockup detectors, disabled\n");
+		watchdog_enabled = 0;
+	}
+
+	mutex_unlock(&watchdog_proc_mutex);
+
+	return ret;
+}
+
+/*
+ * Resume the hard and soft lockup detector by unparking the watchdog threads.
+ */
+void lockup_detector_resume(void)
+{
+	mutex_lock(&watchdog_proc_mutex);
+
+	watchdog_suspended--;
+	/*
+	 * The watchdog threads are unparked if they were previously running
+	 * and if there is no more active suspend request.
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		watchdog_unpark_threads();
+
+	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
+}
+
 #ifdef CONFIG_SYSCTL
 
 /*
@@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 			 * a temporary cpumask, so we are likely not in a
 			 * position to do much else to make things better.
 			 */
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 			if (smpboot_update_cpumask_percpu_thread(
 				    &watchdog_threads, &watchdog_cpumask) != 0)
 				pr_err("cpumask update failed\n");
+#endif
 		}
 	}
 out:
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 90d688df6ce1..295a0d84934c 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 
-/* boot commands */
-/*
- * Should we panic when a soft-lockup or hard-lockup occurs:
- */
-unsigned int __read_mostly hardlockup_panic =
-			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
 static unsigned long hardlockup_allcpu_dumped;
-/*
- * We may not want to enable hard lockup detection by default in all cases,
- * for example when running the kernel as a guest on a hypervisor. In these
- * cases this function can be called to disable hard lockup detection. This
- * function should only be executed once by the boot processor before the
- * kernel command line parameters are parsed, because otherwise it is not
- * possible to override this in hardlockup_panic_setup().
- */
-void hardlockup_detector_disable(void)
-{
-	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
-}
-
-static int __init hardlockup_panic_setup(char *str)
-{
-	if (!strncmp(str, "panic", 5))
-		hardlockup_panic = 1;
-	else if (!strncmp(str, "nopanic", 7))
-		hardlockup_panic = 0;
-	else if (!strncmp(str, "0", 1))
-		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
-	else if (!strncmp(str, "1", 1))
-		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
-	return 1;
-}
-__setup("nmi_watchdog=", hardlockup_panic_setup);
 
 void arch_touch_nmi_watchdog(void)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587ebe52c7..a3afe3e10278 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
 	  The frequency of hrtimer and NMI events and the soft and hard lockup
 	  thresholds can be controlled through the sysctl watchdog_thresh.
 
+config SOFTLOCKUP_DETECTOR
+	bool "Detect Soft Lockups"
+	depends on LOCKUP_DETECTOR
+
+config HARDLOCKUP_DETECTOR_PERF
+	bool
+	select SOFTLOCKUP_DETECTOR
+
+#
+# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
+# rather than the perf based detector.
+#
+# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
+# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
+# (e.g., sysctl and cmdline).
+#
 config HARDLOCKUP_DETECTOR
-	def_bool y
-	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
-	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
+	bool "Detect Hard Lockups"
+	depends on LOCKUP_DETECTOR
+	depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
+	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
 
 config BOOTPARAM_HARDLOCKUP_PANIC
 	bool "Panic (Reboot) On Hard Lockups"
@@ -826,7 +843,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
 
 config BOOTPARAM_SOFTLOCKUP_PANIC
 	bool "Panic (Reboot) On Soft Lockups"
-	depends on LOCKUP_DETECTOR
+	depends on SOFTLOCKUP_DETECTOR
 	help
 	  Say Y here to enable the kernel to panic on "soft lockups",
 	  which are bugs that cause the kernel to loop in kernel
@@ -843,7 +860,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
 
 config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
 	int
-	depends on LOCKUP_DETECTOR
+	depends on SOFTLOCKUP_DETECTOR
 	range 0 1
 	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
 	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
@@ -851,7 +868,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
 config DETECT_HUNG_TASK
 	bool "Detect Hung Tasks"
 	depends on DEBUG_KERNEL
-	default LOCKUP_DETECTOR
+	default SOFTLOCKUP_DETECTOR
 	help
 	  Say Y here to enable the kernel to detect "hung tasks",
 	  which are bugs that cause the task to be stuck in
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/4] watchdog: Provide watchdog_reconfigure() for arch watchdogs
  2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-05-30  1:26 ` [PATCH 3/4] watchdog: Split up config options Nicholas Piggin
@ 2017-05-30  1:26 ` Nicholas Piggin
  2017-06-06 16:08 ` [PATCH 0/4][V3] Improve watchdog config " Don Zickus
  4 siblings, 0 replies; 23+ messages in thread
From: Nicholas Piggin @ 2017-05-30  1:26 UTC (permalink / raw)
  To: Don Zickus; +Cc: Nicholas Piggin, linux-kernel, linux-arch

After reconfiguring watchdog sysctls etc., architecture specific
watchdogs may not get all their parameters updated.

watchdog_reconfigure() can be implemented to pull the new values
in and set the arch NMI watchdog.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 kernel/watchdog.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index deb010505646..5397c637db2d 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -120,6 +120,11 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
 {
 }
 
+void __weak watchdog_nmi_reconfigure(void)
+{
+}
+
+
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
 /* Helper for online, unparked cpus. */
@@ -597,6 +602,12 @@ static void watchdog_disable_all_cpus(void)
 	}
 }
 
+static int watchdog_update_cpus(void)
+{
+	return smpboot_update_cpumask_percpu_thread(
+		    &watchdog_threads, &watchdog_cpumask);
+}
+
 #else /* SOFTLOCKUP */
 static int watchdog_park_threads(void)
 {
@@ -616,6 +627,11 @@ static void watchdog_disable_all_cpus(void)
 {
 }
 
+static int watchdog_update_cpus(void)
+{
+	return 0;
+}
+
 static void set_sample_period(void)
 {
 }
@@ -648,6 +664,8 @@ int lockup_detector_suspend(void)
 		watchdog_enabled = 0;
 	}
 
+	watchdog_nmi_reconfigure();
+
 	mutex_unlock(&watchdog_proc_mutex);
 
 	return ret;
@@ -668,6 +686,8 @@ void lockup_detector_resume(void)
 	if (watchdog_running && !watchdog_suspended)
 		watchdog_unpark_threads();
 
+	watchdog_nmi_reconfigure();
+
 	mutex_unlock(&watchdog_proc_mutex);
 	put_online_cpus();
 }
@@ -693,6 +713,8 @@ static int proc_watchdog_update(void)
 	else
 		watchdog_disable_all_cpus();
 
+	watchdog_nmi_reconfigure();
+
 	return err;
 
 }
@@ -878,12 +900,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 			 * a temporary cpumask, so we are likely not in a
 			 * position to do much else to make things better.
 			 */
-#ifdef CONFIG_SOFTLOCKUP_DETECTOR
-			if (smpboot_update_cpumask_percpu_thread(
-				    &watchdog_threads, &watchdog_cpumask) != 0)
+			if (watchdog_update_cpus() != 0)
 				pr_err("cpumask update failed\n");
-#endif
 		}
+
+		watchdog_nmi_reconfigure();
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-05-30  1:26 ` [PATCH 3/4] watchdog: Split up config options Nicholas Piggin
@ 2017-06-02 20:15   ` Don Zickus
  2017-06-03  6:10     ` Nicholas Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-02 20:15 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, linux-arch

On Tue, May 30, 2017 at 11:26:58AM +1000, Nicholas Piggin wrote:
> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
> 
> LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
> for lockup detectors. An architecture that defines HAVE_NMI_WATCHDOG
> need not use this this if it has a very basic watchdog or uses its own
> options and interfaces (e.g., sparc). touch_nmi_watchdog() will
> continue to call their arch_touch_nmi_watchdog().
> 
> HARDLOCKUP_DETECTOR_PERF is the perf-based lockup detector.
> 
> HARDLOCKUP_DETECTOR is the framework for arch NMI_WATCHDOG hard lockup
> detectors that conform to the LOCKUP_DETECTOR interfaces.

Hi Nick,

Sorry for the late response.  I did some sanity testing on your patches on
x86_64 and it seems to work fine.  I don't think I have any real issues with
the patches (without making time-consuming cleanup changes).

My last concern is wrapping my head around the config options.

HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.

In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
original split out of watchdog_hld.c).  Actually more like the
SOFTLOCKUP_DETECTOR framework for which HARDLOCKUP_DETECTOR is a part of.

In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
SOFTLOCKUP_DETECTOR framework.  Instead just the bare bones LOCKUP_DETECTOR.


If so, the following is a little confusing to me..


<snip>

> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e4587ebe52c7..a3afe3e10278 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
>  	  The frequency of hrtimer and NMI events and the soft and hard lockup
>  	  thresholds can be controlled through the sysctl watchdog_thresh.
>  
> +config SOFTLOCKUP_DETECTOR
> +	bool "Detect Soft Lockups"
> +	depends on LOCKUP_DETECTOR
> +
> +config HARDLOCKUP_DETECTOR_PERF
> +	bool
> +	select SOFTLOCKUP_DETECTOR

Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'

> +
> +#
> +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> +# rather than the perf based detector.
> +#
> +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> +# (e.g., sysctl and cmdline).
> +#
>  config HARDLOCKUP_DETECTOR
> -	def_bool y
> -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> +	bool "Detect Hard Lockups"
> +	depends on LOCKUP_DETECTOR
> +	depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> +	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG

Here is my confusion with HAVE_NMI_WATCHDOG

It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_DETECTOR
which would break sparc, I think.

And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
dependency on !HAVE_NMI_WATCHDOG???

Yeah, the config options are confusing.

If the above is right then we might need something like

depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI

(to replace the depends on !HAVE_NMI_WATCHDOG.. line)

Cheers,
Don

>  
>  config BOOTPARAM_HARDLOCKUP_PANIC
>  	bool "Panic (Reboot) On Hard Lockups"
> @@ -826,7 +843,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>  
>  config BOOTPARAM_SOFTLOCKUP_PANIC
>  	bool "Panic (Reboot) On Soft Lockups"
> -	depends on LOCKUP_DETECTOR
> +	depends on SOFTLOCKUP_DETECTOR
>  	help
>  	  Say Y here to enable the kernel to panic on "soft lockups",
>  	  which are bugs that cause the kernel to loop in kernel
> @@ -843,7 +860,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>  
>  config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>  	int
> -	depends on LOCKUP_DETECTOR
> +	depends on SOFTLOCKUP_DETECTOR
>  	range 0 1
>  	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
>  	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
> @@ -851,7 +868,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>  config DETECT_HUNG_TASK
>  	bool "Detect Hung Tasks"
>  	depends on DEBUG_KERNEL
> -	default LOCKUP_DETECTOR
> +	default SOFTLOCKUP_DETECTOR
>  	help
>  	  Say Y here to enable the kernel to detect "hung tasks",
>  	  which are bugs that cause the task to be stuck in
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-02 20:15   ` Don Zickus
@ 2017-06-03  6:10     ` Nicholas Piggin
  2017-06-06 16:49       ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-03  6:10 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, linux-arch

On Fri, 2 Jun 2017 16:15:00 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Tue, May 30, 2017 at 11:26:58AM +1000, Nicholas Piggin wrote:
> > Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> > HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
> > 
> > LOCKUP_DETECTOR provides the boot, sysctl, and programming interfaces
> > for lockup detectors. An architecture that defines HAVE_NMI_WATCHDOG
> > need not use this this if it has a very basic watchdog or uses its own
> > options and interfaces (e.g., sparc). touch_nmi_watchdog() will
> > continue to call their arch_touch_nmi_watchdog().
> > 
> > HARDLOCKUP_DETECTOR_PERF is the perf-based lockup detector.
> > 
> > HARDLOCKUP_DETECTOR is the framework for arch NMI_WATCHDOG hard lockup
> > detectors that conform to the LOCKUP_DETECTOR interfaces.  
> 
> Hi Nick,
> 
> Sorry for the late response.  I did some sanity testing on your patches on
> x86_64 and it seems to work fine.  I don't think I have any real issues with
> the patches (without making time-consuming cleanup changes).
> 
> My last concern is wrapping my head around the config options.
> 
> HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.

Yeah it's not the clearest. I think we need another pass over config
options to start straightening them out.

It means the arch has a hardlockup detector, so it has the
arch_touch_nmi_watchdog() and you can't also select the perf HLD.

> In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
> original split out of watchdog_hld.c).  Actually more like the
> SOFTLOCKUP_DETECTOR framework for which HARDLOCKUP_DETECTOR is a part of.

Well yes it uses some of the start/stop framework for the SLD, but
doesn't use much beyond that of the lockup detector stuff (most of
the boot options and sysctl parameters etc it does not use). So sparc
is a little odd.

I would hope to convert it over to more like powerpc patch and make
it a first class HLD, but it seems not all options are 100% compatible
so it would need some careful testing.

> 
> In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
> SOFTLOCKUP_DETECTOR framework.  Instead just the bare bones LOCKUP_DETECTOR.

It does use the HLD framework. The subsequent patch for powerpc adds
a PPC64 case in the dependencies.

> 
> 
> If so, the following is a little confusing to me..
> 
> 
> <snip>
> 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index e4587ebe52c7..a3afe3e10278 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
> >  	  The frequency of hrtimer and NMI events and the soft and hard lockup
> >  	  thresholds can be controlled through the sysctl watchdog_thresh.
> >  
> > +config SOFTLOCKUP_DETECTOR
> > +	bool "Detect Soft Lockups"
> > +	depends on LOCKUP_DETECTOR
> > +
> > +config HARDLOCKUP_DETECTOR_PERF
> > +	bool
> > +	select SOFTLOCKUP_DETECTOR  
> 
> Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'

Kconfig is pretty clunky, I was struggling to make it do the right
thing... I could try.

> 
> > +
> > +#
> > +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> > +# rather than the perf based detector.
> > +#
> > +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> > +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> > +# (e.g., sysctl and cmdline).
> > +#
> >  config HARDLOCKUP_DETECTOR
> > -	def_bool y
> > -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > +	bool "Detect Hard Lockups"
> > +	depends on LOCKUP_DETECTOR
> > +	depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > +	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG  
> 
> Here is my confusion with HAVE_NMI_WATCHDOG
> 
> It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_DETECTOR
> which would break sparc, I think.
> 
> And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> dependency on !HAVE_NMI_WATCHDOG???

I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.

After this patch, it always selects the _PERF detector because that's
the only one available. See the powerpc patch which adds the PPC64
exception here.

Yes it's a bit clunky. I think we can subsequently remove HAVE_NMI_DETECTOR
and replace it with something a bit saner and clean up some of these
convoluted cases.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4][V3] Improve watchdog config for arch watchdogs
  2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-05-30  1:26 ` [PATCH 4/4] watchdog: Provide watchdog_reconfigure() for arch watchdogs Nicholas Piggin
@ 2017-06-06 16:08 ` Don Zickus
  2017-06-06 19:46   ` Babu Moger
  4 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-06 16:08 UTC (permalink / raw)
  To: Nicholas Piggin, babu.moger; +Cc: linux-kernel, linux-arch

(adding Babu)

On Tue, May 30, 2017 at 11:26:55AM +1000, Nicholas Piggin wrote:
> Since last time:
> 
> - Have the perf based hardlockup detector use arch_touch_nmi_watchdog()
>   rather than hld_touch_nmi_watchdog(). This changes direction slightly
>   to make the perf-based hard lockup detector an alternative that an
>   arch may select, rather than standalone. This better reflects how the
>   code works in practice).
> 
> - Hopefully fixed the Kconfig options. There's still a bit of ugliness
>   that will require another pass or two over interfaces and config
>   scheme, but the idea is to make a minimal change to get the powerpc
>   HLD in, which gives a reasonable starting point to improve things
>   further.

Hi Babu,

Does this patchset break sparc?  Specifically patch3 with all the config
option changes?

Cheers,
Don

> 
> Nicholas Piggin (4):
>   watchdog: remove unused declaration
>   watchdog: Introduce arch_touch_nmi_watchdog()
>   watchdog: Split up config options
>   watchdog: Provide watchdog_reconfigure() for arch watchdogs
> 
>  arch/blackfin/include/asm/nmi.h            |   2 +
>  arch/blackfin/kernel/nmi.c                 |   2 +-
>  arch/mn10300/include/asm/nmi.h             |   2 +
>  arch/mn10300/kernel/mn10300-watchdog-low.S |   8 +-
>  arch/mn10300/kernel/mn10300-watchdog.c     |   2 +-
>  arch/powerpc/kernel/setup_64.c             |   2 +-
>  arch/sparc/include/asm/nmi.h               |   1 +
>  arch/sparc/kernel/nmi.c                    |   6 +-
>  arch/x86/kernel/apic/hw_nmi.c              |   2 +-
>  include/linux/nmi.h                        |  57 ++++---
>  kernel/Makefile                            |   2 +-
>  kernel/sysctl.c                            |  18 +-
>  kernel/watchdog.c                          | 263 +++++++++++++++++++----------
>  kernel/watchdog_hld.c                      |  37 +---
>  lib/Kconfig.debug                          |  29 +++-
>  15 files changed, 263 insertions(+), 170 deletions(-)
> 
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-03  6:10     ` Nicholas Piggin
@ 2017-06-06 16:49       ` Don Zickus
  2017-06-07  3:50         ` Nicholas Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-06 16:49 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, linux-arch

On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:
> > My last concern is wrapping my head around the config options.
> > 
> > HAVE_NMI_WATCHDOG seems to have a dual meaning, I think.
> 
> Yeah it's not the clearest. I think we need another pass over config
> options to start straightening them out.
> 
> It means the arch has a hardlockup detector, so it has the
> arch_touch_nmi_watchdog() and you can't also select the perf HLD.

Ok, agreed.

> 
> > In the sparc case, it uses the HARDLOCKUP_DETECTOR framework (hence the
> > original split out of watchdog_hld.c).  Actually more like the
> > SOFTLOCKUP_DETECTOR framework for which HARDLOCKUP_DETECTOR is a part of.
> 
> Well yes it uses some of the start/stop framework for the SLD, but
> doesn't use much beyond that of the lockup detector stuff (most of
> the boot options and sysctl parameters etc it does not use). So sparc
> is a little odd.
> 
> I would hope to convert it over to more like powerpc patch and make
> it a first class HLD, but it seems not all options are 100% compatible
> so it would need some careful testing.
> 

Ok.  More comments on sparc below..

> > 
> > In your ppc64 case, it means do _not_ use the HARDLOCKUP_DETECTOR or
> > SOFTLOCKUP_DETECTOR framework.  Instead just the bare bones LOCKUP_DETECTOR.
> 
> It does use the HLD framework. The subsequent patch for powerpc adds
> a PPC64 case in the dependencies.

Ah, ok.

> 
> > 
> > 
> > If so, the following is a little confusing to me..
> > 
> > 
> > <snip>
> > 
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index e4587ebe52c7..a3afe3e10278 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -801,10 +801,27 @@ config LOCKUP_DETECTOR
> > >  	  The frequency of hrtimer and NMI events and the soft and hard lockup
> > >  	  thresholds can be controlled through the sysctl watchdog_thresh.
> > >  
> > > +config SOFTLOCKUP_DETECTOR
> > > +	bool "Detect Soft Lockups"
> > > +	depends on LOCKUP_DETECTOR
> > > +
> > > +config HARDLOCKUP_DETECTOR_PERF
> > > +	bool
> > > +	select SOFTLOCKUP_DETECTOR  
> > 
> > Perhaps add a 'depends on (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)'
> 
> Kconfig is pretty clunky, I was struggling to make it do the right
> thing... I could try.
> 
> > 
> > > +
> > > +#
> > > +# arch/ can define HAVE_NMI_WATCHDOG to provide their own NMI watchdog
> > > +# rather than the perf based detector.
> > > +#
> > > +# The arch may do its own thing, or select HARDLOCKUP_DETECTOR, in which
> > > +# case it should conform to HARDLOCKUP_DETECTOR interfaces and settings
> > > +# (e.g., sysctl and cmdline).
> > > +#
> > >  config HARDLOCKUP_DETECTOR
> > > -	def_bool y
> > > -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > +	bool "Detect Hard Lockups"
> > > +	depends on LOCKUP_DETECTOR
> > > +	depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > > +	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG  
> > 
> > Here is my confusion with HAVE_NMI_WATCHDOG
> > 
> > It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_DETECTOR
> > which would break sparc, I think.
> > 
> > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > dependency on !HAVE_NMI_WATCHDOG???
> 
> I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.

Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.

For example look at kernel/watchdog.c::watchdog_enabled (line 38)

sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR
which I believes means sparc's nmi_watchdog is disabled on boot and has to
be manually enabled?


I _think_ having

depends on LOCKUP_DETECTOR
depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG

will work because your new definition of HARDLOCKUP_DETECTOR is a
combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??

Did I get that right?


I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
HAVE_PERF_NMI_WATCHDOG and then use those two to determine
HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
confusing?

Thoughts?

Cheers,
Don

> 
> After this patch, it always selects the _PERF detector because that's
> the only one available. See the powerpc patch which adds the PPC64
> exception here.
> 
> Yes it's a bit clunky. I think we can subsequently remove HAVE_NMI_DETECTOR
> and replace it with something a bit saner and clean up some of these
> convoluted cases.
> 
> Thanks,
> Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4][V3] Improve watchdog config for arch watchdogs
  2017-06-06 16:08 ` [PATCH 0/4][V3] Improve watchdog config " Don Zickus
@ 2017-06-06 19:46   ` Babu Moger
  2017-06-07 14:37     ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Babu Moger @ 2017-06-06 19:46 UTC (permalink / raw)
  To: Don Zickus, Nicholas Piggin; +Cc: linux-kernel, linux-arch

Hi Don, Nicholas,


On 6/6/2017 11:08 AM, Don Zickus wrote:
> (adding Babu)
>
> On Tue, May 30, 2017 at 11:26:55AM +1000, Nicholas Piggin wrote:
>> Since last time:
>>
>> - Have the perf based hardlockup detector use arch_touch_nmi_watchdog()
>>    rather than hld_touch_nmi_watchdog(). This changes direction slightly
>>    to make the perf-based hard lockup detector an alternative that an
>>    arch may select, rather than standalone. This better reflects how the
>>    code works in practice).
>>
>> - Hopefully fixed the Kconfig options. There's still a bit of ugliness
>>    that will require another pass or two over interfaces and config
>>    scheme, but the idea is to make a minimal change to get the powerpc
>>    HLD in, which gives a reasonable starting point to improve things
>>    further.
> Hi Babu,
>
> Does this patchset break sparc?  Specifically patch3 with all the config
Patches applies, compiles fine and also works fine for most part. 
However, there are few issues.

We need to enter 'N' or 'Y'  for SOFTLOCKUP_DETECTOR.

*
* Restart config...
*
*
* Debug Lockups and Hangs
*
Detect Hard and Soft Lockups (LOCKUP_DETECTOR) [Y/n/?] y
   Detect Soft Lockups (SOFTLOCKUP_DETECTOR) [N/y] (NEW)

For SPARC, softlockup is enabled by default earlier.   May be we need to 
submit another patch to enable this in

arch/sparc/configs/sparc64_defconfig.   Not  a big issue.


Another issue.
before that patch

# cat /proc/sys/kernel/watchdog
1
# cat /proc/sys/kernel/nmi_watchdog
1

After the patch

# cat /proc/sys/kernel/watchdog
1
# cat /proc/sys/kernel/nmi_watchdog
0

I think this is mostly due to change in this code below.

#ifdef CONFIG_HARDLOCKUP_DETECTOR
unsigned long __read_mostly watchdog_enabled = 
SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;

Old code was like this

#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
unsigned long __read_mostly watchdog_enabled = 
SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
#else
unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
#endif

SPARC defines CONFIG_HAVE_NMI_WATCHDOG.

Thanks
Babu

> option changes?
>
> Cheers,
> Don
>
>> Nicholas Piggin (4):
>>    watchdog: remove unused declaration
>>    watchdog: Introduce arch_touch_nmi_watchdog()
>>    watchdog: Split up config options
>>    watchdog: Provide watchdog_reconfigure() for arch watchdogs
>>
>>   arch/blackfin/include/asm/nmi.h            |   2 +
>>   arch/blackfin/kernel/nmi.c                 |   2 +-
>>   arch/mn10300/include/asm/nmi.h             |   2 +
>>   arch/mn10300/kernel/mn10300-watchdog-low.S |   8 +-
>>   arch/mn10300/kernel/mn10300-watchdog.c     |   2 +-
>>   arch/powerpc/kernel/setup_64.c             |   2 +-
>>   arch/sparc/include/asm/nmi.h               |   1 +
>>   arch/sparc/kernel/nmi.c                    |   6 +-
>>   arch/x86/kernel/apic/hw_nmi.c              |   2 +-
>>   include/linux/nmi.h                        |  57 ++++---
>>   kernel/Makefile                            |   2 +-
>>   kernel/sysctl.c                            |  18 +-
>>   kernel/watchdog.c                          | 263 +++++++++++++++++++----------
>>   kernel/watchdog_hld.c                      |  37 +---
>>   lib/Kconfig.debug                          |  29 +++-
>>   15 files changed, 263 insertions(+), 170 deletions(-)
>>
>> -- 
>> 2.11.0
>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-06 16:49       ` Don Zickus
@ 2017-06-07  3:50         ` Nicholas Piggin
  2017-06-08 16:05           ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-07  3:50 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, linux-arch

On Tue, 6 Jun 2017 12:49:58 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Sat, Jun 03, 2017 at 04:10:05PM +1000, Nicholas Piggin wrote:

> > > >  config HARDLOCKUP_DETECTOR
> > > > -	def_bool y
> > > > -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> > > > -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> > > > +	bool "Detect Hard Lockups"
> > > > +	depends on LOCKUP_DETECTOR
> > > > +	depends on !HAVE_NMI_WATCHDOG && (PERF_EVENTS && HAVE_PERF_EVENTS_NMI)
> > > > +	select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG    
> > > 
> > > Here is my confusion with HAVE_NMI_WATCHDOG
> > > 
> > > It seems like you can only select HARDLOCKUP_DETECTOR if !HAVE_NMI_DETECTOR
> > > which would break sparc, I think.
> > > 
> > > And then it always selects HARDLOCKUP_DETECTOR_PERF because of the
> > > dependency on !HAVE_NMI_WATCHDOG???  
> > 
> > I don't think so -- sparc today does not select HARDLOCKUP_DETECTOR.  
> 
> Well yes, but your patch changes the definition of HARDLOCKUP_DETECTOR to
> HARDLOCKUP_DETECTOR_PERF and then recreates HARDLOCKUP_DETECTOR.
> 
> For example look at kernel/watchdog.c::watchdog_enabled (line 38)
> 
> sparc has HAVE_NMI_WATCHDOG set, but that will disable HARDLOCKUP_DETECTOR
> which I believes means sparc's nmi_watchdog is disabled on boot and has to
> be manually enabled?

Ahh okay because sparc is using watchdog_nmi_enable/disable from the
softlockup watchdog, which checks watchdog_enabled.

> 
> 
> I _think_ having
> 
> depends on LOCKUP_DETECTOR
> depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> 
> will work because your new definition of HARDLOCKUP_DETECTOR is a
> combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> 
> Did I get that right?

Well in some ways, except that most of the NMI watchdogs do not seem to
heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.

NMI_WATCHDOG by itself was supposed to be for an arch that wnats to do
completely it own thing.

sparc is somewhere in the middle. It uses some of the HLD stuff, but
not all. That makes it a bit tricky.


> I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
> confusing?

This would probably be the right direction to go in, but it will take
slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
meaning that an arch has its own watchdog and does not want any HLD
stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.

While transitioning, we could add a new option instead,

HAVE_ARCH_HARDLOCKUP_DETECTOR

I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
HLD. Possibly you could just change the name to be a bit more regular,
HAVE_PERF_NMI_HARDLOCKUP_DETECTOR

Thanks,
Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/4][V3] Improve watchdog config for arch watchdogs
  2017-06-06 19:46   ` Babu Moger
@ 2017-06-07 14:37     ` Don Zickus
  0 siblings, 0 replies; 23+ messages in thread
From: Don Zickus @ 2017-06-07 14:37 UTC (permalink / raw)
  To: Babu Moger; +Cc: Nicholas Piggin, linux-kernel, linux-arch

On Tue, Jun 06, 2017 at 02:46:48PM -0500, Babu Moger wrote:
> Hi Don, Nicholas,
> 
> 
> On 6/6/2017 11:08 AM, Don Zickus wrote:
> > (adding Babu)
> > 
> > On Tue, May 30, 2017 at 11:26:55AM +1000, Nicholas Piggin wrote:
> > > Since last time:
> > > 
> > > - Have the perf based hardlockup detector use arch_touch_nmi_watchdog()
> > >    rather than hld_touch_nmi_watchdog(). This changes direction slightly
> > >    to make the perf-based hard lockup detector an alternative that an
> > >    arch may select, rather than standalone. This better reflects how the
> > >    code works in practice).
> > > 
> > > - Hopefully fixed the Kconfig options. There's still a bit of ugliness
> > >    that will require another pass or two over interfaces and config
> > >    scheme, but the idea is to make a minimal change to get the powerpc
> > >    HLD in, which gives a reasonable starting point to improve things
> > >    further.
> > Hi Babu,
> > 
> > Does this patchset break sparc?  Specifically patch3 with all the config
> Patches applies, compiles fine and also works fine for most part. However,
> there are few issues.

Thanks for the quick turnaround!

> 
> We need to enter 'N' or 'Y'  for SOFTLOCKUP_DETECTOR.
> 
> *
> * Restart config...
> *
> *
> * Debug Lockups and Hangs
> *
> Detect Hard and Soft Lockups (LOCKUP_DETECTOR) [Y/n/?] y
>   Detect Soft Lockups (SOFTLOCKUP_DETECTOR) [N/y] (NEW)
> 
> For SPARC, softlockup is enabled by default earlier.   May be we need to
> submit another patch to enable this in
> 
> arch/sparc/configs/sparc64_defconfig.   Not  a big issue.

Hmm, I think the nmi_enable/disable stuff is wrapped into the
SOFTLOCKUP_DETECTOR code, so you might need it.  Though Nick did create a
separate interface outside of SOFTLOCKUP to something similar.  I believe
patch4 introduces nmi_reconfigure().

Not sure if the spirit of the sparc nmi_watchdog code wants SOFTLOCKUP or
not.

> 
> 
> Another issue.
> before that patch
> 
> # cat /proc/sys/kernel/watchdog
> 1
> # cat /proc/sys/kernel/nmi_watchdog
> 1
> 
> After the patch
> 
> # cat /proc/sys/kernel/watchdog
> 1
> # cat /proc/sys/kernel/nmi_watchdog
> 0

Yes, that is what I thought.  Thanks for confirming!

> 
> I think this is mostly due to change in this code below.
> 
> #ifdef CONFIG_HARDLOCKUP_DETECTOR
> unsigned long __read_mostly watchdog_enabled =
> SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> 
> Old code was like this
> 
> #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
> unsigned long __read_mostly watchdog_enabled =
> SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> #else
> unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> #endif
> 
> SPARC defines CONFIG_HAVE_NMI_WATCHDOG.


I am still working with Nick to deal with these config issues.  But I am
going to keep this one in mind while we work through it.

Cheers,
Don

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-07  3:50         ` Nicholas Piggin
@ 2017-06-08 16:05           ` Don Zickus
  2017-06-12  8:07             ` Nicholas Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-08 16:05 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, linux-arch

On Wed, Jun 07, 2017 at 01:50:26PM +1000, Nicholas Piggin wrote:
> > 
> > I _think_ having
> > 
> > depends on LOCKUP_DETECTOR
> > depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> > select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > 
> > will work because your new definition of HARDLOCKUP_DETECTOR is a
> > combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> > 
> > Did I get that right?
> 
> Well in some ways, except that most of the NMI watchdogs do not seem to
> heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.
> 
> NMI_WATCHDOG by itself was supposed to be for an arch that wnats to do
> completely it own thing.
> 
> sparc is somewhere in the middle. It uses some of the HLD stuff, but
> not all. That makes it a bit tricky.

Hmm, I can see sparc relying on SOFTLOCKUP, but the HARDLOCKUP code with
your patches seems really small.  The only sysctl is nmi_watchdog and
hardlockup_panic.  The former is needed but the later is only implemented in
the HARDLOCKUP_DETECTOR_PERF case.

So by having HAVE_NMI_WATCHDOG, you sorta need a sysctl knob which the
HARDLOCKUP_DETECTOR seems to provide on top of the default knobs.

With sparc being special and needing SOFTLOCKUP to call its nmi
enable/disable hooks.


Is there a particular chunk of code you had in mind that did not make sense
with HARDLOCKUP_DETECTOR enabled?

> 
> 
> > I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> > HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> > HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
> > confusing?
> 
> This would probably be the right direction to go in, but it will take
> slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> meaning that an arch has its own watchdog and does not want any HLD
> stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> 
> While transitioning, we could add a new option instead,
> 
> HAVE_ARCH_HARDLOCKUP_DETECTOR
> 
> I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> HLD. Possibly you could just change the name to be a bit more regular,
> HAVE_PERF_NMI_HARDLOCKUP_DETECTOR

Actually, I don't think I can just rename it as it has a specific use to let
OPROFILE know the perf events are being NMI triggered as opposed to IRQ
triggered.

Though I like the direction you are going.  Then arches either have one or
the other.  Or in the ppc case it is dependent on what ppc platform is being
used.

Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
the arch/<arch>/Kconfig explicitly stating which one to use?

Cheers,
Don



> 
> Thanks,
> Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-08 16:05           ` Don Zickus
@ 2017-06-12  8:07             ` Nicholas Piggin
  2017-06-12 20:41               ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-12  8:07 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, linux-arch

On Thu, 8 Jun 2017 12:05:02 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Wed, Jun 07, 2017 at 01:50:26PM +1000, Nicholas Piggin wrote:
> > > 
> > > I _think_ having
> > > 
> > > depends on LOCKUP_DETECTOR
> > > depends on HAVE_NMI_WATCHDOG || HAVE_PERF_EVENTS_NMI
> > > select HARDLOCKUP_DETECTOR_PERF if !HAVE_NMI_WATCHDOG
> > > 
> > > will work because your new definition of HARDLOCKUP_DETECTOR is a
> > > combination of HAVE_NMI_WATCHDOG && HARDLOCKUP_DETECTOR_PERF ??
> > > 
> > > Did I get that right?  
> > 
> > Well in some ways, except that most of the NMI watchdogs do not seem to
> > heed the HARDLOCKUP_DETECTOR configuration sysctls and commands.
> > 
> > NMI_WATCHDOG by itself was supposed to be for an arch that wnats to do
> > completely it own thing.
> > 
> > sparc is somewhere in the middle. It uses some of the HLD stuff, but
> > not all. That makes it a bit tricky.  
> 
> Hmm, I can see sparc relying on SOFTLOCKUP, but the HARDLOCKUP code with
> your patches seems really small.  The only sysctl is nmi_watchdog and
> hardlockup_panic.  The former is needed but the later is only implemented in
> the HARDLOCKUP_DETECTOR_PERF case.
> 
> So by having HAVE_NMI_WATCHDOG, you sorta need a sysctl knob which the
> HARDLOCKUP_DETECTOR seems to provide on top of the default knobs.

I would say there's a few others like the cpumask, threshold, panic,
backtrace, etc.

> With sparc being special and needing SOFTLOCKUP to call its nmi
> enable/disable hooks.

Yes, although we could just remove that dependency (it's minimal code
to start them up with hotplug).

That said, I would hope to actually go the other way in general and move
architectures to using the generic parameters as much as possible.

> Is there a particular chunk of code you had in mind that did not make sense
> with HARDLOCKUP_DETECTOR enabled?

Perhaps the couple of other archs that have their own NMI watchdogs.

> 
> > 
> >   
> > > I almost wonder if arches should set either HAVE_NMI_WATCHDOG or
> > > HAVE_PERF_NMI_WATCHDOG and then use those two to determine
> > > HARDLOCKUP_DETECTOR.  Would that make the config options slightly less
> > > confusing?  
> > 
> > This would probably be the right direction to go in, but it will take
> > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > meaning that an arch has its own watchdog and does not want any HLD
> > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> > 
> > While transitioning, we could add a new option instead,
> > 
> > HAVE_ARCH_HARDLOCKUP_DETECTOR
> > 
> > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > HLD. Possibly you could just change the name to be a bit more regular,
> > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR  
> 
> Actually, I don't think I can just rename it as it has a specific use to let
> OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> triggered.
> 
> Though I like the direction you are going.  Then arches either have one or
> the other.  Or in the ppc case it is dependent on what ppc platform is being
> used.

Okay, glad we're on the same page conceptually :)

> 
> Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> the arch/<arch>/Kconfig explicitly stating which one to use?

Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
it provides its own. HARDLOCKUP_DETECTOR option would then depend on
one of the two being defined.

I could try redoing the series with those changes to Kconfig and see how
it looks?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-12  8:07             ` Nicholas Piggin
@ 2017-06-12 20:41               ` Don Zickus
  2017-06-13 16:11                 ` Nicholas Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-12 20:41 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, linux-arch

On Mon, Jun 12, 2017 at 06:07:39PM +1000, Nicholas Piggin wrote:
> > > This would probably be the right direction to go in, but it will take
> > > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > > meaning that an arch has its own watchdog and does not want any HLD
> > > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> > > 
> > > While transitioning, we could add a new option instead,
> > > 
> > > HAVE_ARCH_HARDLOCKUP_DETECTOR
> > > 
> > > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > > HLD. Possibly you could just change the name to be a bit more regular,
> > > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR  
> > 
> > Actually, I don't think I can just rename it as it has a specific use to let
> > OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> > triggered.
> > 
> > Though I like the direction you are going.  Then arches either have one or
> > the other.  Or in the ppc case it is dependent on what ppc platform is being
> > used.
> 
> Okay, glad we're on the same page conceptually :)
> 
> > 
> > Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> > the arch/<arch>/Kconfig explicitly stating which one to use?
> 
> Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
> it provides its own. HARDLOCKUP_DETECTOR option would then depend on
> one of the two being defined.
> 
> I could try redoing the series with those changes to Kconfig and see how
> it looks?

Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
are getting close to have this defined properly which would allow us to
split the code up correctly in the future.

Cheers,
Don

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-12 20:41               ` Don Zickus
@ 2017-06-13 16:11                 ` Nicholas Piggin
  2017-06-14 14:09                   ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-13 16:11 UTC (permalink / raw)
  To: Don Zickus; +Cc: linux-kernel, linux-arch

On Mon, 12 Jun 2017 16:41:56 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Mon, Jun 12, 2017 at 06:07:39PM +1000, Nicholas Piggin wrote:
> > > > This would probably be the right direction to go in, but it will take
> > > > slightly more I think. We first need to remove HAVE_NMI_WATCHDOG from
> > > > meaning that an arch has its own watchdog and does not want any HLD
> > > > stuff. I think with arch_touch_nmi_watchdog(), we can probably get there.
> > > > 
> > > > While transitioning, we could add a new option instead,
> > > > 
> > > > HAVE_ARCH_HARDLOCKUP_DETECTOR
> > > > 
> > > > I think HAVE_PERF_EVENTS_NMI is sufficient to imply it will use the PERF
> > > > HLD. Possibly you could just change the name to be a bit more regular,
> > > > HAVE_PERF_NMI_HARDLOCKUP_DETECTOR    
> > > 
> > > Actually, I don't think I can just rename it as it has a specific use to let
> > > OPROFILE know the perf events are being NMI triggered as opposed to IRQ
> > > triggered.
> > > 
> > > Though I like the direction you are going.  Then arches either have one or
> > > the other.  Or in the ppc case it is dependent on what ppc platform is being
> > > used.  
> > 
> > Okay, glad we're on the same page conceptually :)
> >   
> > > 
> > > Then the HARDLOCKUP_DETECTOR needs one or the other to work correctly with
> > > the arch/<arch>/Kconfig explicitly stating which one to use?  
> > 
> > Yeah I guess the arch would advertise it has the PERF_HLD or ARCH_HLD if
> > it provides its own. HARDLOCKUP_DETECTOR option would then depend on
> > one of the two being defined.
> > 
> > I could try redoing the series with those changes to Kconfig and see how
> > it looks?  
> 
> Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
> are getting close to have this defined properly which would allow us to
> split the code up correctly in the future.

How's this for a replacement patch 3? I think the Kconfig works out much
better now.
--

Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
the minimum arch_touch_nmi_watchdog, and it otherwise does its own
thing and does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to
a full HAVE_HARDLOCKUP_DETECTOR_ARCH.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/Kconfig                   |  23 ++++
 arch/powerpc/Kconfig           |   1 +
 arch/powerpc/kernel/setup_64.c |   2 +-
 arch/x86/Kconfig               |   1 +
 arch/x86/kernel/apic/hw_nmi.c  |   2 +-
 include/linux/nmi.h            |  29 +++--
 kernel/Makefile                |   2 +-
 kernel/sysctl.c                |  16 +--
 kernel/watchdog.c              | 238 ++++++++++++++++++++++++++---------------
 kernel/watchdog_hld.c          |  32 ------
 lib/Kconfig.debug              |  26 +++--
 11 files changed, 231 insertions(+), 141 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 6c00e5b00f8b..878addc6f141 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
 	  subsystem.  Also has support for calculating CPU cycle events
 	  to determine how many clock cycles in a given period.
 
+
+config HAVE_HARDLOCKUP_DETECTOR_PERF
+	bool
+	depends on HAVE_PERF_EVENTS_NMI
+	help
+	  The arch chooses to use the generic perf-NMI-based hardlockup
+	  detector. Must define HAVE_PERF_EVENTS_NMI.
+
+config HAVE_NMI_WATCHDOG
+	bool
+	help
+	  The arch provides a low level NMI watchdog. It provides
+	  asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
+
+config HAVE_HARDLOCKUP_DETECTOR_ARCH
+	bool
+	select HAVE_NMI_WATCHDOG
+	help
+	  The arch chooses to provide its own hardlockup detector, which is
+	  a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
+	  interfaces and parameters provided by hardlockup detector subsystem.
+
+
 config HAVE_PERF_REGS
 	bool
 	help
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index f7c8f9972f61..7aba96ec3378 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -202,6 +202,7 @@ config PPC
 	select HAVE_OPTPROBES			if PPC64
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI		if PPC64
+	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f35ff9dea4fb..ab650905f75a 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
 EXPORT_SYMBOL(ppc_pci_io);
 #endif
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return ppc_proc_freq * watchdog_thresh;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc7232a..3c084149b5d1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -157,6 +157,7 @@ config X86
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI
+	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index c73c9fb281e1..d6f387780849 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -19,7 +19,7 @@
 #include <linux/init.h>
 #include <linux/delay.h>
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index bd387ef8bccd..8aa01fd859fb 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -11,13 +11,21 @@
 #endif
 
 #ifdef CONFIG_LOCKUP_DETECTOR
+void lockup_detector_init(void);
+#else
+static inline void lockup_detector_init(void)
+{
+}
+#endif
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 extern void touch_softlockup_watchdog_sched(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
-extern unsigned int  hardlockup_panic;
-void lockup_detector_init(void);
+extern int soft_watchdog_enabled;
+extern atomic_t watchdog_park_in_progress;
 #else
 static inline void touch_softlockup_watchdog_sched(void)
 {
@@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
 static inline void touch_all_softlockup_watchdogs(void)
 {
 }
-static inline void lockup_detector_init(void)
-{
-}
 #endif
 
 #ifdef CONFIG_DETECT_HUNG_TASK
@@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
+extern unsigned int hardlockup_panic;
 #else
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
-#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
+#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void arch_touch_nmi_watchdog(void);
 #else
+#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
+#endif
 
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
@@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
 }
 #endif
 
-#ifdef CONFIG_LOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+#endif
+
+#ifdef CONFIG_LOCKUP_DETECTOR
 extern int nmi_watchdog_enabled;
-extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
+extern struct cpumask watchdog_cpumask;
 extern unsigned long *watchdog_cpumask_bits;
-extern atomic_t watchdog_park_in_progress;
+extern int __read_mostly watchdog_suspended;
 #ifdef CONFIG_SMP
 extern int sysctl_softlockup_all_cpu_backtrace;
 extern int sysctl_hardlockup_all_cpu_backtrace;
diff --git a/kernel/Makefile b/kernel/Makefile
index 72aa080f91f0..4cb8e8b23c6e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
 obj-$(CONFIG_KGDB) += debug/
 obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
 obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
-obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
+obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4dfba1a76cc3..46b189fd865b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
 #endif
 	},
 	{
+		.procname	= "watchdog_cpumask",
+		.data		= &watchdog_cpumask_bits,
+		.maxlen		= NR_CPUS,
+		.mode		= 0644,
+		.proc_handler	= proc_watchdog_cpumask,
+	},
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+	{
 		.procname       = "soft_watchdog",
 		.data           = &soft_watchdog_enabled,
 		.maxlen         = sizeof (int),
@@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 	{
-		.procname	= "watchdog_cpumask",
-		.data		= &watchdog_cpumask_bits,
-		.maxlen		= NR_CPUS,
-		.mode		= 0644,
-		.proc_handler	= proc_watchdog_cpumask,
-	},
-	{
 		.procname	= "softlockup_panic",
 		.data		= &softlockup_panic,
 		.maxlen		= sizeof(int),
@@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+#endif
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 	{
 		.procname	= "hardlockup_panic",
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 03e0b69bb5bf..deb010505646 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,15 +29,55 @@
 #include <linux/kvm_para.h>
 #include <linux/kthread.h>
 
+/* Watchdog configuration */
 static DEFINE_MUTEX(watchdog_proc_mutex);
 
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
+
+/* boot commands */
+/*
+ * Should we panic when a soft-lockup or hard-lockup occurs:
+ */
+unsigned int __read_mostly hardlockup_panic =
+			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
+/*
+ * We may not want to enable hard lockup detection by default in all cases,
+ * for example when running the kernel as a guest on a hypervisor. In these
+ * cases this function can be called to disable hard lockup detection. This
+ * function should only be executed once by the boot processor before the
+ * kernel command line parameters are parsed, because otherwise it is not
+ * possible to override this in hardlockup_panic_setup().
+ */
+void hardlockup_detector_disable(void)
+{
+	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+}
+
+static int __init hardlockup_panic_setup(char *str)
+{
+	if (!strncmp(str, "panic", 5))
+		hardlockup_panic = 1;
+	else if (!strncmp(str, "nopanic", 7))
+		hardlockup_panic = 0;
+	else if (!strncmp(str, "0", 1))
+		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+	else if (!strncmp(str, "1", 1))
+		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+	return 1;
+}
+__setup("nmi_watchdog=", hardlockup_panic_setup);
+
 #else
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
 #endif
-int __read_mostly nmi_watchdog_enabled;
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 int __read_mostly soft_watchdog_enabled;
+#endif
+
 int __read_mostly watchdog_user_enabled;
 int __read_mostly watchdog_thresh = 10;
 
@@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
 int __read_mostly sysctl_softlockup_all_cpu_backtrace;
 int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
 #endif
-static struct cpumask watchdog_cpumask __read_mostly;
+struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
-/* Helper for online, unparked cpus. */
-#define for_each_watchdog_cpu(cpu) \
-	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
-
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
 /*
  * The 'watchdog_running' variable is set to 1 when the watchdog threads
  * are registered/started and is set to 0 when the watchdog threads are
@@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
  * of 'watchdog_running' cannot change while the watchdog is deactivated
  * temporarily (see related code in 'proc' handlers).
  */
-static int __read_mostly watchdog_suspended;
+int __read_mostly watchdog_suspended;
+
+/*
+ * These functions can be overridden if an architecture implements its
+ * own hardlockup detector.
+ */
+int __weak watchdog_nmi_enable(unsigned int cpu)
+{
+	return 0;
+}
+void __weak watchdog_nmi_disable(unsigned int cpu)
+{
+}
+
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
+
+/* Helper for online, unparked cpus. */
+#define for_each_watchdog_cpu(cpu) \
+	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
+
+atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
 
 static u64 __read_mostly sample_period;
 
@@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
 	return 1;
 }
 __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
+#ifdef CONFIG_HARDLOCKUP_DETECTOR
 static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 {
 	sysctl_hardlockup_all_cpu_backtrace =
@@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 #endif
+#endif
 
 /*
  * Hard-lockup warnings should be triggered after just a few seconds. Soft-
@@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
 	__this_cpu_write(watchdog_touch_ts, 0);
 }
 
-/* watchdog detector functions */
-bool is_hardlockup(void)
-{
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
-
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
-		return true;
-
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
-	return false;
-}
-
 static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp();
@@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
 	return 0;
 }
 
-static void watchdog_interrupt_count(void)
+/* watchdog detector functions */
+bool is_hardlockup(void)
 {
-	__this_cpu_inc(hrtimer_interrupts);
-}
+	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
 
-/*
- * These two functions are mostly architecture specific
- * defining them as weak here.
- */
-int __weak watchdog_nmi_enable(unsigned int cpu)
-{
-	return 0;
+	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+		return true;
+
+	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	return false;
 }
-void __weak watchdog_nmi_disable(unsigned int cpu)
+
+static void watchdog_interrupt_count(void)
 {
+	__this_cpu_inc(hrtimer_interrupts);
 }
 
 static int watchdog_enable_all_cpus(void);
@@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
 		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
 }
 
-/*
- * Suspend the hard and soft lockup detector by parking the watchdog threads.
- */
-int lockup_detector_suspend(void)
-{
-	int ret = 0;
-
-	get_online_cpus();
-	mutex_lock(&watchdog_proc_mutex);
-	/*
-	 * Multiple suspend requests can be active in parallel (counted by
-	 * the 'watchdog_suspended' variable). If the watchdog threads are
-	 * running, the first caller takes care that they will be parked.
-	 * The state of 'watchdog_running' cannot change while a suspend
-	 * request is active (see related code in 'proc' handlers).
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		ret = watchdog_park_threads();
-
-	if (ret == 0)
-		watchdog_suspended++;
-	else {
-		watchdog_disable_all_cpus();
-		pr_err("Failed to suspend lockup detectors, disabled\n");
-		watchdog_enabled = 0;
-	}
-
-	mutex_unlock(&watchdog_proc_mutex);
-
-	return ret;
-}
-
-/*
- * Resume the hard and soft lockup detector by unparking the watchdog threads.
- */
-void lockup_detector_resume(void)
-{
-	mutex_lock(&watchdog_proc_mutex);
-
-	watchdog_suspended--;
-	/*
-	 * The watchdog threads are unparked if they were previously running
-	 * and if there is no more active suspend request.
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		watchdog_unpark_threads();
-
-	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
-}
-
 static int update_watchdog_all_cpus(void)
 {
 	int ret;
@@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
 	}
 }
 
+#else /* SOFTLOCKUP */
+static int watchdog_park_threads(void)
+{
+	return 0;
+}
+
+static void watchdog_unpark_threads(void)
+{
+}
+
+static int watchdog_enable_all_cpus(void)
+{
+	return 0;
+}
+
+static void watchdog_disable_all_cpus(void)
+{
+}
+
+static void set_sample_period(void)
+{
+}
+#endif /* SOFTLOCKUP */
+
+/*
+ * Suspend the hard and soft lockup detector by parking the watchdog threads.
+ */
+int lockup_detector_suspend(void)
+{
+	int ret = 0;
+
+	get_online_cpus();
+	mutex_lock(&watchdog_proc_mutex);
+	/*
+	 * Multiple suspend requests can be active in parallel (counted by
+	 * the 'watchdog_suspended' variable). If the watchdog threads are
+	 * running, the first caller takes care that they will be parked.
+	 * The state of 'watchdog_running' cannot change while a suspend
+	 * request is active (see related code in 'proc' handlers).
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		ret = watchdog_park_threads();
+
+	if (ret == 0)
+		watchdog_suspended++;
+	else {
+		watchdog_disable_all_cpus();
+		pr_err("Failed to suspend lockup detectors, disabled\n");
+		watchdog_enabled = 0;
+	}
+
+	mutex_unlock(&watchdog_proc_mutex);
+
+	return ret;
+}
+
+/*
+ * Resume the hard and soft lockup detector by unparking the watchdog threads.
+ */
+void lockup_detector_resume(void)
+{
+	mutex_lock(&watchdog_proc_mutex);
+
+	watchdog_suspended--;
+	/*
+	 * The watchdog threads are unparked if they were previously running
+	 * and if there is no more active suspend request.
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		watchdog_unpark_threads();
+
+	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
+}
+
 #ifdef CONFIG_SYSCTL
 
 /*
@@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 			 * a temporary cpumask, so we are likely not in a
 			 * position to do much else to make things better.
 			 */
+#ifdef CONFIG_SOFTLOCKUP_DETECTOR
 			if (smpboot_update_cpumask_percpu_thread(
 				    &watchdog_threads, &watchdog_cpumask) != 0)
 				pr_err("cpumask update failed\n");
+#endif
 		}
 	}
 out:
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 90d688df6ce1..295a0d84934c 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 
-/* boot commands */
-/*
- * Should we panic when a soft-lockup or hard-lockup occurs:
- */
-unsigned int __read_mostly hardlockup_panic =
-			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
 static unsigned long hardlockup_allcpu_dumped;
-/*
- * We may not want to enable hard lockup detection by default in all cases,
- * for example when running the kernel as a guest on a hypervisor. In these
- * cases this function can be called to disable hard lockup detection. This
- * function should only be executed once by the boot processor before the
- * kernel command line parameters are parsed, because otherwise it is not
- * possible to override this in hardlockup_panic_setup().
- */
-void hardlockup_detector_disable(void)
-{
-	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
-}
-
-static int __init hardlockup_panic_setup(char *str)
-{
-	if (!strncmp(str, "panic", 5))
-		hardlockup_panic = 1;
-	else if (!strncmp(str, "nopanic", 7))
-		hardlockup_panic = 0;
-	else if (!strncmp(str, "0", 1))
-		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
-	else if (!strncmp(str, "1", 1))
-		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
-	return 1;
-}
-__setup("nmi_watchdog=", hardlockup_panic_setup);
 
 void arch_touch_nmi_watchdog(void)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587ebe52c7..f87a559dd178 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
 	  The frequency of hrtimer and NMI events and the soft and hard lockup
 	  thresholds can be controlled through the sysctl watchdog_thresh.
 
+config SOFTLOCKUP_DETECTOR
+	bool "Detect Soft Lockups"
+	depends on LOCKUP_DETECTOR
+
+config HARDLOCKUP_DETECTOR_PERF
+	bool
+	select SOFTLOCKUP_DETECTOR
+
+#
+# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
+# lockup detector rather than the perf based detector.
+#
 config HARDLOCKUP_DETECTOR
-	def_bool y
-	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
-	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
+	bool "Detect Hard Lockups"
+	depends on LOCKUP_DETECTOR
+	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
+	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
+	select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
 
 config BOOTPARAM_HARDLOCKUP_PANIC
 	bool "Panic (Reboot) On Hard Lockups"
@@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
 
 config BOOTPARAM_SOFTLOCKUP_PANIC
 	bool "Panic (Reboot) On Soft Lockups"
-	depends on LOCKUP_DETECTOR
+	depends on SOFTLOCKUP_DETECTOR
 	help
 	  Say Y here to enable the kernel to panic on "soft lockups",
 	  which are bugs that cause the kernel to loop in kernel
@@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
 
 config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
 	int
-	depends on LOCKUP_DETECTOR
+	depends on SOFTLOCKUP_DETECTOR
 	range 0 1
 	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
 	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
@@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
 config DETECT_HUNG_TASK
 	bool "Detect Hung Tasks"
 	depends on DEBUG_KERNEL
-	default LOCKUP_DETECTOR
+	default SOFTLOCKUP_DETECTOR
 	help
 	  Say Y here to enable the kernel to detect "hung tasks",
 	  which are bugs that cause the task to be stuck in
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-13 16:11                 ` Nicholas Piggin
@ 2017-06-14 14:09                   ` Don Zickus
  2017-06-15  2:16                     ` Babu Moger
  0 siblings, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-14 14:09 UTC (permalink / raw)
  To: Nicholas Piggin, babu.moger; +Cc: linux-kernel, linux-arch

On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
> > Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
> > are getting close to have this defined properly which would allow us to
> > split the code up correctly in the future.
> 
> How's this for a replacement patch 3? I think the Kconfig works out much
> better now.

Hi Nick,

I think you made this much clearer, thank you!  I am good with this.


Hi Babu,

Can you give this patchset (and particularly this version of patch 3) a try
on sparc to make sure we didn't break anything?  I believe this should
resolve the start nmi watchdog on boot issue you noticed.  Thanks!

Cheers,
Don

> --
> 
> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
> 
> LOCKUP_DETECTOR implies the general boot, sysctl, and programming
> interfaces for the lockup detectors.
> 
> An architecture that wants to use a hard lockup detector must define
> HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
> 
> Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
> the minimum arch_touch_nmi_watchdog, and it otherwise does its own
> thing and does not implement the LOCKUP_DETECTOR interfaces.
> 
> sparc is unusual in that it has started to implement some of the
> interfaces, but not fully yet. It should probably be converted to
> a full HAVE_HARDLOCKUP_DETECTOR_ARCH.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/Kconfig                   |  23 ++++
>  arch/powerpc/Kconfig           |   1 +
>  arch/powerpc/kernel/setup_64.c |   2 +-
>  arch/x86/Kconfig               |   1 +
>  arch/x86/kernel/apic/hw_nmi.c  |   2 +-
>  include/linux/nmi.h            |  29 +++--
>  kernel/Makefile                |   2 +-
>  kernel/sysctl.c                |  16 +--
>  kernel/watchdog.c              | 238 ++++++++++++++++++++++++++---------------
>  kernel/watchdog_hld.c          |  32 ------
>  lib/Kconfig.debug              |  26 +++--
>  11 files changed, 231 insertions(+), 141 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 6c00e5b00f8b..878addc6f141 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
>  	  subsystem.  Also has support for calculating CPU cycle events
>  	  to determine how many clock cycles in a given period.
>  
> +
> +config HAVE_HARDLOCKUP_DETECTOR_PERF
> +	bool
> +	depends on HAVE_PERF_EVENTS_NMI
> +	help
> +	  The arch chooses to use the generic perf-NMI-based hardlockup
> +	  detector. Must define HAVE_PERF_EVENTS_NMI.
> +
> +config HAVE_NMI_WATCHDOG
> +	bool
> +	help
> +	  The arch provides a low level NMI watchdog. It provides
> +	  asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
> +
> +config HAVE_HARDLOCKUP_DETECTOR_ARCH
> +	bool
> +	select HAVE_NMI_WATCHDOG
> +	help
> +	  The arch chooses to provide its own hardlockup detector, which is
> +	  a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
> +	  interfaces and parameters provided by hardlockup detector subsystem.
> +
> +
>  config HAVE_PERF_REGS
>  	bool
>  	help
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index f7c8f9972f61..7aba96ec3378 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -202,6 +202,7 @@ config PPC
>  	select HAVE_OPTPROBES			if PPC64
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_EVENTS_NMI		if PPC64
> +	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select HAVE_RCU_TABLE_FREE		if SMP
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index f35ff9dea4fb..ab650905f75a 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
>  EXPORT_SYMBOL(ppc_pci_io);
>  #endif
>  
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  {
>  	return ppc_proc_freq * watchdog_thresh;
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4ccfacc7232a..3c084149b5d1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -157,6 +157,7 @@ config X86
>  	select HAVE_PCSPKR_PLATFORM
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_EVENTS_NMI
> +	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
>  	select HAVE_REGS_AND_STACK_ACCESS_API
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index c73c9fb281e1..d6f387780849 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -19,7 +19,7 @@
>  #include <linux/init.h>
>  #include <linux/delay.h>
>  
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>  u64 hw_nmi_get_sample_period(int watchdog_thresh)
>  {
>  	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index bd387ef8bccd..8aa01fd859fb 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -11,13 +11,21 @@
>  #endif
>  
>  #ifdef CONFIG_LOCKUP_DETECTOR
> +void lockup_detector_init(void);
> +#else
> +static inline void lockup_detector_init(void)
> +{
> +}
> +#endif
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  extern void touch_softlockup_watchdog_sched(void);
>  extern void touch_softlockup_watchdog(void);
>  extern void touch_softlockup_watchdog_sync(void);
>  extern void touch_all_softlockup_watchdogs(void);
>  extern unsigned int  softlockup_panic;
> -extern unsigned int  hardlockup_panic;
> -void lockup_detector_init(void);
> +extern int soft_watchdog_enabled;
> +extern atomic_t watchdog_park_in_progress;
>  #else
>  static inline void touch_softlockup_watchdog_sched(void)
>  {
> @@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
>  static inline void touch_all_softlockup_watchdogs(void)
>  {
>  }
> -static inline void lockup_detector_init(void)
> -{
> -}
>  #endif
>  
>  #ifdef CONFIG_DETECT_HUNG_TASK
> @@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)
>  
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>  extern void hardlockup_detector_disable(void);
> +extern unsigned int hardlockup_panic;
>  #else
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
>  
> -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  extern void arch_touch_nmi_watchdog(void);
>  #else
> +#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
>  static inline void arch_touch_nmi_watchdog(void) {}
>  #endif
> +#endif
>  
>  /**
>   * touch_nmi_watchdog - restart NMI watchdog timeout.
> @@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
>  }
>  #endif
>  
> -#ifdef CONFIG_LOCKUP_DETECTOR
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>  u64 hw_nmi_get_sample_period(int watchdog_thresh);
> +#endif
> +
> +#ifdef CONFIG_LOCKUP_DETECTOR
>  extern int nmi_watchdog_enabled;
> -extern int soft_watchdog_enabled;
>  extern int watchdog_user_enabled;
>  extern int watchdog_thresh;
>  extern unsigned long watchdog_enabled;
> +extern struct cpumask watchdog_cpumask;
>  extern unsigned long *watchdog_cpumask_bits;
> -extern atomic_t watchdog_park_in_progress;
> +extern int __read_mostly watchdog_suspended;
>  #ifdef CONFIG_SMP
>  extern int sysctl_softlockup_all_cpu_backtrace;
>  extern int sysctl_hardlockup_all_cpu_backtrace;
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 72aa080f91f0..4cb8e8b23c6e 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
>  obj-$(CONFIG_KGDB) += debug/
>  obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>  obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
> -obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>  obj-$(CONFIG_SECCOMP) += seccomp.o
>  obj-$(CONFIG_RELAY) += relay.o
>  obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 4dfba1a76cc3..46b189fd865b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
>  #endif
>  	},
>  	{
> +		.procname	= "watchdog_cpumask",
> +		.data		= &watchdog_cpumask_bits,
> +		.maxlen		= NR_CPUS,
> +		.mode		= 0644,
> +		.proc_handler	= proc_watchdog_cpumask,
> +	},
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> +	{
>  		.procname       = "soft_watchdog",
>  		.data           = &soft_watchdog_enabled,
>  		.maxlen         = sizeof (int),
> @@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
>  		.extra2		= &one,
>  	},
>  	{
> -		.procname	= "watchdog_cpumask",
> -		.data		= &watchdog_cpumask_bits,
> -		.maxlen		= NR_CPUS,
> -		.mode		= 0644,
> -		.proc_handler	= proc_watchdog_cpumask,
> -	},
> -	{
>  		.procname	= "softlockup_panic",
>  		.data		= &softlockup_panic,
>  		.maxlen		= sizeof(int),
> @@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
>  		.extra1		= &zero,
>  		.extra2		= &one,
>  	},
> +#endif
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  	{
>  		.procname	= "hardlockup_panic",
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 03e0b69bb5bf..deb010505646 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -29,15 +29,55 @@
>  #include <linux/kvm_para.h>
>  #include <linux/kthread.h>
>  
> +/* Watchdog configuration */
>  static DEFINE_MUTEX(watchdog_proc_mutex);
>  
> -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
> +int __read_mostly nmi_watchdog_enabled;
> +
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>  unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> +
> +/* boot commands */
> +/*
> + * Should we panic when a soft-lockup or hard-lockup occurs:
> + */
> +unsigned int __read_mostly hardlockup_panic =
> +			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
> +/*
> + * We may not want to enable hard lockup detection by default in all cases,
> + * for example when running the kernel as a guest on a hypervisor. In these
> + * cases this function can be called to disable hard lockup detection. This
> + * function should only be executed once by the boot processor before the
> + * kernel command line parameters are parsed, because otherwise it is not
> + * possible to override this in hardlockup_panic_setup().
> + */
> +void hardlockup_detector_disable(void)
> +{
> +	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> +}
> +
> +static int __init hardlockup_panic_setup(char *str)
> +{
> +	if (!strncmp(str, "panic", 5))
> +		hardlockup_panic = 1;
> +	else if (!strncmp(str, "nopanic", 7))
> +		hardlockup_panic = 0;
> +	else if (!strncmp(str, "0", 1))
> +		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> +	else if (!strncmp(str, "1", 1))
> +		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
> +	return 1;
> +}
> +__setup("nmi_watchdog=", hardlockup_panic_setup);
> +
>  #else
>  unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>  #endif
> -int __read_mostly nmi_watchdog_enabled;
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  int __read_mostly soft_watchdog_enabled;
> +#endif
> +
>  int __read_mostly watchdog_user_enabled;
>  int __read_mostly watchdog_thresh = 10;
>  
> @@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
>  int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>  int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>  #endif
> -static struct cpumask watchdog_cpumask __read_mostly;
> +struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>  
> -/* Helper for online, unparked cpus. */
> -#define for_each_watchdog_cpu(cpu) \
> -	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> -
> -atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
> -
>  /*
>   * The 'watchdog_running' variable is set to 1 when the watchdog threads
>   * are registered/started and is set to 0 when the watchdog threads are
> @@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
>   * of 'watchdog_running' cannot change while the watchdog is deactivated
>   * temporarily (see related code in 'proc' handlers).
>   */
> -static int __read_mostly watchdog_suspended;
> +int __read_mostly watchdog_suspended;
> +
> +/*
> + * These functions can be overridden if an architecture implements its
> + * own hardlockup detector.
> + */
> +int __weak watchdog_nmi_enable(unsigned int cpu)
> +{
> +	return 0;
> +}
> +void __weak watchdog_nmi_disable(unsigned int cpu)
> +{
> +}
> +
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
> +
> +/* Helper for online, unparked cpus. */
> +#define for_each_watchdog_cpu(cpu) \
> +	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> +
> +atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>  
>  static u64 __read_mostly sample_period;
>  
> @@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
>  	return 1;
>  }
>  __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>  static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  {
>  	sysctl_hardlockup_all_cpu_backtrace =
> @@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  #endif
> +#endif
>  
>  /*
>   * Hard-lockup warnings should be triggered after just a few seconds. Soft-
> @@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
>  	__this_cpu_write(watchdog_touch_ts, 0);
>  }
>  
> -/* watchdog detector functions */
> -bool is_hardlockup(void)
> -{
> -	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> -
> -	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> -		return true;
> -
> -	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> -	return false;
> -}
> -
>  static int is_softlockup(unsigned long touch_ts)
>  {
>  	unsigned long now = get_timestamp();
> @@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
>  	return 0;
>  }
>  
> -static void watchdog_interrupt_count(void)
> +/* watchdog detector functions */
> +bool is_hardlockup(void)
>  {
> -	__this_cpu_inc(hrtimer_interrupts);
> -}
> +	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>  
> -/*
> - * These two functions are mostly architecture specific
> - * defining them as weak here.
> - */
> -int __weak watchdog_nmi_enable(unsigned int cpu)
> -{
> -	return 0;
> +	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> +		return true;
> +
> +	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> +	return false;
>  }
> -void __weak watchdog_nmi_disable(unsigned int cpu)
> +
> +static void watchdog_interrupt_count(void)
>  {
> +	__this_cpu_inc(hrtimer_interrupts);
>  }
>  
>  static int watchdog_enable_all_cpus(void);
> @@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
>  		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
>  }
>  
> -/*
> - * Suspend the hard and soft lockup detector by parking the watchdog threads.
> - */
> -int lockup_detector_suspend(void)
> -{
> -	int ret = 0;
> -
> -	get_online_cpus();
> -	mutex_lock(&watchdog_proc_mutex);
> -	/*
> -	 * Multiple suspend requests can be active in parallel (counted by
> -	 * the 'watchdog_suspended' variable). If the watchdog threads are
> -	 * running, the first caller takes care that they will be parked.
> -	 * The state of 'watchdog_running' cannot change while a suspend
> -	 * request is active (see related code in 'proc' handlers).
> -	 */
> -	if (watchdog_running && !watchdog_suspended)
> -		ret = watchdog_park_threads();
> -
> -	if (ret == 0)
> -		watchdog_suspended++;
> -	else {
> -		watchdog_disable_all_cpus();
> -		pr_err("Failed to suspend lockup detectors, disabled\n");
> -		watchdog_enabled = 0;
> -	}
> -
> -	mutex_unlock(&watchdog_proc_mutex);
> -
> -	return ret;
> -}
> -
> -/*
> - * Resume the hard and soft lockup detector by unparking the watchdog threads.
> - */
> -void lockup_detector_resume(void)
> -{
> -	mutex_lock(&watchdog_proc_mutex);
> -
> -	watchdog_suspended--;
> -	/*
> -	 * The watchdog threads are unparked if they were previously running
> -	 * and if there is no more active suspend request.
> -	 */
> -	if (watchdog_running && !watchdog_suspended)
> -		watchdog_unpark_threads();
> -
> -	mutex_unlock(&watchdog_proc_mutex);
> -	put_online_cpus();
> -}
> -
>  static int update_watchdog_all_cpus(void)
>  {
>  	int ret;
> @@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
>  	}
>  }
>  
> +#else /* SOFTLOCKUP */
> +static int watchdog_park_threads(void)
> +{
> +	return 0;
> +}
> +
> +static void watchdog_unpark_threads(void)
> +{
> +}
> +
> +static int watchdog_enable_all_cpus(void)
> +{
> +	return 0;
> +}
> +
> +static void watchdog_disable_all_cpus(void)
> +{
> +}
> +
> +static void set_sample_period(void)
> +{
> +}
> +#endif /* SOFTLOCKUP */
> +
> +/*
> + * Suspend the hard and soft lockup detector by parking the watchdog threads.
> + */
> +int lockup_detector_suspend(void)
> +{
> +	int ret = 0;
> +
> +	get_online_cpus();
> +	mutex_lock(&watchdog_proc_mutex);
> +	/*
> +	 * Multiple suspend requests can be active in parallel (counted by
> +	 * the 'watchdog_suspended' variable). If the watchdog threads are
> +	 * running, the first caller takes care that they will be parked.
> +	 * The state of 'watchdog_running' cannot change while a suspend
> +	 * request is active (see related code in 'proc' handlers).
> +	 */
> +	if (watchdog_running && !watchdog_suspended)
> +		ret = watchdog_park_threads();
> +
> +	if (ret == 0)
> +		watchdog_suspended++;
> +	else {
> +		watchdog_disable_all_cpus();
> +		pr_err("Failed to suspend lockup detectors, disabled\n");
> +		watchdog_enabled = 0;
> +	}
> +
> +	mutex_unlock(&watchdog_proc_mutex);
> +
> +	return ret;
> +}
> +
> +/*
> + * Resume the hard and soft lockup detector by unparking the watchdog threads.
> + */
> +void lockup_detector_resume(void)
> +{
> +	mutex_lock(&watchdog_proc_mutex);
> +
> +	watchdog_suspended--;
> +	/*
> +	 * The watchdog threads are unparked if they were previously running
> +	 * and if there is no more active suspend request.
> +	 */
> +	if (watchdog_running && !watchdog_suspended)
> +		watchdog_unpark_threads();
> +
> +	mutex_unlock(&watchdog_proc_mutex);
> +	put_online_cpus();
> +}
> +
>  #ifdef CONFIG_SYSCTL
>  
>  /*
> @@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>  			 * a temporary cpumask, so we are likely not in a
>  			 * position to do much else to make things better.
>  			 */
> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  			if (smpboot_update_cpumask_percpu_thread(
>  				    &watchdog_threads, &watchdog_cpumask) != 0)
>  				pr_err("cpumask update failed\n");
> +#endif
>  		}
>  	}
>  out:
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 90d688df6ce1..295a0d84934c 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
>  
> -/* boot commands */
> -/*
> - * Should we panic when a soft-lockup or hard-lockup occurs:
> - */
> -unsigned int __read_mostly hardlockup_panic =
> -			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>  static unsigned long hardlockup_allcpu_dumped;
> -/*
> - * We may not want to enable hard lockup detection by default in all cases,
> - * for example when running the kernel as a guest on a hypervisor. In these
> - * cases this function can be called to disable hard lockup detection. This
> - * function should only be executed once by the boot processor before the
> - * kernel command line parameters are parsed, because otherwise it is not
> - * possible to override this in hardlockup_panic_setup().
> - */
> -void hardlockup_detector_disable(void)
> -{
> -	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> -}
> -
> -static int __init hardlockup_panic_setup(char *str)
> -{
> -	if (!strncmp(str, "panic", 5))
> -		hardlockup_panic = 1;
> -	else if (!strncmp(str, "nopanic", 7))
> -		hardlockup_panic = 0;
> -	else if (!strncmp(str, "0", 1))
> -		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
> -	else if (!strncmp(str, "1", 1))
> -		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
> -	return 1;
> -}
> -__setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  void arch_touch_nmi_watchdog(void)
>  {
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index e4587ebe52c7..f87a559dd178 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
>  	  The frequency of hrtimer and NMI events and the soft and hard lockup
>  	  thresholds can be controlled through the sysctl watchdog_thresh.
>  
> +config SOFTLOCKUP_DETECTOR
> +	bool "Detect Soft Lockups"
> +	depends on LOCKUP_DETECTOR
> +
> +config HARDLOCKUP_DETECTOR_PERF
> +	bool
> +	select SOFTLOCKUP_DETECTOR
> +
> +#
> +# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
> +# lockup detector rather than the perf based detector.
> +#
>  config HARDLOCKUP_DETECTOR
> -	def_bool y
> -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
> -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
> +	bool "Detect Hard Lockups"
> +	depends on LOCKUP_DETECTOR
> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
> +	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
> +	select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
>  
>  config BOOTPARAM_HARDLOCKUP_PANIC
>  	bool "Panic (Reboot) On Hard Lockups"
> @@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>  
>  config BOOTPARAM_SOFTLOCKUP_PANIC
>  	bool "Panic (Reboot) On Soft Lockups"
> -	depends on LOCKUP_DETECTOR
> +	depends on SOFTLOCKUP_DETECTOR
>  	help
>  	  Say Y here to enable the kernel to panic on "soft lockups",
>  	  which are bugs that cause the kernel to loop in kernel
> @@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>  
>  config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>  	int
> -	depends on LOCKUP_DETECTOR
> +	depends on SOFTLOCKUP_DETECTOR
>  	range 0 1
>  	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
>  	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
> @@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>  config DETECT_HUNG_TASK
>  	bool "Detect Hung Tasks"
>  	depends on DEBUG_KERNEL
> -	default LOCKUP_DETECTOR
> +	default SOFTLOCKUP_DETECTOR
>  	help
>  	  Say Y here to enable the kernel to detect "hung tasks",
>  	  which are bugs that cause the task to be stuck in
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-14 14:09                   ` Don Zickus
@ 2017-06-15  2:16                     ` Babu Moger
  2017-06-15  3:04                       ` Nicholas Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: Babu Moger @ 2017-06-15  2:16 UTC (permalink / raw)
  To: Don Zickus, Nicholas Piggin; +Cc: linux-kernel, linux-arch

Hi Don,

On 6/14/2017 9:09 AM, Don Zickus wrote:
> On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
>>> Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
>>> are getting close to have this defined properly which would allow us to
>>> split the code up correctly in the future.
>> How's this for a replacement patch 3? I think the Kconfig works out much
>> better now.
> Hi Nick,
>
> I think you made this much clearer, thank you!  I am good with this.
>
>
> Hi Babu,
>
> Can you give this patchset (and particularly this version of patch 3) a try
> on sparc to make sure we didn't break anything?  I believe this should
> resolve the start nmi watchdog on boot issue you noticed.  Thanks!

There is still one problem with the patch.

# cat /proc/sys/kernel/watchdog
1
# cat /proc/sys/kernel/nmi_watchdog
0

Problem is setting the initial value for  "nmi_watchdog"

We need something(or similar) patch on top to address this.
============================================
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5397c63..0105856 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -34,9 +34,13 @@

  int __read_mostly nmi_watchdog_enabled;

-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#if defined(CONFIG_HARDLOCKUP_DETECTOR) || 
defined(CONFIG_HAVE_NMI_WATCHDOG)
  unsigned long __read_mostly watchdog_enabled = 
SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
+#else
+unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
+#endif

+#ifdef CONFIG_HARDLOCKUP_DETECTOR
  /* boot commands */
  /*
   * Should we panic when a soft-lockup or hard-lockup occurs:
@@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
         return 1;
  }
  __setup("nmi_watchdog=", hardlockup_panic_setup);
-
-#else
-unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
  #endif

  #ifdef CONFIG_SOFTLOCKUP_DETECTOR


>
> Cheers,
> Don
>
>> --
>>
>> Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
>> HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
>>
>> LOCKUP_DETECTOR implies the general boot, sysctl, and programming
>> interfaces for the lockup detectors.
>>
>> An architecture that wants to use a hard lockup detector must define
>> HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
>>
>> Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides
>> the minimum arch_touch_nmi_watchdog, and it otherwise does its own
>> thing and does not implement the LOCKUP_DETECTOR interfaces.
>>
>> sparc is unusual in that it has started to implement some of the
>> interfaces, but not fully yet. It should probably be converted to
>> a full HAVE_HARDLOCKUP_DETECTOR_ARCH.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/Kconfig                   |  23 ++++
>>   arch/powerpc/Kconfig           |   1 +
>>   arch/powerpc/kernel/setup_64.c |   2 +-
>>   arch/x86/Kconfig               |   1 +
>>   arch/x86/kernel/apic/hw_nmi.c  |   2 +-
>>   include/linux/nmi.h            |  29 +++--
>>   kernel/Makefile                |   2 +-
>>   kernel/sysctl.c                |  16 +--
>>   kernel/watchdog.c              | 238 ++++++++++++++++++++++++++---------------
>>   kernel/watchdog_hld.c          |  32 ------
>>   lib/Kconfig.debug              |  26 +++--
>>   11 files changed, 231 insertions(+), 141 deletions(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 6c00e5b00f8b..878addc6f141 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -288,6 +288,29 @@ config HAVE_PERF_EVENTS_NMI
>>   	  subsystem.  Also has support for calculating CPU cycle events
>>   	  to determine how many clock cycles in a given period.
>>   
>> +
>> +config HAVE_HARDLOCKUP_DETECTOR_PERF
>> +	bool
>> +	depends on HAVE_PERF_EVENTS_NMI
>> +	help
>> +	  The arch chooses to use the generic perf-NMI-based hardlockup
>> +	  detector. Must define HAVE_PERF_EVENTS_NMI.
>> +
>> +config HAVE_NMI_WATCHDOG
>> +	bool
>> +	help
>> +	  The arch provides a low level NMI watchdog. It provides
>> +	  asm/nmi.h, and defines its own arch_touch_nmi_watchdog().
>> +
>> +config HAVE_HARDLOCKUP_DETECTOR_ARCH
>> +	bool
>> +	select HAVE_NMI_WATCHDOG
>> +	help
>> +	  The arch chooses to provide its own hardlockup detector, which is
>> +	  a superset of the HAVE_NMI_WATCHDOG. It also conforms to config
>> +	  interfaces and parameters provided by hardlockup detector subsystem.
>> +
>> +
>>   config HAVE_PERF_REGS
>>   	bool
>>   	help
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>> index f7c8f9972f61..7aba96ec3378 100644
>> --- a/arch/powerpc/Kconfig
>> +++ b/arch/powerpc/Kconfig
>> @@ -202,6 +202,7 @@ config PPC
>>   	select HAVE_OPTPROBES			if PPC64
>>   	select HAVE_PERF_EVENTS
>>   	select HAVE_PERF_EVENTS_NMI		if PPC64
>> +	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
>>   	select HAVE_PERF_REGS
>>   	select HAVE_PERF_USER_STACK_DUMP
>>   	select HAVE_RCU_TABLE_FREE		if SMP
>> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
>> index f35ff9dea4fb..ab650905f75a 100644
>> --- a/arch/powerpc/kernel/setup_64.c
>> +++ b/arch/powerpc/kernel/setup_64.c
>> @@ -727,7 +727,7 @@ struct ppc_pci_io ppc_pci_io;
>>   EXPORT_SYMBOL(ppc_pci_io);
>>   #endif
>>   
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>>   u64 hw_nmi_get_sample_period(int watchdog_thresh)
>>   {
>>   	return ppc_proc_freq * watchdog_thresh;
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 4ccfacc7232a..3c084149b5d1 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -157,6 +157,7 @@ config X86
>>   	select HAVE_PCSPKR_PLATFORM
>>   	select HAVE_PERF_EVENTS
>>   	select HAVE_PERF_EVENTS_NMI
>> +	select HAVE_HARDLOCKUP_DETECTOR_PERF	if HAVE_PERF_EVENTS_NMI
>>   	select HAVE_PERF_REGS
>>   	select HAVE_PERF_USER_STACK_DUMP
>>   	select HAVE_REGS_AND_STACK_ACCESS_API
>> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
>> index c73c9fb281e1..d6f387780849 100644
>> --- a/arch/x86/kernel/apic/hw_nmi.c
>> +++ b/arch/x86/kernel/apic/hw_nmi.c
>> @@ -19,7 +19,7 @@
>>   #include <linux/init.h>
>>   #include <linux/delay.h>
>>   
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>>   u64 hw_nmi_get_sample_period(int watchdog_thresh)
>>   {
>>   	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index bd387ef8bccd..8aa01fd859fb 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -11,13 +11,21 @@
>>   #endif
>>   
>>   #ifdef CONFIG_LOCKUP_DETECTOR
>> +void lockup_detector_init(void);
>> +#else
>> +static inline void lockup_detector_init(void)
>> +{
>> +}
>> +#endif
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>>   extern void touch_softlockup_watchdog_sched(void);
>>   extern void touch_softlockup_watchdog(void);
>>   extern void touch_softlockup_watchdog_sync(void);
>>   extern void touch_all_softlockup_watchdogs(void);
>>   extern unsigned int  softlockup_panic;
>> -extern unsigned int  hardlockup_panic;
>> -void lockup_detector_init(void);
>> +extern int soft_watchdog_enabled;
>> +extern atomic_t watchdog_park_in_progress;
>>   #else
>>   static inline void touch_softlockup_watchdog_sched(void)
>>   {
>> @@ -31,9 +39,6 @@ static inline void touch_softlockup_watchdog_sync(void)
>>   static inline void touch_all_softlockup_watchdogs(void)
>>   {
>>   }
>> -static inline void lockup_detector_init(void)
>> -{
>> -}
>>   #endif
>>   
>>   #ifdef CONFIG_DETECT_HUNG_TASK
>> @@ -63,15 +68,18 @@ static inline void reset_hung_task_detector(void)
>>   
>>   #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>   extern void hardlockup_detector_disable(void);
>> +extern unsigned int hardlockup_panic;
>>   #else
>>   static inline void hardlockup_detector_disable(void) {}
>>   #endif
>>   
>> -#if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
>> +#if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>>   extern void arch_touch_nmi_watchdog(void);
>>   #else
>> +#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
>>   static inline void arch_touch_nmi_watchdog(void) {}
>>   #endif
>> +#endif
>>   
>>   /**
>>    * touch_nmi_watchdog - restart NMI watchdog timeout.
>> @@ -141,15 +149,18 @@ static inline bool trigger_single_cpu_backtrace(int cpu)
>>   }
>>   #endif
>>   
>> -#ifdef CONFIG_LOCKUP_DETECTOR
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_PERF
>>   u64 hw_nmi_get_sample_period(int watchdog_thresh);
>> +#endif
>> +
>> +#ifdef CONFIG_LOCKUP_DETECTOR
>>   extern int nmi_watchdog_enabled;
>> -extern int soft_watchdog_enabled;
>>   extern int watchdog_user_enabled;
>>   extern int watchdog_thresh;
>>   extern unsigned long watchdog_enabled;
>> +extern struct cpumask watchdog_cpumask;
>>   extern unsigned long *watchdog_cpumask_bits;
>> -extern atomic_t watchdog_park_in_progress;
>> +extern int __read_mostly watchdog_suspended;
>>   #ifdef CONFIG_SMP
>>   extern int sysctl_softlockup_all_cpu_backtrace;
>>   extern int sysctl_hardlockup_all_cpu_backtrace;
>> diff --git a/kernel/Makefile b/kernel/Makefile
>> index 72aa080f91f0..4cb8e8b23c6e 100644
>> --- a/kernel/Makefile
>> +++ b/kernel/Makefile
>> @@ -82,7 +82,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o
>>   obj-$(CONFIG_KGDB) += debug/
>>   obj-$(CONFIG_DETECT_HUNG_TASK) += hung_task.o
>>   obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
>> -obj-$(CONFIG_HARDLOCKUP_DETECTOR) += watchdog_hld.o
>> +obj-$(CONFIG_HARDLOCKUP_DETECTOR_PERF) += watchdog_hld.o
>>   obj-$(CONFIG_SECCOMP) += seccomp.o
>>   obj-$(CONFIG_RELAY) += relay.o
>>   obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 4dfba1a76cc3..46b189fd865b 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -880,6 +880,14 @@ static struct ctl_table kern_table[] = {
>>   #endif
>>   	},
>>   	{
>> +		.procname	= "watchdog_cpumask",
>> +		.data		= &watchdog_cpumask_bits,
>> +		.maxlen		= NR_CPUS,
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_watchdog_cpumask,
>> +	},
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> +	{
>>   		.procname       = "soft_watchdog",
>>   		.data           = &soft_watchdog_enabled,
>>   		.maxlen         = sizeof (int),
>> @@ -889,13 +897,6 @@ static struct ctl_table kern_table[] = {
>>   		.extra2		= &one,
>>   	},
>>   	{
>> -		.procname	= "watchdog_cpumask",
>> -		.data		= &watchdog_cpumask_bits,
>> -		.maxlen		= NR_CPUS,
>> -		.mode		= 0644,
>> -		.proc_handler	= proc_watchdog_cpumask,
>> -	},
>> -	{
>>   		.procname	= "softlockup_panic",
>>   		.data		= &softlockup_panic,
>>   		.maxlen		= sizeof(int),
>> @@ -904,6 +905,7 @@ static struct ctl_table kern_table[] = {
>>   		.extra1		= &zero,
>>   		.extra2		= &one,
>>   	},
>> +#endif
>>   #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>   	{
>>   		.procname	= "hardlockup_panic",
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 03e0b69bb5bf..deb010505646 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -29,15 +29,55 @@
>>   #include <linux/kvm_para.h>
>>   #include <linux/kthread.h>
>>   
>> +/* Watchdog configuration */
>>   static DEFINE_MUTEX(watchdog_proc_mutex);
>>   
>> -#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
>> +int __read_mostly nmi_watchdog_enabled;
>> +
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>>   unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
>> +
>> +/* boot commands */
>> +/*
>> + * Should we panic when a soft-lockup or hard-lockup occurs:
>> + */
>> +unsigned int __read_mostly hardlockup_panic =
>> +			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>> +/*
>> + * We may not want to enable hard lockup detection by default in all cases,
>> + * for example when running the kernel as a guest on a hypervisor. In these
>> + * cases this function can be called to disable hard lockup detection. This
>> + * function should only be executed once by the boot processor before the
>> + * kernel command line parameters are parsed, because otherwise it is not
>> + * possible to override this in hardlockup_panic_setup().
>> + */
>> +void hardlockup_detector_disable(void)
>> +{
>> +	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> +}
>> +
>> +static int __init hardlockup_panic_setup(char *str)
>> +{
>> +	if (!strncmp(str, "panic", 5))
>> +		hardlockup_panic = 1;
>> +	else if (!strncmp(str, "nopanic", 7))
>> +		hardlockup_panic = 0;
>> +	else if (!strncmp(str, "0", 1))
>> +		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> +	else if (!strncmp(str, "1", 1))
>> +		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
>> +	return 1;
>> +}
>> +__setup("nmi_watchdog=", hardlockup_panic_setup);
>> +
>>   #else
>>   unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>>   #endif
>> -int __read_mostly nmi_watchdog_enabled;
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>>   int __read_mostly soft_watchdog_enabled;
>> +#endif
>> +
>>   int __read_mostly watchdog_user_enabled;
>>   int __read_mostly watchdog_thresh = 10;
>>   
>> @@ -45,15 +85,9 @@ int __read_mostly watchdog_thresh = 10;
>>   int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>>   int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
>>   #endif
>> -static struct cpumask watchdog_cpumask __read_mostly;
>> +struct cpumask watchdog_cpumask __read_mostly;
>>   unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>>   
>> -/* Helper for online, unparked cpus. */
>> -#define for_each_watchdog_cpu(cpu) \
>> -	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>> -
>> -atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>> -
>>   /*
>>    * The 'watchdog_running' variable is set to 1 when the watchdog threads
>>    * are registered/started and is set to 0 when the watchdog threads are
>> @@ -72,7 +106,27 @@ static int __read_mostly watchdog_running;
>>    * of 'watchdog_running' cannot change while the watchdog is deactivated
>>    * temporarily (see related code in 'proc' handlers).
>>    */
>> -static int __read_mostly watchdog_suspended;
>> +int __read_mostly watchdog_suspended;
>> +
>> +/*
>> + * These functions can be overridden if an architecture implements its
>> + * own hardlockup detector.
>> + */
>> +int __weak watchdog_nmi_enable(unsigned int cpu)
>> +{
>> +	return 0;
>> +}
>> +void __weak watchdog_nmi_disable(unsigned int cpu)
>> +{
>> +}
>> +
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>> +
>> +/* Helper for online, unparked cpus. */
>> +#define for_each_watchdog_cpu(cpu) \
>> +	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>> +
>> +atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
>>   
>>   static u64 __read_mostly sample_period;
>>   
>> @@ -120,6 +174,7 @@ static int __init softlockup_all_cpu_backtrace_setup(char *str)
>>   	return 1;
>>   }
>>   __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>>   static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>   {
>>   	sysctl_hardlockup_all_cpu_backtrace =
>> @@ -128,6 +183,7 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>   }
>>   __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>   #endif
>> +#endif
>>   
>>   /*
>>    * Hard-lockup warnings should be triggered after just a few seconds. Soft-
>> @@ -213,18 +269,6 @@ void touch_softlockup_watchdog_sync(void)
>>   	__this_cpu_write(watchdog_touch_ts, 0);
>>   }
>>   
>> -/* watchdog detector functions */
>> -bool is_hardlockup(void)
>> -{
>> -	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>> -
>> -	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
>> -		return true;
>> -
>> -	__this_cpu_write(hrtimer_interrupts_saved, hrint);
>> -	return false;
>> -}
>> -
>>   static int is_softlockup(unsigned long touch_ts)
>>   {
>>   	unsigned long now = get_timestamp();
>> @@ -237,21 +281,21 @@ static int is_softlockup(unsigned long touch_ts)
>>   	return 0;
>>   }
>>   
>> -static void watchdog_interrupt_count(void)
>> +/* watchdog detector functions */
>> +bool is_hardlockup(void)
>>   {
>> -	__this_cpu_inc(hrtimer_interrupts);
>> -}
>> +	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
>>   
>> -/*
>> - * These two functions are mostly architecture specific
>> - * defining them as weak here.
>> - */
>> -int __weak watchdog_nmi_enable(unsigned int cpu)
>> -{
>> -	return 0;
>> +	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
>> +		return true;
>> +
>> +	__this_cpu_write(hrtimer_interrupts_saved, hrint);
>> +	return false;
>>   }
>> -void __weak watchdog_nmi_disable(unsigned int cpu)
>> +
>> +static void watchdog_interrupt_count(void)
>>   {
>> +	__this_cpu_inc(hrtimer_interrupts);
>>   }
>>   
>>   static int watchdog_enable_all_cpus(void);
>> @@ -502,57 +546,6 @@ static void watchdog_unpark_threads(void)
>>   		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
>>   }
>>   
>> -/*
>> - * Suspend the hard and soft lockup detector by parking the watchdog threads.
>> - */
>> -int lockup_detector_suspend(void)
>> -{
>> -	int ret = 0;
>> -
>> -	get_online_cpus();
>> -	mutex_lock(&watchdog_proc_mutex);
>> -	/*
>> -	 * Multiple suspend requests can be active in parallel (counted by
>> -	 * the 'watchdog_suspended' variable). If the watchdog threads are
>> -	 * running, the first caller takes care that they will be parked.
>> -	 * The state of 'watchdog_running' cannot change while a suspend
>> -	 * request is active (see related code in 'proc' handlers).
>> -	 */
>> -	if (watchdog_running && !watchdog_suspended)
>> -		ret = watchdog_park_threads();
>> -
>> -	if (ret == 0)
>> -		watchdog_suspended++;
>> -	else {
>> -		watchdog_disable_all_cpus();
>> -		pr_err("Failed to suspend lockup detectors, disabled\n");
>> -		watchdog_enabled = 0;
>> -	}
>> -
>> -	mutex_unlock(&watchdog_proc_mutex);
>> -
>> -	return ret;
>> -}
>> -
>> -/*
>> - * Resume the hard and soft lockup detector by unparking the watchdog threads.
>> - */
>> -void lockup_detector_resume(void)
>> -{
>> -	mutex_lock(&watchdog_proc_mutex);
>> -
>> -	watchdog_suspended--;
>> -	/*
>> -	 * The watchdog threads are unparked if they were previously running
>> -	 * and if there is no more active suspend request.
>> -	 */
>> -	if (watchdog_running && !watchdog_suspended)
>> -		watchdog_unpark_threads();
>> -
>> -	mutex_unlock(&watchdog_proc_mutex);
>> -	put_online_cpus();
>> -}
>> -
>>   static int update_watchdog_all_cpus(void)
>>   {
>>   	int ret;
>> @@ -604,6 +597,81 @@ static void watchdog_disable_all_cpus(void)
>>   	}
>>   }
>>   
>> +#else /* SOFTLOCKUP */
>> +static int watchdog_park_threads(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void watchdog_unpark_threads(void)
>> +{
>> +}
>> +
>> +static int watchdog_enable_all_cpus(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void watchdog_disable_all_cpus(void)
>> +{
>> +}
>> +
>> +static void set_sample_period(void)
>> +{
>> +}
>> +#endif /* SOFTLOCKUP */
>> +
>> +/*
>> + * Suspend the hard and soft lockup detector by parking the watchdog threads.
>> + */
>> +int lockup_detector_suspend(void)
>> +{
>> +	int ret = 0;
>> +
>> +	get_online_cpus();
>> +	mutex_lock(&watchdog_proc_mutex);
>> +	/*
>> +	 * Multiple suspend requests can be active in parallel (counted by
>> +	 * the 'watchdog_suspended' variable). If the watchdog threads are
>> +	 * running, the first caller takes care that they will be parked.
>> +	 * The state of 'watchdog_running' cannot change while a suspend
>> +	 * request is active (see related code in 'proc' handlers).
>> +	 */
>> +	if (watchdog_running && !watchdog_suspended)
>> +		ret = watchdog_park_threads();
>> +
>> +	if (ret == 0)
>> +		watchdog_suspended++;
>> +	else {
>> +		watchdog_disable_all_cpus();
>> +		pr_err("Failed to suspend lockup detectors, disabled\n");
>> +		watchdog_enabled = 0;
>> +	}
>> +
>> +	mutex_unlock(&watchdog_proc_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +/*
>> + * Resume the hard and soft lockup detector by unparking the watchdog threads.
>> + */
>> +void lockup_detector_resume(void)
>> +{
>> +	mutex_lock(&watchdog_proc_mutex);
>> +
>> +	watchdog_suspended--;
>> +	/*
>> +	 * The watchdog threads are unparked if they were previously running
>> +	 * and if there is no more active suspend request.
>> +	 */
>> +	if (watchdog_running && !watchdog_suspended)
>> +		watchdog_unpark_threads();
>> +
>> +	mutex_unlock(&watchdog_proc_mutex);
>> +	put_online_cpus();
>> +}
>> +
>>   #ifdef CONFIG_SYSCTL
>>   
>>   /*
>> @@ -810,9 +878,11 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>>   			 * a temporary cpumask, so we are likely not in a
>>   			 * position to do much else to make things better.
>>   			 */
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR
>>   			if (smpboot_update_cpumask_percpu_thread(
>>   				    &watchdog_threads, &watchdog_cpumask) != 0)
>>   				pr_err("cpumask update failed\n");
>> +#endif
>>   		}
>>   	}
>>   out:
>> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
>> index 90d688df6ce1..295a0d84934c 100644
>> --- a/kernel/watchdog_hld.c
>> +++ b/kernel/watchdog_hld.c
>> @@ -22,39 +22,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>>   static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>>   static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
>>   
>> -/* boot commands */
>> -/*
>> - * Should we panic when a soft-lockup or hard-lockup occurs:
>> - */
>> -unsigned int __read_mostly hardlockup_panic =
>> -			CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE;
>>   static unsigned long hardlockup_allcpu_dumped;
>> -/*
>> - * We may not want to enable hard lockup detection by default in all cases,
>> - * for example when running the kernel as a guest on a hypervisor. In these
>> - * cases this function can be called to disable hard lockup detection. This
>> - * function should only be executed once by the boot processor before the
>> - * kernel command line parameters are parsed, because otherwise it is not
>> - * possible to override this in hardlockup_panic_setup().
>> - */
>> -void hardlockup_detector_disable(void)
>> -{
>> -	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> -}
>> -
>> -static int __init hardlockup_panic_setup(char *str)
>> -{
>> -	if (!strncmp(str, "panic", 5))
>> -		hardlockup_panic = 1;
>> -	else if (!strncmp(str, "nopanic", 7))
>> -		hardlockup_panic = 0;
>> -	else if (!strncmp(str, "0", 1))
>> -		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
>> -	else if (!strncmp(str, "1", 1))
>> -		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
>> -	return 1;
>> -}
>> -__setup("nmi_watchdog=", hardlockup_panic_setup);
>>   
>>   void arch_touch_nmi_watchdog(void)
>>   {
>> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
>> index e4587ebe52c7..f87a559dd178 100644
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -801,10 +801,24 @@ config LOCKUP_DETECTOR
>>   	  The frequency of hrtimer and NMI events and the soft and hard lockup
>>   	  thresholds can be controlled through the sysctl watchdog_thresh.
>>   
>> +config SOFTLOCKUP_DETECTOR
>> +	bool "Detect Soft Lockups"
>> +	depends on LOCKUP_DETECTOR
>> +
>> +config HARDLOCKUP_DETECTOR_PERF
>> +	bool
>> +	select SOFTLOCKUP_DETECTOR
>> +
>> +#
>> +# arch/ can define HAVE_HARDLOCKUP_DETECTOR_ARCH to provide their own hard
>> +# lockup detector rather than the perf based detector.
>> +#
>>   config HARDLOCKUP_DETECTOR
>> -	def_bool y
>> -	depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
>> -	depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI
>> +	bool "Detect Hard Lockups"
>> +	depends on LOCKUP_DETECTOR
>> +	depends on HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_ARCH
>> +	select HARDLOCKUP_DETECTOR_PERF if HAVE_HARDLOCKUP_DETECTOR_PERF
>> +	select HARDLOCKUP_DETECTOR_ARCH if HAVE_HARDLOCKUP_DETECTOR_ARCH
>>   
>>   config BOOTPARAM_HARDLOCKUP_PANIC
>>   	bool "Panic (Reboot) On Hard Lockups"
>> @@ -826,7 +840,7 @@ config BOOTPARAM_HARDLOCKUP_PANIC_VALUE
>>   
>>   config BOOTPARAM_SOFTLOCKUP_PANIC
>>   	bool "Panic (Reboot) On Soft Lockups"
>> -	depends on LOCKUP_DETECTOR
>> +	depends on SOFTLOCKUP_DETECTOR
>>   	help
>>   	  Say Y here to enable the kernel to panic on "soft lockups",
>>   	  which are bugs that cause the kernel to loop in kernel
>> @@ -843,7 +857,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC
>>   
>>   config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>>   	int
>> -	depends on LOCKUP_DETECTOR
>> +	depends on SOFTLOCKUP_DETECTOR
>>   	range 0 1
>>   	default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
>>   	default 1 if BOOTPARAM_SOFTLOCKUP_PANIC
>> @@ -851,7 +865,7 @@ config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
>>   config DETECT_HUNG_TASK
>>   	bool "Detect Hung Tasks"
>>   	depends on DEBUG_KERNEL
>> -	default LOCKUP_DETECTOR
>> +	default SOFTLOCKUP_DETECTOR
>>   	help
>>   	  Say Y here to enable the kernel to detect "hung tasks",
>>   	  which are bugs that cause the task to be stuck in
>> -- 
>> 2.11.0
>>

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-15  2:16                     ` Babu Moger
@ 2017-06-15  3:04                       ` Nicholas Piggin
  2017-06-15 15:14                         ` Babu Moger
  2017-06-15 15:51                         ` Don Zickus
  0 siblings, 2 replies; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-15  3:04 UTC (permalink / raw)
  To: Babu Moger; +Cc: Don Zickus, linux-kernel, linux-arch

On Wed, 14 Jun 2017 21:16:04 -0500
Babu Moger <babu.moger@oracle.com> wrote:

> Hi Don,
> 
> On 6/14/2017 9:09 AM, Don Zickus wrote:
> > On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:  
> >>> Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
> >>> are getting close to have this defined properly which would allow us to
> >>> split the code up correctly in the future.  
> >> How's this for a replacement patch 3? I think the Kconfig works out much
> >> better now.  
> > Hi Nick,
> >
> > I think you made this much clearer, thank you!  I am good with this.
> >
> >
> > Hi Babu,
> >
> > Can you give this patchset (and particularly this version of patch 3) a try
> > on sparc to make sure we didn't break anything?  I believe this should
> > resolve the start nmi watchdog on boot issue you noticed.  Thanks!  
> 
> There is still one problem with the patch.
> 
> # cat /proc/sys/kernel/watchdog
> 1
> # cat /proc/sys/kernel/nmi_watchdog
> 0
> 
> Problem is setting the initial value for  "nmi_watchdog"
> 
> We need something(or similar) patch on top to address this.
> ============================================
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5397c63..0105856 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -34,9 +34,13 @@
> 
>   int __read_mostly nmi_watchdog_enabled;
> 
> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +#if defined(CONFIG_HARDLOCKUP_DETECTOR) || 
> defined(CONFIG_HAVE_NMI_WATCHDOG)
>   unsigned long __read_mostly watchdog_enabled = 
> SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
> +#else
> +unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> +#endif
> 
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>   /* boot commands */
>   /*
>    * Should we panic when a soft-lockup or hard-lockup occurs:
> @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
>          return 1;
>   }
>   __setup("nmi_watchdog=", hardlockup_panic_setup);
> -
> -#else
> -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>   #endif
> 
>   #ifdef CONFIG_SOFTLOCKUP_DETECTOR

Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
also relies on the watchdog_enabled value.

I guess I can fold your incremental patch in. I hope we could get
sparc quickly to adopt the complate HAVE_HARDLOCKUP_DETECTOR_ARCH soon
afterwards though, so we only have 2 cases -- complete hardlockup
detector, or the very bare minimum NMI_WATCHDOG.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-15  3:04                       ` Nicholas Piggin
@ 2017-06-15 15:14                         ` Babu Moger
  2017-06-15 15:51                         ` Don Zickus
  1 sibling, 0 replies; 23+ messages in thread
From: Babu Moger @ 2017-06-15 15:14 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Don Zickus, linux-kernel, linux-arch

Nick,

On 6/14/2017 10:04 PM, Nicholas Piggin wrote:
> On Wed, 14 Jun 2017 21:16:04 -0500
> Babu Moger <babu.moger@oracle.com> wrote:
>
>> Hi Don,
>>
>> On 6/14/2017 9:09 AM, Don Zickus wrote:
>>> On Wed, Jun 14, 2017 at 02:11:18AM +1000, Nicholas Piggin wrote:
>>>>> Yeah, if you wouldn't mind.  Sorry for dragging this out, but I feel like we
>>>>> are getting close to have this defined properly which would allow us to
>>>>> split the code up correctly in the future.
>>>> How's this for a replacement patch 3? I think the Kconfig works out much
>>>> better now.
>>> Hi Nick,
>>>
>>> I think you made this much clearer, thank you!  I am good with this.
>>>
>>>
>>> Hi Babu,
>>>
>>> Can you give this patchset (and particularly this version of patch 3) a try
>>> on sparc to make sure we didn't break anything?  I believe this should
>>> resolve the start nmi watchdog on boot issue you noticed.  Thanks!
>> There is still one problem with the patch.
>>
>> # cat /proc/sys/kernel/watchdog
>> 1
>> # cat /proc/sys/kernel/nmi_watchdog
>> 0
>>
>> Problem is setting the initial value for  "nmi_watchdog"
>>
>> We need something(or similar) patch on top to address this.
>> ============================================
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 5397c63..0105856 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -34,9 +34,13 @@
>>
>>    int __read_mostly nmi_watchdog_enabled;
>>
>> -#ifdef CONFIG_HARDLOCKUP_DETECTOR
>> +#if defined(CONFIG_HARDLOCKUP_DETECTOR) ||
>> defined(CONFIG_HAVE_NMI_WATCHDOG)
>>    unsigned long __read_mostly watchdog_enabled =
>> SOFT_WATCHDOG_ENABLED|NMI_WATCHDOG_ENABLED;
>> +#else
>> +unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>> +#endif
>>
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
>>    /* boot commands */
>>    /*
>>     * Should we panic when a soft-lockup or hard-lockup occurs:
>> @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
>>           return 1;
>>    }
>>    __setup("nmi_watchdog=", hardlockup_panic_setup);
>> -
>> -#else
>> -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
>>    #endif
>>
>>    #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> also relies on the watchdog_enabled value.
>
> I guess I can fold your incremental patch in. I hope we could get
  Sure. Please go ahead.
> sparc quickly to adopt the complate HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> afterwards though, so we only have 2 cases -- complete hardlockup
Sure. Sounds good. Will look at it later.
> detector, or the very bare minimum NMI_WATCHDOG.
>
> Thanks,
> Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-15  3:04                       ` Nicholas Piggin
  2017-06-15 15:14                         ` Babu Moger
@ 2017-06-15 15:51                         ` Don Zickus
  2017-06-15 15:59                           ` Nicholas Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Don Zickus @ 2017-06-15 15:51 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Babu Moger, linux-kernel, linux-arch

On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> >   /* boot commands */
> >   /*
> >    * Should we panic when a soft-lockup or hard-lockup occurs:
> > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> >          return 1;
> >   }
> >   __setup("nmi_watchdog=", hardlockup_panic_setup);
> > -
> > -#else
> > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> >   #endif
> > 
> >   #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> 
> Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> also relies on the watchdog_enabled value.
> 
> I guess I can fold your incremental patch in. I hope we could get
> sparc quickly to adopt the complate HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> afterwards though, so we only have 2 cases -- complete hardlockup
> detector, or the very bare minimum NMI_WATCHDOG.

Hi Nick,

I agree.  Let's move forward with this temp fix just to get things in the
kernel for initial testing.  Then follow up with a cleanup patch.  The idea
is we can always revert the cleanup patch if things still don't quite work.

Thoughts?

Cheers,
Don

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-15 15:51                         ` Don Zickus
@ 2017-06-15 15:59                           ` Nicholas Piggin
  2017-06-15 18:42                             ` Don Zickus
  0 siblings, 1 reply; 23+ messages in thread
From: Nicholas Piggin @ 2017-06-15 15:59 UTC (permalink / raw)
  To: Don Zickus; +Cc: Babu Moger, linux-kernel, linux-arch

On Thu, 15 Jun 2017 11:51:22 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > >   /* boot commands */
> > >   /*
> > >    * Should we panic when a soft-lockup or hard-lockup occurs:
> > > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > >          return 1;
> > >   }
> > >   __setup("nmi_watchdog=", hardlockup_panic_setup);
> > > -
> > > -#else
> > > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> > >   #endif
> > > 
> > >   #ifdef CONFIG_SOFTLOCKUP_DETECTOR  
> > 
> > Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> > also relies on the watchdog_enabled value.
> > 
> > I guess I can fold your incremental patch in. I hope we could get
> > sparc quickly to adopt the complate HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> > afterwards though, so we only have 2 cases -- complete hardlockup
> > detector, or the very bare minimum NMI_WATCHDOG.  
> 
> Hi Nick,
> 
> I agree.  Let's move forward with this temp fix just to get things in the
> kernel for initial testing.  Then follow up with a cleanup patch.  The idea
> is we can always revert the cleanup patch if things still don't quite work.
> 
> Thoughts?

Hi Don,

Yeah that sounds good to me. Would you like me to re-test things
and resend the series?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/4] watchdog: Split up config options
  2017-06-15 15:59                           ` Nicholas Piggin
@ 2017-06-15 18:42                             ` Don Zickus
  0 siblings, 0 replies; 23+ messages in thread
From: Don Zickus @ 2017-06-15 18:42 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Babu Moger, linux-kernel, linux-arch

On Fri, Jun 16, 2017 at 01:59:00AM +1000, Nicholas Piggin wrote:
> On Thu, 15 Jun 2017 11:51:22 -0400
> Don Zickus <dzickus@redhat.com> wrote:
> 
> > On Thu, Jun 15, 2017 at 01:04:01PM +1000, Nicholas Piggin wrote:
> > > > +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> > > >   /* boot commands */
> > > >   /*
> > > >    * Should we panic when a soft-lockup or hard-lockup occurs:
> > > > @@ -69,9 +73,6 @@ static int __init hardlockup_panic_setup(char *str)
> > > >          return 1;
> > > >   }
> > > >   __setup("nmi_watchdog=", hardlockup_panic_setup);
> > > > -
> > > > -#else
> > > > -unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
> > > >   #endif
> > > > 
> > > >   #ifdef CONFIG_SOFTLOCKUP_DETECTOR  
> > > 
> > > Hmm, I guess I missed this because sparc parses nmi_watchdog=, but it
> > > also relies on the watchdog_enabled value.
> > > 
> > > I guess I can fold your incremental patch in. I hope we could get
> > > sparc quickly to adopt the complate HAVE_HARDLOCKUP_DETECTOR_ARCH soon
> > > afterwards though, so we only have 2 cases -- complete hardlockup
> > > detector, or the very bare minimum NMI_WATCHDOG.  
> > 
> > Hi Nick,
> > 
> > I agree.  Let's move forward with this temp fix just to get things in the
> > kernel for initial testing.  Then follow up with a cleanup patch.  The idea
> > is we can always revert the cleanup patch if things still don't quite work.
> > 
> > Thoughts?
> 
> Hi Don,
> 
> Yeah that sounds good to me. Would you like me to re-test things
> and resend the series?

Yes, please.  Thanks!

Cheers,
Don

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-06-15 18:42 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-30  1:26 [PATCH 0/4][V3] Improve watchdog config for arch watchdogs Nicholas Piggin
2017-05-30  1:26 ` [PATCH 1/4] watchdog: remove unused declaration Nicholas Piggin
2017-05-30  1:26 ` [PATCH 2/4] watchdog: Introduce arch_touch_nmi_watchdog() Nicholas Piggin
2017-05-30  1:26 ` [PATCH 3/4] watchdog: Split up config options Nicholas Piggin
2017-06-02 20:15   ` Don Zickus
2017-06-03  6:10     ` Nicholas Piggin
2017-06-06 16:49       ` Don Zickus
2017-06-07  3:50         ` Nicholas Piggin
2017-06-08 16:05           ` Don Zickus
2017-06-12  8:07             ` Nicholas Piggin
2017-06-12 20:41               ` Don Zickus
2017-06-13 16:11                 ` Nicholas Piggin
2017-06-14 14:09                   ` Don Zickus
2017-06-15  2:16                     ` Babu Moger
2017-06-15  3:04                       ` Nicholas Piggin
2017-06-15 15:14                         ` Babu Moger
2017-06-15 15:51                         ` Don Zickus
2017-06-15 15:59                           ` Nicholas Piggin
2017-06-15 18:42                             ` Don Zickus
2017-05-30  1:26 ` [PATCH 4/4] watchdog: Provide watchdog_reconfigure() for arch watchdogs Nicholas Piggin
2017-06-06 16:08 ` [PATCH 0/4][V3] Improve watchdog config " Don Zickus
2017-06-06 19:46   ` Babu Moger
2017-06-07 14:37     ` Don Zickus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.