* [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
@ 2017-08-31  7:15 Thomas Gleixner
  2017-08-31  7:15 ` [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
                   ` (30 more replies)
  0 siblings, 31 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:15 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

The lockup detector is broken in several ways:

    - It's deadlock prone vs. CPU hotplug in various ways. Some of these
      are due to recursive cpus_read_lock(), others are due to
      cpus_read_lock() from CPU hotplug callbacks, which immediately locks
      up the machine because the CPUs are write locked.

    - The handling of the watchdog threads on CPU hotplug happens sideways
      to the smpboot thread infrastructure, which is racy and pointless.

    - The handling of the user space sysctl interface is a complete
      trainwreck as it fiddles directly with variables which can be
      modified or evaluated by the running watchdogs.

    - The perf event initialization is a steaming pile of duct tape as it
      idiotically tries to create perf events over and over even if perf is
      not functional (no hardware, ....). To avoid excessive dmesg spam it
      contains magic printk ratelimiting along with either wrong or useless
      messages.

    - The code structure is horrible as ifdef sections are scattered all
      over the place, which makes it unreadable.

    - There is more wreckage, but see the changelogs for the ugly details.

Before I get utterly grumpy, I just pretend that I don't give a sh*t!

The following series sanitizes the facility and addresses the problems.
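
For illustration, a minimal sketch of the deadlock class cured here; the
callback name is invented, the pattern is what the current code ends up
doing:

    /* Illustrative only: a hotplug callback taking the hotplug lock */
    static int example_online_cpu(unsigned int cpu)
    {
        /*
         * Hotplug callbacks run with the hotplug percpu rwsem write
         * locked. Taking the read side here can never succeed, so
         * the machine locks up instantly.
         */
        cpus_read_lock();
        cpus_read_unlock();
        return 0;
    }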

Thanks,

	tglx
---
 arch/parisc/kernel/process.c   |    2 
 arch/powerpc/kernel/watchdog.c |   22 -
 arch/x86/events/intel/core.c   |   11 
 include/linux/nmi.h            |  121 +++----
 include/linux/smpboot.h        |    4 
 kernel/cpu.c                   |    6 
 kernel/smpboot.c               |   22 -
 kernel/sysctl.c                |   22 -
 kernel/watchdog.c              |  638 ++++++++++++++---------------------------
 kernel/watchdog_hld.c          |  193 ++++++------
 10 files changed, 433 insertions(+), 608 deletions(-)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
@ 2017-08-31  7:15 ` Thomas Gleixner
  2017-09-06 16:14   ` Borislav Petkov
  2017-08-31  7:16 ` [patch 02/29] perf/x86/intel: Sanitize PMU HT bug workaround Thomas Gleixner
                   ` (29 subsequent siblings)
  30 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:15 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: hardlockup_detector--Provide-interface-to-stop-start-perf-events.patch --]
[-- Type: text/plain, Size: 2065 bytes --]

From: Peter Zijlstra <peterz@infradead.org>

Provide an interface to stop and restart perf NMI watchdog events on all
CPUs. This is only usable during init, in particular for handling the perf
HT bug on Intel machines. It's safe to use this way as nothing can
start/stop the NMI watchdog in parallel.
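
A minimal usage sketch; this is the pattern which the x86 HT bug fixup in
a later patch follows:

    cpus_read_lock();
    hardlockup_detector_perf_stop();
    /* modify PMU state which must not race with the watchdog NMI */
    hardlockup_detector_perf_restart();
    cpus_read_unlock();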

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h   |    4 ++++
 kernel/watchdog_hld.c |   41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -75,7 +75,11 @@ static inline void hardlockup_detector_d
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void arch_touch_nmi_watchdog(void);
+extern void hardlockup_detector_perf_stop(void);
+extern void hardlockup_detector_perf_restart(void);
 #else
+static inline void hardlockup_detector_perf_stop(void) { }
+static inline void hardlockup_detector_perf_restart(void) { }
 #if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -261,3 +261,44 @@ void watchdog_nmi_disable(unsigned int c
 			firstcpu_err = 0;
 	}
 }
+
+/**
+ * hardlockup_detector_perf_stop - Globally stop watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_stop(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_disable(event);
+	}
+}
+
+/**
+ * hardlockup_detector_perf_restart - Globally restart watchdog events
+ *
+ * Special interface for x86 to handle the perf HT bug.
+ */
+void __init hardlockup_detector_perf_restart(void)
+{
+	int cpu;
+
+	lockdep_assert_cpus_held();
+
+	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
+		return;
+
+	for_each_online_cpu(cpu) {
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
+
+		if (event)
+			perf_event_enable(event);
+	}
+}

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 02/29] perf/x86/intel: Sanitize PMU HT bug workaround
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
  2017-08-31  7:15 ` [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 03/29] lockup_detector: Provide interface to stop from poweroff() Thomas Gleixner
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: perf-x86-intel--Sanitize-PMU-HT-bug-workaround.patch --]
[-- Type: text/plain, Size: 1236 bytes --]

From: Peter Zijlstra <peterz@infradead.org>

The lockup_detector_suspend/resume() interface is broken in several ways
especially as it results in recursive locking of the CPU hotplug lock.

Use the new stop/restart interface in the perf NMI watchdog to temporarily
disable and reenable the already active watchdog events. That's enough to
handle it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/events/intel/core.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4318,10 +4318,9 @@ static __init int fixup_ht_bug(void)
 		return 0;
 	}
 
-	if (lockup_detector_suspend() != 0) {
-		pr_debug("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
-		return 0;
-	}
+	cpus_read_lock();
+
+	hardlockup_detector_perf_stop();
 
 	x86_pmu.flags &= ~(PMU_FL_EXCL_CNTRS | PMU_FL_EXCL_ENABLED);
 
@@ -4329,9 +4328,7 @@ static __init int fixup_ht_bug(void)
 	x86_pmu.commit_scheduling = NULL;
 	x86_pmu.stop_scheduling = NULL;
 
-	lockup_detector_resume();
-
-	cpus_read_lock();
+	hardlockup_detector_perf_restart();
 
 	for_each_online_cpu(c)
 		free_excl_cntrs(c);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 03/29] lockup_detector: Provide interface to stop from poweroff()
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
  2017-08-31  7:15 ` [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
  2017-08-31  7:16 ` [patch 02/29] perf/x86/intel: Sanitize PMU HT bug workaround Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 04/29] parisc: Use lockup_detector_stop() Thomas Gleixner
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Helge Deller, linux-parisc

[-- Attachment #1: lockup_detector--Provide-interface-to-stop.patch --]
[-- Type: text/plain, Size: 1778 bytes --]

PARISC has a busy looping power off routine. If the watchdog is enabled,
the watchdog timer still fires, but the watchdog thread is not running,
which causes the softlockup watchdog to trigger.

Provide an interface which allows turning the watchdog off.
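
The parisc power off path, converted in a subsequent patch, then boils
down to:

    /* prevent soft lockup/stalled CPU messages for endless loop. */
    rcu_sysrq_start();
    lockup_detector_soft_poweroff();
    for (;;);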

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
---
 include/linux/nmi.h |    6 +++---
 kernel/watchdog.c   |   14 +++++++++++++-
 2 files changed, 16 insertions(+), 4 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -12,10 +12,10 @@
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 void lockup_detector_init(void);
+void lockup_detector_soft_poweroff(void);
 #else
-static inline void lockup_detector_init(void)
-{
-}
+static inline void lockup_detector_init(void) { }
+static inline void lockup_detector_soft_poweroff(void) { }
 #endif
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -333,7 +333,8 @@ static enum hrtimer_restart watchdog_tim
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
-	if (atomic_read(&watchdog_park_in_progress) != 0)
+	if (!watchdog_enabled ||
+	    atomic_read(&watchdog_park_in_progress) != 0)
 		return HRTIMER_NORESTART;
 
 	/* kick the hardlockup detector */
@@ -660,6 +661,17 @@ static void set_sample_period(void)
 }
 #endif /* SOFTLOCKUP */
 
+/**
+ * lockup_detector_soft_poweroff - Interface to stop lockup detector(s)
+ *
+ * Special interface for parisc. It prevents lockup detector warnings from
+ * the default pm_poweroff() function which busy loops forever.
+ */
+void lockup_detector_soft_poweroff(void)
+{
+	watchdog_enabled = 0;
+}
+
 /*
  * Suspend the hard and soft lockup detector by parking the watchdog threads.
  */

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 04/29] parisc: Use lockup_detector_stop()
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (2 preceding siblings ...)
  2017-08-31  7:16 ` [patch 03/29] lockup_detector: Provide interface to stop from poweroff() Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 05/29] lockup_detector: Remove broken suspend/resume interfaces Thomas Gleixner
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Helge Deller, linux-parisc

[-- Attachment #1: parisc--Use-lockup_detector_stop--.patch --]
[-- Type: text/plain, Size: 686 bytes --]

The broken lockup_detector_suspend/resume() interface is going away. Use
the new lockup_detector_soft_poweroff() interface to stop the watchdog from
the busy looping power off routine.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Helge Deller <deller@gmx.de>
Cc: linux-parisc@vger.kernel.org
---
 arch/parisc/kernel/process.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -146,7 +146,7 @@ void machine_power_off(void)
 
 	/* prevent soft lockup/stalled CPU messages for endless loop. */
 	rcu_sysrq_start();
-	lockup_detector_suspend();
+	lockup_detector_soft_poweroff();
 	for (;;);
 }
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 05/29] lockup_detector: Remove broken suspend/resume interfaces
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (3 preceding siblings ...)
  2017-08-31  7:16 ` [patch 04/29] parisc: Use lockup_detector_stop() Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 06/29] lockup_detector: Rework cpu hotplug locking Thomas Gleixner
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Remove-broken-suspend-resume-interfaces.patch --]
[-- Type: text/plain, Size: 5895 bytes --]

This interface has several issues:

 - It's causing recursive locking of the hotplug lock.

 - It's complete overkill to tear down all threads and then recreate them.

The same can be achieved with the simple hardlockup_detector_perf_stop()/
restart() interfaces. The abuse from the busy looping poweroff() routine
of PARISC has been solved as well by lockup_detector_soft_poweroff().

Remove the cruft.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/powerpc/kernel/watchdog.c |    3 -
 include/linux/nmi.h            |   12 -----
 kernel/watchdog.c              |   89 -----------------------------------------
 3 files changed, 1 insertion(+), 103 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -307,9 +307,6 @@ static int start_wd_on_cpu(unsigned int
 	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
 		return 0;
 
-	if (watchdog_suspended)
-		return 0;
-
 	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
 		return 0;
 
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -164,7 +164,6 @@ extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
 extern struct cpumask watchdog_cpumask;
 extern unsigned long *watchdog_cpumask_bits;
-extern int __read_mostly watchdog_suspended;
 #ifdef CONFIG_SMP
 extern int sysctl_softlockup_all_cpu_backtrace;
 extern int sysctl_hardlockup_all_cpu_backtrace;
@@ -192,17 +191,6 @@ extern int proc_watchdog_thresh(struct c
 				void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
-extern int lockup_detector_suspend(void);
-extern void lockup_detector_resume(void);
-#else
-static inline int lockup_detector_suspend(void)
-{
-	return 0;
-}
-
-static inline void lockup_detector_resume(void)
-{
-}
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -97,19 +97,6 @@ unsigned long *watchdog_cpumask_bits = c
  * unregistered/stopped, so it is an indicator whether the threads exist.
  */
 static int __read_mostly watchdog_running;
-/*
- * If a subsystem has a need to deactivate the watchdog temporarily, it
- * can use the suspend/resume interface to achieve this. The content of
- * the 'watchdog_suspended' variable reflects this state. Existing threads
- * are parked/unparked by the lockup_detector_{suspend|resume} functions
- * (see comment blocks pertaining to those functions for further details).
- *
- * 'watchdog_suspended' also prevents threads from being registered/started
- * or unregistered/stopped via parameters in /proc/sys/kernel, so the state
- * of 'watchdog_running' cannot change while the watchdog is deactivated
- * temporarily (see related code in 'proc' handlers).
- */
-int __read_mostly watchdog_suspended;
 
 /*
  * These functions can be overridden if an architecture implements its
@@ -136,7 +123,6 @@ void __weak watchdog_nmi_disable(unsigne
  * - watchdog_cpumask
  * - sysctl_hardlockup_all_cpu_backtrace
  * - hardlockup_panic
- * - watchdog_suspended
  */
 void __weak watchdog_nmi_reconfigure(void)
 {
@@ -672,61 +658,6 @@ void lockup_detector_soft_poweroff(void)
 	watchdog_enabled = 0;
 }
 
-/*
- * Suspend the hard and soft lockup detector by parking the watchdog threads.
- */
-int lockup_detector_suspend(void)
-{
-	int ret = 0;
-
-	get_online_cpus();
-	mutex_lock(&watchdog_proc_mutex);
-	/*
-	 * Multiple suspend requests can be active in parallel (counted by
-	 * the 'watchdog_suspended' variable). If the watchdog threads are
-	 * running, the first caller takes care that they will be parked.
-	 * The state of 'watchdog_running' cannot change while a suspend
-	 * request is active (see related code in 'proc' handlers).
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		ret = watchdog_park_threads();
-
-	if (ret == 0)
-		watchdog_suspended++;
-	else {
-		watchdog_disable_all_cpus();
-		pr_err("Failed to suspend lockup detectors, disabled\n");
-		watchdog_enabled = 0;
-	}
-
-	watchdog_nmi_reconfigure();
-
-	mutex_unlock(&watchdog_proc_mutex);
-
-	return ret;
-}
-
-/*
- * Resume the hard and soft lockup detector by unparking the watchdog threads.
- */
-void lockup_detector_resume(void)
-{
-	mutex_lock(&watchdog_proc_mutex);
-
-	watchdog_suspended--;
-	/*
-	 * The watchdog threads are unparked if they were previously running
-	 * and if there is no more active suspend request.
-	 */
-	if (watchdog_running && !watchdog_suspended)
-		watchdog_unpark_threads();
-
-	watchdog_nmi_reconfigure();
-
-	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
-}
-
 #ifdef CONFIG_SYSCTL
 
 /*
@@ -775,12 +706,6 @@ static int proc_watchdog_common(int whic
 	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
-	if (watchdog_suspended) {
-		/* no parameter changes allowed while watchdog is suspended */
-		err = -EAGAIN;
-		goto out;
-	}
-
 	/*
 	 * If the parameter is being read return the state of the corresponding
 	 * bit(s) in 'watchdog_enabled', else update 'watchdog_enabled' and the
@@ -872,12 +797,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
-	if (watchdog_suspended) {
-		/* no parameter changes allowed while watchdog is suspended */
-		err = -EAGAIN;
-		goto out;
-	}
-
 	old = ACCESS_ONCE(watchdog_thresh);
 	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 
@@ -917,12 +836,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
-	if (watchdog_suspended) {
-		/* no parameter changes allowed while watchdog is suspended */
-		err = -EAGAIN;
-		goto out;
-	}
-
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
 	if (!err && write) {
 		/* Remove impossible cpus to keep sysctl output cleaner. */
@@ -941,7 +854,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 
 		watchdog_nmi_reconfigure();
 	}
-out:
+
 	mutex_unlock(&watchdog_proc_mutex);
 	put_online_cpus();
 	return err;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 06/29] lockup_detector: Rework cpu hotplug locking
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (4 preceding siblings ...)
  2017-08-31  7:16 ` [patch 05/29] lockup_detector: Remove broken suspend/resume interfaces Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 07/29] lockup_detector: Rename watchdog_proc_mutex Thomas Gleixner
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Rework-cpu-hotplug-locking.patch --]
[-- Type: text/plain, Size: 1672 bytes --]

The watchdog proc interface causes extensive recursive locking of the cpu
hotplug percpu rwsem, which is deadlock prone.

Replace the get/put_online_cpus() pairs with cpu_hotplug_disable()/enable()
calls for now. Later patches will remove that requirement completely.
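
All three proc handlers then share the same hotplug safe pattern, taken
from the diff below:

    cpu_hotplug_disable();
    mutex_lock(&watchdog_proc_mutex);
    /* read or update the watchdog parameters */
    mutex_unlock(&watchdog_proc_mutex);
    cpu_hotplug_enable();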

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -703,7 +703,7 @@ static int proc_watchdog_common(int whic
 	int err, old, new;
 	int *watchdog_param = (int *)table->data;
 
-	get_online_cpus();
+	cpu_hotplug_disable();
 	mutex_lock(&watchdog_proc_mutex);
 
 	/*
@@ -752,7 +752,7 @@ static int proc_watchdog_common(int whic
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
+	cpu_hotplug_enable();
 	return err;
 }
 
@@ -794,7 +794,7 @@ int proc_watchdog_thresh(struct ctl_tabl
 {
 	int err, old, new;
 
-	get_online_cpus();
+	cpu_hotplug_disable();
 	mutex_lock(&watchdog_proc_mutex);
 
 	old = ACCESS_ONCE(watchdog_thresh);
@@ -818,7 +818,7 @@ int proc_watchdog_thresh(struct ctl_tabl
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
+	cpu_hotplug_enable();
 	return err;
 }
 
@@ -833,7 +833,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 {
 	int err;
 
-	get_online_cpus();
+	cpu_hotplug_disable();
 	mutex_lock(&watchdog_proc_mutex);
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
@@ -856,7 +856,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 	}
 
 	mutex_unlock(&watchdog_proc_mutex);
-	put_online_cpus();
+	cpu_hotplug_enable();
 	return err;
 }
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 07/29] lockup_detector: Rename watchdog_proc_mutex
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (5 preceding siblings ...)
  2017-08-31  7:16 ` [patch 06/29] lockup_detector: Rework cpu hotplug locking Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 08/29] lockup_detector: Mark hardlockup_detector_disable() __init Thomas Gleixner
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Rename-watchdog_proc_mutex.patch --]
[-- Type: text/plain, Size: 1996 bytes --]

Following patches will use the mutex for other purposes as well. Rename it
as it is no longer a proc specific thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -29,8 +29,7 @@
 #include <linux/kvm_para.h>
 #include <linux/kthread.h>
 
-/* Watchdog configuration */
-static DEFINE_MUTEX(watchdog_proc_mutex);
+static DEFINE_MUTEX(watchdog_mutex);
 
 int __read_mostly nmi_watchdog_enabled;
 
@@ -704,7 +703,7 @@ static int proc_watchdog_common(int whic
 	int *watchdog_param = (int *)table->data;
 
 	cpu_hotplug_disable();
-	mutex_lock(&watchdog_proc_mutex);
+	mutex_lock(&watchdog_mutex);
 
 	/*
 	 * If the parameter is being read return the state of the corresponding
@@ -751,7 +750,7 @@ static int proc_watchdog_common(int whic
 		err = proc_watchdog_update();
 	}
 out:
-	mutex_unlock(&watchdog_proc_mutex);
+	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;
 }
@@ -795,7 +794,7 @@ int proc_watchdog_thresh(struct ctl_tabl
 	int err, old, new;
 
 	cpu_hotplug_disable();
-	mutex_lock(&watchdog_proc_mutex);
+	mutex_lock(&watchdog_mutex);
 
 	old = ACCESS_ONCE(watchdog_thresh);
 	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
@@ -817,7 +816,7 @@ int proc_watchdog_thresh(struct ctl_tabl
 		set_sample_period();
 	}
 out:
-	mutex_unlock(&watchdog_proc_mutex);
+	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;
 }
@@ -834,7 +833,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 	int err;
 
 	cpu_hotplug_disable();
-	mutex_lock(&watchdog_proc_mutex);
+	mutex_lock(&watchdog_mutex);
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
 	if (!err && write) {
@@ -855,7 +854,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 		watchdog_nmi_reconfigure();
 	}
 
-	mutex_unlock(&watchdog_proc_mutex);
+	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;
 }

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 08/29] lockup_detector: Mark hardlockup_detector_disable() __init
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (6 preceding siblings ...)
  2017-08-31  7:16 ` [patch 07/29] lockup_detector: Rename watchdog_proc_mutex Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 09/29] lockup_detector/perf: Remove broken self disable on failure Thomas Gleixner
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Mark-hardlockup_detector_disable---__init.patch --]
[-- Type: text/plain, Size: 608 bytes --]

The function is only used by the KVM init code. Mark it __init to prevent
creative abuse.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -55,7 +55,7 @@ unsigned int __read_mostly hardlockup_pa
  * kernel command line parameters are parsed, because otherwise it is not
  * possible to override this in hardlockup_panic_setup().
  */
-void hardlockup_detector_disable(void)
+void __init hardlockup_detector_disable(void)
 {
 	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
 }

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 09/29] lockup_detector/perf: Remove broken self disable on failure
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (7 preceding siblings ...)
  2017-08-31  7:16 ` [patch 08/29] lockup_detector: Mark hardlockup_detector_disable() __init Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock Thomas Gleixner
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-perf--Remove-broken-self-disable-on-failure.patch --]
[-- Type: text/plain, Size: 4750 bytes --]

The self disabling feature is broken vs. CPU hotplug locking:

CPU 0	 	   	   CPU 1
cpus_write_lock();
 cpu_up(1)
   wait_for_completion()
			   ....
			   unpark_watchdog()
			   ->unpark()
			     perf_event_create() <- fails
			       watchdog_enable &= ~NMI_WATCHDOG;
			   ....    
cpus_write_unlock();
			   CPU 2
cpus_write_lock()
 cpu_down(2)
   wait_for_completion()
			   wakeup(watchdog);
			     watchdog()
			     if (!(watchdog_enable & NMI_WATCHDOG))
    			     	watchdog_nmi_disable()
      				  perf_event_disable()
				  ....
				  cpus_read_lock();

			   stop_smpboot_threads()
			     park_watchdog();
			       wait_for_completion(watchdog->parked);

Result: The hotplug operation deadlocks and the machine locks up completely.

There is a similar problem with disabling the watchdog via the user space
interface, as the sysctl function fiddles with watchdog_enabled directly.

It's very debatable whether this is required at all. The scenario which it
handles is: the watchdog works nicely on N CPUs and fails to enable on the
N + 1th CPU, either during hotplug or because the user space interface
disabled it via the sysctl cpumask and some perf user grabbed the counter
in the meantime, so it is unavailable for the watchdog when the sysctl
cpumask gets changed back. Shutting the detector down on all CPUs in
response is a gross overreaction.

There is no real justification for this.

One of the reasons WHY this is done is the utter stupidity of the init code
of the perf NMI watchdog. Instead of checking upfront at boot whether PERF
is available and functional at all, it just does this check at run time
over and over when user space fiddles with the sysctl. That's broken beyond
repair along with the idiotic error code dependent warn level printks and
the even more silly printk rate limiting.

If the init code checks whether perf works at boot time, then this mess can
be more or less avoided completely. Perf does not come magically into life
at runtime. Brain usage while coding is overrated.

Remove the cruft and add a temporary safe guard which gets removed later.
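
The temporary safe guard is a simple one way switch in
watchdog_nmi_enable(), condensed from the diff below: the first failure to
create a perf event turns the hardlockup detector off for good instead of
retrying forever:

    /* A failure disabled the hardlockup detector permanently */
    if (hardlockup_detector_disabled)
        return -ENODEV;
    ...
    pr_info("Disabling hard lockup detector permanently\n");
    hardlockup_detector_disabled = true;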

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c     |   15 ---------------
 kernel/watchdog_hld.c |   20 +++++++-------------
 2 files changed, 7 insertions(+), 28 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -485,21 +485,6 @@ static void watchdog(unsigned int cpu)
 	__this_cpu_write(soft_lockup_hrtimer_cnt,
 			 __this_cpu_read(hrtimer_interrupts));
 	__touch_watchdog();
-
-	/*
-	 * watchdog_nmi_enable() clears the NMI_WATCHDOG_ENABLED bit in the
-	 * failure path. Check for failures that can occur asynchronously -
-	 * for example, when CPUs are on-lined - and shut down the hardware
-	 * perf event on each CPU accordingly.
-	 *
-	 * The only non-obvious place this bit can be cleared is through
-	 * watchdog_nmi_enable(), so a pr_info() is placed there.  Placing a
-	 * pr_info here would be too noisy as it would result in a message
-	 * every few seconds if the hardlockup was disabled but the softlockup
-	 * enabled.
-	 */
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-		watchdog_nmi_disable(cpu);
 }
 
 static struct smp_hotplug_thread watchdog_threads = {
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -23,6 +23,7 @@ static DEFINE_PER_CPU(bool, watchdog_nmi
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 
 static unsigned long hardlockup_allcpu_dumped;
+static bool hardlockup_detector_disabled;
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -178,6 +179,10 @@ int watchdog_nmi_enable(unsigned int cpu
 	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
 		goto out;
 
+	/* A failure disabled the hardlockup detector permanently */
+	if (hardlockup_detector_disabled)
+		return -ENODEV;
+
 	/* is it already setup and enabled? */
 	if (event && event->state > PERF_EVENT_STATE_OFF)
 		goto out;
@@ -206,18 +211,6 @@ int watchdog_nmi_enable(unsigned int cpu
 		goto out_save;
 	}
 
-	/*
-	 * Disable the hard lockup detector if _any_ CPU fails to set up
-	 * set up the hardware perf event. The watchdog() function checks
-	 * the NMI_WATCHDOG_ENABLED bit periodically.
-	 *
-	 * The barriers are for syncing up watchdog_enabled across all the
-	 * cpus, as clear_bit() does not use barriers.
-	 */
-	smp_mb__before_atomic();
-	clear_bit(NMI_WATCHDOG_ENABLED_BIT, &watchdog_enabled);
-	smp_mb__after_atomic();
-
 	/* skip displaying the same error again */
 	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
 		return PTR_ERR(event);
@@ -232,7 +225,8 @@ int watchdog_nmi_enable(unsigned int cpu
 		pr_err("disabled (cpu%i): unable to create perf event: %ld\n",
 			cpu, PTR_ERR(event));
 
-	pr_info("Shutting down hard lockup detector on all cpus\n");
+	pr_info("Disabling hard lockup detector permanently\n");
+	hardlockup_detector_disabled = true;
 
 	return PTR_ERR(event);
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (8 preceding siblings ...)
  2017-08-31  7:16 ` [patch 09/29] lockup_detector/perf: Remove broken self disable on failure Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-09-01 19:02   ` Don Zickus
  2017-08-31  7:16 ` [patch 11/29] lockup_detector: Remove park_in_progress hackery Thomas Gleixner
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-perf--Prevent-cpu-hotplug-deadlock.patch --]
[-- Type: text/plain, Size: 6728 bytes --]

The following deadlock is possible in the watchdog hotplug code:

  cpus_write_lock()
    ...
      takedown_cpu()
        smpboot_park_threads()
          smpboot_park_thread()
            kthread_park()
              ->park() := watchdog_disable()
                watchdog_nmi_disable()
                  perf_event_release_kernel();
                    put_event()
                      _free_event()
                        ->destroy() := hw_perf_event_destroy()
                          x86_release_hardware()
                            release_ds_buffers()
                              get_online_cpus()

when a per cpu watchdog perf event is destroyed which drops the last
reference to the PMU hardware. The cleanup code there invokes
get_online_cpus() which instantly deadlocks because the hotplug percpu
rwsem is write locked.

To solve this add a deferring mechanism:

  cpus_write_lock()
			   kthread_park()
			    watchdog_nmi_disable(deferred)
			      perf_event_disable(event);
			      move_event_to_deferred(event);
   			   ....
  cpus_write_unlock()
  cleanup_deferred_events()
    perf_event_release_kernel()

This is still properly serialized against concurrent hotplug via the
cpu_add_remove_lock, which is held by the task which initiated the hotplug
event.

This is also used to handle event destruction when the watchdog threads are
parked via other mechanisms than CPU hotplug.
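
Condensed from the diff below, the deferral splits the teardown into two
phases:

    /* Phase 1: under cpus_write_lock(), no perf release allowed */
    perf_event_disable(event);
    this_cpu_write(watchdog_ev, NULL);
    this_cpu_write(dead_event, event);
    cpumask_set_cpu(smp_processor_id(), &dead_events_mask);

    /* Phase 2: after cpus_write_unlock(), under cpu_add_remove_lock */
    for_each_cpu(cpu, &dead_events_mask)
        perf_event_release_kernel(per_cpu(dead_event, cpu));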

Reported-by: Borislav Petkov <bp@alien8.de>
Analyzed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h   |    6 ++++++
 kernel/cpu.c          |    6 ++++++
 kernel/watchdog.c     |   24 ++++++++++++++++++++++++
 kernel/watchdog_hld.c |   34 ++++++++++++++++++++++++++++------
 4 files changed, 64 insertions(+), 6 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -13,9 +13,11 @@
 #ifdef CONFIG_LOCKUP_DETECTOR
 void lockup_detector_init(void);
 void lockup_detector_soft_poweroff(void);
+void lockup_detector_cleanup(void);
 #else
 static inline void lockup_detector_init(void) { }
 static inline void lockup_detector_soft_poweroff(void) { }
+static inline void lockup_detector_cleanup(void) { }
 #endif
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
@@ -77,9 +79,13 @@ static inline void hardlockup_detector_d
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
+extern void hardlockup_detector_perf_disable(void);
+extern void hardlockup_detector_perf_cleanup(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
+static inline void hardlockup_detector_perf_disable(void) { }
+static inline void hardlockup_detector_perf_cleanup(void) { }
 #if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static inline void arch_touch_nmi_watchdog(void) {}
 #endif
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -24,6 +24,7 @@
 #include <linux/lockdep.h>
 #include <linux/tick.h>
 #include <linux/irq.h>
+#include <linux/nmi.h>
 #include <linux/smpboot.h>
 #include <linux/relay.h>
 #include <linux/slab.h>
@@ -733,6 +734,11 @@ static int __ref _cpu_down(unsigned int
 
 out:
 	cpus_write_unlock();
+	/*
+	 * Do post unplug cleanup. This is still protected against
+	 * concurrent CPU hotplug via cpu_add_remove_lock.
+	 */
+	lockup_detector_cleanup();
 	return ret;
 }
 
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -193,6 +193,8 @@ static int __init hardlockup_all_cpu_bac
 #endif
 #endif
 
+static void __lockup_detector_cleanup(void);
+
 /*
  * Hard-lockup warnings should be triggered after just a few seconds. Soft-
  * lockups can have false positives under extreme conditions. So we generally
@@ -459,6 +461,7 @@ static void watchdog_disable(unsigned in
 	hrtimer_cancel(hrtimer);
 	/* disable the perf event */
 	watchdog_nmi_disable(cpu);
+	hardlockup_detector_perf_disable();
 }
 
 static void watchdog_cleanup(unsigned int cpu, bool online)
@@ -631,6 +634,24 @@ static void set_sample_period(void)
 }
 #endif /* SOFTLOCKUP */
 
+static void __lockup_detector_cleanup(void)
+{
+	lockdep_assert_held(&watchdog_mutex);
+	hardlockup_detector_perf_cleanup();
+}
+
+/**
+ * lockup_detector_cleanup - Cleanup after cpu hotplug or sysctl changes
+ *
+ * Caller must not hold the cpu hotplug rwsem.
+ */
+void lockup_detector_cleanup(void)
+{
+	mutex_lock(&watchdog_mutex);
+	__lockup_detector_cleanup();
+	mutex_unlock(&watchdog_mutex);
+}
+
 /**
  * lockup_detector_soft_poweroff - Interface to stop lockup detector(s)
  *
@@ -665,6 +686,8 @@ static int proc_watchdog_update(void)
 
 	watchdog_nmi_reconfigure();
 
+	__lockup_detector_cleanup();
+
 	return err;
 
 }
@@ -837,6 +860,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 		}
 
 		watchdog_nmi_reconfigure();
+		__lockup_detector_cleanup();
 	}
 
 	mutex_unlock(&watchdog_mutex);
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -21,6 +21,8 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
+static DEFINE_PER_CPU(struct perf_event *, dead_event);
+static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
 static bool hardlockup_detector_disabled;
@@ -239,16 +241,18 @@ int watchdog_nmi_enable(unsigned int cpu
 	return 0;
 }
 
-void watchdog_nmi_disable(unsigned int cpu)
+/**
+ * hardlockup_detector_perf_disable - Disable the local event
+ */
+void hardlockup_detector_perf_disable(void)
 {
-	struct perf_event *event = per_cpu(watchdog_ev, cpu);
+	struct perf_event *event = this_cpu_read(watchdog_ev);
 
 	if (event) {
 		perf_event_disable(event);
-		per_cpu(watchdog_ev, cpu) = NULL;
-
-		/* should be in cleanup, but blocks oprofile */
-		perf_event_release_kernel(event);
+		this_cpu_write(watchdog_ev, NULL);
+		this_cpu_write(dead_event, event);
+		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 
 		/* watchdog_nmi_enable() expects this to be zero initially. */
 		if (atomic_dec_and_test(&watchdog_cpus))
@@ -257,6 +261,24 @@ void watchdog_nmi_disable(unsigned int c
 }
 
 /**
+ * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
+ *
+ * Called from lockup_detector_cleanup(). Serialized by the caller.
+ */
+void hardlockup_detector_perf_cleanup(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, &dead_events_mask) {
+		struct perf_event *event = per_cpu(dead_event, cpu);
+
+		per_cpu(dead_event, cpu) = NULL;
+		perf_event_release_kernel(event);
+	}
+	cpumask_clear(&dead_events_mask);
+}
+
+/**
  * hardlockup_detector_perf_stop - Globally stop watchdog events
  *
  * Special interface for x86 to handle the perf HT bug.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 11/29] lockup_detector: Remove park_in_progress hackery
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (9 preceding siblings ...)
  2017-08-31  7:16 ` [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
       [not found]   ` <CAEeg4=CJohPTi8FUNWqb3egsbZnExyJapcNC7wD-2amXTsMrYw@mail.gmail.com>
  2017-08-31  7:16 ` [patch 12/29] lockup_detector: Cleanup stub functions Thomas Gleixner
                   ` (19 subsequent siblings)
  30 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Remove-park_in_progress-hackery.patch --]
[-- Type: text/plain, Size: 4798 bytes --]

b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded
system") tries to fix the following issue:

 watchdog_stop()
	hrtimer_cancel()
	perf_nmi_event_stop()

If the task gets preempted after canceling the hrtimer or the VM gets
scheduled out long enough, then there is a chance that the next NMI will
see a stale hrtimer interrupt count and trigger a false positive hard
lockup splat.

That commit added a complete abstrusity with an atomic variable telling the
NMI watchdog function that stop is in progress. This is so stupid that it's
not funny anymore.

Of course the patch missed that the same issue exists in the start path:

 watchdog_start()
	perf_nmi_event_start()
	hrtimer_start()

The same effect can be achieved by reversing the start/stop order:

 watchdog_stop()
	perf_nmi_event_stop()
	hrtimer_cancel()

 watchdog_start()
	hrtimer_start()
	perf_nmi_event_start()

Get rid of the nonsense.
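
The resulting enable/disable ordering, simplified from the diff below:

    static void watchdog_enable(unsigned int cpu)
    {
        hrtimer_start(hrtimer, ns_to_ktime(sample_period),
                      HRTIMER_MODE_REL_PINNED);
        __touch_watchdog();
        watchdog_nmi_enable(cpu);        /* perf event comes last */
    }

    static void watchdog_disable(unsigned int cpu)
    {
        hardlockup_detector_perf_disable();    /* perf event goes first */
        watchdog_nmi_disable(cpu);
        hrtimer_cancel(hrtimer);
    }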

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h   |    1 -
 kernel/watchdog.c     |   39 ++++++++++++++++++---------------------
 kernel/watchdog_hld.c |    7 ++-----
 3 files changed, 20 insertions(+), 27 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -27,7 +27,6 @@ extern void touch_softlockup_watchdog_sy
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
 extern int soft_watchdog_enabled;
-extern atomic_t watchdog_park_in_progress;
 #else
 static inline void touch_softlockup_watchdog_sched(void)
 {
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -134,8 +134,6 @@ void __weak watchdog_nmi_reconfigure(voi
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -320,8 +318,7 @@ static enum hrtimer_restart watchdog_tim
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
-	if (!watchdog_enabled ||
-	    atomic_read(&watchdog_park_in_progress) != 0)
+	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
 	/* kick the hardlockup detector */
@@ -435,33 +432,38 @@ static void watchdog_set_prio(unsigned i
 
 static void watchdog_enable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
-	/* kick off the timer for the hardlockup detector */
+	/*
+	 * Start the timer first to prevent the NMI watchdog triggering
+	 * before the timer has a chance to fire.
+	 */
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
+	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
+		      HRTIMER_MODE_REL_PINNED);
 
+	/* Initialize timestamp */
+	__touch_watchdog();
 	/* Enable the perf event */
 	watchdog_nmi_enable(cpu);
 
-	/* done here because hrtimer_start can only pin to smp_processor_id() */
-	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
-		      HRTIMER_MODE_REL_PINNED);
-
-	/* initialize timestamp */
 	watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
-	__touch_watchdog();
 }
 
 static void watchdog_disable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
 	watchdog_set_prio(SCHED_NORMAL, 0);
-	hrtimer_cancel(hrtimer);
-	/* disable the perf event */
-	watchdog_nmi_disable(cpu);
+	/*
+	 * Disable the perf event first. That prevents that a large delay
+	 * between disabling the timer and disabling the perf event causes
+	 * the perf NMI to detect a false positive.
+	 */
 	hardlockup_detector_perf_disable();
+	watchdog_nmi_disable(cpu);
+	hrtimer_cancel(hrtimer);
 }
 
 static void watchdog_cleanup(unsigned int cpu, bool online)
@@ -517,16 +519,11 @@ static int watchdog_park_threads(void)
 {
 	int cpu, ret = 0;
 
-	atomic_set(&watchdog_park_in_progress, 1);
-
 	for_each_watchdog_cpu(cpu) {
 		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
 		if (ret)
 			break;
 	}
-
-	atomic_set(&watchdog_park_in_progress, 0);
-
 	return ret;
 }
 
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,15 +106,12 @@ static struct perf_event_attr wd_hw_attr
 
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event,
-		 struct perf_sample_data *data,
-		 struct pt_regs *regs)
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
 {
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
-	if (atomic_read(&watchdog_park_in_progress) != 0)
-		return;
-
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 12/29] lockup_detector: Cleanup stub functions
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (10 preceding siblings ...)
  2017-08-31  7:16 ` [patch 11/29] lockup_detector: Remove park_in_progress hackery Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 13/29] lockup_detector: Cleanup the ifdef maze Thomas Gleixner
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Cleanup-stub-functions.patch --]
[-- Type: text/plain, Size: 3563 bytes --]

Having stub functions which take a full page is not helping the
readability of the code.

Condense them and move the doubled #ifdef variant into the SYSCTL section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   77 ++++++++++++++++--------------------------------------
 1 file changed, 24 insertions(+), 53 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -105,13 +105,8 @@ static int __read_mostly watchdog_runnin
  * softlockup watchdog threads start and stop. The arch must select the
  * SOFTLOCKUP_DETECTOR Kconfig.
  */
-int __weak watchdog_nmi_enable(unsigned int cpu)
-{
-	return 0;
-}
-void __weak watchdog_nmi_disable(unsigned int cpu)
-{
-}
+int __weak watchdog_nmi_enable(unsigned int cpu) { return 0; }
+void __weak watchdog_nmi_disable(unsigned int cpu) { }
 
 /*
  * watchdog_nmi_reconfigure can be implemented to be notified after any
@@ -123,10 +118,7 @@ void __weak watchdog_nmi_disable(unsigne
  * - sysctl_hardlockup_all_cpu_backtrace
  * - hardlockup_panic
  */
-void __weak watchdog_nmi_reconfigure(void)
-{
-}
-
+void __weak watchdog_nmi_reconfigure(void) { }
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
@@ -134,6 +126,11 @@ void __weak watchdog_nmi_reconfigure(voi
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
+/* Global variables, exported for sysctl */
+unsigned int __read_mostly softlockup_panic =
+			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
+int __read_mostly soft_watchdog_enabled;
+
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -147,13 +144,9 @@ static DEFINE_PER_CPU(struct task_struct
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
 static unsigned long soft_lockup_nmi_warn;
 
-unsigned int __read_mostly softlockup_panic =
-			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
-
 static int __init softlockup_panic_setup(char *str)
 {
 	softlockup_panic = simple_strtoul(str, NULL, 0);
-
 	return 1;
 }
 __setup("softlockup_panic=", softlockup_panic_setup);
@@ -592,44 +585,13 @@ static void watchdog_disable_all_cpus(vo
 	}
 }
 
-#ifdef CONFIG_SYSCTL
-static int watchdog_update_cpus(void)
-{
-	return smpboot_update_cpumask_percpu_thread(
-		    &watchdog_threads, &watchdog_cpumask);
-}
-#endif
-
-#else /* SOFTLOCKUP */
-static int watchdog_park_threads(void)
-{
-	return 0;
-}
-
-static void watchdog_unpark_threads(void)
-{
-}
-
-static int watchdog_enable_all_cpus(void)
-{
-	return 0;
-}
-
-static void watchdog_disable_all_cpus(void)
-{
-}
-
-#ifdef CONFIG_SYSCTL
-static int watchdog_update_cpus(void)
-{
-	return 0;
-}
-#endif
-
-static void set_sample_period(void)
-{
-}
-#endif /* SOFTLOCKUP */
+#else /* CONFIG_SOFTLOCKUP_DETECTOR */
+static inline int watchdog_park_threads(void) { return 0; }
+static inline void watchdog_unpark_threads(void) { }
+static inline int watchdog_enable_all_cpus(void) { return 0; }
+static inline void watchdog_disable_all_cpus(void) { }
+static inline void set_sample_period(void) { }
+#endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
 static void __lockup_detector_cleanup(void)
 {
@@ -826,6 +788,15 @@ int proc_watchdog_thresh(struct ctl_tabl
 	return err;
 }
 
+static int watchdog_update_cpus(void)
+{
+	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR)) {
+		return smpboot_update_cpumask_percpu_thread(&watchdog_threads,
+							    &watchdog_cpumask);
+	}
+	return 0;
+}
+
 /*
  * The cpumask is the mask of possible cpus that the watchdog can run
  * on, not the mask of cpus it is actually running on.  This allows the

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 13/29] lockup_detector: Cleanup the ifdef maze
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (11 preceding siblings ...)
  2017-08-31  7:16 ` [patch 12/29] lockup_detector: Cleanup stub functions Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 14/29] lockup_detector: Split out cpumask write function Thomas Gleixner
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Cleanup-the-ifdef-maze.patch --]
[-- Type: text/plain, Size: 2293 bytes --]

The #ifdef maze in this file is horrible. Group things at least a bit so
one can figure out what belongs to what.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -41,7 +41,6 @@ unsigned long __read_mostly watchdog_ena
 #endif
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
-/* boot commands */
 /*
  * Should we panic when a soft-lockup or hard-lockup occurs:
  */
@@ -74,19 +73,21 @@ static int __init hardlockup_panic_setup
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
 
-#endif
+# ifdef CONFIG_SMP
+int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
 
-#ifdef CONFIG_SOFTLOCKUP_DETECTOR
-int __read_mostly soft_watchdog_enabled;
-#endif
+static int __init hardlockup_all_cpu_backtrace_setup(char *str)
+{
+	sysctl_hardlockup_all_cpu_backtrace = !!simple_strtol(str, NULL, 0);
+	return 1;
+}
+__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
+# endif /* CONFIG_SMP */
+#endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 int __read_mostly watchdog_user_enabled;
 int __read_mostly watchdog_thresh = 10;
 
-#ifdef CONFIG_SMP
-int __read_mostly sysctl_softlockup_all_cpu_backtrace;
-int __read_mostly sysctl_hardlockup_all_cpu_backtrace;
-#endif
 struct cpumask watchdog_cpumask __read_mostly;
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
@@ -166,22 +167,14 @@ static int __init nosoftlockup_setup(cha
 __setup("nosoftlockup", nosoftlockup_setup);
 
 #ifdef CONFIG_SMP
+int __read_mostly sysctl_softlockup_all_cpu_backtrace;
+
 static int __init softlockup_all_cpu_backtrace_setup(char *str)
 {
-	sysctl_softlockup_all_cpu_backtrace =
-		!!simple_strtol(str, NULL, 0);
+	sysctl_softlockup_all_cpu_backtrace = !!simple_strtol(str, NULL, 0);
 	return 1;
 }
 __setup("softlockup_all_cpu_backtrace=", softlockup_all_cpu_backtrace_setup);
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
-static int __init hardlockup_all_cpu_backtrace_setup(char *str)
-{
-	sysctl_hardlockup_all_cpu_backtrace =
-		!!simple_strtol(str, NULL, 0);
-	return 1;
-}
-__setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
-#endif
 #endif
 
 static void __lockup_detector_cleanup(void);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 14/29] lockup_detector: Split out cpumask write function
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (12 preceding siblings ...)
  2017-08-31  7:16 ` [patch 13/29] lockup_detector: Cleanup the ifdef maze Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 15/29] smpboot/threads: Avoid runtime allocation Thomas Gleixner
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Split-out-cpumask-write-function.patch --]
[-- Type: text/plain, Size: 2193 bytes --]

Split the write part of the cpumask proc handler out into a separate helper
to avoid deep indentation. This also reduces the patch complexity in the
following cleanups.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   40 +++++++++++++++++++++-------------------
 1 file changed, 21 insertions(+), 19 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -786,10 +786,29 @@ static int watchdog_update_cpus(void)
 	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR)) {
 		return smpboot_update_cpumask_percpu_thread(&watchdog_threads,
 							    &watchdog_cpumask);
+		__lockup_detector_cleanup();
 	}
 	return 0;
 }
 
+static void proc_watchdog_cpumask_update(void)
+{
+	/* Remove impossible cpus to keep sysctl output clean. */
+	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
+
+	if (watchdog_running) {
+		/*
+		 * Failure would be due to being unable to allocate a
+		 * temporary cpumask, so we are likely not in a position to
+		 * do much else to make things better.
+		 */
+		if (watchdog_update_cpus() != 0)
+			pr_err("cpumask update failed\n");
+	}
+
+	watchdog_nmi_reconfigure();
+}
+
 /*
  * The cpumask is the mask of possible cpus that the watchdog can run
  * on, not the mask of cpus it is actually running on.  This allows the
@@ -805,30 +824,13 @@ int proc_watchdog_cpumask(struct ctl_tab
 	mutex_lock(&watchdog_mutex);
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
-	if (!err && write) {
-		/* Remove impossible cpus to keep sysctl output cleaner. */
-		cpumask_and(&watchdog_cpumask, &watchdog_cpumask,
-			    cpu_possible_mask);
-
-		if (watchdog_running) {
-			/*
-			 * Failure would be due to being unable to allocate
-			 * a temporary cpumask, so we are likely not in a
-			 * position to do much else to make things better.
-			 */
-			if (watchdog_update_cpus() != 0)
-				pr_err("cpumask update failed\n");
-		}
-
-		watchdog_nmi_reconfigure();
-		__lockup_detector_cleanup();
-	}
+	if (!err && write)
+		proc_watchdog_cpumask_update();
 
 	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;
 }
-
 #endif /* CONFIG_SYSCTL */
 
 void __init lockup_detector_init(void)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 15/29] smpboot/threads: Avoid runtime allocation
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (13 preceding siblings ...)
  2017-08-31  7:16 ` [patch 14/29] lockup_detector: Split out cpumask write function Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 16/29] lockup_detector: Create new thread handling infrastructure Thomas Gleixner
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: smpboot-threads--Avoid-runtime-allocation.patch --]
[-- Type: text/plain, Size: 3818 bytes --]

smpboot_update_cpumask_percpu_thread() allocates a temporary cpumask at
runtime. This is suboptimal: the error handling which the allocation forces
on every call site costs more code size than a statically allocated
temporary mask costs data size.

Add a static temporary cpumask. The function is globally serialized, so no
further protection is required.

Remove the half-baked error handling in the watchdog code and get rid of
the export, as there are no in-tree modular users of that function.
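
As a condensed sketch, the resulting update path looks like this (names
from the patch below; the static mask is only safe because every caller is
serialized by smpboot_threads_lock):

    static struct cpumask tmp;	/* static storage, no runtime allocation */

    mutex_lock(&smpboot_threads_lock);
    /* CPUs which were allowed before, but are not allowed anymore */
    cpumask_andnot(&tmp, old, new);
    for_each_cpu_and(cpu, &tmp, cpu_online_mask)
            smpboot_park_thread(plug_thread, cpu);
    mutex_unlock(&smpboot_threads_lock);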

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/smpboot.h |    4 ++--
 kernel/smpboot.c        |   22 +++++++---------------
 kernel/watchdog.c       |   21 +++++----------------
 3 files changed, 14 insertions(+), 33 deletions(-)

--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -55,7 +55,7 @@ smpboot_register_percpu_thread(struct sm
 }
 
 void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread);
-int smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
-					 const struct cpumask *);
+void smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
+					  const struct cpumask *);
 
 #endif
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -344,39 +344,31 @@ EXPORT_SYMBOL_GPL(smpboot_unregister_per
  * by the client, but only by calling this function.
  * This function can only be called on a registered smp_hotplug_thread.
  */
-int smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
-					 const struct cpumask *new)
+void smpboot_update_cpumask_percpu_thread(struct smp_hotplug_thread *plug_thread,
+					  const struct cpumask *new)
 {
 	struct cpumask *old = plug_thread->cpumask;
-	cpumask_var_t tmp;
+	static struct cpumask tmp;
 	unsigned int cpu;
 
-	if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
-		return -ENOMEM;
-
 	get_online_cpus();
 	mutex_lock(&smpboot_threads_lock);
 
 	/* Park threads that were exclusively enabled on the old mask. */
-	cpumask_andnot(tmp, old, new);
-	for_each_cpu_and(cpu, tmp, cpu_online_mask)
+	cpumask_andnot(&tmp, old, new);
+	for_each_cpu_and(cpu, &tmp, cpu_online_mask)
 		smpboot_park_thread(plug_thread, cpu);
 
 	/* Unpark threads that are exclusively enabled on the new mask. */
-	cpumask_andnot(tmp, new, old);
-	for_each_cpu_and(cpu, tmp, cpu_online_mask)
+	cpumask_andnot(&tmp, new, old);
+	for_each_cpu_and(cpu, &tmp, cpu_online_mask)
 		smpboot_unpark_thread(plug_thread, cpu);
 
 	cpumask_copy(old, new);
 
 	mutex_unlock(&smpboot_threads_lock);
 	put_online_cpus();
-
-	free_cpumask_var(tmp);
-
-	return 0;
 }
-EXPORT_SYMBOL_GPL(smpboot_update_cpumask_percpu_thread);
 
 static DEFINE_PER_CPU(atomic_t, cpu_hotplug_state) = ATOMIC_INIT(CPU_POST_DEAD);
 
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -781,31 +781,20 @@ int proc_watchdog_thresh(struct ctl_tabl
 	return err;
 }
 
-static int watchdog_update_cpus(void)
+static void watchdog_update_cpus(void)
 {
-	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR)) {
-		return smpboot_update_cpumask_percpu_thread(&watchdog_threads,
-							    &watchdog_cpumask);
+	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR) && watchdog_running) {
+		smpboot_update_cpumask_percpu_thread(&watchdog_threads,
+						     &watchdog_cpumask);
+		__lockup_detector_cleanup();
 	}
-	return 0;
 }
 
 static void proc_watchdog_cpumask_update(void)
 {
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-
-	if (watchdog_running) {
-		/*
-		 * Failure would be due to being unable to allocate a
-		 * temporary cpumask, so we are likely not in a position to
-		 * do much else to make things better.
-		 */
-		if (watchdog_update_cpus() != 0)
-			pr_err("cpumask update failed\n");
-	}
-
+	watchdog_update_cpus();
 	watchdog_nmi_reconfigure();
 }
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 16/29] lockup_detector: Create new thread handling infrastructure
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (14 preceding siblings ...)
  2017-08-31  7:16 ` [patch 15/29] smpboot/threads: Avoid runtime allocation Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance Thomas Gleixner
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Create-new-thread-handling-infrastructure.patch --]
[-- Type: text/plain, Size: 3995 bytes --]

The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when it is enabled.

That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force-parked threads in place even across cpu hotplug.

Another horrible mechanism is the open-coded park/unpark loop machinery
which is used for reconfiguration of the watchdog. The smpboot
infrastructure allows exactly the same via
smpboot_update_cpumask_percpu_thread(), which is cpu hotplug safe. Using
that instead of the open-coded loops allows getting rid of the hotplug
locking mess in the watchdog code.

Implement a clean infrastructure which allows replacing the open-coded
nonsense.
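
The model is simple, as this condensed sketch shows (functions and masks as
introduced in the patch below): an empty allowed mask parks everything,
copying the user mask back unparks exactly the enabled CPUs.

    /* Park all watchdog threads */
    cpumask_clear(&watchdog_allowed_mask);
    smpboot_update_cpumask_percpu_thread(&watchdog_threads,
                                         &watchdog_allowed_mask);

    /* Unpark the threads on the CPUs the user asked for */
    cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
    smpboot_update_cpumask_percpu_thread(&watchdog_threads,
                                         &watchdog_allowed_mask);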

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -132,6 +132,9 @@ unsigned int __read_mostly softlockup_pa
 			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
 int __read_mostly soft_watchdog_enabled;
 
+struct cpumask watchdog_allowed_mask __read_mostly;
+static bool softlockup_threads_initialized __read_mostly;
+
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -578,12 +581,84 @@ static void watchdog_disable_all_cpus(vo
 	}
 }
 
+static void softlockup_update_smpboot_threads(void)
+{
+	lockdep_assert_held(&watchdog_mutex);
+
+	if (!softlockup_threads_initialized)
+		return;
+
+	smpboot_update_cpumask_percpu_thread(&watchdog_threads,
+					     &watchdog_allowed_mask);
+	__lockup_detector_cleanup();
+}
+
+/* Temporarily park all watchdog threads */
+static void softlockup_park_all_threads(void)
+{
+	cpumask_clear(&watchdog_allowed_mask);
+	softlockup_update_smpboot_threads();
+}
+
+/*
+ * Park threads which are no longer enabled and unpark threads which have
+ * been newly enabled.
+ */
+static void softlockup_update_threads(void)
+{
+	cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
+	softlockup_update_smpboot_threads();
+}
+
+static void softlockup_reconfigure_threads(bool enabled)
+{
+	softlockup_park_all_threads();
+	set_sample_period();
+	if (enabled)
+		softlockup_update_threads();
+}
+
+/*
+ * Create the watchdog thread infrastructure.
+ *
+ * The threads are not unparked as watchdog_allowed_mask is empty.  When
+ * the threads are successfully initialized, take the proper locks and
+ * unpark the threads in the watchdog_cpumask if the watchdog is enabled.
+ */
+static __init void softlockup_init_threads(void)
+{
+	int ret;
+
+	/*
+	 * If sysctl is off and watchdog got disabled on the command line,
+	 * nothing to do here.
+	 */
+	if (!IS_ENABLED(CONFIG_SYSCTL) &&
+	    !(watchdog_enabled && watchdog_thresh))
+		return;
+
+	ret = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
+						     &watchdog_allowed_mask);
+	if (ret) {
+		pr_err("Failed to initialize soft lockup detector threads\n");
+		return;
+	}
+
+	mutex_lock(&watchdog_mutex);
+	softlockup_threads_initialized = true;
+	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
+	mutex_unlock(&watchdog_mutex);
+}
+
 #else /* CONFIG_SOFTLOCKUP_DETECTOR */
 static inline int watchdog_park_threads(void) { return 0; }
 static inline void watchdog_unpark_threads(void) { }
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
 static inline void set_sample_period(void) { }
+static inline void softlockup_init_threads(void) { }
+static inline void softlockup_update_threads(void) { }
+static inline void softlockup_reconfigure_threads(bool enabled) { }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
 static void __lockup_detector_cleanup(void)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (15 preceding siblings ...)
  2017-08-31  7:16 ` [patch 16/29] lockup_detector: Create new thread handling infrastructure Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-09-01 19:08   ` Don Zickus
  2017-08-31  7:16 ` [patch 18/29] lockup_detector: Further simplify sysctl handling Thomas Gleixner
                   ` (13 subsequent siblings)
  30 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Avoid-thread-setup-teardown.patch --]
[-- Type: text/plain, Size: 8938 bytes --]

The lockup detector reconfiguration tears down all watchdog threads when
the watchdog is disabled and sets them up again when it is enabled.

That's a pointless exercise. The watchdog threads are not consuming an
insane amount of resources, so it's enough to set them up at init time and
keep them in parked position when the watchdog is disabled and unpark them
when it is reenabled. The smpboot thread infrastructure takes care of
keeping the force-parked threads in place even across cpu hotplug.

Aside from that, the code implements the park/unpark facility of smp
hotplug threads on its own, which is even more pointless. We have
functionality in the smpboot thread code to do so.

Use the new thread management functions and get rid of the unholy mess.
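
With the infrastructure from the previous patch the whole reconfiguration
collapses into this condensed sketch (names from patch 16/29):

    softlockup_park_all_threads();          /* empty the allowed mask */
    set_sample_period();
    if (enabled)
            softlockup_update_threads();    /* copy the user mask back */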

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |  190 +++++-------------------------------------------------
 1 file changed, 19 insertions(+), 171 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,13 +92,6 @@ struct cpumask watchdog_cpumask __read_m
 unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 
 /*
- * The 'watchdog_running' variable is set to 1 when the watchdog threads
- * are registered/started and is set to 0 when the watchdog threads are
- * unregistered/stopped, so it is an indicator whether the threads exist.
- */
-static int __read_mostly watchdog_running;
-
-/*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
  *
@@ -123,10 +116,6 @@ void __weak watchdog_nmi_reconfigure(voi
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
-/* Helper for online, unparked cpus. */
-#define for_each_watchdog_cpu(cpu) \
-	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
-
 /* Global variables, exported for sysctl */
 unsigned int __read_mostly softlockup_panic =
 			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
@@ -252,11 +241,15 @@ void touch_all_softlockup_watchdogs(void
 	int cpu;
 
 	/*
-	 * this is done lockless
-	 * do we care if a 0 races with a timestamp?
-	 * all it means is the softlock check starts one cycle later
+	 * watchdog_mutex cannot be taken here, as this might be called
+	 * from (soft)interrupt context, so the access to
+	 * watchdog_allowed_mask might race with a concurrent update.
+	 *
+	 * The watchdog time stamp can race against a concurrent real
+	 * update as well, the only side effect might be a cycle delay for
+	 * the softlockup check.
 	 */
-	for_each_watchdog_cpu(cpu)
+	for_each_cpu(cpu, &watchdog_allowed_mask)
 		per_cpu(watchdog_touch_ts, cpu) = 0;
 	wq_watchdog_touch(-1);
 }
@@ -296,9 +289,6 @@ static void watchdog_interrupt_count(voi
 	__this_cpu_inc(hrtimer_interrupts);
 }
 
-static int watchdog_enable_all_cpus(void);
-static void watchdog_disable_all_cpus(void);
-
 /* watchdog kicker functions */
 static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
@@ -492,95 +482,6 @@ static struct smp_hotplug_thread watchdo
 	.unpark			= watchdog_enable,
 };
 
-/*
- * park all watchdog threads that are specified in 'watchdog_cpumask'
- *
- * This function returns an error if kthread_park() of a watchdog thread
- * fails. In this situation, the watchdog threads of some CPUs can already
- * be parked and the watchdog threads of other CPUs can still be runnable.
- * Callers are expected to handle this special condition as appropriate in
- * their context.
- *
- * This function may only be called in a context that is protected against
- * races with CPU hotplug - for example, via get_online_cpus().
- */
-static int watchdog_park_threads(void)
-{
-	int cpu, ret = 0;
-
-	for_each_watchdog_cpu(cpu) {
-		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
-		if (ret)
-			break;
-	}
-	return ret;
-}
-
-/*
- * unpark all watchdog threads that are specified in 'watchdog_cpumask'
- *
- * This function may only be called in a context that is protected against
- * races with CPU hotplug - for example, via get_online_cpus().
- */
-static void watchdog_unpark_threads(void)
-{
-	int cpu;
-
-	for_each_watchdog_cpu(cpu)
-		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
-}
-
-static int update_watchdog_all_cpus(void)
-{
-	int ret;
-
-	ret = watchdog_park_threads();
-	if (ret)
-		return ret;
-
-	watchdog_unpark_threads();
-
-	return 0;
-}
-
-static int watchdog_enable_all_cpus(void)
-{
-	int err = 0;
-
-	if (!watchdog_running) {
-		err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
-							     &watchdog_cpumask);
-		if (err)
-			pr_err("Failed to create watchdog threads, disabled\n");
-		else
-			watchdog_running = 1;
-	} else {
-		/*
-		 * Enable/disable the lockup detectors or
-		 * change the sample period 'on the fly'.
-		 */
-		err = update_watchdog_all_cpus();
-
-		if (err) {
-			watchdog_disable_all_cpus();
-			pr_err("Failed to update lockup detectors, disabled\n");
-		}
-	}
-
-	if (err)
-		watchdog_enabled = 0;
-
-	return err;
-}
-
-static void watchdog_disable_all_cpus(void)
-{
-	if (watchdog_running) {
-		watchdog_running = 0;
-		smpboot_unregister_percpu_thread(&watchdog_threads);
-	}
-}
-
 static void softlockup_update_smpboot_threads(void)
 {
 	lockdep_assert_held(&watchdog_mutex);
@@ -655,7 +556,6 @@ static inline int watchdog_park_threads(
 static inline void watchdog_unpark_threads(void) { }
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
-static inline void set_sample_period(void) { }
 static inline void softlockup_init_threads(void) { }
 static inline void softlockup_update_threads(void) { }
 static inline void softlockup_reconfigure_threads(bool enabled) { }
@@ -695,28 +595,10 @@ void lockup_detector_soft_poweroff(void)
 /*
  * Update the run state of the lockup detectors.
  */
-static int proc_watchdog_update(void)
+static void proc_watchdog_update(void)
 {
-	int err = 0;
-
-	/*
-	 * Watchdog threads won't be started if they are already active.
-	 * The 'watchdog_running' variable in watchdog_*_all_cpus() takes
-	 * care of this. If those threads are already active, the sample
-	 * period will be updated and the lockup detectors will be enabled
-	 * or disabled 'on the fly'.
-	 */
-	if (watchdog_enabled && watchdog_thresh)
-		err = watchdog_enable_all_cpus();
-	else
-		watchdog_disable_all_cpus();
-
+	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
 	watchdog_nmi_reconfigure();
-
-	__lockup_detector_cleanup();
-
-	return err;
-
 }
 
 /*
@@ -772,17 +654,8 @@ static int proc_watchdog_common(int whic
 				new = old & ~which;
 		} while (cmpxchg(&watchdog_enabled, old, new) != old);
 
-		/*
-		 * Update the run state of the lockup detectors. There is _no_
-		 * need to check the value returned by proc_watchdog_update()
-		 * and to restore the previous value of 'watchdog_enabled' as
-		 * both lockup detectors are disabled if proc_watchdog_update()
-		 * returns an error.
-		 */
-		if (old == new)
-			goto out;
-
-		err = proc_watchdog_update();
+		if (old != new)
+			proc_watchdog_update();
 	}
 out:
 	mutex_unlock(&watchdog_mutex);
@@ -826,50 +699,28 @@ int proc_soft_watchdog(struct ctl_table
 int proc_watchdog_thresh(struct ctl_table *table, int write,
 			 void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int err, old, new;
+	int err, old;
 
 	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
-	old = ACCESS_ONCE(watchdog_thresh);
+	old = READ_ONCE(watchdog_thresh);
 	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 
-	if (err || !write)
-		goto out;
+	if (!err && write && old != READ_ONCE(watchdog_thresh))
+		proc_watchdog_update();
 
-	/*
-	 * Update the sample period. Restore on failure.
-	 */
-	new = ACCESS_ONCE(watchdog_thresh);
-	if (old == new)
-		goto out;
-
-	set_sample_period();
-	err = proc_watchdog_update();
-	if (err) {
-		watchdog_thresh = old;
-		set_sample_period();
-	}
-out:
 	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;
 }
 
-static void watchdog_update_cpus(void)
-{
-	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR) && watchdog_running) {
-		smpboot_update_cpumask_percpu_thread(&watchdog_threads,
-						     &watchdog_cpumask);
-		__lockup_detector_cleanup();
-	}
-}
-
 static void proc_watchdog_cpumask_update(void)
 {
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-	watchdog_update_cpus();
+
+	softlockup_update_threads();
 	watchdog_nmi_reconfigure();
 }
 
@@ -899,8 +750,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 
 void __init lockup_detector_init(void)
 {
-	set_sample_period();
-
 #ifdef CONFIG_NO_HZ_FULL
 	if (tick_nohz_full_enabled()) {
 		pr_info("Disabling watchdog on nohz_full cores by default\n");
@@ -911,6 +760,5 @@ void __init lockup_detector_init(void)
 	cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
 #endif
 
-	if (watchdog_enabled)
-		watchdog_enable_all_cpus();
+	softlockup_init_threads();
 }

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 18/29] lockup_detector: Further simplify sysctl handling
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (16 preceding siblings ...)
  2017-08-31  7:16 ` [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 19/29] lockup_detector: Cleanup header mess Thomas Gleixner
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Further-simplify-sysfs-handling.patch --]
[-- Type: text/plain, Size: 2662 bytes --]

Use a single function to propagate sysctl changes. This is not a
high-frequency user space interface and it is root only.

Preparatory patch to cleanup the sysctl variable handling.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -501,11 +501,8 @@ static void softlockup_park_all_threads(
 	softlockup_update_smpboot_threads();
 }
 
-/*
- * Park threads which are no longer enabled and unpark threads which have
- * been newly enabled.
- */
-static void softlockup_update_threads(void)
+/* Unpark enabled threads */
+static void softlockup_unpark_threads(void)
 {
 	cpumask_copy(&watchdog_allowed_mask, &watchdog_cpumask);
 	softlockup_update_smpboot_threads();
@@ -516,7 +513,7 @@ static void softlockup_reconfigure_threa
 	softlockup_park_all_threads();
 	set_sample_period();
 	if (enabled)
-		softlockup_update_threads();
+		softlockup_unpark_threads();
 }
 
 /*
@@ -557,7 +554,6 @@ static inline void watchdog_unpark_threa
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
 static inline void softlockup_init_threads(void) { }
-static inline void softlockup_update_threads(void) { }
 static inline void softlockup_reconfigure_threads(bool enabled) { }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
@@ -592,11 +588,11 @@ void lockup_detector_soft_poweroff(void)
 
 #ifdef CONFIG_SYSCTL
 
-/*
- * Update the run state of the lockup detectors.
- */
+/* Propagate any changes to the watchdog threads */
 static void proc_watchdog_update(void)
 {
+	/* Remove impossible cpus to keep sysctl output clean. */
+	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
 	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
 	watchdog_nmi_reconfigure();
 }
@@ -715,15 +711,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 	return err;
 }
 
-static void proc_watchdog_cpumask_update(void)
-{
-	/* Remove impossible cpus to keep sysctl output clean. */
-	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-
-	softlockup_update_threads();
-	watchdog_nmi_reconfigure();
-}
-
 /*
  * The cpumask is the mask of possible cpus that the watchdog can run
  * on, not the mask of cpus it is actually running on.  This allows the
@@ -740,7 +727,7 @@ int proc_watchdog_cpumask(struct ctl_tab
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
 	if (!err && write)
-		proc_watchdog_cpumask_update();
+		proc_watchdog_update();
 
 	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 19/29] lockup_detector: Cleanup header mess
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (17 preceding siblings ...)
  2017-08-31  7:16 ` [patch 18/29] lockup_detector: Further simplify sysctl handling Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 20/29] lockup_detector/sysctl: Get rid of the ifdeffery Thomas Gleixner
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Cleanup-header-mess.patch --]
[-- Type: text/plain, Size: 3944 bytes --]

Having the same #ifdef in various places does not make it more
readable. Collect stuff into one place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h |   60 +++++++++++++++++++++-------------------------------
 1 file changed, 25 insertions(+), 35 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -14,11 +14,29 @@
 void lockup_detector_init(void);
 void lockup_detector_soft_poweroff(void);
 void lockup_detector_cleanup(void);
+bool is_hardlockup(void);
+
+extern int watchdog_user_enabled;
+extern int nmi_watchdog_enabled;
+extern int soft_watchdog_enabled;
+extern int watchdog_thresh;
+extern unsigned long watchdog_enabled;
+
+extern struct cpumask watchdog_cpumask;
+extern unsigned long *watchdog_cpumask_bits;
+#ifdef CONFIG_SMP
+extern int sysctl_softlockup_all_cpu_backtrace;
+extern int sysctl_hardlockup_all_cpu_backtrace;
 #else
+#define sysctl_softlockup_all_cpu_backtrace 0
+#define sysctl_hardlockup_all_cpu_backtrace 0
+#endif /* !CONFIG_SMP */
+
+#else /* CONFIG_LOCKUP_DETECTOR */
 static inline void lockup_detector_init(void) { }
 static inline void lockup_detector_soft_poweroff(void) { }
 static inline void lockup_detector_cleanup(void) { }
-#endif
+#endif /* !CONFIG_LOCKUP_DETECTOR */
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 extern void touch_softlockup_watchdog_sched(void);
@@ -26,28 +44,17 @@ extern void touch_softlockup_watchdog(vo
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
-extern int soft_watchdog_enabled;
 #else
-static inline void touch_softlockup_watchdog_sched(void)
-{
-}
-static inline void touch_softlockup_watchdog(void)
-{
-}
-static inline void touch_softlockup_watchdog_sync(void)
-{
-}
-static inline void touch_all_softlockup_watchdogs(void)
-{
-}
+static inline void touch_softlockup_watchdog_sched(void) { }
+static inline void touch_softlockup_watchdog(void) { }
+static inline void touch_softlockup_watchdog_sync(void) { }
+static inline void touch_all_softlockup_watchdogs(void) { }
 #endif
 
 #ifdef CONFIG_DETECT_HUNG_TASK
 void reset_hung_task_detector(void);
 #else
-static inline void reset_hung_task_detector(void)
-{
-}
+static inline void reset_hung_task_detector(void) { }
 #endif
 
 /*
@@ -92,7 +99,7 @@ static inline void arch_touch_nmi_watchd
 
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
- * 
+ *
  * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
  * may be used to reset the timeout - for code which intentionally
  * disables interrupts for a long time. This call is stateless.
@@ -162,21 +169,6 @@ static inline bool trigger_single_cpu_ba
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
 #endif
 
-#ifdef CONFIG_LOCKUP_DETECTOR
-extern int nmi_watchdog_enabled;
-extern int watchdog_user_enabled;
-extern int watchdog_thresh;
-extern unsigned long watchdog_enabled;
-extern struct cpumask watchdog_cpumask;
-extern unsigned long *watchdog_cpumask_bits;
-#ifdef CONFIG_SMP
-extern int sysctl_softlockup_all_cpu_backtrace;
-extern int sysctl_hardlockup_all_cpu_backtrace;
-#else
-#define sysctl_softlockup_all_cpu_backtrace 0
-#define sysctl_hardlockup_all_cpu_backtrace 0
-#endif
-
 #if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
     defined(CONFIG_HARDLOCKUP_DETECTOR)
 void watchdog_update_hrtimer_threshold(u64 period);
@@ -184,7 +176,6 @@ void watchdog_update_hrtimer_threshold(u
 static inline void watchdog_update_hrtimer_threshold(u64 period) { }
 #endif
 
-extern bool is_hardlockup(void);
 struct ctl_table;
 extern int proc_watchdog(struct ctl_table *, int ,
 			 void __user *, size_t *, loff_t *);
@@ -196,7 +187,6 @@ extern int proc_watchdog_thresh(struct c
 				void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
-#endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 #include <asm/nmi.h>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 20/29] lockup_detector/sysctl: Get rid of the ifdeffery
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (18 preceding siblings ...)
  2017-08-31  7:16 ` [patch 19/29] lockup_detector: Cleanup header mess Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 21/29] lockup_detector: Cleanup sysctl variable name space Thomas Gleixner
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-sysctl--Get-rid-of-the-ifdeffery.patch --]
[-- Type: text/plain, Size: 1524 bytes --]

The sysctl of the nmi_watchdog file prevents writes by setting
    min = max = 0
if none of the users is enabled. That involves ifdeffery and is completely
non-obvious.

If none of the facilities is enabled, then the file can simply be made
read-only. Move the ifdeffery into the header and use a constant for the
file permissions.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h |    6 ++++++
 kernel/sysctl.c     |    6 +-----
 2 files changed, 7 insertions(+), 5 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -81,6 +81,12 @@ extern unsigned int hardlockup_panic;
 static inline void hardlockup_detector_disable(void) {}
 #endif
 
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+# define NMI_WATCHDOG_SYSCTL_PERM	0644
+#else
+# define NMI_WATCHDOG_SYSCTL_PERM	0444
+#endif
+
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 extern void arch_touch_nmi_watchdog(void);
 extern void hardlockup_detector_perf_stop(void);
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -891,14 +891,10 @@ static struct ctl_table kern_table[] = {
 		.procname       = "nmi_watchdog",
 		.data           = &nmi_watchdog_enabled,
 		.maxlen         = sizeof (int),
-		.mode           = 0644,
+		.mode		= NMI_WATCHDOG_SYSCTL_PERM,
 		.proc_handler   = proc_nmi_watchdog,
 		.extra1		= &zero,
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 		.extra2		= &one,
-#else
-		.extra2		= &zero,
-#endif
 	},
 	{
 		.procname	= "watchdog_cpumask",

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 21/29] lockup_detector: Cleanup sysctl variable name space
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (19 preceding siblings ...)
  2017-08-31  7:16 ` [patch 20/29] lockup_detector/sysctl: Get rid of the ifdeffery Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Cleanup-variable-name-space.patch --]
[-- Type: text/plain, Size: 6834 bytes --]

Reflect that these variables are user interface related and remove the
whitespace damage in the sysctl table while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h |   16 ++++++++--------
 kernel/sysctl.c     |   16 ++++++++--------
 kernel/watchdog.c   |   41 ++++++++++++++++++++---------------------
 3 files changed, 36 insertions(+), 37 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -17,8 +17,8 @@ void lockup_detector_cleanup(void);
 bool is_hardlockup(void);
 
 extern int watchdog_user_enabled;
-extern int nmi_watchdog_enabled;
-extern int soft_watchdog_enabled;
+extern int nmi_watchdog_user_enabled;
+extern int soft_watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
 
@@ -62,12 +62,12 @@ static inline void reset_hung_task_detec
  * 'watchdog_enabled' variable. Each lockup detector has its dedicated bit -
  * bit 0 for the hard lockup detector and bit 1 for the soft lockup detector.
  *
- * 'watchdog_user_enabled', 'nmi_watchdog_enabled' and 'soft_watchdog_enabled'
- * are variables that are only used as an 'interface' between the parameters
- * in /proc/sys/kernel and the internal state bits in 'watchdog_enabled'. The
- * 'watchdog_thresh' variable is handled differently because its value is not
- * boolean, and the lockup detectors are 'suspended' while 'watchdog_thresh'
- * is equal zero.
+ * 'watchdog_user_enabled', 'nmi_watchdog_user_enabled' and
+ * 'soft_watchdog_user_enabled' are variables that are only used as an
+ * 'interface' between the parameters in /proc/sys/kernel and the internal
+ * state bits in 'watchdog_enabled'. The 'watchdog_thresh' variable is
+ * handled differently because its value is not boolean, and the lockup
+ * detectors are 'suspended' while 'watchdog_thresh' is equal zero.
  */
 #define NMI_WATCHDOG_ENABLED_BIT   0
 #define SOFT_WATCHDOG_ENABLED_BIT  1
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -871,9 +871,9 @@ static struct ctl_table kern_table[] = {
 #if defined(CONFIG_LOCKUP_DETECTOR)
 	{
 		.procname       = "watchdog",
-		.data           = &watchdog_user_enabled,
-		.maxlen         = sizeof (int),
-		.mode           = 0644,
+		.data		= &watchdog_user_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
 		.proc_handler   = proc_watchdog,
 		.extra1		= &zero,
 		.extra2		= &one,
@@ -889,8 +889,8 @@ static struct ctl_table kern_table[] = {
 	},
 	{
 		.procname       = "nmi_watchdog",
-		.data           = &nmi_watchdog_enabled,
-		.maxlen         = sizeof (int),
+		.data		= &nmi_watchdog_user_enabled,
+		.maxlen		= sizeof(int),
 		.mode		= NMI_WATCHDOG_SYSCTL_PERM,
 		.proc_handler   = proc_nmi_watchdog,
 		.extra1		= &zero,
@@ -906,9 +906,9 @@ static struct ctl_table kern_table[] = {
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 	{
 		.procname       = "soft_watchdog",
-		.data           = &soft_watchdog_enabled,
-		.maxlen         = sizeof (int),
-		.mode           = 0644,
+		.data		= &soft_watchdog_user_enabled,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
 		.proc_handler   = proc_soft_watchdog,
 		.extra1		= &zero,
 		.extra2		= &one,
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -31,8 +31,6 @@
 
 static DEFINE_MUTEX(watchdog_mutex);
 
-int __read_mostly nmi_watchdog_enabled;
-
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED |
 						NMI_WATCHDOG_ENABLED;
@@ -40,6 +38,17 @@ unsigned long __read_mostly watchdog_ena
 unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
 #endif
 
+int __read_mostly nmi_watchdog_user_enabled;
+int __read_mostly soft_watchdog_user_enabled;
+int __read_mostly watchdog_user_enabled;
+int __read_mostly watchdog_thresh = 10;
+
+struct cpumask watchdog_allowed_mask __read_mostly;
+static bool softlockup_threads_initialized __read_mostly;
+
+struct cpumask watchdog_cpumask __read_mostly;
+unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
+
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 /*
  * Should we panic when a soft-lockup or hard-lockup occurs:
@@ -85,12 +94,6 @@ static int __init hardlockup_all_cpu_bac
 # endif /* CONFIG_SMP */
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
-int __read_mostly watchdog_user_enabled;
-int __read_mostly watchdog_thresh = 10;
-
-struct cpumask watchdog_cpumask __read_mostly;
-unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
-
 /*
  * These functions can be overridden if an architecture implements its
  * own hardlockup detector.
@@ -106,7 +109,7 @@ void __weak watchdog_nmi_disable(unsigne
  * watchdog_nmi_reconfigure can be implemented to be notified after any
  * watchdog configuration change. The arch hardlockup watchdog should
  * respond to the following variables:
- * - nmi_watchdog_enabled
+ * - watchdog_enabled
  * - watchdog_thresh
  * - watchdog_cpumask
  * - sysctl_hardlockup_all_cpu_backtrace
@@ -119,10 +122,6 @@ void __weak watchdog_nmi_reconfigure(voi
 /* Global variables, exported for sysctl */
 unsigned int __read_mostly softlockup_panic =
 			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
-int __read_mostly soft_watchdog_enabled;
-
-struct cpumask watchdog_allowed_mask __read_mostly;
-static bool softlockup_threads_initialized __read_mostly;
 
 static u64 __read_mostly sample_period;
 
@@ -600,14 +599,14 @@ static void proc_watchdog_update(void)
 /*
  * common function for watchdog, nmi_watchdog and soft_watchdog parameter
  *
- * caller             | table->data points to | 'which' contains the flag(s)
- * -------------------|-----------------------|-----------------------------
- * proc_watchdog      | watchdog_user_enabled | NMI_WATCHDOG_ENABLED or'ed
- *                    |                       | with SOFT_WATCHDOG_ENABLED
- * -------------------|-----------------------|-----------------------------
- * proc_nmi_watchdog  | nmi_watchdog_enabled  | NMI_WATCHDOG_ENABLED
- * -------------------|-----------------------|-----------------------------
- * proc_soft_watchdog | soft_watchdog_enabled | SOFT_WATCHDOG_ENABLED
+ * caller             | table->data points to      | 'which'
+ * -------------------|----------------------------|--------------------------
+ * proc_watchdog      | watchdog_user_enabled      | NMI_WATCHDOG_ENABLED |
+ *                    |                            | SOFT_WATCHDOG_ENABLED
+ * -------------------|----------------------------|--------------------------
+ * proc_nmi_watchdog  | nmi_watchdog_user_enabled  | NMI_WATCHDOG_ENABLED
+ * -------------------|----------------------------|--------------------------
+ * proc_soft_watchdog | soft_watchdog_user_enabled | SOFT_WATCHDOG_ENABLED
  */
 static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (20 preceding siblings ...)
  2017-08-31  7:16 ` [patch 21/29] lockup_detector: Cleanup sysctl variable name space Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 23/29] lockup_detector: Get rid of the racy update loop Thomas Gleixner
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Benjamin Herrenschmidt, Michael Ellerman,
	linuxppc-dev

[-- Attachment #1: lockup_detector--Make-watchdog_nmi_reconfigure---two-stage.patch --]
[-- Type: text/plain, Size: 4055 bytes --]

Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure()
need to be done in two steps.

     1) Stop all NMIs
     2) Read the new parameters and start NMIs

Right now watchdog_nmi_reconfigure() is a combination of both. To allow a
clean reconfiguration add a 'run' argument and split the functionality in
powerpc.
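
The resulting core call order is a simple bracket (this sketch condenses
the kerneldoc added below):

    watchdog_nmi_reconfigure(false);    /* stop the NMI watchdogs */
    /* update watchdog_enabled, watchdog_thresh, watchdog_cpumask */
    watchdog_nmi_reconfigure(true);     /* restart with the new values */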

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/watchdog.c |   17 +++++++++--------
 include/linux/nmi.h            |    2 ++
 kernel/watchdog.c              |   31 ++++++++++++++++++++++---------
 3 files changed, 33 insertions(+), 17 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -352,17 +352,18 @@ static void watchdog_calc_timeouts(void)
 	wd_timer_period_ms = watchdog_thresh * 1000 * 2 / 5;
 }
 
-void watchdog_nmi_reconfigure(void)
+void watchdog_nmi_reconfigure(bool run)
 {
 	int cpu;
 
-	watchdog_calc_timeouts();
-
-	for_each_cpu(cpu, &wd_cpus_enabled)
-		stop_wd_on_cpu(cpu);
-
-	for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
-		start_wd_on_cpu(cpu);
+	if (!run) {
+		for_each_cpu(cpu, &wd_cpus_enabled)
+			stop_wd_on_cpu(cpu);
+	} else {
+		watchdog_calc_timeouts();
+		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
+			start_wd_on_cpu(cpu);
+	}
 }
 
 /*
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -103,6 +103,8 @@ static inline void arch_touch_nmi_watchd
 #endif
 #endif
 
+void watchdog_nmi_reconfigure(bool run);
+
 /**
  * touch_nmi_watchdog - restart NMI watchdog timeout.
  *
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -105,17 +105,25 @@ static int __init hardlockup_all_cpu_bac
 int __weak watchdog_nmi_enable(unsigned int cpu) { return 0; }
 void __weak watchdog_nmi_disable(unsigned int cpu) { }
 
-/*
- * watchdog_nmi_reconfigure can be implemented to be notified after any
- * watchdog configuration change. The arch hardlockup watchdog should
- * respond to the following variables:
+/**
+ * watchdog_nmi_reconfigure - Optional function to reconfigure NMI watchdogs
+ * @run:	If false stop the watchdogs on all enabled CPUs
+ *		If true start the watchdogs on all enabled CPUs
+ *
+ * The core call order is:
+ * watchdog_nmi_reconfigure(false);
+ * update_variables();
+ * watchdog_nmi_reconfigure(true);
+ *
+ * The second call which starts the watchdogs again guarantees that the
+ * following variables are stable across the call.
  * - watchdog_enabled
  * - watchdog_thresh
  * - watchdog_cpumask
- * - sysctl_hardlockup_all_cpu_backtrace
- * - hardlockup_panic
+ *
+ * After the call the variables can be changed again.
  */
-void __weak watchdog_nmi_reconfigure(void) { }
+void __weak watchdog_nmi_reconfigure(bool run) { }
 
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
@@ -509,10 +517,12 @@ static void softlockup_unpark_threads(vo
 
 static void softlockup_reconfigure_threads(bool enabled)
 {
+	watchdog_nmi_reconfigure(false);
 	softlockup_park_all_threads();
 	set_sample_period();
 	if (enabled)
 		softlockup_unpark_threads();
+	watchdog_nmi_reconfigure(true);
 }
 
 /*
@@ -553,7 +563,11 @@ static inline void watchdog_unpark_threa
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
 static inline void softlockup_init_threads(void) { }
-static inline void softlockup_reconfigure_threads(bool enabled) { }
+static void softlockup_reconfigure_threads(bool enabled)
+{
+	watchdog_nmi_reconfigure(false);
+	watchdog_nmi_reconfigure(true);
+}
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
 
 static void __lockup_detector_cleanup(void)
@@ -593,7 +607,6 @@ static void proc_watchdog_update(void)
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
 	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
-	watchdog_nmi_reconfigure();
 }
 
 /*

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 23/29] lockup_detector: Get rid of the racy update loop
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (21 preceding siblings ...)
  2017-08-31  7:16 ` [patch 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 24/29] lockup_detector/perf: Implement init time perf validation Thomas Gleixner
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Get-rid-of-the-racy-update-loop.patch --]
[-- Type: text/plain, Size: 6905 bytes --]

Letting user space poke directly at variables which are used at run time
is stupid and causes a lot of race conditions and other issues.

Separate the user variables from the run time state and invoke the
reconfiguration on any change. The reconfiguration then stops the
watchdogs, reevaluates the new user values and restarts the watchdogs with
the new parameters.
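
All sysctl handlers then follow the same pattern, condensed here from the
diff below:

    mutex_lock(&watchdog_mutex);
    old = READ_ONCE(*param);
    err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
    if (!err && write && old != READ_ONCE(*param))
            proc_watchdog_update();     /* stop, reevaluate, restart */
    mutex_unlock(&watchdog_mutex);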

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   95 ++++++++++++++++++++++++++----------------------------
 1 file changed, 47 insertions(+), 48 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -32,15 +32,17 @@
 static DEFINE_MUTEX(watchdog_mutex);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR) || defined(CONFIG_HAVE_NMI_WATCHDOG)
-unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED |
-						NMI_WATCHDOG_ENABLED;
+# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED | NMI_WATCHDOG_ENABLED)
+# define NMI_WATCHDOG_DEFAULT	1
 #else
-unsigned long __read_mostly watchdog_enabled = SOFT_WATCHDOG_ENABLED;
+# define WATCHDOG_DEFAULT	(SOFT_WATCHDOG_ENABLED)
+# define NMI_WATCHDOG_DEFAULT	0
 #endif
 
-int __read_mostly nmi_watchdog_user_enabled;
-int __read_mostly soft_watchdog_user_enabled;
-int __read_mostly watchdog_user_enabled;
+unsigned long __read_mostly watchdog_enabled;
+int __read_mostly watchdog_user_enabled = 1;
+int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
+int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
 
 struct cpumask watchdog_allowed_mask __read_mostly;
@@ -65,7 +67,7 @@ unsigned int __read_mostly hardlockup_pa
  */
 void __init hardlockup_detector_disable(void)
 {
-	watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+	nmi_watchdog_user_enabled = 0;
 }
 
 static int __init hardlockup_panic_setup(char *str)
@@ -75,9 +77,9 @@ static int __init hardlockup_panic_setup
 	else if (!strncmp(str, "nopanic", 7))
 		hardlockup_panic = 0;
 	else if (!strncmp(str, "0", 1))
-		watchdog_enabled &= ~NMI_WATCHDOG_ENABLED;
+		nmi_watchdog_user_enabled = 0;
 	else if (!strncmp(str, "1", 1))
-		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+		nmi_watchdog_user_enabled = 1;
 	return 1;
 }
 __setup("nmi_watchdog=", hardlockup_panic_setup);
@@ -125,6 +127,23 @@ void __weak watchdog_nmi_disable(unsigne
  */
 void __weak watchdog_nmi_reconfigure(bool run) { }
 
+/**
+ * lockup_detector_update_enable - Update the sysctl enable bit
+ *
+ * Caller needs to make sure that the NMI/perf watchdogs are off, so this
+ * can't race with hardlockup_detector_disable().
+ */
+static void lockup_detector_update_enable(void)
+{
+	watchdog_enabled = 0;
+	if (!watchdog_user_enabled)
+		return;
+	if (nmi_watchdog_user_enabled)
+		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
+	if (soft_watchdog_user_enabled)
+		watchdog_enabled |= SOFT_WATCHDOG_ENABLED;
+}
+
 #ifdef CONFIG_SOFTLOCKUP_DETECTOR
 
 /* Global variables, exported for sysctl */
@@ -153,14 +172,14 @@ static int __init softlockup_panic_setup
 
 static int __init nowatchdog_setup(char *str)
 {
-	watchdog_enabled = 0;
+	watchdog_user_enabled = 0;
 	return 1;
 }
 __setup("nowatchdog", nowatchdog_setup);
 
 static int __init nosoftlockup_setup(char *str)
 {
-	watchdog_enabled &= ~SOFT_WATCHDOG_ENABLED;
+	soft_watchdog_user_enabled = 0;
 	return 1;
 }
 __setup("nosoftlockup", nosoftlockup_setup);
@@ -515,12 +534,13 @@ static void softlockup_unpark_threads(vo
 	softlockup_update_smpboot_threads();
 }
 
-static void softlockup_reconfigure_threads(bool enabled)
+static void softlockup_reconfigure_threads(void)
 {
 	watchdog_nmi_reconfigure(false);
 	softlockup_park_all_threads();
 	set_sample_period();
-	if (enabled)
+	lockup_detector_update_enable();
+	if (watchdog_enabled && watchdog_thresh)
 		softlockup_unpark_threads();
 	watchdog_nmi_reconfigure(true);
 }
@@ -540,6 +560,8 @@ static __init void softlockup_init_threa
 	 * If sysctl is off and watchdog got disabled on the command line,
 	 * nothing to do here.
 	 */
+	lockup_detector_update_enable();
+
 	if (!IS_ENABLED(CONFIG_SYSCTL) &&
 	    !(watchdog_enabled && watchdog_thresh))
 		return;
@@ -553,7 +575,7 @@ static __init void softlockup_init_threa
 
 	mutex_lock(&watchdog_mutex);
 	softlockup_threads_initialized = true;
-	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
+	softlockup_reconfigure_threads();
 	mutex_unlock(&watchdog_mutex);
 }
 
@@ -563,9 +585,10 @@ static inline void watchdog_unpark_threa
 static inline int watchdog_enable_all_cpus(void) { return 0; }
 static inline void watchdog_disable_all_cpus(void) { }
 static inline void softlockup_init_threads(void) { }
-static void softlockup_reconfigure_threads(bool enabled)
+static void softlockup_reconfigure_threads(void)
 {
 	watchdog_nmi_reconfigure(false);
+	lockup_detector_update_enable();
 	watchdog_nmi_reconfigure(true);
 }
 #endif /* !CONFIG_SOFTLOCKUP_DETECTOR */
@@ -606,7 +629,7 @@ static void proc_watchdog_update(void)
 {
 	/* Remove impossible cpus to keep sysctl output clean. */
 	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
-	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
+	softlockup_reconfigure_threads();
 }
 
 /*
@@ -624,48 +647,24 @@ static void proc_watchdog_update(void)
 static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 				void __user *buffer, size_t *lenp, loff_t *ppos)
 {
-	int err, old, new;
-	int *watchdog_param = (int *)table->data;
+	int err, old, *param = table->data;
 
 	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
-	/*
-	 * If the parameter is being read return the state of the corresponding
-	 * bit(s) in 'watchdog_enabled', else update 'watchdog_enabled' and the
-	 * run state of the lockup detectors.
-	 */
 	if (!write) {
-		*watchdog_param = (watchdog_enabled & which) != 0;
+		/*
+		 * On read synchronize the userspace interface. This is a
+		 * racy snapshot.
+		 */
+		*param = (watchdog_enabled & which) != 0;
 		err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 	} else {
+		old = READ_ONCE(*param);
 		err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
-		if (err)
-			goto out;
-
-		/*
-		 * There is a race window between fetching the current value
-		 * from 'watchdog_enabled' and storing the new value. During
-		 * this race window, watchdog_nmi_enable() can sneak in and
-		 * clear the NMI_WATCHDOG_ENABLED bit in 'watchdog_enabled'.
-		 * The 'cmpxchg' detects this race and the loop retries.
-		 */
-		do {
-			old = watchdog_enabled;
-			/*
-			 * If the parameter value is not zero set the
-			 * corresponding bit(s), else clear it(them).
-			 */
-			if (*watchdog_param)
-				new = old | which;
-			else
-				new = old & ~which;
-		} while (cmpxchg(&watchdog_enabled, old, new) != old);
-
-		if (old != new)
+		if (!err && old != READ_ONCE(*param))
 			proc_watchdog_update();
 	}
-out:
 	mutex_unlock(&watchdog_mutex);
 	cpu_hotplug_enable();
 	return err;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 24/29] lockup_detector/perf: Implement init time perf validation
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (22 preceding siblings ...)
  2017-08-31  7:16 ` [patch 23/29] lockup_detector: Get rid of the racy update loop Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-09-07 15:58   ` Don Zickus
  2017-08-31  7:16 ` [patch 25/29] lockup_detector: Implement init time detection of perf Thomas Gleixner
                   ` (6 subsequent siblings)
  30 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-perf--Implement-init-time-perf-validation.patch --]
[-- Type: text/plain, Size: 2875 bytes --]

The watchdog tries to create perf events even after it figured out that
perf is not functional or the requested event is not supported.

That's braindead as this can be done once at init time and, if not
supported, the NMI watchdog can be turned off unconditionally.

Implement an init time validation for the perf hardlockup detector. This
creates a new event create function, which will replace the unholy mess of
the existing one in later patches.
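
A later patch in this series consumes the probe verdict exactly once at
boot (see patch 25/29):

    if (!hardlockup_detector_perf_init())
            nmi_watchdog_available = true;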

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h   |    8 ++++++--
 kernel/watchdog_hld.c |   37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 2 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -93,14 +93,18 @@ extern void hardlockup_detector_perf_sto
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
 extern void hardlockup_detector_perf_cleanup(void);
+extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_detector_perf_disable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
-#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
+static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
 static inline void arch_touch_nmi_watchdog(void) {}
-#endif
+# else
+static inline int hardlockup_detector_perf_init(void) { return 0; }
+# endif
 #endif
 
 void watchdog_nmi_reconfigure(bool run);
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -238,6 +238,27 @@ int watchdog_nmi_enable(unsigned int cpu
 	return 0;
 }
 
+static int hardlockup_detector_event_create(void)
+{
+	unsigned int cpu = smp_processor_id();
+	struct perf_event_attr *wd_attr;
+	struct perf_event *evt;
+
+	wd_attr = &wd_hw_attr;
+	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+
+	/* Try to register using hardware perf events */
+	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
+					       watchdog_overflow_callback, NULL);
+	if (IS_ERR(evt)) {
+		pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
+			PTR_ERR(evt));
+		return PTR_ERR(evt);
+	}
+	this_cpu_write(watchdog_ev, evt);
+	return 0;
+}
+
 /**
  * hardlockup_detector_perf_disable - Disable the local event
  */
@@ -315,3 +336,19 @@ void __init hardlockup_detector_perf_res
 			perf_event_enable(event);
 	}
 }
+
+/**
+ * hardlockup_detector_perf_init - Probe whether NMI event is available at all
+ */
+int __init hardlockup_detector_perf_init(void)
+{
+	int ret = hardlockup_detector_event_create();
+
+	if (ret) {
+		pr_info("Perf NMI watchdog permanently disabled\n");
+	} else {
+		perf_event_release_kernel(this_cpu_read(watchdog_ev));
+		this_cpu_write(watchdog_ev, NULL);
+	}
+	return ret;
+}

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 25/29] lockup_detector: Implement init time detection of perf
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (23 preceding siblings ...)
  2017-08-31  7:16 ` [patch 24/29] lockup_detector/perf: Implement init time perf validation Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 26/29] lockup_detector/perf: Implement CPU enable replacement Thomas Gleixner
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Implement-init-time-detection-of-PERF.patch --]
[-- Type: text/plain, Size: 1910 bytes --]

Use the init time detection of the perf NMI watchdog to determine whether
the perf NMI watchdog is functional. If not, disable it permanently. It
won't come back magically at runtime.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,6 +44,7 @@ int __read_mostly watchdog_user_enabled
 int __read_mostly nmi_watchdog_user_enabled = NMI_WATCHDOG_DEFAULT;
 int __read_mostly soft_watchdog_user_enabled = 1;
 int __read_mostly watchdog_thresh = 10;
+int __read_mostly nmi_watchdog_available;
 
 struct cpumask watchdog_allowed_mask __read_mostly;
 static bool softlockup_threads_initialized __read_mostly;
@@ -138,7 +139,7 @@ static void lockup_detector_update_enabl
 	watchdog_enabled = 0;
 	if (!watchdog_user_enabled)
 		return;
-	if (nmi_watchdog_user_enabled)
+	if (nmi_watchdog_available && nmi_watchdog_user_enabled)
 		watchdog_enabled |= NMI_WATCHDOG_ENABLED;
 	if (soft_watchdog_user_enabled)
 		watchdog_enabled |= SOFT_WATCHDOG_ENABLED;
@@ -686,6 +687,8 @@ int proc_watchdog(struct ctl_table *tabl
 int proc_nmi_watchdog(struct ctl_table *table, int write,
 		      void __user *buffer, size_t *lenp, loff_t *ppos)
 {
+	if (!nmi_watchdog_available && write)
+		return -ENOTSUPP;
 	return proc_watchdog_common(NMI_WATCHDOG_ENABLED,
 				    table, write, buffer, lenp, ppos);
 }
@@ -746,6 +749,12 @@ int proc_watchdog_cpumask(struct ctl_tab
 }
 #endif /* CONFIG_SYSCTL */
 
+static __init void detect_nmi_watchdog(void)
+{
+	if (!hardlockup_detector_perf_init())
+		nmi_watchdog_available = true;
+}
+
 void __init lockup_detector_init(void)
 {
 #ifdef CONFIG_NO_HZ_FULL
@@ -758,5 +767,6 @@ void __init lockup_detector_init(void)
 	cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
 #endif
 
+	detect_nmi_watchdog();
 	softlockup_init_threads();
 }

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 26/29] lockup_detector/perf: Implement CPU enable replacement
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (24 preceding siblings ...)
  2017-08-31  7:16 ` [patch 25/29] lockup_detector: Implement init time detection of perf Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 27/29] lockup_detector: Use new perf CPU enable mechanism Thomas Gleixner
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-perf--Implement-CPU-enable-replacement.patch --]
[-- Type: text/plain, Size: 1648 bytes --]

watchdog_nmi_enable() is an unparseable mess. Provide a clean, perf-specific
implementation, which will be used once the existing setup/teardown mess is
replaced.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/nmi.h   |    2 ++
 kernel/watchdog_hld.c |   11 +++++++++++
 2 files changed, 13 insertions(+)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -92,12 +92,14 @@ extern void arch_touch_nmi_watchdog(void
 extern void hardlockup_detector_perf_stop(void);
 extern void hardlockup_detector_perf_restart(void);
 extern void hardlockup_detector_perf_disable(void);
+extern void hardlockup_detector_perf_enable(void);
 extern void hardlockup_detector_perf_cleanup(void);
 extern int hardlockup_detector_perf_init(void);
 #else
 static inline void hardlockup_detector_perf_stop(void) { }
 static inline void hardlockup_detector_perf_restart(void) { }
 static inline void hardlockup_detector_perf_disable(void) { }
+static inline void hardlockup_detector_perf_enable(void) { }
 static inline void hardlockup_detector_perf_cleanup(void) { }
 # if !defined(CONFIG_HAVE_NMI_WATCHDOG)
 static int hardlockup_detector_perf_init(void) { return -ENODEV; }
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -260,6 +260,17 @@ static int hardlockup_detector_event_cre
 }
 
 /**
+ * hardlockup_detector_perf_enable - Enable the local event
+ */
+void hardlockup_detector_perf_enable(void)
+{
+	if (hardlockup_detector_event_create())
+		return;
+
+	perf_event_enable(this_cpu_read(watchdog_ev));
+}
+
+/**
  * hardlockup_detector_perf_disable - Disable the local event
  */
 void hardlockup_detector_perf_disable(void)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 27/29] lockup_detector: Use new perf CPU enable mechanism
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (25 preceding siblings ...)
  2017-08-31  7:16 ` [patch 26/29] lockup_detector/perf: Implement CPU enable replacement Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 28/29] lockup_detector/perf: Simplify deferred event destroy Thomas Gleixner
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector--Use-new-perf-CPU-enable-mechanism.patch --]
[-- Type: text/plain, Size: 4179 bytes --]

Get rid of the hodgepodge which tries to be smart about perf being
unavailable and about rate limiting the error printouts.

None of that is required, simply because this path is never invoked when
the perf NMI watchdog is not functional.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog.c     |    2 +
 kernel/watchdog_hld.c |   88 ++------------------------------------------------
 2 files changed, 7 insertions(+), 83 deletions(-)

--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -452,6 +452,8 @@ static void watchdog_enable(unsigned int
 	/* Initialize timestamp */
 	__touch_watchdog();
 	/* Enable the perf event */
+	if (watchdog_enabled & NMI_WATCHDOG_ENABLED)
+		hardlockup_detector_perf_enable();
 	watchdog_nmi_enable(cpu);
 
 	watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -25,7 +25,7 @@ static DEFINE_PER_CPU(struct perf_event
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static bool hardlockup_detector_disabled;
+static unsigned int watchdog_cpus;
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -160,84 +160,6 @@ static void watchdog_overflow_callback(s
 	return;
 }
 
-/*
- * People like the simple clean cpu node info on boot.
- * Reduce the watchdog noise by only printing messages
- * that are different from what cpu0 displayed.
- */
-static unsigned long firstcpu_err;
-static atomic_t watchdog_cpus;
-
-int watchdog_nmi_enable(unsigned int cpu)
-{
-	struct perf_event_attr *wd_attr;
-	struct perf_event *event = per_cpu(watchdog_ev, cpu);
-	int firstcpu = 0;
-
-	/* nothing to do if the hard lockup detector is disabled */
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-		goto out;
-
-	/* A failure disabled the hardlockup detector permanently */
-	if (hardlockup_detector_disabled)
-		return -ENODEV;
-
-	/* is it already setup and enabled? */
-	if (event && event->state > PERF_EVENT_STATE_OFF)
-		goto out;
-
-	/* it is setup but not enabled */
-	if (event != NULL)
-		goto out_enable;
-
-	if (atomic_inc_return(&watchdog_cpus) == 1)
-		firstcpu = 1;
-
-	wd_attr = &wd_hw_attr;
-	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
-
-	/* Try to register using hardware perf events */
-	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
-
-	/* save the first cpu's error for future comparision */
-	if (firstcpu && IS_ERR(event))
-		firstcpu_err = PTR_ERR(event);
-
-	if (!IS_ERR(event)) {
-		/* only print for the first cpu initialized */
-		if (firstcpu || firstcpu_err)
-			pr_info("enabled on all CPUs, permanently consumes one hw-PMU counter.\n");
-		goto out_save;
-	}
-
-	/* skip displaying the same error again */
-	if (!firstcpu && (PTR_ERR(event) == firstcpu_err))
-		return PTR_ERR(event);
-
-	/* vary the KERN level based on the returned errno */
-	if (PTR_ERR(event) == -EOPNOTSUPP)
-		pr_info("disabled (cpu%i): not supported (no LAPIC?)\n", cpu);
-	else if (PTR_ERR(event) == -ENOENT)
-		pr_warn("disabled (cpu%i): hardware events not enabled\n",
-			 cpu);
-	else
-		pr_err("disabled (cpu%i): unable to create perf event: %ld\n",
-			cpu, PTR_ERR(event));
-
-	pr_info("Disabling hard lockup detector permanently\n");
-	hardlockup_detector_disabled = true;
-
-	return PTR_ERR(event);
-
-	/* success path */
-out_save:
-	per_cpu(watchdog_ev, cpu) = event;
-out_enable:
-	perf_event_enable(per_cpu(watchdog_ev, cpu));
-out:
-	return 0;
-}
-
 static int hardlockup_detector_event_create(void)
 {
 	unsigned int cpu = smp_processor_id();
@@ -267,6 +189,9 @@ void hardlockup_detector_perf_enable(voi
 	if (hardlockup_detector_event_create())
 		return;
 
+	if (!watchdog_cpus++)
+		pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+
 	perf_event_enable(this_cpu_read(watchdog_ev));
 }
 
@@ -282,10 +207,7 @@ void hardlockup_detector_perf_disable(vo
 		this_cpu_write(watchdog_ev, NULL);
 		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-
-		/* watchdog_nmi_enable() expects this to be zero initially. */
-		if (atomic_dec_and_test(&watchdog_cpus))
-			firstcpu_err = 0;
+		watchdog_cpus--;
 	}
 }
 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 28/29] lockup_detector/perf: Simplify deferred event destroy
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (26 preceding siblings ...)
  2017-08-31  7:16 ` [patch 27/29] lockup_detector: Use new perf CPU enable mechanism Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31  7:16 ` [patch 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

[-- Attachment #1: lockup_detector-perf--Simplify-deferred-event-destroy.patch --]
[-- Type: text/plain, Size: 1393 bytes --]

Now that all functionality is properly serialized against CPU hotplug,
remove the extra per cpu storage which holds the disabled events for
cleanup. The core makes sure that cleanup happens before new events are
created.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/watchdog_hld.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -21,7 +21,6 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
-static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
@@ -204,8 +203,6 @@ void hardlockup_detector_perf_disable(vo
 
 	if (event) {
 		perf_event_disable(event);
-		this_cpu_write(watchdog_ev, NULL);
-		this_cpu_write(dead_event, event);
 		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
 		watchdog_cpus--;
 	}
@@ -221,9 +218,9 @@ void hardlockup_detector_perf_cleanup(vo
 	int cpu;
 
 	for_each_cpu(cpu, &dead_events_mask) {
-		struct perf_event *event = per_cpu(dead_event, cpu);
+		struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
-		per_cpu(dead_event, cpu) = NULL;
+		per_cpu(watchdog_ev, cpu) = NULL;
 		perf_event_release_kernel(event);
 	}
 	cpumask_clear(&dead_events_mask);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch 29/29] lockup_detector: Cleanup hotplug locking mess
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (27 preceding siblings ...)
  2017-08-31  7:16 ` [patch 28/29] lockup_detector/perf: Simplify deferred event destroy Thomas Gleixner
@ 2017-08-31  7:16 ` Thomas Gleixner
  2017-08-31 22:10 ` [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
  2017-09-07 16:04 ` Don Zickus
  30 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-08-31  7:16 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Borislav Petkov,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell, Benjamin Herrenschmidt, Michael Ellerman,
	linuxppc-dev

[-- Attachment #1: lockup_detector--Cleanup-hotplug-locking-mess.patch --]
[-- Type: text/plain, Size: 2336 bytes --]

All watchdog thread related functions are delegated to the smpboot thread
infrastructure, which handles serialization against CPU hotplug correctly.

The sysctl interface is completely decoupled from anything which requires
CPU hotplug protection.

No need to protect the sysctl writes against cpu hotplug anymore. Remove
that protection and add the now required protection to the powerpc
watchdog_nmi_reconfigure() implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/watchdog.c |    2 ++
 kernel/watchdog.c              |    6 ------
 2 files changed, 2 insertions(+), 6 deletions(-)

--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -356,6 +356,7 @@ void watchdog_nmi_reconfigure(bool stop)
 {
 	int cpu;
 
+	cpus_read_lock();
 	if (stop) {
 		for_each_cpu(cpu, &wd_cpus_enabled)
 			stop_wd_on_cpu(cpu);
@@ -364,6 +365,7 @@ void watchdog_nmi_reconfigure(bool stop)
 		for_each_cpu_and(cpu, cpu_online_mask, &watchdog_cpumask)
 			start_wd_on_cpu(cpu);
 	}
+	cpus_read_unlock();
 }
 
 /*
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -652,7 +652,6 @@ static int proc_watchdog_common(int whic
 {
 	int err, old, *param = table->data;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	if (!write) {
@@ -669,7 +668,6 @@ static int proc_watchdog_common(int whic
 			proc_watchdog_update();
 	}
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 
@@ -713,7 +711,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 {
 	int err, old;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	old = READ_ONCE(watchdog_thresh);
@@ -723,7 +720,6 @@ int proc_watchdog_thresh(struct ctl_tabl
 		proc_watchdog_update();
 
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 
@@ -738,7 +734,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 {
 	int err;
 
-	cpu_hotplug_disable();
 	mutex_lock(&watchdog_mutex);
 
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
@@ -746,7 +741,6 @@ int proc_watchdog_cpumask(struct ctl_tab
 		proc_watchdog_update();
 
 	mutex_unlock(&watchdog_mutex);
-	cpu_hotplug_enable();
 	return err;
 }
 #endif /* CONFIG_SYSCTL */

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (28 preceding siblings ...)
  2017-08-31  7:16 ` [patch 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner
@ 2017-08-31 22:10 ` Don Zickus
  2017-09-01  4:42   ` Nicholas Piggin
  2017-09-01  9:18   ` Thomas Gleixner
  2017-09-07 16:04 ` Don Zickus
  30 siblings, 2 replies; 47+ messages in thread
From: Don Zickus @ 2017-08-31 22:10 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> The lockup detector is broken is several ways:
> 
>     - It's deadlock prone vs. CPU hotplug in various ways. Some of these
>       are due to recursive cpus_read_lock() others are due to
>       cpus_read_lock() from CPU hotplug callbacks which immediately lock
>       the machine because cpus are write locked.
> 
>     - The handling of the cpu hotplug threads happens sideways to the
>       smpboot thread infrastructure, which is racy and pointless
> 
>     - The handling of the user space sysctl interface is a complete
>       trainwreck as it fiddles directly with variables which can be
>       modified or evaluated by the running watchdogs.
> 
>     - The perf event initialization is a steaming pile of duct tape as it
>       idiotically tries to create perf events over and over even if perf is
>       not functional (no hardware, ....). To avoid excessive dmesg spam it
>       contains magic printk ratelimiting along with either wrong or useless
>       messages.
> 
>     - The code structure is horrible as ifdef sections are scattered all
>       over the place which makes it unreadable
> 
>     - There is more wreckage, but see the changelogs for the ugly details.
> 
> Before I get utterly grumpy, I just pretend that I don't give a sh*t!
> 
> The following series sanitizes the facility and addresses the problems.

Hi Thomas,

Thanks for the patchset.  I agree with most of the issues you complained
about; I just wasn't smart enough to figure out the right way to solve them.
Despite your aggressive comments, I will review the code to see if it covers
the scenarios that have popped up over the years and run some testing on my
side.  I'll probably need a few days to do that.

Cheers,
Don

> 
> Thanks,
> 
> 	tglx
> ---
>  arch/parisc/kernel/process.c   |    2 
>  arch/powerpc/kernel/watchdog.c |   22 -
>  arch/x86/events/intel/core.c   |   11 
>  include/linux/nmi.h            |  121 +++----
>  include/linux/smpboot.h        |    4 
>  kernel/cpu.c                   |    6 
>  kernel/smpboot.c               |   22 -
>  kernel/sysctl.c                |   22 -
>  kernel/watchdog.c              |  638 ++++++++++++++---------------------------
>  kernel/watchdog_hld.c          |  193 ++++++------
>  10 files changed, 433 insertions(+), 608 deletions(-)
> 
>       

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
  2017-08-31 22:10 ` [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
@ 2017-09-01  4:42   ` Nicholas Piggin
  2017-09-01  9:18   ` Thomas Gleixner
  1 sibling, 0 replies; 47+ messages in thread
From: Nicholas Piggin @ 2017-09-01  4:42 UTC (permalink / raw)
  To: Don Zickus
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Borislav Petkov, Sebastian Siewior, Chris Metcalf,
	Ulrich Obergfell, Michael Ellerman

On Thu, 31 Aug 2017 18:10:14 -0400
Don Zickus <dzickus@redhat.com> wrote:

> On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> > The lockup detector is broken is several ways:
> > 
> >     - It's deadlock prone vs. CPU hotplug in various ways. Some of these
> >       are due to recursive cpus_read_lock() others are due to
> >       cpus_read_lock() from CPU hotplug callbacks which immediately lock
> >       the machine because cpus are write locked.
> > 
> >     - The handling of the cpu hotplug threads happens sideways to the
> >       smpboot thread infrastructure, which is racy and pointless
> > 
> >     - The handling of the user space sysctl interface is a complete
> >       trainwreck as it fiddles directly with variables which can be
> >       modified or evaluated by the running watchdogs.
> > 
> >     - The perf event initialization is a steaming pile of duct tape as it
> >       idiotically tries to create perf events over and over even if perf is
> >       not functional (no hardware, ....). To avoid excessive dmesg spam it
> >       contains magic printk ratelimiting along with either wrong or useless
> >       messages.
> > 
> >     - The code structure is horrible as ifdef sections are scattered all
> >       over the place which makes it unreadable
> > 
> >     - There is more wreckage, but see the changelogs for the ugly details.
> > 
> > Before I get utterly grumpy, I just pretend that I don't give a sh*t!
> > 
> > The following series sanitizes the facility and addresses the problems.  
> 
> Hi Thomas,
> 
> Thanks for the patchset.  I agree with most of the issues you complained
> about; I just wasn't smart enough to figure out the right way to solve them.
> Despite your aggressive comments, I will review the code to see if it covers
> the scenarios that have popped up over the years and run some testing on my
> side.  I'll probably need a few days to do that.

The powerpc bits look fine, there are no real changes pending there,
so just take them through your tree if you like.

I had a glance through the series, no comments yet. The powerpc watchdog
already duplicates the proc tunables rather than using them directly, so
in theory it did not need the 2-stage reconfigure. In practice, it has a
brown paper bag bug because it does not stop the watchdog before changing
its internal variables :P The 2-stage approach is probably the safer and
clearer way to go, though.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
  2017-08-31 22:10 ` [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
  2017-09-01  4:42   ` Nicholas Piggin
@ 2017-09-01  9:18   ` Thomas Gleixner
  1 sibling, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-01  9:18 UTC (permalink / raw)
  To: Don Zickus
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, 31 Aug 2017, Don Zickus wrote:
> On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> > The following series sanitizes the facility and addresses the problems.
> 
> Hi Thomas,
> 
> Thanks for the patchset.  I agree with most of the issues you complained
> about; I just wasn't smart enough to figure out the right way to solve them.
> Despite your aggressive comments, I will review the code to see if it covers

Sorry, I didn't want to offend you or anyone else, but having to spend more
than a day disentangling the duct tape in that code was not really lifting
my mood.

> the scenarios that have popped up over the years and run some testing on my
> side.  I'll probably need a few days to do that.

Yes, that would be appreciated.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock
  2017-08-31  7:16 ` [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock Thomas Gleixner
@ 2017-09-01 19:02   ` Don Zickus
  2017-09-01 19:29     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Don Zickus @ 2017-09-01 19:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> The following deadlock is possible in the watchdog hotplug code:
> 
>   cpus_write_lock()
>     ...
>       takedown_cpu()
>         smpboot_park_threads()
>           smpboot_park_thread()
>             kthread_park()
>               ->park() := watchdog_disable()
>                 watchdog_nmi_disable()
>                   perf_event_release_kernel();
>                     put_event()
>                       _free_event()
>                         ->destroy() := hw_perf_event_destroy()
>                           x86_release_hardware()
>                             release_ds_buffers()
>                               get_online_cpus()
> 
> when a per cpu watchdog perf event is destroyed which drops the last
> reference to the PMU hardware. The cleanup code there invokes
> get_online_cpus() which instantly deadlocks because the hotplug percpu
> rwsem is write locked.

The main reason perf_event_release_kernel is in this path is because the
oprofile folks complained they couldn't use the perf counters when the
nmi_watchdog was disabled on the command line.

I believe your patches only release the perf event on cpu unplug now.

Unless oprofile has been updated, this may still cause issues.  But
detecting this on sysctl changes can probably work around this.

Cheers,
Don

> 
> To solve this add a deferring mechanism:
> 
>   cpus_write_lock()
> 			   kthread_park()
> 			    watchdog_nmi_disable(deferred)
> 			      perf_event_disable(event);
> 			      move_event_to_deferred(event);
>    			   ....
>   cpus_write_unlock()
>   cleaup_deferred_events()
>     perf_event_release_kernel()
> 
> This is still properly serialized against concurrent hotplug via the
> cpu_add_remove_lock, which is held by the task which initiated the hotplug
> event.
> 
> This is also used to handle event destruction when the watchdog threads are
> parked via other mechanisms than CPU hotplug.
> 
> Reported-by: Borislav Petkov <bp@alien8.de>
> Analyzed-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/nmi.h   |    6 ++++++
>  kernel/cpu.c          |    6 ++++++
>  kernel/watchdog.c     |   24 ++++++++++++++++++++++++
>  kernel/watchdog_hld.c |   34 ++++++++++++++++++++++++++++------
>  4 files changed, 64 insertions(+), 6 deletions(-)
> 
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -13,9 +13,11 @@
>  #ifdef CONFIG_LOCKUP_DETECTOR
>  void lockup_detector_init(void);
>  void lockup_detector_soft_poweroff(void);
> +void lockup_detector_cleanup(void);
>  #else
>  static inline void lockup_detector_init(void) { }
>  static inline void lockup_detector_soft_poweroff(void) { }
> +static inline void lockup_detector_cleanup(void) { }
>  #endif
>  
>  #ifdef CONFIG_SOFTLOCKUP_DETECTOR
> @@ -77,9 +79,13 @@ static inline void hardlockup_detector_d
>  extern void arch_touch_nmi_watchdog(void);
>  extern void hardlockup_detector_perf_stop(void);
>  extern void hardlockup_detector_perf_restart(void);
> +extern void hardlockup_detector_perf_disable(void);
> +extern void hardlockup_detector_perf_cleanup(void);
>  #else
>  static inline void hardlockup_detector_perf_stop(void) { }
>  static inline void hardlockup_detector_perf_restart(void) { }
> +static inline void hardlockup_detector_perf_disable(void) { }
> +static inline void hardlockup_detector_perf_cleanup(void) { }
>  #if !defined(CONFIG_HAVE_NMI_WATCHDOG)
>  static inline void arch_touch_nmi_watchdog(void) {}
>  #endif
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -24,6 +24,7 @@
>  #include <linux/lockdep.h>
>  #include <linux/tick.h>
>  #include <linux/irq.h>
> +#include <linux/nmi.h>
>  #include <linux/smpboot.h>
>  #include <linux/relay.h>
>  #include <linux/slab.h>
> @@ -733,6 +734,11 @@ static int __ref _cpu_down(unsigned int
>  
>  out:
>  	cpus_write_unlock();
> +	/*
> +	 * Do post unplug cleanup. This is still protected against
> +	 * concurrent CPU hotplug via cpu_add_remove_lock.
> +	 */
> +	lockup_detector_cleanup();
>  	return ret;
>  }
>  
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -193,6 +193,8 @@ static int __init hardlockup_all_cpu_bac
>  #endif
>  #endif
>  
> +static void __lockup_detector_cleanup(void);
> +
>  /*
>   * Hard-lockup warnings should be triggered after just a few seconds. Soft-
>   * lockups can have false positives under extreme conditions. So we generally
> @@ -459,6 +461,7 @@ static void watchdog_disable(unsigned in
>  	hrtimer_cancel(hrtimer);
>  	/* disable the perf event */
>  	watchdog_nmi_disable(cpu);
> +	hardlockup_detector_perf_disable();
>  }
>  
>  static void watchdog_cleanup(unsigned int cpu, bool online)
> @@ -631,6 +634,24 @@ static void set_sample_period(void)
>  }
>  #endif /* SOFTLOCKUP */
>  
> +static void __lockup_detector_cleanup(void)
> +{
> +	lockdep_assert_held(&watchdog_mutex);
> +	hardlockup_detector_perf_cleanup();
> +}
> +
> +/**
> + * lockup_detector_cleanup - Cleanup after cpu hotplug or sysctl changes
> + *
> + * Caller must not hold the cpu hotplug rwsem.
> + */
> +void lockup_detector_cleanup(void)
> +{
> +	mutex_lock(&watchdog_mutex);
> +	__lockup_detector_cleanup();
> +	mutex_unlock(&watchdog_mutex);
> +}
> +
>  /**
>   * lockup_detector_soft_poweroff - Interface to stop lockup detector(s)
>   *
> @@ -665,6 +686,8 @@ static int proc_watchdog_update(void)
>  
>  	watchdog_nmi_reconfigure();
>  
> +	__lockup_detector_cleanup();
> +
>  	return err;
>  
>  }
> @@ -837,6 +860,7 @@ int proc_watchdog_cpumask(struct ctl_tab
>  		}
>  
>  		watchdog_nmi_reconfigure();
> +		__lockup_detector_cleanup();
>  	}
>  
>  	mutex_unlock(&watchdog_mutex);
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -21,6 +21,8 @@
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct perf_event *, dead_event);
> +static struct cpumask dead_events_mask;
>  
>  static unsigned long hardlockup_allcpu_dumped;
>  static bool hardlockup_detector_disabled;
> @@ -239,16 +241,18 @@ int watchdog_nmi_enable(unsigned int cpu
>  	return 0;
>  }
>  
> -void watchdog_nmi_disable(unsigned int cpu)
> +/**
> + * hardlockup_detector_perf_disable - Disable the local event
> + */
> +void hardlockup_detector_perf_disable(void)
>  {
> -	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	struct perf_event *event = this_cpu_read(watchdog_ev);
>  
>  	if (event) {
>  		perf_event_disable(event);
> -		per_cpu(watchdog_ev, cpu) = NULL;
> -
> -		/* should be in cleanup, but blocks oprofile */
> -		perf_event_release_kernel(event);
> +		this_cpu_write(watchdog_ev, NULL);
> +		this_cpu_write(dead_event, event);
> +		cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
>  
>  		/* watchdog_nmi_enable() expects this to be zero initially. */
>  		if (atomic_dec_and_test(&watchdog_cpus))
> @@ -257,6 +261,24 @@ void watchdog_nmi_disable(unsigned int c
>  }
>  
>  /**
> + * hardlockup_detector_perf_cleanup - Cleanup disabled events and destroy them
> + *
> + * Called from lockup_detector_cleanup(). Serialized by the caller.
> + */
> +void hardlockup_detector_perf_cleanup(void)
> +{
> +	int cpu;
> +
> +	for_each_cpu(cpu, &dead_events_mask) {
> +		struct perf_event *event = per_cpu(dead_event, cpu);
> +
> +		per_cpu(dead_event, cpu) = NULL;
> +		perf_event_release_kernel(event);
> +	}
> +	cpumask_clear(&dead_events_mask);
> +}
> +
> +/**
>   * hardlockup_detector_perf_stop - Globally stop watchdog events
>   *
>   * Special interface for x86 to handle the perf HT bug.
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance
  2017-08-31  7:16 ` [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance Thomas Gleixner
@ 2017-09-01 19:08   ` Don Zickus
  2017-09-01 19:45     ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Don Zickus @ 2017-09-01 19:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:16:15AM +0200, Thomas Gleixner wrote:
> The lockup detector reconfiguration tears down all watchdog threads when
> the watchdog is disabled and sets them up again when its enabled.
> 
> That's a pointless exercise. The watchdog threads are not consuming an
> insane amount of resources, so it's enough to set them up at init time and
> keep them in parked position when the watchdog is disabled and unpark them
> when it is reenabled. The smpboot thread infrastructure takes care of
> keeping the force parked threads in place even across cpu hotplug.

The original reasoning for implementing this complexity was that we were
worried about catching kthread_[un]park errors.  It was thought that in the
rare case one failed, we would hang.  Perhaps we misunderstood the
kthread_[un]park code.

Cheers,
Don

> 
> Aside of that the code implements the park/unpark facility of smp hotplug
> threads on its own, which is even more pointless. We have functionality in
> the smpboot thread code to do so.
> 
> Use the new thread management functions and get rid of the unholy mess.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/watchdog.c |  190 +++++-------------------------------------------------
>  1 file changed, 19 insertions(+), 171 deletions(-)
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,13 +92,6 @@ struct cpumask watchdog_cpumask __read_m
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>  
>  /*
> - * The 'watchdog_running' variable is set to 1 when the watchdog threads
> - * are registered/started and is set to 0 when the watchdog threads are
> - * unregistered/stopped, so it is an indicator whether the threads exist.
> - */
> -static int __read_mostly watchdog_running;
> -
> -/*
>   * These functions can be overridden if an architecture implements its
>   * own hardlockup detector.
>   *
> @@ -123,10 +116,6 @@ void __weak watchdog_nmi_reconfigure(voi
>  
>  #ifdef CONFIG_SOFTLOCKUP_DETECTOR
>  
> -/* Helper for online, unparked cpus. */
> -#define for_each_watchdog_cpu(cpu) \
> -	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
> -
>  /* Global variables, exported for sysctl */
>  unsigned int __read_mostly softlockup_panic =
>  			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
> @@ -252,11 +241,15 @@ void touch_all_softlockup_watchdogs(void
>  	int cpu;
>  
>  	/*
> -	 * this is done lockless
> -	 * do we care if a 0 races with a timestamp?
> -	 * all it means is the softlock check starts one cycle later
> +	 * watchdog_mutex cannot be taken here, as this might be called
> +	 * from (soft)interrupt context, so the access to
> +	 * watchdog_allowed_cpumask might race with a concurrent update.
> +	 *
> +	 * The watchdog time stamp can race against a concurrent real
> +	 * update as well; the only side effect might be a cycle delay for
> +	 * the softlockup check.
>  	 */
> -	for_each_watchdog_cpu(cpu)
> +	for_each_cpu(cpu, &watchdog_allowed_mask)
>  		per_cpu(watchdog_touch_ts, cpu) = 0;
>  	wq_watchdog_touch(-1);
>  }
> @@ -296,9 +289,6 @@ static void watchdog_interrupt_count(voi
>  	__this_cpu_inc(hrtimer_interrupts);
>  }
>  
> -static int watchdog_enable_all_cpus(void);
> -static void watchdog_disable_all_cpus(void);
> -
>  /* watchdog kicker functions */
>  static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  {
> @@ -492,95 +482,6 @@ static struct smp_hotplug_thread watchdo
>  	.unpark			= watchdog_enable,
>  };
>  
> -/*
> - * park all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function returns an error if kthread_park() of a watchdog thread
> - * fails. In this situation, the watchdog threads of some CPUs can already
> - * be parked and the watchdog threads of other CPUs can still be runnable.
> - * Callers are expected to handle this special condition as appropriate in
> - * their context.
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static int watchdog_park_threads(void)
> -{
> -	int cpu, ret = 0;
> -
> -	for_each_watchdog_cpu(cpu) {
> -		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
> -		if (ret)
> -			break;
> -	}
> -	return ret;
> -}
> -
> -/*
> - * unpark all watchdog threads that are specified in 'watchdog_cpumask'
> - *
> - * This function may only be called in a context that is protected against
> - * races with CPU hotplug - for example, via get_online_cpus().
> - */
> -static void watchdog_unpark_threads(void)
> -{
> -	int cpu;
> -
> -	for_each_watchdog_cpu(cpu)
> -		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> -}
> -
> -static int update_watchdog_all_cpus(void)
> -{
> -	int ret;
> -
> -	ret = watchdog_park_threads();
> -	if (ret)
> -		return ret;
> -
> -	watchdog_unpark_threads();
> -
> -	return 0;
> -}
> -
> -static int watchdog_enable_all_cpus(void)
> -{
> -	int err = 0;
> -
> -	if (!watchdog_running) {
> -		err = smpboot_register_percpu_thread_cpumask(&watchdog_threads,
> -							     &watchdog_cpumask);
> -		if (err)
> -			pr_err("Failed to create watchdog threads, disabled\n");
> -		else
> -			watchdog_running = 1;
> -	} else {
> -		/*
> -		 * Enable/disable the lockup detectors or
> -		 * change the sample period 'on the fly'.
> -		 */
> -		err = update_watchdog_all_cpus();
> -
> -		if (err) {
> -			watchdog_disable_all_cpus();
> -			pr_err("Failed to update lockup detectors, disabled\n");
> -		}
> -	}
> -
> -	if (err)
> -		watchdog_enabled = 0;
> -
> -	return err;
> -}
> -
> -static void watchdog_disable_all_cpus(void)
> -{
> -	if (watchdog_running) {
> -		watchdog_running = 0;
> -		smpboot_unregister_percpu_thread(&watchdog_threads);
> -	}
> -}
> -
>  static void softlockup_update_smpboot_threads(void)
>  {
>  	lockdep_assert_held(&watchdog_mutex);
> @@ -655,7 +556,6 @@ static inline int watchdog_park_threads(
>  static inline void watchdog_unpark_threads(void) { }
>  static inline int watchdog_enable_all_cpus(void) { return 0; }
>  static inline void watchdog_disable_all_cpus(void) { }
> -static inline void set_sample_period(void) { }
>  static inline void softlockup_init_threads(void) { }
>  static inline void softlockup_update_threads(void) { }
>  static inline void softlockup_reconfigure_threads(bool enabled) { }
> @@ -695,28 +595,10 @@ void lockup_detector_soft_poweroff(void)
>  /*
>   * Update the run state of the lockup detectors.
>   */
> -static int proc_watchdog_update(void)
> +static void proc_watchdog_update(void)
>  {
> -	int err = 0;
> -
> -	/*
> -	 * Watchdog threads won't be started if they are already active.
> -	 * The 'watchdog_running' variable in watchdog_*_all_cpus() takes
> -	 * care of this. If those threads are already active, the sample
> -	 * period will be updated and the lockup detectors will be enabled
> -	 * or disabled 'on the fly'.
> -	 */
> -	if (watchdog_enabled && watchdog_thresh)
> -		err = watchdog_enable_all_cpus();
> -	else
> -		watchdog_disable_all_cpus();
> -
> +	softlockup_reconfigure_threads(watchdog_enabled && watchdog_thresh);
>  	watchdog_nmi_reconfigure();
> -
> -	__lockup_detector_cleanup();
> -
> -	return err;
> -
>  }
>  
>  /*
> @@ -772,17 +654,8 @@ static int proc_watchdog_common(int whic
>  				new = old & ~which;
>  		} while (cmpxchg(&watchdog_enabled, old, new) != old);
>  
> -		/*
> -		 * Update the run state of the lockup detectors. There is _no_
> -		 * need to check the value returned by proc_watchdog_update()
> -		 * and to restore the previous value of 'watchdog_enabled' as
> -		 * both lockup detectors are disabled if proc_watchdog_update()
> -		 * returns an error.
> -		 */
> -		if (old == new)
> -			goto out;
> -
> -		err = proc_watchdog_update();
> +		if (old != new)
> +			proc_watchdog_update();
>  	}
>  out:
>  	mutex_unlock(&watchdog_mutex);
> @@ -826,50 +699,28 @@ int proc_soft_watchdog(struct ctl_table
>  int proc_watchdog_thresh(struct ctl_table *table, int write,
>  			 void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> -	int err, old, new;
> +	int err, old;
>  
>  	cpu_hotplug_disable();
>  	mutex_lock(&watchdog_mutex);
>  
> -	old = ACCESS_ONCE(watchdog_thresh);
> +	old = READ_ONCE(watchdog_thresh);
>  	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>  
> -	if (err || !write)
> -		goto out;
> +	if (!err && write && old != READ_ONCE(watchdog_thresh))
> +		proc_watchdog_update();
>  
> -	/*
> -	 * Update the sample period. Restore on failure.
> -	 */
> -	new = ACCESS_ONCE(watchdog_thresh);
> -	if (old == new)
> -		goto out;
> -
> -	set_sample_period();
> -	err = proc_watchdog_update();
> -	if (err) {
> -		watchdog_thresh = old;
> -		set_sample_period();
> -	}
> -out:
>  	mutex_unlock(&watchdog_mutex);
>  	cpu_hotplug_enable();
>  	return err;
>  }
>  
> -static void watchdog_update_cpus(void)
> -{
> -	if (IS_ENABLED(CONFIG_SOFTLOCKUP_DETECTOR) && watchdog_running) {
> -		smpboot_update_cpumask_percpu_thread(&watchdog_threads,
> -						     &watchdog_cpumask);
> -		__lockup_detector_cleanup();
> -	}
> -}
> -
>  static void proc_watchdog_cpumask_update(void)
>  {
>  	/* Remove impossible cpus to keep sysctl output clean. */
>  	cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
> -	watchdog_update_cpus();
> +
> +	softlockup_update_threads();
>  	watchdog_nmi_reconfigure();
>  }
>  
> @@ -899,8 +750,6 @@ int proc_watchdog_cpumask(struct ctl_tab
>  
>  void __init lockup_detector_init(void)
>  {
> -	set_sample_period();
> -
>  #ifdef CONFIG_NO_HZ_FULL
>  	if (tick_nohz_full_enabled()) {
>  		pr_info("Disabling watchdog on nohz_full cores by default\n");
> @@ -911,6 +760,5 @@ void __init lockup_detector_init(void)
>  	cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
>  #endif
>  
> -	if (watchdog_enabled)
> -		watchdog_enable_all_cpus();
> +	softlockup_init_threads();
>  }
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock
  2017-09-01 19:02   ` Don Zickus
@ 2017-09-01 19:29     ` Thomas Gleixner
  2017-09-05 14:51       ` Don Zickus
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-01 19:29 UTC (permalink / raw)
  To: Don Zickus
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Fri, 1 Sep 2017, Don Zickus wrote:
> On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> > The following deadlock is possible in the watchdog hotplug code:
> > 
> >   cpus_write_lock()
> >     ...
> >       takedown_cpu()
> >         smpboot_park_threads()
> >           smpboot_park_thread()
> >             kthread_park()
> >               ->park() := watchdog_disable()
> >                 watchdog_nmi_disable()
> >                   perf_event_release_kernel();
> >                     put_event()
> >                       _free_event()
> >                         ->destroy() := hw_perf_event_destroy()
> >                           x86_release_hardware()
> >                             release_ds_buffers()
> >                               get_online_cpus()
> > 
> > when a per cpu watchdog perf event is destroyed which drops the last
> > reference to the PMU hardware. The cleanup code there invokes
> > get_online_cpus() which instantly deadlocks because the hotplug percpu
> > rwsem is write locked.
> 
> The main reason perf_event_release_kernel is in this path is because the
> oprofile folks complained they couldn't use the perf counters when the
> nmi_watchdog was disabled on the command line.

If the nmi watchdog is disabled on the command line then there are no
counters claimed at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance
  2017-09-01 19:08   ` Don Zickus
@ 2017-09-01 19:45     ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-01 19:45 UTC (permalink / raw)
  To: Don Zickus
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Fri, 1 Sep 2017, Don Zickus wrote:
> On Thu, Aug 31, 2017 at 09:16:15AM +0200, Thomas Gleixner wrote:
> > The lockup detector reconfiguration tears down all watchdog threads when
> > the watchdog is disabled and sets them up again when its enabled.
> > 
> > That's a pointless exercise. The watchdog threads are not consuming an
> > insane amount of resources, so it's enough to set them up at init time and
> > keep them in parked position when the watchdog is disabled and unpark them
> > when it is reenabled. The smpboot thread infrastructure takes care of
> > keeping the force parked threads in place even across cpu hotplug.
> 
> The original reasoning for implementing this complexity was that we were
> worried about catching kthread_[un]park errors.  It was thought that in the
> rare case one failed, we would hang.  Perhaps we misunderstood the
> kthread_[un]park code.

Park waits for the thread to reach its park position. So if this fails,
the thread which invoked the parking will be stuck forever.

Unpark is async, i.e. it just kicks the thread out of the park position and
lets it run.

So you cannot catch anything with your open coding, as it just uses the same
mechanisms as the smpboot infrastructure does.
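
IOW, as a minimal sketch (the calls are the ones the watchdog code already
uses; error handling elided):

	/*
	 * Synchronous: set the park bit and wait until the thread has
	 * actually reached its park position. If the thread never gets
	 * there, the caller is stuck forever.
	 */
	ret = kthread_park(per_cpu(softlockup_watchdog, cpu));

	/*
	 * Asynchronous: clear the park bit and wake the thread. There is
	 * nothing to wait for, hence no error to catch.
	 */
	kthread_unpark(per_cpu(softlockup_watchdog, cpu));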

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery
       [not found]   ` <CAEeg4=CJohPTi8FUNWqb3egsbZnExyJapcNC7wD-2amXTsMrYw@mail.gmail.com>
@ 2017-09-04 12:10     ` Peter Zijlstra
  2017-09-05 15:15       ` Don Zickus
  2017-09-05 13:58     ` Thomas Gleixner
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Zijlstra @ 2017-09-04 12:10 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf

On Mon, Sep 04, 2017 at 01:09:06PM +0200, Ulrich Obergfell wrote:

> - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
>   requires parking/unparking of all watchdog threads.

This is where you fail; it's silly to require parking for
reconfiguration.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery
       [not found]   ` <CAEeg4=CJohPTi8FUNWqb3egsbZnExyJapcNC7wD-2amXTsMrYw@mail.gmail.com>
  2017-09-04 12:10     ` Peter Zijlstra
@ 2017-09-05 13:58     ` Thomas Gleixner
  2017-09-05 19:19       ` [patch V2 11/29] lockup_detector: Remove park_in_progress obfuscation Thomas Gleixner
  1 sibling, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-05 13:58 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf

On Mon, 4 Sep 2017, Ulrich Obergfell wrote:
>  "b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded
>   system") tries to fix the following issue:
> 
>    watchdog_stop()
>           hrtimer_cancel()
>           perf_nmi_event_stop()
> 
>   If the task gets preempted after canceling the hrtimer or the VM gets
>   scheduled out long enough, then there is a chance that the next NMI will
>   see a stale hrtimer interrupt count and trigger a false positive hard
>   lockup splat."
> 
> This is not exactly the issue that b94f51183b06 is actually supposed to fix.
> For details, please see the accompanying commit log message at:
> 
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel/watchdog.c?id=b94f51183b0617e7b9b4fb4137d4cf1cab7547c2
> 
> For convenience, here is a slightly different description of the scenario
> that b94f51183b06 originally aims to address:
> 
> - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
>   requires parking/unparking of all watchdog threads.
> - watchdog_timer_fn() on all CPUs can already pick up the new sample period,
>   but the perf counter frequency cannot be updated on all CPUs yet because
>   watchdog/N is unable to park, so watchdog_overflow_callback() still occurs
>   at a frequency that is based on the old sample period (on CPUs >= N which
>   have not had a chance to park their watchdog threads yet).
> - If 'watchdog_thresh' was increased by the reconfiguration, the interval
>   at which watchdog_timer_fn() is now running could be too large for the
>   watchdog_overflow_callback() function to observe a change of the timer
>   interrupt counter. This can trigger a false positive.

Oh well. You did not fix that at all with that hack.

The real problem is that the watchdog threshold write in proc immediately
updates sample_period, which is evaluated by the running thread.

proc_write()
  set_sample_period()
					<----- Broken starts
  proc_watchdog_update()
    watchdog_enable_all_cpus()
    update_watchdog_all_cpus()
        watchdog_park_threads()
					<----- Broken ends
	  watchdog_park_in_progress = 1

So up to the point where watchdog_park_in_progress is set to 1, the change
to sample_period is visible to all active watchdog threads and is picked up
when they rearm their timers. The NMI watchdog, however, keeps running at
the old frequency until the thread is parked and the watchdog NMI is
disarmed.

The underlying problem is the unsynchronized update of sample_period, and
this band-aid hack does not solve that at all.
 
> The above scenario should be rare in practice, so it seemed reasonable to
> come up with a simple low-risk fix that addresses merely this particular
> scenario (watchdog_timer_fn() and watchdog_overflow_callback() now return
> immediately while park is in progress / no lockup detection is performed
> during that window of time).

That thing is neither reasonable nor a fix. It's mindless hackery which
papers over the problem instead of fixing the root cause. It also missed
the problem in park/unpark in general, which is again a valid source of
false positives.

> In relation to:
> 
>  "That commit added a complete abstrusity with a atomic variable telling the
>   NMI watchdog function that stop is in progress. This is so stupid, that
>   it's not funny anymore."
> 
> I cannot see any 'abstrusity' or 'stupidity' in the pragmatic approach that
> was taken in b94f51183b06 to fix an issue that should be rare in practice
> as described above.
> 
> Please change the commit log of [patch 11/29] since it is inconsistent with
> the commit log of b94f51183b06.

Yes, I will change it to include the above reasoning why this is complete
crap and keep the extra findings for reference.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock
  2017-09-01 19:29     ` Thomas Gleixner
@ 2017-09-05 14:51       ` Don Zickus
  0 siblings, 0 replies; 47+ messages in thread
From: Don Zickus @ 2017-09-05 14:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Fri, Sep 01, 2017 at 09:29:07PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Sep 2017, Don Zickus wrote:
> > On Thu, Aug 31, 2017 at 09:16:08AM +0200, Thomas Gleixner wrote:
> > > The following deadlock is possible in the watchdog hotplug code:
> > > 
> > >   cpus_write_lock()
> > >     ...
> > >       takedown_cpu()
> > >         smpboot_park_threads()
> > >           smpboot_park_thread()
> > >             kthread_park()
> > >               ->park() := watchdog_disable()
> > >                 watchdog_nmi_disable()
> > >                   perf_event_release_kernel();
> > >                     put_event()
> > >                       _free_event()
> > >                         ->destroy() := hw_perf_event_destroy()
> > >                           x86_release_hardware()
> > >                             release_ds_buffers()
> > >                               get_online_cpus()
> > > 
> > > when a per cpu watchdog perf event is destroyed which drops the last
> > > reference to the PMU hardware. The cleanup code there invokes
> > > get_online_cpus() which instantly deadlocks because the hotplug percpu
> > > rwsem is write locked.
> > 
> > The main reason perf_event_release_kernel is in this path is because the
> > oprofile folks complained they couldn't use the perf counters when the
> > nmi_watchdog was disabled on the command line.
> 
> If the nmi watchdog is disabled on the command line then there are no
> counters claimed at all.

Ah, I see it now.  When you park all the threads, you clear the cpumask,
which then triggers lockup_detector_cleanup() to release all perf counters
that are no longer used by the cpumask.

Reading further into this code, I see how it covers various race conditions
we tried to solve in the past.

Thanks!

Cheers,
Don

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery
  2017-09-04 12:10     ` Peter Zijlstra
@ 2017-09-05 15:15       ` Don Zickus
  2017-09-05 15:42         ` Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Don Zickus @ 2017-09-05 15:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ulrich Obergfell, Thomas Gleixner, LKML, Ingo Molnar,
	Andrew Morton, Borislav Petkov, Sebastian Siewior,
	Nicholas Piggin, Chris Metcalf

On Mon, Sep 04, 2017 at 02:10:50PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 04, 2017 at 01:09:06PM +0200, Ulrich Obergfell wrote:
> 
> > - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> > - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
> >   requires parking/unparking of all watchdog threads.
> 
> This is where you fail; it's silly to require parking for
> reconfiguration.

Hi Peter,

Ok, please elaborate.  Unless I am misunderstanding, that is what Thomas
requested we do years ago when he implemented the parking/unparking scheme
and what his current patch set is doing now.

The point of parking, I believe, was to avoid the overhead of tearing down a
thread and restarting it when the code needed to update various lockup
detector settings.

So if we can't depend on parking for reconfiguration, then what are the
other options (besides tearing down threads)?

I am not trying to be argumentative here, just trying to fill in the
disconnect between us.


Hi Uli,

I think the race you detailed is solved by Thomas's patches. In the
original design we set the sample period first and then tried parking the
threads, which created the mess.  With this patchset, Thomas parks the
threads first and then sets the sample period, which I believe avoids the
race.  You should be able to see that in patch 16,
softlockup_reconfigure_threads(), roughly sketched below.
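
A minimal sketch of that ordering (set_sample_period() and the
softlockup_reconfigure_threads(bool) signature are from the series; the
park/unpark helper names here are only illustrative):

	static void softlockup_reconfigure_threads(bool enabled)
	{
		/* Park first, so no watchdog runs during the update */
		softlockup_park_threads();		/* illustrative name */
		/* Safe now: no running watchdog evaluates sample_period */
		set_sample_period();
		if (enabled)
			softlockup_unpark_threads();	/* illustrative name */
	}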

Cheers,
Don

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 11/29] lockup_detector: Remove park_in_progress hackery
  2017-09-05 15:15       ` Don Zickus
@ 2017-09-05 15:42         ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-05 15:42 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, Ulrich Obergfell, LKML, Ingo Molnar,
	Andrew Morton, Borislav Petkov, Sebastian Siewior,
	Nicholas Piggin, Chris Metcalf

On Tue, 5 Sep 2017, Don Zickus wrote:

> On Mon, Sep 04, 2017 at 02:10:50PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 04, 2017 at 01:09:06PM +0200, Ulrich Obergfell wrote:
> > 
> > > - A thread hogs CPU N (soft lockup) so that watchdog/N is unable to run.
> > > - A user re-configures 'watchdog_thresh' on the fly. The reconfiguration
> > >   requires parking/unparking of all watchdog threads.
> > 
> > This is where you fail; it's silly to require parking for
> > reconfiguration.
> 
> Hi Peter,
> 
> Ok, please elaborate.  Unless I am misunderstanding, that is what Thomas
> requested we do years ago when he implemented the parking/unparking scheme
> and what his current patch set is doing now.
> 
> The point of parking, I believe, was to avoid the overhead of tearing down a
> thread and restarting it when the code needed to update various lockup
> detector settings.
> 
> So if we can't depend on parking for reconfiguration, then what are the
> other options (besides tearing down threads)?

Yes, the park/unpark is what I still use, as it was the simplest way to
keep everything in sync.

I pondered doing on-the-fly reconfiguration as well, but that would add
more code and would not solve the general issue of park/unpark. So I rather
went for a single mechanism which just works, even if it is suboptimal
cpu-cycle wise. OTOH, that reconfiguration is not something which happens
every 5ms, so we can just go for the stupid but simple mechanism.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [patch V2 11/29] lockup_detector: Remove park_in_progress obfuscation
  2017-09-05 13:58     ` Thomas Gleixner
@ 2017-09-05 19:19       ` Thomas Gleixner
  2017-09-14 10:43         ` [tip:core/urgent] watchdog/core: Remove the " tip-bot for Thomas Gleixner
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2017-09-05 19:19 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin, Don Zickus,
	Chris Metcalf

b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded
system") tries to fix the following issue:

proc_write()
   set_sample_period()    <--- New sample period becomes visible
 			  <----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus() 	watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
        watchdog_park_threads()

					thread->park()
					  disable_nmi()
			  <----- Broken ends

The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible to the hrtimer function, which
uses that value to rearm the timer, while the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered, the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired yet due to the longer period.
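
For illustration (numbers invented here, assuming the usual setup where
the hrtimer period is a fraction of watchdog_thresh while the perf NMI
period equals watchdog_thresh): raising the threshold from 10s to 60s
makes the hrtimer rearm with the long new period right away, while the
perf NMI keeps firing at the old 10s rate until it is disabled. Two
consecutive NMIs can then see an unchanged hrtimer interrupt count and
report a bogus hard lockup.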

The patch 'fixed' this by adding a variable:

proc_write()
   set_sample_period()
 					<----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
         watchdog_park_threads()
 	  park_in_progress = 1
 					<----- Broken ends
	  			        nmi_watchdog()
					  if (park_in_progress)
					     return;

The only effect of this variable was to make the window where the breakage
can hit small enough that it was no longer observable in testing. From a
correctness point of view it is a pointless bandaid which merely papers
over the root cause: the unsynchronized update of the variable.

Looking deeper into the related code paths unearthed similar problems in
the watchdog_start()/stop() functions.

 watchdog_start()
	perf_nmi_event_start()
	hrtimer_start()

 watchdog_stop()
	hrtimer_cancel()
	perf_nmi_event_stop()

In both cases the call order is wrong because if the task gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.

Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.

Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.

Address the start/stop() ordering problem as well by reversing the call
order, so this part is at least correct now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

V2: Make the changelog technically and politically correct.

 include/linux/nmi.h   |    1 -
 kernel/watchdog.c     |   39 ++++++++++++++++++---------------------
 kernel/watchdog_hld.c |    7 ++-----
 3 files changed, 20 insertions(+), 27 deletions(-)

--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -27,7 +27,6 @@ extern void touch_softlockup_watchdog_sy
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
 extern int soft_watchdog_enabled;
-extern atomic_t watchdog_park_in_progress;
 #else
 static inline void touch_softlockup_watchdog_sched(void)
 {
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -134,8 +134,6 @@ void __weak watchdog_nmi_reconfigure(voi
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -320,8 +318,7 @@ static enum hrtimer_restart watchdog_tim
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
-	if (!watchdog_enabled ||
-	    atomic_read(&watchdog_park_in_progress) != 0)
+	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
 	/* kick the hardlockup detector */
@@ -435,33 +432,38 @@ static void watchdog_set_prio(unsigned i
 
 static void watchdog_enable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
-	/* kick off the timer for the hardlockup detector */
+	/*
+	 * Start the timer first to prevent the NMI watchdog triggering
+	 * before the timer has a chance to fire.
+	 */
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
+	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
+		      HRTIMER_MODE_REL_PINNED);
 
+	/* Initialize timestamp */
+	__touch_watchdog();
 	/* Enable the perf event */
 	watchdog_nmi_enable(cpu);
 
-	/* done here because hrtimer_start can only pin to smp_processor_id() */
-	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
-		      HRTIMER_MODE_REL_PINNED);
-
-	/* initialize timestamp */
 	watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
-	__touch_watchdog();
 }
 
 static void watchdog_disable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
 	watchdog_set_prio(SCHED_NORMAL, 0);
-	hrtimer_cancel(hrtimer);
-	/* disable the perf event */
-	watchdog_nmi_disable(cpu);
+	/*
+	 * Disable the perf event first. That prevents a large delay
+	 * between disabling the timer and disabling the perf event from
+	 * causing the perf NMI to detect a false positive.
+	 */
 	hardlockup_detector_perf_disable();
+	watchdog_nmi_disable(cpu);
+	hrtimer_cancel(hrtimer);
 }
 
 static void watchdog_cleanup(unsigned int cpu, bool online)
@@ -517,16 +519,11 @@ static int watchdog_park_threads(void)
 {
 	int cpu, ret = 0;
 
-	atomic_set(&watchdog_park_in_progress, 1);
-
 	for_each_watchdog_cpu(cpu) {
 		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
 		if (ret)
 			break;
 	}
-
-	atomic_set(&watchdog_park_in_progress, 0);
-
 	return ret;
 }
 
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,15 +106,12 @@ static struct perf_event_attr wd_hw_attr
 
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event,
-		 struct perf_sample_data *data,
-		 struct pt_regs *regs)
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
 {
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
-	if (atomic_read(&watchdog_park_in_progress) != 0)
-		return;
-
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events
  2017-08-31  7:15 ` [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
@ 2017-09-06 16:14   ` Borislav Petkov
  0 siblings, 0 replies; 47+ messages in thread
From: Borislav Petkov @ 2017-09-06 16:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Sebastian Siewior, Nicholas Piggin, Don Zickus, Chris Metcalf,
	Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:15:59AM +0200, Thomas Gleixner wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> Provide a interface to stop and restart perf NMI watchdog events on all

"an interface"

> CPUs. This is only useable during init and especially for handling the perf

"usable"

> HT bug on Intel machines. It's safe to use it this way as nothing can
> start/stop the NMI watchdog in parallel.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/nmi.h   |    4 ++++
>  kernel/watchdog_hld.c |   41 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+)
> 
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -75,7 +75,11 @@ static inline void hardlockup_detector_d
>  
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  extern void arch_touch_nmi_watchdog(void);
> +extern void hardlockup_detector_perf_stop(void);
> +extern void hardlockup_detector_perf_restart(void);
>  #else
> +static inline void hardlockup_detector_perf_stop(void) { }
> +static inline void hardlockup_detector_perf_restart(void) { }
>  #if !defined(CONFIG_HAVE_NMI_WATCHDOG)
>  static inline void arch_touch_nmi_watchdog(void) {}
>  #endif
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -261,3 +261,44 @@ void watchdog_nmi_disable(unsigned int c
>  			firstcpu_err = 0;
>  	}
>  }
> +
> +/**
> + * hardlockup_detector_perf_stop - Globally stop watchdog events
> + *
> + * Special interface for x86 to handle the perf HT bug.
> + */
> +void __init hardlockup_detector_perf_stop(void)

Right, can we make those naming prefixes shorter? Something like
hld_perf_stop() and hld_perf_restart()?

The file already has "hld" in the name, so ...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 24/29] lockup_detector/perf: Implement init time perf validation
  2017-08-31  7:16 ` [patch 24/29] lockup_detector/perf: Implement init time perf validation Thomas Gleixner
@ 2017-09-07 15:58   ` Don Zickus
  0 siblings, 0 replies; 47+ messages in thread
From: Don Zickus @ 2017-09-07 15:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:16:22AM +0200, Thomas Gleixner wrote:
> The watchdog tries to create perf events even after it figured out that
> perf is not functional or the requested event is not supported.
> 
> That's braindead as this can be done once at init time and if not supported
> the NMI watchdog can be turned off unconditionally.
> 
> Implement the perf hardlockup detector functionality for that. This creates
> a new event create function, which will replace the unholy mess of the
> existing one in later patches.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  include/linux/nmi.h   |    8 ++++++--
>  kernel/watchdog_hld.c |   37 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -93,14 +93,18 @@ extern void hardlockup_detector_perf_sto
>  extern void hardlockup_detector_perf_restart(void);
>  extern void hardlockup_detector_perf_disable(void);
>  extern void hardlockup_detector_perf_cleanup(void);
> +extern int hardlockup_detector_perf_init(void);
>  #else
>  static inline void hardlockup_detector_perf_stop(void) { }
>  static inline void hardlockup_detector_perf_restart(void) { }
>  static inline void hardlockup_detector_perf_disable(void) { }
>  static inline void hardlockup_detector_perf_cleanup(void) { }
> -#if !defined(CONFIG_HAVE_NMI_WATCHDOG)
> +# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
> +static int hardlockup_detector_perf_init(void) { return -ENODEV; }
>  static inline void arch_touch_nmi_watchdog(void) {}
> -#endif
> +# else
> +static int hardlockup_detector_perf_init(void) { return 0; }

hardlockup_detector_perf_init() needs to be 'inline', otherwise it fails to
compile with CONFIG_HARDLOCKUP_DETECTOR_PERF turned off. A sketch of the
corrected stubs is below.
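
Sketch of how the stubs would look with that fix applied (reconstructed
from the quoted hunk above, not taken from a later version of the patch):

	# if !defined(CONFIG_HAVE_NMI_WATCHDOG)
	static inline int hardlockup_detector_perf_init(void) { return -ENODEV; }
	static inline void arch_touch_nmi_watchdog(void) {}
	# else
	static inline int hardlockup_detector_perf_init(void) { return 0; }
	# endif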


Cheers,
Don

> +# endif
>  #endif
>  
>  void watchdog_nmi_reconfigure(bool run);
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -238,6 +238,27 @@ int watchdog_nmi_enable(unsigned int cpu
>  	return 0;
>  }
>  
> +static int hardlockup_detector_event_create(void)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	struct perf_event_attr *wd_attr;
> +	struct perf_event *evt;
> +
> +	wd_attr = &wd_hw_attr;
> +	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
> +
> +	/* Try to register using hardware perf events */
> +	evt = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
> +					       watchdog_overflow_callback, NULL);
> +	if (IS_ERR(evt)) {
> +		pr_info("Perf event create on CPU %d failed with %ld\n", cpu,
> +			PTR_ERR(evt));
> +		return PTR_ERR(evt);
> +	}
> +	this_cpu_write(watchdog_ev, evt);
> +	return 0;
> +}
> +
>  /**
>   * hardlockup_detector_perf_disable - Disable the local event
>   */
> @@ -315,3 +336,19 @@ void __init hardlockup_detector_perf_res
>  			perf_event_enable(event);
>  	}
>  }
> +
> +/**
> + * hardlockup_detector_perf_init - Probe whether NMI event is available at all
> + */
> +int __init hardlockup_detector_perf_init(void)
> +{
> +	int ret = hardlockup_detector_event_create();
> +
> +	if (ret) {
> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +	} else {
> +		perf_event_release_kernel(this_cpu_read(watchdog_ev));
> +		this_cpu_write(watchdog_ev, NULL);
> +	}
> +	return ret;
> +}
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape
  2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
                   ` (29 preceding siblings ...)
  2017-08-31 22:10 ` [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
@ 2017-09-07 16:04 ` Don Zickus
  30 siblings, 0 replies; 47+ messages in thread
From: Don Zickus @ 2017-09-07 16:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Borislav Petkov, Sebastian Siewior, Nicholas Piggin,
	Chris Metcalf, Ulrich Obergfell

On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> The lockup detector is broken in several ways:
> 
>     - It's deadlock prone vs. CPU hotplug in various ways. Some of these
>       are due to recursive cpus_read_lock() others are due to
>       cpus_read_lock() from CPU hotplug callbacks which immediately lock
>       the machine because cpus are write locked.
> 
>     - The handling of the cpu hotplug threads happens sideways to the
>       smpboot thread infrastructure, which is racy and pointless
> 
>     - The handling of the user space sysctl interface is a complete
>       trainwreck as it fiddles directly with variables which can be
>       modified or evaluated by the running watchdogs.
> 
>     - The perf event initialization is a steaming pile of duct tape as it
>       idiotically tries to create perf events over and over even if perf is
>       not functional (no hardware, ....). To avoid excessive dmesg spam it
>       contains magic printk ratelimiting along with either wrong or useless
>       messages.
> 
>     - The code structure is horrible as ifdef sections are scattered all
>       over the place which makes it unreadable
> 
>     - There is more wreckage, but see the changelogs for the ugly details.
> 
> Before I get utterly grumpy, I just pretend that I don't give a sh*t!
> 
> The following series sanitizes the facility and addresses the problems.

One of the goals I was trying to achieve with splitting out watchdog_hld.c
was to abstract it as another hw NMI backend, as some arches wanted to
move away from using perf as the hardlockup detector.

So watchdog_nmi_enable/disable was an attempt to be that hook, maybe
watchdog_nmi_reconfigure too.

I think some of your hardlockup_detector_perf_enable/disable/restart
functions might fit into that; the _cleanup() probably not. See the
sketch of the hook pattern below.
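
A sketch of that hook pattern (the __weak defaults are illustrative:
the names match the functions mentioned above, but the bodies are
assumptions, with perf as the default backend that an architecture
like powerpc would override):

	/* Default hooks in kernel/watchdog.c, overridable per arch */
	int __weak watchdog_nmi_enable(unsigned int cpu)
	{
		hardlockup_detector_perf_enable();
		return 0;
	}

	void __weak watchdog_nmi_disable(unsigned int cpu)
	{
		hardlockup_detector_perf_disable();
	}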

Other than that and the compile issue, I don't really have many problems
with the bulk of the changes and my simple tests seem to work fine.

Cheers,
Don

> 
> Thanks,
> 
> 	tglx
> ---
>  arch/parisc/kernel/process.c   |    2 
>  arch/powerpc/kernel/watchdog.c |   22 -
>  arch/x86/events/intel/core.c   |   11 
>  include/linux/nmi.h            |  121 +++----
>  include/linux/smpboot.h        |    4 
>  kernel/cpu.c                   |    6 
>  kernel/smpboot.c               |   22 -
>  kernel/sysctl.c                |   22 -
>  kernel/watchdog.c              |  638 ++++++++++++++---------------------------
>  kernel/watchdog_hld.c          |  193 ++++++------
>  10 files changed, 433 insertions(+), 608 deletions(-)
> 
>       

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [tip:core/urgent] watchdog/core: Remove the park_in_progress obfuscation
  2017-09-05 19:19       ` [patch V2 11/29] lockup_detector: Remove park_in_progress obfuscation Thomas Gleixner
@ 2017-09-14 10:43         ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: tip-bot for Thomas Gleixner @ 2017-09-14 10:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dzickus, hpa, bigeasy, tglx, akpm, mingo, npiggin,
	cmetcalf, torvalds, bp, uobergfe, peterz

Commit-ID:  01f0a02701cbcf32d22cfc9d1ab9a3f0ff2ba68c
Gitweb:     http://git.kernel.org/tip/01f0a02701cbcf32d22cfc9d1ab9a3f0ff2ba68c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Tue, 12 Sep 2017 21:37:05 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 14 Sep 2017 11:41:05 +0200

watchdog/core: Remove the park_in_progress obfuscation

Commit:

  b94f51183b06 ("kernel/watchdog: prevent false hardlockup on overloaded system")

tries to fix the following issue:

proc_write()
   set_sample_period()    <--- New sample period becomes visible
			  <----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
        watchdog_park_threads()

					thread->park()
					  disable_nmi()
			  <----- Broken ends

The reason why this is broken is that the update of the watchdog threshold
becomes immediately effective and visible to the hrtimer function, which
uses that value to rearm the timer, while the NMI/perf side still uses the
old value up to the point where it is disabled. If the rate has been
lowered, the NMI can run fast enough to 'detect' a hard lockup because
the timer has not fired yet due to the longer period.

The patch 'fixed' this by adding a variable:

proc_write()
   set_sample_period()
					<----- Broken starts
   proc_watchdog_update()
     watchdog_enable_all_cpus()		watchdog_hrtimer_fn()
     update_watchdog_all_cpus()		   restart_timer(sample_period)
         watchdog_park_threads()
	  park_in_progress = 1
					<----- Broken ends
				        nmi_watchdog()
					  if (park_in_progress)
					     return;

The only effect of this variable was to make the window where the breakage
can hit small enough that it was no longer observable in testing. From a
correctness point of view it is a pointless bandaid which merely papers
over the root cause: the unsynchronized update of the variable.

Looking deeper into the related code paths unearthed similar problems in
the watchdog_start()/stop() functions.

 watchdog_start()
	perf_nmi_event_start()
	hrtimer_start()

 watchdog_stop()
	hrtimer_cancel()
	perf_nmi_event_stop()

In both cases the call order is wrong because if the task gets preempted
or the VM gets scheduled out long enough after the first call, then there is
a chance that the next NMI will see a stale hrtimer interrupt count and
trigger a false positive hard lockup splat.

Get rid of park_in_progress so the code can be gradually deobfuscated and
pruned from several layers of duct tape papering over the root cause,
which has been either ignored or not understood at all.

Once this is removed the underlying problem will be fixed by rewriting the
proc interface to do a proper synchronized update.

Address the start/stop() ordering problem as well by reversing the call
order, so this part is at least correct now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1709052038270.2393@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/nmi.h   |  1 -
 kernel/watchdog.c     | 37 +++++++++++++++++--------------------
 kernel/watchdog_hld.c |  7 ++-----
 3 files changed, 19 insertions(+), 26 deletions(-)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 80354e6..91a3a4a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -27,7 +27,6 @@ extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern unsigned int  softlockup_panic;
 extern int soft_watchdog_enabled;
-extern atomic_t watchdog_park_in_progress;
 #else
 static inline void touch_softlockup_watchdog_sched(void)
 {
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index dd1fd59..c290135 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -136,8 +136,6 @@ void __weak watchdog_nmi_reconfigure(void)
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
-atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
-
 static u64 __read_mostly sample_period;
 
 static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts);
@@ -322,8 +320,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
-	if (!watchdog_enabled ||
-	    atomic_read(&watchdog_park_in_progress) != 0)
+	if (!watchdog_enabled)
 		return HRTIMER_NORESTART;
 
 	/* kick the hardlockup detector */
@@ -437,32 +434,37 @@ static void watchdog_set_prio(unsigned int policy, unsigned int prio)
 
 static void watchdog_enable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
-	/* kick off the timer for the hardlockup detector */
+	/*
+	 * Start the timer first to prevent the NMI watchdog triggering
+	 * before the timer has a chance to fire.
+	 */
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
+	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
+		      HRTIMER_MODE_REL_PINNED);
 
+	/* Initialize timestamp */
+	__touch_watchdog();
 	/* Enable the perf event */
 	watchdog_nmi_enable(cpu);
 
-	/* done here because hrtimer_start can only pin to smp_processor_id() */
-	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
-		      HRTIMER_MODE_REL_PINNED);
-
-	/* initialize timestamp */
 	watchdog_set_prio(SCHED_FIFO, MAX_RT_PRIO - 1);
-	__touch_watchdog();
 }
 
 static void watchdog_disable(unsigned int cpu)
 {
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
+	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);
 
 	watchdog_set_prio(SCHED_NORMAL, 0);
-	hrtimer_cancel(hrtimer);
-	/* disable the perf event */
+	/*
+	 * Disable the perf event first. That prevents a large delay
+	 * between disabling the timer and disabling the perf event from
+	 * causing the perf NMI to detect a false positive.
+	 */
 	watchdog_nmi_disable(cpu);
+	hrtimer_cancel(hrtimer);
 }
 
 static void watchdog_cleanup(unsigned int cpu, bool online)
@@ -518,16 +520,11 @@ static int watchdog_park_threads(void)
 {
 	int cpu, ret = 0;
 
-	atomic_set(&watchdog_park_in_progress, 1);
-
 	for_each_watchdog_cpu(cpu) {
 		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
 		if (ret)
 			break;
 	}
-
-	atomic_set(&watchdog_park_in_progress, 0);
-
 	return ret;
 }
 
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 94111cc..0aa191e 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -106,15 +106,12 @@ static struct perf_event_attr wd_hw_attr = {
 
 /* Callback function for perf event subsystem */
 static void watchdog_overflow_callback(struct perf_event *event,
-		 struct perf_sample_data *data,
-		 struct pt_regs *regs)
+				       struct perf_sample_data *data,
+				       struct pt_regs *regs)
 {
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
-	if (atomic_read(&watchdog_park_in_progress) != 0)
-		return;
-
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;

^ permalink raw reply related	[flat|nested] 47+ messages in thread


Thread overview: 47+ messages
2017-08-31  7:15 [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Thomas Gleixner
2017-08-31  7:15 ` [patch 01/29] hardlockup_detector: Provide interface to stop/restart perf events Thomas Gleixner
2017-09-06 16:14   ` Borislav Petkov
2017-08-31  7:16 ` [patch 02/29] perf/x86/intel: Sanitize PMU HT bug workaround Thomas Gleixner
2017-08-31  7:16 ` [patch 03/29] lockup_detector: Provide interface to stop from poweroff() Thomas Gleixner
2017-08-31  7:16 ` [patch 04/29] parisc: Use lockup_detector_stop() Thomas Gleixner
2017-08-31  7:16 ` [patch 05/29] lockup_detector: Remove broken suspend/resume interfaces Thomas Gleixner
2017-08-31  7:16 ` [patch 06/29] lockup_detector: Rework cpu hotplug locking Thomas Gleixner
2017-08-31  7:16 ` [patch 07/29] lockup_detector: Rename watchdog_proc_mutex Thomas Gleixner
2017-08-31  7:16 ` [patch 08/29] lockup_detector: Mark hardlockup_detector_disable() __init Thomas Gleixner
2017-08-31  7:16 ` [patch 09/29] lockup_detector/perf: Remove broken self disable on failure Thomas Gleixner
2017-08-31  7:16 ` [patch 10/29] lockup_detector/perf: Prevent cpu hotplug deadlock Thomas Gleixner
2017-09-01 19:02   ` Don Zickus
2017-09-01 19:29     ` Thomas Gleixner
2017-09-05 14:51       ` Don Zickus
2017-08-31  7:16 ` [patch 11/29] lockup_detector: Remove park_in_progress hackery Thomas Gleixner
     [not found]   ` <CAEeg4=CJohPTi8FUNWqb3egsbZnExyJapcNC7wD-2amXTsMrYw@mail.gmail.com>
2017-09-04 12:10     ` Peter Zijlstra
2017-09-05 15:15       ` Don Zickus
2017-09-05 15:42         ` Thomas Gleixner
2017-09-05 13:58     ` Thomas Gleixner
2017-09-05 19:19       ` [patch V2 11/29] lockup_detector: Remove park_in_progress obfuscation Thomas Gleixner
2017-09-14 10:43         ` [tip:core/urgent] watchdog/core: Remove the " tip-bot for Thomas Gleixner
2017-08-31  7:16 ` [patch 12/29] lockup_detector: Cleanup stub functions Thomas Gleixner
2017-08-31  7:16 ` [patch 13/29] lockup_detector: Cleanup the ifdef maze Thomas Gleixner
2017-08-31  7:16 ` [patch 14/29] lockup_detector: Split out cpumask write function Thomas Gleixner
2017-08-31  7:16 ` [patch 15/29] smpboot/threads: Avoid runtime allocation Thomas Gleixner
2017-08-31  7:16 ` [patch 16/29] lockup_detector: Create new thread handling infrastructure Thomas Gleixner
2017-08-31  7:16 ` [patch 17/29] lockup_detector: Get rid of the thread teardown/setup dance Thomas Gleixner
2017-09-01 19:08   ` Don Zickus
2017-09-01 19:45     ` Thomas Gleixner
2017-08-31  7:16 ` [patch 18/29] lockup_detector: Further simplify sysctl handling Thomas Gleixner
2017-08-31  7:16 ` [patch 19/29] lockup_detector: Cleanup header mess Thomas Gleixner
2017-08-31  7:16 ` [patch 20/29] lockup_detector/sysctl: Get rid of the ifdeffery Thomas Gleixner
2017-08-31  7:16 ` [patch 21/29] lockup_detector: Cleanup sysctl variable name space Thomas Gleixner
2017-08-31  7:16 ` [patch 22/29] lockup_detector: Make watchdog_nmi_reconfigure() two stage Thomas Gleixner
2017-08-31  7:16 ` [patch 23/29] lockup_detector: Get rid of the racy update loop Thomas Gleixner
2017-08-31  7:16 ` [patch 24/29] lockup_detector/perf: Implement init time perf validation Thomas Gleixner
2017-09-07 15:58   ` Don Zickus
2017-08-31  7:16 ` [patch 25/29] lockup_detector: Implement init time detection of perf Thomas Gleixner
2017-08-31  7:16 ` [patch 26/29] lockup_detector/perf: Implement CPU enable replacement Thomas Gleixner
2017-08-31  7:16 ` [patch 27/29] lockup_detector: Use new perf CPU enable mechanism Thomas Gleixner
2017-08-31  7:16 ` [patch 28/29] lockup_detector/perf: Simplify deferred event destroy Thomas Gleixner
2017-08-31  7:16 ` [patch 29/29] lockup_detector: Cleanup hotplug locking mess Thomas Gleixner
2017-08-31 22:10 ` [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape Don Zickus
2017-09-01  4:42   ` Nicholas Piggin
2017-09-01  9:18   ` Thomas Gleixner
2017-09-07 16:04 ` Don Zickus
