All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions
@ 2015-08-01 12:49 Ulrich Obergfell
  2015-08-01 12:49 ` [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads() Ulrich Obergfell
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, dzickus, atomlin, uobergfe, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
only called in watchdog thread context. However, the following commits
utilize these functions outside of watchdog thread context too.

  commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
  Author: Michal Hocko <mhocko@suse.cz>
  Date:   Tue Sep 24 15:27:30 2013 -0700

      watchdog: update watchdog_thresh properly

  commit b3738d29323344da3017a91010530cf3a58590fc
  Author: Stephane Eranian <eranian@google.com>
  Date:   Mon Nov 17 20:07:03 2014 +0100

      watchdog: Add watchdog enable/disable all functions

Hence, it is now possible that these functions execute concurrently with
the same 'cpu' argument. This concurrency is problematic because per-cpu
'watchdog_ev' can be accessed/modified without adequate synchronization.

The patch series aims to address the above problem. However, instead of
introducing locks to protect per-cpu 'watchdog_ev' a different approach
is taken: Invoke these functions by parking and unparking the watchdog
threads (to ensure they are always called in watchdog thread context).

  static struct smp_hotplug_thread watchdog_threads = {
           ...
          .park   = watchdog_disable, // calls watchdog_nmi_disable()
          .unpark = watchdog_enable,  // calls watchdog_nmi_enable()
  };

Both previously mentioned commits call these functions in a similar way
and thus in principle contain some duplicate code. The patch series also
avoids this duplication by providing a commonly usable mechanism.

- Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
  park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
  are intended to be called inside of kernel/watchdog.c only.

- Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
  be utilized by external callers to deactivate the hard and soft lockup
  detector temporarily.

- Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
  that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.

- Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
  was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.

A few corner cases should be mentioned here for completeness.

- kthread_park() of watchdog/N could hang if cpu N is already locked up.
  However, if watchdog is enabled the lockup will be detected anyway.

- kthread_unpark() of watchdog/N could hang if cpu N got locked up after
  kthread_park(). The occurrence of this scenario should be _very_ rare
  in practice, in particular because it is not expected that temporary
  deactivation will happen frequently, and if it happens at all it is
  expected that the duration of deactivation will be short.

Ulrich Obergfell (4):
  watchdog: introduce watchdog_park_threads() and
    watchdog_unpark_threads()
  watchdog: introduce watchdog_suspend() and watchdog_resume()
  watchdog: use park/unpark functions in update_watchdog_all_cpus()
  watchdog: use suspend/resume interface in fixup_ht_bug()

 arch/x86/kernel/cpu/perf_event_intel.c |   9 +-
 include/linux/nmi.h                    |   2 +
 include/linux/watchdog.h               |   8 --
 kernel/watchdog.c                      | 157 +++++++++++++++++++--------------
 4 files changed, 100 insertions(+), 76 deletions(-)

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads()
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
@ 2015-08-01 12:49 ` Ulrich Obergfell
  2015-08-04 13:26   ` Michal Hocko
  2015-08-01 12:49 ` [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume() Ulrich Obergfell
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, dzickus, atomlin, uobergfe, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

These functions are intended to be used only from inside kernel/watchdog.c
to park/unpark all watchdog threads that are specified in watchdog_cpumask.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a6ffa43..5571f20 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -24,6 +24,7 @@
 #include <asm/irq_regs.h>
 #include <linux/kvm_para.h>
 #include <linux/perf_event.h>
+#include <linux/kthread.h>
 
 /*
  * The run state of the lockup detectors is controlled by the content of the
@@ -666,6 +667,41 @@ static struct smp_hotplug_thread watchdog_threads = {
 	.unpark			= watchdog_enable,
 };
 
+/*
+ * park all watchdog threads that are specified in 'watchdog_cpumask'
+ */
+static int watchdog_park_threads(void)
+{
+	int cpu, ret = 0;
+
+	get_online_cpus();
+	for_each_watchdog_cpu(cpu) {
+		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
+		if (ret)
+			break;
+	}
+	if (ret) {
+		for_each_watchdog_cpu(cpu)
+			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+	}
+	put_online_cpus();
+
+	return ret;
+}
+
+/*
+ * unpark all watchdog threads that are specified in 'watchdog_cpumask'
+ */
+static void watchdog_unpark_threads(void)
+{
+	int cpu;
+
+	get_online_cpus();
+	for_each_watchdog_cpu(cpu)
+		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
+	put_online_cpus();
+}
+
 static void restart_watchdog_hrtimer(void *info)
 {
 	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
  2015-08-01 12:49 ` [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads() Ulrich Obergfell
@ 2015-08-01 12:49 ` Ulrich Obergfell
  2015-08-01 14:04   ` Guenter Roeck
  2015-08-04 22:58   ` Andrew Morton
  2015-08-01 12:49 ` [PATCH 3/4] watchdog: use park/unpark functions in update_watchdog_all_cpus() Ulrich Obergfell
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, dzickus, atomlin, uobergfe, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

This interface can be utilized to deactivate the hard and soft lockup
detector temporarily. Callers are expected to minimize the duration of
deactivation. Multiple deactivations are allowed to occur in parallel
but should be rare in practice.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 include/linux/nmi.h |  2 ++
 kernel/watchdog.c   | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index f94da0e..60050c2 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
 				void __user *, size_t *, loff_t *);
 extern int proc_watchdog_cpumask(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
+extern int watchdog_suspend(void);
+extern void watchdog_resume(void);
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5571f20..98d44b1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
+static int __read_mostly watchdog_suspended = 0;
 static int __read_mostly watchdog_running;
 static u64 __read_mostly sample_period;
 
@@ -702,6 +703,50 @@ static void watchdog_unpark_threads(void)
 	put_online_cpus();
 }
 
+/*
+ * Suspend the hard and soft lockup detector by parking the watchdog threads.
+ */
+int watchdog_suspend(void)
+{
+	int ret = 0;
+
+	mutex_lock(&watchdog_proc_mutex);
+	/*
+	 * Multiple suspend requests can be active in parallel (counted by
+	 * the 'watchdog_suspended' variable). If the watchdog threads are
+	 * running, the first caller takes care that they will be parked.
+	 * The state of 'watchdog_running' cannot change while a suspend
+	 * request is active (see related changes in 'proc' handlers).
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		ret = watchdog_park_threads();
+
+	if (ret == 0)
+		watchdog_suspended++;
+
+	mutex_unlock(&watchdog_proc_mutex);
+
+	return ret;
+}
+
+/*
+ * Resume the hard and soft lockup detector by unparking the watchdog threads.
+ */
+void watchdog_resume(void)
+{
+	mutex_lock(&watchdog_proc_mutex);
+
+	watchdog_suspended--;
+	/*
+	 * The watchdog threads are unparked if they were previously running
+	 * and if there is no more active suspend request.
+	 */
+	if (watchdog_running && !watchdog_suspended)
+		watchdog_unpark_threads();
+
+	mutex_unlock(&watchdog_proc_mutex);
+}
+
 static void restart_watchdog_hrtimer(void *info)
 {
 	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
@@ -823,6 +868,12 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 
 	mutex_lock(&watchdog_proc_mutex);
 
+	if (watchdog_suspended) {
+		/* no parameter changes allowed while watchdog is suspended */
+		err = -EAGAIN;
+		goto out;
+	}
+
 	/*
 	 * If the parameter is being read return the state of the corresponding
 	 * bit(s) in 'watchdog_enabled', else update 'watchdog_enabled' and the
@@ -908,6 +959,12 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 
 	mutex_lock(&watchdog_proc_mutex);
 
+	if (watchdog_suspended) {
+		/* no parameter changes allowed while watchdog is suspended */
+		err = -EAGAIN;
+		goto out;
+	}
+
 	old = ACCESS_ONCE(watchdog_thresh);
 	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
 
@@ -939,6 +996,13 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 	int err;
 
 	mutex_lock(&watchdog_proc_mutex);
+
+	if (watchdog_suspended) {
+		/* no parameter changes allowed while watchdog is suspended */
+		err = -EAGAIN;
+		goto out;
+	}
+
 	err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
 	if (!err && write) {
 		/* Remove impossible cpus to keep sysctl output cleaner. */
@@ -956,6 +1020,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 				pr_err("cpumask update failed\n");
 		}
 	}
+out:
 	mutex_unlock(&watchdog_proc_mutex);
 	return err;
 }
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] watchdog: use park/unpark functions in update_watchdog_all_cpus()
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
  2015-08-01 12:49 ` [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads() Ulrich Obergfell
  2015-08-01 12:49 ` [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume() Ulrich Obergfell
@ 2015-08-01 12:49 ` Ulrich Obergfell
  2015-08-01 12:49 ` [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug() Ulrich Obergfell
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, dzickus, atomlin, uobergfe, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

Remove update_watchdog() and restart_watchdog_hrtimer() since these
functions are no longer needed. Changes of parameters such as the
sample period are honored at the time when the watchdog threads are
being unparked.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 40 ++--------------------------------------
 1 file changed, 2 insertions(+), 38 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 98d44b1..3618e93 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -747,46 +747,10 @@ void watchdog_resume(void)
 	mutex_unlock(&watchdog_proc_mutex);
 }
 
-static void restart_watchdog_hrtimer(void *info)
-{
-	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
-	int ret;
-
-	/*
-	 * No need to cancel and restart hrtimer if it is currently executing
-	 * because it will reprogram itself with the new period now.
-	 * We should never see it unqueued here because we are running per-cpu
-	 * with interrupts disabled.
-	 */
-	ret = hrtimer_try_to_cancel(hrtimer);
-	if (ret == 1)
-		hrtimer_start(hrtimer, ns_to_ktime(sample_period),
-				HRTIMER_MODE_REL_PINNED);
-}
-
-static void update_watchdog(int cpu)
-{
-	/*
-	 * Make sure that perf event counter will adopt to a new
-	 * sampling period. Updating the sampling period directly would
-	 * be much nicer but we do not have an API for that now so
-	 * let's use a big hammer.
-	 * Hrtimer will adopt the new period on the next tick but this
-	 * might be late already so we have to restart the timer as well.
-	 */
-	watchdog_nmi_disable(cpu);
-	smp_call_function_single(cpu, restart_watchdog_hrtimer, NULL, 1);
-	watchdog_nmi_enable(cpu);
-}
-
 static void update_watchdog_all_cpus(void)
 {
-	int cpu;
-
-	get_online_cpus();
-	for_each_watchdog_cpu(cpu)
-		update_watchdog(cpu);
-	put_online_cpus();
+	watchdog_park_threads();
+	watchdog_unpark_threads();
 }
 
 static int watchdog_enable_all_cpus(void)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
                   ` (2 preceding siblings ...)
  2015-08-01 12:49 ` [PATCH 3/4] watchdog: use park/unpark functions in update_watchdog_all_cpus() Ulrich Obergfell
@ 2015-08-01 12:49 ` Ulrich Obergfell
  2015-08-04 13:31   ` Michal Hocko
  2015-08-05 22:17   ` Andrew Morton
  2015-08-02 10:54 ` [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Aaron Tomlin
  2015-08-02 22:14 ` Jiri Olsa
  5 siblings, 2 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 12:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, dzickus, atomlin, uobergfe, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all()
since these functions are no longer needed. If a subsystem has a
need to deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 arch/x86/kernel/cpu/perf_event_intel.c |  9 +++++---
 include/linux/watchdog.h               |  8 -------
 kernel/watchdog.c                      | 38 ----------------------------------
 3 files changed, 6 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index b9826a9..d4e1b0c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -12,7 +12,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/export.h>
-#include <linux/watchdog.h>
+#include <linux/nmi.h>
 
 #include <asm/cpufeature.h>
 #include <asm/hardirq.h>
@@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
 		return 0;
 	}
 
-	watchdog_nmi_disable_all();
+	if (watchdog_suspend() != 0) {
+		pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
+		return 0;
+	}
 
 	x86_pmu.flags &= ~(PMU_FL_EXCL_CNTRS | PMU_FL_EXCL_ENABLED);
 
@@ -3376,7 +3379,7 @@ static __init int fixup_ht_bug(void)
 	x86_pmu.commit_scheduling = NULL;
 	x86_pmu.stop_scheduling = NULL;
 
-	watchdog_nmi_enable_all();
+	watchdog_resume();
 
 	get_online_cpus();
 
diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
index f47fead..d74a0e9 100644
--- a/include/linux/watchdog.h
+++ b/include/linux/watchdog.h
@@ -140,12 +140,4 @@ extern int watchdog_init_timeout(struct watchdog_device *wdd,
 extern int watchdog_register_device(struct watchdog_device *);
 extern void watchdog_unregister_device(struct watchdog_device *);
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
-void watchdog_nmi_disable_all(void);
-void watchdog_nmi_enable_all(void);
-#else
-static inline void watchdog_nmi_disable_all(void) {}
-static inline void watchdog_nmi_enable_all(void) {}
-#endif
-
 #endif  /* ifndef _LINUX_WATCHDOG_H */
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3618e93..bf751ef 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -614,47 +614,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
 		cpu0_err = 0;
 	}
 }
-
-void watchdog_nmi_enable_all(void)
-{
-	int cpu;
-
-	mutex_lock(&watchdog_proc_mutex);
-
-	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
-		goto unlock;
-
-	get_online_cpus();
-	for_each_watchdog_cpu(cpu)
-		watchdog_nmi_enable(cpu);
-	put_online_cpus();
-
-unlock:
-	mutex_unlock(&watchdog_proc_mutex);
-}
-
-void watchdog_nmi_disable_all(void)
-{
-	int cpu;
-
-	mutex_lock(&watchdog_proc_mutex);
-
-	if (!watchdog_running)
-		goto unlock;
-
-	get_online_cpus();
-	for_each_watchdog_cpu(cpu)
-		watchdog_nmi_disable(cpu);
-	put_online_cpus();
-
-unlock:
-	mutex_unlock(&watchdog_proc_mutex);
-}
 #else
 static int watchdog_nmi_enable(unsigned int cpu) { return 0; }
 static void watchdog_nmi_disable(unsigned int cpu) { return; }
-void watchdog_nmi_enable_all(void) {}
-void watchdog_nmi_disable_all(void) {}
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */
 
 static struct smp_hotplug_thread watchdog_threads = {
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-01 12:49 ` [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume() Ulrich Obergfell
@ 2015-08-01 14:04   ` Guenter Roeck
  2015-08-01 14:39     ` Ulrich Obergfell
  2015-08-04 22:58   ` Andrew Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Guenter Roeck @ 2015-08-01 14:04 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
> This interface can be utilized to deactivate the hard and soft lockup
> detector temporarily. Callers are expected to minimize the duration of
> deactivation. Multiple deactivations are allowed to occur in parallel
> but should be rare in practice.
> 
> Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
> ---
>  include/linux/nmi.h |  2 ++
>  kernel/watchdog.c   | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 67 insertions(+)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index f94da0e..60050c2 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>  				void __user *, size_t *, loff_t *);
>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
>  				 void __user *, size_t *, loff_t *);
> +extern int watchdog_suspend(void);
> +extern void watchdog_resume(void);

How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
with the watchdog subsystem ?

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-01 14:04   ` Guenter Roeck
@ 2015-08-01 14:39     ` Ulrich Obergfell
  2015-08-01 14:47       ` Guenter Roeck
  0 siblings, 1 reply; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-01 14:39 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

> ----- Original Message -----
> From: "Guenter Roeck" <linux@roeck-us.net>
> ...
> Subject: Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
>
> On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
>> This interface can be utilized to deactivate the hard and soft lockup
>> detector temporarily. Callers are expected to minimize the duration of
>> deactivation. Multiple deactivations are allowed to occur in parallel
>> but should be rare in practice.
>> 
>> Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
>> ---
>>  include/linux/nmi.h |  2 ++
>>  kernel/watchdog.c   | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 67 insertions(+)
>> 
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index f94da0e..60050c2 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>>  				void __user *, size_t *, loff_t *);
>>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
>>  				 void __user *, size_t *, loff_t *);
>> +extern int watchdog_suspend(void);
>> +extern void watchdog_resume(void);
>
> How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
> with the watchdog subsystem ?

Guenter,

Good point. However, I would like to avoid the 'nmi_' prefix in the
function names as it could be misleading. watchdog_{suspend|resume}
affect both -the hard and soft lockup detector- so I think function
names like

  lockup_detector_suspend()  instead of watchdog_suspend()
  lockup_detector_resume()   instead of watchdog_resume()

would summarize better what these functions are intended to be used
for. The above names would also be consistent with the existing name
 
  lockup_detector_init()

Many Thanks,

Uli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-01 14:39     ` Ulrich Obergfell
@ 2015-08-01 14:47       ` Guenter Roeck
  0 siblings, 0 replies; 19+ messages in thread
From: Guenter Roeck @ 2015-08-01 14:47 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

On Sat, Aug 01, 2015 at 10:39:22AM -0400, Ulrich Obergfell wrote:
> > ----- Original Message -----
> > From: "Guenter Roeck" <linux@roeck-us.net>
> > ...
> > Subject: Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
> >
> > On Sat, Aug 01, 2015 at 02:49:23PM +0200, Ulrich Obergfell wrote:
> >> This interface can be utilized to deactivate the hard and soft lockup
> >> detector temporarily. Callers are expected to minimize the duration of
> >> deactivation. Multiple deactivations are allowed to occur in parallel
> >> but should be rare in practice.
> >> 
> >> Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
> >> ---
> >>  include/linux/nmi.h |  2 ++
> >>  kernel/watchdog.c   | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 67 insertions(+)
> >> 
> >> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> >> index f94da0e..60050c2 100644
> >> --- a/include/linux/nmi.h
> >> +++ b/include/linux/nmi.h
> >> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
> >>  				void __user *, size_t *, loff_t *);
> >>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
> >>  				 void __user *, size_t *, loff_t *);
> >> +extern int watchdog_suspend(void);
> >> +extern void watchdog_resume(void);
> >
> > How about nmi_watchdog_enable() and nmi_watchdog_disable() to avoid confusion
> > with the watchdog subsystem ?
> 
> Guenter,
> 
> Good point. However, I would like to avoid the 'nmi_' prefix in the
> function names as it could be misleading. watchdog_{suspend|resume}
> affect both -the hard and soft lockup detector- so I think function
> names like
> 
>   lockup_detector_suspend()  instead of watchdog_suspend()
>   lockup_detector_resume()   instead of watchdog_resume()
> 
> would summarize better what these functions are intended to be used
> for. The above names would also be consistent with the existing name
>  
>   lockup_detector_init()
> 
Makes sense.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
                   ` (3 preceding siblings ...)
  2015-08-01 12:49 ` [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug() Ulrich Obergfell
@ 2015-08-02 10:54 ` Aaron Tomlin
  2015-08-02 22:14 ` Jiri Olsa
  5 siblings, 0 replies; 19+ messages in thread
From: Aaron Tomlin @ 2015-08-02 10:54 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, jolsa, mhocko, eranian, cmetcalf, fweisbec

[-- Attachment #1: Type: text/plain, Size: 3689 bytes --]

On Sat 2015-08-01 14:49 +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
> 
>   commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
>   Author: Michal Hocko <mhocko@suse.cz>
>   Date:   Tue Sep 24 15:27:30 2013 -0700
> 
>       watchdog: update watchdog_thresh properly
> 
>   commit b3738d29323344da3017a91010530cf3a58590fc
>   Author: Stephane Eranian <eranian@google.com>
>   Date:   Mon Nov 17 20:07:03 2014 +0100
> 
>       watchdog: Add watchdog enable/disable all functions
> 
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
> 
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
> 
>   static struct smp_hotplug_thread watchdog_threads = {
>            ...
>           .park   = watchdog_disable, // calls watchdog_nmi_disable()
>           .unpark = watchdog_enable,  // calls watchdog_nmi_enable()
>   };
> 
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
> 
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
>   park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
>   are intended to be called inside of kernel/watchdog.c only.
> 
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
>   be utilized by external callers to deactivate the hard and soft lockup
>   detector temporarily.
> 
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
>   that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
> 
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
>   was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
> 
> A few corner cases should be mentioned here for completeness.
> 
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
>   However, if watchdog is enabled the lockup will be detected anyway.
> 
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
>   kthread_park(). The occurrence of this scenario should be _very_ rare
>   in practice, in particular because it is not expected that temporary
>   deactivation will happen frequently, and if it happens at all it is
>   expected that the duration of deactivation will be short.
> 
> Ulrich Obergfell (4):
>   watchdog: introduce watchdog_park_threads() and
>     watchdog_unpark_threads()
>   watchdog: introduce watchdog_suspend() and watchdog_resume()
>   watchdog: use park/unpark functions in update_watchdog_all_cpus()
>   watchdog: use suspend/resume interface in fixup_ht_bug()
> 
>  arch/x86/kernel/cpu/perf_event_intel.c |   9 +-
>  include/linux/nmi.h                    |   2 +
>  include/linux/watchdog.h               |   8 --
>  kernel/watchdog.c                      | 157 +++++++++++++++++++--------------
>  4 files changed, 100 insertions(+), 76 deletions(-)
> 

The whole patch series looks good to me.

Reviewed-by: Aaron Tomlin <atomlin@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions
  2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
                   ` (4 preceding siblings ...)
  2015-08-02 10:54 ` [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Aaron Tomlin
@ 2015-08-02 22:14 ` Jiri Olsa
  5 siblings, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2015-08-02 22:14 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, mhocko, eranian,
	cmetcalf, fweisbec

On Sat, Aug 01, 2015 at 02:49:21PM +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
> 
>   commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
>   Author: Michal Hocko <mhocko@suse.cz>
>   Date:   Tue Sep 24 15:27:30 2013 -0700
> 
>       watchdog: update watchdog_thresh properly
> 
>   commit b3738d29323344da3017a91010530cf3a58590fc
>   Author: Stephane Eranian <eranian@google.com>
>   Date:   Mon Nov 17 20:07:03 2014 +0100
> 
>       watchdog: Add watchdog enable/disable all functions
> 
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
> 
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
> 
>   static struct smp_hotplug_thread watchdog_threads = {
>            ...
>           .park   = watchdog_disable, // calls watchdog_nmi_disable()
>           .unpark = watchdog_enable,  // calls watchdog_nmi_enable()
>   };
> 
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
> 
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
>   park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
>   are intended to be called inside of kernel/watchdog.c only.
> 
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
>   be utilized by external callers to deactivate the hard and soft lockup
>   detector temporarily.
> 
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
>   that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
> 
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
>   was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
> 
> A few corner cases should be mentioned here for completeness.
> 
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
>   However, if watchdog is enabled the lockup will be detected anyway.
> 
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
>   kthread_park(). The occurrence of this scenario should be _very_ rare
>   in practice, in particular because it is not expected that temporary
>   deactivation will happen frequently, and if it happens at all it is
>   expected that the duration of deactivation will be short.
> 
> Ulrich Obergfell (4):
>   watchdog: introduce watchdog_park_threads() and
>     watchdog_unpark_threads()
>   watchdog: introduce watchdog_suspend() and watchdog_resume()
>   watchdog: use park/unpark functions in update_watchdog_all_cpus()
>   watchdog: use suspend/resume interface in fixup_ht_bug()

tested the same way as for I did for my other patch for this issue:
  http://marc.info/?l=linux-kernel&m=143834383400803&w=2

works nicely ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads()
  2015-08-01 12:49 ` [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads() Ulrich Obergfell
@ 2015-08-04 13:26   ` Michal Hocko
  2015-08-04 15:20     ` Ulrich Obergfell
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2015-08-04 13:26 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, eranian, cmetcalf, fweisbec

On Sat 01-08-15 14:49:22, Ulrich Obergfell wrote:
> These functions are intended to be used only from inside kernel/watchdog.c
> to park/unpark all watchdog threads that are specified in watchdog_cpumask.

I would suggest merging this into Patch2. It is usually better to add
new functions along with their users.

> Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
> ---
>  kernel/watchdog.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index a6ffa43..5571f20 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -24,6 +24,7 @@
>  #include <asm/irq_regs.h>
>  #include <linux/kvm_para.h>
>  #include <linux/perf_event.h>
> +#include <linux/kthread.h>
>  
>  /*
>   * The run state of the lockup detectors is controlled by the content of the
> @@ -666,6 +667,41 @@ static struct smp_hotplug_thread watchdog_threads = {
>  	.unpark			= watchdog_enable,
>  };
>  
> +/*
> + * park all watchdog threads that are specified in 'watchdog_cpumask'
> + */
> +static int watchdog_park_threads(void)
> +{
> +	int cpu, ret = 0;
> +
> +	get_online_cpus();
> +	for_each_watchdog_cpu(cpu) {
> +		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
> +		if (ret)
> +			break;
> +	}
> +	if (ret) {
> +		for_each_watchdog_cpu(cpu)
> +			kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> +	}
> +	put_online_cpus();
> +
> +	return ret;
> +}
> +
> +/*
> + * unpark all watchdog threads that are specified in 'watchdog_cpumask'
> + */
> +static void watchdog_unpark_threads(void)
> +{
> +	int cpu;
> +
> +	get_online_cpus();
> +	for_each_watchdog_cpu(cpu)
> +		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
> +	put_online_cpus();
> +}
> +
>  static void restart_watchdog_hrtimer(void *info)
>  {
>  	struct hrtimer *hrtimer = raw_cpu_ptr(&watchdog_hrtimer);
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-01 12:49 ` [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug() Ulrich Obergfell
@ 2015-08-04 13:31   ` Michal Hocko
  2015-08-04 14:27     ` Don Zickus
  2015-08-05 22:17   ` Andrew Morton
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2015-08-04 13:31 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, eranian, cmetcalf, fweisbec

On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
[...]
> @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
>  		return 0;
>  	}
>  
> -	watchdog_nmi_disable_all();
> +	if (watchdog_suspend() != 0) {
> +		pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> +		return 0;
> +	}

Is this really worth reporting to the log? What is an admin supposed to
do about it?
<looking into the code>
Ok, so kthread_park fails only when the kernel thread has already
exited. Can this ever happen during this call path?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-04 13:31   ` Michal Hocko
@ 2015-08-04 14:27     ` Don Zickus
  2015-08-04 14:44       ` Michal Hocko
  2015-08-04 14:59       ` Ulrich Obergfell
  0 siblings, 2 replies; 19+ messages in thread
From: Don Zickus @ 2015-08-04 14:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ulrich Obergfell, linux-kernel, akpm, atomlin, jolsa, eranian,
	cmetcalf, fweisbec

On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
> On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
> [...]
> > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
> >  		return 0;
> >  	}
> >  
> > -	watchdog_nmi_disable_all();
> > +	if (watchdog_suspend() != 0) {
> > +		pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> > +		return 0;
> > +	}
> 
> Is this really worth reporting to the log? What is an admin supposed to
> do about it?

I think it was more for developers to aid in debugging a strange behaviour
of the performance counters.

> <looking into the code>
> Ok, so kthread_park fails only when the kernel thread has already
> exited. Can this ever happen during this call path?

It might be overkill, but it is just a harmless informational failure
message.

Cheers,
Don

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-04 14:27     ` Don Zickus
@ 2015-08-04 14:44       ` Michal Hocko
  2015-08-04 14:59       ` Ulrich Obergfell
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2015-08-04 14:44 UTC (permalink / raw)
  To: Don Zickus
  Cc: Ulrich Obergfell, linux-kernel, akpm, atomlin, jolsa, eranian,
	cmetcalf, fweisbec

On Tue 04-08-15 10:27:50, Don Zickus wrote:
> On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
> > On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
> > [...]
> > > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
> > >  		return 0;
> > >  	}
> > >  
> > > -	watchdog_nmi_disable_all();
> > > +	if (watchdog_suspend() != 0) {
> > > +		pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
> > > +		return 0;
> > > +	}
> > 
> > Is this really worth reporting to the log? What is an admin supposed to
> > do about it?
> 
> I think it was more for developers to aid in debugging a strange behaviour
> of the performance counters.

pr_debug then?

> > <looking into the code>
> > Ok, so kthread_park fails only when the kernel thread has already
> > exited. Can this ever happen during this call path?
> 
> It might be overkill, but it is just a harmless informational failure
> message.

Maybe we have way too many of those harmless informational failure
admins scratch their heads about...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-04 14:27     ` Don Zickus
  2015-08-04 14:44       ` Michal Hocko
@ 2015-08-04 14:59       ` Ulrich Obergfell
  1 sibling, 0 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-04 14:59 UTC (permalink / raw)
  To: Don Zickus
  Cc: Michal Hocko, linux-kernel, akpm, atomlin, jolsa, eranian,
	cmetcalf, fweisbec

> ----- Original Message -----
> From: "Don Zickus" <dzickus@redhat.com>
...
> On Tue, Aug 04, 2015 at 03:31:30PM +0200, Michal Hocko wrote:
>> On Sat 01-08-15 14:49:25, Ulrich Obergfell wrote:
>> [...]
>> > @@ -3368,7 +3368,10 @@ static __init int fixup_ht_bug(void)
>> >  		return 0;
>> >  	}
>> >  
>> > -	watchdog_nmi_disable_all();
>> > +	if (watchdog_suspend() != 0) {
>> > +		pr_info("failed to disable PMU erratum BJ122, BV98, HSD29 workaround\n");
>> > +		return 0;
>> > +	}
>> 
>> Is this really worth reporting to the log? What is an admin supposed to
>> do about it?
>
> I think it was more for developers to aid in debugging a strange behaviour
> of the performance counters.
>
>> <looking into the code>
>> Ok, so kthread_park fails only when the kernel thread has already
>> exited. Can this ever happen during this call path?
>
> It might be overkill, but it is just a harmless informational failure
> message.

Don, Michal,

the module prints a message if the workaround is enabled and if the
workaround is disabled. Hence, I think we should keep the messages
consistent and thus inform the user also if we fail to disable the
workaround. Even though at the moment this seems to be an unlikely
failure case, I agree with Don that the message could be useful in
debugging.

Regards,

Uli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads()
  2015-08-04 13:26   ` Michal Hocko
@ 2015-08-04 15:20     ` Ulrich Obergfell
  0 siblings, 0 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-04 15:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, akpm, dzickus, atomlin, jolsa, eranian, cmetcalf, fweisbec

> ----- Original Message -----
> From: "Michal Hocko" <mhocko@kernel.org>
...
> On Sat 01-08-15 14:49:22, Ulrich Obergfell wrote:
>> These functions are intended to be used only from inside kernel/watchdog.c
>> to park/unpark all watchdog threads that are specified in watchdog_cpumask.
>
> I would suggest merging this into Patch2. It is usually better to add
> new functions along with their users.

Michal,

watchdog_{park|unpark}_threads are called by watchdog_{suspend|resume}
in Patch 2/4 and by update_watchdog_all_cpus() in Patch 3/4, so I would
have to merge three patches into one, and I would end up with one large
patch and one smaller patch. I think the result would be harder to read,
review and understand. Thus I'd prefer to leave the patches split as they
currently are.

Regards,

Uli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-01 12:49 ` [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume() Ulrich Obergfell
  2015-08-01 14:04   ` Guenter Roeck
@ 2015-08-04 22:58   ` Andrew Morton
  2015-08-05  8:10     ` Ulrich Obergfell
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2015-08-04 22:58 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, dzickus, atomlin, jolsa, mhocko, eranian, cmetcalf,
	fweisbec

On Sat,  1 Aug 2015 14:49:23 +0200 Ulrich Obergfell <uobergfe@redhat.com> wrote:

> This interface can be utilized to deactivate the hard and soft lockup
> detector temporarily. Callers are expected to minimize the duration of
> deactivation. Multiple deactivations are allowed to occur in parallel
> but should be rare in practice.
> 
> ...
>
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>  				void __user *, size_t *, loff_t *);
>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
>  				 void __user *, size_t *, loff_t *);
> +extern int watchdog_suspend(void);
> +extern void watchdog_resume(void);
>  #endif
>  
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5571f20..98d44b1 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>  #define for_each_watchdog_cpu(cpu) \
>  	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>  
> +static int __read_mostly watchdog_suspended = 0;

With my compiler the "= 0" increases the size of watchdog.o data.  For
some reason by 16 bytes(!).

>  static int __read_mostly watchdog_running;
>  static u64 __read_mostly sample_period;

The relationship between watchdog_running and watchdog_suspended hurts
my brain a bit.  It appears that all watchdog_running transitions
happen under watchdog_proc_mutex so I don't think it's racy, but I
wonder if things would be simpler if these were folded into a single
up/down counter.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume()
  2015-08-04 22:58   ` Andrew Morton
@ 2015-08-05  8:10     ` Ulrich Obergfell
  0 siblings, 0 replies; 19+ messages in thread
From: Ulrich Obergfell @ 2015-08-05  8:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, dzickus, atomlin, jolsa, mhocko, eranian, cmetcalf,
	fweisbec

> ----- Original Message -----
> From: "Andrew Morton" <akpm@linux-foundation.org>
> ...
> On Sat,  1 Aug 2015 14:49:23 +0200 Ulrich Obergfell <uobergfe@redhat.com> wrote:
>
>> This interface can be utilized to deactivate the hard and soft lockup
>> detector temporarily. Callers are expected to minimize the duration of
>> deactivation. Multiple deactivations are allowed to occur in parallel
>> but should be rare in practice.
>>
>> ...
>>
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -80,6 +80,8 @@ extern int proc_watchdog_thresh(struct ctl_table *, int ,
>>                                  void __user *, size_t *, loff_t *);
>>  extern int proc_watchdog_cpumask(struct ctl_table *, int,
>>                                   void __user *, size_t *, loff_t *);
>> +extern int watchdog_suspend(void);
>> +extern void watchdog_resume(void);
>>  #endif
>>  
>>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 5571f20..98d44b1 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -67,6 +67,7 @@ unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
>>  #define for_each_watchdog_cpu(cpu) \
>>          for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
>>  
>> +static int __read_mostly watchdog_suspended = 0;
>
> With my compiler the "= 0" increases the size of watchdog.o data.  For
> some reason by 16 bytes(!).

I see that you already fixed this. Many Thanks.

>>  static int __read_mostly watchdog_running;
>>  static u64 __read_mostly sample_period;
>
> The relationship between watchdog_running and watchdog_suspended hurts
> my brain a bit.  It appears that all watchdog_running transitions
> happen under watchdog_proc_mutex so I don't think it's racy, but I
> wonder if things would be simpler if these were folded into a single
> up/down counter.

The 'watchdog_running' variable indicates whether watchdog threads exist.
[Whether they have been launched via smpboot_register_percpu_thread().]

The 'watchdog_suspended' variable indicates whether existing threads are
currently parked, and whether multiple requests to suspend the watchdog
have been made by different callers.

I think folding them into one variable would not improve the readability
of the code, because instead of two variables with distinct semantics we
would then have one variable with multiple semantics (which I think would
also make code more complex). If we would want to improve the readability
I'd prefer to either just add a comment block to explain the semantics of
the variables or to rename them -for example- like:

  watchdog_running   to watchdog_threads_exist
  watchdog_suspended to watchdog_suspend_count

Please let me know if you would want any changes in this regard.

I also received these suggestions:

- rename the watchdog_{suspend|resume} functions as discussed in
  http://marc.info/?l=linux-kernel&m=143844050610220&w=2

- use pr_debug() instead of pr_info() as discussed in
  http://marc.info/?l=linux-kernel&m=143869949229461&w=2

Please let me know if you want me to post a 'version 2' of the patch set
or if you want me to post these changes as separate follow-up patches.

Regards,

Uli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug()
  2015-08-01 12:49 ` [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug() Ulrich Obergfell
  2015-08-04 13:31   ` Michal Hocko
@ 2015-08-05 22:17   ` Andrew Morton
  1 sibling, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2015-08-05 22:17 UTC (permalink / raw)
  To: Ulrich Obergfell
  Cc: linux-kernel, dzickus, atomlin, jolsa, mhocko, eranian, cmetcalf,
	fweisbec

On Sat,  1 Aug 2015 14:49:25 +0200 Ulrich Obergfell <uobergfe@redhat.com> wrote:

> Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all()
> since these functions are no longer needed. If a subsystem has a
> need to deactivate the watchdog temporarily, it should utilize the
> watchdog_suspend() and watchdog_resume() functions.
> 

With x86_64 allnoconfig I'm getting

arch/x86/kernel/cpu/perf_event_intel.c: In function 'fixup_ht_bug':
arch/x86/kernel/cpu/perf_event_intel.c:3371: error: implicit declaration of function 'watchdog_suspend'
arch/x86/kernel/cpu/perf_event_intel.c:3382: error: implicit declaration of function 'watchdog_resume'

I had to mangle your patches fairly heavily to make them fit against
other pending changes.  Specifically
http://ozlabs.org/~akpm/mmots/broken-out/watchdog-move-nmi-function-header-declarations-from-watchdogh-to-nmih.patch
and
http://ozlabs.org/~akpm/mmots/broken-out/watchdog-move-nmi-function-header-declarations-from-watchdogh-to-nmih-v2.patch.

But I don't *think* I caused this...  I'm testing this:

--- a/include/linux/nmi.h~watchdog-use-suspend-resume-interface-in-fixup_ht_bug-fix
+++ a/include/linux/nmi.h
@@ -80,6 +80,15 @@ extern int proc_watchdog_cpumask(struct
 				 void __user *, size_t *, loff_t *);
 extern int watchdog_suspend(void);
 extern void watchdog_resume(void);
+#else
+static inline int watchdog_suspend(void)
+{
+	return 0;
+}
+
+static inline void watchdog_resume(void)
+{
+}
 #endif
 
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-08-05 22:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-01 12:49 [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Ulrich Obergfell
2015-08-01 12:49 ` [PATCH 1/4] watchdog: introduce watchdog_park_threads() and watchdog_unpark_threads() Ulrich Obergfell
2015-08-04 13:26   ` Michal Hocko
2015-08-04 15:20     ` Ulrich Obergfell
2015-08-01 12:49 ` [PATCH 2/4] watchdog: introduce watchdog_suspend() and watchdog_resume() Ulrich Obergfell
2015-08-01 14:04   ` Guenter Roeck
2015-08-01 14:39     ` Ulrich Obergfell
2015-08-01 14:47       ` Guenter Roeck
2015-08-04 22:58   ` Andrew Morton
2015-08-05  8:10     ` Ulrich Obergfell
2015-08-01 12:49 ` [PATCH 3/4] watchdog: use park/unpark functions in update_watchdog_all_cpus() Ulrich Obergfell
2015-08-01 12:49 ` [PATCH 4/4] watchdog: use suspend/resume interface in fixup_ht_bug() Ulrich Obergfell
2015-08-04 13:31   ` Michal Hocko
2015-08-04 14:27     ` Don Zickus
2015-08-04 14:44       ` Michal Hocko
2015-08-04 14:59       ` Ulrich Obergfell
2015-08-05 22:17   ` Andrew Morton
2015-08-02 10:54 ` [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions Aaron Tomlin
2015-08-02 22:14 ` Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.