linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry)
@ 2015-11-03 15:20 Ulrich Obergfell
  2015-11-03 15:20 ` [PATCH 1/4] watchdog: avoid race between lockup detector suspend/resume and CPU hotplug Ulrich Obergfell
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Ulrich Obergfell @ 2015-11-03 15:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, dzickus, atomlin, uobergfe

This patch set addresses various races in relation to CPU hotplug
and a race in relation to watchdog timer expiry. I discovered the
corner cases during code inspection. I haven't seen any of these
issues occur in practice.

Ulrich Obergfell (4):
  watchdog: avoid race between lockup detector suspend/resume and CPU
    hotplug
  watchdog: avoid races between /proc handlers and CPU hotplug
  watchdog: remove {get|put}_online_cpus() from
    watchdog_{park|unpark}_threads()
  watchdog: fix race between proc_watchdog_thresh() and
    watchdog_timer_fn()

 kernel/watchdog.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] watchdog: avoid race between lockup detector suspend/resume and CPU hotplug
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
@ 2015-11-03 15:20 ` Ulrich Obergfell
  2015-11-03 15:20 ` [PATCH 2/4] watchdog: avoid races between /proc handlers " Ulrich Obergfell
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ulrich Obergfell @ 2015-11-03 15:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, dzickus, atomlin, uobergfe

The lockup detector suspend/resume interface that was introduced by
commit 8c073d27d7ad293bf734cc8475689413afadab81 does not protect
itself against races with CPU hotplug. Hence, theoretically it is
possible that a new watchdog thread is started on a hotplugged CPU
while the lockup detector is suspended, and the thread could thus
interfere unexpectedly with the code that requested to suspend the
lockup detector. Avoid the race by calling

  get_online_cpus() in lockup_detector_suspend()
  put_online_cpus() in lockup_detector_resume()

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 0a23125..7357842 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -719,6 +719,7 @@ int lockup_detector_suspend(void)
 {
 	int ret = 0;
 
+	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 	/*
 	 * Multiple suspend requests can be active in parallel (counted by
@@ -759,6 +760,7 @@ void lockup_detector_resume(void)
 		watchdog_unpark_threads();
 
 	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
 }
 
 static int update_watchdog_all_cpus(void)
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] watchdog: avoid races between /proc handlers and CPU hotplug
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
  2015-11-03 15:20 ` [PATCH 1/4] watchdog: avoid race between lockup detector suspend/resume and CPU hotplug Ulrich Obergfell
@ 2015-11-03 15:20 ` Ulrich Obergfell
  2015-11-03 15:21 ` [PATCH 3/4] watchdog: remove {get|put}_online_cpus() from watchdog_{park|unpark}_threads() Ulrich Obergfell
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ulrich Obergfell @ 2015-11-03 15:20 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, dzickus, atomlin, uobergfe

The handler functions for watchdog parameters in /proc/sys/kernel
do not protect themselves against races with CPU hotplug. Hence,
theoretically it is possible that a new watchdog thread is started
on a hotplugged CPU while a parameter is being modified, and the
thread could thus use a parameter value that is 'in transition'.

For example, if 'watchdog_thresh' is being set to zero (note: this
disables the lockup detectors) the thread would erroneously use the
value zero as the sample period.

To avoid such races and to keep the /proc handler code consistent,
call
     {get|put}_online_cpus() in proc_watchdog_common()
     {get|put}_online_cpus() in proc_watchdog_thresh()
     {get|put}_online_cpus() in proc_watchdog_cpumask()

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7357842..13fdda1 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -857,6 +857,7 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 	int err, old, new;
 	int *watchdog_param = (int *)table->data;
 
+	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
 	if (watchdog_suspended) {
@@ -908,6 +909,7 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
 	return err;
 }
 
@@ -949,6 +951,7 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 {
 	int err, old;
 
+	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
 	if (watchdog_suspended) {
@@ -974,6 +977,7 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
 	return err;
 }
 
@@ -988,6 +992,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 {
 	int err;
 
+	get_online_cpus();
 	mutex_lock(&watchdog_proc_mutex);
 
 	if (watchdog_suspended) {
@@ -1015,6 +1020,7 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
 	}
 out:
 	mutex_unlock(&watchdog_proc_mutex);
+	put_online_cpus();
 	return err;
 }
 
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] watchdog: remove {get|put}_online_cpus() from watchdog_{park|unpark}_threads()
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
  2015-11-03 15:20 ` [PATCH 1/4] watchdog: avoid race between lockup detector suspend/resume and CPU hotplug Ulrich Obergfell
  2015-11-03 15:20 ` [PATCH 2/4] watchdog: avoid races between /proc handlers " Ulrich Obergfell
@ 2015-11-03 15:21 ` Ulrich Obergfell
  2015-11-03 15:21 ` [PATCH 4/4] watchdog: fix race between proc_watchdog_thresh() and watchdog_timer_fn() Ulrich Obergfell
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Ulrich Obergfell @ 2015-11-03 15:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, dzickus, atomlin, uobergfe

watchdog_{park|unpark}_threads() are now called in code paths that
protect themselves against CPU hotplug, so {get|put}_online_cpus()
calls are redundant and can be removed.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 13fdda1..84c4744 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -683,33 +683,35 @@ static struct smp_hotplug_thread watchdog_threads = {
  * be parked and the watchdog threads of other CPUs can still be runnable.
  * Callers are expected to handle this special condition as appropriate in
  * their context.
+ *
+ * This function may only be called in a context that is protected against
+ * races with CPU hotplug - for example, via get_online_cpus().
  */
 static int watchdog_park_threads(void)
 {
 	int cpu, ret = 0;
 
-	get_online_cpus();
 	for_each_watchdog_cpu(cpu) {
 		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
 		if (ret)
 			break;
 	}
-	put_online_cpus();
 
 	return ret;
 }
 
 /*
  * unpark all watchdog threads that are specified in 'watchdog_cpumask'
+ *
+ * This function may only be called in a context that is protected against
+ * races with CPU hotplug - for example, via get_online_cpus().
  */
 static void watchdog_unpark_threads(void)
 {
 	int cpu;
 
-	get_online_cpus();
 	for_each_watchdog_cpu(cpu)
 		kthread_unpark(per_cpu(softlockup_watchdog, cpu));
-	put_online_cpus();
 }
 
 /*
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] watchdog: fix race between proc_watchdog_thresh() and watchdog_timer_fn()
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
                   ` (2 preceding siblings ...)
  2015-11-03 15:21 ` [PATCH 3/4] watchdog: remove {get|put}_online_cpus() from watchdog_{park|unpark}_threads() Ulrich Obergfell
@ 2015-11-03 15:21 ` Ulrich Obergfell
  2015-11-05 14:46 ` [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Don Zickus
  2015-11-05 20:50 ` Aaron Tomlin
  5 siblings, 0 replies; 7+ messages in thread
From: Ulrich Obergfell @ 2015-11-03 15:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, dzickus, atomlin, uobergfe

Theoretically it is possible that the watchdog timer expires
right at the time when a user sets 'watchdog_thresh' to zero
(note: this disables the lockup detectors). In this scenario,
the is_softlockup() function - which is called by the timer -
could produce a false positive.

Fix this by checking the current value of 'watchdog_thresh'.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
---
 kernel/watchdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 84c4744..18f34cf 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -289,7 +289,7 @@ static int is_softlockup(unsigned long touch_ts)
 {
 	unsigned long now = get_timestamp();
 
-	if (watchdog_enabled & SOFT_WATCHDOG_ENABLED) {
+	if ((watchdog_enabled & SOFT_WATCHDOG_ENABLED) && watchdog_thresh){
 		/* Warn about unreasonable delays. */
 		if (time_after(now, touch_ts + get_softlockup_thresh()))
 			return now - touch_ts;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry)
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
                   ` (3 preceding siblings ...)
  2015-11-03 15:21 ` [PATCH 4/4] watchdog: fix race between proc_watchdog_thresh() and watchdog_timer_fn() Ulrich Obergfell
@ 2015-11-05 14:46 ` Don Zickus
  2015-11-05 20:50 ` Aaron Tomlin
  5 siblings, 0 replies; 7+ messages in thread
From: Don Zickus @ 2015-11-05 14:46 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: linux-kernel, akpm, atomlin

On Tue, Nov 03, 2015 at 04:20:57PM +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.

Series looks fine to me.  I have run some local panic tests with no
problems, along with modifying various values and everything seems fine.

Acked-by: Don Zickus <dzickus@redhat.com>

> 
> Ulrich Obergfell (4):
>   watchdog: avoid race between lockup detector suspend/resume and CPU
>     hotplug
>   watchdog: avoid races between /proc handlers and CPU hotplug
>   watchdog: remove {get|put}_online_cpus() from
>     watchdog_{park|unpark}_threads()
>   watchdog: fix race between proc_watchdog_thresh() and
>     watchdog_timer_fn()
> 
>  kernel/watchdog.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> -- 
> 1.7.11.7
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry)
  2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
                   ` (4 preceding siblings ...)
  2015-11-05 14:46 ` [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Don Zickus
@ 2015-11-05 20:50 ` Aaron Tomlin
  5 siblings, 0 replies; 7+ messages in thread
From: Aaron Tomlin @ 2015-11-05 20:50 UTC (permalink / raw)
  To: Ulrich Obergfell; +Cc: linux-kernel, akpm, dzickus

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

On Tue 2015-11-03 16:20 +0100, Ulrich Obergfell wrote:
> This patch set addresses various races in relation to CPU hotplug
> and a race in relation to watchdog timer expiry. I discovered the
> corner cases during code inspection. I haven't seen any of these
> issues occur in practice.

This patch series does adequately address the race conditions mentioned
above. Thanks.

Reviewed-by: Aaron Tomlin <atomlin@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-11-05 20:50 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-03 15:20 [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Ulrich Obergfell
2015-11-03 15:20 ` [PATCH 1/4] watchdog: avoid race between lockup detector suspend/resume and CPU hotplug Ulrich Obergfell
2015-11-03 15:20 ` [PATCH 2/4] watchdog: avoid races between /proc handlers " Ulrich Obergfell
2015-11-03 15:21 ` [PATCH 3/4] watchdog: remove {get|put}_online_cpus() from watchdog_{park|unpark}_threads() Ulrich Obergfell
2015-11-03 15:21 ` [PATCH 4/4] watchdog: fix race between proc_watchdog_thresh() and watchdog_timer_fn() Ulrich Obergfell
2015-11-05 14:46 ` [PATCH 0/4] watchdog: address various races (CPU hotplug, timer expiry) Don Zickus
2015-11-05 20:50 ` Aaron Tomlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).