[PATCH 1/2] watchdog: introduce touch_softlockup_watchdog_sched()

From: Tejun Heo @ 2015-12-03  0:28 UTC
  To: Ulrich Obergfell, Ingo Molnar, Peter Zijlstra, Andrew Morton
  Cc: linux-kernel, kernel-team

Hello,

There haven't been too many workqueue stall bugs; however, a good part
of them have been pretty painful to track down because there's no
lockup detection mechanism for workqueues and it isn't easy to tell
what's going on with them.  Furthermore, some requirements are tricky
to get right - e.g. it's easy to miss WQ_MEM_RECLAIM on a workqueue
which runs a work item that is flushed by something sitting in the
reclaim path.

To alleviate the situation, this two-patch series implements a
workqueue lockup detector.  Each worker_pool tracks the last time it
made forward progress, and if no forward progress is made for longer
than the threshold, it triggers warnings and dumps workqueue state.
It's controlled together with the scheduler softlockup mechanism and
uses the same threshold value, as the two share a lot of
characteristics.
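
As a rough illustration of the per-pool check described above, here is
a minimal userspace sketch.  It is not the code from patch 2/2:
struct pool_model, its fields and all three helpers are hypothetical
names, and the rule that only a pool with pending work can count as
stalled is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a worker_pool; field names are illustrative. */
struct pool_model {
	int id;
	unsigned long last_progress;	/* time of last forward progress */
	int nr_pending;			/* queued work items not yet started */
};

/* Record forward progress, e.g. when a work item starts executing. */
static void pool_note_progress(struct pool_model *pool, unsigned long now)
{
	pool->last_progress = now;
}

/*
 * Assumption: a pool counts as stalled only if it has pending work and
 * has made no forward progress for longer than @thresh.
 */
static bool pool_is_stalled(const struct pool_model *pool,
			    unsigned long now, unsigned long thresh)
{
	if (!pool->nr_pending)
		return false;
	return now - pool->last_progress > thresh;
}

/* Periodic check: warn about each stalled pool, return how many there are. */
static int check_pools(const struct pool_model *pools, int nr,
		       unsigned long now, unsigned long thresh)
{
	int i, stalled = 0;

	for (i = 0; i < nr; i++) {
		if (!pool_is_stalled(&pools[i], now, thresh))
			continue;
		printf("pool %d: no forward progress for %lu time units\n",
		       pools[i].id, now - pools[i].last_progress);
		stalled++;
	}
	return stalled;
}
```

In this model a timer would call check_pools() periodically with the
shared threshold, and every point where a pool makes forward progress
would call pool_note_progress().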

Thanks.

------ 8< ------
touch_softlockup_watchdog() is used to tell the watchdog that a
scheduler stall is expected.  One group of users is paths where the
task may not be able to yield for a long time, such as performing slow
PIO to a finicky device or coming out of suspend.  The other group
accounts for the scheduler and timer going idle.

For scheduler softlockup detection, there's no reason to distinguish
the two cases; however, a workqueue lockup detector is planned and it
can use the same signals from the former group, while those from the
latter would spuriously prevent detection.  This patch introduces a
new function, touch_softlockup_watchdog_sched(), and converts the
latter group to call it instead.  For now, touch_softlockup_watchdog()
just calls touch_softlockup_watchdog_sched() and there's no functional
difference.
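
To make the intent of the split concrete, here is a small userspace
model of how a later workqueue detector could tell the two kinds of
touches apart.  This is only a sketch: struct watchdog_model and both
helpers are hypothetical, and in this patch the two real functions
still behave identically.

```c
/* Userspace model only; none of these names exist in the kernel. */
struct watchdog_model {
	unsigned long sched_touched_at;	/* read by the softlockup detector */
	unsigned long wq_touched_at;	/* read by a workqueue detector */
};

/* Scheduler/timer idle events: excuse only the softlockup detector. */
static void model_touch_sched(struct watchdog_model *wd, unsigned long now)
{
	wd->sched_touched_at = now;
}

/* Long non-yielding paths (slow PIO, resume): excuse both detectors. */
static void model_touch(struct watchdog_model *wd, unsigned long now)
{
	model_touch_sched(wd, now);
	wd->wq_touched_at = now;
}
```

With this shape, an idle-related touch never refreshes the timestamp
the workqueue detector looks at, so going idle cannot hide a stalled
workqueue.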

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/sched.h    |    4 ++++
 kernel/sched/clock.c     |    2 +-
 kernel/time/tick-sched.c |    6 +++---
 kernel/watchdog.c        |   15 ++++++++++++++-
 4 files changed, 22 insertions(+), 5 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -377,6 +377,7 @@ extern void scheduler_tick(void);
 extern void sched_show_task(struct task_struct *p);
 
 #ifdef CONFIG_LOCKUP_DETECTOR
+extern void touch_softlockup_watchdog_sched(void);
 extern void touch_softlockup_watchdog(void);
 extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
@@ -387,6 +388,9 @@ extern unsigned int  softlockup_panic;
 extern unsigned int  hardlockup_panic;
 void lockup_detector_init(void);
 #else
+static inline void touch_softlockup_watchdog_sched(void)
+{
+}
 static inline void touch_softlockup_watchdog(void)
 {
 }
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -354,7 +354,7 @@ void sched_clock_idle_wakeup_event(u64 d
 		return;
 
 	sched_clock_tick();
-	touch_softlockup_watchdog();
+	touch_softlockup_watchdog_sched();
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
 
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -143,7 +143,7 @@ static void tick_sched_handle(struct tic
 	 * when we go busy again does not account too much ticks.
 	 */
 	if (ts->tick_stopped) {
-		touch_softlockup_watchdog();
+		touch_softlockup_watchdog_sched();
 		if (is_idle_task(current))
 			ts->idle_jiffies++;
 	}
@@ -430,7 +430,7 @@ static void tick_nohz_update_jiffies(kti
 	tick_do_update_jiffies64(now);
 	local_irq_restore(flags);
 
-	touch_softlockup_watchdog();
+	touch_softlockup_watchdog_sched();
 }
 
 /*
@@ -701,7 +701,7 @@ static void tick_nohz_restart_sched_tick
 	update_cpu_load_nohz();
 
 	calc_load_exit_idle();
-	touch_softlockup_watchdog();
+	touch_softlockup_watchdog_sched();
 	/*
 	 * Cancel the scheduled timer and restore the tick
 	 */
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -225,7 +225,15 @@ static void __touch_watchdog(void)
 	__this_cpu_write(watchdog_touch_ts, get_timestamp());
 }
 
-void touch_softlockup_watchdog(void)
+/**
+ * touch_softlockup_watchdog_sched - touch watchdog on scheduler stalls
+ *
+ * Call when the scheduler may have stalled for legitimate reasons
+ * preventing the watchdog task from executing - e.g. the scheduler
+ * entering idle state.  This should only be used for scheduler events.
+ * Use touch_softlockup_watchdog() for everything else.
+ */
+void touch_softlockup_watchdog_sched(void)
 {
 	/*
 	 * Preemption can be enabled.  It doesn't matter which CPU's timestamp
@@ -233,6 +241,11 @@ void touch_softlockup_watchdog(void)
 	 */
 	raw_cpu_write(watchdog_touch_ts, 0);
 }
+
+void touch_softlockup_watchdog(void)
+{
+	touch_softlockup_watchdog_sched();
+}
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
 void touch_all_softlockup_watchdogs(void)
