From: Don Zickus <dzickus@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 2/2] workqueue: implement lockup detector
Date: Mon, 7 Dec 2015 16:38:16 -0500
Message-ID: <20151207213816.GA230987@redhat.com>
In-Reply-To: <20151207190617.GH9175@mtj.duckdns.org>

On Mon, Dec 07, 2015 at 02:06:17PM -0500, Tejun Heo wrote:
> Hello,
> 
> Decoupled the control knobs from softlockup.  It's now a workqueue
> module param which can be updated at runtime.  If there's no
> objection, I'll push the two patches through wq/for-4.5.

Does this still compile correctly with CONFIG_WQ_WATCHDOG disabled?

Cheers,
Don

> 
> Thanks.
> ------ 8< ------
> Workqueue stalls can happen from a variety of usage bugs such as
> missing WQ_MEM_RECLAIM flag or concurrency managed work item
> indefinitely staying RUNNING.  These stalls can be extremely difficult
> to hunt down because the usual warning mechanisms can't detect
> workqueue stalls and the internal state is pretty opaque.
> 
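> For illustration only (not part of this patch), here is a minimal,
> hypothetical sketch of the second bug class - a work item on a
> concurrency-managed per-cpu pool that never sleeps and never returns.
> Because the worker never blocks, the pool never starts another worker,
> so every other work item queued on the same pool stalls behind it:
> 
>   #include <linux/module.h>
>   #include <linux/workqueue.h>
>   #include <linux/sched.h>
> 
>   static void stuck_work_fn(struct work_struct *work)
>   {
>   	/*
>   	 * Never returns.  cond_resched() keeps the CPU shared, but the
>   	 * worker stays RUNNING, so the concurrency-managed pool never
>   	 * wakes another worker and everything queued behind this item
>   	 * stalls.
>   	 */
>   	for (;;)
>   		cond_resched();
>   }
>   static DECLARE_WORK(stuck_work, stuck_work_fn);
> 
>   static int __init stuck_init(void)
>   {
>   	/* system_wq: per-cpu, concurrency managed, no WQ_MEM_RECLAIM */
>   	schedule_work(&stuck_work);
>   	return 0;
>   }
>   module_init(stuck_init);
>   MODULE_LICENSE("GPL");
> 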
> To alleviate the situation, this patch implements a workqueue lockup
> detector.  It monitors all worker_pools periodically and, if any pool
> fails to make forward progress for longer than the threshold duration,
> triggers a warning and dumps the workqueue state as follows.
> 
>  BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
>  Showing busy workqueues and worker pools:
>  workqueue events: flags=0x0
>    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
>      pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
>  workqueue events_power_efficient: flags=0x80
>    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
>      pending: check_lifetime, neigh_periodic_work
>  workqueue cgroup_pidlist_destroy: flags=0x0
>    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
>      pending: cgroup_pidlist_destroy_work_fn
>  ...
> 
> The detection mechanism is controlled through the kernel parameter
> workqueue.watchdog_thresh and can be updated at runtime through the
> sysfs module parameter file.
> 
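> As a usage sketch (assuming the knob is exposed like any other built-in
> module parameter, i.e. under /sys/module/workqueue/parameters/), the
> threshold could be raised to 60 seconds at runtime from userspace, and
> booting with workqueue.watchdog_thresh=0 would disable the check
> entirely:
> 
>   /* wq-thresh.c - hypothetical helper, not part of the patch */
>   #include <stdio.h>
> 
>   int main(void)
>   {
>   	const char *path =
>   		"/sys/module/workqueue/parameters/watchdog_thresh";
>   	FILE *f = fopen(path, "w");
> 
>   	if (!f) {
>   		perror(path);
>   		return 1;
>   	}
>   	fprintf(f, "60\n");
>   	return fclose(f) ? 1 : 0;
>   }
> 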
> v2: Decoupled from softlockup control knobs.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Ulrich Obergfell <uobergfe@redhat.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Chris Mason <clm@fb.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  Documentation/kernel-parameters.txt |    9 +
>  include/linux/workqueue.h           |    6 +
>  kernel/watchdog.c                   |    3 
>  kernel/workqueue.c                  |  174 +++++++++++++++++++++++++++++++++++-
>  lib/Kconfig.debug                   |   11 ++
>  5 files changed, 200 insertions(+), 3 deletions(-)
> 
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -4114,6 +4114,15 @@ bytes respectively. Such letter suffixes
>  			or other driver-specific files in the
>  			Documentation/watchdog/ directory.
>  
> +	workqueue.watchdog_thresh=
> +			If CONFIG_WQ_WATCHDOG is configured, workqueue can
> +			warn stall conditions and dump internal state to
> +			help debugging.  0 disables workqueue stall
> +			detection; otherwise, it's the stall threshold
> +			duration in seconds.  The default value is 30 and
> +			it can be updated at runtime by writing to the
> +			corresponding sysfs file.
> +
>  	workqueue.disable_numa
>  			By default, all work items queued to unbound
>  			workqueues are affine to the NUMA nodes they're
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -618,4 +618,10 @@ static inline int workqueue_sysfs_regist
>  { return 0; }
>  #endif	/* CONFIG_SYSFS */
>  
> +#ifdef CONFIG_WQ_WATCHDOG
> +void wq_watchdog_touch(int cpu);
> +#else	/* CONFIG_WQ_WATCHDOG */
> +static inline void wq_watchdog_touch(int cpu) { }
> +#endif	/* CONFIG_WQ_WATCHDOG */
> +
>  #endif
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -20,6 +20,7 @@
>  #include <linux/smpboot.h>
>  #include <linux/sched/rt.h>
>  #include <linux/tick.h>
> +#include <linux/workqueue.h>
>  
>  #include <asm/irq_regs.h>
>  #include <linux/kvm_para.h>
> @@ -245,6 +246,7 @@ void touch_softlockup_watchdog_sched(voi
>  void touch_softlockup_watchdog(void)
>  {
>  	touch_softlockup_watchdog_sched();
> +	wq_watchdog_touch(raw_smp_processor_id());
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -259,6 +261,7 @@ void touch_all_softlockup_watchdogs(void
>  	 */
>  	for_each_watchdog_cpu(cpu)
>  		per_cpu(watchdog_touch_ts, cpu) = 0;
> +	wq_watchdog_touch(-1);
>  }
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -148,6 +148,8 @@ struct worker_pool {
>  	int			id;		/* I: pool ID */
>  	unsigned int		flags;		/* X: flags */
>  
> +	unsigned long		watchdog_ts;	/* L: watchdog timestamp */
> +
>  	struct list_head	worklist;	/* L: list of pending works */
>  	int			nr_workers;	/* L: total number of workers */
>  
> @@ -1083,6 +1085,8 @@ static void pwq_activate_delayed_work(st
>  	struct pool_workqueue *pwq = get_work_pwq(work);
>  
>  	trace_workqueue_activate_work(work);
> +	if (list_empty(&pwq->pool->worklist))
> +		pwq->pool->watchdog_ts = jiffies;
>  	move_linked_works(work, &pwq->pool->worklist, NULL);
>  	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
>  	pwq->nr_active++;
> @@ -1385,6 +1389,8 @@ retry:
>  		trace_workqueue_activate_work(work);
>  		pwq->nr_active++;
>  		worklist = &pwq->pool->worklist;
> +		if (list_empty(worklist))
> +			pwq->pool->watchdog_ts = jiffies;
>  	} else {
>  		work_flags |= WORK_STRUCT_DELAYED;
>  		worklist = &pwq->delayed_works;
> @@ -2157,6 +2163,8 @@ recheck:
>  			list_first_entry(&pool->worklist,
>  					 struct work_struct, entry);
>  
> +		pool->watchdog_ts = jiffies;
> +
>  		if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) {
>  			/* optimization path, not strictly necessary */
>  			process_one_work(worker, work);
> @@ -2240,6 +2248,7 @@ repeat:
>  					struct pool_workqueue, mayday_node);
>  		struct worker_pool *pool = pwq->pool;
>  		struct work_struct *work, *n;
> +		bool first = true;
>  
>  		__set_current_state(TASK_RUNNING);
>  		list_del_init(&pwq->mayday_node);
> @@ -2256,9 +2265,14 @@ repeat:
>  		 * process'em.
>  		 */
>  		WARN_ON_ONCE(!list_empty(scheduled));
> -		list_for_each_entry_safe(work, n, &pool->worklist, entry)
> -			if (get_work_pwq(work) == pwq)
> +		list_for_each_entry_safe(work, n, &pool->worklist, entry) {
> +			if (get_work_pwq(work) == pwq) {
> +				if (first)
> +					pool->watchdog_ts = jiffies;
>  				move_linked_works(work, scheduled, &n);
> +			}
> +			first = false;
> +		}
>  
>  		if (!list_empty(scheduled)) {
>  			process_scheduled_works(rescuer);
> @@ -3104,6 +3118,7 @@ static int init_worker_pool(struct worke
>  	pool->cpu = -1;
>  	pool->node = NUMA_NO_NODE;
>  	pool->flags |= POOL_DISASSOCIATED;
> +	pool->watchdog_ts = jiffies;
>  	INIT_LIST_HEAD(&pool->worklist);
>  	INIT_LIST_HEAD(&pool->idle_list);
>  	hash_init(pool->busy_hash);
> @@ -4343,7 +4358,9 @@ void show_workqueue_state(void)
>  
>  		pr_info("pool %d:", pool->id);
>  		pr_cont_pool_info(pool);
> -		pr_cont(" workers=%d", pool->nr_workers);
> +		pr_cont(" hung=%us workers=%d",
> +			jiffies_to_msecs(jiffies - pool->watchdog_ts) / 1000,
> +			pool->nr_workers);
>  		if (pool->manager)
>  			pr_cont(" manager: %d",
>  				task_pid_nr(pool->manager->task));
> @@ -5202,6 +5219,154 @@ static void workqueue_sysfs_unregister(s
>  static void workqueue_sysfs_unregister(struct workqueue_struct *wq)	{ }
>  #endif	/* CONFIG_SYSFS */
>  
> +/*
> + * Workqueue watchdog.
> + *
> + * Stall may be caused by various bugs - missing WQ_MEM_RECLAIM, illegal
> + * flush dependency, a concurrency managed work item which stays RUNNING
> + * indefinitely.  Workqueue stalls can be very difficult to debug as the
> + * usual warning mechanisms don't trigger and internal workqueue state is
> + * largely opaque.
> + *
> + * Workqueue watchdog monitors all worker pools periodically and dumps
> + * state if some pools failed to make forward progress for a while where
> + * forward progress is defined as the first item on ->worklist changing.
> + *
> + * This mechanism is controlled through the kernel parameter
> + * "workqueue.watchdog_thresh" which can be updated at runtime through the
> + * corresponding sysfs parameter file.
> + */
> +#ifdef CONFIG_WQ_WATCHDOG
> +
> +static void wq_watchdog_timer_fn(unsigned long data);
> +
> +static unsigned long wq_watchdog_thresh = 30;
> +static struct timer_list wq_watchdog_timer =
> +	TIMER_DEFERRED_INITIALIZER(wq_watchdog_timer_fn, 0, 0);
> +
> +static unsigned long wq_watchdog_touched = INITIAL_JIFFIES;
> +static DEFINE_PER_CPU(unsigned long, wq_watchdog_touched_cpu) = INITIAL_JIFFIES;
> +
> +static void wq_watchdog_reset_touched(void)
> +{
> +	int cpu;
> +
> +	wq_watchdog_touched = jiffies;
> +	for_each_possible_cpu(cpu)
> +		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
> +}
> +
> +static void wq_watchdog_timer_fn(unsigned long data)
> +{
> +	unsigned long thresh = READ_ONCE(wq_watchdog_thresh) * HZ;
> +	bool lockup_detected = false;
> +	struct worker_pool *pool;
> +	int pi;
> +
> +	if (!thresh)
> +		return;
> +
> +	rcu_read_lock();
> +
> +	for_each_pool(pool, pi) {
> +		unsigned long pool_ts, touched, ts;
> +
> +		if (list_empty(&pool->worklist))
> +			continue;
> +
> +		/* get the latest of pool and touched timestamps */
> +		pool_ts = READ_ONCE(pool->watchdog_ts);
> +		touched = READ_ONCE(wq_watchdog_touched);
> +
> +		if (time_after(pool_ts, touched))
> +			ts = pool_ts;
> +		else
> +			ts = touched;
> +
> +		if (pool->cpu >= 0) {
> +			unsigned long cpu_touched =
> +				READ_ONCE(per_cpu(wq_watchdog_touched_cpu,
> +						  pool->cpu));
> +			if (time_after(cpu_touched, ts))
> +				ts = cpu_touched;
> +		}
> +
> +		/* did we stall? */
> +		if (time_after(jiffies, ts + thresh)) {
> +			lockup_detected = true;
> +			pr_emerg("BUG: workqueue lockup - pool");
> +			pr_cont_pool_info(pool);
> +			pr_cont(" stuck for %us!\n",
> +				jiffies_to_msecs(jiffies - pool_ts) / 1000);
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +
> +	if (lockup_detected)
> +		show_workqueue_state();
> +
> +	wq_watchdog_reset_touched();
> +	mod_timer(&wq_watchdog_timer, jiffies + thresh);
> +}
> +
> +void wq_watchdog_touch(int cpu)
> +{
> +	if (cpu >= 0)
> +		per_cpu(wq_watchdog_touched_cpu, cpu) = jiffies;
> +	else
> +		wq_watchdog_touched = jiffies;
> +}
> +
> +static void wq_watchdog_set_thresh(unsigned long thresh)
> +{
> +	wq_watchdog_thresh = 0;
> +	del_timer_sync(&wq_watchdog_timer);
> +
> +	if (thresh) {
> +		wq_watchdog_thresh = thresh;
> +		wq_watchdog_reset_touched();
> +		mod_timer(&wq_watchdog_timer, jiffies + thresh * HZ);
> +	}
> +}
> +
> +static int wq_watchdog_param_set_thresh(const char *val,
> +					const struct kernel_param *kp)
> +{
> +	unsigned long thresh;
> +	int ret;
> +
> +	ret = kstrtoul(val, 0, &thresh);
> +	if (ret)
> +		return ret;
> +
> +	if (system_wq)
> +		wq_watchdog_set_thresh(thresh);
> +	else
> +		wq_watchdog_thresh = thresh;
> +
> +	return 0;
> +}
> +
> +static const struct kernel_param_ops wq_watchdog_thresh_ops = {
> +	.set	= wq_watchdog_param_set_thresh,
> +	.get	= param_get_ulong,
> +};
> +
> +module_param_cb(watchdog_thresh, &wq_watchdog_thresh_ops, &wq_watchdog_thresh,
> +		0644);
> +
> +static void wq_watchdog_init(void)
> +{
> +	wq_watchdog_set_thresh(wq_watchdog_thresh);
> +}
> +
> +#else	/* CONFIG_WQ_WATCHDOG */
> +
> +static inline void wq_watchdog_init(void) { }
> +
> +#endif	/* CONFIG_WQ_WATCHDOG */
> +
>  static void __init wq_numa_init(void)
>  {
>  	cpumask_var_t *tbl;
> @@ -5325,6 +5490,9 @@ static int __init init_workqueues(void)
>  	       !system_unbound_wq || !system_freezable_wq ||
>  	       !system_power_efficient_wq ||
>  	       !system_freezable_power_efficient_wq);
> +
> +	wq_watchdog_init();
> +
>  	return 0;
>  }
>  early_initcall(init_workqueues);
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -812,6 +812,17 @@ config BOOTPARAM_HUNG_TASK_PANIC_VALUE
>  	default 0 if !BOOTPARAM_HUNG_TASK_PANIC
>  	default 1 if BOOTPARAM_HUNG_TASK_PANIC
>  
> +config WQ_WATCHDOG
> +	bool "Detect Workqueue Stalls"
> +	depends on DEBUG_KERNEL
> +	help
> +	  Say Y here to enable stall detection on workqueues.  If a
> +	  worker pool doesn't make forward progress on a pending work
> +	  item for over a given amount of time, 30s by default, a
> +	  warning message is printed along with dump of workqueue
> +	  state.  This can be configured through kernel parameter
> +	  "workqueue.watchdog_thresh" and its sysfs counterpart.
> +
>  endmenu # "Debug lockups and hangs"
>  
>  config PANIC_ON_OOPS
> --
