linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: [PATCH 4.9 01/23] workqueue: replace pool->manager_arb mutex with a flag
Date: Tue, 31 Oct 2017 10:55:25 +0100	[thread overview]
Message-ID: <20171031095517.456393241@linuxfoundation.org> (raw)
In-Reply-To: <20171031095517.400573240@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 692b48258dda7c302e777d7d5f4217244478f1f6 upstream.

Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:

 [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
 [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
 [ 1270.473240] -----------------------------------------------------
 [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
 [ 1270.474994]
 [ 1270.474994] and this task is already holding:
 [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
 [ 1270.476046] which would create a new lock dependency:
 [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
 [ 1270.476949]
 [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
 [ 1270.477553]  (&pool->lock/1){-.-.}
 ...
 [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
 [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
 ...
 [ 1270.494735]  Possible interrupt unsafe locking scenario:
 [ 1270.494735]
 [ 1270.495250]        CPU0                    CPU1
 [ 1270.495600]        ----                    ----
 [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
 [ 1270.496295]                                local_irq_disable();
 [ 1270.496753]                                lock(&pool->lock/1);
 [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
 [ 1270.497744]   <Interrupt>
 [ 1270.497948]     lock(&pool->lock/1);

, which will cause a irq inversion deadlock if the above lock scenario
happens.

The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.

Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f73 ("locking/mutex: Fix
lockdep_assert_held() fail").

Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag.  The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.

This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.

v2: Drop unnecessary flag clearing before pool destruction as
    suggested by Boqun.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/workqueue.c |   37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,6 +68,7 @@ enum {
 	 * attach_mutex to avoid changing binding state while
 	 * worker_attach_to_pool() is in progress.
 	 */
+	POOL_MANAGER_ACTIVE	= 1 << 0,	/* being managed */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 
 	/* worker flags */
@@ -165,7 +166,6 @@ struct worker_pool {
 						/* L: hash of busy workers */
 
 	/* see manage_workers() for details on the two manager mutexes */
-	struct mutex		manager_arb;	/* manager arbitration */
 	struct worker		*manager;	/* L: purely informational */
 	struct mutex		attach_mutex;	/* attach/detach exclusion */
 	struct list_head	workers;	/* A: attached workers */
@@ -297,6 +297,7 @@ static struct workqueue_attrs *wq_update
 
 static DEFINE_MUTEX(wq_pool_mutex);	/* protects pools and workqueues list */
 static DEFINE_SPINLOCK(wq_mayday_lock);	/* protects wq->maydays list */
+static DECLARE_WAIT_QUEUE_HEAD(wq_manager_wait); /* wait for manager to go away */
 
 static LIST_HEAD(workqueues);		/* PR: list of all workqueues */
 static bool workqueue_freezing;		/* PL: have wqs started freezing? */
@@ -799,7 +800,7 @@ static bool need_to_create_worker(struct
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_arb);
+	bool managing = pool->flags & POOL_MANAGER_ACTIVE;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1979,24 +1980,17 @@ static bool manage_workers(struct worker
 {
 	struct worker_pool *pool = worker->pool;
 
-	/*
-	 * Anyone who successfully grabs manager_arb wins the arbitration
-	 * and becomes the manager.  mutex_trylock() on pool->manager_arb
-	 * failure while holding pool->lock reliably indicates that someone
-	 * else is managing the pool and the worker which failed trylock
-	 * can proceed to executing work items.  This means that anyone
-	 * grabbing manager_arb is responsible for actually performing
-	 * manager duties.  If manager_arb is grabbed and released without
-	 * actual management, the pool may stall indefinitely.
-	 */
-	if (!mutex_trylock(&pool->manager_arb))
+	if (pool->flags & POOL_MANAGER_ACTIVE)
 		return false;
+
+	pool->flags |= POOL_MANAGER_ACTIVE;
 	pool->manager = worker;
 
 	maybe_create_worker(pool);
 
 	pool->manager = NULL;
-	mutex_unlock(&pool->manager_arb);
+	pool->flags &= ~POOL_MANAGER_ACTIVE;
+	wake_up(&wq_manager_wait);
 	return true;
 }
 
@@ -3203,7 +3197,6 @@ static int init_worker_pool(struct worke
 	setup_timer(&pool->mayday_timer, pool_mayday_timeout,
 		    (unsigned long)pool);
 
-	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->attach_mutex);
 	INIT_LIST_HEAD(&pool->workers);
 
@@ -3273,13 +3266,15 @@ static void put_unbound_pool(struct work
 	hash_del(&pool->hash_node);
 
 	/*
-	 * Become the manager and destroy all workers.  Grabbing
-	 * manager_arb prevents @pool's workers from blocking on
-	 * attach_mutex.
+	 * Become the manager and destroy all workers.  This prevents
+	 * @pool's workers from blocking on attach_mutex.  We're the last
+	 * manager and @pool gets freed with the flag set.
 	 */
-	mutex_lock(&pool->manager_arb);
-
 	spin_lock_irq(&pool->lock);
+	wait_event_lock_irq(wq_manager_wait,
+			    !(pool->flags & POOL_MANAGER_ACTIVE), pool->lock);
+	pool->flags |= POOL_MANAGER_ACTIVE;
+
 	while ((worker = first_idle_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
@@ -3293,8 +3288,6 @@ static void put_unbound_pool(struct work
 	if (pool->detach_completion)
 		wait_for_completion(pool->detach_completion);
 
-	mutex_unlock(&pool->manager_arb);
-
 	/* shut down the timers */
 	del_timer_sync(&pool->idle_timer);
 	del_timer_sync(&pool->mayday_timer);

  reply	other threads:[~2017-10-31  9:56 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-31  9:55 [PATCH 4.9 00/23] 4.9.60-stable review Greg Kroah-Hartman
2017-10-31  9:55 ` Greg Kroah-Hartman [this message]
2017-10-31  9:55 ` [PATCH 4.9 02/23] ALSA: hda/realtek - Add support for ALC236/ALC3204 Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 03/23] ALSA: hda - fix headset mic problem for Dell machines with alc236 Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 04/23] ceph: unlock dangling spinlock in try_flush_caps() Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 05/23] usb: xhci: Handle error condition in xhci_stop_device() Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 06/23] KVM: PPC: Fix oops when checking KVM_CAP_PPC_HTM Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 07/23] spi: uapi: spidev: add missing ioctl header Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 08/23] spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 09/23] fuse: fix READDIRPLUS skipping an entry Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 10/23] xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap() Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 11/23] Input: elan_i2c - add ELAN0611 to the ACPI table Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 12/23] Input: gtco - fix potential out-of-bound access Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 13/23] assoc_array: Fix a buggy node-splitting case Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 14/23] scsi: zfcp: fix erp_action use-before-initialize in REC action trace Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 15/23] scsi: sg: Re-fix off by one in sg_fill_request_table() Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 16/23] drm/amd/powerplay: fix uninitialized variable Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 17/23] can: sun4i: fix loopback mode Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 18/23] can: kvaser_usb: Correct return value in printout Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 19/23] can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 20/23] cfg80211: fix connect/disconnect edge cases Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 21/23] ipsec: Fix aborted xfrm policy dump crash Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 22/23] regulator: fan53555: fix I2C device ids Greg Kroah-Hartman
2017-10-31  9:55 ` [PATCH 4.9 23/23] ecryptfs: fix dereference of NULL user_key_payload Greg Kroah-Hartman
     [not found] ` <59f8951e.87d8500a.cbc53.9419@mx.google.com>
2017-10-31 16:52   ` [PATCH 4.9 00/23] 4.9.60-stable review Greg Kroah-Hartman
2017-10-31 17:01     ` Mark Brown
2017-10-31 17:19 ` Guenter Roeck
2017-10-31 20:08 ` Shuah Khan
2017-10-31 20:59 ` Tom Gall
2017-11-01 15:21   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171031095517.456393241@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=boqun.feng@gmail.com \
    --cc=jiangshanlai@gmail.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).