linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: linux-kernel@vger.kernel.org, Lai Jiangshan <laijs@cn.fujitsu.com>
Subject: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
Date: Thu, 6 Sep 2012 13:08:02 -0700	[thread overview]
Message-ID: <20120906200802.GI29092@google.com> (raw)
In-Reply-To: <20120906200723.GH29092@google.com>

>From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 6 Sep 2012 12:50:41 -0700

To simplify both normal and CPU hotplug paths, while CPU hotplug is in
progress, manager_mutex is held to prevent one of the workers from
becoming a manager and creating or destroying workers; unfortunately,
it currently may lead to idle worker depletion which in turn can lead
to deadlock under extreme circumstances.

Idle workers aren't allowed to become busy if there's no other idle
worker left to create more idle workers, but during CPU_ONLINE
gcwq_associate() is holding all managerships and all the idle workers
can proceed to become busy before gcwq_associate() is finished.

This patch fixes the bug by releasing manager_mutexes before letting
the rebound idle workers go.  This ensures that by the time idle
workers check whether management is necessary, CPU_ONLINE already has
released the positions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b19170b..74487ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1454,10 +1454,19 @@ retry:
 	}
 
 	/*
-	 * All idle workers are rebound and waiting for %WORKER_REBIND to
-	 * be cleared inside idle_worker_rebind().  Clear and release.
-	 * Clearing %WORKER_REBIND from this foreign context is safe
-	 * because these workers are still guaranteed to be idle.
+	 * At this point, each pool is guaranteed to have at least one idle
+	 * worker and all idle workers are waiting for WORKER_REBIND to
+	 * clear.  Release management before releasing idle workers;
+	 * otherwise, they can all go become busy as we're holding the
+	 * manager_mutexes, which can lead to deadlock as we don't actually
+	 * create new workers.
+	 */
+	gcwq_release_management(gcwq);
+
+	/*
+	 * Clear %WORKER_REBIND and release.  Clearing it from this foreign
+	 * context is safe because these workers are still guaranteed to be
+	 * idle.
 	 *
 	 * We need to make sure all idle workers passed WORKER_REBIND wait
 	 * in idle_worker_rebind() before returning; otherwise, workers can
@@ -1467,6 +1476,7 @@ retry:
 	INIT_COMPLETION(idle_rebind.done);
 
 	for_each_worker_pool(pool, gcwq) {
+		WARN_ON_ONCE(list_empty(&pool->idle_list));
 		list_for_each_entry(worker, &pool->idle_list, entry) {
 			worker->flags &= ~WORKER_REBIND;
 			idle_rebind.cnt++;
@@ -1481,8 +1491,6 @@ retry:
 	} else {
 		spin_unlock_irq(&gcwq->lock);
 	}
-
-	gcwq_release_management(gcwq);
 }
 
 static struct worker *alloc_worker(void)
-- 
1.7.7.3


  reply	other threads:[~2012-09-06 20:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-06 20:06 [PATCH wq/for-3.6-fixes 1/3] workqueue: break out gcwq->lock locking from gcwq_claim/release_management_and_[un]lock() Tejun Heo
2012-09-06 20:07 ` [PATCH wq/for-3.6-fixes 2/3] workqueue: rename rebind_workers() to gcwq_associate() and let it handle locking and DISASSOCIATED clearing Tejun Heo
2012-09-06 20:08   ` Tejun Heo [this message]
2012-09-07  1:53     ` [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE Lai Jiangshan
2012-09-07 19:25       ` Tejun Heo
2012-09-07  3:10     ` Lai Jiangshan
2012-09-07 19:29       ` Tejun Heo
2012-09-07 20:22         ` Tejun Heo
2012-09-07 20:34           ` Tejun Heo
2012-09-07 23:05             ` Tejun Heo
2012-09-07 23:07               ` Tejun Heo
2012-09-07 23:41                 ` Tejun Heo
2012-09-08 17:18                   ` Lai Jiangshan
2012-09-08 17:29                     ` Tejun Heo
2012-09-08 17:32                       ` Tejun Heo
2012-09-08 17:40                         ` Lai Jiangshan
2012-09-08 17:41                           ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120906200802.GI29092@google.com \
    --to=tj@kernel.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).