* [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers()
@ 2012-09-01 16:28 Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() sees the right flags Lai Jiangshan
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Patch 1~4   fix possible bugs.

Patch 1     fixes a possible double-write bug
Patch 2,5,6 make the waiting logic clearer
Patch 3,4   fix bugs in manage vs. hotplug
Patch 7,8,9,10 add explicit wait logic for busy-worker rebinding and make
	       rebind_workers() a single pass.

Changes from V3:
A new approach is used to fix the manage vs. hotplug bug. The new
approach still needs POOL_MANAGING_WORKERS, so patch 3 is kept.

Busy-worker rebinding now waits on a different mechanism for all idle
workers to be rebound, so the aims and changelogs of patches 8,9,10 are
almost unchanged, but their code changed a little (it is now based on
synchronize_all_idles_rebound()).


Lai Jiangshan (10):
  workqueue: ensure the wq_worker_sleeping() sees the right flags
  workqueue: fix deadlock in rebind_workers()
  workqueue: add POOL_MANAGING_WORKERS
  workqueue: add manage_workers_slowpath()
  workqueue: move rebind_hold to idle_rebind
  workqueue: simply clear WORKER_REBIND
  workqueue: move idle_rebind pointer to gcwq
  workqueue: explicit way to wait for idle workers to finish
  workqueue: single pass rebind_workers
  workqueue: merge the role of rebind_hold to idle_done

 kernel/workqueue.c |  236 +++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 187 insertions(+), 49 deletions(-)

-- 
1.7.4.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() sees the right flags
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-04 23:39   ` [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic Tejun Heo
  2012-09-01 16:28 ` [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers() Lai Jiangshan
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

The compiler may compile this code into TWO write/modify instructions:
		worker->flags &= ~WORKER_UNBOUND;
		worker->flags |= WORKER_REBIND;

so another CPU may see an intermediate value of worker->flags which has
neither WORKER_UNBOUND nor WORKER_REBIND set, and wrongly do a local
wakeup.

So we explicitly use a single store instead.

This bug will not occur for idle workers, because they always have
another WORKER_NOT_RUNNING flag set.
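
A minimal sketch of the hazard and of the fix (UNBOUND/REBIND here are
illustrative stand-ins for the worker flags):

	/* the compiler may emit two read-modify-write stores here: */
	flags &= ~UNBOUND;	/* window: another CPU sees neither flag */
	flags |= REBIND;

	/* fix: build the new value locally, publish it with one store */
	unsigned long worker_flags = flags;
	worker_flags &= ~UNBOUND;
	worker_flags |= REBIND;
	ACCESS_ONCE(flags) = worker_flags;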

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..4f252d0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1434,10 +1434,13 @@ retry:
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
+		unsigned long worker_flags = worker->flags;
 
 		/* morph UNBOUND to REBIND */
-		worker->flags &= ~WORKER_UNBOUND;
-		worker->flags |= WORKER_REBIND;
+		worker_flags &= ~WORKER_UNBOUND;
+		worker_flags |= WORKER_REBIND;
+		/* ensure wq_worker_sleeping() sees the right flags */
+		ACCESS_ONCE(worker->flags) = worker_flags;
 
 		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
 				     work_data_bits(rebind_work)))
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() sees the right flags Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-05  0:54   ` Tejun Heo
  2012-09-01 16:28 ` [PATCH 03/10 V4] workqueue: add POOL_MANAGING_WORKERS Lai Jiangshan
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Current idle_worker_rebind() has a bug.

idle_worker_rebind() path			HOTPLUG path
						online
							rebind_workers()
wait_event(gcwq->rebind_hold)
	woken up but not scheduled			rebind_workers() returns (*)
						the same cpu offline
						the same cpu online again
							rebind_workers()
								set WORKER_REBIND
	scheduled, sees WORKER_REBIND
	waits for rebind_workers()        <--bug-->		waits for idle_worker_rebind()
	to clear it						to finish rebinding

The two threads wait on each other: it is a deadlock.

This fix focuses on (*): rebind_workers() must not return until all idle
workers have finished waiting on gcwq->rebind_hold (i.e. until all idle
workers have released their reference to gcwq->rebind_hold). We add
ref_done for this: rebind_workers() waits on ref_done until all idle
workers finish their wait.

It is now a three-step handshake.
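
A sketch of the resulting three-step handshake (simplified from the
patch below):

idle_worker_rebind()			rebind_workers()
--------------------			----------------
rebind self, ++ref_cnt
if (!--idle_cnt)
	complete(idle_done)	------>	wait_for_completion(idle_done)
					clear WORKER_REBIND on idles
wait_event(rebind_hold, ...)	<------	wake_up_all(rebind_hold)
if (!--ref_cnt)
	complete(ref_done)	------>	wait_for_completion(ref_done)
					/* now safe to return */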

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   61 ++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4f252d0..1bfe407 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1305,8 +1305,22 @@ __acquires(&gcwq->lock)
 }
 
 struct idle_rebind {
-	int			cnt;		/* # workers to be rebound */
-	struct completion	done;		/* all workers rebound */
+	int		  idle_cnt;	/* # idle workers to be rebound */
+	struct completion idle_done;	/* all idle workers rebound */
+
+	/*
+	 * notify the rebind_workers() that:
+	 * 0. All workers have left rebind_hold.
+	 * 1. All idle workers are rebound.
+	 * 2. No idle worker has ref to this struct
+	 *
+	 * @ref_cnt: # idle workers which have a ref to this struct
+	 * @ref_done: no idle worker has a ref to this struct,
+	 *	      nor to rebind_hold.
+	 *	      It also implies that all idle workers are rebound.
+	 */
+	int		  ref_cnt;
+	struct completion ref_done;
 };
 
 /*
@@ -1320,12 +1334,19 @@ static void idle_worker_rebind(struct worker *worker)
 
 	/* CPU must be online at this point */
 	WARN_ON(!worker_maybe_bind_and_lock(worker));
-	if (!--worker->idle_rebind->cnt)
-		complete(&worker->idle_rebind->done);
+	++worker->idle_rebind->ref_cnt;
+	if (!--worker->idle_rebind->idle_cnt)
+		complete(&worker->idle_rebind->idle_done);
 	spin_unlock_irq(&worker->pool->gcwq->lock);
 
 	/* we did our part, wait for rebind_workers() to finish up */
 	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
+
+	/* notify when all idle workers are done (rebound & waited) */
+	spin_lock_irq(&worker->pool->gcwq->lock);
+	if (!--worker->idle_rebind->ref_cnt)
+		complete(&worker->idle_rebind->ref_done);
+	spin_unlock_irq(&worker->pool->gcwq->lock);
 }
 
 /*
@@ -1384,14 +1405,18 @@ static void rebind_workers(struct global_cwq *gcwq)
 		lockdep_assert_held(&pool->manager_mutex);
 
 	/*
-	 * Rebind idle workers.  Interlocked both ways.  We wait for
-	 * workers to rebind via @idle_rebind.done.  Workers will wait for
-	 * us to finish up by watching %WORKER_REBIND.
+	 * Rebind idle workers.  Interlocked both ways in triple waits.
+	 * We wait for workers to rebind via @idle_rebind.idle_done.
+	 * Workers will wait for us to finish up by watching %WORKER_REBIND.
+	 * And then we wait for workers to leave rebind_hold
+	 * via @idle_rebind.ref_done.
 	 */
-	init_completion(&idle_rebind.done);
+	init_completion(&idle_rebind.idle_done);
+	init_completion(&idle_rebind.ref_done);
+	idle_rebind.ref_cnt = 1;
 retry:
-	idle_rebind.cnt = 1;
-	INIT_COMPLETION(idle_rebind.done);
+	idle_rebind.idle_cnt = 1;
+	INIT_COMPLETION(idle_rebind.idle_done);
 
 	/* set REBIND and kick idle ones, we'll wait for these later */
 	for_each_worker_pool(pool, gcwq) {
@@ -1403,7 +1428,7 @@ retry:
 			worker->flags &= ~WORKER_UNBOUND;
 			worker->flags |= WORKER_REBIND;
 
-			idle_rebind.cnt++;
+			idle_rebind.idle_cnt++;
 			worker->idle_rebind = &idle_rebind;
 
 			/* worker_thread() will call idle_worker_rebind() */
@@ -1411,9 +1436,9 @@ retry:
 		}
 	}
 
-	if (--idle_rebind.cnt) {
+	if (--idle_rebind.idle_cnt) {
 		spin_unlock_irq(&gcwq->lock);
-		wait_for_completion(&idle_rebind.done);
+		wait_for_completion(&idle_rebind.idle_done);
 		spin_lock_irq(&gcwq->lock);
 		/* busy ones might have become idle while waiting, retry */
 		goto retry;
@@ -1452,6 +1477,16 @@ retry:
 			    worker->scheduled.next,
 			    work_color_to_flags(WORK_NO_COLOR));
 	}
+
+	/*
+	 * Before we leave rebind_workers(), we have to wait until no
+	 * worker has a ref to this idle_rebind nor to rebind_hold.
+	 */
+	if (--idle_rebind.ref_cnt) {
+		spin_unlock_irq(&gcwq->lock);
+		wait_for_completion(&idle_rebind.ref_done);
+		spin_lock_irq(&gcwq->lock);
+	}
 }
 
 static struct worker *alloc_worker(void)
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 03/10 V4] workqueue: add POOL_MANAGING_WORKERS
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() sees the right flags Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers() Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 04/10 V4] workqueue: add manage_workers_slowpath() Lai Jiangshan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

When hotplug happens, the hotplug code also grabs the manager_mutex.
This breaks too_many_workers()'s assumption and makes it misbehave
(it may kick the idle timer wrongly; no real bug has been observed).

To avoid corrupting that assumption, we add the original
POOL_MANAGING_WORKERS flag back.
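
For example, assuming the usual MAX_IDLE_WORKERS_RATIO of 4 and the
too_many_workers() heuristic "nr_idle > 2 && (nr_idle - 2) * 4 >= nr_busy":
a pool with 3 workers, 2 idle and 1 busy, should report false (2 > 2
fails).  But if hotplug merely holds the manager_mutex, the old
mutex_is_locked() check makes managing true, giving nr_idle = 3 and
nr_busy = 0, and the heuristic wrongly reports too many workers
(3 > 2 && 1 * 4 >= 0), kicking the idle timer for nothing.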

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1bfe407..979ef4f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
+	POOL_MANAGING_WORKERS   = 1 << 1,       /* managing workers */
 
 	/* worker flags */
 	WORKER_STARTED		= 1 << 0,	/* started */
@@ -652,7 +653,7 @@ static bool need_to_manage_workers(struct worker_pool *pool)
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1836,6 +1837,7 @@ static bool manage_workers(struct worker *worker)
 		return ret;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
+	pool->flags |= POOL_MANAGING_WORKERS;
 
 	/*
 	 * Destroy and then create so that may_start_working() is true
@@ -1844,6 +1846,7 @@ static bool manage_workers(struct worker *worker)
 	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
+	pool->flags &= ~POOL_MANAGING_WORKERS;
 	mutex_unlock(&pool->manager_mutex);
 	return ret;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (2 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 03/10 V4] workqueue: add POOL_MANAGING_WORKERS Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-05  1:12   ` Tejun Heo
  2012-09-01 16:28 ` [PATCH 05/10 V4] workqueue: move rebind_hold to idle_rebind Lai Jiangshan
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

If the hotplug code has grabbed the manager_mutex while worker_thread
tries to create a worker, manage_workers() will return false and
worker_thread will go on to process work items. Now, on that CPU, all
workers are processing work items and no idle worker is left ready for
managing. This breaks the workqueue design and is a bug.

So when manage_workers() fails to grab the manager_mutex, it should
enter normal work-processing context and then compete for the
manager_mutex instead of returning false.

To do this safely, we add manage_workers_slowpath(): the worker goes
into work-item-processing mode to do the managing job. The managing job
is thus processed via a work item and is free to compete for the
manager_mutex.

After this patch, the manager_mutex can be grabbed anywhere when
needed; it will no longer cause a CPU to consume all of its idle
workers.

Note that POOL_MANAGING_WORKERS is still needed to tell us why
manage_workers() failed to grab the manager_mutex.

This slowpath is hard to trigger, so for testing I changed
"if (unlikely(!mutex_trylock(&pool->manager_mutex)))"
to "if (1 || unlikely(!mutex_trylock(&pool->manager_mutex)))"
so that manage_workers_slowpath() is always used.


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   89 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 979ef4f..d40e8d7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1808,6 +1808,81 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
 	return ret;
 }
 
+/* manage workers via work item */
+static void manage_workers_slowpath_fn(struct work_struct *work)
+{
+	struct worker *worker = kthread_data(current);
+	struct worker_pool *pool = worker->pool;
+
+	mutex_lock(&pool->manager_mutex);
+	spin_lock_irq(&pool->gcwq->lock);
+
+	pool->flags &= ~POOL_MANAGE_WORKERS;
+	maybe_destroy_workers(pool);
+	maybe_create_worker(pool);
+
+	spin_unlock_irq(&pool->gcwq->lock);
+	mutex_unlock(&pool->manager_mutex);
+}
+
+static void process_scheduled_works(struct worker *worker);
+
+/*
+ * manage_workers_slowpath - manage worker pool via work item
+ * @worker: self
+ *
+ * manage workers when rebind_workers() or gcwq_unbind_fn() beat us to
+ * the manager_mutex.  The worker can't release gcwq->lock and then
+ * compete for the manager_mutex, because any worker must have at least one of:
+ * 	1) with gcwq->lock held
+ * 	2) with pool->manager_mutex held (manage_workers() fast path)
+ * 	3) queued on idle_list
+ * 	4) processing work item and queued on busy hash table
+ *
+ * So we move the managing job to a work item and process it; thus
+ * manage_workers_slowpath_fn() is fully able to compete for the
+ * manager_mutex.
+ *
+ * CONTEXT:
+ * with WORKER_PREP bit set
+ * spin_lock_irq(gcwq->lock) which will be released and regrabbed
+ * multiple times.  Does GFP_KERNEL allocations.
+ */
+static void manage_workers_slowpath(struct worker *worker)
+{
+	struct worker_pool *pool = worker->pool;
+	struct work_struct manage_work;
+	int cpu = pool->gcwq->cpu;
+	struct cpu_workqueue_struct *cwq;
+
+	pool->flags |= POOL_MANAGING_WORKERS;
+
+	INIT_WORK_ONSTACK(&manage_work, manage_workers_slowpath_fn);
+	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&manage_work));
+
+	/* see the comment of the same statement of worker_thread() */
+	BUG_ON(!list_empty(&worker->scheduled));
+
+	/* wq doesn't matter, use the default one */
+	if (cpu == WORK_CPU_UNBOUND)
+		cwq = get_cwq(cpu, system_unbound_wq);
+	else
+		cwq = get_cwq(cpu, system_wq);
+
+	/* insert the work to the worker's own scheduled list */
+	debug_work_activate(&manage_work);
+	insert_work(cwq, &manage_work, &worker->scheduled,
+		    work_color_to_flags(WORK_NO_COLOR));
+
+	/*
+	 * Do the managing.  This may also process busy_worker_rebind_fn()
+	 * queued by rebind_workers().
+	 */
+	process_scheduled_works(worker);
+
+	pool->flags &= ~POOL_MANAGING_WORKERS;
+}
+
 /**
  * manage_workers - manage worker pool
  * @worker: self
@@ -1833,8 +1908,18 @@ static bool manage_workers(struct worker *worker)
 	struct worker_pool *pool = worker->pool;
 	bool ret = false;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+		return false;
+
+	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+		/*
+		 * Ouch! rebind_workers() or gcwq_unbind_fn() beat us,
+		 * but we can't return without making any progress.
+		 * Fall back to manage_workers_slowpath().
+		 */
+		manage_workers_slowpath(worker);
+		return true;
+	}
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
 	pool->flags |= POOL_MANAGING_WORKERS;
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 05/10 V4] workqueue: move rebind_hold to idle_rebind
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (3 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 04/10 V4] workqueue: add manage_workers_slowpath() Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 06/10 V4] workqueue: simply clear WORKER_REBIND Lai Jiangshan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

With the help of @idle_rebind.ref_done, the lifetime of idle_rebind is
extended enough to cover the whole time @rebind_hold is referenced, so
we can move @rebind_hold from gcwq into idle_rebind.

We also change it to a completion; this eases the pain of WORKER_REBIND.
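
The two waiting styles, side by side (a minimal sketch built from lines
of the patch below):

	/* waitqueue style: each waiter re-tests a flag */
	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
	/* ... the waker clears WORKER_REBIND on each idle worker, then: */
	wake_up_all(&gcwq->rebind_hold);

	/* completion style: the event itself is latched, no flag needed */
	wait_for_completion(&worker->idle_rebind->rebind_hold);
	/* ... the waker: */
	complete_all(&idle_rebind.rebind_hold);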

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d40e8d7..55864d1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -185,8 +185,6 @@ struct global_cwq {
 						/* L: hash of busy workers */
 
 	struct worker_pool	pools[2];	/* normal and highpri pools */
-
-	wait_queue_head_t	rebind_hold;	/* rebind hold wait */
 } ____cacheline_aligned_in_smp;
 
 /*
@@ -1309,15 +1307,16 @@ struct idle_rebind {
 	int		  idle_cnt;	/* # idle workers to be rebound */
 	struct completion idle_done;	/* all idle workers rebound */
 
+	/* idle workers wait for all idles to be rebound */
+	struct completion rebind_hold;
+
 	/*
 	 * notify the rebind_workers() that:
-	 * 0. All workers have left rebind_hold.
 	 * 1. All idle workers are rebound.
 	 * 2. No idle worker has ref to this struct
 	 *
 	 * @ref_cnt: # idle workers which have a ref to this struct
 	 * @ref_done: no idle worker has a ref to this struct,
-	 *	      nor to rebind_hold.
 	 *	      It also implies that all idle workers are rebound.
 	 */
 	int		  ref_cnt;
@@ -1331,8 +1330,6 @@ struct idle_rebind {
  */
 static void idle_worker_rebind(struct worker *worker)
 {
-	struct global_cwq *gcwq = worker->pool->gcwq;
-
 	/* CPU must be online at this point */
 	WARN_ON(!worker_maybe_bind_and_lock(worker));
 	++worker->idle_rebind->ref_cnt;
@@ -1341,7 +1338,7 @@ static void idle_worker_rebind(struct worker *worker)
 	spin_unlock_irq(&worker->pool->gcwq->lock);
 
 	/* we did our part, wait for rebind_workers() to finish up */
-	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
+	wait_for_completion(&worker->idle_rebind->rebind_hold);
 
 	/* notify when all idle workers are done (rebound & waited) */
 	spin_lock_irq(&worker->pool->gcwq->lock);
@@ -1408,11 +1405,12 @@ static void rebind_workers(struct global_cwq *gcwq)
 	/*
 	 * Rebind idle workers.  Interlocked both ways in triple waits.
 	 * We wait for workers to rebind via @idle_rebind.idle_done.
-	 * Workers will wait for us to finish up by watching %WORKER_REBIND.
+	 * Workers will wait for us via @idle_rebind.rebind_hold.
 	 * And then we wait for workers to leave rebind_hold
 	 * via @idle_rebind.ref_done.
 	 */
 	init_completion(&idle_rebind.idle_done);
+	init_completion(&idle_rebind.rebind_hold);
 	init_completion(&idle_rebind.ref_done);
 	idle_rebind.ref_cnt = 1;
 retry:
@@ -1455,7 +1453,7 @@ retry:
 		list_for_each_entry(worker, &pool->idle_list, entry)
 			worker->flags &= ~WORKER_REBIND;
 
-	wake_up_all(&gcwq->rebind_hold);
+	complete_all(&idle_rebind.rebind_hold);
 
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
@@ -1481,7 +1479,7 @@ retry:
 
 	/*
 	 * Before we leave rebind_workers(), we have to wait until no
-	 * worker has a ref to this idle_rebind nor to rebind_hold.
+	 * worker has a ref to this idle_rebind.
 	 */
 	if (--idle_rebind.ref_cnt) {
 		spin_unlock_irq(&gcwq->lock);
@@ -3848,8 +3846,6 @@ static int __init init_workqueues(void)
 			mutex_init(&pool->manager_mutex);
 			ida_init(&pool->worker_ida);
 		}
-
-		init_waitqueue_head(&gcwq->rebind_hold);
 	}
 
 	/* create the initial worker */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 06/10 V4] workqueue: simply clear WORKER_REBIND
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (4 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 05/10 V4] workqueue: move rebind_hold to idle_rebind Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 07/10 V4] workqueue: move idle_rebind pointer to gcwq Lai Jiangshan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

WORKER_REBIND is not used for any other purpose, so
idle_worker_rebind() can clear it directly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 55864d1..9466d91 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1332,6 +1332,7 @@ static void idle_worker_rebind(struct worker *worker)
 {
 	/* CPU must be online at this point */
 	WARN_ON(!worker_maybe_bind_and_lock(worker));
+	worker_clr_flags(worker, WORKER_REBIND);
 	++worker->idle_rebind->ref_cnt;
 	if (!--worker->idle_rebind->idle_cnt)
 		complete(&worker->idle_rebind->idle_done);
@@ -1420,7 +1421,7 @@ retry:
 	/* set REBIND and kick idle ones, we'll wait for these later */
 	for_each_worker_pool(pool, gcwq) {
 		list_for_each_entry(worker, &pool->idle_list, entry) {
-			if (worker->flags & WORKER_REBIND)
+			if (!(worker->flags & WORKER_UNBOUND))
 				continue;
 
 			/* morph UNBOUND to REBIND */
@@ -1443,16 +1444,6 @@ retry:
 		goto retry;
 	}
 
-	/*
-	 * All idle workers are rebound and waiting for %WORKER_REBIND to
-	 * be cleared inside idle_worker_rebind().  Clear and release.
-	 * Clearing %WORKER_REBIND from this foreign context is safe
-	 * because these workers are still guaranteed to be idle.
-	 */
-	for_each_worker_pool(pool, gcwq)
-		list_for_each_entry(worker, &pool->idle_list, entry)
-			worker->flags &= ~WORKER_REBIND;
-
 	complete_all(&idle_rebind.rebind_hold);
 
 	/* rebind busy workers */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 07/10 V4] workqueue: move idle_rebind pointer to gcwq
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (5 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 06/10 V4] workqueue: simply clear WORKER_REBIND Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 08/10 V4] workqueue: explicit way to wait for idle workers to finish Lai Jiangshan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

This pointer helps other workers know the progress of idle-rebinding:
when gcwq->idle_rebind is not NULL, idle-rebinding is still in
progress.

idle_worker_rebind() is also split up.
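
The reference get/put pattern, in brief (extracted from the patch
below; both sides run under gcwq->lock):

	/* get: idle-rebinding must be in progress */
	idle_rebind = gcwq->idle_rebind;
	++idle_rebind->ref_cnt;

	/* put: the last ref clears the pointer and wakes rebind_workers() */
	if (!--idle_rebind->ref_cnt) {
		gcwq->idle_rebind = NULL;
		complete(&idle_rebind->ref_done);
	}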

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   67 ++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9466d91..c875951 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -150,7 +150,6 @@ struct worker {
 	int			id;		/* I: worker id */
 
 	/* for rebinding worker to CPU */
-	struct idle_rebind	*idle_rebind;	/* L: for idle worker */
 	struct work_struct	rebind_work;	/* L: for busy worker */
 };
 
@@ -185,6 +184,9 @@ struct global_cwq {
 						/* L: hash of busy workers */
 
 	struct worker_pool	pools[2];	/* normal and highpri pools */
+
+	/* for rebinding worker to CPU */
+	struct idle_rebind	*idle_rebind;	/* L or ref: for idle worker */
 } ____cacheline_aligned_in_smp;
 
 /*
@@ -1313,10 +1315,10 @@ struct idle_rebind {
 	/*
 	 * notify the rebind_workers() that:
 	 * 1. All idle workers are rebound.
-	 * 2. No idle worker has ref to this struct
+	 * 2. No worker has ref to this struct
 	 *
-	 * @ref_cnt: # idle workers which have a ref to this struct
-	 * @ref_done: no idle worker has a ref to this struct,
+	 * @ref_cnt: # workers which have a ref to this struct
+	 * @ref_done: no worker has a ref to this struct,
 	 *	      It also implies that all idle workers are rebound.
 	 */
 	int		  ref_cnt;
@@ -1324,28 +1326,55 @@ struct idle_rebind {
 };
 
 /*
+ * synchronize_all_idles_rebound: wait until all idle workers are rebound
+ * @gcwq: gcwq of interest
+ * @idle_rebind: the value of @gcwq->idle_rebind; the caller must hold
+ * 		 at least one reference to it.
+ *
+ * CONTEXT:
+ * Might sleep.  Called without any lock.
+ * With at least one WORKER_NOT_RUNNING bit set if called from a worker.
+ */
+static void synchronize_all_idles_rebound(struct global_cwq *gcwq,
+		struct idle_rebind *idle_rebind)
+{
+
+	/* wait for rebind_workers() to notify that all idles are rebound */
+	wait_for_completion(&idle_rebind->rebind_hold);
+
+	/* finished synchronizing, put reference */
+	spin_lock_irq(&gcwq->lock);
+	if (!--idle_rebind->ref_cnt) {
+		gcwq->idle_rebind = NULL;
+		complete(&idle_rebind->ref_done);
+	}
+	spin_unlock_irq(&gcwq->lock);
+}
+
+/*
  * Rebind an idle @worker to its CPU.  During CPU onlining, this has to
  * happen synchronously for idle workers.  worker_thread() will test
  * %WORKER_REBIND before leaving idle and call this function.
  */
 static void idle_worker_rebind(struct worker *worker)
 {
+	struct global_cwq *gcwq = worker->pool->gcwq;
+	struct idle_rebind *idle_rebind;
+
 	/* CPU must be online at this point */
 	WARN_ON(!worker_maybe_bind_and_lock(worker));
 	worker_clr_flags(worker, WORKER_REBIND);
-	++worker->idle_rebind->ref_cnt;
-	if (!--worker->idle_rebind->idle_cnt)
-		complete(&worker->idle_rebind->idle_done);
-	spin_unlock_irq(&worker->pool->gcwq->lock);
 
-	/* we did our part, wait for rebind_workers() to finish up */
-	wait_for_completion(&worker->idle_rebind->rebind_hold);
+	/* get reference */
+	idle_rebind = gcwq->idle_rebind;
+	++idle_rebind->ref_cnt;
 
-	/* notify when all idle workers are done (rebound & waited) */
-	spin_lock_irq(&worker->pool->gcwq->lock);
-	if (!--worker->idle_rebind->ref_cnt)
-		complete(&worker->idle_rebind->ref_done);
-	spin_unlock_irq(&worker->pool->gcwq->lock);
+	/* this worker has been rebound */
+	if (!--idle_rebind->idle_cnt)
+		complete(&idle_rebind->idle_done);
+	spin_unlock_irq(&gcwq->lock);
+
+	synchronize_all_idles_rebound(gcwq, idle_rebind);
 }
 
 /*
@@ -1414,6 +1443,7 @@ static void rebind_workers(struct global_cwq *gcwq)
 	init_completion(&idle_rebind.rebind_hold);
 	init_completion(&idle_rebind.ref_done);
 	idle_rebind.ref_cnt = 1;
+	gcwq->idle_rebind = &idle_rebind;
 retry:
 	idle_rebind.idle_cnt = 1;
 	INIT_COMPLETION(idle_rebind.idle_done);
@@ -1429,7 +1459,6 @@ retry:
 			worker->flags |= WORKER_REBIND;
 
 			idle_rebind.idle_cnt++;
-			worker->idle_rebind = &idle_rebind;
 
 			/* worker_thread() will call idle_worker_rebind() */
 			wake_up_process(worker->task);
@@ -1476,6 +1505,8 @@ retry:
 		spin_unlock_irq(&gcwq->lock);
 		wait_for_completion(&idle_rebind.ref_done);
 		spin_lock_irq(&gcwq->lock);
+	} else {
+		gcwq->idle_rebind = NULL;
 	}
 }
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 08/10 V4] workqueue: explicit way to wait for idle workers to finish
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (6 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 07/10 V4] workqueue: move idle_rebind pointer to gcwq Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 09/10] workqueue: single pass rebind_workers Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done Lai Jiangshan
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

busy_worker_rebind_fn() must not return until all idle workers are
rebound; currently this ordering is ensured by rebind_workers().

We now use synchronize_all_idles_rebound() to wait for all idle workers
to be rebound. This is an explicit way to wait and it eases the pain of
rebind_workers().

The sleeping synchronize_all_idles_rebound() must be called before
WORKER_REBIND is cleared.

This adds a small overhead to the unlikely path.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c875951..16bcd84 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1387,9 +1387,23 @@ static void busy_worker_rebind_fn(struct work_struct *work)
 {
 	struct worker *worker = container_of(work, struct worker, rebind_work);
 	struct global_cwq *gcwq = worker->pool->gcwq;
+	struct idle_rebind *idle_rebind;
+
+	if (worker_maybe_bind_and_lock(worker)) {
+		/* Is idle-rebinding still in progress? */
+		if ((idle_rebind = gcwq->idle_rebind) != NULL) {
+			/* get reference */
+			BUG_ON(idle_rebind->ref_cnt <= 0);
+			idle_rebind->ref_cnt++;
+			spin_unlock_irq(&gcwq->lock);
+
+			synchronize_all_idles_rebound(gcwq, idle_rebind);
+
+			spin_lock_irq(&gcwq->lock);
+		}
 
-	if (worker_maybe_bind_and_lock(worker))
 		worker_clr_flags(worker, WORKER_REBIND);
+	}
 
 	spin_unlock_irq(&gcwq->lock);
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 09/10] workqueue: single pass rebind_workers
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (7 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 08/10 V4] workqueue: explicit way to wait for idle workers to finish Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-01 16:28 ` [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done Lai Jiangshan
  9 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

busy_worker_rebind_fn() can't return until all idle workers are
rebound; the code of busy_worker_rebind_fn() itself now ensures this.

So we can reorder the code of rebind_workers() and make it a single
pass.

This makes the code much cleaner and more readable (a later patch
improves readability further).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   23 ++++++++---------------
 1 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 16bcd84..6d68571 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1458,16 +1458,11 @@ static void rebind_workers(struct global_cwq *gcwq)
 	init_completion(&idle_rebind.ref_done);
 	idle_rebind.ref_cnt = 1;
 	gcwq->idle_rebind = &idle_rebind;
-retry:
 	idle_rebind.idle_cnt = 1;
-	INIT_COMPLETION(idle_rebind.idle_done);
 
 	/* set REBIND and kick idle ones, we'll wait for these later */
 	for_each_worker_pool(pool, gcwq) {
 		list_for_each_entry(worker, &pool->idle_list, entry) {
-			if (!(worker->flags & WORKER_UNBOUND))
-				continue;
-
 			/* morph UNBOUND to REBIND */
 			worker->flags &= ~WORKER_UNBOUND;
 			worker->flags |= WORKER_REBIND;
@@ -1479,16 +1474,6 @@ retry:
 		}
 	}
 
-	if (--idle_rebind.idle_cnt) {
-		spin_unlock_irq(&gcwq->lock);
-		wait_for_completion(&idle_rebind.idle_done);
-		spin_lock_irq(&gcwq->lock);
-		/* busy ones might have become idle while waiting, retry */
-		goto retry;
-	}
-
-	complete_all(&idle_rebind.rebind_hold);
-
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
@@ -1515,6 +1500,14 @@ retry:
 	 * Before we leave rebind_workers(), we have to wait until no
 	 * worker has a ref to this idle_rebind.
 	 */
+	if (--idle_rebind.idle_cnt) {
+		spin_unlock_irq(&gcwq->lock);
+		wait_for_completion(&idle_rebind.idle_done);
+		spin_lock_irq(&gcwq->lock);
+	}
+
+	complete_all(&idle_rebind.rebind_hold);
+
 	if (--idle_rebind.ref_cnt) {
 		spin_unlock_irq(&gcwq->lock);
 		wait_for_completion(&idle_rebind.ref_done);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done
  2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
                   ` (8 preceding siblings ...)
  2012-09-01 16:28 ` [PATCH 09/10] workqueue: single pass rebind_workers Lai Jiangshan
@ 2012-09-01 16:28 ` Lai Jiangshan
  2012-09-05  1:15   ` Tejun Heo
  9 siblings, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-01 16:28 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Now that rebinding is a single pass, we can wait on idle_done instead
of rebind_hold. So we can remove rebind_hold and make the code simpler.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   35 +++++++++++------------------------
 1 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6d68571..6411616 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1309,9 +1309,6 @@ struct idle_rebind {
 	int		  idle_cnt;	/* # idle workers to be rebound */
 	struct completion idle_done;	/* all idle workers rebound */
 
-	/* idle workers wait for all idles to be rebound */
-	struct completion rebind_hold;
-
 	/*
 	 * notify the rebind_workers() that:
 	 * 1. All idle workers are rebound.
@@ -1340,7 +1337,7 @@ static void synchronize_all_idles_rebound(struct global_cwq *gcwq,
 {
 
 	/* wait for rebind_workers() to notify that all idles are rebound */
-	wait_for_completion(&idle_rebind->rebind_hold);
+	wait_for_completion(&idle_rebind->idle_done);
 
 	/* finished synchronizing, put reference */
 	spin_lock_irq(&gcwq->lock);
@@ -1371,7 +1368,7 @@ static void idle_worker_rebind(struct worker *worker)
 
 	/* this worker has been rebound */
 	if (!--idle_rebind->idle_cnt)
-		complete(&idle_rebind->idle_done);
+		complete_all(&idle_rebind->idle_done);
 	spin_unlock_irq(&gcwq->lock);
 
 	synchronize_all_idles_rebound(gcwq, idle_rebind);
@@ -1447,17 +1444,15 @@ static void rebind_workers(struct global_cwq *gcwq)
 		lockdep_assert_held(&pool->manager_mutex);
 
 	/*
-	 * Rebind idle workers.  Interlocked both ways in triple waits.
-	 * We wait for workers to rebind via @idle_rebind.idle_done.
-	 * Workers will wait for us via @idle_rebind.rebind_hold.
-	 * And then we wait for workers to leave rebind_hold
-	 * via @idle_rebind.ref_done.
+	 * Rebind idle workers. Idle workers wait for each other to rebind
+	 * via synchronize_all_idles_rebound(). We too wait for all idle
+	 * workers to rebind via synchronize_all_idles_rebound().
+	 * And then we wait for all workers to finish synchronizing and
+	 * release their ref of @idle_rebind via @idle_rebind.ref_done.
 	 */
 	init_completion(&idle_rebind.idle_done);
-	init_completion(&idle_rebind.rebind_hold);
 	init_completion(&idle_rebind.ref_done);
 	idle_rebind.ref_cnt = 1;
-	gcwq->idle_rebind = &idle_rebind;
 	idle_rebind.idle_cnt = 1;
 
 	/* set REBIND and kick idle ones, we'll wait for these later */
@@ -1497,23 +1492,15 @@ static void rebind_workers(struct global_cwq *gcwq)
 	}
 
 	/*
-	 * Before we leave rebind_workers(), we have to wait until no
-	 * worker has a ref to this idle_rebind.
+	 * Before we leave rebind_workers(), we have to wait until all
+	 * idles are rebound and no worker has a ref to this idle_rebind.
 	 */
 	if (--idle_rebind.idle_cnt) {
+		gcwq->idle_rebind = &idle_rebind;
 		spin_unlock_irq(&gcwq->lock);
-		wait_for_completion(&idle_rebind.idle_done);
-		spin_lock_irq(&gcwq->lock);
-	}
-
-	complete_all(&idle_rebind.rebind_hold);
-
-	if (--idle_rebind.ref_cnt) {
-		spin_unlock_irq(&gcwq->lock);
+		synchronize_all_idles_rebound(gcwq, &idle_rebind);
 		wait_for_completion(&idle_rebind.ref_done);
 		spin_lock_irq(&gcwq->lock);
-	} else {
-		gcwq->idle_rebind = NULL;
 	}
 }
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
  2012-09-01 16:28 ` [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() sees the right flags Lai Jiangshan
@ 2012-09-04 23:39   ` Tejun Heo
  2012-09-04 23:58     ` [PATCH -stable] " Tejun Heo
  2012-09-05  1:05     ` [PATCH] " Lai Jiangshan
  0 siblings, 2 replies; 23+ messages in thread
From: Tejun Heo @ 2012-09-04 23:39 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, stable

From d2ae38fc5e37b4bca3c4bec04a10dcf861a77b2b Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Sun, 2 Sep 2012 00:28:19 +0800

The compiler may compile the following code into TWO write/modify
instructions.

	worker->flags &= ~WORKER_UNBOUND;
	worker->flags |= WORKER_REBIND;

so the other CPU may temporarily see worker->flags which doesn't have
either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
prematurely.

Fix it by using single explicit assignment via ACCESS_ONCE().

Because idle workers have another WORKER_NOT_RUNNING flag, this bug
doesn't exist for them; however, update it to use the same pattern for
consistency.

tj: Applied the change to idle workers too and updated comments and
    patch description a bit.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
---
Applied to wq/for-3.6-fixes with some modifications.  Greg, this won't
apply cleanly to -stable.  Will post the backported version as a reply
to this message.

Thanks!

 kernel/workqueue.c |   17 +++++++++++------
 1 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..c462cd6 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1396,12 +1396,15 @@ retry:
 	/* set REBIND and kick idle ones, we'll wait for these later */
 	for_each_worker_pool(pool, gcwq) {
 		list_for_each_entry(worker, &pool->idle_list, entry) {
+			unsigned long worker_flags = worker->flags;
+
 			if (worker->flags & WORKER_REBIND)
 				continue;
 
-			/* morph UNBOUND to REBIND */
-			worker->flags &= ~WORKER_UNBOUND;
-			worker->flags |= WORKER_REBIND;
+			/* morph UNBOUND to REBIND atomically */
+			worker_flags &= ~WORKER_UNBOUND;
+			worker_flags |= WORKER_REBIND;
+			ACCESS_ONCE(worker->flags) = worker_flags;
 
 			idle_rebind.cnt++;
 			worker->idle_rebind = &idle_rebind;
@@ -1434,10 +1437,12 @@ retry:
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
+		unsigned long worker_flags = worker->flags;
 
-		/* morph UNBOUND to REBIND */
-		worker->flags &= ~WORKER_UNBOUND;
-		worker->flags |= WORKER_REBIND;
+		/* morph UNBOUND to REBIND atomically */
+		worker_flags &= ~WORKER_UNBOUND;
+		worker_flags |= WORKER_REBIND;
+		ACCESS_ONCE(worker->flags) = worker_flags;
 
 		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
 				     work_data_bits(rebind_work)))
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH -stable] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
  2012-09-04 23:39   ` [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic Tejun Heo
@ 2012-09-04 23:58     ` Tejun Heo
  2012-09-16 15:49       ` Ben Hutchings
  2012-09-05  1:05     ` [PATCH] " Lai Jiangshan
  1 sibling, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2012-09-04 23:58 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, stable

From: Lai Jiangshan <laijs@cn.fujitsu.com>

The compiler may compile the following code into TWO write/modify
instructions.

        worker->flags &= ~WORKER_UNBOUND;
        worker->flags |= WORKER_REBIND;

so the other CPU may temporarily see worker->flags which doesn't have
either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
prematurely.

Fix it by using single explicit assignment via ACCESS_ONCE().

Because idle workers have another WORKER_NOT_RUNNING flag, this bug
doesn't exist for them; however, update it to use the same pattern for
consistency.

tj: Applied the change to idle workers too and updated comments and
    patch description a bit.

stable: Idle worker rebinding doesn't apply for -stable and
        WORKER_UNBOUND used to be WORKER_ROGUE.  Updated accordingly.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
---
 kernel/workqueue.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9a3128d..d7eb05a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3441,14 +3441,17 @@ static int __cpuinit trustee_thread(void *__gcwq)
 
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
+		unsigned long worker_flags = worker->flags;
 
 		/*
 		 * Rebind_work may race with future cpu hotplug
 		 * operations.  Use a separate flag to mark that
-		 * rebinding is scheduled.
+		 * rebinding is scheduled.  The morphing should
+		 * be atomic.
 		 */
-		worker->flags |= WORKER_REBIND;
-		worker->flags &= ~WORKER_ROGUE;
+		worker_flags |= WORKER_REBIND;
+		worker_flags &= ~WORKER_ROGUE;
+		ACCESS_ONCE(worker->flags) = worker_flags;
 
 		/* queue rebind_work, wq doesn't matter, use the default one */
 		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()
  2012-09-01 16:28 ` [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers() Lai Jiangshan
@ 2012-09-05  0:54   ` Tejun Heo
  2012-09-05  1:28     ` Lai Jiangshan
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2012-09-05  0:54 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

How about something like the following?  This is more consistent with
the existing code and as the fixes need to go separately through
for-3.6-fixes, it's best to stay consistent regardless of the end
result after all the restructuring.  It's not tested yet.  If you
don't object, I'll split it into two patches, test and route them
through for-3.6-fixes w/ your Original-patch-by.

Thanks.
---
 kernel/workqueue.c |   51 ++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 38 insertions(+), 13 deletions(-)

Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -1326,6 +1326,15 @@ static void idle_worker_rebind(struct wo
 
 	/* we did our part, wait for rebind_workers() to finish up */
 	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
+
+	/*
+	 * rebind_workers() shouldn't finish until all workers passed the
+	 * above WORKER_REBIND wait.  Tell it when done.
+	 */
+	spin_lock_irq(&worker->pool->gcwq->lock);
+	if (!--worker->idle_rebind->cnt)
+		complete(&worker->idle_rebind->done);
+	spin_unlock_irq(&worker->pool->gcwq->lock);
 }
 
 /*
@@ -1422,19 +1431,7 @@ retry:
 		goto retry;
 	}
 
-	/*
-	 * All idle workers are rebound and waiting for %WORKER_REBIND to
-	 * be cleared inside idle_worker_rebind().  Clear and release.
-	 * Clearing %WORKER_REBIND from this foreign context is safe
-	 * because these workers are still guaranteed to be idle.
-	 */
-	for_each_worker_pool(pool, gcwq)
-		list_for_each_entry(worker, &pool->idle_list, entry)
-			worker->flags &= ~WORKER_REBIND;
-
-	wake_up_all(&gcwq->rebind_hold);
-
-	/* rebind busy workers */
+	/* all idle workers are rebound, rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
 		unsigned long worker_flags = worker->flags;
@@ -1454,6 +1451,34 @@ retry:
 			    worker->scheduled.next,
 			    work_color_to_flags(WORK_NO_COLOR));
 	}
+
+	/*
+	 * All idle workers are rebound and waiting for %WORKER_REBIND to
+	 * be cleared inside idle_worker_rebind().  Clear and release.
+	 * Clearing %WORKER_REBIND from this foreign context is safe
+	 * because these workers are still guaranteed to be idle.
+	 *
+	 * We need to make sure all idle workers passed the WORKER_REBIND
+	 * wait in idle_worker_rebind() before returning; otherwise,
+	 * workers can get stuck at the wait if the hotplug cycle repeats.
+	 */
+	idle_rebind.cnt = 1;
+	INIT_COMPLETION(idle_rebind.done);
+
+	for_each_worker_pool(pool, gcwq) {
+		list_for_each_entry(worker, &pool->idle_list, entry) {
+			worker->flags &= ~WORKER_REBIND;
+			idle_rebind.cnt++;
+		}
+	}
+
+	wake_up_all(&gcwq->rebind_hold);
+
+	if (--idle_rebind.cnt) {
+		spin_unlock_irq(&gcwq->lock);
+		wait_for_completion(&idle_rebind.done);
+		spin_lock_irq(&gcwq->lock);
+	}
 }
 
 static struct worker *alloc_worker(void)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
  2012-09-04 23:39   ` [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic Tejun Heo
  2012-09-04 23:58     ` [PATCH -stable] " Tejun Heo
@ 2012-09-05  1:05     ` Lai Jiangshan
  2012-09-05  1:17       ` Tejun Heo
  1 sibling, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-05  1:05 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, stable

On 09/05/2012 07:39 AM, Tejun Heo wrote:
> From d2ae38fc5e37b4bca3c4bec04a10dcf861a77b2b Mon Sep 17 00:00:00 2001
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date: Sun, 2 Sep 2012 00:28:19 +0800
> 
> The compiler may compile the following code into TWO write/modify
> instructions.
> 
> 	worker->flags &= ~WORKER_UNBOUND;
> 	worker->flags |= WORKER_REBIND;
> 
> so the other CPU may temporarily see worker->flags which doesn't have
> either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
> prematurely.
> 
> Fix it by using single explicit assignment via ACCESS_ONCE().
> 
> Because idle workers have another WORKER_NOT_RUNNING flag, this bug
> doesn't exist for them; however, update it to use the same pattern for
> consistency.
> 
> tj: Applied the change to idle workers too and updated comments and
>     patch description a bit.

Hi, tj

Thank you for accepting this one.

I'm waiting for your comments on the other patches.
I need to rebase the other patches (on top of wq/for-3.7 and this merged
one), but I think it is not good to send a new version of the patchset
without considering your comments, so I'm still waiting. Or should I
rebase them first?

Thanks
Lai

> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: stable@vger.kernel.org
> ---
> Applied to wq/for-3.6-fixes with some modifications.  Greg, this won't
> apply cleanly to -stable.  Will post the backported version as a reply
> to this message.
> 
> Thanks!
> 
>  kernel/workqueue.c |   17 +++++++++++------
>  1 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 692d976..c462cd6 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1396,12 +1396,15 @@ retry:
>  	/* set REBIND and kick idle ones, we'll wait for these later */
>  	for_each_worker_pool(pool, gcwq) {
>  		list_for_each_entry(worker, &pool->idle_list, entry) {
> +			unsigned long worker_flags = worker->flags;
> +
>  			if (worker->flags & WORKER_REBIND)
>  				continue;
>  
> -			/* morph UNBOUND to REBIND */
> -			worker->flags &= ~WORKER_UNBOUND;
> -			worker->flags |= WORKER_REBIND;
> +			/* morph UNBOUND to REBIND atomically */
> +			worker_flags &= ~WORKER_UNBOUND;
> +			worker_flags |= WORKER_REBIND;
> +			ACCESS_ONCE(worker->flags) = worker_flags;
>  
>  			idle_rebind.cnt++;
>  			worker->idle_rebind = &idle_rebind;
> @@ -1434,10 +1437,12 @@ retry:
>  	/* rebind busy workers */
>  	for_each_busy_worker(worker, i, pos, gcwq) {
>  		struct work_struct *rebind_work = &worker->rebind_work;
> +		unsigned long worker_flags = worker->flags;
>  
> -		/* morph UNBOUND to REBIND */
> -		worker->flags &= ~WORKER_UNBOUND;
> -		worker->flags |= WORKER_REBIND;
> +		/* morph UNBOUND to REBIND atomically */
> +		worker_flags &= ~WORKER_UNBOUND;
> +		worker_flags |= WORKER_REBIND;
> +		ACCESS_ONCE(worker->flags) = worker_flags;
>  
>  		if (test_and_set_bit(WORK_STRUCT_PENDING_BIT,
>  				     work_data_bits(rebind_work)))


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
  2012-09-01 16:28 ` [PATCH 04/10 V4] workqueue: add manage_workers_slowpath() Lai Jiangshan
@ 2012-09-05  1:12   ` Tejun Heo
  2012-09-06  1:55     ` Lai Jiangshan
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2012-09-05  1:12 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
> If the hotplug code has grabbed the manager_mutex while worker_thread
> tries to create a worker, manage_workers() will return false and
> worker_thread will go on to process work items. Now, on that CPU, all
> workers are processing work items and no idle worker is left ready for
> managing. This breaks the workqueue design and is a bug.
> 
> So when manage_workers() fails to grab the manager_mutex, it should
> enter normal work-processing context and then compete for the
> manager_mutex instead of returning false.
> 
> To do this safely, we add manage_workers_slowpath(): the worker goes
> into work-item-processing mode to do the managing job. The managing job
> is thus processed via a work item and is free to compete for the
> manager_mutex.

Ummm.... this seems overly complicated.  How about scheduling rebind
work to a worker and forcing it to break out of the work processing
loop?  I think it can be done fairly easily using POOL_MANAGE_WORKERS
- set it from the rebind function, break out of work processing loop
if it's set, replace need_to_manage_workers() with POOL_MANAGE_WORKERS
test (the function really isn't necessary) and always jump back to
recheck afterwards.  It might need a bit more mangling here and there
but that should be the essence of it.  I'll give a stab at it later
today.
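
A rough sketch of that idea, for illustration only (this is not the
actual patch):

	/* rebind function: ask the busy worker to break out and manage */
	pool->flags |= POOL_MANAGE_WORKERS;

	/* worker_thread(): break out of the work processing loop */
	do {
		/* ... process one work item ... */
	} while (keep_working(pool) &&
		 !(pool->flags & POOL_MANAGE_WORKERS));

	/* back at recheck, manage_workers() runs and loops again */
	if ((pool->flags & POOL_MANAGE_WORKERS) && manage_workers(worker))
		goto recheck;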

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done
  2012-09-01 16:28 ` [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done Lai Jiangshan
@ 2012-09-05  1:15   ` Tejun Heo
  2012-09-05  1:48     ` Lai Jiangshan
  0 siblings, 1 reply; 23+ messages in thread
From: Tejun Heo @ 2012-09-05  1:15 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Sep 02, 2012 at 12:28:28AM +0800, Lai Jiangshan wrote:
> Now that rebinding is a single pass, we can wait on idle_done instead
> of rebind_hold. So we can remove rebind_hold and make the code simpler.

As I wrote before, in general, I do like this approach; however, the
implementation in this series seems to make the code longer, which
kinda defeats its purpose of simplifying the implementation.  It could
be because the series is mixing fixes with restructuring.  Can you
please re-spin the restructuring part once fixes are settled?  Let's
see if that actually makes things simpler.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
  2012-09-05  1:05     ` [PATCH] " Lai Jiangshan
@ 2012-09-05  1:17       ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2012-09-05  1:17 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, stable

Hello, Lai.

On Wed, Sep 05, 2012 at 09:05:37AM +0800, Lai Jiangshan wrote:
> Thank you for accepting this one.

Thanks a lot for bearing with me.  I was at kernel summit / plumbers
last week so my reviews were shaky.  My apologies.

> I'm waiting for your comments on the other patches.
> I need to rebase the other patches (on top of wq/for-3.7 and this merged
> one), but I think it is not good to send a new version of the patchset
> without considering your comments, so I'm still waiting. Or should I
> rebase them first?

Let's get the fixes down first.  The fixes and restructuring need to
be routed separately, so we need to separate them into different patch
series.  Once the fix part is done, I'll pull the fixes into for-3.7
on which restructuring can happen.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()
  2012-09-05  0:54   ` Tejun Heo
@ 2012-09-05  1:28     ` Lai Jiangshan
  2012-09-05  1:33       ` Tejun Heo
  0 siblings, 1 reply; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-05  1:28 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 09/05/2012 08:54 AM, Tejun Heo wrote:
> How about something like the following?  This is more consistent with
> the existing code and as the fixes need to go separately through
> for-3.6-fixes, it's best to stay consistent regardless of the end
> result after all the restructuring.  It's not tested yet.  If you
> don't object, I'll split it into two patches, test and route them
> through for-3.6-fixes w/ your Original-patch-by.
> 

I see that this patch's idea is the same as mine but reuses
@idle_rebind.cnt and @idle_rebind.done.

I don't think it is more consistent to avoid adding a new field
and to reuse an old field for a different purpose.


Thanks
Lai


> Thanks.
> ---
>  kernel/workqueue.c |   51 ++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 38 insertions(+), 13 deletions(-)
> 
> Index: work/kernel/workqueue.c
> ===================================================================
> --- work.orig/kernel/workqueue.c
> +++ work/kernel/workqueue.c
> @@ -1326,6 +1326,15 @@ static void idle_worker_rebind(struct wo
>  
>  	/* we did our part, wait for rebind_workers() to finish up */
>  	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
> +
> +	/*
> +	 * rebind_workers() shouldn't finish until all workers passed the
> +	 * above WORKER_REBIND wait.  Tell it when done.
> +	 */
> +	spin_lock_irq(&worker->pool->gcwq->lock);
> +	if (!--worker->idle_rebind->cnt)
> +		complete(&worker->idle_rebind->done);
> +	spin_unlock_irq(&worker->pool->gcwq->lock);
>  }
>  
>  /*
> @@ -1422,19 +1431,7 @@ retry:
>  		goto retry;
>  	}
>  
> -	/*
> -	 * All idle workers are rebound and waiting for %WORKER_REBIND to
> -	 * be cleared inside idle_worker_rebind().  Clear and release.
> -	 * Clearing %WORKER_REBIND from this foreign context is safe
> -	 * because these workers are still guaranteed to be idle.
> -	 */
> -	for_each_worker_pool(pool, gcwq)
> -		list_for_each_entry(worker, &pool->idle_list, entry)
> -			worker->flags &= ~WORKER_REBIND;
> -
> -	wake_up_all(&gcwq->rebind_hold);

This doesn't need to move down.

> -
> -	/* rebind busy workers */
> +	/* all idle workers are rebound, rebind busy workers */
>  	for_each_busy_worker(worker, i, pos, gcwq) {
>  		struct work_struct *rebind_work = &worker->rebind_work;
>  		unsigned long worker_flags = worker->flags;
> @@ -1454,6 +1451,34 @@ retry:
>  			    worker->scheduled.next,
>  			    work_color_to_flags(WORK_NO_COLOR));
>  	}
> +
> +	/*
> +	 * All idle workers are rebound and waiting for %WORKER_REBIND to
> +	 * be cleared inside idle_worker_rebind().  Clear and release.
> +	 * Clearing %WORKER_REBIND from this foreign context is safe
> +	 * because these workers are still guaranteed to be idle.
> +	 *
> +	 * We need to make sure all idle workers passed WORKER_REBIND wait
> +	 * in idle_worker_rebind() before returning; otherwise, workers can
> +	 * get stuck at the wait if hotplug cycle repeats.
> +	 */
> +	idle_rebind.cnt = 1;
> +	INIT_COMPLETION(idle_rebind.done);
> +
> +	for_each_worker_pool(pool, gcwq) {
> +		list_for_each_entry(worker, &pool->idle_list, entry) {
> +			worker->flags &= ~WORKER_REBIND;
> +			idle_rebind.cnt++;
> +		}
> +	}
> +
> +	wake_up_all(&gcwq->rebind_hold);
> +
> +	if (--idle_rebind.cnt) {
> +		spin_unlock_irq(&gcwq->lock);
> +		wait_for_completion(&idle_rebind.done);
> +		spin_lock_irq(&gcwq->lock);
> +	}
>  }
>  
>  static struct worker *alloc_worker(void)
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread
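
The two-stage handshake debated above boils down to a general pattern: a
coordinator releases a group of waiters and then must not proceed (or let the
cycle repeat) until every waiter has actually passed its wait.  Below is a
minimal standalone sketch of that pattern in userspace C with pthreads -- it
is illustrative only, not kernel code; the variable names only loosely mirror
rebind_hold and idle_rebind.cnt/.done, and all locking details are simplified:

#include <pthread.h>
#include <stdio.h>

#define NR_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t hold = PTHREAD_COND_INITIALIZER;	/* ~rebind_hold */
static pthread_cond_t done = PTHREAD_COND_INITIALIZER;	/* ~idle_rebind.done */
static int rebind_pending = 1;	/* ~WORKER_REBIND set on every worker */
static int cnt;			/* ~idle_rebind.cnt */

static void *worker(void *arg)
{
	pthread_mutex_lock(&lock);
	/* stage 1: wait for the coordinator to clear the flag */
	while (rebind_pending)
		pthread_cond_wait(&hold, &lock);
	/* stage 2: confirm that we are past the wait */
	if (--cnt == 0)
		pthread_cond_signal(&done);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_WORKERS];
	int i;

	cnt = NR_WORKERS;
	for (i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);

	pthread_mutex_lock(&lock);
	rebind_pending = 0;		/* clear WORKER_REBIND ...  */
	pthread_cond_broadcast(&hold);	/* ... and wake_up_all()    */
	while (cnt)			/* wait_for_completion()    */
		pthread_cond_wait(&done, &lock);
	pthread_mutex_unlock(&lock);

	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);
	puts("all workers passed the wait");
	return 0;
}

Without the second stage, main() could return (and a repeated cycle could
begin) while some worker is still sleeping in pthread_cond_wait() -- the
analogue of workers getting stuck at the wait when the hotplug cycle repeats.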

* Re: [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()
  2012-09-05  1:28     ` Lai Jiangshan
@ 2012-09-05  1:33       ` Tejun Heo
  0 siblings, 0 replies; 23+ messages in thread
From: Tejun Heo @ 2012-09-05  1:33 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Wed, Sep 05, 2012 at 09:28:38AM +0800, Lai Jiangshan wrote:
> I see that this patch's idea is the same as mine, but it reuses
> @idle_rebind.cnt and @idle_rebind.done.
> 
> I don't think it is more consistent to avoid adding a new field by
> reusing an old field for a different purpose.

It's not necessarily the reuse itself which seems more consistent to
me but where the driving logic lives.  It's now better contained in
rebind_workers().

> > -	/*
> > -	 * All idle workers are rebound and waiting for %WORKER_REBIND to
> > -	 * be cleared inside idle_worker_rebind().  Clear and release.
> > -	 * Clearing %WORKER_REBIND from this foreign context is safe
> > -	 * because these workers are still guaranteed to be idle.
> > -	 */
> > -	for_each_worker_pool(pool, gcwq)
> > -		list_for_each_entry(worker, &pool->idle_list, entry)
> > -			worker->flags &= ~WORKER_REBIND;
> > -
> > -	wake_up_all(&gcwq->rebind_hold);
> 
> don't need to move down.

Yeap, but cleaner that way.  With the moving separated into another
patch, the addition of another handshake stage becomes easier to
follow.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done
  2012-09-05  1:15   ` Tejun Heo
@ 2012-09-05  1:48     ` Lai Jiangshan
  0 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-05  1:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 09/05/2012 09:15 AM, Tejun Heo wrote:
> On Sun, Sep 02, 2012 at 12:28:28AM +0800, Lai Jiangshan wrote:
>> Rebinding is now a single pass, so we can wait on idle_done instead of on
>> rebind_hold. That lets us remove rebind_hold and makes the code simpler.
> 
> As I wrote before, in general, I do like this approach; however, the
> implementation in this series seems to make the code longer, which
> kinda defeats its purpose of simplifying the implementation.  It could
> be because the series is mixing fixes with restructuring.  

> Can you
> please re-spin the restructuring part once fixes are settled?  Let's
> see if that actually makes things simpler.

OK for me.

Patch 5:  -4
Patch 6:  -9		Patches 5,6 simplify idle rebind (benefiting from Patch 2)
Patch 7:  +31
Patch 8:  +14		Patches 7,8 make busy workers wait for idle workers' rebinding
Patch 9:  -7
Patch 10: -13		Patches 9,10 single pass

total:  +11		single pass allows us to clean up more, for example narrowing
			the critical section of manage_mutex. I will add that in the next round.

Thanks,
Lai

> 
> Thanks!
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 04/10 V4] workqueue: add manage_workers_slowpath()
  2012-09-05  1:12   ` Tejun Heo
@ 2012-09-06  1:55     ` Lai Jiangshan
  0 siblings, 0 replies; 23+ messages in thread
From: Lai Jiangshan @ 2012-09-06  1:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 09/05/2012 09:12 AM, Tejun Heo wrote:
> Hello, Lai.
> 
> On Sun, Sep 02, 2012 at 12:28:22AM +0800, Lai Jiangshan wrote:
>> If the hotplug code has grabbed the manager_mutex and worker_thread tries
>> to create a worker, manage_worker() will return false and worker_thread
>> will go on to process work items. Now, on that CPU, all workers are
>> processing work items and no idle worker is left/ready for managing. This
>> breaks the concept of workqueue and is a bug.
>>
>> So when manage_worker() fails to grab the manager_mutex, it should try to
>> enter normal process context and then compete for the manager_mutex
>> instead of returning false.
>>
>> To do this safely, we add manage_workers_slowpath(): the worker goes into
>> work-item-processing mode to do the managing jobs. Thus the managing jobs
>> are processed via a work item and are free to compete for the
>> manager_mutex.
> 
> Ummm.... this seems overly complicated.  How about scheduling rebind
> work to a worker and forcing it to break out of the work processing
> loop?  I think it can be done fairly easily using POOL_MANAGE_WORKERS
> - set it from the rebind function, break out of work processing loop
> if it's set, replace need_to_manage_workers() with POOL_MANAGE_WORKERS
> test (the function really isn't necessary) and always jump back to
> recheck afterwards.  It might need a bit more mangling here and there
> but that should be the essence of it.  I'll give a stab at it later
> today.
> 

This approach is a little like my unsent approach 3 (I will explain in
another mail). It is the most complicated approach and changes more code if
it is implemented.

First, we should rebind/unbind separately per pool, because if we queue the
rebind work to the high-pri pool, we will break the normal pool, and vice
versa.

This forces us to move DISASSOCIATED into the pool flags, and it forces us
to add more code to the CPU notifier.

Second, we must either reuse POOL_MANAGE_WORKERS or add a new flag.

Third, we need to restructure rebind/unbind and change a lot in worker_thread.

My partial/unsent approach 3 has almost the same problem.
(The difference is that my approach 3 doesn't use a work item; it is checked
and called from the "recheck" label of worker_thread. It is called with the
WORKER_PREP bit set and uses mutex_trylock() to grab the lock, like
manage_workers().)

Here is how much code would change for only the unbind part of this approach:


 kernel/workqueue.c |  103 ++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 76 insertions(+), 27 deletions(-)


Thanks
Lai

^ permalink raw reply	[flat|nested] 23+ messages in thread
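
For readers following the alternative Tejun sketches above (break out of the
work-processing loop when a manage flag is set), the control flow reduces to
something like the following.  This is a userspace toy, not a real patch:
POOL_MANAGE_WORKERS is borrowed from the discussion, while the helpers and
the trigger condition are invented purely for illustration:

#include <stdbool.h>
#include <stdio.h>

#define POOL_MANAGE_WORKERS 0x1

static unsigned int pool_flags;
static int pending_items = 5;

/* stand-in for manage_workers(): run management, clear the flag */
static bool manage_workers(void)
{
	printf("managing workers\n");
	pool_flags &= ~POOL_MANAGE_WORKERS;
	return true;	/* management ran; caller should recheck state */
}

/* stand-in for process_one_work(); pretend the rebind function
 * sets the flag while we are busy processing */
static void process_one_work(void)
{
	printf("processing item %d\n", pending_items);
	if (--pending_items == 3)
		pool_flags |= POOL_MANAGE_WORKERS;
}

int main(void)
{
recheck:
	if ((pool_flags & POOL_MANAGE_WORKERS) && manage_workers())
		goto recheck;

	while (pending_items > 0) {
		process_one_work();
		/* break out of the processing loop when management is due */
		if (pool_flags & POOL_MANAGE_WORKERS)
			goto recheck;
	}
	return 0;
}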

* Re: [PATCH -stable] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
  2012-09-04 23:58     ` [PATCH -stable] " Tejun Heo
@ 2012-09-16 15:49       ` Ben Hutchings
  0 siblings, 0 replies; 23+ messages in thread
From: Ben Hutchings @ 2012-09-16 15:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

On Tue, 2012-09-04 at 16:58 -0700, Tejun Heo wrote:
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
> 
> The compiler may compile the following code into TWO write/modify
> instructions.
> 
>         worker->flags &= ~WORKER_UNBOUND;
>         worker->flags |= WORKER_REBIND;
> 
> so the other CPU may temporarily see worker->flags which doesn't have
> either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
> prematurely.
> 
> Fix it by using a single explicit assignment via ACCESS_ONCE().
> 
> Because idle workers have another WORKER_NOT_RUNNING flag, this bug
> doesn't exist for them; however, update it to use the same pattern for
> consistency.
> 
> tj: Applied the change to idle workers too and updated comments and
>     patch description a bit.
> 
> stable: Idle worker rebinding doesn't apply for -stable and
>         WORKER_UNBOUND used to be WORKER_ROGUE.  Updated accordingly.
[...]

Added to the queue for 3.2, thanks.

Ben.

-- 
Ben Hutchings
Experience is what causes a person to make new mistakes instead of old ones.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread
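
The torn-update hazard described in the commit message above, and the
single-store fix, can be shown in a few lines.  This is a simplified
userspace rendering -- ACCESS_ONCE() is approximated with a volatile cast,
and the flag values are made up for the example; only the store pattern
matches the actual fix:

#include <stdio.h>

#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define WORKER_UNBOUND	0x1
#define WORKER_REBIND	0x2

struct worker { unsigned long flags; };

int main(void)
{
	struct worker w = { .flags = WORKER_UNBOUND };
	unsigned long flags = w.flags;

	/* compute the new value in a local variable; two RMWs on the
	 * local are harmless because nobody else can see it */
	flags &= ~WORKER_UNBOUND;
	flags |= WORKER_REBIND;

	/* publish with one store, so another CPU can never observe a
	 * window where neither flag is set */
	ACCESS_ONCE(w.flags) = flags;

	printf("flags = %#lx\n", w.flags);
	return 0;
}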

end of thread, other threads:[~2012-09-16 15:50 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-01 16:28 [PATCH 00/10 V4] workqueue: fix and cleanup hotplug/rebind_workers() Lai Jiangshan
2012-09-01 16:28 ` [PATCH 01/10 V4] workqueue: ensure the wq_worker_sleeping() see the right flags Lai Jiangshan
2012-09-04 23:39   ` [PATCH] workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic Tejun Heo
2012-09-04 23:58     ` [PATCH -stable] " Tejun Heo
2012-09-16 15:49       ` Ben Hutchings
2012-09-05  1:05     ` [PATCH] " Lai Jiangshan
2012-09-05  1:17       ` Tejun Heo
2012-09-01 16:28 ` [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers() Lai Jiangshan
2012-09-05  0:54   ` Tejun Heo
2012-09-05  1:28     ` Lai Jiangshan
2012-09-05  1:33       ` Tejun Heo
2012-09-01 16:28 ` [PATCH 03/10 V4] workqueue: add POOL_MANAGING_WORKERS Lai Jiangshan
2012-09-01 16:28 ` [PATCH 04/10 V4] workqueue: add manage_workers_slowpath() Lai Jiangshan
2012-09-05  1:12   ` Tejun Heo
2012-09-06  1:55     ` Lai Jiangshan
2012-09-01 16:28 ` [PATCH 05/10 V4] workqueue: move rebind_hold to idle_rebind Lai Jiangshan
2012-09-01 16:28 ` [PATCH 06/10 V4] workqueue: simple clear WORKER_REBIND Lai Jiangshan
2012-09-01 16:28 ` [PATCH 07/10 V4] workqueue: move idle_rebind pointer to gcwq Lai Jiangshan
2012-09-01 16:28 ` [PATCH 08/10 V4] workqueue: explicit way to wait for idles workers to finish Lai Jiangshan
2012-09-01 16:28 ` [PATCH 09/10] workqueue: single pass rebind_workers Lai Jiangshan
2012-09-01 16:28 ` [PATCH 10/10 V4] workqueue: merge the role of rebind_hold to idle_done Lai Jiangshan
2012-09-05  1:15   ` Tejun Heo
2012-09-05  1:48     ` Lai Jiangshan
