linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization
@ 2014-04-27  4:08 Lai Jiangshan
  2014-04-27  4:08 ` [PATCH 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
                   ` (10 more replies)
  0 siblings, 11 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Patch1-4: async worker destruction

The old version (https://lkml.org/lkml/2014/2/17/418) of the async worker
destruction code had to dance with pool->lock and worker_idr, which made it
complicated.

In the new version, worker_idr no longer requires pool->lock, and the code in
put_unbound_pool() which waits for workers to exit now waits on a
wait_queue_head_t. These two changes make the new async worker destruction
much simpler.

Patch2 reduces the review burden: it is easier to review the whole patchset
once we know destroy_worker() is only ever asked to destroy idle workers.

==========
Patch5-10: pool-binding synchronization and simplified worker management

The code which binds a worker to the pool and unbinds it from the pool is
currently open-coded in create_worker()/destroy_worker().
This patchset moves the pool-binding code out and wraps it in helpers.

Patch3-4 move the pool-unbinding code out of destroy_worker() and make
manager_mutex protect only the pool-unbinding code rather than the whole
worker-destruction path.

Patch5-7 make manager_mutex protect only the pool-binding code rather than
the whole worker-creation path.

Patch8: rename manager_mutex to bind_mutex.
Patch9-10: move the pool-binding code out of create_worker() and reuse it
for the rescuers.

Lai Jiangshan (10):
  workqueue: use manager lock only to protect worker_idr
  workqueue: destroy_worker() should destroy idle workers only
  workqueue: async worker destruction
  workqueue: destroy worker directly in the idle timeout handler
  workqueue: separate iteration role from worker_idr
  workqueue: convert worker_idr to worker_ida
  workqueue: narrow the protection range of manager_mutex
  workqueue: rename manager_mutex to bind_mutex
  workqueue: separate pool-binding code out from create_worker()
  workqueue: use generic pool-bind/unbind routine for rescuers

 kernel/workqueue.c          |  396 +++++++++++++------------------------------
 kernel/workqueue_internal.h |    1 +
 2 files changed, 120 insertions(+), 277 deletions(-)

-- 
1.7.4.4


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 01/10] workqueue: use manager lock only to protect worker_idr
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
@ 2014-04-27  4:08 ` Lai Jiangshan
  2014-05-05 13:01   ` Tejun Heo
  2014-04-27  4:08 ` [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker_idr is highly bound to managers and is always/only accessed in manager
lock context. So we don't need pool->lock for it.
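
For comparison, the allocation site goes from the non-sleeping pattern to a
plain sleeping allocation, roughly (condensed from the diff below):

	/* before: pool->lock (a spinlock) is held, so the allocation itself
	 * must not sleep; GFP_KERNEL preallocation happens outside the lock
	 * via idr_preload() */
	idr_preload(GFP_KERNEL);
	spin_lock_irq(&pool->lock);
	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
	spin_unlock_irq(&pool->lock);
	idr_preload_end();

	/* after: only the sleepable manager_mutex is required, so a plain
	 * GFP_KERNEL allocation is enough */
	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);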

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   34 ++++++----------------------------
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c3f076f..48a22e4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,8 +124,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * MG: pool->manager_mutex and pool->lock protected.  Writes require both
- *     locks.  Reads can happen under either lock.
+ * M: pool->manager_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -164,7 +163,7 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* MG: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs and iteration */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -340,16 +339,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 			   lockdep_is_held(&wq->mutex),			\
 			   "sched RCU or wq->mutex should be held")
 
-#ifdef CONFIG_LOCKDEP
-#define assert_manager_or_pool_lock(pool)				\
-	WARN_ONCE(debug_locks &&					\
-		  !lockdep_is_held(&(pool)->manager_mutex) &&		\
-		  !lockdep_is_held(&(pool)->lock),			\
-		  "pool->manager_mutex or ->lock should be held")
-#else
-#define assert_manager_or_pool_lock(pool)	do { } while (0)
-#endif
-
 #define for_each_cpu_worker_pool(pool, cpu)				\
 	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
 	     (pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
@@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * @wi: integer used for iteration
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with either @pool->manager_mutex or ->lock held.
+ * This must be called with either @pool->manager_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, wi, pool)				\
 	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
-		if (({ assert_manager_or_pool_lock((pool)); false; })) { } \
+		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
 		else
 
 /**
@@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	 * ID is needed to determine kthread name.  Allocate ID first
 	 * without installing the pointer.
 	 */
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pool->lock);
-
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
-
-	spin_unlock_irq(&pool->lock);
-	idr_preload_end();
+	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
 		worker->flags |= WORKER_UNBOUND;
 
 	/* successful, commit the pointer to idr */
-	spin_lock_irq(&pool->lock);
 	idr_replace(&pool->worker_idr, worker, worker->id);
-	spin_unlock_irq(&pool->lock);
 
 	return worker;
 
 fail:
-	if (id >= 0) {
-		spin_lock_irq(&pool->lock);
+	if (id >= 0)
 		idr_remove(&pool->worker_idr, id);
-		spin_unlock_irq(&pool->lock);
-	}
 	kfree(worker);
 	return NULL;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
  2014-04-27  4:08 ` [PATCH 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
@ 2014-04-27  4:08 ` Lai Jiangshan
  2014-05-05 13:13   ` Tejun Heo
  2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

If destroy_worker() were ever to destroy a non-idle worker, the result
would be buggy and extremely hard to debug. We should force
destroy_worker() to destroy idle workers only.

The code already ensures that only idle workers are passed to
destroy_worker(), so this change introduces no functional change.
We just need to update the comments and the sanity-check code.

In the sanity-check code, we now refuse to destroy the worker
if !(worker->flags & WORKER_IDLE).

A worker that has entered idle has necessarily been started, so we
remove the check of "worker->flags & WORKER_STARTED". After this
removal, WORKER_STARTED is completely unneeded, so we remove
WORKER_STARTED too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 48a22e4..7690d0c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -73,7 +73,6 @@ enum {
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
 	/* worker flags */
-	WORKER_STARTED		= 1 << 0,	/* started */
 	WORKER_DIE		= 1 << 1,	/* die die die */
 	WORKER_IDLE		= 1 << 2,	/* is idle */
 	WORKER_PREP		= 1 << 3,	/* preparing to run works */
@@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool.  The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is bound to @pool.
+ * The new worker should be started and enter idle by start_worker().
  *
  * CONTEXT:
  * Might sleep.  Does GFP_KERNEL allocations.
@@ -1778,7 +1776,6 @@ fail:
  */
 static void start_worker(struct worker *worker)
 {
-	worker->flags |= WORKER_STARTED;
 	worker->pool->nr_workers++;
 	worker_enter_idle(worker);
 	wake_up_process(worker->task);
@@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
  * @worker: worker to be destroyed
  *
  * Destroy @worker and adjust @pool stats accordingly.
+ * The worker should be idle(WORKER_IDLE).
  *
  * CONTEXT:
  * spin_lock_irq(pool->lock) which is released and regrabbed.
@@ -1828,13 +1826,13 @@ static void destroy_worker(struct worker *worker)
 
 	/* sanity check frenzy */
 	if (WARN_ON(worker->current_work) ||
-	    WARN_ON(!list_empty(&worker->scheduled)))
+	    WARN_ON(!list_empty(&worker->scheduled)) ||
+	    WARN_ON(!(worker->flags & WORKER_IDLE)) ||
+	    WARN_ON(pool->nr_workers == 1 && !list_empty(&pool->worklist)))
 		return;
 
-	if (worker->flags & WORKER_STARTED)
-		pool->nr_workers--;
-	if (worker->flags & WORKER_IDLE)
-		pool->nr_idle--;
+	pool->nr_workers--;
+	pool->nr_idle--;
 
 	/*
 	 * Once WORKER_DIE is set, the kworker may destroy itself at any
@@ -3589,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	mutex_lock(&pool->manager_mutex);
 	spin_lock_irq(&pool->lock);
 
+	WARN_ON(pool->nr_workers != pool->nr_idle);
 	while ((worker = first_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 03/10] workqueue: async worker destruction
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
  2014-04-27  4:08 ` [PATCH 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
  2014-04-27  4:08 ` [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
@ 2014-04-27  4:08 ` Lai Jiangshan
  2014-05-05 14:31   ` Tejun Heo
                     ` (2 more replies)
  2014-04-27  4:08 ` [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker destruction includes these parts of code:
	adjust pool's stats
	remove the worker from idle list
	unbind the worker from the pool
	kthread_stop() to wait for the worker's task exit
	free the worker struct

There is nothing essential left to do after kthread_stop(), which means
destroy_worker() doesn't need to wait for the worker's task to exit.
So we can remove kthread_stop() and free the worker struct in the
worker's exit path instead.

But put_unbound_pool() still needs to sync with all the workers'
destruction before destroying the pool. Otherwise the exiting workers
may access the invalid pool.

So we also move the "unbind the worker" code to the exit path and let
put_unbound_pool() sync with it via a wait_queue_head_t, workers_unbound.

The "unbind the worker" code is wrapped in a new function, worker_unbind_pool().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   44 ++++++++++++++++++++++++++++----------------
 1 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7690d0c..7ca564d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -163,6 +163,7 @@ struct worker_pool {
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
 	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	wait_queue_head_t	workers_unbound;/* all workers pool-unbound */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -1688,6 +1689,23 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_unbind_pool() - unbind the worker from the pool
+ * @worker: worker which is bound to its pool
+ *
+ * Undo the pool-binding which had been done in create_worker()
+ */
+static void worker_unbind_pool(struct worker *worker)
+{
+	struct worker_pool *pool = worker->pool;
+
+	mutex_lock(&pool->manager_mutex);
+	idr_remove(&pool->worker_idr, worker->id);
+	if (idr_is_empty(&pool->worker_idr))
+		wake_up(&pool->workers_unbound);
+	mutex_unlock(&pool->manager_mutex);
+}
+
+/**
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
@@ -1834,24 +1852,9 @@ static void destroy_worker(struct worker *worker)
 	pool->nr_workers--;
 	pool->nr_idle--;
 
-	/*
-	 * Once WORKER_DIE is set, the kworker may destroy itself at any
-	 * point.  Pin to ensure the task stays until we're done with it.
-	 */
-	get_task_struct(worker->task);
-
 	list_del_init(&worker->entry);
 	worker->flags |= WORKER_DIE;
-
-	idr_remove(&pool->worker_idr, worker->id);
-
-	spin_unlock_irq(&pool->lock);
-
-	kthread_stop(worker->task);
-	put_task_struct(worker->task);
-	kfree(worker);
-
-	spin_lock_irq(&pool->lock);
+	wake_up_process(worker->task);
 }
 
 static void idle_worker_timeout(unsigned long __pool)
@@ -2290,6 +2293,8 @@ woke_up:
 		spin_unlock_irq(&pool->lock);
 		WARN_ON_ONCE(!list_empty(&worker->entry));
 		worker->task->flags &= ~PF_WQ_WORKER;
+		worker_unbind_pool(worker);
+		kfree(worker);
 		return 0;
 	}
 
@@ -3528,6 +3533,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
 	idr_init(&pool->worker_idr);
+	init_waitqueue_head(&pool->workers_unbound);
 
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
@@ -3593,6 +3599,12 @@ static void put_unbound_pool(struct worker_pool *pool)
 	WARN_ON(pool->nr_workers || pool->nr_idle);
 
 	spin_unlock_irq(&pool->lock);
+
+	wait_event_cmd(pool->workers_unbound,
+		       idr_is_empty(&pool->worker_idr),
+		       mutex_unlock(&pool->manager_mutex),
+		       mutex_lock(&pool->manager_mutex));
+
 	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (2 preceding siblings ...)
  2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
@ 2014-04-27  4:08 ` Lai Jiangshan
  2014-05-05 14:36   ` Tejun Heo
  2014-04-27  4:09 ` [PATCH 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Since kthread_stop() has been removed from destroy_worker(),
destroy_worker() no longer needs to sleep.
Since "unbind the worker" has been moved out of destroy_worker(),
destroy_worker() no longer requires manager_mutex.

So destroy_worker() can be called directly from the idle timeout
handler, which lets us remove POOL_MANAGE_WORKERS and
maybe_destroy_workers() and simplify manage_workers().

After POOL_MANAGE_WORKERS is removed, worker_thread() no longer
needs to test whether it should manage after processing works,
so we can remove that test branch.
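
Put together, the idle timeout handler after this patch reads roughly as
follows (condensed from the diff below):

	static void idle_worker_timeout(unsigned long __pool)
	{
		struct worker_pool *pool = (void *)__pool;

		spin_lock_irq(&pool->lock);
		while (too_many_workers(pool)) {
			struct worker *worker;
			unsigned long expires;

			/* the oldest idle worker is at the tail of idle_list */
			worker = list_entry(pool->idle_list.prev,
					    struct worker, entry);
			expires = worker->last_active + IDLE_WORKER_TIMEOUT;

			if (time_before(jiffies, expires)) {
				mod_timer(&pool->idle_timer, expires);
				break;
			}

			/* no longer sleeps, so pool->lock can stay held */
			destroy_worker(worker);
		}
		spin_unlock_irq(&pool->lock);
	}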

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   72 ++++-----------------------------------------------
 1 files changed, 6 insertions(+), 66 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7ca564d..f3a6086 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
 	 * manager_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
-	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
@@ -752,13 +751,6 @@ static bool need_to_create_worker(struct worker_pool *pool)
 	return need_more_worker(pool) && !may_start_working(pool);
 }
 
-/* Do I need to be the manager? */
-static bool need_to_manage_workers(struct worker_pool *pool)
-{
-	return need_to_create_worker(pool) ||
-		(pool->flags & POOL_MANAGE_WORKERS);
-}
-
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
@@ -1833,13 +1825,12 @@ static int create_and_start_worker(struct worker_pool *pool)
  * The worker should be idle(WORKER_IDLE).
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * spin_lock_irq(pool->lock).
  */
 static void destroy_worker(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
 
-	lockdep_assert_held(&pool->manager_mutex);
 	lockdep_assert_held(&pool->lock);
 
 	/* sanity check frenzy */
@@ -1862,8 +1853,7 @@ static void idle_worker_timeout(unsigned long __pool)
 	struct worker_pool *pool = (void *)__pool;
 
 	spin_lock_irq(&pool->lock);
-
-	if (too_many_workers(pool)) {
+	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
 
@@ -1871,15 +1861,13 @@ static void idle_worker_timeout(unsigned long __pool)
 		worker = list_entry(pool->idle_list.prev, struct worker, entry);
 		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
 
-		if (time_before(jiffies, expires))
+		if (time_before(jiffies, expires)) {
 			mod_timer(&pool->idle_timer, expires);
-		else {
-			/* it's been idle for too long, wake up manager */
-			pool->flags |= POOL_MANAGE_WORKERS;
-			wake_up_worker(pool);
+			break;
 		}
-	}
 
+		destroy_worker(worker);
+	}
 	spin_unlock_irq(&pool->lock);
 }
 
@@ -1996,44 +1984,6 @@ restart:
 }
 
 /**
- * maybe_destroy_worker - destroy workers which have been idle for a while
- * @pool: pool to destroy workers for
- *
- * Destroy @pool workers which have been idle for longer than
- * IDLE_WORKER_TIMEOUT.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times.  Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_destroy_workers(struct worker_pool *pool)
-{
-	bool ret = false;
-
-	while (too_many_workers(pool)) {
-		struct worker *worker;
-		unsigned long expires;
-
-		worker = list_entry(pool->idle_list.prev, struct worker, entry);
-		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
-
-		if (time_before(jiffies, expires)) {
-			mod_timer(&pool->idle_timer, expires);
-			break;
-		}
-
-		destroy_worker(worker);
-		ret = true;
-	}
-
-	return ret;
-}
-
-/**
  * manage_workers - manage worker pool
  * @worker: self
  *
@@ -2096,13 +2046,6 @@ static bool manage_workers(struct worker *worker)
 		ret = true;
 	}
 
-	pool->flags &= ~POOL_MANAGE_WORKERS;
-
-	/*
-	 * Destroy and then create so that may_start_working() is true
-	 * on return.
-	 */
-	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
 	mutex_unlock(&pool->manager_mutex);
@@ -2342,9 +2285,6 @@ recheck:
 
 	worker_set_flags(worker, WORKER_PREP, false);
 sleep:
-	if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
-		goto recheck;
-
 	/*
 	 * pool->lock is held and there's no work to process and no need to
 	 * manage, sleep.  Workers are woken up only while holding
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 05/10] workqueue: separate iteration role from worker_idr
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (3 preceding siblings ...)
  2014-04-27  4:08 ` [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-05-05 14:57   ` Tejun Heo
  2014-04-27  4:09 ` [PATCH 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, Peter Zijlstra,
	Jan Kara, Viresh Kumar

worker_idr has two duties: iteration and worker IDs. These two duties
are not necessarily tied together. We can separate them and use a list
for iteration.

After the separation, we will be able to add the rescuer workers to the
list for iteration in the future. worker_idr can't include rescuer
workers because rescuers don't allocate IDs from worker_idr.
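
The rough shape of the change (simplified; bind_entry/bind_list are the
fields introduced by this patch):

	/* each worker carries a list node; the pool keeps the list head */
	struct worker {
		...
		struct list_head	bind_entry;	/* protected by manager_mutex */
	};

	/* pool-binding/unbinding, both under pool->manager_mutex */
	list_add_tail(&worker->bind_entry, &pool->bind_list);
	list_del(&worker->bind_entry);

	/* iteration no longer goes through worker_idr */
	list_for_each_entry(worker, &pool->bind_list, bind_entry)
		worker->flags |= WORKER_UNBOUND;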

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   38 ++++++++++++--------------------------
 kernel/workqueue_internal.h |    1 +
 2 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f3a6086..0eae42c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,7 +161,8 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs */
+	struct list_head	bind_list;	/* M: pool-bound workers */
 	wait_queue_head_t	workers_unbound;/* all workers pool-unbound */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
@@ -361,22 +362,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 		else
 
 /**
- * for_each_pool_worker - iterate through all workers of a worker_pool
- * @worker: iteration cursor
- * @wi: integer used for iteration
- * @pool: worker_pool to iterate workers of
- *
- * This must be called with either @pool->manager_mutex.
- *
- * The if/else clause exists only for the lockdep assertion and can be
- * ignored.
- */
-#define for_each_pool_worker(worker, wi, pool)				\
-	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
-		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
-		else
-
-/**
  * for_each_pwq - iterate through all pool_workqueues of the specified workqueue
  * @pwq: iteration cursor
  * @wq: the target workqueue
@@ -1674,6 +1659,7 @@ static struct worker *alloc_worker(void)
 	if (worker) {
 		INIT_LIST_HEAD(&worker->entry);
 		INIT_LIST_HEAD(&worker->scheduled);
+		INIT_LIST_HEAD(&worker->bind_entry);
 		/* on creation a worker is in !idle && prep state */
 		worker->flags = WORKER_PREP;
 	}
@@ -1692,7 +1678,8 @@ static void worker_unbind_pool(struct worker *worker)
 
 	mutex_lock(&pool->manager_mutex);
 	idr_remove(&pool->worker_idr, worker->id);
-	if (idr_is_empty(&pool->worker_idr))
+	list_del(&worker->bind_entry);
+	if (list_empty(&pool->bind_list))
 		wake_up(&pool->workers_unbound);
 	mutex_unlock(&pool->manager_mutex);
 }
@@ -1765,6 +1752,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	/* successful, commit the pointer to idr */
 	idr_replace(&pool->worker_idr, worker, worker->id);
+	list_add_tail(&worker->bind_entry, &pool->bind_list);
 
 	return worker;
 
@@ -3473,6 +3461,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
 	idr_init(&pool->worker_idr);
+	INIT_LIST_HEAD(&pool->bind_list);
 	init_waitqueue_head(&pool->workers_unbound);
 
 	INIT_HLIST_NODE(&pool->hash_node);
@@ -3541,7 +3530,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	spin_unlock_irq(&pool->lock);
 
 	wait_event_cmd(pool->workers_unbound,
-		       idr_is_empty(&pool->worker_idr),
+		       list_empty(&pool->bind_list),
 		       mutex_unlock(&pool->manager_mutex),
 		       mutex_lock(&pool->manager_mutex));
 
@@ -4524,7 +4513,6 @@ static void wq_unbind_fn(struct work_struct *work)
 	int cpu = smp_processor_id();
 	struct worker_pool *pool;
 	struct worker *worker;
-	int wi;
 
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
@@ -4539,7 +4527,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		list_for_each_entry(worker, &pool->bind_list, bind_entry)
 			worker->flags |= WORKER_UNBOUND;
 
 		pool->flags |= POOL_DISASSOCIATED;
@@ -4585,7 +4573,6 @@ static void wq_unbind_fn(struct work_struct *work)
 static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4596,13 +4583,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	list_for_each_entry(worker, &pool->bind_list, bind_entry)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 
 	spin_lock_irq(&pool->lock);
 
-	for_each_pool_worker(worker, wi, pool) {
+	list_for_each_entry(worker, &pool->bind_list, bind_entry) {
 		unsigned int worker_flags = worker->flags;
 
 		/*
@@ -4654,7 +4641,6 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 {
 	static cpumask_t cpumask;
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4668,7 +4654,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 		return;
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
-	for_each_pool_worker(worker, wi, pool)
+	list_for_each_entry(worker, &pool->bind_list, bind_entry)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 }
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..7bff111 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,6 +37,7 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
+	struct list_head	bind_entry;	/* M: bound with the pool */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 06/10] workqueue: convert worker_idr to worker_ida
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (4 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-05-05 14:59   ` Tejun Heo
  2014-04-27  4:09 ` [PATCH 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

We no longer need to traverse workers via worker_idr; it is now used
only for allocating/freeing IDs, so we convert it to worker_ida.
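
After the conversion, the ID handling is reduced to plain ida calls,
roughly (condensed from the diff below):

	/* create_worker(): the ID is only needed for the kthread name */
	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
	if (id < 0)
		goto fail;

	/* released on the create_worker() failure path ... */
	ida_simple_remove(&pool->worker_ida, id);
	/* ... or in the worker's exit path once the task no longer needs it */
	ida_simple_remove(&pool->worker_ida, worker->id);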

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0eae42c..7a181ce 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,10 +161,11 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs */
 	struct list_head	bind_list;	/* M: pool-bound workers */
 	wait_queue_head_t	workers_unbound;/* all workers pool-unbound */
 
+	struct ida		worker_ida;	/* worker IDs for task name */
+
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
@@ -1677,7 +1678,6 @@ static void worker_unbind_pool(struct worker *worker)
 	struct worker_pool *pool = worker->pool;
 
 	mutex_lock(&pool->manager_mutex);
-	idr_remove(&pool->worker_idr, worker->id);
 	list_del(&worker->bind_entry);
 	if (list_empty(&pool->bind_list))
 		wake_up(&pool->workers_unbound);
@@ -1705,11 +1705,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	lockdep_assert_held(&pool->manager_mutex);
 
-	/*
-	 * ID is needed to determine kthread name.  Allocate ID first
-	 * without installing the pointer.
-	 */
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
+	/* ID is needed to determine kthread name. */
+	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1750,15 +1747,14 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
 
-	/* successful, commit the pointer to idr */
-	idr_replace(&pool->worker_idr, worker, worker->id);
+	/* successful, bind the worker to the pool */
 	list_add_tail(&worker->bind_entry, &pool->bind_list);
 
 	return worker;
 
 fail:
 	if (id >= 0)
-		idr_remove(&pool->worker_idr, id);
+		ida_simple_remove(&pool->worker_ida, id);
 	kfree(worker);
 	return NULL;
 }
@@ -2224,6 +2220,9 @@ woke_up:
 		spin_unlock_irq(&pool->lock);
 		WARN_ON_ONCE(!list_empty(&worker->entry));
 		worker->task->flags &= ~PF_WQ_WORKER;
+
+		set_task_comm(worker->task, "kworker_die");
+		ida_simple_remove(&pool->worker_ida, worker->id);
 		worker_unbind_pool(worker);
 		kfree(worker);
 		return 0;
@@ -3460,10 +3459,10 @@ static int init_worker_pool(struct worker_pool *pool)
 
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
-	idr_init(&pool->worker_idr);
 	INIT_LIST_HEAD(&pool->bind_list);
 	init_waitqueue_head(&pool->workers_unbound);
 
+	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
 
@@ -3478,7 +3477,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 {
 	struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);
 
-	idr_destroy(&pool->worker_idr);
+	ida_destroy(&pool->worker_ida);
 	free_workqueue_attrs(pool->attrs);
 	kfree(pool);
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 07/10] workqueue: narrow the protection range of manager_mutex
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (5 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-04-27  4:09 ` [PATCH 08/10] workqueue: rename manager_mutex to bind_mutex Lai Jiangshan
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

In create_worker(), pool->worker_ida is protected by the ida subsystem
itself via ida_simple_get()/ida_simple_remove(), so it doesn't need
manager_mutex.

The struct worker allocation and the kthread creation are not visible
to anyone else, so they don't need manager_mutex either.

The above operations happen before the pool-binding operation which
binds the worker to the pool. And after the pool-binding
(start_worker() ... etc.), the worker is already bound to the pool,
so cpu hotplug will handle the cpu-binding for the worker correctly.
Thus we don't need manager_mutex after the pool-binding either.

The conclusion is that only the pool-binding itself needs manager_mutex,
so we narrow the protection section of manager_mutex in create_worker().

Some comments about manager_mutex are removed, because we will add
bind_mutex and worker_bind_pool() later, which carry their own comments.

In put_unbound_pool(), we also narrow the protection of manager_mutex
because destroy_worker() no longer needs manager_mutex.
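
The resulting shape of create_worker() is roughly the following (error
handling and most details omitted; see the diff below):

	static struct worker *create_worker(struct worker_pool *pool)
	{
		/* no lock needed: the ida is self-synchronized */
		id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);

		/* no lock needed: the new worker and its kthread are not yet
		 * visible to anyone else */
		worker = alloc_worker();
		worker->task = kthread_create_on_node(...);

		/* only the pool-binding itself runs under manager_mutex */
		mutex_lock(&pool->manager_mutex);
		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
		if (pool->flags & POOL_DISASSOCIATED)
			worker->flags |= WORKER_UNBOUND;
		list_add_tail(&worker->bind_entry, &pool->bind_list);
		mutex_unlock(&pool->manager_mutex);

		return worker;
	}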

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   42 ++++++++----------------------------------
 1 files changed, 8 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7a181ce..ff6ec9a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1703,8 +1703,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	int id = -1;
 	char id_buf[16];
 
-	lockdep_assert_held(&pool->manager_mutex);
-
 	/* ID is needed to determine kthread name. */
 	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
@@ -1733,6 +1731,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
+	mutex_lock(&pool->manager_mutex);
+
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
@@ -1740,7 +1740,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The caller is responsible for ensuring %POOL_DISASSOCIATED
+	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1750,6 +1750,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, bind the worker to the pool */
 	list_add_tail(&worker->bind_entry, &pool->bind_list);
 
+	mutex_unlock(&pool->manager_mutex);
+
 	return worker;
 
 fail:
@@ -1787,8 +1789,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	mutex_lock(&pool->manager_mutex);
-
 	worker = create_worker(pool);
 	if (worker) {
 		spin_lock_irq(&pool->lock);
@@ -1796,8 +1796,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 		spin_unlock_irq(&pool->lock);
 	}
 
-	mutex_unlock(&pool->manager_mutex);
-
 	return worker ? 0 : -ENOMEM;
 }
 
@@ -1996,8 +1994,6 @@ static bool manage_workers(struct worker *worker)
 	bool ret = false;
 
 	/*
-	 * Managership is governed by two mutexes - manager_arb and
-	 * manager_mutex.  manager_arb handles arbitration of manager role.
 	 * Anyone who successfully grabs manager_arb wins the arbitration
 	 * and becomes the manager.  mutex_trylock() on pool->manager_arb
 	 * failure while holding pool->lock reliably indicates that someone
@@ -2006,33 +2002,12 @@ static bool manage_workers(struct worker *worker)
 	 * grabbing manager_arb is responsible for actually performing
 	 * manager duties.  If manager_arb is grabbed and released without
 	 * actual management, the pool may stall indefinitely.
-	 *
-	 * manager_mutex is used for exclusion of actual management
-	 * operations.  The holder of manager_mutex can be sure that none
-	 * of management operations, including creation and destruction of
-	 * workers, won't take place until the mutex is released.  Because
-	 * manager_mutex doesn't interfere with manager role arbitration,
-	 * it is guaranteed that the pool's management, while may be
-	 * delayed, won't be disturbed by someone else grabbing
-	 * manager_mutex.
 	 */
 	if (!mutex_trylock(&pool->manager_arb))
 		return ret;
 
-	/*
-	 * With manager arbitration won, manager_mutex would be free in
-	 * most cases.  trylock first without dropping @pool->lock.
-	 */
-	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
-		spin_unlock_irq(&pool->lock);
-		mutex_lock(&pool->manager_mutex);
-		spin_lock_irq(&pool->lock);
-		ret = true;
-	}
-
 	ret |= maybe_create_worker(pool);
 
-	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
 	return ret;
 }
@@ -3518,22 +3493,21 @@ static void put_unbound_pool(struct worker_pool *pool)
 	 * manager_mutex.
 	 */
 	mutex_lock(&pool->manager_arb);
-	mutex_lock(&pool->manager_mutex);
-	spin_lock_irq(&pool->lock);
 
+	spin_lock_irq(&pool->lock);
 	WARN_ON(pool->nr_workers != pool->nr_idle);
 	while ((worker = first_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-
 	spin_unlock_irq(&pool->lock);
 
+	mutex_lock(&pool->manager_mutex);
 	wait_event_cmd(pool->workers_unbound,
 		       list_empty(&pool->bind_list),
 		       mutex_unlock(&pool->manager_mutex),
 		       mutex_lock(&pool->manager_mutex));
-
 	mutex_unlock(&pool->manager_mutex);
+
 	mutex_unlock(&pool->manager_arb);
 
 	/* shut down the timers */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 08/10] workqueue: rename manager_mutex to bind_mutex
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (6 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-04-27  4:09 ` [PATCH 09/10] workqueue: separate pool-binding code out from create_worker() Lai Jiangshan
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Andrew Morton, Peter Zijlstra,
	Jan Kara, Viresh Kumar

manager_mutex is now only used to protect the binding between the pool
and the workers. It protects the bind_list and the operations based on
this list, such as:
	cpu-binding for the workers in the bind_list
	concurrency management for the workers in the bind_list

So we can simply rename manager_mutex to bind_mutex without any
functional change.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   47 +++++++++++++++++++++----------------------
 kernel/workqueue_internal.h |    2 +-
 2 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ff6ec9a..9e0e606 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -65,7 +65,7 @@ enum {
 	 * be executing on any CPU.  The pool behaves as an unbound one.
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
-	 * manager_mutex to avoid changing binding state while
+	 * bind_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
@@ -122,7 +122,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * M: pool->manager_mutex protected.
+ * B: pool->bind_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -160,8 +160,8 @@ struct worker_pool {
 
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
-	struct mutex		manager_mutex;	/* manager exclusion */
-	struct list_head	bind_list;	/* M: pool-bound workers */
+	struct mutex		bind_mutex;	/* pool-bind/unbind exclusion */
+	struct list_head	bind_list;	/* B: pool-bound workers */
 	wait_queue_head_t	workers_unbound;/* all workers pool-unbound */
 
 	struct ida		worker_ida;	/* worker IDs for task name */
@@ -1677,11 +1677,11 @@ static void worker_unbind_pool(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->bind_mutex);
 	list_del(&worker->bind_entry);
 	if (list_empty(&pool->bind_list))
 		wake_up(&pool->workers_unbound);
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->bind_mutex);
 }
 
 /**
@@ -1731,7 +1731,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->bind_mutex);
 
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1740,7 +1740,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
+	 * The pool->bind_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1750,7 +1750,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, bind the worker to the pool */
 	list_add_tail(&worker->bind_entry, &pool->bind_list);
 
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->bind_mutex);
 
 	return worker;
 
@@ -3433,7 +3433,7 @@ static int init_worker_pool(struct worker_pool *pool)
 		    (unsigned long)pool);
 
 	mutex_init(&pool->manager_arb);
-	mutex_init(&pool->manager_mutex);
+	mutex_init(&pool->bind_mutex);
 	INIT_LIST_HEAD(&pool->bind_list);
 	init_waitqueue_head(&pool->workers_unbound);
 
@@ -3489,8 +3489,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 
 	/*
 	 * Become the manager and destroy all workers.  Grabbing
-	 * manager_arb prevents @pool's workers from blocking on
-	 * manager_mutex.
+	 * manager_arb ensures @pool's manager finished.
 	 */
 	mutex_lock(&pool->manager_arb);
 
@@ -3501,12 +3500,12 @@ static void put_unbound_pool(struct worker_pool *pool)
 	WARN_ON(pool->nr_workers || pool->nr_idle);
 	spin_unlock_irq(&pool->lock);
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->bind_mutex);
 	wait_event_cmd(pool->workers_unbound,
 		       list_empty(&pool->bind_list),
-		       mutex_unlock(&pool->manager_mutex),
-		       mutex_lock(&pool->manager_mutex));
-	mutex_unlock(&pool->manager_mutex);
+		       mutex_unlock(&pool->bind_mutex),
+		       mutex_lock(&pool->bind_mutex));
+	mutex_unlock(&pool->bind_mutex);
 
 	mutex_unlock(&pool->manager_arb);
 
@@ -4490,12 +4489,12 @@ static void wq_unbind_fn(struct work_struct *work)
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
 
-		mutex_lock(&pool->manager_mutex);
+		mutex_lock(&pool->bind_mutex);
 		spin_lock_irq(&pool->lock);
 
 		/*
-		 * We've blocked all manager operations.  Make all workers
-		 * unbound and set DISASSOCIATED.  Before this, all workers
+		 * We've blocked all pool-[un]bind operations. Make all workers
+		 * cpu-unbound and set DISASSOCIATED.  Before this, all workers
 		 * except for the ones which are still executing works from
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
@@ -4506,7 +4505,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		pool->flags |= POOL_DISASSOCIATED;
 
 		spin_unlock_irq(&pool->lock);
-		mutex_unlock(&pool->manager_mutex);
+		mutex_unlock(&pool->bind_mutex);
 
 		/*
 		 * Call schedule() so that we cross rq->lock and thus can
@@ -4547,7 +4546,7 @@ static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->bind_mutex);
 
 	/*
 	 * Restore CPU affinity of all workers.  As all idle workers should
@@ -4615,7 +4614,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	static cpumask_t cpumask;
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->bind_mutex);
 
 	/* is @cpu allowed for @pool? */
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
@@ -4660,7 +4659,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 		mutex_lock(&wq_pool_mutex);
 
 		for_each_pool(pool, pi) {
-			mutex_lock(&pool->manager_mutex);
+			mutex_lock(&pool->bind_mutex);
 
 			if (pool->cpu == cpu) {
 				spin_lock_irq(&pool->lock);
@@ -4672,7 +4671,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 				restore_unbound_workers_cpumask(pool, cpu);
 			}
 
-			mutex_unlock(&pool->manager_mutex);
+			mutex_unlock(&pool->bind_mutex);
 		}
 
 		/* update NUMA affinity of unbound workqueues */
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7bff111..50f2a3a 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,7 +37,7 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
-	struct list_head	bind_entry;	/* M: bound with the pool */
+	struct list_head	bind_entry;	/* B: bound with the pool */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 09/10] workqueue: separate pool-binding code out from create_worker()
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (7 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 08/10] workqueue: rename manager_mutex to bind_mutex Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-04-27  4:09 ` [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers Lai Jiangshan
  2014-05-05 15:05 ` [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Tejun Heo
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

The pool-binding code is currently open-coded in create_worker().
Separating it out into its own function makes the code clearer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   53 ++++++++++++++++++++++++++++++++-------------------
 1 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9e0e606..0c575b1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
 	 * bind_mutex to avoid changing binding state while
-	 * create_worker() is in progress.
+	 * worker_bind_pool() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
@@ -1668,6 +1668,37 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_bind_pool() - bind the worker to the pool
+ * @worker: worker to be bound
+ * @pool: the target pool
+ *
+ * bind the worker to the pool, thus the concurrency and cpu-binding of
+ * the worker are kept coordination with the pool across cpu-[un]hotplug.
+ */
+static void worker_bind_pool(struct worker *worker, struct worker_pool *pool)
+{
+	mutex_lock(&pool->bind_mutex);
+
+	/*
+	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 */
+	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+
+	/*
+	 * The pool->bind_mutex ensures %POOL_DISASSOCIATED remains
+	 * stable across this function.  See the comments above the
+	 * flag definition for details.
+	 */
+	if (pool->flags & POOL_DISASSOCIATED)
+		worker->flags |= WORKER_UNBOUND;
+
+	list_add_tail(&worker->bind_entry, &pool->bind_list);
+
+	mutex_unlock(&pool->bind_mutex);
+}
+
+/**
  * worker_unbind_pool() - unbind the worker from the pool
  * @worker: worker which is bound to its pool
  *
@@ -1731,26 +1762,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->bind_mutex);
-
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
-	/*
-	 * The pool->bind_mutex ensures %POOL_DISASSOCIATED
-	 * remains stable across this function.  See the comments above the
-	 * flag definition for details.
-	 */
-	if (pool->flags & POOL_DISASSOCIATED)
-		worker->flags |= WORKER_UNBOUND;
-
 	/* successful, bind the worker to the pool */
-	list_add_tail(&worker->bind_entry, &pool->bind_list);
-
-	mutex_unlock(&pool->bind_mutex);
+	worker_bind_pool(worker, pool);
 
 	return worker;
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (8 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 09/10] workqueue: separate pool-binding code out from create_worker() Lai Jiangshan
@ 2014-04-27  4:09 ` Lai Jiangshan
  2014-05-05 14:54   ` Tejun Heo
  2014-05-05 15:05 ` [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Tejun Heo
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-04-27  4:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

There are several problems with the code by which rescuers bind themselves
to the pool's cpumask:
  1) It binds to the cpumask in a way different from the normal workers,
     so we can't maintain normal and rescuer workers under the same framework.
  2) The cpu-binding code for rescuers is complicated.
  3) If one or more cpu hotplug events happen while the rescuer processes the
     scheduled works, the rescuer may not end up correctly bound to the
     cpumask of the pool. This behavior is allowed, but it is not good. It
     would be better if the cpumask of the rescuer were always kept in sync
     with the pool across any cpu hotplug.

Using the generic pool-bind/unbind routines solves the above problems
and results in much simpler code, as sketched below.
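
With the generic routines, each mayday handling cycle in rescuer_thread()
reduces to roughly the following (simplified from the diff below):

	/* rescuer_thread(), per mayday request */
	worker_bind_pool(rescuer, pool);	/* same path as normal workers */

	spin_lock_irq(&pool->lock);
	rescuer->pool = pool;
	/* ... move matching works onto rescuer->scheduled ... */
	process_scheduled_works(rescuer);
	spin_unlock_irq(&pool->lock);

	worker_unbind_pool(rescuer);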

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   74 +++++----------------------------------------------
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0c575b1..4f90a58 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1588,70 +1588,6 @@ static void worker_leave_idle(struct worker *worker)
 	list_del_init(&worker->entry);
 }
 
-/**
- * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
- * @pool: target worker_pool
- *
- * Bind %current to the cpu of @pool if it is associated and lock @pool.
- *
- * Works which are scheduled while the cpu is online must at least be
- * scheduled to a worker which is bound to the cpu so that if they are
- * flushed from cpu callbacks while cpu is going down, they are
- * guaranteed to execute on the cpu.
- *
- * This function is to be used by unbound workers and rescuers to bind
- * themselves to the target cpu and may race with cpu going down or
- * coming online.  kthread_bind() can't be used because it may put the
- * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
- * verbatim as it's best effort and blocking and pool may be
- * [dis]associated in the meantime.
- *
- * This function tries set_cpus_allowed() and locks pool and verifies the
- * binding against %POOL_DISASSOCIATED which is set during
- * %CPU_DOWN_PREPARE and cleared during %CPU_ONLINE, so if the worker
- * enters idle state or fetches works without dropping lock, it can
- * guarantee the scheduling requirement described in the first paragraph.
- *
- * CONTEXT:
- * Might sleep.  Called without any lock but returns with pool->lock
- * held.
- *
- * Return:
- * %true if the associated pool is online (@worker is successfully
- * bound), %false if offline.
- */
-static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
-__acquires(&pool->lock)
-{
-	while (true) {
-		/*
-		 * The following call may fail, succeed or succeed
-		 * without actually migrating the task to the cpu if
-		 * it races with cpu hotunplug operation.  Verify
-		 * against POOL_DISASSOCIATED.
-		 */
-		if (!(pool->flags & POOL_DISASSOCIATED))
-			set_cpus_allowed_ptr(current, pool->attrs->cpumask);
-
-		spin_lock_irq(&pool->lock);
-		if (pool->flags & POOL_DISASSOCIATED)
-			return false;
-		if (task_cpu(current) == pool->cpu &&
-		    cpumask_equal(&current->cpus_allowed, pool->attrs->cpumask))
-			return true;
-		spin_unlock_irq(&pool->lock);
-
-		/*
-		 * We've raced with CPU hot[un]plug.  Give it a breather
-		 * and retry migration.  cond_resched() is required here;
-		 * otherwise, we might deadlock against cpu_stop trying to
-		 * bring down the CPU on non-preemptive kernel.
-		 */
-		cpu_relax();
-		cond_resched();
-	}
-}
-
 static struct worker *alloc_worker(void)
 {
 	struct worker *worker;
@@ -2336,8 +2272,9 @@ repeat:
 
 		spin_unlock_irq(&wq_mayday_lock);
 
-		/* migrate to the target cpu if possible */
-		worker_maybe_bind_and_lock(pool);
+		worker_bind_pool(rescuer, pool);
+
+		spin_lock_irq(&pool->lock);
 		rescuer->pool = pool;
 
 		/*
@@ -2350,6 +2287,11 @@ repeat:
 				move_linked_works(work, scheduled, &n);
 
 		process_scheduled_works(rescuer);
+		spin_unlock_irq(&pool->lock);
+
+		worker_unbind_pool(rescuer);
+
+		spin_lock_irq(&pool->lock);
 
 		/*
 		 * Put the reference grabbed by send_mayday().  @pool won't
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] workqueue: use manager lock only to protect worker_idr
  2014-04-27  4:08 ` [PATCH 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
@ 2014-05-05 13:01   ` Tejun Heo
  2014-05-06 15:43     ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 13:01 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:08:56PM +0800, Lai Jiangshan wrote:
> worker_idr is highly bound to managers and is always/only accessed in manager
> lock context. So we don't need pool->lock for it.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
...
> @@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
>   * @wi: integer used for iteration
>   * @pool: worker_pool to iterate workers of
>   *
> - * This must be called with either @pool->manager_mutex or ->lock held.
> + * This must be called with either @pool->manager_mutex.

Please drop "either" from the sentence.

> @@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>  	 * ID is needed to determine kthread name.  Allocate ID first
>  	 * without installing the pointer.
>  	 */
> -	idr_preload(GFP_KERNEL);
> -	spin_lock_irq(&pool->lock);
> -
> -	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
> -
> -	spin_unlock_irq(&pool->lock);
> -	idr_preload_end();
> +	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
>  	if (id < 0)
>  		goto fail;
>  
> @@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
>  		worker->flags |= WORKER_UNBOUND;
>  
>  	/* successful, commit the pointer to idr */
> -	spin_lock_irq(&pool->lock);
>  	idr_replace(&pool->worker_idr, worker, worker->id);
> -	spin_unlock_irq(&pool->lock);

W/ locking updated, we can simply assign the pointer on idr_alloc()
instead of doing split alloc/replace.
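
IOW, with manager_mutex as the only protection, the allocation could
presumably be collapsed to something like the following (untested sketch):

	worker = alloc_worker();
	if (!worker)
		goto fail;

	/* nobody can iterate worker_idr without manager_mutex, which
	 * create_worker() already holds, so installing the pointer right
	 * away is fine */
	id = idr_alloc(&pool->worker_idr, worker, 0, 0, GFP_KERNEL);
	if (id < 0)
		goto fail;
	worker->id = id;
	/* the kthread is then created using the id as before and the
	 * final idr_replace() goes away */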

I'm a bit on the fence about this patch.  It does simplify the code a
bit but then we lose the ability to iterate workers without grabbing
the manager_mutex, which would come handy when, for example,
implementing better workqueue info reporting during oops which we'll
prolly need to add sooner or later.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only
  2014-04-27  4:08 ` [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
@ 2014-05-05 13:13   ` Tejun Heo
  2014-05-06 15:58     ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 13:13 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Sun, Apr 27, 2014 at 12:08:57PM +0800, Lai Jiangshan wrote:
> @@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
>   * create_worker - create a new workqueue worker
>   * @pool: pool the new worker will belong to
>   *
> - * Create a new worker which is bound to @pool.  The returned worker
> - * can be started by calling start_worker() or destroyed using
> - * destroy_worker().
> + * Create a new worker which is bound to @pool.
> + * The new worker should be started and enter idle by start_worker().

Hmm... we used to have a path where a worker is created and then
destroyed without being started.  IIRC, it was on the CPU online
failure path.  A worker was created for the CPU coming online and if
the online operation failed the created worker was shut down without
being started.  Right, we no longer shut down per-cpu pools on offline
so this doesn't matter anymore.  Might be worthwhile to note in the patch
description tho.

> @@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
>   * @worker: worker to be destroyed
>   *
>   * Destroy @worker and adjust @pool stats accordingly.
> + * The worker should be idle(WORKER_IDLE).

Just write "The worker should be idle."  Also, in general, can you
please put a space before opening parenthesis?

>   *
>   * CONTEXT:
>   * spin_lock_irq(pool->lock) which is released and regrabbed.
> @@ -1828,13 +1826,13 @@ static void destroy_worker(struct worker *worker)
>  
>  	/* sanity check frenzy */
>  	if (WARN_ON(worker->current_work) ||
> -	    WARN_ON(!list_empty(&worker->scheduled)))
> +	    WARN_ON(!list_empty(&worker->scheduled)) ||
> +	    WARN_ON(!(worker->flags & WORKER_IDLE)) ||
> +	    WARN_ON(pool->nr_workers == 1 && !list_empty(&pool->worklist)))

I'm not sure about the pool condition check.  It's kinda overreaching
to check for it from worker destruction.

> @@ -3589,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
>  	mutex_lock(&pool->manager_mutex);
>  	spin_lock_irq(&pool->lock);
>  
> +	WARN_ON(pool->nr_workers != pool->nr_idle);

Does this condition detect anything new from the condition below?

>  	while ((worker = first_worker(pool)))
>  		destroy_worker(worker);
>  	WARN_ON(pool->nr_workers || pool->nr_idle);

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
@ 2014-05-05 14:31   ` Tejun Heo
  2014-05-05 15:02   ` Tejun Heo
  2014-05-07  7:30   ` Lai Jiangshan
  2 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 14:31 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:08:58PM +0800, Lai Jiangshan wrote:
> worker destruction includes these parts of code:
> 	adjust pool's stats
> 	remove the worker from idle list
> 	unbind the worker from the pool
> 	kthread_stop() to wait for the worker's task exit
> 	free the worker struct
> 
> We can find out that there is no essential thing to do after
> kthread_stop(). Which means destroy_worker() doesn't need
> to wait for the worker's task exit. So we can remove kthread_stop()
> and free the worker struct in the worker exiting path.
> 
> But put_unbound_pool() still needs to sync the all the workers'
> destruction before to destroy the pool. Otherwise the workers
> may access to the invalid pool when they are exiting.
> 
> So we also move the code of "unbind the worker" to the exiting
> path and let put_unbound_pool() to sync with this code via
> a wait_queue_head_t workers_unbound.
> 
> The code of "unbind the worker" is wrapped in a function "worker_unbind_pool()"
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  kernel/workqueue.c |   44 ++++++++++++++++++++++++++++----------------
>  1 files changed, 28 insertions(+), 16 deletions(-)

Is this any simpler?  Does this enable larger simplification down the
road?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-04-27  4:08 ` [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
@ 2014-05-05 14:36   ` Tejun Heo
  2014-05-07  7:10     ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 14:36 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:08:59PM +0800, Lai Jiangshan wrote:
> Since kthread_stop() is removed from destroy_worker(),
> destroy_worker() doesn't need to sleep.
> Since "unbind the worker" is moved out from destroy_worker(),
> destroy_worker() doesn't require manager_mutex.
> 
> So destroy_worker() can be directly called in the idle timeout
> handler, it helps us remove POOL_MANAGE_WORKERS and
> maybe_destroy_worker() and simplify the manage_workers()
> 
> After POOL_MANAGE_WORKERS is removed, worker_thread() doesn't
> need to test whether it needs to manage after processed works.
> So we can remove this test branch.

Ah, so, you can take out workers directly from idle timer.  Yeah,
that's nice.  I'm not a big fan of the wait_queue usage in the
previous patch tho.  Can we use a completion instead?
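
For reference, the timer-driven destruction could look roughly like the
sketch below once destroy_worker() only needs pool->lock (names follow the
existing workqueue code; this is a sketch, not the exact patch):

static void idle_worker_timeout(unsigned long __pool)
{
	struct worker_pool *pool = (void *)__pool;

	spin_lock_irq(&pool->lock);

	while (too_many_workers(pool)) {
		struct worker *worker;
		unsigned long expires;

		/* idle_list is kept in LIFO order, check the last one */
		worker = list_entry(pool->idle_list.prev, struct worker, entry);
		expires = worker->last_active + IDLE_WORKER_TIMEOUT;

		if (time_before(jiffies, expires)) {
			mod_timer(&pool->idle_timer, expires);
			break;
		}

		destroy_worker(worker);
	}

	spin_unlock_irq(&pool->lock);
}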

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers
  2014-04-27  4:09 ` [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers Lai Jiangshan
@ 2014-05-05 14:54   ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 14:54 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:09:05PM +0800, Lai Jiangshan wrote:
> There are several problems with the code that rescuers bind itself to the pool'
> cpumask
>   1) It uses a way different from the normal workers to bind to the cpumask
>      So we can't maintain the normal/rescuer workers under the same framework.
>   2) The the code of cpu-binding for rescuer is complicated
>   3) If one or more cpuhotplugs happen while the rescuer processes the
>      scheduled works, the rescuer may not be correctly bound to the cpumask of
>      the pool. This is allowed behavior, but is not good. It will be better
>      if the cpumask of the rescuer is always kept coordination with the pool
>      across any cpuhotplugs.
> 
> Using generic pool-bind/unbind routine will solve the above problems,
> and result much more simple code.

Ah, nice.  Yeah, I definitely like it.  We can solve
iteration-from-oops problem using RCU if it becomes necessary.  I
don't quite like the name of bind_entry tho.  I'll comment on it on
that patch.
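
If that ever becomes necessary, the reader side could be sketched roughly as
follows (bind_list/bind_entry follow the names used in this series,
report_worker() is a hypothetical reporting hook; the detach path would then
need list_del_rcu() plus an RCU grace period before kfree(worker)):

	rcu_read_lock();
	list_for_each_entry_rcu(worker, &pool->bind_list, bind_entry)
		report_worker(worker);
	rcu_read_unlock();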

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/10] workqueue: separate iteration role from worker_idr
  2014-04-27  4:09 ` [PATCH 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
@ 2014-05-05 14:57   ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 14:57 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Jan Kara, Viresh Kumar

On Sun, Apr 27, 2014 at 12:09:00PM +0800, Lai Jiangshan wrote:
> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 7e2204d..7bff111 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -37,6 +37,7 @@ struct worker {
>  	struct task_struct	*task;		/* I: worker task */
>  	struct worker_pool	*pool;		/* I: the associated pool */
>  						/* L: for rescuers */
> +	struct list_head	bind_entry;	/* M: bound with the pool */

This is just a wrong name.  Let's just call it "workers_node" and the
head "workers".

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] workqueue: convert worker_idr to worker_ida
  2014-04-27  4:09 ` [PATCH 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
@ 2014-05-05 14:59   ` Tejun Heo
  2014-05-06 16:33     ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 14:59 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:09:01PM +0800, Lai Jiangshan wrote:
> @@ -2224,6 +2220,9 @@ woke_up:
>  		spin_unlock_irq(&pool->lock);
>  		WARN_ON_ONCE(!list_empty(&worker->entry));
>  		worker->task->flags &= ~PF_WQ_WORKER;
> +
> +		set_task_comm(worker->task, "kworker_die");
> +		ida_simple_remove(&pool->worker_ida, worker->id);
>  		worker_unbind_pool(worker);
>  		kfree(worker);
>  		return 0;

Does this chunk belong to this patch?  Why no description about this
change?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
  2014-05-05 14:31   ` Tejun Heo
@ 2014-05-05 15:02   ` Tejun Heo
  2014-05-06 16:27     ` Lai Jiangshan
  2014-05-07  7:30   ` Lai Jiangshan
  2 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 15:02 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:08:58PM +0800, Lai Jiangshan wrote:
>  /**
> + * worker_unbind_pool() - unbind the worker from the pool
> + * @worker: worker which is bound to its pool
> + *
> + * Undo the pool-binding which had been done in create_worker()
> + */
> +static void worker_unbind_pool(struct worker *worker)

worker_unbind_from_pool() would be a better name but I don't think
using bind/unbind for this purpose is a good idea.  We already use
that pair of verbs for workers/pools being bound and unbound to CPUs.
I don't think we want to overload the terms for this purpose.  It gets
pretty confusing.  How about something like worker_detach_from_pool()
and later worker_attach_to_pool()?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization
  2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
                   ` (9 preceding siblings ...)
  2014-04-27  4:09 ` [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers Lai Jiangshan
@ 2014-05-05 15:05 ` Tejun Heo
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  10 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-05 15:05 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Sun, Apr 27, 2014 at 12:08:55PM +0800, Lai Jiangshan wrote:
> Patch1-4: async worker destruction
> 
> The old version(https://lkml.org/lkml/2014/2/17/418) code of async worker
> destruction has to dance with pool->lock and worker_idr, it is complicated.
> 
> But worker_idr in the new version dosn't requies pool->lock, and the code in
> put_unbound_pool() which waits workers exit is changed to wait on a
> wait_queue_head_t. These two changes make the new async worker destruction
> much simpler.
> 
> Patch2 reduces the review burden. It will be easier to review the whole
> patchset if we know destroy_worker() is forced to destroy idle workers only.
> 
> ==========
> Patch5-10: pool-binding synchronization and simplify the workers management
> 
> The code which binds a worker to the pool and unbind a worker from the pool
> is unfolded in create_worker()/destroy_worker().
> The patchset moves this pool-binding code out and wraps them.
> 
> patch3-4 moves the pool-unbinding code out from destroy_worker(), and make
> manger_mutex only protects the pool-unbinding code only rather than
> protects the whole worker-destruction path.
> 
> patch5-7 makes manger_mutex only protects the pool-binding code rather than the
> whole worker-creation path.
> 
> patch8: rename manger_mutex to bind_mutex
> patch9-10: moves the pool-binding code out from create_worker() and use it for
> rescuer.

Sorry about the late review.  I've been pretty sick last week.
Overall, excellent work.  We'll prolly go through a couple more
iterations but I really like what's going on.

Thanks a lot!

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] workqueue: use manager lock only to protect worker_idr
  2014-05-05 13:01   ` Tejun Heo
@ 2014-05-06 15:43     ` Lai Jiangshan
  0 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-06 15:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Mon, May 5, 2014 at 9:01 PM, Tejun Heo <tj@kernel.org> wrote:
> On Sun, Apr 27, 2014 at 12:08:56PM +0800, Lai Jiangshan wrote:
>> worker_idr is highly bound to managers and is always/only accessed in manager
>> lock context. So we don't need pool->lock for it.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ...
>> @@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
>>   * @wi: integer used for iteration
>>   * @pool: worker_pool to iterate workers of
>>   *
>> - * This must be called with either @pool->manager_mutex or ->lock held.
>> + * This must be called with either @pool->manager_mutex.
>
> Please drop "either" from the sentence.

Ahh, I will do it.

>
>> @@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
>>        * ID is needed to determine kthread name.  Allocate ID first
>>        * without installing the pointer.
>>        */
>> -     idr_preload(GFP_KERNEL);
>> -     spin_lock_irq(&pool->lock);
>> -
>> -     id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
>> -
>> -     spin_unlock_irq(&pool->lock);
>> -     idr_preload_end();
>> +     id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
>>       if (id < 0)
>>               goto fail;
>>
>> @@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
>>               worker->flags |= WORKER_UNBOUND;
>>
>>       /* successful, commit the pointer to idr */
>> -     spin_lock_irq(&pool->lock);
>>       idr_replace(&pool->worker_idr, worker, worker->id);
>> -     spin_unlock_irq(&pool->lock);
>
> W/ locking updated, we can simply assign the pointer on idr_alloc()
> instead of doing split alloc/replace.

I hope to keep the patch simpler by just removing the lock around it.
This idr_replace() will be removed in a later patch, so it is OK
for it to be kept temporarily.

And I don't want to add a failure path after the thread is allocated.

>
> I'm a bit on the fence about this patch.  It does simplify the code a
> bit but then we lose the ability to iterate workers without grabbing
> the manager_mutex, which would come handy when, for example,
> implementing better workqueue info reporting during oops which we'll
> prolly need to add sooner or later.

It can be done with idle_list + busy_hash if needed,
or by adding an additional list.
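
For instance, a pool->lock-only iteration over idle_list plus busy_hash
could be sketched as below (note_worker() is a hypothetical reporting hook):

	struct worker *worker;
	int bkt;

	spin_lock_irq(&pool->lock);

	list_for_each_entry(worker, &pool->idle_list, entry)
		note_worker(worker);

	hash_for_each(pool->busy_hash, bkt, worker, hentry)
		note_worker(worker);

	spin_unlock_irq(&pool->lock);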

Thanks
Lai

>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only
  2014-05-05 13:13   ` Tejun Heo
@ 2014-05-06 15:58     ` Lai Jiangshan
  0 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-06 15:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Mon, May 5, 2014 at 9:13 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Sun, Apr 27, 2014 at 12:08:57PM +0800, Lai Jiangshan wrote:
>> @@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
>>   * create_worker - create a new workqueue worker
>>   * @pool: pool the new worker will belong to
>>   *
>> - * Create a new worker which is bound to @pool.  The returned worker
>> - * can be started by calling start_worker() or destroyed using
>> - * destroy_worker().
>> + * Create a new worker which is bound to @pool.
>> + * The new worker should be started and enter idle by start_worker().
>
> Hmm... we used to have a path where a worker is created and then
> destroyed without being started.  IIRC, it was on the CPU online
> failure path.  A worker was created for the CPU coming online and if
> the online operation failed the created worker was shut down without
> being started.  Right, we no longer shutdown per-cpu pools on offline
> so this doesn't matter anymore.  Might worthwhile to note in the patch
> description tho.
>
>> @@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
>>   * @worker: worker to be destroyed
>>   *
>>   * Destroy @worker and adjust @pool stats accordingly.
>> + * The worker should be idle(WORKER_IDLE).
>
> Just write "The worker should be idle."  Also, in general, can you
> please put a space before opening parenthesis?
>
>>   *
>>   * CONTEXT:
>>   * spin_lock_irq(pool->lock) which is released and regrabbed.
>> @@ -1828,13 +1826,13 @@ static void destroy_worker(struct worker *worker)
>>
>>       /* sanity check frenzy */
>>       if (WARN_ON(worker->current_work) ||
>> -         WARN_ON(!list_empty(&worker->scheduled)))
>> +         WARN_ON(!list_empty(&worker->scheduled)) ||
>> +         WARN_ON(!(worker->flags & WORKER_IDLE)) ||
>> +         WARN_ON(pool->nr_workers == 1 && !list_empty(&pool->worklist)))
>
> I'm not sure about the pool condition check.  It's kinda overreaching
> to check for it from worker destruction.

"pool->nr_workers == 1", it is the last worker of the pool, so I add this check.

I will remove it.

>
>> @@ -3589,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
>>       mutex_lock(&pool->manager_mutex);
>>       spin_lock_irq(&pool->lock);
>>
>> +     WARN_ON(pool->nr_workers != pool->nr_idle);
>
> Does this condition detect anything new from the condition below?

It is already ensured that all workers are idle since pool->ref==0
and manager_arb is held.

This additional WARN_ON() is a double check.
It tells the reviewers that destroying all workers on the idle_list
is equivalent to destroying all workers of the pool.

I can remove this WARN_ON().

>
>>       while ((worker = first_worker(pool)))
>>               destroy_worker(worker);
>>       WARN_ON(pool->nr_workers || pool->nr_idle);
>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-05-05 15:02   ` Tejun Heo
@ 2014-05-06 16:27     ` Lai Jiangshan
  2014-05-06 16:31       ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-06 16:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Mon, May 5, 2014 at 11:02 PM, Tejun Heo <tj@kernel.org> wrote:
> On Sun, Apr 27, 2014 at 12:08:58PM +0800, Lai Jiangshan wrote:
>>  /**
>> + * worker_unbind_pool() - unbind the worker from the pool
>> + * @worker: worker which is bound to its pool
>> + *
>> + * Undo the pool-binding which had been done in create_worker()
>> + */
>> +static void worker_unbind_pool(struct worker *worker)
>
> worker_unbind_from_pool() would be a better name but I don't think
> using bind/unbind for this purpose is a good idea.  We already use
> that pair of verbs for workers/pools being bound and unbound to CPUs.
> I don't think we want to overload the terms for this purpose.  It gets
> pretty confusing.  How about something like worker_detach_from_pool()
> and later worker_attach_to_pool()?

I considered several names (searching an English dictionary):
assoc/pin/bind/attach/add...

In the end I chose "bind" as the winner from the last two candidates, assoc & bind.

worker_OP[_to]_pool()
        OPs (assoc/pin/bind/attach/add) the worker onto the OP_list of the
        pool and does the cpu-binding for the worker
OP_list
        the list of the pool's workers, used for iteration (when
        cpu-binding the workers and when marking them concurrency
        [un]bound)
OP_entry
        the worker's entry in OP_list, used for the same iteration
OP_mutex
        protects worker_OP_to_pool(), the cpu-binding of the workers,
        and the iteration above

Every sentence has "bind", so I think "bind_mutex" is proper.
Since I used "bind_mutex", I will use bind_list and worker_bind_pool().

I don't object to using worker_attach_to_pool(), but I hope you will choose
the other names for me:

attach_list VS bind_list
attach_mutex VS bind_mutex

I guess they will be attach_list & attach_mutex.  It's a little pity to
drop the nice name bind_mutex.

Thanks,
Lai

>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-05-06 16:27     ` Lai Jiangshan
@ 2014-05-06 16:31       ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-06 16:31 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: LKML

Hello, Lai.

On Wed, May 07, 2014 at 12:27:13AM +0800, Lai Jiangshan wrote:
> I considered several names (searching from an English dictionary)
> assoc/pin/bind/attach/add...

Yeah, naming is hard.

> The last I chose the winner "bind" from the last two candidates assoc&bind.
> 
> worker_OP[_to]_pool()
>         OP(assoc/pin/bind/attach/add) the worker to OP_list of the
> pool, do cpu-binding for the worker
> OP_list
>         the list in the pool for the workers, iterating(when
> cpu-binding and worker concurrency [un]bound)
> OP_entry
>         worker entry for OP_list, for  iterating(when cpu-binding and
> worker concurrency [un]bound)

So, VERB_list or VERB_entry doesn't really work well unless they're
really for a list which is specifically created for the action.

> OP_mutex
>         protects worker_OP_to_pool() protect cpu-binding for the workers,
>         protect iterating(when cpu-binding and worker concurrency [un]bound)
> 
> every sentence has "bind", so I think "bind_mutex" is proper.
> since I used "bind_mutex", I will use bind_list and worker_bind_pool().

I don't think it's necessary for all those entries to share the same
prefix.  It could have some benefits but as long as the locking
requirements are clearly indicated in the comment, I don't think the
benefit is large enough to overrule the downsides of such naming.

> I don't refuse to use worker_attach_to_pool(), but I hope you choose
> other names for me:
> 
> attach_list VS bind_list
> attach_mutex VS bind_mutex

Just use worker->node and pool->workers for the list.  I don't think
attach_mutex sounds too bad.

> I guess they will be attach_list&attach_mutex.  it's a little pity to
> drop the nice name bind_mutex.

Why is bind_mutex any better than attach_mutex?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] workqueue: convert worker_idr to worker_ida
  2014-05-05 14:59   ` Tejun Heo
@ 2014-05-06 16:33     ` Lai Jiangshan
  2014-05-06 16:35       ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-06 16:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Mon, May 5, 2014 at 10:59 PM, Tejun Heo <tj@kernel.org> wrote:
> On Sun, Apr 27, 2014 at 12:09:01PM +0800, Lai Jiangshan wrote:
>> @@ -2224,6 +2220,9 @@ woke_up:
>>               spin_unlock_irq(&pool->lock);
>>               WARN_ON_ONCE(!list_empty(&worker->entry));
>>               worker->task->flags &= ~PF_WQ_WORKER;
>> +
>> +             set_task_comm(worker->task, "kworker_die");
>> +             ida_simple_remove(&pool->worker_ida, worker->id);
>>               worker_unbind_pool(worker);
>>               kfree(worker);
>>               return 0;
>
> Does this chunk belong to this patch?  Why no description about this
> change?

"set_task_comm()" doesn't belong to this patch. it avoids two workers
have the same name.(one is dying, the other one is newly created"

"ida_simple_remove()" does belong to this patch. and it is needed to
be removed out from  worker_unbind_pool() to here.

>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] workqueue: convert worker_idr to worker_ida
  2014-05-06 16:33     ` Lai Jiangshan
@ 2014-05-06 16:35       ` Tejun Heo
  2014-05-06 16:38         ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-06 16:35 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: LKML

On Wed, May 07, 2014 at 12:33:34AM +0800, Lai Jiangshan wrote:
> On Mon, May 5, 2014 at 10:59 PM, Tejun Heo <tj@kernel.org> wrote:
> > On Sun, Apr 27, 2014 at 12:09:01PM +0800, Lai Jiangshan wrote:
> >> @@ -2224,6 +2220,9 @@ woke_up:
> >>               spin_unlock_irq(&pool->lock);
> >>               WARN_ON_ONCE(!list_empty(&worker->entry));
> >>               worker->task->flags &= ~PF_WQ_WORKER;
> >> +
> >> +             set_task_comm(worker->task, "kworker_die");
> >> +             ida_simple_remove(&pool->worker_ida, worker->id);
> >>               worker_unbind_pool(worker);
> >>               kfree(worker);
> >>               return 0;
> >
> > Does this chunk belong to this patch?  Why no description about this
> > change?
> 
> "set_task_comm()" doesn't belong to this patch. it avoids two workers
> have the same name.(one is dying, the other one is newly created"

Separate out this to a separate patch?  A better name would be
"kworker_dying".  Does this matter tho?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] workqueue: convert worker_idr to worker_ida
  2014-05-06 16:35       ` Tejun Heo
@ 2014-05-06 16:38         ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-06 16:38 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: LKML

Hello,

On Tue, May 06, 2014 at 12:35:24PM -0400, Tejun Heo wrote:
> On Wed, May 07, 2014 at 12:33:34AM +0800, Lai Jiangshan wrote:
> > On Mon, May 5, 2014 at 10:59 PM, Tejun Heo <tj@kernel.org> wrote:
> > > On Sun, Apr 27, 2014 at 12:09:01PM +0800, Lai Jiangshan wrote:
> > >> @@ -2224,6 +2220,9 @@ woke_up:
> > >>               spin_unlock_irq(&pool->lock);
> > >>               WARN_ON_ONCE(!list_empty(&worker->entry));
> > >>               worker->task->flags &= ~PF_WQ_WORKER;
> > >> +
> > >> +             set_task_comm(worker->task, "kworker_die");
> > >> +             ida_simple_remove(&pool->worker_ida, worker->id);
> > >>               worker_unbind_pool(worker);
> > >>               kfree(worker);
> > >>               return 0;
> > >
> > > Does this chunk belong to this patch?  Why no description about this
> > > change?
> > 
> > "set_task_comm()" doesn't belong to this patch. it avoids two workers
> > have the same name.(one is dying, the other one is newly created"
> 
> Separate out this to a separate patch?  A better name would be
> "kworker_dying".  Does this matter tho?

On the second thought, before these patches, we only freed ID after
the task exited, right?  Hmm... yeah, it can be confusing for
debugging.  Can you please do the above in the patch which moves IDR
freeing to the worker itself and explain accordingly?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-05 14:36   ` Tejun Heo
@ 2014-05-07  7:10     ` Lai Jiangshan
  2014-05-07 13:12       ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-07  7:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/05/2014 10:36 PM, Tejun Heo wrote:
> On Sun, Apr 27, 2014 at 12:08:59PM +0800, Lai Jiangshan wrote:
>> Since kthread_stop() is removed from destroy_worker(),
>> destroy_worker() doesn't need to sleep.
>> Since "unbind the worker" is moved out from destroy_worker(),
>> destroy_worker() doesn't require manager_mutex.
>>
>> So destroy_worker() can be directly called in the idle timeout
>> handler, it helps us remove POOL_MANAGE_WORKERS and
>> maybe_destroy_worker() and simplify the manage_workers()
>>
>> After POOL_MANAGE_WORKERS is removed, worker_thread() doesn't
>> need to test whether it needs to manage after processed works.
>> So we can remove this test branch.
> 
> Ah, so, you can take out workers directly from idle timer.  Yeah,
> that's nice.  I'm not a big fan of the wait_queue usage in the
> previous patch tho.  Can we use a completion instead?
> 
> Thanks.
> 

1) complete() can't be called inside attach_mutex because the worker
   shouldn't access the pool after complete().
2) put_unbound_pool() may be called from get_unbound_pool(), so we need to add
   an additional check and avoid the wait_for_completion() in that case.

+static void worker_detach_from_pool(struct worker *worker, struct worker_pool *pool)
+{
+	bool is_last;
+
+	mutex_lock(&pool->bind_mutex);
+	list_del(&worker->bind_entry);
+	is_last = list_empty(&pool->bind_list);
+	mutex_unlock(&pool->bind_mutex);
+
+	/* need some comments here */
+	if (is_last)
+		complete(&pool->workers_detached);
+}


@@ -3588,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	mutex_lock(&pool->manager_mutex);
 	spin_lock_irq(&pool->lock);
 	
+	need_to_wait = pool->nr_workers != 0; /* it may be called from get_unbound_pool() */
 	while ((worker = first_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
@@ -3596,6 +3596,8 @@ static void put_unbound_pool(struct worker_pool *pool)
 	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
 
+	if (need_to_wait)
+		wait_for_completion(&pool->workers_detached);

 	/* shut down the timers */
 	del_timer_sync(&pool->idle_timer);
 	del_timer_sync(&pool->mayday_timer);


So I think the wait_queue is more graceful.

Thanks,
Lai


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
  2014-05-05 14:31   ` Tejun Heo
  2014-05-05 15:02   ` Tejun Heo
@ 2014-05-07  7:30   ` Lai Jiangshan
  2014-05-07 13:15     ` Tejun Heo
  2 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-07  7:30 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Tejun Heo, Ingo Molnar, Peter Zijlstra

On 04/27/2014 12:08 PM, Lai Jiangshan wrote:

>  
>  	spin_unlock_irq(&pool->lock);
> +
> +	wait_event_cmd(pool->workers_unbound,
> +		       idr_is_empty(&pool->worker_idr),
> +		       mutex_unlock(&pool->manager_mutex),
> +		       mutex_lock(&pool->manager_mutex));
> +


How about I wrap it as wait_event_mutex()?
(like wait_event_lock_irq() in kernel and pthread_cond_wait() in userspace)
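
A minimal sketch of such a helper, assuming it simply wraps the existing
wait_event_cmd() (hypothetical; not a macro that exists today):

#define wait_event_mutex(wq, condition, mutex)				\
	wait_event_cmd(wq, condition,					\
		       mutex_unlock(mutex), mutex_lock(mutex))

so that put_unbound_pool() could call
wait_event_mutex(pool->workers_unbound, idr_is_empty(&pool->worker_idr),
&pool->manager_mutex) in one line.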

>  	mutex_unlock(&pool->manager_mutex);
>  	mutex_unlock(&pool->manager_arb);
>  


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-07  7:10     ` Lai Jiangshan
@ 2014-05-07 13:12       ` Tejun Heo
  2014-05-07 13:38         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-07 13:12 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Wed, May 07, 2014 at 03:10:20PM +0800, Lai Jiangshan wrote:
> 1) complete() can't be called inside attach_mutex due to the worker
>    shouldn't access to the pool after complete().

Sure, complete it after releasing the lock.  Shutdown can't complete
before the completion gets completed, right?

> 2) put_unbound_pool() may called from get_unbound_pool(), we need to add
>    an additional check and avoid the wait_for_completion() if so.
> 
> +static void worker_detach_from_pool(struct worker *worker, struct worker_pool *pool)
> +{
> +	bool is_last;
> +
> +	mutex_lock(&pool->bind_mutex);
> +	list_del(&worker->bind_entry);
> +	is_last = list_empty(&worker->bind_entry);
> +	mutex_unlock(&pool->bind_mutex);
> +
> +	/* need some comments here */
> +	if (is_last)
> +		complete(&pool->workers_detached);
> +}
> 
> 
> @@ -3588,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
>  	mutex_lock(&pool->manager_mutex);
>  	spin_lock_irq(&pool->lock);
>  	
> +	need_to_wait = pool->nr_workers != 0; /* it may be called from get_unbound_pool() */
>  	while ((worker = first_worker(pool)))
>  		destroy_worker(worker);
>  	WARN_ON(pool->nr_workers || pool->nr_idle);
> @@ -3596,6 +3596,8 @@ static void put_unbound_pool(struct worker_pool *pool)
>  	mutex_unlock(&pool->manager_mutex);
>  	mutex_unlock(&pool->manager_arb);
>  
> +	if (need_to_wait)
> +		wait_for_completion(&pool->workers_detached);

Shouldn't it be able to wait whenever it's about to destroy non-empty
pool?

	DECLARE_COMPLETION_ONSTACK(completion);

	...

	while ((worker = first_worker(pool))) {
		destroy_worker(worker);
		pool->detach_completion = &completion;
	}

	...
	unlock;

	if (pool->detach_completion)
		wait_for_completion();
	...

And the worker side can simply do,

	struct completion *completion;

	if (I'm the last worker exiting)
		completion = pool->detach_completion;
	unlock;

	complete(completion);

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] workqueue: async worker destruction
  2014-05-07  7:30   ` Lai Jiangshan
@ 2014-05-07 13:15     ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-07 13:15 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Ingo Molnar, Peter Zijlstra

On Wed, May 07, 2014 at 03:30:39PM +0800, Lai Jiangshan wrote:
> On 04/27/2014 12:08 PM, Lai Jiangshan wrote:
> 
> >  
> >  	spin_unlock_irq(&pool->lock);
> > +
> > +	wait_event_cmd(pool->workers_unbound,
> > +		       idr_is_empty(&pool->worker_idr),
> > +		       mutex_unlock(&pool->manager_mutex),
> > +		       mutex_lock(&pool->manager_mutex));
> > +
> 
> 
> How about I wrap it as wait_event_mutex()?
> (like wait_event_lock_irq() in kernel and pthread_cond_wait() in userspace)

waitq tends to be trickier to get right.  Can you please give
completion a shot?  completion is pretty convenient for (its only
purpose after all) this sort of one-shot synchronizations.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-07 13:12       ` Tejun Heo
@ 2014-05-07 13:38         ` Lai Jiangshan
  2014-05-07 13:41           ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-07 13:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Wed, May 7, 2014 at 9:12 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Lai.
>
> On Wed, May 07, 2014 at 03:10:20PM +0800, Lai Jiangshan wrote:
>> 1) complete() can't be called inside attach_mutex due to the worker
>>    shouldn't access to the pool after complete().
>
> Sure, complete it after releasing the lock.  Shutdown can't complete
> before the completion gets completed, right?
>
>> 2) put_unbound_pool() may called from get_unbound_pool(), we need to add
>>    an additional check and avoid the wait_for_completion() if so.

Would you accept it if I remove put_unbound_pool() from get_unbound_pool()
and use a few lines of freeing code instead?

>>
>> +static void worker_detach_from_pool(struct worker *worker, struct worker_pool *pool)
>> +{
>> +     bool is_last;
>> +
>> +     mutex_lock(&pool->bind_mutex);
>> +     list_del(&worker->bind_entry);
>> +     is_last = list_empty(&worker->bind_entry);
>> +     mutex_unlock(&pool->bind_mutex);
>> +
>> +     /* need some comments here */
>> +     if (is_last)
>> +             complete(&pool->workers_detached);
>> +}
>>
>>
>> @@ -3588,6 +3587,7 @@ static void put_unbound_pool(struct worker_pool *pool)
>>       mutex_lock(&pool->manager_mutex);
>>       spin_lock_irq(&pool->lock);
>>
>> +     need_to_wait = pool->nr_workers != 0; /* it may be called from get_unbound_pool() */
>>       while ((worker = first_worker(pool)))
>>               destroy_worker(worker);
>>       WARN_ON(pool->nr_workers || pool->nr_idle);
>> @@ -3596,6 +3596,8 @@ static void put_unbound_pool(struct worker_pool *pool)
>>       mutex_unlock(&pool->manager_mutex);
>>       mutex_unlock(&pool->manager_arb);
>>
>> +     if (need_to_wait)
>> +             wait_for_completion(&pool->workers_detached);
>
> Shouldn't it be able to wait whenever it's about to destroy non-empty
> pool?
>
>         DECLARE_COMPLETION_ONSTACK(completion);
>
>         ...
>
>         while ((worker = first_worker(pool))) {
>                 destroy_worker(worker);
>                 pool->detach_completion = &completion;
>         }
>
>         ...
>         unlock;
>
>         if (pool->detach_completion)
>                 wait_for_completion();

It is more complex than the wait_queue.
If put_unbound_pool() is removed from get_unbound_pool(),
a simple wait_for_completion() is enough.

>         ...
>
> And the worker side can simply do,
>
>         struct completion *completion;
>
>         if (I'm the last worker exiting)
>                 completion = pool->detach_completion;
>         unlock;
>

if(completion)
>         complete(completion);
>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-07 13:38         ` Lai Jiangshan
@ 2014-05-07 13:41           ` Tejun Heo
  2014-05-07 15:30             ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-07 13:41 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: LKML

On Wed, May 07, 2014 at 09:38:39PM +0800, Lai Jiangshan wrote:
> On Wed, May 7, 2014 at 9:12 PM, Tejun Heo <tj@kernel.org> wrote:
> > Hello, Lai.
> >
> > On Wed, May 07, 2014 at 03:10:20PM +0800, Lai Jiangshan wrote:
> >> 1) complete() can't be called inside attach_mutex due to the worker
> >>    shouldn't access to the pool after complete().
> >
> > Sure, complete it after releasing the lock.  Shutdown can't complete
> > before the completion gets completed, right?
> >
> >> 2) put_unbound_pool() may called from get_unbound_pool(), we need to add
> >>    an additional check and avoid the wait_for_completion() if so.
> 
> Do you accept if I remove put_unbound_pool() from get_unbound_pool()
> and use several freeing code instead?

Hah?  How much extra complexity are we talking about?  It's a single
if, no?

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-07 13:41           ` Tejun Heo
@ 2014-05-07 15:30             ` Lai Jiangshan
  2014-05-07 15:37               ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-07 15:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Wed, May 7, 2014 at 9:41 PM, Tejun Heo <tj@kernel.org> wrote:
> On Wed, May 07, 2014 at 09:38:39PM +0800, Lai Jiangshan wrote:
>> On Wed, May 7, 2014 at 9:12 PM, Tejun Heo <tj@kernel.org> wrote:
>> > Hello, Lai.
>> >
>> > On Wed, May 07, 2014 at 03:10:20PM +0800, Lai Jiangshan wrote:
>> >> 1) complete() can't be called inside attach_mutex due to the worker
>> >>    shouldn't access to the pool after complete().
>> >
>> > Sure, complete it after releasing the lock.  Shutdown can't complete
>> > before the completion gets completed, right?
>> >
>> >> 2) put_unbound_pool() may called from get_unbound_pool(), we need to add
>> >>    an additional check and avoid the wait_for_completion() if so.
>>
>> Do you accept if I remove put_unbound_pool() from get_unbound_pool()
>> and use several freeing code instead?
>
> Hah?  How much extra complexity are we talking about?  It's a single
> if, no?

        DECLARE_COMPLETION_ONSTACK(completion);
#1

        ...

        while ((worker = first_worker(pool))) {
                destroy_worker(worker);
                pool->detach_completion = &completion;
#2
        }

        ...
        unlock;

        if (pool->detach_completion)
                wait_for_completion();
#3

One thing is separated into 3 places and about 5~7 lines.
I would prefer a single wait_for_completion() or a single wait_event().


get_unbound_pool():
fail:
    if (pool)
        put_unbound_pool(pool);

I think we can change it into:

fail:
    if (pool) {
        if (pool->id >= 0)
            idr_remove(&worker_pool_idr, pool->id);
        call_rcu_sched(&pool->rcu, rcu_free_pool);
    }

Thanks,
Lai

>
> --
> tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-07 15:30             ` Lai Jiangshan
@ 2014-05-07 15:37               ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-07 15:37 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: LKML

Hello, Lai.

On Wed, May 07, 2014 at 11:30:35PM +0800, Lai Jiangshan wrote:
> > Hah?  How much extra complexity are we talking about?  It's a single
> > if, no?
> 
>         DECLARE_COMPLETION_ONSTACK(completion);
> #1
> 
>         ...
> 
>         while ((worker = first_worker(pool))) {
>                 destroy_worker(worker);
>                 pool->detach_completion = &completion;
> #2
>         }
> 
>         ...
>         unlock;
> 
>         if (pool->detach_completion)
>                 wait_for_completion();
> #3
> 
> One thing is separated into 3 places and about 5~7lines.
> I hope a single wait_for_completion() or single wait_event().

This is negligible amount of complexity contained in a single
function.

> get_unbound_pool():
> fail:
>     if (pool)
>         put_unbound_pool(pool);
> 
> I think we can change it into:
> 
> fail:
>     if (pool) {
>         if (pool->id >= 0)
>             idr_remove(&worker_pool_idr, pool->id);
>         call_rcu_sched(&pool->rcu, rcu_free_pool);
>     }

This is destruction logic unnecessarily duplicated in two places,
which is shittier to maintain when the destruction path changes.

There's more to complexity than the simple number of lines.  It sure is a
minute difference here but I find the obsession with LOC a bit
disturbing.  That's just one aspect, and involving more lines doesn't
necessarily mean it's more complex.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching
  2014-05-05 15:05 ` [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Tejun Heo
@ 2014-05-12  6:56   ` Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
                       ` (10 more replies)
  0 siblings, 11 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Patch1-4: async worker destruction

Patch2 reduces the review burden. It will be easier to review the whole
patchset if we know destroy_worker() is forced to destroy idle workers only.

Patch5-10: worker attaching/detaching and simplified worker management

The code which attaches a worker to the pool and detaches a worker from the pool
is open-coded in create_worker()/destroy_worker().
The patchset moves this attaching/detaching code out and wraps it.

patch3-4 move the detaching code out from destroy_worker(), and make
manager_mutex protect only the detaching code rather than
the whole worker-destruction path.

patch5-7 make manager_mutex protect only the attaching code rather than the
whole worker-creation path.

patch8: rename manager_mutex to attach_mutex
patch9-10: move the attaching code out from create_worker() and use it for
the rescuer.


Lai Jiangshan (10):
  workqueue: use manager lock only to protect worker_idr
  workqueue: destroy_worker() should destroy idle workers only
  workqueue: async worker destruction
  workqueue: destroy worker directly in the idle timeout handler
  workqueue: separate iteration role from worker_idr
  workqueue: convert worker_idr to worker_ida
  workqueue: narrow the protection range of manager_mutex
  workqueue: rename manager_mutex to attach_mutex
  workqueue: separate pool-attaching code out from create_worker()
  workqueue: use generic attach/detach routine for rescuers

 kernel/workqueue.c          |  401 ++++++++++++++-----------------------------
 kernel/workqueue_internal.h |    1 +
 2 files changed, 126 insertions(+), 276 deletions(-)

-- 
1.7.4.4


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker_idr is highly bound to managers and is always/only accessed in manager
lock context. So we don't need pool->lock for it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   34 ++++++----------------------------
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c3f076f..d38d07c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,8 +124,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * MG: pool->manager_mutex and pool->lock protected.  Writes require both
- *     locks.  Reads can happen under either lock.
+ * M: pool->manager_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -164,7 +163,7 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* MG: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs and iteration */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -340,16 +339,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 			   lockdep_is_held(&wq->mutex),			\
 			   "sched RCU or wq->mutex should be held")
 
-#ifdef CONFIG_LOCKDEP
-#define assert_manager_or_pool_lock(pool)				\
-	WARN_ONCE(debug_locks &&					\
-		  !lockdep_is_held(&(pool)->manager_mutex) &&		\
-		  !lockdep_is_held(&(pool)->lock),			\
-		  "pool->manager_mutex or ->lock should be held")
-#else
-#define assert_manager_or_pool_lock(pool)	do { } while (0)
-#endif
-
 #define for_each_cpu_worker_pool(pool, cpu)				\
 	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
 	     (pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
@@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * @wi: integer used for iteration
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with either @pool->manager_mutex or ->lock held.
+ * This must be called with @pool->manager_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, wi, pool)				\
 	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
-		if (({ assert_manager_or_pool_lock((pool)); false; })) { } \
+		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
 		else
 
 /**
@@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	 * ID is needed to determine kthread name.  Allocate ID first
 	 * without installing the pointer.
 	 */
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pool->lock);
-
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
-
-	spin_unlock_irq(&pool->lock);
-	idr_preload_end();
+	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
 		worker->flags |= WORKER_UNBOUND;
 
 	/* successful, commit the pointer to idr */
-	spin_lock_irq(&pool->lock);
 	idr_replace(&pool->worker_idr, worker, worker->id);
-	spin_unlock_irq(&pool->lock);
 
 	return worker;
 
 fail:
-	if (id >= 0) {
-		spin_lock_irq(&pool->lock);
+	if (id >= 0)
 		idr_remove(&pool->worker_idr, id);
-		spin_unlock_irq(&pool->lock);
-	}
 	kfree(worker);
 	return NULL;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 21:08       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 03/10 V2] workqueue: async worker destruction Lai Jiangshan
                       ` (8 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

We used to have the CPU online failure path where a worker is created
and then destroyed without being started. A worker was created for
the CPU coming online and, if the online operation failed, the created worker
was shut down without being started.  But this behavior was changed:
the first worker is now created and started at the same time for the CPU
coming online.

It means that we had already ensured in the code that destroy_worker()
destroys idle workers only. And we don't want to allow it to destroy any
non-idle worker in the future. Otherwise, it may be buggy and it may be
extremely hard to check. We should explicitly force destroy_worker()
to destroy idle workers only.

Since destroy_worker() destroys idle workers only, this patch does not change any
functionality. We just need to update the comments and the sanity check code.

In the sanity check code, we will refuse to destroy the worker
if !(worker->flags & WORKER_IDLE).

A worker which has entered idle has already been started,
so we remove the check of "worker->flags & WORKER_STARTED".
After this removal, WORKER_STARTED is totally unneeded,
so we remove WORKER_STARTED too.

In the comments for create_worker(), "Create a new worker which is bound..."
is changed to "... which is attached..." because we changed the name of this
behavior to attaching.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   17 +++++++----------
 1 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d38d07c..752e109 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -73,7 +73,6 @@ enum {
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
 	/* worker flags */
-	WORKER_STARTED		= 1 << 0,	/* started */
 	WORKER_DIE		= 1 << 1,	/* die die die */
 	WORKER_IDLE		= 1 << 2,	/* is idle */
 	WORKER_PREP		= 1 << 3,	/* preparing to run works */
@@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool.  The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is attached to @pool.
+ * The new worker must be started and enter idle via start_worker().
  *
  * CONTEXT:
  * Might sleep.  Does GFP_KERNEL allocations.
@@ -1778,7 +1776,6 @@ fail:
  */
 static void start_worker(struct worker *worker)
 {
-	worker->flags |= WORKER_STARTED;
 	worker->pool->nr_workers++;
 	worker_enter_idle(worker);
 	wake_up_process(worker->task);
@@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
  * @worker: worker to be destroyed
  *
  * Destroy @worker and adjust @pool stats accordingly.
+ * The worker should be idle.
  *
  * CONTEXT:
  * spin_lock_irq(pool->lock) which is released and regrabbed.
@@ -1828,13 +1826,12 @@ static void destroy_worker(struct worker *worker)
 
 	/* sanity check frenzy */
 	if (WARN_ON(worker->current_work) ||
-	    WARN_ON(!list_empty(&worker->scheduled)))
+	    WARN_ON(!list_empty(&worker->scheduled)) ||
+	    WARN_ON(!(worker->flags & WORKER_IDLE)))
 		return;
 
-	if (worker->flags & WORKER_STARTED)
-		pool->nr_workers--;
-	if (worker->flags & WORKER_IDLE)
-		pool->nr_idle--;
+	pool->nr_workers--;
+	pool->nr_idle--;
 
 	/*
 	 * Once WORKER_DIE is set, the kworker may destroy itself at any
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 03/10 V2] workqueue: async worker destruction
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 21:20       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
                       ` (7 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker destruction includes these parts of code:
	adjust pool's stats
	remove the worker from idle list
	detach the worker from the pool
	kthread_stop() to wait for the worker's task exit
	free the worker struct

We can see that there is no essential work to do after
kthread_stop(), which means destroy_worker() doesn't need
to wait for the worker's task to exit. So we can remove kthread_stop()
and free the worker struct in the worker's exiting path.

But put_unbound_pool() still needs to sync with all the workers'
destruction before destroying the pool. Otherwise the workers
may access the invalid pool while they are exiting.

So we also move the code of "detach the worker" to the exiting
path and let put_unbound_pool() sync with this code via
detach_completion.

The code of "detach the worker" is wrapped in a new function
"worker_detach_from_pool()".

And the worker id is freed when detaching, which happens
before the worker is fully dead. This id of the dying worker
may then be re-used for a new worker. So the dying worker's task name
is changed to "kworker_dying" to avoid two or more workers
having the same name.

Since "detach the worker" is moved out from destroy_worker(),
destroy_worker() doesn't require manager_mutex.
So the "lockdep_assert_held(&pool->manager_mutex)" in destroy_worker()
is removed, and destroy_worker() is not protected by manager_mutex
in put_unbound_pool().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   65 +++++++++++++++++++++++++++++++++------------------
 1 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 752e109..465e751 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -163,6 +163,7 @@ struct worker_pool {
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
 	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	struct completion	*detach_completion; /* all workers detached */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -1688,6 +1689,30 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_detach_from_pool() - detach the worker from the pool
+ * @worker: worker which is attached to its pool
+ * @pool: attached pool
+ *
+ * Undo the attaching which had been done in create_worker().
+ * The caller worker shouldn't access to the pool after detached
+ * except it has other reference to the pool.
+ */
+static void worker_detach_from_pool(struct worker *worker,
+				    struct worker_pool *pool)
+{
+	struct completion *detach_completion = NULL;
+
+	mutex_lock(&pool->manager_mutex);
+	idr_remove(&pool->worker_idr, worker->id);
+	if (idr_is_empty(&pool->worker_idr))
+		detach_completion = pool->detach_completion;
+	mutex_unlock(&pool->manager_mutex);
+
+	if (detach_completion)
+		complete(detach_completion);
+}
+
+/**
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
@@ -1815,13 +1840,12 @@ static int create_and_start_worker(struct worker_pool *pool)
  * The worker should be idle.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * spin_lock_irq(pool->lock).
  */
 static void destroy_worker(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
 
-	lockdep_assert_held(&pool->manager_mutex);
 	lockdep_assert_held(&pool->lock);
 
 	/* sanity check frenzy */
@@ -1833,24 +1857,9 @@ static void destroy_worker(struct worker *worker)
 	pool->nr_workers--;
 	pool->nr_idle--;
 
-	/*
-	 * Once WORKER_DIE is set, the kworker may destroy itself at any
-	 * point.  Pin to ensure the task stays until we're done with it.
-	 */
-	get_task_struct(worker->task);
-
 	list_del_init(&worker->entry);
 	worker->flags |= WORKER_DIE;
-
-	idr_remove(&pool->worker_idr, worker->id);
-
-	spin_unlock_irq(&pool->lock);
-
-	kthread_stop(worker->task);
-	put_task_struct(worker->task);
-	kfree(worker);
-
-	spin_lock_irq(&pool->lock);
+	wake_up_process(worker->task);
 }
 
 static void idle_worker_timeout(unsigned long __pool)
@@ -2289,6 +2298,10 @@ woke_up:
 		spin_unlock_irq(&pool->lock);
 		WARN_ON_ONCE(!list_empty(&worker->entry));
 		worker->task->flags &= ~PF_WQ_WORKER;
+
+		set_task_comm(worker->task, "kworker_dying");
+		worker_detach_from_pool(worker, pool);
+		kfree(worker);
 		return 0;
 	}
 
@@ -3561,6 +3574,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 static void put_unbound_pool(struct worker_pool *pool)
 {
 	struct worker *worker;
+	DECLARE_COMPLETION_ONSTACK(detach_completion);
 
 	lockdep_assert_held(&wq_pool_mutex);
 
@@ -3579,19 +3593,24 @@ static void put_unbound_pool(struct worker_pool *pool)
 
 	/*
 	 * Become the manager and destroy all workers.  Grabbing
-	 * manager_arb prevents @pool's workers from blocking on
-	 * manager_mutex.
+	 * manager_arb ensures manage_workers() finish and enter idle.
 	 */
 	mutex_lock(&pool->manager_arb);
-	mutex_lock(&pool->manager_mutex);
-	spin_lock_irq(&pool->lock);
 
+	spin_lock_irq(&pool->lock);
 	while ((worker = first_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-
 	spin_unlock_irq(&pool->lock);
+
+	mutex_lock(&pool->manager_mutex);
+	if (!idr_is_empty(&pool->worker_idr))
+		pool->detach_completion = &detach_completion;
 	mutex_unlock(&pool->manager_mutex);
+
+	if (pool->detach_completion)
+		wait_for_completion(pool->detach_completion);
+
 	mutex_unlock(&pool->manager_arb);
 
 	/* shut down the timers */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (2 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 03/10 V2] workqueue: async worker destruction Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 21:26       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr Lai Jiangshan
                       ` (6 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Since destroy_worker() no longer needs to sleep nor requires
manager_mutex, it can be called directly from the idle timeout handler.
This lets us remove POOL_MANAGE_WORKERS and maybe_destroy_workers() and
simplify manage_workers().

After POOL_MANAGE_WORKERS is removed, worker_thread() no longer needs to
test whether it should manage the pool after processing works, so that
test branch can be removed as well.
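
For reference, the idle timeout handler after this change reduces to the
following loop (reconstructed from the hunks below):

	static void idle_worker_timeout(unsigned long __pool)
	{
		struct worker_pool *pool = (void *)__pool;

		spin_lock_irq(&pool->lock);
		while (too_many_workers(pool)) {
			struct worker *worker;
			unsigned long expires;

			/* idle_list is LIFO; the tail is the longest idle */
			worker = list_entry(pool->idle_list.prev, struct worker, entry);
			expires = worker->last_active + IDLE_WORKER_TIMEOUT;

			if (time_before(jiffies, expires)) {
				mod_timer(&pool->idle_timer, expires);
				break;
			}

			destroy_worker(worker);
		}
		spin_unlock_irq(&pool->lock);
	}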

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   69 ++++------------------------------------------------
 1 files changed, 5 insertions(+), 64 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 465e751..95695c3 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
 	 * manager_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
-	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
@@ -752,13 +751,6 @@ static bool need_to_create_worker(struct worker_pool *pool)
 	return need_more_worker(pool) && !may_start_working(pool);
 }
 
-/* Do I need to be the manager? */
-static bool need_to_manage_workers(struct worker_pool *pool)
-{
-	return need_to_create_worker(pool) ||
-		(pool->flags & POOL_MANAGE_WORKERS);
-}
-
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
@@ -1867,8 +1859,7 @@ static void idle_worker_timeout(unsigned long __pool)
 	struct worker_pool *pool = (void *)__pool;
 
 	spin_lock_irq(&pool->lock);
-
-	if (too_many_workers(pool)) {
+	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
 
@@ -1876,15 +1867,13 @@ static void idle_worker_timeout(unsigned long __pool)
 		worker = list_entry(pool->idle_list.prev, struct worker, entry);
 		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
 
-		if (time_before(jiffies, expires))
+		if (time_before(jiffies, expires)) {
 			mod_timer(&pool->idle_timer, expires);
-		else {
-			/* it's been idle for too long, wake up manager */
-			pool->flags |= POOL_MANAGE_WORKERS;
-			wake_up_worker(pool);
+			break;
 		}
-	}
 
+		destroy_worker(worker);
+	}
 	spin_unlock_irq(&pool->lock);
 }
 
@@ -2001,44 +1990,6 @@ restart:
 }
 
 /**
- * maybe_destroy_worker - destroy workers which have been idle for a while
- * @pool: pool to destroy workers for
- *
- * Destroy @pool workers which have been idle for longer than
- * IDLE_WORKER_TIMEOUT.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times.  Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_destroy_workers(struct worker_pool *pool)
-{
-	bool ret = false;
-
-	while (too_many_workers(pool)) {
-		struct worker *worker;
-		unsigned long expires;
-
-		worker = list_entry(pool->idle_list.prev, struct worker, entry);
-		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
-
-		if (time_before(jiffies, expires)) {
-			mod_timer(&pool->idle_timer, expires);
-			break;
-		}
-
-		destroy_worker(worker);
-		ret = true;
-	}
-
-	return ret;
-}
-
-/**
  * manage_workers - manage worker pool
  * @worker: self
  *
@@ -2101,13 +2052,6 @@ static bool manage_workers(struct worker *worker)
 		ret = true;
 	}
 
-	pool->flags &= ~POOL_MANAGE_WORKERS;
-
-	/*
-	 * Destroy and then create so that may_start_working() is true
-	 * on return.
-	 */
-	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
 	mutex_unlock(&pool->manager_mutex);
@@ -2349,9 +2293,6 @@ recheck:
 
 	worker_set_flags(worker, WORKER_PREP, false);
 sleep:
-	if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
-		goto recheck;
-
 	/*
 	 * pool->lock is held and there's no work to process and no need to
 	 * manage, sleep.  Workers are woken up only while holding
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (3 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 21:35       ` Tejun Heo
  2014-05-12 21:37       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida Lai Jiangshan
                       ` (5 subsequent siblings)
  10 siblings, 2 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Viresh Kumar, Ingo Molnar

worker_idr has the iteration(iterating for attached workers) and worker ID
duties. These two duties are not necessary tied together. We can separate
them and use a list for tracking attached workers and iteration

After separation, we can add the rescuer workers to the list for iteration
in future. worker_idr can't add rescuer workers due to rescuer workers
can't allocate id from worker_idr.
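
The resulting usage is plain list manipulation (condensed from the
hunks below):

	/* attach, in create_worker(), under pool->manager_mutex */
	list_add_tail(&worker->node, &pool->workers);

	/* iterate, e.g. in wq_unbind_fn() and rebind_workers() */
	list_for_each_entry(worker, &pool->workers, node)
		worker->flags |= WORKER_UNBOUND;

	/* detach, in worker_detach_from_pool() */
	list_del(&worker->node);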

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   39 +++++++++++++--------------------------
 kernel/workqueue_internal.h |    1 +
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 95695c3..b6cf4d9 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,7 +161,8 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs */
+	struct list_head	workers;	/* M: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
@@ -361,22 +362,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 		else
 
 /**
- * for_each_pool_worker - iterate through all workers of a worker_pool
- * @worker: iteration cursor
- * @wi: integer used for iteration
- * @pool: worker_pool to iterate workers of
- *
- * This must be called with @pool->manager_mutex.
- *
- * The if/else clause exists only for the lockdep assertion and can be
- * ignored.
- */
-#define for_each_pool_worker(worker, wi, pool)				\
-	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
-		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
-		else
-
-/**
  * for_each_pwq - iterate through all pool_workqueues of the specified workqueue
  * @pwq: iteration cursor
  * @wq: the target workqueue
@@ -1674,6 +1659,7 @@ static struct worker *alloc_worker(void)
 	if (worker) {
 		INIT_LIST_HEAD(&worker->entry);
 		INIT_LIST_HEAD(&worker->scheduled);
+		INIT_LIST_HEAD(&worker->node);
 		/* on creation a worker is in !idle && prep state */
 		worker->flags = WORKER_PREP;
 	}
@@ -1696,7 +1682,8 @@ static void worker_detach_from_pool(struct worker *worker,
 
 	mutex_lock(&pool->manager_mutex);
 	idr_remove(&pool->worker_idr, worker->id);
-	if (idr_is_empty(&pool->worker_idr))
+	list_del(&worker->node);
+	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
 	mutex_unlock(&pool->manager_mutex);
 
@@ -1772,6 +1759,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	/* successful, commit the pointer to idr */
 	idr_replace(&pool->worker_idr, worker, worker->id);
+	/* successful, attach the worker to the pool */
+	list_add_tail(&worker->node, &pool->workers);
 
 	return worker;
 
@@ -3481,6 +3470,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
 	idr_init(&pool->worker_idr);
+	INIT_LIST_HEAD(&pool->workers);
 
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
@@ -3545,7 +3535,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	spin_unlock_irq(&pool->lock);
 
 	mutex_lock(&pool->manager_mutex);
-	if (!idr_is_empty(&pool->worker_idr))
+	if (!list_empty(&pool->workers))
 		pool->detach_completion = &detach_completion;
 	mutex_unlock(&pool->manager_mutex);
 
@@ -4530,7 +4520,6 @@ static void wq_unbind_fn(struct work_struct *work)
 	int cpu = smp_processor_id();
 	struct worker_pool *pool;
 	struct worker *worker;
-	int wi;
 
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
@@ -4545,7 +4534,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		list_for_each_entry(worker, &pool->workers, node)
 			worker->flags |= WORKER_UNBOUND;
 
 		pool->flags |= POOL_DISASSOCIATED;
@@ -4591,7 +4580,6 @@ static void wq_unbind_fn(struct work_struct *work)
 static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4602,13 +4590,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	list_for_each_entry(worker, &pool->workers, node)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 
 	spin_lock_irq(&pool->lock);
 
-	for_each_pool_worker(worker, wi, pool) {
+	list_for_each_entry(worker, &pool->workers, node) {
 		unsigned int worker_flags = worker->flags;
 
 		/*
@@ -4660,7 +4648,6 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 {
 	static cpumask_t cpumask;
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4674,7 +4661,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 		return;
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
-	for_each_pool_worker(worker, wi, pool)
+	list_for_each_entry(worker, &pool->workers, node)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 }
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..c44abc3 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,6 +37,7 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
+	struct list_head	node;		/* M: attached to the pool */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (4 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 21:40       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 07/10 V2] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
                       ` (4 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

We no longer need to iterate workers via worker_idr; it is used only for
allocating and freeing IDs, so we convert it to worker_ida.

By using ida_simple_get()/ida_simple_remove(), worker_ida is protected
by itself, so we don't need manager_mutex to protect it.  This removes
the coupling and lets us move the ID removal out of
worker_detach_from_pool().
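
The ID allocation/free pair after the conversion is simply (condensed
from the hunks below):

	/* create_worker(): the ID is only needed to name the kthread */
	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
	if (id < 0)
		goto fail;

	/* worker exit path: drop the ID before detaching and freeing */
	ida_simple_remove(&pool->worker_ida, worker->id);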

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b6cf4d9..9f7f4ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,10 +161,11 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs */
 	struct list_head	workers;	/* M: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
+	struct ida		worker_ida;	/* worker IDs for task name */
+
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
@@ -1681,7 +1682,6 @@ static void worker_detach_from_pool(struct worker *worker,
 	struct completion *detach_completion = NULL;
 
 	mutex_lock(&pool->manager_mutex);
-	idr_remove(&pool->worker_idr, worker->id);
 	list_del(&worker->node);
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
@@ -1712,11 +1712,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	lockdep_assert_held(&pool->manager_mutex);
 
-	/*
-	 * ID is needed to determine kthread name.  Allocate ID first
-	 * without installing the pointer.
-	 */
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
+	/* ID is needed to determine kthread name. */
+	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1757,8 +1754,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
 
-	/* successful, commit the pointer to idr */
-	idr_replace(&pool->worker_idr, worker, worker->id);
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
@@ -1766,7 +1761,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 fail:
 	if (id >= 0)
-		idr_remove(&pool->worker_idr, id);
+		ida_simple_remove(&pool->worker_ida, id);
 	kfree(worker);
 	return NULL;
 }
@@ -2233,6 +2228,7 @@ woke_up:
 		worker->task->flags &= ~PF_WQ_WORKER;
 
 		set_task_comm(worker->task, "kworker_dying");
+		ida_simple_remove(&pool->worker_ida, worker->id);
 		worker_detach_from_pool(worker, pool);
 		kfree(worker);
 		return 0;
@@ -3469,9 +3465,9 @@ static int init_worker_pool(struct worker_pool *pool)
 
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
-	idr_init(&pool->worker_idr);
 	INIT_LIST_HEAD(&pool->workers);
 
+	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
 
@@ -3486,7 +3482,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 {
 	struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);
 
-	idr_destroy(&pool->worker_idr);
+	ida_destroy(&pool->worker_ida);
 	free_workqueue_attrs(pool->attrs);
 	kfree(pool);
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 07/10 V2] workqueue: narrow the protection range of manager_mutex
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (5 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

In create_worker(), pool->worker_ida is protected by the ida code itself
via ida_simple_get()/ida_simple_remove(), so it doesn't need
manager_mutex.

The struct worker allocation and the kthread creation are not visible to
anyone else before the worker is attached, so they don't need
manager_mutex either.

The above operations all happen before the attach operation which
attaches the worker to the pool.  Between attaching and starting the
worker, the worker is already attached to the pool, so cpu hotplug will
handle its cpu-binding correctly.  So we don't need manager_mutex after
the attach either.

The conclusion is that only the attach operation needs manager_mutex,
so we narrow its protection range in create_worker().

Some comments about manager_mutex are removed, since it will be renamed
to attach_mutex and a self-documenting worker_attach_to_pool() will be
added later.
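
After the change, create_worker() takes manager_mutex only around the
attach itself (condensed from the hunks below):

	mutex_lock(&pool->manager_mutex);

	/* apply the pool's cpumask; re-applied when any CPU comes up */
	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);

	/* manager_mutex keeps %POOL_DISASSOCIATED stable here */
	if (pool->flags & POOL_DISASSOCIATED)
		worker->flags |= WORKER_UNBOUND;

	/* successful, attach the worker to the pool */
	list_add_tail(&worker->node, &pool->workers);

	mutex_unlock(&pool->manager_mutex);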

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   35 +++++------------------------------
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f7f4ef..663de70 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1710,8 +1710,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	int id = -1;
 	char id_buf[16];
 
-	lockdep_assert_held(&pool->manager_mutex);
-
 	/* ID is needed to determine kthread name. */
 	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
@@ -1740,6 +1738,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
+	mutex_lock(&pool->manager_mutex);
+
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
@@ -1747,7 +1747,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The caller is responsible for ensuring %POOL_DISASSOCIATED
+	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1757,6 +1757,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
+	mutex_unlock(&pool->manager_mutex);
+
 	return worker;
 
 fail:
@@ -1794,8 +1796,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	mutex_lock(&pool->manager_mutex);
-
 	worker = create_worker(pool);
 	if (worker) {
 		spin_lock_irq(&pool->lock);
@@ -1803,8 +1803,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 		spin_unlock_irq(&pool->lock);
 	}
 
-	mutex_unlock(&pool->manager_mutex);
-
 	return worker ? 0 : -ENOMEM;
 }
 
@@ -2002,8 +2000,6 @@ static bool manage_workers(struct worker *worker)
 	bool ret = false;
 
 	/*
-	 * Managership is governed by two mutexes - manager_arb and
-	 * manager_mutex.  manager_arb handles arbitration of manager role.
 	 * Anyone who successfully grabs manager_arb wins the arbitration
 	 * and becomes the manager.  mutex_trylock() on pool->manager_arb
 	 * failure while holding pool->lock reliably indicates that someone
@@ -2012,33 +2008,12 @@ static bool manage_workers(struct worker *worker)
 	 * grabbing manager_arb is responsible for actually performing
 	 * manager duties.  If manager_arb is grabbed and released without
 	 * actual management, the pool may stall indefinitely.
-	 *
-	 * manager_mutex is used for exclusion of actual management
-	 * operations.  The holder of manager_mutex can be sure that none
-	 * of management operations, including creation and destruction of
-	 * workers, won't take place until the mutex is released.  Because
-	 * manager_mutex doesn't interfere with manager role arbitration,
-	 * it is guaranteed that the pool's management, while may be
-	 * delayed, won't be disturbed by someone else grabbing
-	 * manager_mutex.
 	 */
 	if (!mutex_trylock(&pool->manager_arb))
 		return ret;
 
-	/*
-	 * With manager arbitration won, manager_mutex would be free in
-	 * most cases.  trylock first without dropping @pool->lock.
-	 */
-	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
-		spin_unlock_irq(&pool->lock);
-		mutex_lock(&pool->manager_mutex);
-		spin_lock_irq(&pool->lock);
-		ret = true;
-	}
-
 	ret |= maybe_create_worker(pool);
 
-	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
 	return ret;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (6 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 07/10 V2] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 22:01       ` Tejun Heo
  2014-05-12  6:56     ` [PATCH 09/10 V2] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Viresh Kumar, Ingo Molnar

manager_mutex is only used to protect the attaching for the pool
and the pool->workers list. It protects the pool->workers and operations
based on this list, such as:
	cpu-binding for the workers in the pool->workers
	concurrency management for the workers in the pool->workers

So we can simply rename manager_mutex to attach_mutex without any
functional change.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   38 +++++++++++++++++++-------------------
 kernel/workqueue_internal.h |    2 +-
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 663de70..e6d9725 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -65,7 +65,7 @@ enum {
 	 * be executing on any CPU.  The pool behaves as an unbound one.
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
-	 * manager_mutex to avoid changing binding state while
+	 * attach_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
@@ -122,7 +122,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * M: pool->manager_mutex protected.
+ * A: pool->attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -160,8 +160,8 @@ struct worker_pool {
 
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
-	struct mutex		manager_mutex;	/* manager exclusion */
-	struct list_head	workers;	/* M: attached workers */
+	struct mutex		attach_mutex;	/* attach/detach exclusion */
+	struct list_head	workers;	/* A: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
 	struct ida		worker_ida;	/* worker IDs for task name */
@@ -1681,11 +1681,11 @@ static void worker_detach_from_pool(struct worker *worker,
 {
 	struct completion *detach_completion = NULL;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 	list_del(&worker->node);
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	if (detach_completion)
 		complete(detach_completion);
@@ -1738,7 +1738,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1747,7 +1747,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
+	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1757,7 +1757,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	return worker;
 
@@ -3439,7 +3439,7 @@ static int init_worker_pool(struct worker_pool *pool)
 		    (unsigned long)pool);
 
 	mutex_init(&pool->manager_arb);
-	mutex_init(&pool->manager_mutex);
+	mutex_init(&pool->attach_mutex);
 	INIT_LIST_HEAD(&pool->workers);
 
 	ida_init(&pool->worker_ida);
@@ -3505,10 +3505,10 @@ static void put_unbound_pool(struct worker_pool *pool)
 	WARN_ON(pool->nr_workers || pool->nr_idle);
 	spin_unlock_irq(&pool->lock);
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 	if (!list_empty(&pool->workers))
 		pool->detach_completion = &detach_completion;
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	if (pool->detach_completion)
 		wait_for_completion(pool->detach_completion);
@@ -4495,11 +4495,11 @@ static void wq_unbind_fn(struct work_struct *work)
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
 
-		mutex_lock(&pool->manager_mutex);
+		mutex_lock(&pool->attach_mutex);
 		spin_lock_irq(&pool->lock);
 
 		/*
-		 * We've blocked all manager operations.  Make all workers
+		 * We've blocked all attach/detach operations. Make all workers
 		 * unbound and set DISASSOCIATED.  Before this, all workers
 		 * except for the ones which are still executing works from
 		 * before the last CPU down must be on the cpu.  After
@@ -4511,7 +4511,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		pool->flags |= POOL_DISASSOCIATED;
 
 		spin_unlock_irq(&pool->lock);
-		mutex_unlock(&pool->manager_mutex);
+		mutex_unlock(&pool->attach_mutex);
 
 		/*
 		 * Call schedule() so that we cross rq->lock and thus can
@@ -4552,7 +4552,7 @@ static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->attach_mutex);
 
 	/*
 	 * Restore CPU affinity of all workers.  As all idle workers should
@@ -4620,7 +4620,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	static cpumask_t cpumask;
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->attach_mutex);
 
 	/* is @cpu allowed for @pool? */
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
@@ -4665,7 +4665,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 		mutex_lock(&wq_pool_mutex);
 
 		for_each_pool(pool, pi) {
-			mutex_lock(&pool->manager_mutex);
+			mutex_lock(&pool->attach_mutex);
 
 			if (pool->cpu == cpu) {
 				spin_lock_irq(&pool->lock);
@@ -4677,7 +4677,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 				restore_unbound_workers_cpumask(pool, cpu);
 			}
 
-			mutex_unlock(&pool->manager_mutex);
+			mutex_unlock(&pool->attach_mutex);
 		}
 
 		/* update NUMA affinity of unbound workqueues */
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index c44abc3..2acb41b 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,7 +37,7 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
-	struct list_head	node;		/* M: attached to the pool */
+	struct list_head	node;		/* A: attached to the pool */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 09/10 V2] workqueue: separate pool-attaching code out from create_worker()
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (7 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12  6:56     ` [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

The attach code is currently open-coded in create_worker().  Separating
it out makes the code clearer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   54 ++++++++++++++++++++++++++++++++-------------------
 1 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e6d9725..0ea0152 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
 	 * attach_mutex to avoid changing binding state while
-	 * create_worker() is in progress.
+	 * worker_attach_to_pool() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
@@ -1668,6 +1668,38 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_attach_to_pool() - attach the worker to the pool
+ * @worker: worker to be attached
+ * @pool: the target pool
+ *
+ * attach the worker to the pool, thus the concurrency and cpu-binding of
+ * the worker are kept coordination with the pool across cpu-[un]hotplug.
+ */
+static void worker_attach_to_pool(struct worker *worker,
+				   struct worker_pool *pool)
+{
+	mutex_lock(&pool->attach_mutex);
+
+	/*
+	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 */
+	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+
+	/*
+	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
+	 * stable across this function.  See the comments above the
+	 * flag definition for details.
+	 */
+	if (pool->flags & POOL_DISASSOCIATED)
+		worker->flags |= WORKER_UNBOUND;
+
+	list_add_tail(&worker->node, &pool->workers);
+
+	mutex_unlock(&pool->attach_mutex);
+}
+
+/**
  * worker_detach_from_pool() - detach the worker from the pool
  * @worker: worker which is attached to its pool
  * @pool: attached pool
@@ -1738,26 +1770,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->attach_mutex);
-
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
-	/*
-	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED
-	 * remains stable across this function.  See the comments above the
-	 * flag definition for details.
-	 */
-	if (pool->flags & POOL_DISASSOCIATED)
-		worker->flags |= WORKER_UNBOUND;
-
 	/* successful, attach the worker to the pool */
-	list_add_tail(&worker->node, &pool->workers);
-
-	mutex_unlock(&pool->attach_mutex);
+	worker_attach_to_pool(worker, pool);
 
 	return worker;
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (8 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 09/10 V2] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
@ 2014-05-12  6:56     ` Lai Jiangshan
  2014-05-12 22:05       ` Tejun Heo
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  10 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-12  6:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

There are several problems with the code that binds rescuers to the
pool's cpumask:
  1) It binds to the cpumask in a way different from normal workers,
     so we can't maintain normal and rescuer workers under the same
     framework.
  2) The cpu-binding code for rescuers is complicated.
  3) If one or more CPU hotplugs happen while the rescuer processes
     the scheduled works, the rescuer may not end up correctly bound
     to the pool's cpumask.  This behavior is allowed, but it is not
     good; it would be better if the rescuer's cpumask were always
     kept in sync with the pool across CPU hotplugs.

Using the generic attach/detach routines solves the above problems and
results in much simpler code.
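
With that, the rescuer's main loop follows the same attach/detach
pattern as normal workers (condensed from the hunk below; the mayday
list handling is unchanged and omitted here):

	worker_attach_to_pool(rescuer, pool);

	spin_lock_irq(&pool->lock);
	rescuer->pool = pool;
	/* ... move mayday works onto rescuer->scheduled ... */
	process_scheduled_works(rescuer);
	spin_unlock_irq(&pool->lock);

	worker_detach_from_pool(rescuer, pool);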

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   74 +++++----------------------------------------------
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0ea0152..099f02c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1588,70 +1588,6 @@ static void worker_leave_idle(struct worker *worker)
 	list_del_init(&worker->entry);
 }
 
-/**
- * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
- * @pool: target worker_pool
- *
- * Bind %current to the cpu of @pool if it is associated and lock @pool.
- *
- * Works which are scheduled while the cpu is online must at least be
- * scheduled to a worker which is bound to the cpu so that if they are
- * flushed from cpu callbacks while cpu is going down, they are
- * guaranteed to execute on the cpu.
- *
- * This function is to be used by unbound workers and rescuers to bind
- * themselves to the target cpu and may race with cpu going down or
- * coming online.  kthread_bind() can't be used because it may put the
- * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
- * verbatim as it's best effort and blocking and pool may be
- * [dis]associated in the meantime.
- *
- * This function tries set_cpus_allowed() and locks pool and verifies the
- * binding against %POOL_DISASSOCIATED which is set during
- * %CPU_DOWN_PREPARE and cleared during %CPU_ONLINE, so if the worker
- * enters idle state or fetches works without dropping lock, it can
- * guarantee the scheduling requirement described in the first paragraph.
- *
- * CONTEXT:
- * Might sleep.  Called without any lock but returns with pool->lock
- * held.
- *
- * Return:
- * %true if the associated pool is online (@worker is successfully
- * bound), %false if offline.
- */
-static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
-__acquires(&pool->lock)
-{
-	while (true) {
-		/*
-		 * The following call may fail, succeed or succeed
-		 * without actually migrating the task to the cpu if
-		 * it races with cpu hotunplug operation.  Verify
-		 * against POOL_DISASSOCIATED.
-		 */
-		if (!(pool->flags & POOL_DISASSOCIATED))
-			set_cpus_allowed_ptr(current, pool->attrs->cpumask);
-
-		spin_lock_irq(&pool->lock);
-		if (pool->flags & POOL_DISASSOCIATED)
-			return false;
-		if (task_cpu(current) == pool->cpu &&
-		    cpumask_equal(&current->cpus_allowed, pool->attrs->cpumask))
-			return true;
-		spin_unlock_irq(&pool->lock);
-
-		/*
-		 * We've raced with CPU hot[un]plug.  Give it a breather
-		 * and retry migration.  cond_resched() is required here;
-		 * otherwise, we might deadlock against cpu_stop trying to
-		 * bring down the CPU on non-preemptive kernel.
-		 */
-		cpu_relax();
-		cond_resched();
-	}
-}
-
 static struct worker *alloc_worker(void)
 {
 	struct worker *worker;
@@ -2343,8 +2279,9 @@ repeat:
 
 		spin_unlock_irq(&wq_mayday_lock);
 
-		/* migrate to the target cpu if possible */
-		worker_maybe_bind_and_lock(pool);
+		worker_attach_to_pool(rescuer, pool);
+
+		spin_lock_irq(&pool->lock);
 		rescuer->pool = pool;
 
 		/*
@@ -2357,6 +2294,11 @@ repeat:
 				move_linked_works(work, scheduled, &n);
 
 		process_scheduled_works(rescuer);
+		spin_unlock_irq(&pool->lock);
+
+		worker_detach_from_pool(rescuer, pool);
+
+		spin_lock_irq(&pool->lock);
 
 		/*
 		 * Put the reference grabbed by send_mayday().  @pool won't
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only
  2014-05-12  6:56     ` [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
@ 2014-05-12 21:08       ` Tejun Heo
  2014-05-13  6:08         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:08 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Mon, May 12, 2014 at 02:56:14PM +0800, Lai Jiangshan wrote:
> @@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
>   * create_worker - create a new workqueue worker
>   * @pool: pool the new worker will belong to
>   *
> - * Create a new worker which is bound to @pool.  The returned worker
> - * can be started by calling start_worker() or destroyed using
> - * destroy_worker().
> + * Create a new worker which is attached to @pool.
> + * The new worker must be started and enter idle via start_worker().

Please always fill the comment paragraphs to 75 columns or so.  Also,
do we even need start_worker() separate anymore?  Maybe we can just
fold alloc_and_create_worker() into alloc_worker()?

> @@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
>   * @worker: worker to be destroyed
>   *
>   * Destroy @worker and adjust @pool stats accordingly.
> + * The worker should be idle.

Ditto about filling.

Looks good otherwise.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10 V2] workqueue: async worker destruction
  2014-05-12  6:56     ` [PATCH 03/10 V2] workqueue: async worker destruction Lai Jiangshan
@ 2014-05-12 21:20       ` Tejun Heo
  2014-05-13  6:32         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:20 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Mon, May 12, 2014 at 02:56:15PM +0800, Lai Jiangshan wrote:
>  /**
> + * worker_detach_from_pool() - detach the worker from the pool
> + * @worker: worker which is attached to its pool
> + * @pool: attached pool
> + *
> + * Undo the attaching which had been done in create_worker().
> + * The caller worker shouldn't access to the pool after detached
> + * except it has other reference to the pool.
> + */
> +static void worker_detach_from_pool(struct worker *worker,
> +				    struct worker_pool *pool)
> +{
> +	struct completion *detach_completion = NULL;
> +
> +	mutex_lock(&pool->manager_mutex);
> +	idr_remove(&pool->worker_idr, worker->id);
> +	if (idr_is_empty(&pool->worker_idr))
> +		detach_completion = pool->detach_completion;
> +	mutex_unlock(&pool->manager_mutex);
> +
> +	if (detach_completion)
> +		complete(detach_completion);
> +}

Are we gonna use this function from somewhere else too?

> @@ -2289,6 +2298,10 @@ woke_up:
>  		spin_unlock_irq(&pool->lock);
>  		WARN_ON_ONCE(!list_empty(&worker->entry));
>  		worker->task->flags &= ~PF_WQ_WORKER;
> +
> +		set_task_comm(worker->task, "kworker_dying");

Given how other kworkers are named, maybe a better name is
"kworker/dying" or "kworker/detached"?

> +		worker_detach_from_pool(worker, pool);
> +		kfree(worker);
>  		return 0;
>  	}
>  
> @@ -3561,6 +3574,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
>  static void put_unbound_pool(struct worker_pool *pool)
>  {
>  	struct worker *worker;
> +	DECLARE_COMPLETION_ONSTACK(detach_completion);

I think it's conventional to put initialized ones (especially the ones
that require initializing macros) before uninitialized vars.

> @@ -3579,19 +3593,24 @@ static void put_unbound_pool(struct worker_pool *pool)
>  
>  	/*
>  	 * Become the manager and destroy all workers.  Grabbing
> -	 * manager_arb prevents @pool's workers from blocking on
> -	 * manager_mutex.
> +	 * manager_arb ensures manage_workers() finish and enter idle.

I don't follow what the above comment update is trying to say.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler
  2014-05-12  6:56     ` [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
@ 2014-05-12 21:26       ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:26 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Mon, May 12, 2014 at 02:56:16PM +0800, Lai Jiangshan wrote:
> @@ -1867,8 +1859,7 @@ static void idle_worker_timeout(unsigned long __pool)
>  	struct worker_pool *pool = (void *)__pool;
>  
>  	spin_lock_irq(&pool->lock);
> -
> -	if (too_many_workers(pool)) {
> +	while (too_many_workers(pool)) {

Hey, I like the new line there.

>  		struct worker *worker;
>  		unsigned long expires;
>  
> @@ -1876,15 +1867,13 @@ static void idle_worker_timeout(unsigned long __pool)
>  		worker = list_entry(pool->idle_list.prev, struct worker, entry);
>  		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
>  
> -		if (time_before(jiffies, expires))
> +		if (time_before(jiffies, expires)) {
>  			mod_timer(&pool->idle_timer, expires);
> -		else {
> -			/* it's been idle for too long, wake up manager */
> -			pool->flags |= POOL_MANAGE_WORKERS;
> -			wake_up_worker(pool);
> +			break;
>  		}
> -	}
>  
> +		destroy_worker(worker);
> +	}

here too.

Other than the above new line removals, looks great to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr
  2014-05-12  6:56     ` [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr Lai Jiangshan
@ 2014-05-12 21:35       ` Tejun Heo
  2014-05-12 21:37       ` Tejun Heo
  1 sibling, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:35 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Peter Zijlstra, Viresh Kumar, Ingo Molnar

On Mon, May 12, 2014 at 02:56:17PM +0800, Lai Jiangshan wrote:
> worker_idr has the iteration(iterating for attached workers) and worker ID
                              ^
Please put a space before an opening parenthesis.  It becomes a lot
easier on the eyes.

> duties. These two duties are not necessary tied together. We can separate
                                   necessarily

> them and use a list for tracking attached workers and iteration
> 
> After separation, we can add the rescuer workers to the list for iteration
> in future. worker_idr can't add rescuer workers due to rescuer workers
> can't allocate id from worker_idr.

Explaining how that'd be beneficial and justifies this change would be
nice.

...
>  /**
> - * for_each_pool_worker - iterate through all workers of a worker_pool
> - * @worker: iteration cursor
> - * @wi: integer used for iteration
> - * @pool: worker_pool to iterate workers of
> - *
> - * This must be called with @pool->manager_mutex.
> - *
> - * The if/else clause exists only for the lockdep assertion and can be
> - * ignored.
> - */
> -#define for_each_pool_worker(worker, wi, pool)				\
> -	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
> -		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
> -		else

Hmmm.... don't we still want the lockdep protection?

> @@ -1772,6 +1759,8 @@ static struct worker *create_worker(struct worker_pool *pool)
>  
>  	/* successful, commit the pointer to idr */

	/* successful, commit it to idr and link on the workers list */

>  	idr_replace(&pool->worker_idr, worker, worker->id);
> +	/* successful, attach the worker to the pool */

and lose the above line.

> diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
> index 7e2204d..c44abc3 100644
> --- a/kernel/workqueue_internal.h
> +++ b/kernel/workqueue_internal.h
> @@ -37,6 +37,7 @@ struct worker {
>  	struct task_struct	*task;		/* I: worker task */
>  	struct worker_pool	*pool;		/* I: the associated pool */
>  						/* L: for rescuers */
> +	struct list_head	node;		/* M: attached to the pool */

						/* M: anchored at pool->workers */

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr
  2014-05-12  6:56     ` [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr Lai Jiangshan
  2014-05-12 21:35       ` Tejun Heo
@ 2014-05-12 21:37       ` Tejun Heo
  1 sibling, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:37 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Peter Zijlstra, Viresh Kumar, Ingo Molnar

Oops, one more thing.

On Mon, May 12, 2014 at 02:56:17PM +0800, Lai Jiangshan wrote:
> -	struct idr		worker_idr;	/* M: worker IDs and iteration */
> +	struct idr		worker_idr;	/* M: worker IDs */
> +	struct list_head	workers;	/* M: attached workers */

Something like the following is probably more useful.

						/* M: runs through worker->node */

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida
  2014-05-12  6:56     ` [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida Lai Jiangshan
@ 2014-05-12 21:40       ` Tejun Heo
  2014-05-13  6:43         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 21:40 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Mon, May 12, 2014 at 02:56:18PM +0800, Lai Jiangshan wrote:
> @@ -1681,7 +1682,6 @@ static void worker_detach_from_pool(struct worker *worker,
>  	struct completion *detach_completion = NULL;
>  
>  	mutex_lock(&pool->manager_mutex);
> -	idr_remove(&pool->worker_idr, worker->id);
>  	list_del(&worker->node);
>  	if (list_empty(&pool->workers))
>  		detach_completion = pool->detach_completion;

Why are we moving ida removal to the caller here?  Does
worker_detach_from_pool() get used for something else later?  Up until
this point, the distinction seems pretty arbitrary.

> @@ -1757,8 +1754,6 @@ static struct worker *create_worker(struct worker_pool *pool)
>  	if (pool->flags & POOL_DISASSOCIATED)
>  		worker->flags |= WORKER_UNBOUND;
>  
> -	/* successful, commit the pointer to idr */
> -	idr_replace(&pool->worker_idr, worker, worker->id);

Ah, the comment is removed altogether.  Please disregard my previous
review on the comment.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex
  2014-05-12  6:56     ` [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
@ 2014-05-12 22:01       ` Tejun Heo
  2014-05-13  6:45         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 22:01 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Peter Zijlstra, Viresh Kumar, Ingo Molnar

On Mon, May 12, 2014 at 02:56:20PM +0800, Lai Jiangshan wrote:
> manager_mutex is only used to protect the attaching for the pool
> and the pool->workers list. It protects the pool->workers and operations
> based on this list, such as:
> 	cpu-binding for the workers in the pool->workers
> 	concurrency management for the workers in the pool->workers

Concurrency management?  What are you referring to?

thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers
  2014-05-12  6:56     ` [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
@ 2014-05-12 22:05       ` Tejun Heo
  2014-05-13  6:55         ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-12 22:05 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Mon, May 12, 2014 at 02:56:22PM +0800, Lai Jiangshan wrote:
> There are several problems with the code that rescuers bind itself to the pool'
> cpumask
>   1) It uses a way different from the normal workers to bind to the cpumask
>      So we can't maintain the normal/rescuer workers under the same framework.
>   2) The the code of cpu-binding for rescuer is complicated
>   3) If one or more cpuhotplugs happen while the rescuer processes the
>      scheduled works, the rescuer may not be correctly bound to the cpumask of
>      the pool. This is allowed behavior, but is not good. It will be better
>      if the cpumask of the rescuer is always kept coordination with the pool
>      across any cpuhotplugs.
> 
> Using generic attach/detach routine will solve the above problems,
> and result much more simple code.
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>  static struct worker *alloc_worker(void)
>  {
>  	struct worker *worker;
> @@ -2343,8 +2279,9 @@ repeat:
>  
>  		spin_unlock_irq(&wq_mayday_lock);
>  
> -		/* migrate to the target cpu if possible */
> -		worker_maybe_bind_and_lock(pool);
> +		worker_attach_to_pool(rescuer, pool);
> +
> +		spin_lock_irq(&pool->lock);
>  		rescuer->pool = pool;
>  
>  		/*
> @@ -2357,6 +2294,11 @@ repeat:
>  				move_linked_works(work, scheduled, &n);
>  
>  		process_scheduled_works(rescuer);
> +		spin_unlock_irq(&pool->lock);
> +
> +		worker_detach_from_pool(rescuer, pool);
> +
> +		spin_lock_irq(&pool->lock);

Ah, right, this is how it's used.  Yeah, it makes sense.  In a long
patchset, it usually helps to mention your intentions when structuring
functions tho.  When you're separating out detach_from_pool, just
mention that the function will later be used to make rescuers use the
same attach/detach framework as normal workers.

How has this been tested?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only
  2014-05-12 21:08       ` Tejun Heo
@ 2014-05-13  6:08         ` Lai Jiangshan
  0 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-13  6:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/13/2014 05:08 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:56:14PM +0800, Lai Jiangshan wrote:
>> @@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
>>   * create_worker - create a new workqueue worker
>>   * @pool: pool the new worker will belong to
>>   *
>> - * Create a new worker which is bound to @pool.  The returned worker
>> - * can be started by calling start_worker() or destroyed using
>> - * destroy_worker().
>> + * Create a new worker which is attached to @pool.
>> + * The new worker must be started and enter idle via start_worker().
> 
> Please always fill the comment paragarphs to 75 column or so.  Also,

> do we even need start_worker() separate anymore?  Maybe we can just
> fold alloc_and_create_worker() into alloc_worker()?

We should do this, I think, but it is just a cleanup; I will do it after
this core patchset is accepted.

> 
>> @@ -1815,6 +1812,7 @@ static int create_and_start_worker(struct worker_pool *pool)
>>   * @worker: worker to be destroyed
>>   *
>>   * Destroy @worker and adjust @pool stats accordingly.
>> + * The worker should be idle.
> 
> Ditto about filling.
> 
> Looks good otherwise.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10 V2] workqueue: async worker destruction
  2014-05-12 21:20       ` Tejun Heo
@ 2014-05-13  6:32         ` Lai Jiangshan
  2014-05-13 14:14           ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-13  6:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/13/2014 05:20 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:56:15PM +0800, Lai Jiangshan wrote:
>>  /**
>> + * worker_detach_from_pool() - detach the worker from the pool
>> + * @worker: worker which is attached to its pool
>> + * @pool: attached pool
>> + *
>> + * Undo the attaching which had been done in create_worker().
>> + * The caller worker shouldn't access to the pool after detached
>> + * except it has other reference to the pool.
>> + */
>> +static void worker_detach_from_pool(struct worker *worker,
>> +				    struct worker_pool *pool)
>> +{
>> +	struct completion *detach_completion = NULL;
>> +
>> +	mutex_lock(&pool->manager_mutex);
>> +	idr_remove(&pool->worker_idr, worker->id);
>> +	if (idr_is_empty(&pool->worker_idr))
>> +		detach_completion = pool->detach_completion;
>> +	mutex_unlock(&pool->manager_mutex);
>> +
>> +	if (detach_completion)
>> +		complete(detach_completion);
>> +}
> 
> Are we gonna use this function from somewhere else too?

It is called from worker_thread().

I don't want to unfold it into worker_thread(); readability is better
when it is wrapped, and it will be called in patch 10 for the rescuer.

> 
>> @@ -2289,6 +2298,10 @@ woke_up:
>>  		spin_unlock_irq(&pool->lock);
>>  		WARN_ON_ONCE(!list_empty(&worker->entry));
>>  		worker->task->flags &= ~PF_WQ_WORKER;
>> +
>> +		set_task_comm(worker->task, "kworker_dying");
> 
> Given how other kworkers are named, maybe a better name is
> "kworker/dying" or "kworker/detached"?
> 
>> +		worker_detach_from_pool(worker, pool);
>> +		kfree(worker);
>>  		return 0;
>>  	}
>>  
>> @@ -3561,6 +3574,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
>>  static void put_unbound_pool(struct worker_pool *pool)
>>  {
>>  	struct worker *worker;
>> +	DECLARE_COMPLETION_ONSTACK(detach_completion);
> 
> I think it's conventional to put initialized ones (especially the ones
> require initializing macros) before uninitialized vars.
> 
>> @@ -3579,19 +3593,24 @@ static void put_unbound_pool(struct worker_pool *pool)
>>  
>>  	/*
>>  	 * Become the manager and destroy all workers.  Grabbing
>> -	 * manager_arb prevents @pool's workers from blocking on
>> -	 * manager_mutex.
>> +	 * manager_arb ensures manage_workers() finish and enter idle.
> 
> I don't follow what the above comment update is trying to say.

If a pool is being destroyed, its workers will not call manage_workers(),
but an already running manage_workers() may still be in progress.

mutex_lock(&manager_arb) in put_unbound_pool() has to wait for this
manage_workers() to finish, because the managing worker (a non-idle worker)
can't be destroyed.
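Roughly, the ordering being relied on looks like this (a condensed sketch of
put_unbound_pool() as changed by patch 3; error handling and unrelated
locking omitted):

	/* put_unbound_pool(), condensed sketch */
	mutex_lock(&pool->manager_arb);		/* waits for any in-flight manage_workers() to return */

	spin_lock_irq(&pool->lock);
	while ((worker = first_worker(pool)))	/* all remaining workers are idle now */
		destroy_worker(worker);
	WARN_ON(pool->nr_workers || pool->nr_idle);
	spin_unlock_irq(&pool->lock);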

> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida
  2014-05-12 21:40       ` Tejun Heo
@ 2014-05-13  6:43         ` Lai Jiangshan
  2014-05-13 14:17           ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-13  6:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/13/2014 05:40 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:56:18PM +0800, Lai Jiangshan wrote:
>> @@ -1681,7 +1682,6 @@ static void worker_detach_from_pool(struct worker *worker,
>>  	struct completion *detach_completion = NULL;
>>  
>>  	mutex_lock(&pool->manager_mutex);
>> -	idr_remove(&pool->worker_idr, worker->id);
>>  	list_del(&worker->node);
>>  	if (list_empty(&pool->workers))
>>  		detach_completion = pool->detach_completion;
> 
> Why are we moving ida removal to the caller here?  Does

ida							is for worker ID
pool->workers list and worker_detach_from_pool()	are for attaching/detaching

Moving the ida removal to the caller removes this unneeded coupling.
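With the ida removal moved to the caller, the exit path ends up looking
roughly like this (condensed from the later patch in this series):

	/* worker_thread() exit path, condensed sketch */
	set_task_comm(worker->task, "kworker/dying");
	ida_simple_remove(&pool->worker_ida, worker->id);	/* ID handling stays with the caller */
	worker_detach_from_pool(worker, pool);			/* detaching only handles pool->workers */
	kfree(worker);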

> worker_detach_from_pool() get used for something else later?  Up until
> this point, the distinction seems pretty arbitrary.
> 
>> @@ -1757,8 +1754,6 @@ static struct worker *create_worker(struct worker_pool *pool)
>>  	if (pool->flags & POOL_DISASSOCIATED)
>>  		worker->flags |= WORKER_UNBOUND;
>>  
>> -	/* successful, commit the pointer to idr */
>> -	idr_replace(&pool->worker_idr, worker, worker->id);
> 
> Ah, the comment is removed altogether.  Please disregard my previous
> review on the comment.
> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex
  2014-05-12 22:01       ` Tejun Heo
@ 2014-05-13  6:45         ` Lai Jiangshan
  2014-05-13 14:19           ` Tejun Heo
  0 siblings, 1 reply; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-13  6:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Peter Zijlstra, Viresh Kumar, Ingo Molnar

On 05/13/2014 06:01 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:56:20PM +0800, Lai Jiangshan wrote:
>> manager_mutex is only used to protect the attaching for the pool
>> and the pool->workers list. It protects the pool->workers and operations
>> based on this list, such as:
>> 	cpu-binding for the workers in the pool->workers
>> 	concurrency management for the workers in the pool->workers
> 
> Concurrency management?  What are you referring to?

The operations that set/clear POOL_DISASSOCIATED and WORKER_UNBOUND.

Which wording would be better if I don't use "concurrency management"?

> 
> thanks.
> 


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers
  2014-05-12 22:05       ` Tejun Heo
@ 2014-05-13  6:55         ` Lai Jiangshan
  0 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-13  6:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/13/2014 06:05 AM, Tejun Heo wrote:
> On Mon, May 12, 2014 at 02:56:22PM +0800, Lai Jiangshan wrote:
>> There are several problems with the code with which a rescuer binds itself to the pool's
>> cpumask:
>>   1) It uses a way different from the normal workers to bind to the cpumask
>>      So we can't maintain the normal/rescuer workers under the same framework.
>>   2) The code of cpu-binding for the rescuer is complicated.
>>   3) If one or more cpuhotplugs happen while the rescuer processes the
>>      scheduled works, the rescuer may not be correctly bound to the cpumask of
>>      the pool. This is allowed behavior, but is not good. It would be better
>>      if the cpumask of the rescuer were always kept in sync with the pool's
>>      across any cpuhotplugs.
>>
>> Using the generic attach/detach routine will solve the above problems
>> and result in much simpler code.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>>  static struct worker *alloc_worker(void)
>>  {
>>  	struct worker *worker;
>> @@ -2343,8 +2279,9 @@ repeat:
>>  
>>  		spin_unlock_irq(&wq_mayday_lock);
>>  
>> -		/* migrate to the target cpu if possible */
>> -		worker_maybe_bind_and_lock(pool);
>> +		worker_attach_to_pool(rescuer, pool);
>> +
>> +		spin_lock_irq(&pool->lock);
>>  		rescuer->pool = pool;
>>  
>>  		/*
>> @@ -2357,6 +2294,11 @@ repeat:
>>  				move_linked_works(work, scheduled, &n);
>>  
>>  		process_scheduled_works(rescuer);
>> +		spin_unlock_irq(&pool->lock);
>> +
>> +		worker_detach_from_pool(rescuer, pool);
>> +
>> +		spin_lock_irq(&pool->lock);
> 
> Ah, right, this is how it's used.  Yeah, it makes sense.  In a long
> patchset, it usually helps to mention your intentions when structuring
> functions tho.  When you're separating out detach_from_pool, just
> mention that the function will later be used to make rescuers use the

In patch 3, the intention for separating out detach_from_pool() is
only async destruction, and it is called from worker_thread().
^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10 V2] workqueue: async worker destruction
  2014-05-13  6:32         ` Lai Jiangshan
@ 2014-05-13 14:14           ` Tejun Heo
  2014-05-20  9:54             ` Lai Jiangshan
  0 siblings, 1 reply; 78+ messages in thread
From: Tejun Heo @ 2014-05-13 14:14 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Tue, May 13, 2014 at 02:32:52PM +0800, Lai Jiangshan wrote:
> >> +	if (detach_completion)
> >> +		complete(detach_completion);
> >> +}
> > 
> > Are we gonna use this function from somewhere else too?
> 
> it is called from worker_thread().
> 
> I don't want to unfold it into worker_thread(), it is better
> readability when it is wrapped and it will be called in patch10
> for rescuer.

Yeah, it's shared by rescuer later, so it's fine but, again, it
probably helps to mention that it's planned to do so; otherwise, the
rationale is kinda weak and what belongs to that function is rather
arbitrary.

> >>  	/*
> >>  	 * Become the manager and destroy all workers.  Grabbing
> >> -	 * manager_arb prevents @pool's workers from blocking on
> >> -	 * manager_mutex.
> >> +	 * manager_arb ensures manage_workers() finish and enter idle.
> > 
> > I don't follow what the above comment update is trying to say.
> 
> If a pool is being destroyed, its workers will not call manage_workers(),
> but an already running manage_workers() may still be in progress.
> 
> mutex_lock(&manager_arb) in put_unbound_pool() has to wait for this
> manage_workers() to finish, because the managing worker (a non-idle worker)
> can't be destroyed.

Hmmm... I think it'd be better to spell it out then.  The single
sentence is kinda cryptic, especially because the two verbs in the
sentence don't have the same subject (manage_workers() can't enter
idle).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida
  2014-05-13  6:43         ` Lai Jiangshan
@ 2014-05-13 14:17           ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-13 14:17 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Tue, May 13, 2014 at 02:43:06PM +0800, Lai Jiangshan wrote:
> On 05/13/2014 05:40 AM, Tejun Heo wrote:
> > On Mon, May 12, 2014 at 02:56:18PM +0800, Lai Jiangshan wrote:
> >> @@ -1681,7 +1682,6 @@ static void worker_detach_from_pool(struct worker *worker,
> >>  	struct completion *detach_completion = NULL;
> >>  
> >>  	mutex_lock(&pool->manager_mutex);
> >> -	idr_remove(&pool->worker_idr, worker->id);
> >>  	list_del(&worker->node);
> >>  	if (list_empty(&pool->workers))
> >>  		detach_completion = pool->detach_completion;
> > 
> > Why are we moving ida removal to the caller here?  Does
> 
> ida							is for worker ID
> pool->workers list and worker_detach_from_pool()	are for attaching/detaching
> 
> Moving the ida removal to the caller removes this unneeded coupling.

This is completely arbitrary.  For example, if rescuers needed IDs to
be allocated and deallocated on attach/detach, ID allocation should be
included in the above two functions, right?  This makes sense only
because rescuers don't have IDs and we're gonna use the above
functions for the rescuers too.  There's nothing inherent in
decoupling a worker's attachment to its pool from its pool ID
allocation.  The design developed this way only because there's a
certain specific usage for it.  The code is fine, but it usually is a
lot more helpful for reviewing and later reference if you actually
explain why things are done in certain specific ways, because in
isolation the above decoupling is completely arbitrary.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex
  2014-05-13  6:45         ` Lai Jiangshan
@ 2014-05-13 14:19           ` Tejun Heo
  0 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-13 14:19 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel, Peter Zijlstra, Viresh Kumar, Ingo Molnar

Hello,

On Tue, May 13, 2014 at 02:45:58PM +0800, Lai Jiangshan wrote:
> On 05/13/2014 06:01 AM, Tejun Heo wrote:
> > On Mon, May 12, 2014 at 02:56:20PM +0800, Lai Jiangshan wrote:
> >> manager_mutex is only used to protect the attaching for the pool
> >> and the pool->workers list. It protects the pool->workers and operations
> >> based on this list, such as:
> >> 	cpu-binding for the workers in the pool->workers
> >> 	concurrency management for the workers in the pool->workers
> > 
> > Concurrency management?  What are you referring to?
> 
> The operations that set/clear POOL_DISASSOCIATED and WORKER_UNBOUND.
> 
> Which wording would be better if I don't use "concurrency management"?

Just be explicit and spell them out?  Concurrency management in wq
context usually means nr_running tracking and the associated worker
suppression / wake up mechanism.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching
  2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                       ` (9 preceding siblings ...)
  2014-05-12  6:56     ` [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
@ 2014-05-20  9:46     ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
                         ` (10 more replies)
  10 siblings, 11 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Patch1-4: async worker destruction

Patch2 reduces the review burden. It will be easier to review the whole
patchset if we know destroy_worker() is forced to destroy idle workers only.

Patch5-10: worker attaching/detaching and simplification of worker management

The code which attaches a worker to the pool and detaches a worker from the pool
is open-coded in create_worker()/destroy_worker().
The patchset moves this attaching/detaching code out and wraps it.

patch3-4 moves the detaching code out of destroy_worker(), and makes
manager_mutex protect only the detaching code rather than the whole
worker-destruction path.

patch5-7 makes manager_mutex protect only the attaching code rather than the
whole worker-creation path.

patch8: rename manager_mutex to attach_mutex
patch9-10: moves the attaching code out of create_worker() and uses it for
the rescuer (a condensed sketch of the resulting rescuer loop follows).
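For a rough picture of where the series converges, the rescuer ends up using
the same attach/detach pair as normal workers.  A condensed sketch (simplified
from patch 10; mayday-list handling and error paths omitted):

	/* rescuer_thread() main loop, condensed sketch */
	worker_attach_to_pool(rescuer, pool);

	spin_lock_irq(&pool->lock);
	rescuer->pool = pool;
	/* ... move mayday works to rescuer->scheduled ... */
	process_scheduled_works(rescuer);
	spin_unlock_irq(&pool->lock);

	worker_detach_from_pool(rescuer, pool);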


Lai Jiangshan (10):
  workqueue: use manager lock only to protect worker_idr
  workqueue: destroy_worker() should destroy idle workers only
  workqueue: async worker destruction
  workqueue: destroy worker directly in the idle timeout handler
  workqueue: separate iteration role from worker_idr
  workqueue: convert worker_idr to worker_ida
  workqueue: narrow the protection range of manager_mutex
  workqueue: rename manager_mutex to attach_mutex
  workqueue: separate pool-attaching code out from create_worker()
  workqueue: use generic attach/detach routine for rescuers

 kernel/workqueue.c          |  394 ++++++++++++++-----------------------------
 kernel/workqueue_internal.h |    2 +
 2 files changed, 133 insertions(+), 263 deletions(-)

-- 
1.7.4.4


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH V3 01/10] workqueue: use manager lock only to protect worker_idr
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker_idr is highly bound to managers and is always/only accessed in manager
lock context. So we don't need pool->lock for it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   34 ++++++----------------------------
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c841108..910d963 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -124,8 +124,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * MG: pool->manager_mutex and pool->lock protected.  Writes require both
- *     locks.  Reads can happen under either lock.
+ * M: pool->manager_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -164,7 +163,7 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* MG: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs and iteration */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -340,16 +339,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 			   lockdep_is_held(&wq->mutex),			\
 			   "sched RCU or wq->mutex should be held")
 
-#ifdef CONFIG_LOCKDEP
-#define assert_manager_or_pool_lock(pool)				\
-	WARN_ONCE(debug_locks &&					\
-		  !lockdep_is_held(&(pool)->manager_mutex) &&		\
-		  !lockdep_is_held(&(pool)->lock),			\
-		  "pool->manager_mutex or ->lock should be held")
-#else
-#define assert_manager_or_pool_lock(pool)	do { } while (0)
-#endif
-
 #define for_each_cpu_worker_pool(pool, cpu)				\
 	for ((pool) = &per_cpu(cpu_worker_pools, cpu)[0];		\
 	     (pool) < &per_cpu(cpu_worker_pools, cpu)[NR_STD_WORKER_POOLS]; \
@@ -378,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * @wi: integer used for iteration
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with either @pool->manager_mutex or ->lock held.
+ * This must be called with @pool->manager_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, wi, pool)				\
 	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
-		if (({ assert_manager_or_pool_lock((pool)); false; })) { } \
+		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
 		else
 
 /**
@@ -1725,13 +1714,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	 * ID is needed to determine kthread name.  Allocate ID first
 	 * without installing the pointer.
 	 */
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pool->lock);
-
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
-
-	spin_unlock_irq(&pool->lock);
-	idr_preload_end();
+	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1773,18 +1756,13 @@ static struct worker *create_worker(struct worker_pool *pool)
 		worker->flags |= WORKER_UNBOUND;
 
 	/* successful, commit the pointer to idr */
-	spin_lock_irq(&pool->lock);
 	idr_replace(&pool->worker_idr, worker, worker->id);
-	spin_unlock_irq(&pool->lock);
 
 	return worker;
 
 fail:
-	if (id >= 0) {
-		spin_lock_irq(&pool->lock);
+	if (id >= 0)
 		idr_remove(&pool->worker_idr, id);
-		spin_unlock_irq(&pool->lock);
-	}
 	kfree(worker);
 	return NULL;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 02/10] workqueue: destroy_worker() should destroy idle workers only
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

We used to have a CPU-online failure path where a worker was created
and then destroyed without being started: a worker was created for
the CPU coming online, and if the online operation failed, the created worker
was shut down without having been started.  But this behavior was changed:
the first worker is now created and started at the same time for the CPU
coming online.

This means the code already ensures that destroy_worker()
destroys idle workers only.  And we don't want to allow it to destroy any
non-idle worker in the future, since that may be buggy and
extremely hard to check.  We should explicitly force destroy_worker()
to destroy idle workers only.

Since destroy_worker() destroys idle workers only, this patch does not change any
functionality.  We just need to update the comments and the sanity-check code.

In the sanity check code, we will refuse to destroy the worker
if !(worker->flags & WORKER_IDLE).

If a worker has entered idle, it has already been started,
so we remove the check of "worker->flags & WORKER_STARTED".
After this removal, WORKER_STARTED is totally unneeded,
so we remove WORKER_STARTED too.

In the comments for create_worker(), "Create a new worker which is bound..."
is changed to "... which is attached..." because we renamed this
behavior to attaching.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 910d963..862b7ba 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -73,7 +73,6 @@ enum {
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
 	/* worker flags */
-	WORKER_STARTED		= 1 << 0,	/* started */
 	WORKER_DIE		= 1 << 1,	/* die die die */
 	WORKER_IDLE		= 1 << 2,	/* is idle */
 	WORKER_PREP		= 1 << 3,	/* preparing to run works */
@@ -1692,9 +1691,8 @@ static struct worker *alloc_worker(void)
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
- * Create a new worker which is bound to @pool.  The returned worker
- * can be started by calling start_worker() or destroyed using
- * destroy_worker().
+ * Create a new worker which is attached to @pool.  The new worker must
+ * be started and enter idle via start_worker().
  *
  * CONTEXT:
  * Might sleep.  Does GFP_KERNEL allocations.
@@ -1778,7 +1776,6 @@ fail:
  */
 static void start_worker(struct worker *worker)
 {
-	worker->flags |= WORKER_STARTED;
 	worker->pool->nr_workers++;
 	worker_enter_idle(worker);
 	wake_up_process(worker->task);
@@ -1814,7 +1811,8 @@ static int create_and_start_worker(struct worker_pool *pool)
  * destroy_worker - destroy a workqueue worker
  * @worker: worker to be destroyed
  *
- * Destroy @worker and adjust @pool stats accordingly.
+ * Destroy @worker and adjust @pool stats accordingly.  The worker should
+ * be idle.
  *
  * CONTEXT:
  * spin_lock_irq(pool->lock) which is released and regrabbed.
@@ -1828,13 +1826,12 @@ static void destroy_worker(struct worker *worker)
 
 	/* sanity check frenzy */
 	if (WARN_ON(worker->current_work) ||
-	    WARN_ON(!list_empty(&worker->scheduled)))
+	    WARN_ON(!list_empty(&worker->scheduled)) ||
+	    WARN_ON(!(worker->flags & WORKER_IDLE)))
 		return;
 
-	if (worker->flags & WORKER_STARTED)
-		pool->nr_workers--;
-	if (worker->flags & WORKER_IDLE)
-		pool->nr_idle--;
+	pool->nr_workers--;
+	pool->nr_idle--;
 
 	/*
 	 * Once WORKER_DIE is set, the kworker may destroy itself at any
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 03/10] workqueue: async worker destruction
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20 14:22         ` Tejun Heo
                           ` (2 more replies)
  2014-05-20  9:46       ` [PATCH V3 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
                         ` (7 subsequent siblings)
  10 siblings, 3 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

worker destruction includes these parts of code:
	adjust pool's stats
	remove the worker from idle list
	detach the worker from the pool
	kthread_stop() to wait for the worker's task exit
	free the worker struct

We can see that there is no essential work to do after
kthread_stop(), which means destroy_worker() doesn't need
to wait for the worker's task to exit.  So we can remove kthread_stop()
and free the worker struct in the worker's exit path.

But put_unbound_pool() still needs to sync with all the workers'
destruction before destroying the pool.  Otherwise the exiting workers
may access an invalid pool.

So we also move the code of "detach the worker" to the exit
path and let put_unbound_pool() sync with this code via
detach_completion.
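Schematically, the handshake works roughly as follows (a condensed sketch of
the code added below; only the relevant parts are shown):

	/* last exiting worker, in worker_detach_from_pool() */
	mutex_lock(&pool->manager_mutex);
	idr_remove(&pool->worker_idr, worker->id);
	if (idr_is_empty(&pool->worker_idr))
		detach_completion = pool->detach_completion;
	mutex_unlock(&pool->manager_mutex);

	if (detach_completion)
		complete(detach_completion);

	/* put_unbound_pool(), after destroying all idle workers */
	mutex_lock(&pool->manager_mutex);
	if (!idr_is_empty(&pool->worker_idr))
		pool->detach_completion = &detach_completion;
	mutex_unlock(&pool->manager_mutex);

	if (pool->detach_completion)
		wait_for_completion(pool->detach_completion);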

The code of "detach the worker" is wrapped in a new function
"worker_detach_from_pool()". Although worker_detach_from_pool() is only
called once (in worker_thread()) after this patch, but we need to
wrap it for these reasons:
  1) The code of "detach the worker" is not short enough to unfold them
     in worker_thread().
  2) the name of "worker_detach_from_pool()" is self-comment, and we add
     some comments above the function.
  3) it will be shared by rescuer in later patch which allows rescuer
     and normal thread use the same attach/detach frameworks.

And the worker ID is freed when detaching, which happens
before the worker is fully dead, so the ID of a dying worker
may be re-used for a new worker.  To avoid two or more workers
having the same name, the dying worker's task name is changed to
"kworker/dying".

Since "detach the worker" is moved out from destroy_worker(),
destroy_worker() doesn't require manager_mutex.
So the "lockdep_assert_held(&pool->manager_mutex)" in destroy_worker()
is removed, and destroy_worker() is not protected by manager_mutex
in put_unbound_pool().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   62 ++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 862b7ba..98d04a4 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -163,6 +163,7 @@ struct worker_pool {
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
 	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	struct completion	*detach_completion; /* all workers detached */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
@@ -1688,6 +1689,30 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_detach_from_pool() - detach the worker from the pool
+ * @worker: worker which is attached to its pool
+ * @pool: attached pool
+ *
+ * Undo the attaching which had been done in create_worker().
+ * The caller worker shouldn't access to the pool after detached
+ * except it has other reference to the pool.
+ */
+static void worker_detach_from_pool(struct worker *worker,
+				    struct worker_pool *pool)
+{
+	struct completion *detach_completion = NULL;
+
+	mutex_lock(&pool->manager_mutex);
+	idr_remove(&pool->worker_idr, worker->id);
+	if (idr_is_empty(&pool->worker_idr))
+		detach_completion = pool->detach_completion;
+	mutex_unlock(&pool->manager_mutex);
+
+	if (detach_completion)
+		complete(detach_completion);
+}
+
+/**
  * create_worker - create a new workqueue worker
  * @pool: pool the new worker will belong to
  *
@@ -1815,13 +1840,12 @@ static int create_and_start_worker(struct worker_pool *pool)
  * be idle.
  *
  * CONTEXT:
- * spin_lock_irq(pool->lock) which is released and regrabbed.
+ * spin_lock_irq(pool->lock).
  */
 static void destroy_worker(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
 
-	lockdep_assert_held(&pool->manager_mutex);
 	lockdep_assert_held(&pool->lock);
 
 	/* sanity check frenzy */
@@ -1833,24 +1857,9 @@ static void destroy_worker(struct worker *worker)
 	pool->nr_workers--;
 	pool->nr_idle--;
 
-	/*
-	 * Once WORKER_DIE is set, the kworker may destroy itself at any
-	 * point.  Pin to ensure the task stays until we're done with it.
-	 */
-	get_task_struct(worker->task);
-
 	list_del_init(&worker->entry);
 	worker->flags |= WORKER_DIE;
-
-	idr_remove(&pool->worker_idr, worker->id);
-
-	spin_unlock_irq(&pool->lock);
-
-	kthread_stop(worker->task);
-	put_task_struct(worker->task);
-	kfree(worker);
-
-	spin_lock_irq(&pool->lock);
+	wake_up_process(worker->task);
 }
 
 static void idle_worker_timeout(unsigned long __pool)
@@ -2289,6 +2298,10 @@ woke_up:
 		spin_unlock_irq(&pool->lock);
 		WARN_ON_ONCE(!list_empty(&worker->entry));
 		worker->task->flags &= ~PF_WQ_WORKER;
+
+		set_task_comm(worker->task, "kworker/dying");
+		worker_detach_from_pool(worker, pool);
+		kfree(worker);
 		return 0;
 	}
 
@@ -3560,6 +3573,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
  */
 static void put_unbound_pool(struct worker_pool *pool)
 {
+	DECLARE_COMPLETION_ONSTACK(detach_completion);
 	struct worker *worker;
 
 	lockdep_assert_held(&wq_pool_mutex);
@@ -3583,15 +3597,21 @@ static void put_unbound_pool(struct worker_pool *pool)
 	 * manager_mutex.
 	 */
 	mutex_lock(&pool->manager_arb);
-	mutex_lock(&pool->manager_mutex);
-	spin_lock_irq(&pool->lock);
 
+	spin_lock_irq(&pool->lock);
 	while ((worker = first_worker(pool)))
 		destroy_worker(worker);
 	WARN_ON(pool->nr_workers || pool->nr_idle);
-
 	spin_unlock_irq(&pool->lock);
+
+	mutex_lock(&pool->manager_mutex);
+	if (!idr_is_empty(&pool->worker_idr))
+		pool->detach_completion = &detach_completion;
 	mutex_unlock(&pool->manager_mutex);
+
+	if (pool->detach_completion)
+		wait_for_completion(pool->detach_completion);
+
 	mutex_unlock(&pool->manager_arb);
 
 	/* shut down the timers */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 04/10] workqueue: destroy worker directly in the idle timeout handler
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (2 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

Since destroy_worker() doesn't need to sleep nor require manager_mutex,
it can be called directly from the idle timeout handler.
This lets us remove POOL_MANAGE_WORKERS and maybe_destroy_workers()
and simplify manage_workers().

After POOL_MANAGE_WORKERS is removed, worker_thread() doesn't
need to test whether it needs to manage after processing works,
so we can remove that test branch.
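The idle timeout handler then does the destruction itself; its resulting
shape is roughly (condensed from the diff below):

	/* idle_worker_timeout(), resulting shape, condensed */
	spin_lock_irq(&pool->lock);

	while (too_many_workers(pool)) {
		/* the tail of idle_list has been idle longest */
		worker = list_entry(pool->idle_list.prev, struct worker, entry);
		expires = worker->last_active + IDLE_WORKER_TIMEOUT;

		if (time_before(jiffies, expires)) {
			mod_timer(&pool->idle_timer, expires);
			break;
		}

		destroy_worker(worker);
	}

	spin_unlock_irq(&pool->lock);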

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   67 ++++------------------------------------------------
 1 files changed, 5 insertions(+), 62 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 98d04a4..1236544 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -68,7 +68,6 @@ enum {
 	 * manager_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
-	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
 
@@ -752,13 +751,6 @@ static bool need_to_create_worker(struct worker_pool *pool)
 	return need_more_worker(pool) && !may_start_working(pool);
 }
 
-/* Do I need to be the manager? */
-static bool need_to_manage_workers(struct worker_pool *pool)
-{
-	return need_to_create_worker(pool) ||
-		(pool->flags & POOL_MANAGE_WORKERS);
-}
-
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
@@ -1868,7 +1860,7 @@ static void idle_worker_timeout(unsigned long __pool)
 
 	spin_lock_irq(&pool->lock);
 
-	if (too_many_workers(pool)) {
+	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
 
@@ -1876,13 +1868,12 @@ static void idle_worker_timeout(unsigned long __pool)
 		worker = list_entry(pool->idle_list.prev, struct worker, entry);
 		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
 
-		if (time_before(jiffies, expires))
+		if (time_before(jiffies, expires)) {
 			mod_timer(&pool->idle_timer, expires);
-		else {
-			/* it's been idle for too long, wake up manager */
-			pool->flags |= POOL_MANAGE_WORKERS;
-			wake_up_worker(pool);
+			break;
 		}
+
+		destroy_worker(worker);
 	}
 
 	spin_unlock_irq(&pool->lock);
@@ -2001,44 +1992,6 @@ restart:
 }
 
 /**
- * maybe_destroy_worker - destroy workers which have been idle for a while
- * @pool: pool to destroy workers for
- *
- * Destroy @pool workers which have been idle for longer than
- * IDLE_WORKER_TIMEOUT.
- *
- * LOCKING:
- * spin_lock_irq(pool->lock) which may be released and regrabbed
- * multiple times.  Called only from manager.
- *
- * Return:
- * %false if no action was taken and pool->lock stayed locked, %true
- * otherwise.
- */
-static bool maybe_destroy_workers(struct worker_pool *pool)
-{
-	bool ret = false;
-
-	while (too_many_workers(pool)) {
-		struct worker *worker;
-		unsigned long expires;
-
-		worker = list_entry(pool->idle_list.prev, struct worker, entry);
-		expires = worker->last_active + IDLE_WORKER_TIMEOUT;
-
-		if (time_before(jiffies, expires)) {
-			mod_timer(&pool->idle_timer, expires);
-			break;
-		}
-
-		destroy_worker(worker);
-		ret = true;
-	}
-
-	return ret;
-}
-
-/**
  * manage_workers - manage worker pool
  * @worker: self
  *
@@ -2101,13 +2054,6 @@ static bool manage_workers(struct worker *worker)
 		ret = true;
 	}
 
-	pool->flags &= ~POOL_MANAGE_WORKERS;
-
-	/*
-	 * Destroy and then create so that may_start_working() is true
-	 * on return.
-	 */
-	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
 	mutex_unlock(&pool->manager_mutex);
@@ -2349,9 +2295,6 @@ recheck:
 
 	worker_set_flags(worker, WORKER_PREP, false);
 sleep:
-	if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
-		goto recheck;
-
 	/*
 	 * pool->lock is held and there's no work to process and no need to
 	 * manage, sleep.  Workers are woken up only while holding
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 05/10] workqueue: separate iteration role from worker_idr
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (3 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Viresh Kumar, Ingo Molnar

worker_idr has two duties: iteration over the attached workers and worker ID
allocation.  These two duties are not necessarily tied together.  We can
separate them and use a list for tracking attached workers and for iteration.

Before this separation, it is not possible to add rescuer workers
to worker_idr because rescuer workers can't allocate an ID at run time.
(ID allocation depends on memory allocation.  The rescuer rescues the world
when memory is insufficient, so it must avoid allocating memory.)

After the separation, we can easily add rescuer workers to the list for
iteration without any memory allocation.  This is required when we attach
the rescuer worker to the pool in a later patch.
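With the list in place, attaching becomes a plain list operation that cannot
fail, which is what makes it usable from a rescuer (condensed from the diff
below; the rescuer usage itself comes in a later patch):

	/* attach, in create_worker() (and later in worker_attach_to_pool()) */
	list_add_tail(&worker->node, &pool->workers);

	/* detach, in worker_detach_from_pool() */
	list_del(&worker->node);
	if (list_empty(&pool->workers))
		detach_completion = pool->detach_completion;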

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   28 +++++++++++++++-------------
 kernel/workqueue_internal.h |    2 ++
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1236544..fd2f265 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,7 +161,8 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs and iteration */
+	struct idr		worker_idr;	/* M: worker IDs */
+	struct list_head	workers;	/* M: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
@@ -363,7 +364,6 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
 /**
  * for_each_pool_worker - iterate through all workers of a worker_pool
  * @worker: iteration cursor
- * @wi: integer used for iteration
  * @pool: worker_pool to iterate workers of
  *
  * This must be called with @pool->manager_mutex.
@@ -371,8 +371,8 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
-#define for_each_pool_worker(worker, wi, pool)				\
-	idr_for_each_entry(&(pool)->worker_idr, (worker), (wi))		\
+#define for_each_pool_worker(worker, pool)				\
+	list_for_each_entry((worker), &(pool)->workers, node)		\
 		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
 		else
 
@@ -1674,6 +1674,7 @@ static struct worker *alloc_worker(void)
 	if (worker) {
 		INIT_LIST_HEAD(&worker->entry);
 		INIT_LIST_HEAD(&worker->scheduled);
+		INIT_LIST_HEAD(&worker->node);
 		/* on creation a worker is in !idle && prep state */
 		worker->flags = WORKER_PREP;
 	}
@@ -1696,7 +1697,8 @@ static void worker_detach_from_pool(struct worker *worker,
 
 	mutex_lock(&pool->manager_mutex);
 	idr_remove(&pool->worker_idr, worker->id);
-	if (idr_is_empty(&pool->worker_idr))
+	list_del(&worker->node);
+	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
 	mutex_unlock(&pool->manager_mutex);
 
@@ -1772,6 +1774,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	/* successful, commit the pointer to idr */
 	idr_replace(&pool->worker_idr, worker, worker->id);
+	/* successful, attach the worker to the pool */
+	list_add_tail(&worker->node, &pool->workers);
 
 	return worker;
 
@@ -3483,6 +3487,7 @@ static int init_worker_pool(struct worker_pool *pool)
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
 	idr_init(&pool->worker_idr);
+	INIT_LIST_HEAD(&pool->workers);
 
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
@@ -3548,7 +3553,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	spin_unlock_irq(&pool->lock);
 
 	mutex_lock(&pool->manager_mutex);
-	if (!idr_is_empty(&pool->worker_idr))
+	if (!list_empty(&pool->workers))
 		pool->detach_completion = &detach_completion;
 	mutex_unlock(&pool->manager_mutex);
 
@@ -4533,7 +4538,6 @@ static void wq_unbind_fn(struct work_struct *work)
 	int cpu = smp_processor_id();
 	struct worker_pool *pool;
 	struct worker *worker;
-	int wi;
 
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
@@ -4548,7 +4552,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		 * before the last CPU down must be on the cpu.  After
 		 * this, they may become diasporas.
 		 */
-		for_each_pool_worker(worker, wi, pool)
+		for_each_pool_worker(worker, pool)
 			worker->flags |= WORKER_UNBOUND;
 
 		pool->flags |= POOL_DISASSOCIATED;
@@ -4594,7 +4598,6 @@ static void wq_unbind_fn(struct work_struct *work)
 static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4605,13 +4608,13 @@ static void rebind_workers(struct worker_pool *pool)
 	 * of all workers first and then clear UNBOUND.  As we're called
 	 * from CPU_ONLINE, the following shouldn't fail.
 	 */
-	for_each_pool_worker(worker, wi, pool)
+	for_each_pool_worker(worker, pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 
 	spin_lock_irq(&pool->lock);
 
-	for_each_pool_worker(worker, wi, pool) {
+	for_each_pool_worker(worker, pool) {
 		unsigned int worker_flags = worker->flags;
 
 		/*
@@ -4663,7 +4666,6 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 {
 	static cpumask_t cpumask;
 	struct worker *worker;
-	int wi;
 
 	lockdep_assert_held(&pool->manager_mutex);
 
@@ -4677,7 +4679,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 		return;
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
-	for_each_pool_worker(worker, wi, pool)
+	for_each_pool_worker(worker, pool)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
 						  pool->attrs->cpumask) < 0);
 }
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 7e2204d..8888e06 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,6 +37,8 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
+	struct list_head	node;		/* M: anchored at pool->workers */
+						/* M: runs through worker->node */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 06/10] workqueue: convert worker_idr to worker_ida
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (4 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

We no longer need to iterate workers via worker_idr; worker_idr is
used for allocating/freeing IDs only, so we convert it to worker_ida.

By using ida_simple_get()/ida_simple_remove(), worker_ida is protected by
itself, so we don't need manager_mutex to protect it, and the ID-removal code
can be moved out of worker_detach_from_pool().

And in a later patch, worker_detach_from_pool() will be used for rescuers,
which don't have IDs, so we directly move the ID-removal code out
of worker_detach_from_pool() now.
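After the conversion, ID handling is reduced to a self-locking pair
(condensed from the diff below):

	/* create_worker(): ID allocation no longer needs any pool lock */
	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
	if (id < 0)
		goto fail;

	/* worker_thread() exit path (the failure path uses the local id) */
	ida_simple_remove(&pool->worker_ida, worker->id);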

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index fd2f265..637c244 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -161,10 +161,11 @@ struct worker_pool {
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
 	struct mutex		manager_mutex;	/* manager exclusion */
-	struct idr		worker_idr;	/* M: worker IDs */
 	struct list_head	workers;	/* M: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
+	struct ida		worker_ida;	/* worker IDs for task name */
+
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
@@ -1696,7 +1697,6 @@ static void worker_detach_from_pool(struct worker *worker,
 	struct completion *detach_completion = NULL;
 
 	mutex_lock(&pool->manager_mutex);
-	idr_remove(&pool->worker_idr, worker->id);
 	list_del(&worker->node);
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
@@ -1727,11 +1727,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	lockdep_assert_held(&pool->manager_mutex);
 
-	/*
-	 * ID is needed to determine kthread name.  Allocate ID first
-	 * without installing the pointer.
-	 */
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_KERNEL);
+	/* ID is needed to determine kthread name. */
+	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1772,8 +1769,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	if (pool->flags & POOL_DISASSOCIATED)
 		worker->flags |= WORKER_UNBOUND;
 
-	/* successful, commit the pointer to idr */
-	idr_replace(&pool->worker_idr, worker, worker->id);
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
@@ -1781,7 +1776,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 fail:
 	if (id >= 0)
-		idr_remove(&pool->worker_idr, id);
+		ida_simple_remove(&pool->worker_ida, id);
 	kfree(worker);
 	return NULL;
 }
@@ -2250,6 +2245,7 @@ woke_up:
 		worker->task->flags &= ~PF_WQ_WORKER;
 
 		set_task_comm(worker->task, "kworker/dying");
+		ida_simple_remove(&pool->worker_ida, worker->id);
 		worker_detach_from_pool(worker, pool);
 		kfree(worker);
 		return 0;
@@ -3486,9 +3482,9 @@ static int init_worker_pool(struct worker_pool *pool)
 
 	mutex_init(&pool->manager_arb);
 	mutex_init(&pool->manager_mutex);
-	idr_init(&pool->worker_idr);
 	INIT_LIST_HEAD(&pool->workers);
 
+	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
 
@@ -3503,7 +3499,7 @@ static void rcu_free_pool(struct rcu_head *rcu)
 {
 	struct worker_pool *pool = container_of(rcu, struct worker_pool, rcu);
 
-	idr_destroy(&pool->worker_idr);
+	ida_destroy(&pool->worker_ida);
 	free_workqueue_attrs(pool->attrs);
 	kfree(pool);
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 07/10] workqueue: narrow the protection range of manager_mutex
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (5 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 08/10] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

In create_worker(), pool->worker_ida is protected by the ida subsystem itself
via ida_simple_get()/ida_simple_remove(), so it doesn't need manager_mutex.

The struct worker allocation and the kthread allocation are not visible to
anyone before the worker is attached, so they don't need manager_mutex either.

The above operations happen before the attaching operation which attaches
the worker to the pool.  And between attaching and starting the worker,
the worker is already attached to the pool, so cpu hotplug will handle the
cpu-binding for the worker correctly.  So we don't need manager_mutex
after the worker is attached.

The conclusion is that only the attaching operation needs manager_mutex,
so we narrow the protection section of manager_mutex in create_worker().

Some comments about manager_mutex are removed, because we will rename it to
attach_mutex and add worker_attach_to_pool() later, which is self-documenting.
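The resulting locking shape in create_worker() is roughly (condensed from the
diff below; the failure path is unchanged):

	/* create_worker(), condensed sketch of the narrowed locking */
	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);	/* self-locking, no mutex */
	worker = alloc_worker();					/* not visible to anyone yet */
	/* ... kthread creation, also without manager_mutex ... */

	mutex_lock(&pool->manager_mutex);				/* only around the attach */
	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
	if (pool->flags & POOL_DISASSOCIATED)
		worker->flags |= WORKER_UNBOUND;
	list_add_tail(&worker->node, &pool->workers);
	mutex_unlock(&pool->manager_mutex);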

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   35 +++++------------------------------
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 637c244..4ae6a3c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1725,8 +1725,6 @@ static struct worker *create_worker(struct worker_pool *pool)
 	int id = -1;
 	char id_buf[16];
 
-	lockdep_assert_held(&pool->manager_mutex);
-
 	/* ID is needed to determine kthread name. */
 	id = ida_simple_get(&pool->worker_ida, 0, 0, GFP_KERNEL);
 	if (id < 0)
@@ -1755,6 +1753,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
+	mutex_lock(&pool->manager_mutex);
+
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
 	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
@@ -1762,7 +1762,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The caller is responsible for ensuring %POOL_DISASSOCIATED
+	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1772,6 +1772,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
+	mutex_unlock(&pool->manager_mutex);
+
 	return worker;
 
 fail:
@@ -1809,8 +1811,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	mutex_lock(&pool->manager_mutex);
-
 	worker = create_worker(pool);
 	if (worker) {
 		spin_lock_irq(&pool->lock);
@@ -1818,8 +1818,6 @@ static int create_and_start_worker(struct worker_pool *pool)
 		spin_unlock_irq(&pool->lock);
 	}
 
-	mutex_unlock(&pool->manager_mutex);
-
 	return worker ? 0 : -ENOMEM;
 }
 
@@ -2019,8 +2017,6 @@ static bool manage_workers(struct worker *worker)
 	bool ret = false;
 
 	/*
-	 * Managership is governed by two mutexes - manager_arb and
-	 * manager_mutex.  manager_arb handles arbitration of manager role.
 	 * Anyone who successfully grabs manager_arb wins the arbitration
 	 * and becomes the manager.  mutex_trylock() on pool->manager_arb
 	 * failure while holding pool->lock reliably indicates that someone
@@ -2029,33 +2025,12 @@ static bool manage_workers(struct worker *worker)
 	 * grabbing manager_arb is responsible for actually performing
 	 * manager duties.  If manager_arb is grabbed and released without
 	 * actual management, the pool may stall indefinitely.
-	 *
-	 * manager_mutex is used for exclusion of actual management
-	 * operations.  The holder of manager_mutex can be sure that none
-	 * of management operations, including creation and destruction of
-	 * workers, won't take place until the mutex is released.  Because
-	 * manager_mutex doesn't interfere with manager role arbitration,
-	 * it is guaranteed that the pool's management, while may be
-	 * delayed, won't be disturbed by someone else grabbing
-	 * manager_mutex.
 	 */
 	if (!mutex_trylock(&pool->manager_arb))
 		return ret;
 
-	/*
-	 * With manager arbitration won, manager_mutex would be free in
-	 * most cases.  trylock first without dropping @pool->lock.
-	 */
-	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
-		spin_unlock_irq(&pool->lock);
-		mutex_lock(&pool->manager_mutex);
-		spin_lock_irq(&pool->lock);
-		ret = true;
-	}
-
 	ret |= maybe_create_worker(pool);
 
-	mutex_unlock(&pool->manager_mutex);
 	mutex_unlock(&pool->manager_arb);
 	return ret;
 }
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 08/10] workqueue: rename manager_mutex to attach_mutex
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (6 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 09/10] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
                         ` (2 subsequent siblings)
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tejun Heo, Lai Jiangshan, Peter Zijlstra, Viresh Kumar, Ingo Molnar

manager_mutex is now only used to protect attaching to the pool
and the pool->workers list.  It protects pool->workers and the operations
based on this list, such as:
	cpu-binding for the workers in pool->workers
	the operations that set/clear WORKER_UNBOUND

So we can simply rename manager_mutex to attach_mutex without any
functional change.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   44 +++++++++++++++++++++---------------------
 kernel/workqueue_internal.h |    4 +-
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4ae6a3c..728b515 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -65,7 +65,7 @@ enum {
 	 * be executing on any CPU.  The pool behaves as an unbound one.
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
-	 * manager_mutex to avoid changing binding state while
+	 * attach_mutex to avoid changing binding state while
 	 * create_worker() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
@@ -122,7 +122,7 @@ enum {
  *    cpu or grabbing pool->lock is enough for read access.  If
  *    POOL_DISASSOCIATED is set, it's identical to L.
  *
- * M: pool->manager_mutex protected.
+ * A: pool->attach_mutex protected.
  *
  * PL: wq_pool_mutex protected.
  *
@@ -160,8 +160,8 @@ struct worker_pool {
 
 	/* see manage_workers() for details on the two manager mutexes */
 	struct mutex		manager_arb;	/* manager arbitration */
-	struct mutex		manager_mutex;	/* manager exclusion */
-	struct list_head	workers;	/* M: attached workers */
+	struct mutex		attach_mutex;	/* attach/detach exclusion */
+	struct list_head	workers;	/* A: attached workers */
 	struct completion	*detach_completion; /* all workers detached */
 
 	struct ida		worker_ida;	/* worker IDs for task name */
@@ -367,14 +367,14 @@ static void copy_workqueue_attrs(struct workqueue_attrs *to,
  * @worker: iteration cursor
  * @pool: worker_pool to iterate workers of
  *
- * This must be called with @pool->manager_mutex.
+ * This must be called with @pool->attach_mutex.
  *
  * The if/else clause exists only for the lockdep assertion and can be
  * ignored.
  */
 #define for_each_pool_worker(worker, pool)				\
 	list_for_each_entry((worker), &(pool)->workers, node)		\
-		if (({ lockdep_assert_held(&pool->manager_mutex); false; })) { } \
+		if (({ lockdep_assert_held(&pool->attach_mutex); false; })) { } \
 		else
 
 /**
@@ -1696,11 +1696,11 @@ static void worker_detach_from_pool(struct worker *worker,
 {
 	struct completion *detach_completion = NULL;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 	list_del(&worker->node);
 	if (list_empty(&pool->workers))
 		detach_completion = pool->detach_completion;
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	if (detach_completion)
 		complete(detach_completion);
@@ -1753,7 +1753,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 
 	/*
 	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
@@ -1762,7 +1762,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
 
 	/*
-	 * The pool->manager_mutex ensures %POOL_DISASSOCIATED
+	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED
 	 * remains stable across this function.  See the comments above the
 	 * flag definition for details.
 	 */
@@ -1772,7 +1772,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* successful, attach the worker to the pool */
 	list_add_tail(&worker->node, &pool->workers);
 
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	return worker;
 
@@ -3456,7 +3456,7 @@ static int init_worker_pool(struct worker_pool *pool)
 		    (unsigned long)pool);
 
 	mutex_init(&pool->manager_arb);
-	mutex_init(&pool->manager_mutex);
+	mutex_init(&pool->attach_mutex);
 	INIT_LIST_HEAD(&pool->workers);
 
 	ida_init(&pool->worker_ida);
@@ -3513,7 +3513,7 @@ static void put_unbound_pool(struct worker_pool *pool)
 	/*
 	 * Become the manager and destroy all workers.  Grabbing
 	 * manager_arb prevents @pool's workers from blocking on
-	 * manager_mutex.
+	 * attach_mutex.
 	 */
 	mutex_lock(&pool->manager_arb);
 
@@ -3523,10 +3523,10 @@ static void put_unbound_pool(struct worker_pool *pool)
 	WARN_ON(pool->nr_workers || pool->nr_idle);
 	spin_unlock_irq(&pool->lock);
 
-	mutex_lock(&pool->manager_mutex);
+	mutex_lock(&pool->attach_mutex);
 	if (!list_empty(&pool->workers))
 		pool->detach_completion = &detach_completion;
-	mutex_unlock(&pool->manager_mutex);
+	mutex_unlock(&pool->attach_mutex);
 
 	if (pool->detach_completion)
 		wait_for_completion(pool->detach_completion);
@@ -4513,11 +4513,11 @@ static void wq_unbind_fn(struct work_struct *work)
 	for_each_cpu_worker_pool(pool, cpu) {
 		WARN_ON_ONCE(cpu != smp_processor_id());
 
-		mutex_lock(&pool->manager_mutex);
+		mutex_lock(&pool->attach_mutex);
 		spin_lock_irq(&pool->lock);
 
 		/*
-		 * We've blocked all manager operations.  Make all workers
+		 * We've blocked all attach/detach operations. Make all workers
 		 * unbound and set DISASSOCIATED.  Before this, all workers
 		 * except for the ones which are still executing works from
 		 * before the last CPU down must be on the cpu.  After
@@ -4529,7 +4529,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		pool->flags |= POOL_DISASSOCIATED;
 
 		spin_unlock_irq(&pool->lock);
-		mutex_unlock(&pool->manager_mutex);
+		mutex_unlock(&pool->attach_mutex);
 
 		/*
 		 * Call schedule() so that we cross rq->lock and thus can
@@ -4570,7 +4570,7 @@ static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->attach_mutex);
 
 	/*
 	 * Restore CPU affinity of all workers.  As all idle workers should
@@ -4638,7 +4638,7 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	static cpumask_t cpumask;
 	struct worker *worker;
 
-	lockdep_assert_held(&pool->manager_mutex);
+	lockdep_assert_held(&pool->attach_mutex);
 
 	/* is @cpu allowed for @pool? */
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
@@ -4683,7 +4683,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 		mutex_lock(&wq_pool_mutex);
 
 		for_each_pool(pool, pi) {
-			mutex_lock(&pool->manager_mutex);
+			mutex_lock(&pool->attach_mutex);
 
 			if (pool->cpu == cpu) {
 				spin_lock_irq(&pool->lock);
@@ -4695,7 +4695,7 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 				restore_unbound_workers_cpumask(pool, cpu);
 			}
 
-			mutex_unlock(&pool->manager_mutex);
+			mutex_unlock(&pool->attach_mutex);
 		}
 
 		/* update NUMA affinity of unbound workqueues */
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 8888e06..4521587 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -37,8 +37,8 @@ struct worker {
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
 						/* L: for rescuers */
-	struct list_head	node;		/* M: anchored at pool->workers */
-						/* M: runs through worker->node */
+	struct list_head	node;		/* A: anchored at pool->workers */
+						/* A: runs through worker->node */
 
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 09/10] workqueue: separate pool-attaching code out from create_worker()
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (7 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 08/10] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20  9:46       ` [PATCH V3 10/10] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
  2014-05-20 15:00       ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Tejun Heo
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

The attaching code is currently open-coded in create_worker().
Separating it out into its own helper makes the code clearer.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   55 +++++++++++++++++++++++++++++++++------------------
 1 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 728b515..4d021e0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,7 +66,7 @@ enum {
 	 *
 	 * Note that DISASSOCIATED should be flipped only while holding
 	 * attach_mutex to avoid changing binding state while
-	 * create_worker() is in progress.
+	 * worker_attach_to_pool() is in progress.
 	 */
 	POOL_DISASSOCIATED	= 1 << 2,	/* cpu can't serve workers */
 	POOL_FREEZING		= 1 << 3,	/* freeze in progress */
@@ -1683,6 +1683,39 @@ static struct worker *alloc_worker(void)
 }
 
 /**
+ * worker_attach_to_pool() - attach the worker to the pool
+ * @worker: worker to be attached
+ * @pool: the target pool
+ *
+ * Attach the worker to the pool, so that the %WORKER_UNBOUND flag and
+ * the cpu binding of the worker stay coordinated with the pool
+ * across cpu-[un]hotplug.
+ */
+static void worker_attach_to_pool(struct worker *worker,
+				   struct worker_pool *pool)
+{
+	mutex_lock(&pool->attach_mutex);
+
+	/*
+	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
+	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
+	 */
+	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+
+	/*
+	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
+	 * stable across this function.  See the comments above the
+	 * flag definition for details.
+	 */
+	if (pool->flags & POOL_DISASSOCIATED)
+		worker->flags |= WORKER_UNBOUND;
+
+	list_add_tail(&worker->node, &pool->workers);
+
+	mutex_unlock(&pool->attach_mutex);
+}
+
+/**
  * worker_detach_from_pool() - detach the worker from the pool
  * @worker: worker which is attached to its pool
  * @pool: attached pool
@@ -1753,26 +1786,8 @@ static struct worker *create_worker(struct worker_pool *pool)
 	/* prevent userland from meddling with cpumask of workqueue workers */
 	worker->task->flags |= PF_NO_SETAFFINITY;
 
-	mutex_lock(&pool->attach_mutex);
-
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
-
-	/*
-	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED
-	 * remains stable across this function.  See the comments above the
-	 * flag definition for details.
-	 */
-	if (pool->flags & POOL_DISASSOCIATED)
-		worker->flags |= WORKER_UNBOUND;
-
 	/* successful, attach the worker to the pool */
-	list_add_tail(&worker->node, &pool->workers);
-
-	mutex_unlock(&pool->attach_mutex);
+	worker_attach_to_pool(worker, pool);
 
 	return worker;
 
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH V3 10/10] workqueue: use generic attach/detach routine for rescuers
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (8 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 09/10] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
@ 2014-05-20  9:46       ` Lai Jiangshan
  2014-05-20 15:00       ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Tejun Heo
  10 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tejun Heo, Lai Jiangshan

There are several problems with the code that binds a rescuer to the pool's
cpumask:
  1) It binds to the cpumask in a way different from normal workers, so the
     normal and rescuer workers can't be maintained under the same framework.
  2) The cpu-binding code for rescuers is complicated.
  3) If one or more cpu hotplug events happen while the rescuer is processing
     the scheduled works, the rescuer may end up not bound to the cpumask of
     the pool.  This behavior is allowed, but it isn't good.  It would be
     better if the cpumask of the rescuer were always kept coordinated with
     the pool across any cpu hotplug.

Using the generic attach/detach routines solves the above problems and
results in much simpler code.
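
(For orientation, the per-pool section of rescuer_thread() after this change
reduces to roughly the following; a condensed sketch of the flow in the diff
below, with the mayday and pool-refcount bookkeeping omitted:)

	worker_attach_to_pool(rescuer, pool);

	spin_lock_irq(&pool->lock);
	rescuer->pool = pool;

	/* pull this pool's matching works onto the rescuer and run them */
	list_for_each_entry_safe(work, n, &pool->worklist, entry)
		if (get_work_pwq(work) == pwq)
			move_linked_works(work, scheduled, &n);
	process_scheduled_works(rescuer);
	spin_unlock_irq(&pool->lock);

	/* done with this pool; drop the attachment again */
	worker_detach_from_pool(rescuer, pool);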

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   74 +++++----------------------------------------------
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4d021e0..2245dc7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1603,70 +1603,6 @@ static void worker_leave_idle(struct worker *worker)
 	list_del_init(&worker->entry);
 }
 
-/**
- * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
- * @pool: target worker_pool
- *
- * Bind %current to the cpu of @pool if it is associated and lock @pool.
- *
- * Works which are scheduled while the cpu is online must at least be
- * scheduled to a worker which is bound to the cpu so that if they are
- * flushed from cpu callbacks while cpu is going down, they are
- * guaranteed to execute on the cpu.
- *
- * This function is to be used by unbound workers and rescuers to bind
- * themselves to the target cpu and may race with cpu going down or
- * coming online.  kthread_bind() can't be used because it may put the
- * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
- * verbatim as it's best effort and blocking and pool may be
- * [dis]associated in the meantime.
- *
- * This function tries set_cpus_allowed() and locks pool and verifies the
- * binding against %POOL_DISASSOCIATED which is set during
- * %CPU_DOWN_PREPARE and cleared during %CPU_ONLINE, so if the worker
- * enters idle state or fetches works without dropping lock, it can
- * guarantee the scheduling requirement described in the first paragraph.
- *
- * CONTEXT:
- * Might sleep.  Called without any lock but returns with pool->lock
- * held.
- *
- * Return:
- * %true if the associated pool is online (@worker is successfully
- * bound), %false if offline.
- */
-static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
-__acquires(&pool->lock)
-{
-	while (true) {
-		/*
-		 * The following call may fail, succeed or succeed
-		 * without actually migrating the task to the cpu if
-		 * it races with cpu hotunplug operation.  Verify
-		 * against POOL_DISASSOCIATED.
-		 */
-		if (!(pool->flags & POOL_DISASSOCIATED))
-			set_cpus_allowed_ptr(current, pool->attrs->cpumask);
-
-		spin_lock_irq(&pool->lock);
-		if (pool->flags & POOL_DISASSOCIATED)
-			return false;
-		if (task_cpu(current) == pool->cpu &&
-		    cpumask_equal(&current->cpus_allowed, pool->attrs->cpumask))
-			return true;
-		spin_unlock_irq(&pool->lock);
-
-		/*
-		 * We've raced with CPU hot[un]plug.  Give it a breather
-		 * and retry migration.  cond_resched() is required here;
-		 * otherwise, we might deadlock against cpu_stop trying to
-		 * bring down the CPU on non-preemptive kernel.
-		 */
-		cpu_relax();
-		cond_resched();
-	}
-}
-
 static struct worker *alloc_worker(void)
 {
 	struct worker *worker;
@@ -2361,8 +2297,9 @@ repeat:
 
 		spin_unlock_irq(&wq_mayday_lock);
 
-		/* migrate to the target cpu if possible */
-		worker_maybe_bind_and_lock(pool);
+		worker_attach_to_pool(rescuer, pool);
+
+		spin_lock_irq(&pool->lock);
 		rescuer->pool = pool;
 
 		/*
@@ -2375,6 +2312,11 @@ repeat:
 				move_linked_works(work, scheduled, &n);
 
 		process_scheduled_works(rescuer);
+		spin_unlock_irq(&pool->lock);
+
+		worker_detach_from_pool(rescuer, pool);
+
+		spin_lock_irq(&pool->lock);
 
 		/*
 		 * Put the reference grabbed by send_mayday().  @pool won't
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10 V2] workqueue: async worker destruction
  2014-05-13 14:14           ` Tejun Heo
@ 2014-05-20  9:54             ` Lai Jiangshan
  0 siblings, 0 replies; 78+ messages in thread
From: Lai Jiangshan @ 2014-05-20  9:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 05/13/2014 10:14 PM, Tejun Heo wrote:
> Hello,

> Given how other kworkers are named, maybe a better name is
> "kworker/dying" or "kworker/detached"?

I use "kworker/dying". the name of "attach/detach" should be hidden from userspace 

> 
> On Tue, May 13, 2014 at 02:32:52PM +0800, Lai Jiangshan wrote:
>>>> +	if (detach_completion)
>>>> +		complete(detach_completion);
>>>> +}
>>>
>>> Are we gonna use this function from somewhere else too?
>>
>> it is called from worker_thread().
>>
>> I don't want to unfold it into worker_thread(), it is better
>> readability when it is wrapped and it will be called in patch10
>> for rescuer.
> 
> Yeah, it's shared by rescuer later, so it's fine but, again, it
> probably helps to mention that it's planned to do so; otherwise, the
> rationale is kinda weak and what belongs to that function is rather
> arbitrary.

The changelog has been updated.

> 
>>>>  	/*
>>>>  	 * Become the manager and destroy all workers.  Grabbing
>>>> -	 * manager_arb prevents @pool's workers from blocking on
>>>> -	 * manager_mutex.
>>>> +	 * manager_arb ensures manage_workers() finish and enter idle.
>>>
>>> I don't follow what the above comment update is trying to say.
>>
>> If a pool is destroying, the worker will not call manage_workers().
>> but the existing manage_workers() may be still running.
>>
>> mutex_lock(&manager_arb) in put_unbound_pool() should wait this manage_workers()
>> finished due to the manager-worker (non-idle-worker) can't be destroyed.
> 
> Hmmm... I think it'd be better to spell it out then.  The single
> sentence is kinda cryptic especially because the two verbs in the
> sentence don't have the same subject (managee_workers() can't enter
> idle).
> 

All your comments on this patchset have been handled except this one.
Modifying this comment isn't strictly required in this patchset, so I just
dropped that change.  Your original comment in the code is fine with me.

Thanks,
Lai

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH V3 03/10] workqueue: async worker destruction
  2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
@ 2014-05-20 14:22         ` Tejun Heo
  2014-05-20 14:23         ` Tejun Heo
  2014-05-20 14:24         ` Tejun Heo
  2 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-20 14:22 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Tue, May 20, 2014 at 05:46:29PM +0800, Lai Jiangshan wrote:
>  /**
> + * worker_detach_from_pool() - detach the worker from the pool
> + * @worker: worker which is attached to its pool
> + * @pool: attached pool
> + *
> + * Undo the attaching which had been done in create_worker().
> + * The caller worker shouldn't access to the pool after detached
> + * except it has other reference to the pool.
> + */
> +static void worker_detach_from_pool(struct worker *worker,
> +				    struct worker_pool *pool)
> +{
> +	struct completion *detach_completion = NULL;
> +
> +	mutex_lock(&pool->manager_mutex);
> +	idr_remove(&pool->worker_idr, worker->id);
> +	if (idr_is_empty(&pool->worker_idr))
> +		detach_completion = pool->detach_completion;
> +	mutex_unlock(&pool->manager_mutex);
> +
> +	if (detach_completion)
> +		complete(detach_completion);
> +}

As a separate cleanup, please submit a patch to remove @pool from the
above function.  @worker is attached to @pool, so the argument is
redundant.
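
(For illustration, a minimal sketch of such a cleanup, assuming worker->pool
is always valid at detach time, which is what makes the argument redundant:)

static void worker_detach_from_pool(struct worker *worker)
{
	struct worker_pool *pool = worker->pool;
	struct completion *detach_completion = NULL;

	mutex_lock(&pool->manager_mutex);
	idr_remove(&pool->worker_idr, worker->id);
	if (idr_is_empty(&pool->worker_idr))
		detach_completion = pool->detach_completion;
	mutex_unlock(&pool->manager_mutex);

	if (detach_completion)
		complete(detach_completion);
}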

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH V3 03/10] workqueue: async worker destruction
  2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
  2014-05-20 14:22         ` Tejun Heo
@ 2014-05-20 14:23         ` Tejun Heo
  2014-05-20 14:24         ` Tejun Heo
  2 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-20 14:23 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Tue, May 20, 2014 at 05:46:29PM +0800, Lai Jiangshan wrote:
...
> +static void worker_detach_from_pool(struct worker *worker,
> +				    struct worker_pool *pool)
> +{
> +	struct completion *detach_completion = NULL;
> +
> +	mutex_lock(&pool->manager_mutex);
> +	idr_remove(&pool->worker_idr, worker->id);
> +	if (idr_is_empty(&pool->worker_idr))
> +		detach_completion = pool->detach_completion;
> +	mutex_unlock(&pool->manager_mutex);
> +
> +	if (detach_completion)
> +		complete(detach_completion);

Also, please add a comment explaining what's going on here.
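
(One possible shape for that comment, sketched from the detach_completion
handshake described elsewhere in this thread; hypothetical wording, not text
from the posted patches:)

	mutex_lock(&pool->manager_mutex);
	idr_remove(&pool->worker_idr, worker->id);

	/*
	 * put_unbound_pool() waits on pool->detach_completion for the
	 * last worker to detach.  If we are that worker, grab the
	 * completion here and complete it after dropping the mutex.
	 */
	if (idr_is_empty(&pool->worker_idr))
		detach_completion = pool->detach_completion;
	mutex_unlock(&pool->manager_mutex);

	if (detach_completion)
		complete(detach_completion);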

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH V3 03/10] workqueue: async worker destruction
  2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
  2014-05-20 14:22         ` Tejun Heo
  2014-05-20 14:23         ` Tejun Heo
@ 2014-05-20 14:24         ` Tejun Heo
  2 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-20 14:24 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Tue, May 20, 2014 at 05:46:29PM +0800, Lai Jiangshan wrote:
>  static void idle_worker_timeout(unsigned long __pool)
> @@ -2289,6 +2298,10 @@ woke_up:
>  		spin_unlock_irq(&pool->lock);
>  		WARN_ON_ONCE(!list_empty(&worker->entry));
>  		worker->task->flags &= ~PF_WQ_WORKER;
> +
> +		set_task_comm(worker->task, "kworker/dying");

Also, didn't I ask you to add a comment for the above before?

> +		worker_detach_from_pool(worker, pool);
> +		kfree(worker);
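
(A sketch of the kind of comment being asked for above the set_task_comm()
call, based on the naming rationale discussed earlier in this thread;
hypothetical wording:)

		/*
		 * The worker has left its pool and is about to exit; rename
		 * it so userspace doesn't keep seeing a stale per-pool
		 * kworker name for a task that is no longer a worker.
		 */
		set_task_comm(worker->task, "kworker/dying");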

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching
  2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
                         ` (9 preceding siblings ...)
  2014-05-20  9:46       ` [PATCH V3 10/10] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
@ 2014-05-20 15:00       ` Tejun Heo
  10 siblings, 0 replies; 78+ messages in thread
From: Tejun Heo @ 2014-05-20 15:00 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Tue, May 20, 2014 at 05:46:26PM +0800, Lai Jiangshan wrote:
> Patch1-4: async worker destruction
> 
> Patch2 reduces the review burden. It will be easier to review the whole
> patchset if we know destroy_worker() is forced to destroy idle workers only.
> 
> Patch5-10: worker attaching/detaching and simplify the workers management
> 
> The code which attaches a worker to the pool and detaches a worker from the pool
> is unfolded in create_worker()/destroy_worker().
> The patchset moves this attaching/detaching code out and wraps them.
> 
> patch3-4 moves the detaching code out from destroy_worker(), and make
> manger_mutex only protects the detaching code only rather than
> protects the whole worker-destruction path.
> 
> patch5-7 makes manger_mutex only protects the attaching code rather than the
> whole worker-creation path.
> 
> patch8: rename manger_mutex to attach_mutex
> patch9-10: moves the attaching code out from create_worker() and use it for
> rescuer.
> 
> 
> Lai Jiangshan (10):
>   workqueue: use manager lock only to protect worker_idr
>   workqueue: destroy_worker() should destroy idle workers only
>   workqueue: async worker destruction
>   workqueue: destroy worker directly in the idle timeout handler
>   workqueue: separate iteration role from worker_idr
>   workqueue: convert worker_idr to worker_ida
>   workqueue: narrow the protection range of manager_mutex
>   workqueue: rename manager_mutex to attach_mutex
>   workqueue: separate pool-attaching code out from create_worker()
>   workqueue: use generic attach/detach routine for rescuers

Applied to wq/for-3.16 with comment and description updates.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2014-05-20 15:00 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-27  4:08 [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Lai Jiangshan
2014-04-27  4:08 ` [PATCH 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
2014-05-05 13:01   ` Tejun Heo
2014-05-06 15:43     ` Lai Jiangshan
2014-04-27  4:08 ` [PATCH 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
2014-05-05 13:13   ` Tejun Heo
2014-05-06 15:58     ` Lai Jiangshan
2014-04-27  4:08 ` [PATCH 03/10] workqueue: async worker destruction Lai Jiangshan
2014-05-05 14:31   ` Tejun Heo
2014-05-05 15:02   ` Tejun Heo
2014-05-06 16:27     ` Lai Jiangshan
2014-05-06 16:31       ` Tejun Heo
2014-05-07  7:30   ` Lai Jiangshan
2014-05-07 13:15     ` Tejun Heo
2014-04-27  4:08 ` [PATCH 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
2014-05-05 14:36   ` Tejun Heo
2014-05-07  7:10     ` Lai Jiangshan
2014-05-07 13:12       ` Tejun Heo
2014-05-07 13:38         ` Lai Jiangshan
2014-05-07 13:41           ` Tejun Heo
2014-05-07 15:30             ` Lai Jiangshan
2014-05-07 15:37               ` Tejun Heo
2014-04-27  4:09 ` [PATCH 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
2014-05-05 14:57   ` Tejun Heo
2014-04-27  4:09 ` [PATCH 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
2014-05-05 14:59   ` Tejun Heo
2014-05-06 16:33     ` Lai Jiangshan
2014-05-06 16:35       ` Tejun Heo
2014-05-06 16:38         ` Tejun Heo
2014-04-27  4:09 ` [PATCH 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
2014-04-27  4:09 ` [PATCH 08/10] workqueue: rename manager_mutex to bind_mutex Lai Jiangshan
2014-04-27  4:09 ` [PATCH 09/10] workqueue: separate pool-binding code out from create_worker() Lai Jiangshan
2014-04-27  4:09 ` [PATCH 10/10] workqueue: use generic pool-bind/unbind routine for rescuers Lai Jiangshan
2014-05-05 14:54   ` Tejun Heo
2014-05-05 15:05 ` [PATCH 00/10] workqueue: async worker destruction and pool-binding synchronization Tejun Heo
2014-05-12  6:56   ` [PATCH 00/10 V2] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
2014-05-12  6:56     ` [PATCH 01/10 V2] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
2014-05-12  6:56     ` [PATCH 02/10 V2] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
2014-05-12 21:08       ` Tejun Heo
2014-05-13  6:08         ` Lai Jiangshan
2014-05-12  6:56     ` [PATCH 03/10 V2] workqueue: async worker destruction Lai Jiangshan
2014-05-12 21:20       ` Tejun Heo
2014-05-13  6:32         ` Lai Jiangshan
2014-05-13 14:14           ` Tejun Heo
2014-05-20  9:54             ` Lai Jiangshan
2014-05-12  6:56     ` [PATCH 04/10 V2] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
2014-05-12 21:26       ` Tejun Heo
2014-05-12  6:56     ` [PATCH 05/10 V2] workqueue: separate iteration role from worker_idr Lai Jiangshan
2014-05-12 21:35       ` Tejun Heo
2014-05-12 21:37       ` Tejun Heo
2014-05-12  6:56     ` [PATCH 06/10 V2] workqueue: convert worker_idr to worker_ida Lai Jiangshan
2014-05-12 21:40       ` Tejun Heo
2014-05-13  6:43         ` Lai Jiangshan
2014-05-13 14:17           ` Tejun Heo
2014-05-12  6:56     ` [PATCH 07/10 V2] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
2014-05-12  6:56     ` [PATCH 08/10 V2] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
2014-05-12 22:01       ` Tejun Heo
2014-05-13  6:45         ` Lai Jiangshan
2014-05-13 14:19           ` Tejun Heo
2014-05-12  6:56     ` [PATCH 09/10 V2] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
2014-05-12  6:56     ` [PATCH 10/10 V2] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
2014-05-12 22:05       ` Tejun Heo
2014-05-13  6:55         ` Lai Jiangshan
2014-05-20  9:46     ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 01/10] workqueue: use manager lock only to protect worker_idr Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 02/10] workqueue: destroy_worker() should destroy idle workers only Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 03/10] workqueue: async worker destruction Lai Jiangshan
2014-05-20 14:22         ` Tejun Heo
2014-05-20 14:23         ` Tejun Heo
2014-05-20 14:24         ` Tejun Heo
2014-05-20  9:46       ` [PATCH V3 04/10] workqueue: destroy worker directly in the idle timeout handler Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 05/10] workqueue: separate iteration role from worker_idr Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 06/10] workqueue: convert worker_idr to worker_ida Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 07/10] workqueue: narrow the protection range of manager_mutex Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 08/10] workqueue: rename manager_mutex to attach_mutex Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 09/10] workqueue: separate pool-attaching code out from create_worker() Lai Jiangshan
2014-05-20  9:46       ` [PATCH V3 10/10] workqueue: use generic attach/detach routine for rescuers Lai Jiangshan
2014-05-20 15:00       ` [PATCH V3 00/10] workqueue: async worker destruction and worker attaching/detaching Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).