* [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID
@ 2013-02-18 16:12 Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 01/15] workqueue: add lock_work_pool() Lai Jiangshan
                   ` (15 more replies)
  0 siblings, 16 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Core patches are patch 1, patch 9, and patch 13.

Patch 1:  enhance locking
Patch 9:  record the worker ID in work->data instead of the pool ID;
	  look up the worker via worker ID when off-queue
Patch 13: also look up the worker via worker ID when running && queued;
	  remove the lookup via hashtable

Patch 6-8: ensure modifications to worker->pool are made with the pool lock held
Patch 14:  remove the hashtable entirely

The other patches are preparation or cleanup.


Lai Jiangshan (15):
  workqueue: add lock_work_pool()
  workqueue: allow more work_pool id space
  workqueue: rename worker->id to worker->id_in_pool
  workqueue: add worker's global worker ID
  workqueue: only set pool id when the work is running
  workqueue: use current instead of worker->task in
    worker_maybe_bind_and_lock()
  workqueue: change argument of worker_maybe_bind_and_lock() to pool
  workqueue: only change worker->pool with pool lock held
  workqueue: use worker id instead
  workqueue: avoid unneeded calls to get_work_cwq(work)
  workqueue: split work_flags to delayed_flags and color_flags in
    __queue_work()
  workqueue: add extra flags to set_work_worker_and_keep_pending()
  workqueue: also record worker in work->data if running&&queued
  workqueue: convert busy hash to busy list
  workqueue: queue worker to busy list

 include/linux/workqueue.h   |   25 +-
 kernel/workqueue.c          |  522 ++++++++++++++++++++++++-------------------
 kernel/workqueue_internal.h |   17 +-
 3 files changed, 314 insertions(+), 250 deletions(-)

-- 
1.7.7.6



* [PATCH V2 01/15] workqueue: add lock_work_pool()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 02/15] workqueue: allow more work_pool id space Lai Jiangshan
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

get_work_pool() is too weak: the caller *always* has to lock and check.
Merge all of that code into lock_work_pool() and remove get_work_pool().

This patch adds a little overhead to __queue_work() and try_to_grab_pending():
	even when worker_pool_by_id(pool_id) == get_cwq(cpu, wq)->pool,
	it will still look up the worker,
	but this lookup is needed in later patches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |  170 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 93 insertions(+), 77 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ea7f696..f90d0bd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -516,10 +516,8 @@ static int work_next_color(int color)
  * work->data.  These functions should only be called while the work is
  * owned - ie. while the PENDING bit is set.
  *
- * get_work_pool() and get_work_cwq() can be used to obtain the pool or cwq
- * corresponding to a work.  Pool is available once the work has been
- * queued anywhere after initialization until it is sync canceled.  cwq is
- * available only while the work item is queued.
+ * get_work_cwq() can be used to obtain the cwq corresponding to a work.
+ * It is available only while the work item is queued.
  *
  * %WORK_OFFQ_CANCELING is used to mark a work item which is being
  * canceled.  While being canceled, a work item may have its PENDING set
@@ -578,31 +576,6 @@ static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
 }
 
 /**
- * get_work_pool - return the worker_pool a given work was associated with
- * @work: the work item of interest
- *
- * Return the worker_pool @work was last associated with.  %NULL if none.
- */
-static struct worker_pool *get_work_pool(struct work_struct *work)
-{
-	unsigned long data = atomic_long_read(&work->data);
-	struct worker_pool *pool;
-	int pool_id;
-
-	if (data & WORK_STRUCT_CWQ)
-		return ((struct cpu_workqueue_struct *)
-			(data & WORK_STRUCT_WQ_DATA_MASK))->pool;
-
-	pool_id = data >> WORK_OFFQ_POOL_SHIFT;
-	if (pool_id == WORK_OFFQ_POOL_NONE)
-		return NULL;
-
-	pool = worker_pool_by_id(pool_id);
-	WARN_ON_ONCE(!pool);
-	return pool;
-}
-
-/**
  * get_work_pool_id - return the worker pool ID a given work is associated with
  * @work: the work item of interest
  *
@@ -921,6 +894,64 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 }
 
 /**
+ * lock_work_pool - return and lock the pool a given work was associated with
+ * @work: the work item of interest
+ * @queued_cwq: set to the queued cwq if the work is still queued
+ * @running_worker: set to the running worker if the work is still running
+ *
+ * CONTEXT:
+ * local_irq_disable().
+ *
+ * Return the worker_pool @work was last associated with.  %NULL if none.
+ */
+
+static
+struct worker_pool *lock_work_pool(struct work_struct *work,
+				   struct cpu_workqueue_struct **queued_cwq,
+				   struct worker **running_worker)
+{
+	unsigned long data = atomic_long_read(&work->data);
+	unsigned long pool_id;
+	struct worker_pool *pool;
+	struct cpu_workqueue_struct *cwq;
+	struct worker *worker = NULL;
+
+	if (data & WORK_STRUCT_CWQ) {
+		cwq = (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
+		pool = cwq->pool;
+		spin_lock(&pool->lock);
+		/*
+		 * work->data is guaranteed to point to cwq only while the work
+		 * item is queued on cwq->wq, and both updating work->data to
+		 * point to cwq on queueing and to pool on dequeueing are done
+		 * under cwq->pool->lock.  This in turn guarantees that,
+		 * if work->data points to cwq which is associated with a
+		 * locked pool, the work item is currently queued on that pool.
+		 */
+		cwq = get_work_cwq(work);
+		if (cwq && cwq->pool == pool) {
+			*queued_cwq = cwq;
+			worker = find_worker_executing_work(pool, work);
+		}
+	} else {
+		pool_id = data >> WORK_OFFQ_POOL_SHIFT;
+		if (pool_id == WORK_OFFQ_POOL_NONE)
+			return NULL;
+
+		pool = worker_pool_by_id(pool_id);
+		if (!pool)
+			return NULL;
+
+		spin_lock(&pool->lock);
+		worker = find_worker_executing_work(pool, work);
+	}
+
+	if (worker)
+		*running_worker = worker;
+	return pool;
+}
+
+/**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
  * @head: target list to append @work to
@@ -1053,7 +1084,8 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 			       unsigned long *flags)
 {
 	struct worker_pool *pool;
-	struct cpu_workqueue_struct *cwq;
+	struct cpu_workqueue_struct *cwq = NULL;
+	struct worker *worker = NULL;
 
 	local_irq_save(*flags);
 
@@ -1078,21 +1110,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	 * The queueing is in progress, or it is already queued. Try to
 	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
 	 */
-	pool = get_work_pool(work);
+	pool = lock_work_pool(work, &cwq, &worker);
 	if (!pool)
 		goto fail;
 
-	spin_lock(&pool->lock);
-	/*
-	 * work->data is guaranteed to point to cwq only while the work
-	 * item is queued on cwq->wq, and both updating work->data to point
-	 * to cwq on queueing and to pool on dequeueing are done under
-	 * cwq->pool->lock.  This in turn guarantees that, if work->data
-	 * points to cwq which is associated with a locked pool, the work
-	 * item is currently queued on that pool.
-	 */
-	cwq = get_work_cwq(work);
-	if (cwq && cwq->pool == pool) {
+	if (cwq) {
 		debug_work_deactivate(work);
 
 		/*
@@ -1199,34 +1221,27 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	/* determine the cwq to use */
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct worker_pool *last_pool;
+		struct worker *worker = NULL;
+		struct cpu_workqueue_struct *dummy_cwq;
 
 		if (cpu == WORK_CPU_UNBOUND)
 			cpu = raw_smp_processor_id();
 
 		/*
 		 * It's multi cpu.  If @work was previously on a different
-		 * cpu, it might still be running there, in which case the
-		 * work needs to be queued on that cpu to guarantee
+		 * pool, it might still be running there, in which case the
+		 * work needs to be queued on that pool to guarantee
 		 * non-reentrancy.
 		 */
 		cwq = get_cwq(cpu, wq);
-		last_pool = get_work_pool(work);
-
-		if (last_pool && last_pool != cwq->pool) {
-			struct worker *worker;
-
-			spin_lock(&last_pool->lock);
-
-			worker = find_worker_executing_work(last_pool, work);
+		last_pool = lock_work_pool(work, &dummy_cwq, &worker);
 
-			if (worker && worker->current_cwq->wq == wq) {
-				cwq = get_cwq(last_pool->cpu, wq);
-			} else {
-				/* meh... not running there, queue here */
+		if (worker && worker->current_cwq->wq == wq) {
+			cwq = get_cwq(last_pool->cpu, wq);
+		} else if (cwq->pool != last_pool) {
+			/* meh... not running there, queue here */
+			if (last_pool)
 				spin_unlock(&last_pool->lock);
-				spin_lock(&cwq->pool->lock);
-			}
-		} else {
 			spin_lock(&cwq->pool->lock);
 		}
 	} else {
@@ -2749,25 +2764,22 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 {
 	struct worker *worker = NULL;
 	struct worker_pool *pool;
-	struct cpu_workqueue_struct *cwq;
+	struct cpu_workqueue_struct *cwq = NULL;
 
 	might_sleep();
-	pool = get_work_pool(work);
-	if (!pool)
+	local_irq_disable();
+	pool = lock_work_pool(work, &cwq, &worker);
+	if (!pool) {
+		local_irq_enable();
 		return false;
+	}
 
-	spin_lock_irq(&pool->lock);
-	/* see the comment in try_to_grab_pending() with the same code */
-	cwq = get_work_cwq(work);
-	if (cwq) {
-		if (unlikely(cwq->pool != pool))
-			goto already_gone;
-	} else {
-		worker = find_worker_executing_work(pool, work);
-		if (!worker)
-			goto already_gone;
+	if (cwq)
+		worker = NULL;
+	else if (worker)
 		cwq = worker->current_cwq;
-	}
+	else
+		goto already_gone;
 
 	insert_wq_barrier(cwq, barr, work, worker);
 	spin_unlock_irq(&pool->lock);
@@ -3386,19 +3398,23 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
  */
 unsigned int work_busy(struct work_struct *work)
 {
-	struct worker_pool *pool = get_work_pool(work);
+	struct worker_pool *pool;
+	struct cpu_workqueue_struct *cwq = NULL;
+	struct worker *worker = NULL;
 	unsigned long flags;
 	unsigned int ret = 0;
 
 	if (work_pending(work))
 		ret |= WORK_BUSY_PENDING;
 
+	local_irq_save(flags);
+	pool = lock_work_pool(work, &cwq, &worker);
 	if (pool) {
-		spin_lock_irqsave(&pool->lock, flags);
-		if (find_worker_executing_work(pool, work))
+		if (worker)
 			ret |= WORK_BUSY_RUNNING;
-		spin_unlock_irqrestore(&pool->lock, flags);
+		spin_unlock(&pool->lock);
 	}
+	local_irq_restore(flags);
 
 	return ret;
 }
-- 
1.7.7.6



* [PATCH V2 02/15] workqueue: allow more work_pool id space
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 01/15] workqueue: add lock_work_pool() Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-19 20:19   ` [PATCH 1/4] workqueue: allow more off-queue flag space Tejun Heo
  2013-02-18 16:12 ` [PATCH V2 03/15] workqueue: rename current worker->id to worker->id_in_pool Lai Jiangshan
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

The color bits are not used while a work item is off-queue, so reuse them
for pool IDs; thus we will have more pool ID space.
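
For illustration, here is a minimal stand-alone sketch of the bit-space
arithmetic (not kernel code): the constants are assumed 2013-era values with
CONFIG_DEBUG_OBJECTS_WORK disabled, and BITS_PER_LONG is fixed at 32 here
because that is where the extra space matters; the real definitions live in
include/linux/workqueue.h.

#include <stdio.h>

/* assumed values: 4 color bits at shift 4, 1 off-queue flag bit */
#define BITS_PER_LONG		32
#define WORK_STRUCT_COLOR_SHIFT	4
#define WORK_STRUCT_COLOR_BITS	4
#define WORK_STRUCT_FLAG_BITS	(WORK_STRUCT_COLOR_SHIFT + WORK_STRUCT_COLOR_BITS)
#define WORK_OFFQ_FLAG_BITS	1

/* bits left for the off-queue ID field, capped at 31 like the kernel */
static unsigned long offq_id_bits(unsigned long flag_base)
{
	unsigned long shift = flag_base + WORK_OFFQ_FLAG_BITS;
	unsigned long left = BITS_PER_LONG - shift;

	return left <= 31 ? left : 31;
}

int main(void)
{
	/* before: off-queue flags start above all work_struct flag bits */
	printf("ID bits before: %lu\n", offq_id_bits(WORK_STRUCT_FLAG_BITS));   /* 23 */
	/* after: they start at the color shift, reusing the color bits */
	printf("ID bits after:  %lu\n", offq_id_bits(WORK_STRUCT_COLOR_SHIFT)); /* 27 */
	return 0;
}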

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a3d7556..0be57d4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -68,7 +68,7 @@ enum {
 				  WORK_STRUCT_COLOR_BITS,
 
 	/* data contains off-queue information when !WORK_STRUCT_CWQ */
-	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
-- 
1.7.7.6



* [PATCH V2 03/15] workqueue: rename current worker->id to worker->id_in_pool
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 01/15] workqueue: add lock_work_pool() Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 02/15] workqueue: allow more work_pool id space Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 04/15] workqueue: add worker's global worker ID Lai Jiangshan
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

We will use worker->id for the global worker ID.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   20 +++++++++++---------
 kernel/workqueue_internal.h |    2 +-
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f90d0bd..764ad45 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1722,10 +1722,10 @@ static struct worker *create_worker(struct worker_pool *pool)
 {
 	const char *pri = std_worker_pool_pri(pool) ? "H" : "";
 	struct worker *worker = NULL;
-	int id = -1;
+	int id_in_pool = -1;
 
 	spin_lock_irq(&pool->lock);
-	while (ida_get_new(&pool->worker_ida, &id)) {
+	while (ida_get_new(&pool->worker_ida, &id_in_pool)) {
 		spin_unlock_irq(&pool->lock);
 		if (!ida_pre_get(&pool->worker_ida, GFP_KERNEL))
 			goto fail;
@@ -1738,15 +1738,17 @@ static struct worker *create_worker(struct worker_pool *pool)
 		goto fail;
 
 	worker->pool = pool;
-	worker->id = id;
+	worker->id_in_pool = id_in_pool;
 
 	if (pool->cpu != WORK_CPU_UNBOUND)
 		worker->task = kthread_create_on_node(worker_thread,
 					worker, cpu_to_node(pool->cpu),
-					"kworker/%u:%d%s", pool->cpu, id, pri);
+					"kworker/%u:%d%s", pool->cpu,
+					id_in_pool, pri);
 	else
 		worker->task = kthread_create(worker_thread, worker,
-					      "kworker/u:%d%s", id, pri);
+					      "kworker/u:%d%s",
+					      id_in_pool, pri);
 	if (IS_ERR(worker->task))
 		goto fail;
 
@@ -1771,9 +1773,9 @@ static struct worker *create_worker(struct worker_pool *pool)
 
 	return worker;
 fail:
-	if (id >= 0) {
+	if (id_in_pool >= 0) {
 		spin_lock_irq(&pool->lock);
-		ida_remove(&pool->worker_ida, id);
+		ida_remove(&pool->worker_ida, id_in_pool);
 		spin_unlock_irq(&pool->lock);
 	}
 	kfree(worker);
@@ -1809,7 +1811,7 @@ static void start_worker(struct worker *worker)
 static void destroy_worker(struct worker *worker)
 {
 	struct worker_pool *pool = worker->pool;
-	int id = worker->id;
+	int id_in_pool = worker->id_in_pool;
 
 	/* sanity check frenzy */
 	BUG_ON(worker->current_work);
@@ -1829,7 +1831,7 @@ static void destroy_worker(struct worker *worker)
 	kfree(worker);
 
 	spin_lock_irq(&pool->lock);
-	ida_remove(&pool->worker_ida, id);
+	ida_remove(&pool->worker_ida, id_in_pool);
 }
 
 static void idle_worker_timeout(unsigned long __pool)
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 328be4a..e6afb59 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -35,7 +35,7 @@ struct worker {
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-	int			id;		/* I: worker id */
+	int			id_in_pool;	/* I: worker id in the pool */
 
 	/* for rebinding worker to CPU */
 	struct work_struct	rebind_work;	/* L: for busy worker */
-- 
1.7.7.6



* [PATCH V2 04/15] workqueue: add worker's global worker ID
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (2 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 03/15] workqueue: rename current worker->id to worker->id_in_pool Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 05/15] workqueue: only set pool id when the work is running Lai Jiangshan
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Add a new worker->id, allocated from worker_idr.  It will be used to
record the last running worker in work->data.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   28 ++++++++++++++++++++++++++++
 kernel/workqueue_internal.h |    1 +
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 764ad45..5502188 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -435,6 +435,10 @@ static struct worker_pool unbound_std_worker_pools[NR_STD_WORKER_POOLS];
 static DEFINE_MUTEX(worker_pool_idr_mutex);
 static DEFINE_IDR(worker_pool_idr);
 
+/* idr of all workers */
+static DEFINE_MUTEX(worker_idr_mutex);
+static DEFINE_IDR(worker_idr);
+
 static int worker_thread(void *__worker);
 
 static struct worker_pool *std_worker_pools(int cpu)
@@ -450,6 +454,26 @@ static int std_worker_pool_pri(struct worker_pool *pool)
 	return pool - std_worker_pools(pool->cpu);
 }
 
+/* allocate ID and assign it to @worker */
+static int worker_assign_id(struct worker *worker)
+{
+	int ret;
+
+	mutex_lock(&worker_idr_mutex);
+	idr_pre_get(&worker_idr, GFP_KERNEL);
+	ret = idr_get_new(&worker_idr, worker, &worker->id);
+	mutex_unlock(&worker_idr_mutex);
+
+	return ret;
+}
+
+static void free_worker_id(struct worker *worker)
+{
+	mutex_lock(&worker_idr_mutex);
+	idr_remove(&worker_idr, worker->id);
+	mutex_unlock(&worker_idr_mutex);
+}
+
 /* allocate ID and assign it to @pool */
 static int worker_pool_assign_id(struct worker_pool *pool)
 {
@@ -1740,6 +1764,9 @@ static struct worker *create_worker(struct worker_pool *pool)
 	worker->pool = pool;
 	worker->id_in_pool = id_in_pool;
 
+	if (worker_assign_id(worker))
+		goto fail;
+
 	if (pool->cpu != WORK_CPU_UNBOUND)
 		worker->task = kthread_create_on_node(worker_thread,
 					worker, cpu_to_node(pool->cpu),
@@ -1828,6 +1855,7 @@ static void destroy_worker(struct worker *worker)
 	spin_unlock_irq(&pool->lock);
 
 	kthread_stop(worker->task);
+	free_worker_id(worker);
 	kfree(worker);
 
 	spin_lock_irq(&pool->lock);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index e6afb59..3694bc1 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -36,6 +36,7 @@ struct worker {
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
 	int			id_in_pool;	/* I: worker id in the pool */
+	int			id;		/* I: worker id (global) */
 
 	/* for rebinding worker to CPU */
 	struct work_struct	rebind_work;	/* L: for busy worker */
-- 
1.7.7.6



* [PATCH V2 05/15] workqueue: only set pool id when the work is running
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (3 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 04/15] workqueue: add worker's global worker ID Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 06/15] workqueue: use current instead of worker->task in worker_maybe_bind_and_lock() Lai Jiangshan
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

When a work item is dequeued via try_to_grab_pending(), its pool ID is
recorded in work->data, but this recording is useless when the work is not
running.

In this patch, we only record the pool ID when the work is running.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5502188..06d6211 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1139,6 +1139,8 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		goto fail;
 
 	if (cwq) {
+		int pool_id;
+
 		debug_work_deactivate(work);
 
 		/*
@@ -1154,8 +1156,13 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		list_del_init(&work->entry);
 		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
 
-		/* work->data points to cwq iff queued, point to pool */
-		set_work_pool_and_keep_pending(work, pool->id);
+		/* work is dequeued, work->data points to pool iff running */
+		if (worker)
+			pool_id = pool->id;
+		else
+			pool_id = WORK_OFFQ_POOL_NONE;
+
+		set_work_pool_and_keep_pending(work, pool_id);
 
 		spin_unlock(&pool->lock);
 		return 1;
-- 
1.7.7.6



* [PATCH V2 06/15] workqueue: use current instead of worker->task in worker_maybe_bind_and_lock()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (4 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 05/15] workqueue: only set pool id when the work is running Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-19 20:20   ` [PATCH 2/4] workqueue: use %current " Tejun Heo
  2013-02-18 16:12 ` [PATCH V2 07/15] workqueue: change argument of worker_maybe_bind_and_lock() to pool Lai Jiangshan
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

worker_maybe_bind_and_lock() uses both @task and @current at the same time,
but they are the same (worker_maybe_bind_and_lock() can only be called by
the current worker task).

Make it use @current only.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 06d6211..b47d1af 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1560,7 +1560,7 @@ static void worker_leave_idle(struct worker *worker)
  * flushed from cpu callbacks while cpu is going down, they are
  * guaranteed to execute on the cpu.
  *
- * This function is to be used by rogue workers and rescuers to bind
+ * This function is to be used by unbound workers and rescuers to bind
  * themselves to the target cpu and may race with cpu going down or
  * coming online.  kthread_bind() can't be used because it may put the
  * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
@@ -1585,7 +1585,6 @@ static bool worker_maybe_bind_and_lock(struct worker *worker)
 __acquires(&pool->lock)
 {
 	struct worker_pool *pool = worker->pool;
-	struct task_struct *task = worker->task;
 
 	while (true) {
 		/*
@@ -1595,12 +1594,12 @@ __acquires(&pool->lock)
 		 * against POOL_DISASSOCIATED.
 		 */
 		if (!(pool->flags & POOL_DISASSOCIATED))
-			set_cpus_allowed_ptr(task, get_cpu_mask(pool->cpu));
+			set_cpus_allowed_ptr(current, get_cpu_mask(pool->cpu));
 
 		spin_lock_irq(&pool->lock);
 		if (pool->flags & POOL_DISASSOCIATED)
 			return false;
-		if (task_cpu(task) == pool->cpu &&
+		if (task_cpu(current) == pool->cpu &&
 		    cpumask_equal(&current->cpus_allowed,
 				  get_cpu_mask(pool->cpu)))
 			return true;
-- 
1.7.7.6



* [PATCH V2 07/15] workqueue: change argument of worker_maybe_bind_and_lock() to pool
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (5 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 06/15] workqueue: use current instead of worker->task in worker_maybe_bind_and_lock() Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-19 20:20   ` [PATCH 3/4] workqueue: change argument of worker_maybe_bind_and_lock() to @pool Tejun Heo
  2013-02-18 16:12 ` [PATCH V2 08/15] workqueue: only change worker->pool with pool lock held Lai Jiangshan
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

We plan to set rescuer->pool only inside the critical section of pool->lock,
so when worker_maybe_bind_and_lock() is called, the worker's (rescuer's)
->pool may be NULL. Thus we need to change the worker_maybe_bind_and_lock()
API: either add a @pool argument or replace the @worker argument with a
@pool argument. We choose the latter.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b47d1af..b987195 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1552,8 +1552,10 @@ static void worker_leave_idle(struct worker *worker)
 }
 
 /**
- * worker_maybe_bind_and_lock - bind worker to its cpu if possible and lock pool
- * @worker: self
+ * worker_maybe_bind_and_lock - bind current worker if possible and lock pool
+ * @pool: target pool
+ *
+ * Bind the worker(of current task) to the cpu of the pool if it is associated.
  *
  * Works which are scheduled while the cpu is online must at least be
  * scheduled to a worker which is bound to the cpu so that if they are
@@ -1581,11 +1583,9 @@ static void worker_leave_idle(struct worker *worker)
  * %true if the associated pool is online (@worker is successfully
  * bound), %false if offline.
  */
-static bool worker_maybe_bind_and_lock(struct worker *worker)
+static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
 __acquires(&pool->lock)
 {
-	struct worker_pool *pool = worker->pool;
-
 	while (true) {
 		/*
 		 * The following call may fail, succeed or succeed
@@ -1623,7 +1623,7 @@ __acquires(&pool->lock)
 static void idle_worker_rebind(struct worker *worker)
 {
 	/* CPU may go down again inbetween, clear UNBOUND only on success */
-	if (worker_maybe_bind_and_lock(worker))
+	if (worker_maybe_bind_and_lock(worker->pool))
 		worker_clr_flags(worker, WORKER_UNBOUND);
 
 	/* rebind complete, become available again */
@@ -1641,7 +1641,7 @@ static void busy_worker_rebind_fn(struct work_struct *work)
 {
 	struct worker *worker = container_of(work, struct worker, rebind_work);
 
-	if (worker_maybe_bind_and_lock(worker))
+	if (worker_maybe_bind_and_lock(worker->pool))
 		worker_clr_flags(worker, WORKER_UNBOUND);
 
 	spin_unlock_irq(&worker->pool->lock);
@@ -2093,7 +2093,7 @@ static bool manage_workers(struct worker *worker)
 		 * on @pool's current state.  Try it and adjust
 		 * %WORKER_UNBOUND accordingly.
 		 */
-		if (worker_maybe_bind_and_lock(worker))
+		if (worker_maybe_bind_and_lock(pool))
 			worker->flags &= ~WORKER_UNBOUND;
 		else
 			worker->flags |= WORKER_UNBOUND;
@@ -2413,7 +2413,7 @@ repeat:
 
 		/* migrate to the target cpu if possible */
 		rescuer->pool = pool;
-		worker_maybe_bind_and_lock(rescuer);
+		worker_maybe_bind_and_lock(pool);
 
 		/*
 		 * Slurp in all works issued via this workqueue and
-- 
1.7.7.6



* [PATCH V2 08/15] workqueue: only change worker->pool with pool lock held
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (6 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 07/15] workqueue: change argument of worker_maybe_bind_and_lock() to pool Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-19 20:22   ` [PATCH 4/4] workqueue: better define synchronization rule around rescuer->pool updates Tejun Heo
  2013-02-18 16:12 ` [PATCH V2 09/15] workqueue: use worker id in work->data instead Lai Jiangshan
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

We ensure these semantics:
	worker->pool is set to the pool when the worker is associated with
	the pool (a normal worker is associated with its pool when created;
	a rescuer is associated with a pool dynamically).
	worker->pool is set to NULL when the worker is disassociated from
	the pool.
	Both updates are done with pool->lock held.

Thus we get this guarantee:
	If pool->lock is held and worker->pool == pool, the worker is
	currently associated with the pool.


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |    3 ++-
 kernel/workqueue_internal.h |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b987195..9086a33 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2412,8 +2412,8 @@ repeat:
 		mayday_clear_cpu(cpu, wq->mayday_mask);
 
 		/* migrate to the target cpu if possible */
-		rescuer->pool = pool;
 		worker_maybe_bind_and_lock(pool);
+		rescuer->pool = pool;
 
 		/*
 		 * Slurp in all works issued via this workqueue and
@@ -2434,6 +2434,7 @@ repeat:
 		if (keep_working(pool))
 			wake_up_worker(pool);
 
+		rescuer->pool = NULL;
 		spin_unlock_irq(&pool->lock);
 	}
 
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 3694bc1..1040abc 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -32,6 +32,7 @@ struct worker {
 	struct list_head	scheduled;	/* L: scheduled works */
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
+						/* L: for rescuers */
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.7.7.6



* [PATCH V2 09/15] workqueue: use worker id in work->data instead
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (7 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 08/15] workqueue: only change worker->pool with pool lock held Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 10/15] workqueue: avoid unneeded calls to get_work_cwq() Lai Jiangshan
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Every time we need to find the executing worker for a work item, we need
two steps:
	find the pool from the idr by pool ID;
	find the worker in the pool's busy hash table.

Now we merge them into one step: find the worker directly from the idr by
worker ID (see lock_work_pool(); if the work is queued, we still use the
hash table).

It makes the code more straightforward.

In the future, we may add a percpu "worker ID <--> worker" mapping cache
when needed.

We are also planning to add non-standard worker_pools, and we don't yet know
how to implement worker_pool_by_id() for them; this patch removes that
problem.

This patch slows down the very-slow-path destroy_worker(); if required, we
will move the synchronize_sched() out.
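
As a rough stand-alone sketch of the new work->data encoding (not kernel
code; the bit positions are assumed from the era's header with
CONFIG_DEBUG_OBJECTS_WORK disabled, and the idr/locking details are elided):

#include <assert.h>
#include <stdio.h>

#define WORK_STRUCT_CWQ		(1UL << 2)	/* data points to a cwq */
#define WORK_OFFQ_FLAG_BASE	4
#define WORK_OFFQ_CANCELING	(1UL << WORK_OFFQ_FLAG_BASE)
#define WORK_OFFQ_FLAG_BITS	1
#define WORK_OFFQ_WORKER_SHIFT	(WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS)

/* model of worker_id_to_data(): the worker ID lives in the high bits */
static unsigned long worker_id_to_data(int worker_id)
{
	return (unsigned long)worker_id << WORK_OFFQ_WORKER_SHIFT;
}

/* model of get_work_worker_id(): only meaningful while off-queue */
static int get_work_worker_id(unsigned long data)
{
	assert(!(data & WORK_STRUCT_CWQ));
	return data >> WORK_OFFQ_WORKER_SHIFT;
}

int main(void)
{
	/* worker 42 last ran this work; the work is also being canceled */
	unsigned long data = worker_id_to_data(42) | WORK_OFFQ_CANCELING;

	printf("last worker id = %d\n", get_work_worker_id(data));	/* 42 */
	return 0;
}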

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |   20 +++---
 kernel/workqueue.c        |  140 ++++++++++++++++++++++-----------------------
 2 files changed, 78 insertions(+), 82 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 0be57d4..a8db8c4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -73,20 +73,20 @@ enum {
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
 	/*
-	 * When a work item is off queue, its high bits point to the last
-	 * pool it was on.  Cap at 31 bits and use the highest number to
-	 * indicate that no pool is associated.
+	 * When a work item starts to be executed, its high bits point to the
+	 * worker it is running on.  Cap at 31 bits and use the highest number
+	 * to indicate that no worker is associated.
 	 */
 	WORK_OFFQ_FLAG_BITS	= 1,
-	WORK_OFFQ_POOL_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
-	WORK_OFFQ_LEFT		= BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT,
-	WORK_OFFQ_POOL_BITS	= WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
-	WORK_OFFQ_POOL_NONE	= (1LU << WORK_OFFQ_POOL_BITS) - 1,
+	WORK_OFFQ_WORKER_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
+	WORK_OFFQ_LEFT		= BITS_PER_LONG - WORK_OFFQ_WORKER_SHIFT,
+	WORK_OFFQ_WORKER_BITS	= WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
+	WORK_OFFQ_WORKER_NONE	= (1LU << WORK_OFFQ_WORKER_BITS) - 1,
 
 	/* convenience constants */
 	WORK_STRUCT_FLAG_MASK	= (1UL << WORK_STRUCT_FLAG_BITS) - 1,
 	WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
-	WORK_STRUCT_NO_POOL	= (unsigned long)WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT,
+	WORK_STRUCT_NO_WORKER	= (unsigned long)WORK_OFFQ_WORKER_NONE << WORK_OFFQ_WORKER_SHIFT,
 
 	/* bit mask for work_busy() return values */
 	WORK_BUSY_PENDING	= 1 << 0,
@@ -102,9 +102,9 @@ struct work_struct {
 #endif
 };
 
-#define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL)
+#define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_WORKER)
 #define WORK_DATA_STATIC_INIT()	\
-	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL | WORK_STRUCT_STATIC)
+	ATOMIC_LONG_INIT(WORK_STRUCT_NO_WORKER | WORK_STRUCT_STATIC)
 
 struct delayed_work {
 	struct work_struct work;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9086a33..e14a03e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -42,6 +42,7 @@
 #include <linux/lockdep.h>
 #include <linux/idr.h>
 #include <linux/hashtable.h>
+#include <linux/rcupdate.h>
 
 #include "workqueue_internal.h"
 
@@ -125,7 +126,6 @@ enum {
 struct worker_pool {
 	spinlock_t		lock;		/* the pool lock */
 	unsigned int		cpu;		/* I: the associated cpu */
-	int			id;		/* I: pool ID */
 	unsigned int		flags;		/* X: flags */
 
 	struct list_head	worklist;	/* L: list of pending works */
@@ -431,10 +431,6 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct worker_pool [NR_STD_WORKER_POOLS],
 				     cpu_std_worker_pools);
 static struct worker_pool unbound_std_worker_pools[NR_STD_WORKER_POOLS];
 
-/* idr of all pools */
-static DEFINE_MUTEX(worker_pool_idr_mutex);
-static DEFINE_IDR(worker_pool_idr);
-
 /* idr of all workers */
 static DEFINE_MUTEX(worker_idr_mutex);
 static DEFINE_IDR(worker_idr);
@@ -474,28 +470,6 @@ static void free_worker_id(struct worker *worker)
 	mutex_unlock(&worker_idr_mutex);
 }
 
-/* allocate ID and assign it to @pool */
-static int worker_pool_assign_id(struct worker_pool *pool)
-{
-	int ret;
-
-	mutex_lock(&worker_pool_idr_mutex);
-	idr_pre_get(&worker_pool_idr, GFP_KERNEL);
-	ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
-	mutex_unlock(&worker_pool_idr_mutex);
-
-	return ret;
-}
-
-/*
- * Lookup worker_pool by id.  The idr currently is built during boot and
- * never modified.  Don't worry about locking for now.
- */
-static struct worker_pool *worker_pool_by_id(int pool_id)
-{
-	return idr_find(&worker_pool_idr, pool_id);
-}
-
 static struct worker_pool *get_std_worker_pool(int cpu, bool highpri)
 {
 	struct worker_pool *pools = std_worker_pools(cpu);
@@ -533,10 +507,11 @@ static int work_next_color(int color)
 /*
  * While queued, %WORK_STRUCT_CWQ is set and non flag bits of a work's data
  * contain the pointer to the queued cwq.  Once execution starts, the flag
- * is cleared and the high bits contain OFFQ flags and pool ID.
+ * is cleared and the high bits contain OFFQ flags and worker ID.
  *
- * set_work_cwq(), set_work_pool_and_clear_pending(), mark_work_canceling()
- * and clear_work_data() can be used to set the cwq, pool or clear
+ * set_work_cwq(), set_work_worker_and_keep_pending(),
+ * set_work_worker_and_clear_pending(), mark_work_canceling()
+ * and clear_work_data() can be used to set the cwq, worker or clear
  * work->data.  These functions should only be called while the work is
  * owned - ie. while the PENDING bit is set.
  *
@@ -563,15 +538,19 @@ static void set_work_cwq(struct work_struct *work,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
-static void set_work_pool_and_keep_pending(struct work_struct *work,
-					   int pool_id)
+static inline unsigned long worker_id_to_data(int worker_id)
 {
-	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT,
-		      WORK_STRUCT_PENDING);
+	return (unsigned long)worker_id << WORK_OFFQ_WORKER_SHIFT;
 }
 
-static void set_work_pool_and_clear_pending(struct work_struct *work,
-					    int pool_id)
+static void set_work_worker_and_keep_pending(struct work_struct *work,
+					     int worker_id)
+{
+	set_work_data(work, worker_id_to_data(worker_id), WORK_STRUCT_PENDING);
+}
+
+static void set_work_worker_and_clear_pending(struct work_struct *work,
+					      int worker_id)
 {
 	/*
 	 * The following wmb is paired with the implied mb in
@@ -580,13 +559,13 @@ static void set_work_pool_and_clear_pending(struct work_struct *work,
 	 * owner.
 	 */
 	smp_wmb();
-	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
+	set_work_data(work, worker_id_to_data(worker_id), 0);
 }
 
 static void clear_work_data(struct work_struct *work)
 {
-	smp_wmb();	/* see set_work_pool_and_clear_pending() */
-	set_work_data(work, WORK_STRUCT_NO_POOL, 0);
+	smp_wmb();	/* see set_work_worker_and_clear_pending() */
+	set_work_data(work, WORK_STRUCT_NO_WORKER, 0);
 }
 
 static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
@@ -600,29 +579,28 @@ static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
 }
 
 /**
- * get_work_pool_id - return the worker pool ID a given work is associated with
+ * get_work_worker_id - return the worker ID a given work was last running on
  * @work: the work item of interest
  *
- * Return the worker_pool ID @work was last associated with.
- * %WORK_OFFQ_POOL_NONE if none.
+ * Return the worker ID @work was last running on.
+ * %WORK_OFFQ_WORKER_NONE if none.
  */
-static int get_work_pool_id(struct work_struct *work)
+static int get_work_worker_id(struct work_struct *work)
 {
 	unsigned long data = atomic_long_read(&work->data);
 
-	if (data & WORK_STRUCT_CWQ)
-		return ((struct cpu_workqueue_struct *)
-			(data & WORK_STRUCT_WQ_DATA_MASK))->pool->id;
+	if (WARN_ON_ONCE(data & WORK_STRUCT_CWQ))
+		return WORK_OFFQ_WORKER_NONE;
 
-	return data >> WORK_OFFQ_POOL_SHIFT;
+	return data >> WORK_OFFQ_WORKER_SHIFT;
 }
 
 static void mark_work_canceling(struct work_struct *work)
 {
-	unsigned long pool_id = get_work_pool_id(work);
+	unsigned long data = get_work_worker_id(work);
 
-	pool_id <<= WORK_OFFQ_POOL_SHIFT;
-	set_work_data(work, pool_id | WORK_OFFQ_CANCELING, WORK_STRUCT_PENDING);
+	data <<= WORK_OFFQ_WORKER_SHIFT;
+	set_work_data(work, data | WORK_OFFQ_CANCELING, WORK_STRUCT_PENDING);
 }
 
 static bool work_is_canceling(struct work_struct *work)
@@ -918,6 +896,26 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 }
 
 /**
+ * worker_by_id - lookup worker by id
+ * @id: the worker id of of interest
+ *
+ * CONTEXT:
+ * local_irq_disable().
+ *
+ * local_irq_disable() implies rcu_read_lock_sched().
+ */
+static struct worker *worker_by_id(int id)
+{
+	if (!WARN_ON_ONCE(irqs_disabled()))
+		return NULL;
+
+	if (id == WORK_OFFQ_WORKER_NONE)
+		return NULL;
+
+	return idr_find(&worker_idr, id);
+}
+
+/**
  * lock_work_pool - return and lock the pool a given work was associated with
  * @work: the work item of interest
  * @queued_cwq: set to the queued cwq if the work is still queued
@@ -935,7 +933,6 @@ struct worker_pool *lock_work_pool(struct work_struct *work,
 				   struct worker **running_worker)
 {
 	unsigned long data = atomic_long_read(&work->data);
-	unsigned long pool_id;
 	struct worker_pool *pool;
 	struct cpu_workqueue_struct *cwq;
 	struct worker *worker = NULL;
@@ -958,16 +955,15 @@ struct worker_pool *lock_work_pool(struct work_struct *work,
 			worker = find_worker_executing_work(pool, work);
 		}
 	} else {
-		pool_id = data >> WORK_OFFQ_POOL_SHIFT;
-		if (pool_id == WORK_OFFQ_POOL_NONE)
-			return NULL;
-
-		pool = worker_pool_by_id(pool_id);
-		if (!pool)
+		worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
+		if (!worker)
 			return NULL;
 
+		pool = worker->pool;
 		spin_lock(&pool->lock);
-		worker = find_worker_executing_work(pool, work);
+		if (pool != worker->pool || worker->current_work != work ||
+		    worker->current_func != work->func)
+			worker = NULL;
 	}
 
 	if (worker)
@@ -1139,7 +1135,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		goto fail;
 
 	if (cwq) {
-		int pool_id;
+		int worker_id;
 
 		debug_work_deactivate(work);
 
@@ -1158,11 +1154,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 
 		/* work is dequeued, work->data points to pool iff running */
 		if (worker)
-			pool_id = pool->id;
+			worker_id = worker->id;
 		else
-			pool_id = WORK_OFFQ_POOL_NONE;
+			worker_id = WORK_OFFQ_WORKER_NONE;
 
-		set_work_pool_and_keep_pending(work, pool_id);
+		set_work_worker_and_keep_pending(work, worker_id);
 
 		spin_unlock(&pool->lock);
 		return 1;
@@ -1862,6 +1858,7 @@ static void destroy_worker(struct worker *worker)
 
 	kthread_stop(worker->task);
 	free_worker_id(worker);
+	synchronize_sched();
 	kfree(worker);
 
 	spin_lock_irq(&pool->lock);
@@ -2196,12 +2193,12 @@ __acquires(&pool->lock)
 		wake_up_worker(pool);
 
 	/*
-	 * Record the last pool and clear PENDING which should be the last
+	 * Record this worker and clear PENDING which should be the last
 	 * update to @work.  Also, do this inside @pool->lock so that
 	 * PENDING and queued state changes happen together while IRQ is
 	 * disabled.
 	 */
-	set_work_pool_and_clear_pending(work, pool->id);
+	set_work_worker_and_clear_pending(work, worker->id);
 
 	spin_unlock_irq(&pool->lock);
 
@@ -2961,8 +2958,8 @@ bool cancel_delayed_work(struct delayed_work *dwork)
 	if (unlikely(ret < 0))
 		return false;
 
-	set_work_pool_and_clear_pending(&dwork->work,
-					get_work_pool_id(&dwork->work));
+	set_work_worker_and_clear_pending(&dwork->work,
+					  get_work_worker_id(&dwork->work));
 	local_irq_restore(flags);
 	return ret;
 }
@@ -3780,9 +3777,11 @@ static int __init init_workqueues(void)
 {
 	unsigned int cpu;
 
-	/* make sure we have enough bits for OFFQ pool ID */
-	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT)) <
-		     WORK_CPU_END * NR_STD_WORKER_POOLS);
+	/*
+	 * make sure we have enough bits for OFFQ worker ID,
+	 * at least a 4K stack for every worker.
+	 */
+	BUILD_BUG_ON((BITS_PER_LONG != 64) && (WORK_OFFQ_WORKER_SHIFT > 12));
 
 	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
 	hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
@@ -3808,9 +3807,6 @@ static int __init init_workqueues(void)
 
 			mutex_init(&pool->assoc_mutex);
 			ida_init(&pool->worker_ida);
-
-			/* alloc pool ID */
-			BUG_ON(worker_pool_assign_id(pool));
 		}
 	}
 
-- 
1.7.7.6



* [PATCH V2 10/15] workqueue: avoid unneeded calls to get_work_cwq()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (8 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 09/15] workqueue: use worker id in work->data instead Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 11/15] workqueue: split work_flags to delayed_flags and color_flags in __queue_work() Lai Jiangshan
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Use the already-known "cwq" instead of get_work_cwq(work) in
try_to_grab_pending() and cwq_activate_first_delayed().

This avoids unneeded calls to get_work_cwq(), which becomes less lightweight
in later patches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e14a03e..7ac6824 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1012,10 +1012,9 @@ static void move_linked_works(struct work_struct *work, struct list_head *head,
 		*nextp = n;
 }
 
-static void cwq_activate_delayed_work(struct work_struct *work)
+static void cwq_activate_delayed_work(struct work_struct *work,
+				      struct cpu_workqueue_struct *cwq)
 {
-	struct cpu_workqueue_struct *cwq = get_work_cwq(work);
-
 	trace_workqueue_activate_work(work);
 	move_linked_works(work, &cwq->pool->worklist, NULL);
 	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
@@ -1027,7 +1026,7 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
 	struct work_struct *work = list_first_entry(&cwq->delayed_works,
 						    struct work_struct, entry);
 
-	cwq_activate_delayed_work(work);
+	cwq_activate_delayed_work(work, cwq);
 }
 
 /**
@@ -1147,10 +1146,10 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		 * item is activated before grabbing.
 		 */
 		if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
-			cwq_activate_delayed_work(work);
+			cwq_activate_delayed_work(work, cwq);
 
 		list_del_init(&work->entry);
-		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
+		cwq_dec_nr_in_flight(cwq, get_work_color(work));
 
 		/* work is dequeued, work->data points to pool iff running */
 		if (worker)
-- 
1.7.7.6



* [PATCH V2 11/15] workqueue: split work_flags to delayed_flags and color_flags in __queue_work()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (9 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 10/15] workqueue: avoid unneeded calls to get_work_cwq() Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 12/15] workqueue: add extra flags to set_work_worker_and_keep_pending() Lai Jiangshan
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

This allows us to use delayed_flags on its own in a different path in later
patches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7ac6824..cdd5523 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1226,7 +1226,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 {
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
-	unsigned int work_flags;
+	unsigned int color_flags, delayed_flags = 0;
 	unsigned int req_cpu = cpu;
 
 	/*
@@ -1284,18 +1284,18 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	}
 
 	cwq->nr_in_flight[cwq->work_color]++;
-	work_flags = work_color_to_flags(cwq->work_color);
 
 	if (likely(cwq->nr_active < cwq->max_active)) {
 		trace_workqueue_activate_work(work);
 		cwq->nr_active++;
 		worklist = &cwq->pool->worklist;
 	} else {
-		work_flags |= WORK_STRUCT_DELAYED;
+		delayed_flags = WORK_STRUCT_DELAYED;
 		worklist = &cwq->delayed_works;
 	}
 
-	insert_work(cwq, work, worklist, work_flags);
+	color_flags = work_color_to_flags(cwq->work_color);
+	insert_work(cwq, work, worklist, color_flags | delayed_flags);
 
 	spin_unlock(&cwq->pool->lock);
 }
-- 
1.7.7.6



* [PATCH V2 12/15] workqueue: add extra flags to set_work_worker_and_keep_pending()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (10 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 11/15] workqueue: split work_flags to delayed_flags and color_flags in __queue_work() Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued Lai Jiangshan
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cdd5523..ab5c61a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -544,9 +544,11 @@ static inline unsigned long worker_id_to_data(int worker_id)
 }
 
 static void set_work_worker_and_keep_pending(struct work_struct *work,
-					     int worker_id)
+					     int worker_id,
+					     unsigned long extra_flags)
 {
-	set_work_data(work, worker_id_to_data(worker_id), WORK_STRUCT_PENDING);
+	set_work_data(work, worker_id_to_data(worker_id),
+			extra_flags | WORK_STRUCT_PENDING);
 }
 
 static void set_work_worker_and_clear_pending(struct work_struct *work,
@@ -1157,7 +1159,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		else
 			worker_id = WORK_OFFQ_WORKER_NONE;
 
-		set_work_worker_and_keep_pending(work, worker_id);
+		set_work_worker_and_keep_pending(work, worker_id, 0);
 
 		spin_unlock(&pool->lock);
 		return 1;
-- 
1.7.7.6



* [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (11 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 12/15] workqueue: add extra flags to set_work_worker_and_keep_pending() Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 19:50   ` Tejun Heo
  2013-02-18 16:12 ` [PATCH V2 14/15] workqueue: convert busy hash to busy list Lai Jiangshan
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Currently, we always record the cwq in work->data while the work is queued,
even if the work is running at the same time. This forces us to use a hash
table to look up the running worker.

If we record the worker ID in work->data even while the work is queued, we
can remove this hash table lookup:
	in lock_work_pool(), we look up via worker ID instead;
	in process_one_work(), we avoid any lookup in the fast path!

Trade-off: get_work_cwq() becomes a little heavier and slower,
	but (1) after our long-term effort, get_work_cwq() is now only called
	by the mayday code, which is an unlikely path, and (2) get_work_cwq()
	is as fast as before in most cases.

The new semantics of get_work_cwq():
	the caller must hold the pool lock and the work must be *queued*
	on the pool.
We can change the semantics to the following in the future when needed (at
the cost of more code and checks):
	the caller must hold the pool lock and the work must be *owned*
	by the pool.
Or we could provide even looser semantics, but we don't need them in any
case currently; KISS.
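
To make the new three-way decision concrete, here is a stand-alone toy model
of the reworked get_work_cwq() (not the kernel code: the idr, locking and
real structures are replaced with trivial stand-ins, and the bit values are
assumed from the patched header):

#include <stdio.h>
#include <stdlib.h>

/* assumed layout: CWQ flag at bit 2, off-queue flags at bits 4-5 */
#define WORK_STRUCT_CWQ		(1UL << 2)
#define WORK_STRUCT_FLAG_MASK	((1UL << 8) - 1)	/* flag + color bits */
#define WORK_OFFQ_REQUEUED	(1UL << 5)
#define WORK_OFFQ_WORKER_SHIFT	6

struct cwq { const char *name; };
struct worker { int id; struct cwq *current_cwq; };

/* toy stand-in for worker_by_id(); the kernel looks the worker up in an idr */
static struct worker *worker_by_id(struct worker *table[], unsigned long id)
{
	return table[id];
}

/* model of the reworked get_work_cwq(): three cases instead of two */
static struct cwq *get_work_cwq(unsigned long data, struct worker *table[])
{
	if (data & WORK_STRUCT_CWQ)		/* queued but not running */
		return (struct cwq *)(data & ~WORK_STRUCT_FLAG_MASK);
	if (data & WORK_OFFQ_REQUEUED)		/* running and re-queued */
		return worker_by_id(table,
				    data >> WORK_OFFQ_WORKER_SHIFT)->current_cwq;
	abort();				/* caller promised it is queued */
}

int main(void)
{
	struct cwq cwq = { "events" };
	struct worker w = { .id = 3, .current_cwq = &cwq };
	struct worker *table[8] = { [3] = &w };

	/* work->data for a work re-queued while worker 3 is still running it */
	unsigned long data = ((unsigned long)w.id << WORK_OFFQ_WORKER_SHIFT) |
			     WORK_OFFQ_REQUEUED;

	printf("cwq = %s\n", get_work_cwq(data, table)->name);
	return 0;
}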

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h   |    3 +-
 kernel/workqueue.c          |  175 +++++++++++++++++++++++--------------------
 kernel/workqueue_internal.h |    4 +
 3 files changed, 101 insertions(+), 81 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a8db8c4..f0973a7 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -71,13 +71,14 @@ enum {
 	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
+	WORK_OFFQ_REQUEUED	= (1 << (WORK_OFFQ_FLAG_BASE + 1)),
 
 	/*
 	 * When a work item starts to be executed, its high bits point to the
 	 * worker it is running on.  Cap at 31 bits and use the highest number
 	 * to indicate that no worker is associated.
 	 */
-	WORK_OFFQ_FLAG_BITS	= 1,
+	WORK_OFFQ_FLAG_BITS	= 2,
 	WORK_OFFQ_WORKER_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
 	WORK_OFFQ_LEFT		= BITS_PER_LONG - WORK_OFFQ_WORKER_SHIFT,
 	WORK_OFFQ_WORKER_BITS	= WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ab5c61a..a23f4fc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -570,16 +570,6 @@ static void clear_work_data(struct work_struct *work)
 	set_work_data(work, WORK_STRUCT_NO_WORKER, 0);
 }
 
-static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
-{
-	unsigned long data = atomic_long_read(&work->data);
-
-	if (data & WORK_STRUCT_CWQ)
-		return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
-	else
-		return NULL;
-}
-
 /**
  * get_work_worker_id - return the worker ID a given work was last running on
  * @work: the work item of interest
@@ -849,55 +839,6 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 }
 
 /**
- * find_worker_executing_work - find worker which is executing a work
- * @pool: pool of interest
- * @work: work to find worker for
- *
- * Find a worker which is executing @work on @pool by searching
- * @pool->busy_hash which is keyed by the address of @work.  For a worker
- * to match, its current execution should match the address of @work and
- * its work function.  This is to avoid unwanted dependency between
- * unrelated work executions through a work item being recycled while still
- * being executed.
- *
- * This is a bit tricky.  A work item may be freed once its execution
- * starts and nothing prevents the freed area from being recycled for
- * another work item.  If the same work item address ends up being reused
- * before the original execution finishes, workqueue will identify the
- * recycled work item as currently executing and make it wait until the
- * current execution finishes, introducing an unwanted dependency.
- *
- * This function checks the work item address, work function and workqueue
- * to avoid false positives.  Note that this isn't complete as one may
- * construct a work function which can introduce dependency onto itself
- * through a recycled work item.  Well, if somebody wants to shoot oneself
- * in the foot that badly, there's only so much we can do, and if such
- * deadlock actually occurs, it should be easy to locate the culprit work
- * function.
- *
- * CONTEXT:
- * spin_lock_irq(pool->lock).
- *
- * RETURNS:
- * Pointer to worker which is executing @work if found, NULL
- * otherwise.
- */
-static struct worker *find_worker_executing_work(struct worker_pool *pool,
-						 struct work_struct *work)
-{
-	struct worker *worker;
-	struct hlist_node *tmp;
-
-	hash_for_each_possible(pool->busy_hash, worker, tmp, hentry,
-			       (unsigned long)work)
-		if (worker->current_work == work &&
-		    worker->current_func == work->func)
-			return worker;
-
-	return NULL;
-}
-
-/**
  * worker_by_id - lookup worker by id
  * @id: the worker id of of interest
  *
@@ -917,6 +858,46 @@ static struct worker *worker_by_id(int id)
 	return idr_find(&worker_idr, id);
 }
 
+static struct worker *get_work_worker(struct work_struct *work)
+{
+	return worker_by_id(get_work_worker_id(work));
+}
+
+static
+struct cpu_workqueue_struct *get_work_cwq_no_running(struct work_struct *work)
+{
+	unsigned long data = atomic_long_read(&work->data);
+
+	if (data & WORK_STRUCT_CWQ)
+		return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
+	else
+		return NULL;
+}
+
+/**
+ * get_work_cwq - get cwq of the work
+ * @work: the work item of interest
+ *
+ * CONTEXT:
+ * spin_lock_irq(&pool->lock), the work must be queued on this pool
+ */
+static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
+{
+	unsigned long data = atomic_long_read(&work->data);
+	struct worker *worker;
+
+	if (data & WORK_STRUCT_CWQ) {
+		return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
+	} else if (data & WORK_OFFQ_REQUEUED) {
+		worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
+		BUG_ON(!worker || !worker->requeue);
+		return worker->current_cwq;
+	} else {
+		BUG();
+		return NULL;
+	}
+}
+
 /**
  * lock_work_pool - return and lock the pool a given work was associated with
  * @work: the work item of interest
@@ -951,11 +932,9 @@ struct worker_pool *lock_work_pool(struct work_struct *work,
 		 * if work->data points to cwq which is associated with a
 		 * locked pool, the work item is currently queued on that pool.
 		 */
-		cwq = get_work_cwq(work);
-		if (cwq && cwq->pool == pool) {
+		cwq = get_work_cwq_no_running(work);
+		if (cwq && cwq->pool == pool)
 			*queued_cwq = cwq;
-			worker = find_worker_executing_work(pool, work);
-		}
 	} else {
 		worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
 		if (!worker)
@@ -966,6 +945,8 @@ struct worker_pool *lock_work_pool(struct work_struct *work,
 		if (pool != worker->pool || worker->current_work != work ||
 		    worker->current_func != work->func)
 			worker = NULL;
+		else if (worker->requeue)
+			*queued_cwq = worker->current_cwq;
 	}
 
 	if (worker)
@@ -1154,10 +1135,12 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		cwq_dec_nr_in_flight(cwq, get_work_color(work));
 
 		/* work is dequeued, work->data points to pool iff running */
-		if (worker)
+		if (worker) {
 			worker_id = worker->id;
-		else
+			worker->requeue = false;
+		} else {
 			worker_id = WORK_OFFQ_WORKER_NONE;
+		}
 
 		set_work_worker_and_keep_pending(work, worker_id, 0);
 
@@ -1227,6 +1210,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 			 struct work_struct *work)
 {
 	struct cpu_workqueue_struct *cwq;
+	struct worker *worker = NULL;
 	struct list_head *worklist;
 	unsigned int color_flags, delayed_flags = 0;
 	unsigned int req_cpu = cpu;
@@ -1249,7 +1233,6 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	/* determine the cwq to use */
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct worker_pool *last_pool;
-		struct worker *worker = NULL;
 		struct cpu_workqueue_struct *dummy_cwq;
 
 		if (cpu == WORK_CPU_UNBOUND)
@@ -1266,11 +1249,15 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 
 		if (worker && worker->current_cwq->wq == wq) {
 			cwq = get_cwq(last_pool->cpu, wq);
-		} else if (cwq->pool != last_pool) {
-			/* meh... not running there, queue here */
-			if (last_pool)
-				spin_unlock(&last_pool->lock);
-			spin_lock(&cwq->pool->lock);
+			BUG_ON(worker->current_cwq != cwq);
+		} else {
+			worker = NULL;
+			if (cwq->pool != last_pool) {
+				/* meh... not running there, queue here */
+				if (last_pool)
+					spin_unlock(&last_pool->lock);
+				spin_lock(&cwq->pool->lock);
+			}
 		}
 	} else {
 		cwq = get_cwq(WORK_CPU_UNBOUND, wq);
@@ -1296,8 +1283,16 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		worklist = &cwq->delayed_works;
 	}
 
-	color_flags = work_color_to_flags(cwq->work_color);
-	insert_work(cwq, work, worklist, color_flags | delayed_flags);
+	if (worker) {
+		worker->requeue = true;
+		worker->requeue_color = cwq->work_color;
+		set_work_worker_and_keep_pending(work, worker->id,
+				delayed_flags | WORK_OFFQ_REQUEUED);
+		list_add_tail(&work->entry, worklist);
+	} else {
+		color_flags = work_color_to_flags(cwq->work_color);
+		insert_work(cwq, work, worklist, color_flags | delayed_flags);
+	}
 
 	spin_unlock(&cwq->pool->lock);
 }
@@ -2131,9 +2126,9 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
 __releases(&pool->lock)
 __acquires(&pool->lock)
 {
-	struct cpu_workqueue_struct *cwq = get_work_cwq(work);
+	struct cpu_workqueue_struct *cwq = get_work_cwq_no_running(work);
 	struct worker_pool *pool = worker->pool;
-	bool cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE;
+	bool cpu_intensive;
 	int work_color;
 	struct worker *collision;
 #ifdef CONFIG_LOCKDEP
@@ -2159,12 +2154,21 @@ __acquires(&pool->lock)
 
 	/*
 	 * A single work shouldn't be executed concurrently by
-	 * multiple workers on a single cpu.  Check whether anyone is
-	 * already processing the work.  If so, defer the work to the
-	 * currently executing one.
+	 * multiple workers on a single pool.  get_work_cwq_no_running()
+	 * returning NULL here means someone else is already processing
+	 * the work; defer the work to the currently executing one.
+	 * These BUG_ON()s are just comments; they are all guaranteed
+	 * by __queue_work().
 	 */
-	collision = find_worker_executing_work(pool, work);
-	if (unlikely(collision)) {
+	if (unlikely(!cwq)) {
+		BUG_ON(!(atomic_long_read(&work->data) & WORK_OFFQ_REQUEUED));
+
+		collision = get_work_worker(work);
+		BUG_ON(!collision || !collision->requeue);
+		BUG_ON(collision->pool != pool);
+		BUG_ON(collision->current_work != work);
+		BUG_ON(collision->current_func != work->func);
+
 		move_linked_works(work, &collision->scheduled, NULL);
 		return;
 	}
@@ -2183,6 +2187,7 @@ __acquires(&pool->lock)
 	 * CPU intensive works don't participate in concurrency
 	 * management.  They're the scheduler's responsibility.
 	 */
+	cpu_intensive = cwq->wq->flags & WQ_CPU_INTENSIVE;
 	if (unlikely(cpu_intensive))
 		worker_set_flags(worker, WORKER_CPU_INTENSIVE, true);
 
@@ -2236,6 +2241,16 @@ __acquires(&pool->lock)
 	worker->current_func = NULL;
 	worker->current_cwq = NULL;
 	cwq_dec_nr_in_flight(cwq, work_color);
+
+	if (unlikely(worker->requeue)) {
+		unsigned long color_flags, keep_flags;
+
+		worker->requeue = false;
+		keep_flags = atomic_long_read(&work->data);
+		keep_flags &= WORK_STRUCT_LINKED | WORK_STRUCT_DELAYED;
+		color_flags = work_color_to_flags(worker->requeue_color);
+		set_work_cwq(work, cwq, color_flags | keep_flags);
+	}
 }
 
 /**
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 1040abc..bc0ce95 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -39,6 +39,10 @@ struct worker {
 	int			id_in_pool;	/* I: worker id in the pool */
 	int			id;		/* I: worker id (global) */
 
+	/* requeued work */
+	bool			requeue;	/* L: requeued current_work */
+	unsigned int		requeue_color;	/* L: color of requeued work */
+
 	/* for rebinding worker to CPU */
 	struct work_struct	rebind_work;	/* L: for busy worker */
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V2 14/15] workqueue: convert busy hash to busy list
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (12 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:12 ` [PATCH V2 15/15] workqueue: queue worker to busy list outside process_one_work() Lai Jiangshan
  2013-02-18 16:28 ` [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Since we no longer use the hashtable, we can use a list to implement
for_each_busy_worker().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   27 ++++++++++-----------------
 kernel/workqueue_internal.h |    9 +++------
 2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a23f4fc..6d7dd78 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -41,7 +41,6 @@
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
 #include <linux/idr.h>
-#include <linux/hashtable.h>
 #include <linux/rcupdate.h>
 
 #include "workqueue_internal.h"
@@ -138,9 +137,8 @@ struct worker_pool {
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
-	/* workers are chained either in busy_hash or idle_list */
-	DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
-						/* L: hash of busy workers */
+	/* workers are chained either in busy_list or idle_list */
+	struct list_head	busy_list;	/* L: list of busy workers */
 
 	struct mutex		assoc_mutex;	/* protect POOL_DISASSOCIATED */
 	struct ida		worker_ida;	/* L: for worker IDs */
@@ -250,8 +248,8 @@ EXPORT_SYMBOL_GPL(system_freezable_wq);
 	for ((pool) = &std_worker_pools(cpu)[0];			\
 	     (pool) < &std_worker_pools(cpu)[NR_STD_WORKER_POOLS]; (pool)++)
 
-#define for_each_busy_worker(worker, i, pos, pool)			\
-	hash_for_each(pool->busy_hash, i, pos, worker, hentry)
+#define for_each_busy_worker(worker, pool)			\
+	list_for_each_entry(worker, &pool->busy_list, entry)
 
 static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
 				unsigned int sw)
@@ -1499,8 +1497,7 @@ static void worker_enter_idle(struct worker *worker)
 	struct worker_pool *pool = worker->pool;
 
 	BUG_ON(worker->flags & WORKER_IDLE);
-	BUG_ON(!list_empty(&worker->entry) &&
-	       (worker->hentry.next || worker->hentry.pprev));
+	BUG_ON(!list_empty(&worker->entry));
 
 	/* can't use worker_set_flags(), also called from start_worker() */
 	worker->flags |= WORKER_IDLE;
@@ -1664,8 +1661,6 @@ static void busy_worker_rebind_fn(struct work_struct *work)
 static void rebind_workers(struct worker_pool *pool)
 {
 	struct worker *worker, *n;
-	struct hlist_node *pos;
-	int i;
 
 	lockdep_assert_held(&pool->assoc_mutex);
 	lockdep_assert_held(&pool->lock);
@@ -1686,7 +1681,7 @@ static void rebind_workers(struct worker_pool *pool)
 	}
 
 	/* rebind busy workers */
-	for_each_busy_worker(worker, i, pos, pool) {
+	for_each_busy_worker(worker, pool) {
 		struct work_struct *rebind_work = &worker->rebind_work;
 		struct workqueue_struct *wq;
 
@@ -2175,7 +2170,7 @@ __acquires(&pool->lock)
 
 	/* claim and dequeue */
 	debug_work_deactivate(work);
-	hash_add(pool->busy_hash, &worker->hentry, (unsigned long)work);
+	list_add(&worker->entry, &pool->busy_list);
 	worker->current_work = work;
 	worker->current_func = work->func;
 	worker->current_cwq = cwq;
@@ -2236,7 +2231,7 @@ __acquires(&pool->lock)
 		worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
 
 	/* we're done with it, release */
-	hash_del(&worker->hentry);
+	list_del_init(&worker->entry);
 	worker->current_work = NULL;
 	worker->current_func = NULL;
 	worker->current_cwq = NULL;
@@ -3490,8 +3485,6 @@ static void wq_unbind_fn(struct work_struct *work)
 	int cpu = smp_processor_id();
 	struct worker_pool *pool;
 	struct worker *worker;
-	struct hlist_node *pos;
-	int i;
 
 	for_each_std_worker_pool(pool, cpu) {
 		BUG_ON(cpu != smp_processor_id());
@@ -3509,7 +3502,7 @@ static void wq_unbind_fn(struct work_struct *work)
 		list_for_each_entry(worker, &pool->idle_list, entry)
 			worker->flags |= WORKER_UNBOUND;
 
-		for_each_busy_worker(worker, i, pos, pool)
+		for_each_busy_worker(worker, pool)
 			worker->flags |= WORKER_UNBOUND;
 
 		pool->flags |= POOL_DISASSOCIATED;
@@ -3812,7 +3805,7 @@ static int __init init_workqueues(void)
 			pool->flags |= POOL_DISASSOCIATED;
 			INIT_LIST_HEAD(&pool->worklist);
 			INIT_LIST_HEAD(&pool->idle_list);
-			hash_init(pool->busy_hash);
+			INIT_LIST_HEAD(&pool->busy_list);
 
 			init_timer_deferrable(&pool->idle_timer);
 			pool->idle_timer.function = idle_worker_timeout;
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index bc0ce95..3406f2e 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -14,17 +14,14 @@ struct worker_pool;
 
 /*
  * The poor guys doing the actual heavy lifting.  All on-duty workers are
- * either serving the manager role, on idle list or on busy hash.  For
+ * either serving the manager role, on idle list or on busy list.  For
  * details on the locking annotation (L, I, X...), refer to workqueue.c.
  *
  * Only to be used in workqueue and async.
  */
 struct worker {
-	/* on idle list while idle, on busy hash table while busy */
-	union {
-		struct list_head	entry;	/* L: while idle */
-		struct hlist_node	hentry;	/* L: while busy */
-	};
+	/* on idle list while idle, on busy list while busy */
+	struct list_head	entry;		/* L: idle/busy list */
 
 	struct work_struct	*current_work;	/* L: work being processed */
 	work_func_t		current_func;	/* L: current_work's fn */
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH V2 15/15] workqueue: queue worker to busy list outside process_one_work()
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (13 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 14/15] workqueue: convert busy hash to busy list Lai Jiangshan
@ 2013-02-18 16:12 ` Lai Jiangshan
  2013-02-18 16:28 ` [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:12 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

pool->busy_list is touched every time a worker processes a work item.
Moving this code out of process_one_work() reduces those touches.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6d7dd78..8ae92a8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2170,7 +2170,6 @@ __acquires(&pool->lock)
 
 	/* claim and dequeue */
 	debug_work_deactivate(work);
-	list_add(&worker->entry, &pool->busy_list);
 	worker->current_work = work;
 	worker->current_func = work->func;
 	worker->current_cwq = cwq;
@@ -2231,7 +2230,6 @@ __acquires(&pool->lock)
 		worker_clr_flags(worker, WORKER_CPU_INTENSIVE);
 
 	/* we're done with it, release */
-	list_del_init(&worker->entry);
 	worker->current_work = NULL;
 	worker->current_func = NULL;
 	worker->current_cwq = NULL;
@@ -2327,6 +2325,7 @@ recheck:
 	 * assumed the manager role.
 	 */
 	worker_clr_flags(worker, WORKER_PREP);
+	list_add(&worker->entry, &pool->busy_list);
 
 	do {
 		struct work_struct *work =
@@ -2344,6 +2343,7 @@ recheck:
 		}
 	} while (keep_working(pool));
 
+	list_del_init(&worker->entry);
 	worker_set_flags(worker, WORKER_PREP, false);
 sleep:
 	if (unlikely(need_to_manage_workers(pool)) && manage_workers(worker))
@@ -2422,6 +2422,7 @@ repeat:
 		/* migrate to the target cpu if possible */
 		worker_maybe_bind_and_lock(pool);
 		rescuer->pool = pool;
+		list_add(&rescuer->entry, &pool->busy_list);
 
 		/*
 		 * Slurp in all works issued via this workqueue and
@@ -2442,6 +2443,7 @@ repeat:
 		if (keep_working(pool))
 			wake_up_worker(pool);
 
+		list_del_init(&rescuer->entry);
 		rescuer->pool = NULL;
 		spin_unlock_irq(&pool->lock);
 	}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID
  2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
                   ` (14 preceding siblings ...)
  2013-02-18 16:12 ` [PATCH V2 15/15] workqueue: queue worker to busy list outside process_one_work() Lai Jiangshan
@ 2013-02-18 16:28 ` Lai Jiangshan
  15 siblings, 0 replies; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-18 16:28 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Tejun Heo, linux-kernel

On 19/02/13 00:12, Lai Jiangshan wrote:
> Core patches are patch 1, patch 9, patch 13
>
> Patch 1: enhance locking
> Patch 9: recorde worker id to work->data instead of pool id
> 	 lookup worker via worker ID if offq
> Patch 13:also lookup worker via worker ID if running&&queued,
> 	 remove lookup via hashtable
>
> Patch 6-8: ensure modification to worker->pool is under pool lock held
> Patch 14: remove hashtable totally
>
> other patch is preparing-patch or cleanup.

The patchset is on top of tj/for-3.9:
8d03ecfe471802d6afe97da97722b6924533aa82
("workqueue: reimplement is_chained_work() using current_wq_worker()")

Some other patches will be sent tomorrow or later (I need to consider/merge
your review comments first).

Thanks,
lai

>
>
> Lai Jiangshan (15):
>    workqueue: add lock_work_pool()
>    workqueue: allow more work_pool id space
>    workqueue: remname worker->id to worker->id_in_pool
>    workqueue: add worker's global worker ID
>    workqueue: only set pool id when the work is running
>    workqueue: use current instead of worker->task in
>      worker_maybe_bind_and_lock()
>    workqueue: change argument of worker_maybe_bind_and_lock() to pool
>    workqueue: only change worker->pool with pool lock held
>    workqueue: use worker id instead
>    workqueue: avoid unneeded calls to get_work_cwq(work)
>    workqueue: split work_flags to delayed_flags and color_flags in
>      __queue_work()
>    workqueue: add extra flags to set_work_worker_and_keep_pending()
>    workqueue: also record worker in work->data if running&&queued
>    workqueue: convert busy hash to busy list
>    workqueue: queue worker to busy list
>
>   include/linux/workqueue.h   |   25 +-
>   kernel/workqueue.c          |  522 ++++++++++++++++++++++++-------------------
>   kernel/workqueue_internal.h |   17 +-
>   3 files changed, 314 insertions(+), 250 deletions(-)
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued
  2013-02-18 16:12 ` [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued Lai Jiangshan
@ 2013-02-18 19:50   ` Tejun Heo
  2013-02-19 15:04     ` Lai Jiangshan
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2013-02-18 19:50 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Tue, Feb 19, 2013 at 12:12:14AM +0800, Lai Jiangshan wrote:
> +/**
> + * get_work_cwq - get cwq of the work
> + * @work: the work item of interest
> + *
> + * CONTEXT:
> + * spin_lock_irq(&pool->lock), the work must be queued on this pool
> + */
> +static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
> +{
> +	unsigned long data = atomic_long_read(&work->data);
> +	struct worker *worker;
> +
> +	if (data & WORK_STRUCT_CWQ) {
> +		return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
> +	} else if (data & WORK_OFFQ_REQUEUED) {
> +		worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
> +		BUG_ON(!worker || !worker->requeue);
> +		return worker->current_cwq;
> +	} else {
> +		BUG();
> +		return NULL;
> +	}
> +}

So, work->data points to the last worker ID if off-queue or on-queue
with another worker executing it, and points to the cwq if on-queue w/o
another worker executing it.  If on-queue w/ concurrent execution, the
executing worker updates work->data when it finishes execution, right?
Why no documentation about it at all?  The mechanism is convoluted
with interlocking from both work and worker sides.  Lack of
documentation makes things difficult for reviewers and later readers
of the code.
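
For later readers, a rough summary of the states being described (an
illustrative sketch pieced together from the patch above, not wording
from the series):

/*
 * work->data states implied by this patch (sketch only):
 *
 * 1. on-queue, no other worker executing it:
 *	cwq pointer | WORK_STRUCT_CWQ | color and flag bits
 * 2. on-queue while the previous execution is still running (requeued):
 *	(worker ID << WORK_OFFQ_WORKER_SHIFT) | WORK_OFFQ_REQUEUED | pending
 *	The executing worker has ->requeue/->requeue_color set and rewrites
 *	the data back to a cwq pointer when it finishes (process_one_work()).
 * 3. off-queue:
 *	(last worker ID << WORK_OFFQ_WORKER_SHIFT) | WORK_OFFQ_* flags
 */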

> @@ -1296,8 +1283,16 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>  		worklist = &cwq->delayed_works;
>  	}
>  
> -	color_flags = work_color_to_flags(cwq->work_color);
> -	insert_work(cwq, work, worklist, color_flags | delayed_flags);
> +	if (worker) {
> +		worker->requeue = true;
> +		worker->requeue_color = cwq->work_color;
> +		set_work_worker_and_keep_pending(work, worker->id,
> +				delayed_flags | WORK_OFFQ_REQUEUED);
> +		list_add_tail(&work->entry, worklist);
> +	} else {
> +		color_flags = work_color_to_flags(cwq->work_color);
> +		insert_work(cwq, work, worklist, color_flags | delayed_flags);
> +	}

I can't say I like this.  It interlocks the work being queued and the
worker so that both have to watch out for each other.  It's kinda
nasty.

> @@ -2236,6 +2241,16 @@ __acquires(&pool->lock)
>  	worker->current_func = NULL;
>  	worker->current_cwq = NULL;
>  	cwq_dec_nr_in_flight(cwq, work_color);
> +
> +	if (unlikely(worker->requeue)) {
> +		unsigned long color_flags, keep_flags;
> +
> +		worker->requeue = false;
> +		keep_flags = atomic_long_read(&work->data);
> +		keep_flags &= WORK_STRUCT_LINKED | WORK_STRUCT_DELAYED;
> +		color_flags = work_color_to_flags(worker->requeue_color);
> +		set_work_cwq(work, cwq, color_flags | keep_flags);
> +	}

So, what was before a mostly one-way "is it still executing?" query
becomes a three-party handshake among the queuer, the executing worker
and try_to_grab_pending(), and we end up shifting information from the
queuer through the executing worker because work->data can't hold both
workqueue and worker information.

I don't know, Lai.  While the removal of busy_hash is nice, I'm not really
sure whether we're ending up with better or worse code by doing this.
It's more convoluted for sure.  Performance-wise, now that idr_find()
for the pool costs almost nothing (because we're very unlikely to have
more than 256 pools), we're comparing one idr lookup (which can easily
blow through the 256-entry single-layer optimization limit) against two
simple hash table lookups.  I don't really think either would be noticeably
better than the other in any measurable way.

The trade-off, while it doesn't seem too bad, doesn't seem very
beneficial either.  It's different from what we're currently doing but
I'm not sure we're making it better by doing this.

Hmmmmm....

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued
  2013-02-18 19:50   ` Tejun Heo
@ 2013-02-19 15:04     ` Lai Jiangshan
  2013-02-19 19:27       ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Lai Jiangshan @ 2013-02-19 15:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

Hi, TJ

Thank you for the reviews and comments.
First, the patchset can't be merged for 3.9 since it is still under
discussion and its benefits haven't been shown.
(But could patches 1 and 5-8 possibly be merged after being rebased and
revised?)
Second, the removal of the hash table is still very possible in the future
with a different implementation.

On Tue, Feb 19, 2013 at 3:50 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Lai.
>
> On Tue, Feb 19, 2013 at 12:12:14AM +0800, Lai Jiangshan wrote:
>> +/**
>> + * get_work_cwq - get cwq of the work
>> + * @work: the work item of interest
>> + *
>> + * CONTEXT:
>> + * spin_lock_irq(&pool->lock), the work must be queued on this pool
>> + */
>> +static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
>> +{
>> +     unsigned long data = atomic_long_read(&work->data);
>> +     struct worker *worker;
>> +
>> +     if (data & WORK_STRUCT_CWQ) {
>> +             return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
>> +     } else if (data & WORK_OFFQ_REQUEUED) {
>> +             worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
>> +             BUG_ON(!worker || !worker->requeue);
>> +             return worker->current_cwq;
>> +     } else {
>> +             BUG();
>> +             return NULL;
>> +     }
>> +}
>
> So, work->data points to the last worker ID if off-queue or on-queue
> with another worker executing it and points to cwq if on-queue w/o
> another worker executing.  If on-queue w/ concurrent execution, the
> excuting worker updates work->data when it finishes execution, right?

right.

>
> Why no documentation about it at all?  The mechanism is convoluted
> with interlocking from both work and worker sides.  Lack of
> documentation makes things difficult for reviewers and later readers
> of the code.

sorry.

>
>> @@ -1296,8 +1283,16 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>>               worklist = &cwq->delayed_works;
>>       }
>>
>> -     color_flags = work_color_to_flags(cwq->work_color);
>> -     insert_work(cwq, work, worklist, color_flags | delayed_flags);
>> +     if (worker) {
>> +             worker->requeue = true;
>> +             worker->requeue_color = cwq->work_color;
>> +             set_work_worker_and_keep_pending(work, worker->id,
>> +                             delayed_flags | WORK_OFFQ_REQUEUED);
>> +             list_add_tail(&work->entry, worklist);
>> +     } else {
>> +             color_flags = work_color_to_flags(cwq->work_color);
>> +             insert_work(cwq, work, worklist, color_flags | delayed_flags);
>> +     }
>
> I can't say I like this.  In interlocks the work being queued and the
> worker so that both have to watch out for each other.  It's kinda
> nasty.
>
>> @@ -2236,6 +2241,16 @@ __acquires(&pool->lock)
>>       worker->current_func = NULL;
>>       worker->current_cwq = NULL;
>>       cwq_dec_nr_in_flight(cwq, work_color);
>> +
>> +     if (unlikely(worker->requeue)) {
>> +             unsigned long color_flags, keep_flags;
>> +
>> +             worker->requeue = false;
>> +             keep_flags = atomic_long_read(&work->data);
>> +             keep_flags &= WORK_STRUCT_LINKED | WORK_STRUCT_DELAYED;
>> +             color_flags = work_color_to_flags(worker->requeue_color);
>> +             set_work_cwq(work, cwq, color_flags | keep_flags);
>> +     }
>
> So, what was before mostly one way "is it still executing?" query
> becomes three party handshake among the queuer, executing worker and
> try_to_grab_pending(), and we end up shifting information from the
> queuer through the executing worker because work->data can't hold both
> workqueue and worker information.
>
> I don't know, Lai.  While removal of busy_hash is nice, I'm not really
> sure whether we're ending up with better or worse code by doing this.
> It's more convoluted for sure.  Performance-wise, now that idr_find()
> for pool costs almost nothing (because we're very unlikely to have
> more than 256 pools), we're comparing one idr lookup (which can easily
> blow through 256 single layer optimization limit) against two simple
> hash table lookup.  I don't really think either would be noticeably
> better than the other in any measureable way.
>
> The trade-off, while doesn't seem too bad, doesn't seem much
> beneficial either.  It's different from what we're currently doing but
> I'm not sure we're making it better by doing this.
>

I agree with your comments.

Thanks,
Lai

PS: Some small benefits of this patchset:
1) work_busy() can be reimplemented locklessly (a rough sketch below).
2) the user can issue *reentrant* works by re-initializing the work.
   (w/o this patchset, users must defer the free and allocate a new
   work item if they want reentrant behavior)
3) natural immunity to bugs such as
   https://bugzilla.kernel.org/show_bug.cgi?id=51701

But these benefits are small.
I need to find a better way in the future.
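
To make (1) concrete, here is a very rough sketch of what a lockless
work_busy() might look like on top of this series.  The names follow
the patchset (worker_by_id(), WORK_OFFQ_WORKER_SHIFT); the helper name
is hypothetical, and the sketch ignores the worker lifetime/visibility
rules a real implementation would have to settle:

static unsigned int work_busy_lockless(struct work_struct *work)
{
	unsigned long data = atomic_long_read(&work->data);
	struct worker *worker;
	unsigned int ret = 0;

	if (work_pending(work))
		ret |= WORK_BUSY_PENDING;

	if (!(data & WORK_STRUCT_CWQ)) {
		/* off-queue or requeued: data carries the last worker ID */
		worker = worker_by_id(data >> WORK_OFFQ_WORKER_SHIFT);
		if (worker && worker->current_work == work &&
		    worker->current_func == work->func)
			ret |= WORK_BUSY_RUNNING;
	}

	return ret;
}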

>
> --
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued
  2013-02-19 15:04     ` Lai Jiangshan
@ 2013-02-19 19:27       ` Tejun Heo
  2013-02-19 20:24         ` Tejun Heo
  0 siblings, 1 reply; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 19:27 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hello, Lai.

On Tue, Feb 19, 2013 at 11:04:31PM +0800, Lai Jiangshan wrote:
> (But are patch 1, 5-8 possible merged after rebased and revised?)

0001 is problematic because it adds a hash lookup which isn't
necessary.  In this series, it's okay as the later lookup is removed
eventually but without that, I don't think it's a good idea to apply
0001.

0002 looks good to me.

0005, I'm not sure what the benefit would be without later changes.

0006-0008 yeah, probably.

I'll try to apply them on top of wq/for-3.9.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] workqueue: allow more off-queue flag space
  2013-02-18 16:12 ` [PATCH V2 02/15] workqueue: allow more work_pool id space Lai Jiangshan
@ 2013-02-19 20:19   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 20:19 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From 4acd43237af61d55ac128bbaf46ca092d560809c Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Tue, 19 Feb 2013 12:17:01 -0800

When a work item is off-queue, its work->data contains WORK_STRUCT_*
and WORK_OFFQ_* flags.  As WORK_OFFQ_* flags are used only while a
work item is off-queue, they can occupy bits of work->data which are
only needed while the work item is on-queue.  WORK_OFFQ_* currently
only use the bits used by the on-queue CWQ pointer.  As the color bits
aren't used while off-queue, there's no reason not to use them.

Lower WORK_OFFQ_FLAG_BASE from WORK_STRUCT_FLAG_BITS to
WORK_STRUCT_COLOR_SHIFT thus giving 4 more bits to off-queue flag
space which is also used to record worker_pool ID while off-queue.

This doesn't introduce any visible behavior difference.

tj: Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to wq/for-3.10-tmp.  Note that I'm gonna rebase the branch
once 3.9-rc1 is out.

Thanks.
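
As a reader's aid, the off-queue work->data layout after this patch is
roughly the following; this is a sketch inferred from the description
above, and the exact bit positions depend on the WORK_STRUCT_*
configuration:

/*
 * Off-queue work->data layout after this patch (high bits to low bits,
 * sketch only):
 *
 *	worker_pool ID | WORK_OFFQ_* flags | WORK_STRUCT_* flags
 *
 * WORK_OFFQ_FLAG_BASE now starts at WORK_STRUCT_COLOR_SHIFT instead of
 * WORK_STRUCT_FLAG_BITS, reusing the 4 color bits (which only matter
 * while on-queue) for the off-queue flag and pool ID space.
 */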

 include/linux/workqueue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 8afab27..5bd030f 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -68,7 +68,7 @@ enum {
 				  WORK_STRUCT_COLOR_BITS,
 
 	/* data contains off-queue information when !WORK_STRUCT_PWQ */
-	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
-- 
1.8.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] workqueue: use %current instead of worker->task in worker_maybe_bind_and_lock()
  2013-02-18 16:12 ` [PATCH V2 06/15] workqueue: use current instead of worker->task in worker_maybe_bind_and_lock() Lai Jiangshan
@ 2013-02-19 20:20   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 20:20 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From 82bde23c1c79441e2962998d6a614f3266a50bf8 Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Tue, 19 Feb 2013 12:17:02 -0800

worker_maybe_bind_and_lock() uses both @worker->task and @current at
the same time.  As worker_maybe_bind_and_lock() can only be called by
the current worker task, they are always the same.

Update worker_maybe_bind_and_lock() to use %current consistently.

This doesn't introduce any functional change.

tj: Massaged the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to wq/for-3.10-tmp.  Note that I'm gonna rebase the branch
once 3.9-rc1 is out.

Thanks.

 kernel/workqueue.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f4feaca..cebde5b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1513,7 +1513,7 @@ static void worker_leave_idle(struct worker *worker)
  * flushed from cpu callbacks while cpu is going down, they are
  * guaranteed to execute on the cpu.
  *
- * This function is to be used by rogue workers and rescuers to bind
+ * This function is to be used by unbound workers and rescuers to bind
  * themselves to the target cpu and may race with cpu going down or
  * coming online.  kthread_bind() can't be used because it may put the
  * worker to already dead cpu and set_cpus_allowed_ptr() can't be used
@@ -1538,7 +1538,6 @@ static bool worker_maybe_bind_and_lock(struct worker *worker)
 __acquires(&pool->lock)
 {
 	struct worker_pool *pool = worker->pool;
-	struct task_struct *task = worker->task;
 
 	while (true) {
 		/*
@@ -1548,12 +1547,12 @@ __acquires(&pool->lock)
 		 * against POOL_DISASSOCIATED.
 		 */
 		if (!(pool->flags & POOL_DISASSOCIATED))
-			set_cpus_allowed_ptr(task, get_cpu_mask(pool->cpu));
+			set_cpus_allowed_ptr(current, get_cpu_mask(pool->cpu));
 
 		spin_lock_irq(&pool->lock);
 		if (pool->flags & POOL_DISASSOCIATED)
 			return false;
-		if (task_cpu(task) == pool->cpu &&
+		if (task_cpu(current) == pool->cpu &&
 		    cpumask_equal(&current->cpus_allowed,
 				  get_cpu_mask(pool->cpu)))
 			return true;
-- 
1.8.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] workqueue: change argument of worker_maybe_bind_and_lock() to @pool
  2013-02-18 16:12 ` [PATCH V2 07/15] workqueue: change argument of worker_maybe_bind_and_lock() to pool Lai Jiangshan
@ 2013-02-19 20:20   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 20:20 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From f8d038f52a57c25e37c547b80d327a9d4f03fd2d Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Tue, 19 Feb 2013 12:17:02 -0800

worker_maybe_bind_and_lock() currently takes @worker but only cares
about @worker->pool.  This patch updates worker_maybe_bind_and_lock()
to take @pool instead of @worker.  This will be used to better define
synchronization rules regarding rescuer->pool updates.

This doesn't introduce any functional change.

tj: Updated the comments and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to wq/for-3.10-tmp.  Note that I'm gonna rebase the branch
once 3.9-rc1 is out.

Thanks.

 kernel/workqueue.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cebde5b..1770b09 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1505,8 +1505,10 @@ static void worker_leave_idle(struct worker *worker)
 }
 
 /**
- * worker_maybe_bind_and_lock - bind worker to its cpu if possible and lock pool
- * @worker: self
+ * worker_maybe_bind_and_lock - try to bind %current to worker_pool and lock it
+ * @pool: target worker_pool
+ *
+ * Bind %current to the cpu of @pool if it is associated and lock @pool.
  *
  * Works which are scheduled while the cpu is online must at least be
  * scheduled to a worker which is bound to the cpu so that if they are
@@ -1534,11 +1536,9 @@ static void worker_leave_idle(struct worker *worker)
  * %true if the associated pool is online (@worker is successfully
  * bound), %false if offline.
  */
-static bool worker_maybe_bind_and_lock(struct worker *worker)
+static bool worker_maybe_bind_and_lock(struct worker_pool *pool)
 __acquires(&pool->lock)
 {
-	struct worker_pool *pool = worker->pool;
-
 	while (true) {
 		/*
 		 * The following call may fail, succeed or succeed
@@ -1576,7 +1576,7 @@ __acquires(&pool->lock)
 static void idle_worker_rebind(struct worker *worker)
 {
 	/* CPU may go down again inbetween, clear UNBOUND only on success */
-	if (worker_maybe_bind_and_lock(worker))
+	if (worker_maybe_bind_and_lock(worker->pool))
 		worker_clr_flags(worker, WORKER_UNBOUND);
 
 	/* rebind complete, become available again */
@@ -1594,7 +1594,7 @@ static void busy_worker_rebind_fn(struct work_struct *work)
 {
 	struct worker *worker = container_of(work, struct worker, rebind_work);
 
-	if (worker_maybe_bind_and_lock(worker))
+	if (worker_maybe_bind_and_lock(worker->pool))
 		worker_clr_flags(worker, WORKER_UNBOUND);
 
 	spin_unlock_irq(&worker->pool->lock);
@@ -2040,7 +2040,7 @@ static bool manage_workers(struct worker *worker)
 		 * on @pool's current state.  Try it and adjust
 		 * %WORKER_UNBOUND accordingly.
 		 */
-		if (worker_maybe_bind_and_lock(worker))
+		if (worker_maybe_bind_and_lock(pool))
 			worker->flags &= ~WORKER_UNBOUND;
 		else
 			worker->flags |= WORKER_UNBOUND;
@@ -2360,7 +2360,7 @@ repeat:
 
 		/* migrate to the target cpu if possible */
 		rescuer->pool = pool;
-		worker_maybe_bind_and_lock(rescuer);
+		worker_maybe_bind_and_lock(pool);
 
 		/*
 		 * Slurp in all works issued via this workqueue and
-- 
1.8.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] workqueue: better define synchronization rule around rescuer->pool updates
  2013-02-18 16:12 ` [PATCH V2 08/15] workqueue: only change worker->pool with pool lock held Lai Jiangshan
@ 2013-02-19 20:22   ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 20:22 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From 7bceeff75e5599f5c026797229d897f518dc028e Mon Sep 17 00:00:00 2001
From: Lai Jiangshan <laijs@cn.fujitsu.com>
Date: Tue, 19 Feb 2013 12:17:02 -0800

Rescuers visit different worker_pools to process work items from pools
under pressure.  Currently, rescuer->pool is updated outside any
locking and when an outsider looks at a rescuer, there's no way to
tell when and whether rescuer->pool is gonna change.  While this
doesn't currently cause any problem, it is nasty.

With recent worker_maybe_bind_and_lock() changes, we can move
rescuer->pool updates inside pool locks such that if rescuer->pool
equals a locked pool, it's guaranteed to stay that way until the pool
is unlocked.

Move rescuer->pool updates inside pool->lock.

This patch doesn't introduce any visible behavior difference.

tj: Updated the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Applied to wq/for-3.10-tmp.  Note that I'm gonna rebase the branch
once 3.9-rc1 is out.

Thanks.
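
One way to read the new rule, as an illustrative sketch with a
hypothetical helper rather than code from the patch: any path that
already holds a pool's lock can now rely on rescuer->pool:

static bool rescuer_serving_pool(struct worker *rescuer,
				 struct worker_pool *pool)
{
	bool serving;

	spin_lock_irq(&pool->lock);
	/* per this patch, stable until pool->lock is released */
	serving = (rescuer->pool == pool);
	spin_unlock_irq(&pool->lock);

	return serving;
}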

 kernel/workqueue.c          | 3 ++-
 kernel/workqueue_internal.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1770b09..0b1e6f2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2359,8 +2359,8 @@ repeat:
 		mayday_clear_cpu(cpu, wq->mayday_mask);
 
 		/* migrate to the target cpu if possible */
-		rescuer->pool = pool;
 		worker_maybe_bind_and_lock(pool);
+		rescuer->pool = pool;
 
 		/*
 		 * Slurp in all works issued via this workqueue and
@@ -2381,6 +2381,7 @@ repeat:
 		if (keep_working(pool))
 			wake_up_worker(pool);
 
+		rescuer->pool = NULL;
 		spin_unlock_irq(&pool->lock);
 	}
 
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 0765026..f9c8877 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -32,6 +32,7 @@ struct worker {
 	struct list_head	scheduled;	/* L: scheduled works */
 	struct task_struct	*task;		/* I: worker task */
 	struct worker_pool	*pool;		/* I: the associated pool */
+						/* L: for rescuers */
 	/* 64 bytes boundary on 64bit, 32 on 32bit */
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
-- 
1.8.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued
  2013-02-19 19:27       ` Tejun Heo
@ 2013-02-19 20:24         ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2013-02-19 20:24 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hey, again.

On Tue, Feb 19, 2013 at 11:27:50AM -0800, Tejun Heo wrote:
> 0001 is problematic because it adds a hash lookup which isn't
> necessary.  In this series, it's okay as the later lookup is removed
> eventually but without that, I don't think it's a good idea to apply
> 0001.
> 
> 0002 looks good to me.
> 
> 0005, I'm not sure what the benefit would be without later changes.
> 
> 0006-0008 yeah, probably.
> 
> I'll try to apply them on top of wq/for-3.9.

So, I picked and applied 2, 6-8 on a temp branch.  FYI, note that I'm
currently sitting on top of a bunch of patches implementing custom
worker_pools and likely to prioritize them over other conflicting
changes.  I'm expecting to send these out in a week or so.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

Thread overview: 25+ messages
2013-02-18 16:12 [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 01/15] workqueue: add lock_work_pool() Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 02/15] workqueue: allow more work_pool id space Lai Jiangshan
2013-02-19 20:19   ` [PATCH 1/4] workqueue: allow more off-queue flag space Tejun Heo
2013-02-18 16:12 ` [PATCH V2 03/15] workqueue: remname current worker->id to worker->id_in_pool Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 04/15] workqueue: add worker's global worker ID Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 05/15] workqueue: only set pool id when the work is running Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 06/15] workqueue: use current instead of worker->task in worker_maybe_bind_and_lock() Lai Jiangshan
2013-02-19 20:20   ` [PATCH 2/4] workqueue: use %current " Tejun Heo
2013-02-18 16:12 ` [PATCH V2 07/15] workqueue: change argument of worker_maybe_bind_and_lock() to pool Lai Jiangshan
2013-02-19 20:20   ` [PATCH 3/4] workqueue: change argument of worker_maybe_bind_and_lock() to @pool Tejun Heo
2013-02-18 16:12 ` [PATCH V2 08/15] workqueue: only change worker->pool with pool lock held Lai Jiangshan
2013-02-19 20:22   ` [PATCH 4/4] workqueue: better define synchronization rule around rescuer->pool updates Tejun Heo
2013-02-18 16:12 ` [PATCH V2 09/15] workqueue: use worker id in work->data instead Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 10/15] workqueue: avoid unneeded calls to get_work_cwq() Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 11/15] workqueue: split work_flags to delayed_flags and color_flags in __queue_work() Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 12/15] workqueue: add extra flags to set_work_worker_and_keep_pending() Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 13/15] workqueue: also record worker in work->data if running&&queued Lai Jiangshan
2013-02-18 19:50   ` Tejun Heo
2013-02-19 15:04     ` Lai Jiangshan
2013-02-19 19:27       ` Tejun Heo
2013-02-19 20:24         ` Tejun Heo
2013-02-18 16:12 ` [PATCH V2 14/15] workqueue: convert busy hash to busy list Lai Jiangshan
2013-02-18 16:12 ` [PATCH V2 15/15] workqueue: queue worker to busy list outside process_one_work() Lai Jiangshan
2013-02-18 16:28 ` [PATCH V2 00/15] workqueue: enhance locking and lookup worker via ID Lai Jiangshan
