* [PATCH 00/13] workqueue: enhance locking and record global worker id for work data
@ 2013-01-31 18:41 Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Better Locking:
---------------

We currently use a rather complex scheme to find out which pool a given work
item is queued on, relying mainly on *mb() barriers, which are dangerous and
hurt readability. This series changes how the CWQ bit is used and makes this
code simpler.
	--PATCH 3,4,5

We have get_work_pool(), but it requires the caller to do the subsequent
check and locking. We replace it with three better internal locking APIs,
which 1) form a more appropriate API, 2) merge the duplicated code and
3) simplify the callers.
	--PATCH 8,9,10

get_work_pool()/get_work_pool_id() are called everywhere; sometimes they are
overkill (they call idr_find() unnecessarily) or needlessly indirect (the
caller already knows whether the work is on-queue). We replace them with
get_work_cwq(), offq_work_pool_id() and the locking APIs (see the sketch
below).
	--PATCH 3,4,5,6,8,9,10
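
For reference, the caller-side pattern the new locking APIs aim for looks
roughly like this condensed sketch of try_to_grab_pending() after PATCH 9
(debug, delayed-work and error handling omitted; only meant to illustrate
the shape of the API):

	unsigned long flags;
	struct worker_pool *pool;

	local_irq_save(flags);
	/* returns the pool already locked iff @work is queued on it */
	pool = lock_pool_queued_work(work);
	if (pool) {
		/* @work is guaranteed to stay queued on @pool here */
		list_del_init(&work->entry);
		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
		clear_work_cwq(work, pool->id);
		spin_unlock(&pool->lock);
	}
	local_irq_restore(flags);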


Safely/one-step searching and worker id:
----------------------------------------

We are planning to add non-std worker_pools, but neither the old
get_work_pool() nor the new lock_pool_executing_work() is prepared for that:
idr_find(pool_id) becomes unsafe once we introduce freeable non-std
worker_pools. We could fix this by adding RCU to worker_pool, but recording
a global worker ID in the work data and adding RCU to the worker is another,
more straightforward way. We implement the latter, so
lock_pool_executing_work() is now ready for this plan.
	--PATCH 12,13

Every time we need to find the running worker for a work item, we need two
lookups: find the worker_pool from the work's data, then find the worker in
the pool's busy hash table. By recording the global worker ID in the work
data, we only need one lookup (sketched below).
	--PATCH 13
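
Roughly, the lookup changes as follows (a condensed sketch based on PATCH 8
and PATCH 13; locking, RCU protection and validity checks omitted):

	/* before: two lookups - the pool by pool ID, then the worker in the pool's busy hash */
	pool = worker_pool_by_id(offq_work_pool_id(work));
	worker = find_worker_executing_work(pool, work);

	/* after: one lookup - the worker directly by its global worker ID */
	worker = idr_find(&worker_gwid_idr, offq_work_worker_gwid(work));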

Cleanup:
	--PATCH 1,2,7,11

On top of tj/for-3.9 706026c2141113886f61e1ad2738c9a7723ec69c

Lai Jiangshan (13):
  workqueue: remove WORK_CPU_NONE
  workqueue: fix work_busy()
  workqueue: don't set work cwq until we queued it on pool
  workqueue: clear cwq when cancel the work
  workqueue: change queued detection and remove *mb()s
  workqueue: get pool id from work->data directly if it is offq
  workqueue: get pool from wq/cwq
  workqueue: add lock_pool_executing_work()
  workqueue: add lock_pool_queued_work()
  workqueue: add lock_pool_own_work() and remove get_work_pool()
  workqueue: allow more work_pool id space
  workqueue: add worker's global worker ID
  workqueue: record global worker ID instead of pool ID in work->data
    when off-queue

 include/linux/workqueue.h   |   24 ++--
 kernel/workqueue.c          |  428 +++++++++++++++++++++++--------------------
 kernel/workqueue_internal.h |    1 +
 3 files changed, 242 insertions(+), 211 deletions(-)

-- 
1.7.7.6


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 01/13] workqueue: remove WORK_CPU_NONE
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 19:49   ` Tejun Heo
  2013-02-06 22:34   ` [PATCH wq/for-3.9] workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END Tejun Heo
  2013-01-31 18:41 ` [PATCH 02/13] workqueue: fix work_busy() Lai Jiangshan
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

In __next_wq_cpu() and for_each_*wq_cpu(), the name WORK_CPU_LAST
is more appropriate than WORK_CPU_NONE, so convert them to WORK_CPU_LAST.

WORK_CPU_NONE is no longer used, so remove it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |    3 +--
 kernel/workqueue.c        |    8 ++++----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index a94e4e8..2dcbacc 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -57,8 +57,7 @@ enum {
 
 	/* special cpu IDs */
 	WORK_CPU_UNBOUND	= NR_CPUS,
-	WORK_CPU_NONE		= NR_CPUS + 1,
-	WORK_CPU_LAST		= WORK_CPU_NONE,
+	WORK_CPU_LAST		= NR_CPUS + 1,
 
 	/*
 	 * Reserve 7 bits off of cwq pointer w/ debugobjects turned
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 577de10..973b290 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -258,7 +258,7 @@ static inline int __next_wq_cpu(int cpu, const struct cpumask *mask,
 		if (sw & 2)
 			return WORK_CPU_UNBOUND;
 	}
-	return WORK_CPU_NONE;
+	return WORK_CPU_LAST;
 }
 
 static inline int __next_cwq_cpu(int cpu, const struct cpumask *mask,
@@ -282,17 +282,17 @@ static inline int __next_cwq_cpu(int cpu, const struct cpumask *mask,
  */
 #define for_each_wq_cpu(cpu)						\
 	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);		\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_LAST;					\
 	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, 3))
 
 #define for_each_online_wq_cpu(cpu)					\
 	for ((cpu) = __next_wq_cpu(-1, cpu_online_mask, 3);		\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_LAST;					\
 	     (cpu) = __next_wq_cpu((cpu), cpu_online_mask, 3))
 
 #define for_each_cwq_cpu(cpu, wq)					\
 	for ((cpu) = __next_cwq_cpu(-1, cpu_possible_mask, (wq));	\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_LAST;					\
 	     (cpu) = __next_cwq_cpu((cpu), cpu_possible_mask, (wq)))
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 02/13] workqueue: fix work_busy()
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 19:54   ` Tejun Heo
  2013-02-06 23:02   ` [PATCH wq/for-3.9] workqueue: make work_busy() test WORK_STRUCT_PENDING first Tejun Heo
  2013-01-31 18:41 ` [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool Lai Jiangshan
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Even if get_work_pool() returns NULL, the work may still be pending.
Test work_pending() unconditionally.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    8 ++------
 1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 973b290..d474a6c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3443,8 +3443,6 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
  * Test whether @work is currently pending or running.  There is no
  * synchronization around this function and the test result is
  * unreliable and only useful as advisory hints or for debugging.
- * Especially for reentrant wqs, the pending state might hide the
- * running state.
  *
  * RETURNS:
  * OR'd bitmask of WORK_BUSY_* bits.
@@ -3453,15 +3451,13 @@ unsigned int work_busy(struct work_struct *work)
 {
 	struct worker_pool *pool = get_work_pool(work);
 	unsigned long flags;
-	unsigned int ret = 0;
+	unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;
 
 	if (!pool)
-		return 0;
+		return ret;
 
 	spin_lock_irqsave(&pool->lock, flags);
 
-	if (work_pending(work))
-		ret |= WORK_BUSY_PENDING;
 	if (find_worker_executing_work(pool, work))
 		ret |= WORK_BUSY_RUNNING;
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 02/13] workqueue: fix work_busy() Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 21:28   ` Tejun Heo
  2013-02-07  0:28   ` [PATCH wq/for-3.9 workqueue]: add delayed_work->wq to simplify reentrancy handling Tejun Heo
  2013-01-31 18:41 ` [PATCH 04/13] workqueue: clear cwq when cancel the work Lai Jiangshan
                   ` (10 subsequent siblings)
  13 siblings, 2 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Setting the cwq on a work struct while it is timer-pending introduces
unneeded complexity into __queue_delayed_work().

We add "struct workqueue_struct *wq;" to the already-large struct delayed_work
to remove this complexity. (If enlarging the struct is objectionable,
@wq can instead be encoded into delayed_work.work.entry; this patch and the
later two patches make that encoding possible.)

This is the first step towards killing the CWQ bit of a work item which is
not queued on any pool.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |    1 +
 kernel/workqueue.c        |   32 +++-----------------------------
 2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2dcbacc..db1782b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -110,6 +110,7 @@ struct delayed_work {
 	struct work_struct work;
 	struct timer_list timer;
 	int cpu;
+	struct workqueue_struct *wq;
 };
 
 static inline struct delayed_work *to_delayed_work(struct work_struct *work)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d474a6c..b12b30e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1339,10 +1339,9 @@ EXPORT_SYMBOL_GPL(queue_work);
 void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
-	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
 	/* should have been called from irqsafe timer with irq already off */
-	__queue_work(dwork->cpu, cwq->wq, &dwork->work);
+	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
 }
 EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
@@ -1351,7 +1350,6 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 {
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
-	unsigned int lcpu;
 
 	WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
 		     timer->data != (unsigned long)dwork);
@@ -1371,31 +1369,8 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 
 	timer_stats_timer_set_start_info(&dwork->timer);
 
-	/*
-	 * This stores cwq for the moment, for the timer_fn.  Note that the
-	 * work's pool is preserved to allow reentrance detection for
-	 * delayed works.
-	 */
-	if (!(wq->flags & WQ_UNBOUND)) {
-		struct worker_pool *pool = get_work_pool(work);
-
-		/*
-		 * If we cannot get the last pool from @work directly,
-		 * select the last CPU such that it avoids unnecessarily
-		 * triggering non-reentrancy check in __queue_work().
-		 */
-		lcpu = cpu;
-		if (pool)
-			lcpu = pool->cpu;
-		if (lcpu == WORK_CPU_UNBOUND)
-			lcpu = raw_smp_processor_id();
-	} else {
-		lcpu = WORK_CPU_UNBOUND;
-	}
-
-	set_work_cwq(work, get_cwq(lcpu, wq), 0);
-
 	dwork->cpu = cpu;
+	dwork->wq = wq;
 	timer->expires = jiffies + delay;
 
 	if (unlikely(cpu != WORK_CPU_UNBOUND))
@@ -2944,8 +2919,7 @@ bool flush_delayed_work(struct delayed_work *dwork)
 {
 	local_irq_disable();
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(dwork->cpu,
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+		__queue_work(dwork->cpu, dwork->wq, &dwork->work);
 	local_irq_enable();
 	return flush_work(&dwork->work);
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 04/13] workqueue: clear cwq when cancel the work
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (2 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-07  0:53   ` [PATCH wq/for-3.9] workqueue: make work->data point to pool after try_to_grab_pending() Tejun Heo
  2013-01-31 18:41 ` [PATCH 05/13] workqueue: change queued detection and remove *mb()s Lai Jiangshan
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Add clear_work_cwq(), which clears the CWQ bit when the work goes off-queue
as it is cancelled.

This is the other important step towards killing the CWQ bit of a work item
which is not queued on any pool.

Now, when a work item is off-queue, its CWQ bit is never set.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b12b30e..50d3dd5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -556,6 +556,12 @@ static void set_work_cwq(struct work_struct *work,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
+static void clear_work_cwq(struct work_struct *work, int pool_id)
+{
+	set_work_data(work, pool_id << WORK_OFFQ_POOL_SHIFT,
+		      WORK_STRUCT_PENDING);
+}
+
 static void set_work_pool_and_clear_pending(struct work_struct *work,
 					    int pool_id)
 {
@@ -1115,6 +1121,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 			cwq_dec_nr_in_flight(get_work_cwq(work),
 				get_work_color(work));
 
+			clear_work_cwq(work, pool->id);
 			spin_unlock(&pool->lock);
 			return 1;
 		}
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 05/13] workqueue: change queued detection and remove *mb()s
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (3 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 04/13] workqueue: clear cwq when cancel the work Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-07  1:52   ` [PATCH wq/for-3.9] workqueue: simplify is-work-item-queued-here test Tejun Heo
  2013-01-31 18:41 ` [PATCH 06/13] workqueue: get pool id from work->data directly if it is offq Lai Jiangshan
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Now we have this invariant:
	the CWQ bit is set and cwq->pool == pool
	<==> the work is queued on the pool

So we can simplify the is-work-queued detection and locking and remove the *mb()s.

(rmb()/wmb() are no-ops on x86, but they are expensive on other architectures.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   43 +++++++++++++++++++------------------------
 1 files changed, 19 insertions(+), 24 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 50d3dd5..b7cfaa1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1067,6 +1067,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 			       unsigned long *flags)
 {
 	struct worker_pool *pool;
+	struct cpu_workqueue_struct *cwq;
 
 	local_irq_save(*flags);
 
@@ -1096,14 +1097,20 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		goto fail;
 
 	spin_lock(&pool->lock);
-	if (!list_empty(&work->entry)) {
-		/*
-		 * This work is queued, but perhaps we locked the wrong
-		 * pool.  In that case we must see the new value after
-		 * rmb(), see insert_work()->wmb().
-		 */
-		smp_rmb();
-		if (pool == get_work_pool(work)) {
+	/*
+	 * The CWQ bit is set/cleared only when we enqueue/dequeue the work.
+	 * When a work is enqueued (insert_work()) on a pool:
+	 *	we set the cwq (CWQ bit) with pool->lock held
+	 * When a work is dequeued (process_one_work(), try_to_grab_pending()):
+	 *	we clear the cwq (CWQ bit) with pool->lock held
+	 *
+	 * So while the pool->lock is held, we can determine:
+	 *	CWQ bit is set and cwq->pool == pool
+	 *	<==> the work is queued on the pool
+	 */
+	cwq = get_work_cwq(work);
+	if (cwq) {
+		if (pool == cwq->pool) {
 			debug_work_deactivate(work);
 
 			/*
@@ -1156,13 +1163,6 @@ static void insert_work(struct cpu_workqueue_struct *cwq,
 
 	/* we own @work, set data and link */
 	set_work_cwq(work, cwq, extra_flags);
-
-	/*
-	 * Ensure that we get the right work->data if we see the
-	 * result of list_add() below, see try_to_grab_pending().
-	 */
-	smp_wmb();
-
 	list_add_tail(&work->entry, head);
 
 	/*
@@ -2796,15 +2796,10 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 		return false;
 
 	spin_lock_irq(&pool->lock);
-	if (!list_empty(&work->entry)) {
-		/*
-		 * See the comment near try_to_grab_pending()->smp_rmb().
-		 * If it was re-queued to a different pool under us, we
-		 * are not going to wait.
-		 */
-		smp_rmb();
-		cwq = get_work_cwq(work);
-		if (unlikely(!cwq || pool != cwq->pool))
+	/* See the comment near try_to_grab_pending() with the same code */
+	cwq = get_work_cwq(work);
+	if (cwq) {
+		if (unlikely(pool != cwq->pool))
 			goto already_gone;
 	} else {
 		worker = find_worker_executing_work(pool, work);
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 06/13] workqueue: get pool id from work->data directly if it is offq
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (4 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 05/13] workqueue: change queued detection and remove *mb()s Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-07 20:16   ` [PATCH wq/for-3.9] workqueue: make get_work_pool_id() cheaper Tejun Heo
  2013-01-31 18:41 ` [PATCH 07/13] workqueue: get pool from wq/cwq Lai Jiangshan
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

At both existing call sites of get_work_pool_id(), the work is known to be
off-queue, so we can read the pool ID from work->data directly and remove
the overhead of the idr_find() performed by get_work_pool_id().

Implement offq_work_pool_id() to replace get_work_pool_id().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b7cfaa1..9f04416 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -617,22 +617,22 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
 }
 
 /**
- * get_work_pool_id - return the worker pool ID a given work is associated with
- * @work: the work item of interest
+ * offq_work_pool_id - return the worker pool ID a given work is associated with
+ * @work: the off-queued work item of interest
  *
  * Return the worker_pool ID @work was last associated with.
- * %WORK_OFFQ_POOL_NONE if none.
  */
-static int get_work_pool_id(struct work_struct *work)
+static int offq_work_pool_id(struct work_struct *work)
 {
-	struct worker_pool *pool = get_work_pool(work);
+	unsigned long data = atomic_long_read(&work->data);
 
-	return pool ? pool->id : WORK_OFFQ_POOL_NONE;
+	BUG_ON(data & WORK_STRUCT_CWQ);
+	return data >> WORK_OFFQ_POOL_SHIFT;
 }
 
 static void mark_work_canceling(struct work_struct *work)
 {
-	unsigned long pool_id = get_work_pool_id(work);
+	unsigned long pool_id = offq_work_pool_id(work);
 
 	pool_id <<= WORK_OFFQ_POOL_SHIFT;
 	set_work_data(work, pool_id | WORK_OFFQ_CANCELING, WORK_STRUCT_PENDING);
@@ -2952,7 +2952,7 @@ bool cancel_delayed_work(struct delayed_work *dwork)
 		return false;
 
 	set_work_pool_and_clear_pending(&dwork->work,
-					get_work_pool_id(&dwork->work));
+					offq_work_pool_id(&dwork->work));
 	local_irq_restore(flags);
 	return ret;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 07/13] workqueue: get pool from wq/cwq
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (5 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 06/13] workqueue: get pool id from work->data directly if it is offq Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-07 21:13   ` [PATCH wq/for-3.9] workqueue: pick cwq instead of pool in __queue_work() Tejun Heo
  2013-01-31 18:41 ` [PATCH 08/13] workqueue: add lock_pool_executing_work() Lai Jiangshan
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

__queue_work() calls get_std_worker_pool(), which finds the pool indirectly;
we should get it directly from the wq/cwq instead.

This is needed when we introduce non-std worker pools.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 9f04416..a138844 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1212,7 +1212,6 @@ static bool is_chained_work(struct workqueue_struct *wq)
 static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 			 struct work_struct *work)
 {
-	bool highpri = wq->flags & WQ_HIGHPRI;
 	struct worker_pool *pool;
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
@@ -1247,7 +1246,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		 * work needs to be queued on that cpu to guarantee
 		 * non-reentrancy.
 		 */
-		pool = get_std_worker_pool(cpu, highpri);
+		pool = get_cwq(cpu, wq)->pool;
 		last_pool = get_work_pool(work);
 
 		if (last_pool && last_pool != pool) {
@@ -1268,7 +1267,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 			spin_lock(&pool->lock);
 		}
 	} else {
-		pool = get_std_worker_pool(WORK_CPU_UNBOUND, highpri);
+		pool = get_cwq(WORK_CPU_UNBOUND, wq)->pool;
 		spin_lock(&pool->lock);
 	}
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 08/13] workqueue: add lock_pool_executing_work()
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (6 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 07/13] workqueue: get pool from wq/cwq Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 21:34   ` Tejun Heo
  2013-01-31 18:41 ` [PATCH 09/13] workqueue: add lock_pool_queued_work() Lai Jiangshan
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Extract the locking code from __queue_work() and name it
lock_pool_executing_work(). This improves readability, makes __queue_work()
shorter, and the new function can be reused by later patches.

Add and use a proper locking API.

This patch has a small downside for __queue_work(): even when
worker_pool_by_id(pool_id) == get_cwq(cpu, wq)->pool, it still calls
find_worker_executing_work() unconditionally. That is a very small overhead
compared to the locking and worker_pool_by_id(), and it will be reduced later.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   72 +++++++++++++++++++++++++++++++++++----------------
 1 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a138844..6e92f18 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -934,6 +934,43 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 	return NULL;
 }
 
+/** lock_pool_executing_work - lock the pool a given offq work is running on
+ * @work: work of interest
+ * @worker: return the worker which is executing @work if found
+ *
+ * CONTEXT:
+ * local_irq_disable()
+ *
+ * RETURNS:
+ * Pointer to work pool(and locked) on which @work is running if found,
+ * NULL otherwise.
+ */
+static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
+						    struct worker **worker)
+{
+	unsigned long pool_id = offq_work_pool_id(work);
+	struct worker_pool *pool;
+	struct worker *exec;
+
+	if (pool_id == WORK_OFFQ_POOL_NONE)
+		return NULL;
+
+	pool = worker_pool_by_id(pool_id);
+	if (!pool)
+		return NULL;
+
+	spin_lock(&pool->lock);
+	exec = find_worker_executing_work(pool, work);
+	if (exec) {
+		BUG_ON(pool != exec->pool);
+		*worker = exec;
+		return pool;
+	}
+	spin_unlock(&pool->lock);
+
+	return NULL;
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1235,35 +1272,24 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 
 	/* determine pool to use */
 	if (!(wq->flags & WQ_UNBOUND)) {
-		struct worker_pool *last_pool;
+		struct worker *worker;
 
 		if (cpu == WORK_CPU_UNBOUND)
 			cpu = raw_smp_processor_id();
 
 		/*
-		 * It's multi cpu.  If @work was previously on a different
-		 * cpu, it might still be running there, in which case the
-		 * work needs to be queued on that cpu to guarantee
-		 * non-reentrancy.
+		 * It's multi cpu and multi pool.  If @work is still running
+		 * on some pool, the work needs to be queued on that pool
+		 * to guarantee non-reentrancy.
 		 */
-		pool = get_cwq(cpu, wq)->pool;
-		last_pool = get_work_pool(work);
-
-		if (last_pool && last_pool != pool) {
-			struct worker *worker;
-
-			spin_lock(&last_pool->lock);
-
-			worker = find_worker_executing_work(last_pool, work);
-
-			if (worker && worker->current_cwq->wq == wq)
-				pool = last_pool;
-			else {
-				/* meh... not running there, queue here */
-				spin_unlock(&last_pool->lock);
-				spin_lock(&pool->lock);
-			}
-		} else {
+		BUG_ON(get_work_cwq(work));
+		pool = lock_pool_executing_work(work, &worker);
+		if (pool && worker->current_cwq->wq != wq) {
+			spin_unlock(&pool->lock);
+			pool = NULL;
+		}
+		if (!pool) {
+			pool = get_cwq(cpu, wq)->pool;
 			spin_lock(&pool->lock);
 		}
 	} else {
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 09/13] workqueue: add lock_pool_queued_work()
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (7 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 08/13] workqueue: add lock_pool_executing_work() Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool() Lai Jiangshan
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Extract the locking code from try_to_grab_pending() and name it
lock_pool_queued_work(). This improves readability, makes
try_to_grab_pending() shorter, and the new function can be reused by a
later patch.

Add and use a proper locking API.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |  100 ++++++++++++++++++++++++++++++---------------------
 1 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6e92f18..a108788 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -971,6 +971,45 @@ static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
 	return NULL;
 }
 
+/** lock_pool_queued_work - lock the pool a given work is queued on
+ * @work: work of interest
+ *
+ * CONTEXT:
+ * local_irq_disable()
+ *
+ * RETURNS:
+ * Pointer to work pool(and locked) on which @work is queued if found,
+ * NULL otherwise.
+ */
+static struct worker_pool *lock_pool_queued_work(struct work_struct *work)
+{
+	struct cpu_workqueue_struct *cwq = get_work_cwq(work);
+	struct worker_pool *pool;
+
+	if (!cwq)
+		return NULL;
+
+	pool = cwq->pool;
+	spin_lock(&pool->lock);
+	/*
+	 * The CWQ bit is set/cleared only when we enqueue/dequeue the work.
+	 * When a work is enqueued (insert_work()) on a pool:
+	 *	we set the cwq (CWQ bit) with pool->lock held
+	 * When a work is dequeued (process_one_work(), try_to_grab_pending()):
+	 *	we clear the cwq (CWQ bit) with pool->lock held
+	 *
+	 * So while the pool->lock is held, we can determine:
+	 *	CWQ bit is set and cwq->pool == pool
+	 *	<==> the work is queued on the pool
+	 */
+	cwq = get_work_cwq(work);
+	if (cwq && cwq->pool == pool)
+		return pool;
+	spin_unlock(&pool->lock);
+
+	return NULL;
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1104,7 +1143,6 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 			       unsigned long *flags)
 {
 	struct worker_pool *pool;
-	struct cpu_workqueue_struct *cwq;
 
 	local_irq_save(*flags);
 
@@ -1129,49 +1167,29 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	 * The queueing is in progress, or it is already queued. Try to
 	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
 	 */
-	pool = get_work_pool(work);
-	if (!pool)
-		goto fail;
-
-	spin_lock(&pool->lock);
-	/*
-	 * The CWQ bit is set/cleared only when we do enqueue/dequeue the work
-	 * The CWQ bit is set/cleared only when we enqueue/dequeue the work.
-	 * When a work is enqueued (insert_work()) on a pool:
-	 *	we set the cwq (CWQ bit) with pool->lock held
-	 * When a work is dequeued (process_one_work(), try_to_grab_pending()):
-	 *	we clear the cwq (CWQ bit) with pool->lock held
-	 *
-	 * So while the pool->lock is held, we can determine:
-	 *	CWQ bit is set and cwq->pool == pool
-	 *	<==> the work is queued on the pool
-	cwq = get_work_cwq(work);
-	if (cwq) {
-		if (pool == cwq->pool) {
-			debug_work_deactivate(work);
+	pool = lock_pool_queued_work(work);
+	if (pool) {
+		debug_work_deactivate(work);
 
-			/*
-			 * A delayed work item cannot be grabbed directly
-			 * because it might have linked NO_COLOR work items
-			 * which, if left on the delayed_list, will confuse
-			 * cwq->nr_active management later on and cause
-			 * stall.  Make sure the work item is activated
-			 * before grabbing.
-			 */
-			if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
-				cwq_activate_delayed_work(work);
+		/*
+		 * A delayed work item cannot be grabbed directly
+		 * because it might have linked NO_COLOR work items
+		 * which, if left on the delayed_list, will confuse
+		 * cwq->nr_active management later on and cause
+		 * stall.  Make sure the work item is activated
+		 * before grabbing.
+		 */
+		if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
+			cwq_activate_delayed_work(work);
 
-			list_del_init(&work->entry);
-			cwq_dec_nr_in_flight(get_work_cwq(work),
-				get_work_color(work));
+		list_del_init(&work->entry);
+		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
 
-			clear_work_cwq(work, pool->id);
-			spin_unlock(&pool->lock);
-			return 1;
-		}
+		clear_work_cwq(work, pool->id);
+		spin_unlock(&pool->lock);
+		return 1;
 	}
-	spin_unlock(&pool->lock);
-fail:
+
 	local_irq_restore(*flags);
 	if (work_is_canceling(work))
 		return -ENOENT;
@@ -2821,7 +2839,7 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 		return false;
 
 	spin_lock_irq(&pool->lock);
-	/* See the comment near try_to_grab_pending() with the same code */
+	/* See the comment near lock_pool_queued_work() with the same code */
 	cwq = get_work_cwq(work);
 	if (cwq) {
 		if (unlikely(pool != cwq->pool))
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool()
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (8 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 09/13] workqueue: add lock_pool_queued_work() Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 21:38   ` Tejun Heo
  2013-01-31 18:41 ` [PATCH 11/13] workqueue: allow more work_pool id space Lai Jiangshan
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

lock_pool_own_work() reuses lock_pool_queued_work() and
lock_pool_executing_work() to implement the "find the owning pool and lock
it" logic. This makes the meaning and logic of every lock_pool_own_work()
call site clearer.

get_work_pool() cannot reliably return the pool which really owns the work
and is no longer used, so remove it.

Add and use a proper locking API.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |  123 ++++++++++++++++++++++++++--------------------------
 1 files changed, 62 insertions(+), 61 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a108788..04d1286 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -531,10 +531,8 @@ static int work_next_color(int color)
  * work->data.  These functions should only be called while the work is
  * owned - ie. while the PENDING bit is set.
  *
- * get_work_pool() and get_work_cwq() can be used to obtain the pool or cwq
- * corresponding to a work.  Pool is available once the work has been
- * queued anywhere after initialization until it is sync canceled.  cwq is
- * available only while the work item is queued.
+ * get_work_cwq() can be used to obtain the cwq, it is available only
+ * while the work item is queued.
  *
  * %WORK_OFFQ_CANCELING is used to mark a work item which is being
  * canceled.  While being canceled, a work item may have its PENDING set
@@ -592,31 +590,6 @@ static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
 }
 
 /**
- * get_work_pool - return the worker_pool a given work was associated with
- * @work: the work item of interest
- *
- * Return the worker_pool @work was last associated with.  %NULL if none.
- */
-static struct worker_pool *get_work_pool(struct work_struct *work)
-{
-	unsigned long data = atomic_long_read(&work->data);
-	struct worker_pool *pool;
-	int pool_id;
-
-	if (data & WORK_STRUCT_CWQ)
-		return ((struct cpu_workqueue_struct *)
-			(data & WORK_STRUCT_WQ_DATA_MASK))->pool;
-
-	pool_id = data >> WORK_OFFQ_POOL_SHIFT;
-	if (pool_id == WORK_OFFQ_POOL_NONE)
-		return NULL;
-
-	pool = worker_pool_by_id(pool_id);
-	WARN_ON_ONCE(!pool);
-	return pool;
-}
-
-/**
  * offq_work_pool_id - return the worker pool ID a given work is associated with
  * @work: the off-queued work item of interest
  *
@@ -936,6 +909,7 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 
 /** lock_pool_executing_work - lock the pool a given offq work is running on
  * @work: work of interest
+ * @pool_id: pool_id obtained from @work data
  * @worker: return the worker which is executing @work if found
  *
  * CONTEXT:
@@ -946,9 +920,9 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
  * NULL otherwise.
  */
 static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
+						    unsigned long pool_id,
 						    struct worker **worker)
 {
-	unsigned long pool_id = offq_work_pool_id(work);
 	struct worker_pool *pool;
 	struct worker *exec;
 
@@ -973,6 +947,7 @@ static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
 
 /** lock_pool_queued_work - lock the pool a given work is queued on
  * @work: work of interest
+ * @cwq: cwq obtained from @work data
  *
  * CONTEXT:
  * local_irq_disable()
@@ -981,14 +956,12 @@ static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
  * Pointer to work pool(and locked) on which @work is queued if found,
  * NULL otherwise.
  */
-static struct worker_pool *lock_pool_queued_work(struct work_struct *work)
+static
+struct worker_pool *lock_pool_queued_work(struct work_struct *work,
+					  struct cpu_workqueue_struct *cwq)
 {
-	struct cpu_workqueue_struct *cwq = get_work_cwq(work);
 	struct worker_pool *pool;
 
-	if (!cwq)
-		return NULL;
-
 	pool = cwq->pool;
 	spin_lock(&pool->lock);
 	/*
@@ -1010,6 +983,40 @@ static struct worker_pool *lock_pool_queued_work(struct work_struct *work)
 	return NULL;
 }
 
+/** lock_pool_own_work - lock the pool a given work is queued/running on
+ * @work: work of interest
+ * @worker: return the worker which is executing off-queued @work if found
+ *
+ * CONTEXT:
+ * local_irq_disable()
+ *
+ * RETURNS:
+ * Pointer to work pool(and locked) on which @work is queued/running if found,
+ * 	- If the work is queued on the pool, lock and return the pool, with
+ * 	  @worker untouched. The work may or may not be running on the pool.
+ * 	- If the work is not queued on the pool, but it is running on the pool,
+ * 	  lock and return the pool with @worker set to the executing worker
+ * 	  (This step will be skipped if @worker is NULL)
+ * NULL otherwise with @worker untouched
+ */
+static struct worker_pool *lock_pool_own_work(struct work_struct *work,
+					      struct worker **worker)
+{
+	unsigned long data = atomic_long_read(&work->data);
+	struct cpu_workqueue_struct *cwq;
+	unsigned long pool_id;
+
+	if (data & WORK_STRUCT_CWQ) {
+		cwq = (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
+		return lock_pool_queued_work(work, cwq);
+	} else if (worker) {
+		pool_id = data >> WORK_OFFQ_POOL_SHIFT;
+		return lock_pool_executing_work(work, pool_id, worker);
+	}
+
+	return NULL;
+}
+
 /**
  * move_linked_works - move linked works to a list
  * @work: start of series of works to be scheduled
@@ -1167,7 +1174,7 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	 * The queueing is in progress, or it is already queued. Try to
 	 * steal it from ->worklist without clearing WORK_STRUCT_PENDING.
 	 */
-	pool = lock_pool_queued_work(work);
+	pool = lock_pool_own_work(work, NULL);
 	if (pool) {
 		debug_work_deactivate(work);
 
@@ -1301,7 +1308,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		 * to guarantee non-reentrancy.
 		 */
 		BUG_ON(get_work_cwq(work));
-		pool = lock_pool_executing_work(work, &worker);
+		pool = lock_pool_own_work(work, &worker);
 		if (pool && worker->current_cwq->wq != wq) {
 			spin_unlock(&pool->lock);
 			pool = NULL;
@@ -2834,22 +2841,17 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 	struct cpu_workqueue_struct *cwq;
 
 	might_sleep();
-	pool = get_work_pool(work);
-	if (!pool)
+	local_irq_disable();
+	pool = lock_pool_own_work(work, &worker);
+	if (!pool) {
+		local_irq_enable();
 		return false;
+	}
 
-	spin_lock_irq(&pool->lock);
-	/* See the comment near lock_pool_queued_work() with the same code */
-	cwq = get_work_cwq(work);
-	if (cwq) {
-		if (unlikely(pool != cwq->pool))
-			goto already_gone;
-	} else {
-		worker = find_worker_executing_work(pool, work);
-		if (!worker)
-			goto already_gone;
+	if (!worker)
+		cwq = get_work_cwq(work);
+	else
 		cwq = worker->current_cwq;
-	}
 
 	insert_wq_barrier(cwq, barr, work, worker);
 	spin_unlock_irq(&pool->lock);
@@ -2867,9 +2869,6 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr)
 	lock_map_release(&cwq->wq->lockdep_map);
 
 	return true;
-already_gone:
-	spin_unlock_irq(&pool->lock);
-	return false;
 }
 
 /**
@@ -3468,19 +3467,21 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
  */
 unsigned int work_busy(struct work_struct *work)
 {
-	struct worker_pool *pool = get_work_pool(work);
 	unsigned long flags;
+	struct worker_pool *pool;
+	struct worker *worker = NULL;
 	unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;
 
-	if (!pool)
-		return ret;
-
-	spin_lock_irqsave(&pool->lock, flags);
+	local_irq_save(flags);
 
-	if (find_worker_executing_work(pool, work))
-		ret |= WORK_BUSY_RUNNING;
+	pool = lock_pool_own_work(work, &worker);
+	if (pool) {
+		if (worker || find_worker_executing_work(pool, work))
+			ret |= WORK_BUSY_RUNNING;
+		spin_unlock(&pool->lock);
+	}
 
-	spin_unlock_irqrestore(&pool->lock, flags);
+	local_irq_restore(flags);
 
 	return ret;
 }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 11/13] workqueue: allow more work_pool id space
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (9 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool() Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-01-31 18:41 ` [PATCH 12/13] workqueue: add worker's global worker ID Lai Jiangshan
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

The color bits are unused while a work item is off-queue, so reuse them for
pool IDs. This gives us a larger pool ID space.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index db1782b..cb8e69d 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -68,7 +68,7 @@ enum {
 				  WORK_STRUCT_COLOR_BITS,
 
 	/* data contains off-queue information when !WORK_STRUCT_CWQ */
-	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_FLAG_BITS,
+	WORK_OFFQ_FLAG_BASE	= WORK_STRUCT_COLOR_SHIFT,
 
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 12/13] workqueue: add worker's global worker ID
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (10 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 11/13] workqueue: allow more work_pool id space Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 21:39   ` Tejun Heo
  2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
  2013-02-04 21:04 ` [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Tejun Heo
  13 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Add worker->gwid, which is allocated from worker_gwid_idr. It will be used
to record, in work->data, the worker a work item last ran on.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c          |   28 ++++++++++++++++++++++++++++
 kernel/workqueue_internal.h |    1 +
 2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 04d1286..bc57faf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -439,6 +439,10 @@ static atomic_t unbound_std_pool_nr_running[NR_STD_WORKER_POOLS] = {
 static DEFINE_MUTEX(worker_pool_idr_mutex);
 static DEFINE_IDR(worker_pool_idr);
 
+/* idr of all workers */
+static DEFINE_MUTEX(worker_gwid_idr_mutex);
+static DEFINE_IDR(worker_gwid_idr);
+
 static int worker_thread(void *__worker);
 
 static struct worker_pool *std_worker_pools(int cpu)
@@ -454,6 +458,26 @@ static int std_worker_pool_pri(struct worker_pool *pool)
 	return pool - std_worker_pools(pool->cpu);
 }
 
+/* allocate ID and assign it to @worker */
+static int worker_assign_gwid(struct worker *worker)
+{
+	int ret;
+
+	mutex_lock(&worker_gwid_idr_mutex);
+	idr_pre_get(&worker_gwid_idr, GFP_KERNEL);
+	ret = idr_get_new(&worker_gwid_idr, worker, &worker->gwid);
+	mutex_unlock(&worker_gwid_idr_mutex);
+
+	return ret;
+}
+
+static void free_worker_gwid(struct worker *worker)
+{
+	mutex_lock(&worker_gwid_idr_mutex);
+	idr_remove(&worker_gwid_idr, worker->gwid);
+	mutex_unlock(&worker_gwid_idr_mutex);
+}
+
 /* allocate ID and assign it to @pool */
 static int worker_pool_assign_id(struct worker_pool *pool)
 {
@@ -1814,6 +1838,9 @@ static struct worker *create_worker(struct worker_pool *pool)
 	worker->pool = pool;
 	worker->id = id;
 
+	if (worker_assign_gwid(worker))
+		goto fail;
+
 	if (pool->cpu != WORK_CPU_UNBOUND)
 		worker->task = kthread_create_on_node(worker_thread,
 					worker, cpu_to_node(pool->cpu),
@@ -1900,6 +1927,7 @@ static void destroy_worker(struct worker *worker)
 	spin_unlock_irq(&pool->lock);
 
 	kthread_stop(worker->task);
+	free_worker_gwid(worker);
 	kfree(worker);
 
 	spin_lock_irq(&pool->lock);
diff --git a/kernel/workqueue_internal.h b/kernel/workqueue_internal.h
index 328be4a..eccd50c 100644
--- a/kernel/workqueue_internal.h
+++ b/kernel/workqueue_internal.h
@@ -36,6 +36,7 @@ struct worker {
 	unsigned long		last_active;	/* L: last active timestamp */
 	unsigned int		flags;		/* X: flags */
 	int			id;		/* I: worker id */
+	int			gwid;		/* I: global worker id */
 
 	/* for rebinding worker to CPU */
 	struct work_struct	rebind_work;	/* L: for busy worker */
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (11 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 12/13] workqueue: add worker's global worker ID Lai Jiangshan
@ 2013-01-31 18:41 ` Lai Jiangshan
  2013-02-04 21:40   ` Tejun Heo
                     ` (2 more replies)
  2013-02-04 21:04 ` [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Tejun Heo
  13 siblings, 3 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-01-31 18:41 UTC (permalink / raw)
  To: Tejun Heo, linux-kernel; +Cc: Lai Jiangshan

Every time we need to find the executing worker for a work item, we need
two steps:
	find the pool in the idr by pool ID;
	find the worker in the pool's busy hash table.

Now we merge them into one step: find the worker directly in the idr by its
worker gwid (in lock_pool_executing_work(); if the work is on-queue, we still
use the hash table).

This makes the code more straightforward.

In the future, we may add a percpu worker gwid <--> worker mapping cache if
needed.

Also, we are planning to add non-std worker_pools, and we still don't know
how to implement worker_pool_by_id() for them; this patch sidesteps that
problem.

This patch slows down the very-slow-path destroy_worker(); if that is a
concern, we will move the synchronize_rcu() out.

This patch adds a small overhead (rcu_read_[un]lock()) to the fast path
lock_pool_executing_work(); if that is a concern, we will use rcu_sched
instead. (local_irq_disable() implies rcu_read_lock_sched(), so the overhead
could then be removed.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 include/linux/workqueue.h |   18 +++---
 kernel/workqueue.c        |  126 ++++++++++++++++++++-------------------------
 2 files changed, 65 insertions(+), 79 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index cb8e69d..a5c508b 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -73,20 +73,20 @@ enum {
 	WORK_OFFQ_CANCELING	= (1 << WORK_OFFQ_FLAG_BASE),
 
 	/*
-	 * When a work item is off queue, its high bits point to the last
-	 * pool it was on.  Cap at 31 bits and use the highest number to
+	 * When a work item starts executing, its high bits point to the
+	 * worker executing it.  Cap at 31 bits and use the highest number to
 	 * indicate that no pool is associated.
 	 */
 	WORK_OFFQ_FLAG_BITS	= 1,
-	WORK_OFFQ_POOL_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
-	WORK_OFFQ_LEFT		= BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT,
-	WORK_OFFQ_POOL_BITS	= WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
-	WORK_OFFQ_POOL_NONE	= (1LU << WORK_OFFQ_POOL_BITS) - 1,
+	WORK_OFFQ_WORKER_SHIFT	= WORK_OFFQ_FLAG_BASE + WORK_OFFQ_FLAG_BITS,
+	WORK_OFFQ_LEFT		= BITS_PER_LONG - WORK_OFFQ_WORKER_SHIFT,
+	WORK_OFFQ_WORKER_BITS	= WORK_OFFQ_LEFT <= 31 ? WORK_OFFQ_LEFT : 31,
+	WORK_OFFQ_WORKER_NONE	= (1LU << WORK_OFFQ_WORKER_BITS) - 1,
 
 	/* convenience constants */
 	WORK_STRUCT_FLAG_MASK	= (1UL << WORK_STRUCT_FLAG_BITS) - 1,
 	WORK_STRUCT_WQ_DATA_MASK = ~WORK_STRUCT_FLAG_MASK,
-	WORK_STRUCT_NO_POOL	= (unsigned long)WORK_OFFQ_POOL_NONE << WORK_OFFQ_POOL_SHIFT,
+	WORK_STRUCT_NO_WORKER	= (unsigned long)WORK_OFFQ_WORKER_NONE << WORK_OFFQ_WORKER_SHIFT,
 
 	/* bit mask for work_busy() return values */
 	WORK_BUSY_PENDING	= 1 << 0,
@@ -102,9 +102,9 @@ struct work_struct {
 #endif
 };
 
-#define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL)
+#define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_WORKER)
 #define WORK_DATA_STATIC_INIT()	\
-	ATOMIC_LONG_INIT(WORK_STRUCT_NO_POOL | WORK_STRUCT_STATIC)
+	ATOMIC_LONG_INIT(WORK_STRUCT_NO_WORKER | WORK_STRUCT_STATIC)
 
 struct delayed_work {
 	struct work_struct work;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc57faf..bdaa1f7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -42,6 +42,7 @@
 #include <linux/lockdep.h>
 #include <linux/idr.h>
 #include <linux/hashtable.h>
+#include <linux/rcupdate.h>
 
 #include "workqueue_internal.h"
 
@@ -125,7 +126,6 @@ enum {
 struct worker_pool {
 	spinlock_t		lock;		/* the pool lock */
 	unsigned int		cpu;		/* I: the associated cpu */
-	int			id;		/* I: pool ID */
 	unsigned int		flags;		/* X: flags */
 
 	struct list_head	worklist;	/* L: list of pending works */
@@ -435,10 +435,6 @@ static atomic_t unbound_std_pool_nr_running[NR_STD_WORKER_POOLS] = {
 	[0 ... NR_STD_WORKER_POOLS - 1]	= ATOMIC_INIT(0),	/* always 0 */
 };
 
-/* idr of all pools */
-static DEFINE_MUTEX(worker_pool_idr_mutex);
-static DEFINE_IDR(worker_pool_idr);
-
 /* idr of all workers */
 static DEFINE_MUTEX(worker_gwid_idr_mutex);
 static DEFINE_IDR(worker_gwid_idr);
@@ -478,28 +474,6 @@ static void free_worker_gwid(struct worker *worker)
 	mutex_unlock(&worker_gwid_idr_mutex);
 }
 
-/* allocate ID and assign it to @pool */
-static int worker_pool_assign_id(struct worker_pool *pool)
-{
-	int ret;
-
-	mutex_lock(&worker_pool_idr_mutex);
-	idr_pre_get(&worker_pool_idr, GFP_KERNEL);
-	ret = idr_get_new(&worker_pool_idr, pool, &pool->id);
-	mutex_unlock(&worker_pool_idr_mutex);
-
-	return ret;
-}
-
-/*
- * Lookup worker_pool by id.  The idr currently is built during boot and
- * never modified.  Don't worry about locking for now.
- */
-static struct worker_pool *worker_pool_by_id(int pool_id)
-{
-	return idr_find(&worker_pool_idr, pool_id);
-}
-
 static struct worker_pool *get_std_worker_pool(int cpu, bool highpri)
 {
 	struct worker_pool *pools = std_worker_pools(cpu);
@@ -548,10 +522,10 @@ static int work_next_color(int color)
 /*
  * While queued, %WORK_STRUCT_CWQ is set and non flag bits of a work's data
  * contain the pointer to the queued cwq.  Once execution starts, the flag
- * is cleared and the high bits contain OFFQ flags and pool ID.
+ * is cleared and the high bits contain OFFQ flags and global worker ID.
  *
- * set_work_cwq(), set_work_pool_and_clear_pending(), mark_work_canceling()
- * and clear_work_data() can be used to set the cwq, pool or clear
+ * set_work_cwq(), set_work_worker_and_clear_pending(), mark_work_canceling()
+ * and clear_work_data() can be used to set the cwq, worker or clear
  * work->data.  These functions should only be called while the work is
  * owned - ie. while the PENDING bit is set.
  *
@@ -578,15 +552,17 @@ static void set_work_cwq(struct work_struct *work,
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
-static void clear_work_cwq(struct work_struct *work, int pool_id)
+static void clear_work_cwq(struct work_struct *work, int worker_gwid)
 {
-	set_work_data(work, pool_id << WORK_OFFQ_POOL_SHIFT,
+	set_work_data(work, worker_gwid << WORK_OFFQ_WORKER_SHIFT,
 		      WORK_STRUCT_PENDING);
 }
 
-static void set_work_pool_and_clear_pending(struct work_struct *work,
-					    int pool_id)
+static void set_work_worker_and_clear_pending(struct work_struct *work,
+					      int worker_gwid)
 {
+	unsigned long data = worker_gwid << WORK_OFFQ_WORKER_SHIFT;
+
 	/*
 	 * The following wmb is paired with the implied mb in
 	 * test_and_set_bit(PENDING) and ensures all updates to @work made
@@ -594,13 +570,13 @@ static void set_work_pool_and_clear_pending(struct work_struct *work,
 	 * owner.
 	 */
 	smp_wmb();
-	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT, 0);
+	set_work_data(work, data, 0);
 }
 
 static void clear_work_data(struct work_struct *work)
 {
-	smp_wmb();	/* see set_work_pool_and_clear_pending() */
-	set_work_data(work, WORK_STRUCT_NO_POOL, 0);
+	smp_wmb();	/* see set_work_worker_and_clear_pending() */
+	set_work_data(work, WORK_STRUCT_NO_WORKER, 0);
 }
 
 static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
@@ -614,25 +590,25 @@ static struct cpu_workqueue_struct *get_work_cwq(struct work_struct *work)
 }
 
 /**
- * offq_work_pool_id - return the worker pool ID a given work is associated with
+ * offq_work_worker_gwid - return the worker gwID a given work is last running on
  * @work: the off-queued work item of interest
  *
- * Return the worker_pool ID @work was last associated with.
+ * Return the worker gwID @work was last running on.
  */
-static int offq_work_pool_id(struct work_struct *work)
+static int offq_work_worker_gwid(struct work_struct *work)
 {
 	unsigned long data = atomic_long_read(&work->data);
 
 	BUG_ON(data & WORK_STRUCT_CWQ);
-	return data >> WORK_OFFQ_POOL_SHIFT;
+	return data >> WORK_OFFQ_WORKER_SHIFT;
 }
 
 static void mark_work_canceling(struct work_struct *work)
 {
-	unsigned long pool_id = offq_work_pool_id(work);
+	unsigned long data = offq_work_worker_gwid(work);
 
-	pool_id <<= WORK_OFFQ_POOL_SHIFT;
-	set_work_data(work, pool_id | WORK_OFFQ_CANCELING, WORK_STRUCT_PENDING);
+	data <<= WORK_OFFQ_WORKER_SHIFT;
+	set_work_data(work, data | WORK_OFFQ_CANCELING, WORK_STRUCT_PENDING);
 }
 
 static bool work_is_canceling(struct work_struct *work)
@@ -933,7 +909,7 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
 
 /** lock_pool_executing_work - lock the pool a given offq work is running on
  * @work: work of interest
- * @pool_id: pool_id obtained from @work data
+ * @worker_gwid: worker_gwid obtained from @work data
  * @worker: return the worker which is executing @work if found
  *
  * CONTEXT:
@@ -944,27 +920,30 @@ static struct worker *find_worker_executing_work(struct worker_pool *pool,
  * NULL otherwise.
  */
 static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
-						    unsigned long pool_id,
+						    unsigned long worker_gwid,
 						    struct worker **worker)
 {
 	struct worker_pool *pool;
 	struct worker *exec;
 
-	if (pool_id == WORK_OFFQ_POOL_NONE)
+	if (worker_gwid == WORK_OFFQ_WORKER_NONE)
 		return NULL;
 
-	pool = worker_pool_by_id(pool_id);
-	if (!pool)
-		return NULL;
+	rcu_read_lock();
+	exec = idr_find(&worker_gwid_idr, worker_gwid);
 
-	spin_lock(&pool->lock);
-	exec = find_worker_executing_work(pool, work);
 	if (exec) {
-		BUG_ON(pool != exec->pool);
-		*worker = exec;
-		return pool;
+		pool = exec->pool;
+		spin_lock(&pool->lock);
+		if (exec->current_work == work &&
+		    exec->current_func == work->func) {
+			*worker = exec;
+			rcu_read_unlock();
+			return pool;
+		}
+		spin_unlock(&pool->lock);
 	}
-	spin_unlock(&pool->lock);
+	rcu_read_unlock();
 
 	return NULL;
 }
@@ -1028,14 +1007,14 @@ static struct worker_pool *lock_pool_own_work(struct work_struct *work,
 {
 	unsigned long data = atomic_long_read(&work->data);
 	struct cpu_workqueue_struct *cwq;
-	unsigned long pool_id;
+	unsigned long worker_gwid;
 
 	if (data & WORK_STRUCT_CWQ) {
 		cwq = (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
 		return lock_pool_queued_work(work, cwq);
 	} else if (worker) {
-		pool_id = data >> WORK_OFFQ_POOL_SHIFT;
-		return lock_pool_executing_work(work, pool_id, worker);
+		worker_gwid = data >> WORK_OFFQ_WORKER_SHIFT;
+		return lock_pool_executing_work(work, worker_gwid, worker);
 	}
 
 	return NULL;
@@ -1200,6 +1179,9 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 	 */
 	pool = lock_pool_own_work(work, NULL);
 	if (pool) {
+		struct worker *worker;
+		int worker_gwid;
+
 		debug_work_deactivate(work);
 
 		/*
@@ -1216,7 +1198,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
 		list_del_init(&work->entry);
 		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
 
-		clear_work_cwq(work, pool->id);
+		/* Is the work still running? */
+		worker = find_worker_executing_work(pool, work);
+		worker_gwid = worker ? worker->gwid : WORK_OFFQ_WORKER_NONE;
+		clear_work_cwq(work, worker_gwid);
+
 		spin_unlock(&pool->lock);
 		return 1;
 	}
@@ -1928,6 +1914,7 @@ static void destroy_worker(struct worker *worker)
 
 	kthread_stop(worker->task);
 	free_worker_gwid(worker);
+	synchronize_rcu();
 	kfree(worker);
 
 	spin_lock_irq(&pool->lock);
@@ -2262,12 +2249,12 @@ __acquires(&pool->lock)
 		wake_up_worker(pool);
 
 	/*
-	 * Record the last pool and clear PENDING which should be the last
+	 * Record this worker and clear PENDING which should be the last
 	 * update to @work.  Also, do this inside @pool->lock so that
 	 * PENDING and queued state changes happen together while IRQ is
 	 * disabled.
 	 */
-	set_work_pool_and_clear_pending(work, pool->id);
+	set_work_worker_and_clear_pending(work, worker->gwid);
 
 	spin_unlock_irq(&pool->lock);
 
@@ -3021,8 +3008,8 @@ bool cancel_delayed_work(struct delayed_work *dwork)
 	if (unlikely(ret < 0))
 		return false;
 
-	set_work_pool_and_clear_pending(&dwork->work,
-					offq_work_pool_id(&dwork->work));
+	set_work_worker_and_clear_pending(&dwork->work,
+					  offq_work_worker_gwid(&dwork->work));
 	local_irq_restore(flags);
 	return ret;
 }
@@ -3838,9 +3825,11 @@ static int __init init_workqueues(void)
 {
 	unsigned int cpu;
 
-	/* make sure we have enough bits for OFFQ pool ID */
-	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT)) <
-		     WORK_CPU_LAST * NR_STD_WORKER_POOLS);
+	/*
+	 * make sure we have enough bits for the OFFQ worker gwID;
+	 * every worker needs at least a 4K stack, which bounds the worker count.
+	 */
+	BUILD_BUG_ON((BITS_PER_LONG != 64) && (WORK_OFFQ_WORKER_SHIFT > 12));
 
 	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
 	hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);
@@ -3866,9 +3855,6 @@ static int __init init_workqueues(void)
 
 			mutex_init(&pool->assoc_mutex);
 			ida_init(&pool->worker_ida);
-
-			/* alloc pool ID */
-			BUG_ON(worker_pool_assign_id(pool));
 		}
 	}
 
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/13] workqueue: remove WORK_CPU_NONE
  2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
@ 2013-02-04 19:49   ` Tejun Heo
  2013-02-05 12:03     ` Lai Jiangshan
  2013-02-06 22:34   ` [PATCH wq/for-3.9] workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END Tejun Heo
  1 sibling, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 19:49 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Fri, Feb 01, 2013 at 02:41:24AM +0800, Lai Jiangshan wrote:
> In __next_wq_cpu() and for_each_*wq_cpu(), the name WORK_CPU_LAST
> is more appropriate than WORK_CPU_NONE; convert them to WORK_CPU_LAST.
> 
> WORK_CPU_NONE is not used any more, so just remove it.
...
>  #define for_each_wq_cpu(cpu)						\
>  	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);		\
> -	     (cpu) < WORK_CPU_NONE;					\
> +	     (cpu) < WORK_CPU_LAST;					\
>  	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, 3))

LAST implies that it's the last element of the range and thus that
it's an inclusive range.  Maybe we should rename it to WORK_CPU_END?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/13] workqueue: fix work_busy()
  2013-01-31 18:41 ` [PATCH 02/13] workqueue: fix work_busy() Lai Jiangshan
@ 2013-02-04 19:54   ` Tejun Heo
  2013-02-05 12:06     ` Lai Jiangshan
  2013-02-06 23:02   ` [PATCH wq/for-3.9] workqueue: make work_busy() test WORK_STRUCT_PENDING first Tejun Heo
  1 sibling, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 19:54 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Fri, Feb 01, 2013 at 02:41:25AM +0800, Lai Jiangshan wrote:
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 973b290..d474a6c 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3443,8 +3443,6 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
>   * Test whether @work is currently pending or running.  There is no
>   * synchronization around this function and the test result is
>   * unreliable and only useful as advisory hints or for debugging.
> - * Especially for reentrant wqs, the pending state might hide the
> - * running state.

Yeap, this is no longer true.

>   *
>   * RETURNS:
>   * OR'd bitmask of WORK_BUSY_* bits.
> @@ -3453,15 +3451,13 @@ unsigned int work_busy(struct work_struct *work)
>  {
>  	struct worker_pool *pool = get_work_pool(work);
>  	unsigned long flags;
> -	unsigned int ret = 0;
> +	unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;

I'd prefer this as an if() statement.

>  	if (!pool)
> -		return 0;
> +		return ret;

I'm a bit confused.  When can we be pending w/o pool?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/13] workqueue: enhance locking and record global worker id for work data
  2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
                   ` (12 preceding siblings ...)
  2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
@ 2013-02-04 21:04 ` Tejun Heo
  2013-02-05 16:19   ` Lai Jiangshan
  13 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:04 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

Generally, I *really* like where you're headed but like before it's a
bit difficult for me to apply the patches as-is.  Please read on.

On Fri, Feb 01, 2013 at 02:41:23AM +0800, Lai Jiangshan wrote:
> Better Locking:
> mainly based on *mb() which is the most dangerous code and bad for readability.
> This series change the usage of CWQ bit and makes these code simpler.
> 	--PATCH 3,4,5

Yeah, that's one ugly piece of memory barrier magic which has been
around forever.  I never bothered with it as it was fairly localized
and not broken.  I *do* like removing it.  A bit on the fence about
adding another field to delayed_work tho.  The @cpu addition was about
correctness but this one doesn't really buy us anything other than
cleaner code.  Folding the wq field into work_struct would be ugly,
right?  Hmmm....

> We have get_work_pool(), but it requires the caller do the later check and locking,
> we replace it which 3 better internal locking API. 1) More proper API and
> 2) merge the duplicated code and 3) simplify the caller.
> 	--PATCH 8,9,10

This mostly leads up to gwid change, right?

> get_work_pool()/get_work_pool_id() are called everywhere, something they are
> overkill(called idr_find() unneeded) and indirectly(caller knows it is onq or not),
> we replace them with get_work_cwq()/offq_work_pool_id()/locking APIs.
> 	--PATCH 3,4,5,6,8,9,10

Can't we just make get_work_pool_id() do a fast path if OFFQ rather than
requiring the user to distinguish the off- and on-queue cases?

> Safely/one-step searching and worker id:
> ----------------------------------------
> 
> We are planing to add non-std worker_pool, but old get_work_pool() or new
> lock_pool_executing_work() was not prepared for this plan, idr_find(pool_id)
> is unsafe when we introduce free-able non-std worker_pool. Although we can
> fix it by adding rcu to worker_pool. but "recording global worker id for
> work data and adding rcu to worker" is another way and more straight forward.
> We implement the later one,  Now, lock_pool_executing_work() is ready for this plan.
> 	--PATCH 12,13
>
> When every time we need to find out the running worker from a work,
> we need two searches: search work_pool from work's data, and search worker
> from hash. We record global worker id for work data and we only need one search.
> 	--PATCH 13

While I'm a bit worried about capping the total number of workers by the
number of bits left in work->data, if that doesn't cause any practical
issue (how many do we have available on 32bit?), I think this is the
better approach.  We couldn't do this before because work -> worker
relationship could be 1:N but it should now be doable.  Note that we
need RCU no matter what we index (pool or worker) to avoid locking on
each lookup.

So, I like both major changes made by the patchset and most changes
seem correct, well at least on casual review that is.

The problem is that I'm not very happy with the descriptions and
comments (what's up with the weird /** formatting?).  At least for me,
the patchset is quite difficult to follow.  I'm not sure whether it
has actual organizational issues or the descriptions aren't detailed /
clear enough yet.

From past experience, I *think* it's gonna be a bit of a struggle for
both of us to get the series in a shape that I would find acceptable
by reviewing and iterating, so I might just swallow it and regurgitate
into a form that I like.  Hmm.... dunno.  Will think about it.

Anyways, nice work.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-01-31 18:41 ` [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool Lai Jiangshan
@ 2013-02-04 21:28   ` Tejun Heo
  2013-02-05 15:00     ` Lai Jiangshan
  2013-02-05 15:06     ` Lai Jiangshan
  2013-02-07  0:28   ` [PATCH wq/for-3.9 workqueue]: add delayed_work->wq to simplify reentrancy handling Tejun Heo
  1 sibling, 2 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:28 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, again.

On Fri, Feb 01, 2013 at 02:41:26AM +0800, Lai Jiangshan wrote:
> @@ -110,6 +110,7 @@ struct delayed_work {
>  	struct work_struct work;
>  	struct timer_list timer;
>  	int cpu;
> +	struct workqueue_struct *wq;

Can't we just replace delayed_work->cpu with delayed_work->cwq?  That
way, we don't enlarge delayed_work while encoding both wq and cpu in
delayed_work proper.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 08/13] workqueue: add lock_pool_executing_work()
  2013-01-31 18:41 ` [PATCH 08/13] workqueue: add lock_pool_executing_work() Lai Jiangshan
@ 2013-02-04 21:34   ` Tejun Heo
  2013-02-05 12:15     ` Lai Jiangshan
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:34 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Fri, Feb 01, 2013 at 02:41:31AM +0800, Lai Jiangshan wrote:
> +static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
> +						    struct worker **worker)
> +{
> +	unsigned long pool_id = offq_work_pool_id(work);
> +	struct worker_pool *pool;
> +	struct worker *exec;
> +
> +	if (pool_id == WORK_OFFQ_POOL_NONE)
> +		return NULL;
> +
> +	pool = worker_pool_by_id(pool_id);
> +	if (!pool)
> +		return NULL;
> +
> +	spin_lock(&pool->lock);
> +	exec = find_worker_executing_work(pool, work);
> +	if (exec) {
> +		BUG_ON(pool != exec->pool);
> +		*worker = exec;
> +		return pool;
> +	}
> +	spin_unlock(&pool->lock);
> +
> +	return NULL;
> +}

So, if a work item is queued on the same CPU and it isn't being
executed, it will lock, look up the hash, unlock and then lock again?
If this is something improved by a later patch, please explain so.
There gotta be a better way to do this, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool()
  2013-01-31 18:41 ` [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool() Lai Jiangshan
@ 2013-02-04 21:38   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:38 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Fri, Feb 01, 2013 at 02:41:33AM +0800, Lai Jiangshan wrote:
> lock_pool_own_work() reuses lock_pool_queued_work() and lock_pool_executing_work()
> to implement the "find the pool which owns the work and lock it" logic.  It makes
> the meaning/logic of every call to lock_pool_own_work() clearer.
> 
> get_work_pool() can't return the pool which really owns the work
> and it is not used any more, so just remove it.
> 
> Add/use proper locking APIs.

I like that they're being collected into one interface, although the name
seems a bit iffy to me.  Maybe mark the original two functions as
internal to the new locking interface?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 12/13] workqueue: add worker's global worker ID
  2013-01-31 18:41 ` [PATCH 12/13] workqueue: add worker's global worker ID Lai Jiangshan
@ 2013-02-04 21:39   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:39 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Fri, Feb 01, 2013 at 02:41:35AM +0800, Lai Jiangshan wrote:
> Add worker->gwid which is allocated from worker_idr.  This
> will be used to record the last running worker in work->data.

worker->id is almost completely cosmetic at this point.  Maybe we want
to rename it to something, say, worker->id_in_pool and use worker->id
for the global id?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
@ 2013-02-04 21:40   ` Tejun Heo
  2013-02-04 22:12   ` Tejun Heo
  2013-02-07 22:02   ` Tejun Heo
  2 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 21:40 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Fri, Feb 01, 2013 at 02:41:36AM +0800, Lai Jiangshan wrote:
> -	pool = worker_pool_by_id(pool_id);
> -	if (!pool)
> -		return NULL;
> +	rcu_read_lock();
> +	exec = idr_find(&worker_gwid_idr, worker_gwid);
>  
> -	spin_lock(&pool->lock);
> -	exec = find_worker_executing_work(pool, work);
>  	if (exec) {
> -		BUG_ON(pool != exec->pool);
> -		*worker = exec;
> -		return pool;
> +		pool = exec->pool;
> +		spin_lock(&pool->lock);
> +		if (exec->current_work == work &&
> +		    exec->current_func == work->func) {
> +			*worker = exec;
> +			rcu_read_unlock();
> +			return pool;
> +		}
> +		spin_unlock(&pool->lock);
>  	}
> -	spin_unlock(&pool->lock);
> +	rcu_read_unlock();

We're guaranteed to have irqs disabled, so why not use sched RCU in
combination with synchronize_sched()?
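
Roughly, as an untested sketch (it assumes the worker_gwid_idr lookup
stays exactly as in this patch):

	/* the caller runs with irqs off, which already gives an RCU-sched
	 * read-side critical section; the explicit markers just document it */
	rcu_read_lock_sched();
	exec = idr_find(&worker_gwid_idr, worker_gwid);
	if (exec) {
		pool = exec->pool;
		spin_lock(&pool->lock);
		/* recheck ->current_work / ->current_func as the patch does */
		...
	}
	rcu_read_unlock_sched();

with destroy_worker() calling synchronize_sched() instead of
synchronize_rcu() before freeing the worker.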

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
  2013-02-04 21:40   ` Tejun Heo
@ 2013-02-04 22:12   ` Tejun Heo
  2013-02-05 15:18     ` Lai Jiangshan
  2013-02-07 22:02   ` Tejun Heo
  2 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-04 22:12 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

One more thing.

On Fri, Feb 01, 2013 at 02:41:36AM +0800, Lai Jiangshan wrote:
> @@ -1216,7 +1198,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
>  		list_del_init(&work->entry);
>  		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
>  
> -		clear_work_cwq(work, pool->id);
> +		/* Does the work is still running? */
> +		worker = find_worker_executing_work(pool, work);
> +		worker_gwid = worker ? worker->gwid: WORK_OFFQ_WORKER_NONE;
> +		clear_work_cwq(work, worker_gwid);

Any chance we can remove the busy_hash?  Having to keep it around
isn't a big deal but it would be nice if we can get rid of it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 01/13] workqueue: remove WORK_CPU_NONE
  2013-02-04 19:49   ` Tejun Heo
@ 2013-02-05 12:03     ` Lai Jiangshan
  0 siblings, 0 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 12:03 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On 2013/2/5 3:49, Tejun Heo wrote:
> Hello, Lai.
>
> On Fri, Feb 01, 2013 at 02:41:24AM +0800, Lai Jiangshan wrote:
>> In __next_wq_cpu() and for_each_*wq_cpu(), the name WORK_CPU_LAST
>> is more appropriate than WORK_CPU_NONE; convert them to WORK_CPU_LAST.
>>
>> WORK_CPU_NONE is not used any more, so just remove it.
> ...
>>   #define for_each_wq_cpu(cpu)						\
>>   	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);		\
>> -	     (cpu) < WORK_CPU_NONE;					\
>> +	     (cpu) < WORK_CPU_LAST;					\
>>   	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, 3))
>
> LAST implies that it's the last element of the range and thus that
> it's an inclusive range.  Maybe we should rename it to WORK_CPU_END?
>

You are right, WORK_CPU_END seems better.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/13] workqueue: fix work_busy()
  2013-02-04 19:54   ` Tejun Heo
@ 2013-02-05 12:06     ` Lai Jiangshan
  2013-02-05 18:53       ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 12:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On 2013/2/5 3:54, Tejun Heo wrote:
> Hello, Lai.
>
> On Fri, Feb 01, 2013 at 02:41:25AM +0800, Lai Jiangshan wrote:
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 973b290..d474a6c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -3443,8 +3443,6 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
>>    * Test whether @work is currently pending or running.  There is no
>>    * synchronization around this function and the test result is
>>    * unreliable and only useful as advisory hints or for debugging.
>> - * Especially for reentrant wqs, the pending state might hide the
>> - * running state.
>
> Yeap, this is no longer true.
>
>>    *
>>    * RETURNS:
>>    * OR'd bitmask of WORK_BUSY_* bits.
>> @@ -3453,15 +3451,13 @@ unsigned int work_busy(struct work_struct *work)
>>   {
>>   	struct worker_pool *pool = get_work_pool(work);
>>   	unsigned long flags;
>> -	unsigned int ret = 0;
>> +	unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;
>
> I'd prefer this as a if() statement.
>
>>   	if (!pool)
>> -		return 0;
>> +		return ret;
>
> I'm a bit confused.  When can we be pending w/o pool?
>

grab the PENDING bit <==time==> really queued
			^
		this patch considers the work busy during this window
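
For a delayed work item, after patch 3 this window spans the whole timer
delay.  A heavily simplified sketch of queue_delayed_work_on(), not the
exact code (it assumes the delayed_work->wq field from patch 3):

	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
			      work_data_bits(&dwork->work))) {
		/* PENDING is set, but work->data names no pool/cwq yet */
		dwork->wq = wq;
		dwork->cpu = cpu;
		dwork->timer.expires = jiffies + delay;
		add_timer(&dwork->timer);
	}
	/* only delayed_work_timer_fn() -> __queue_work() really queues it */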


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 08/13] workqueue: add lock_pool_executing_work()
  2013-02-04 21:34   ` Tejun Heo
@ 2013-02-05 12:15     ` Lai Jiangshan
  2013-02-05 19:09       ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 12:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On 2013/2/5 5:34, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 01, 2013 at 02:41:31AM +0800, Lai Jiangshan wrote:
>> +static struct worker_pool *lock_pool_executing_work(struct work_struct *work,
>> +						    struct worker **worker)
>> +{
>> +	unsigned long pool_id = offq_work_pool_id(work);
>> +	struct worker_pool *pool;
>> +	struct worker *exec;
>> +
>> +	if (pool_id == WORK_OFFQ_POOL_NONE)
>> +		return NULL;
>> +
>> +	pool = worker_pool_by_id(pool_id);
>> +	if (!pool)
>> +		return NULL;
>> +
>> +	spin_lock(&pool->lock);
>> +	exec = find_worker_executing_work(pool, work);
>> +	if (exec) {
>> +		BUG_ON(pool != exec->pool);
>> +		*worker = exec;
>> +		return pool;
>> +	}
>> +	spin_unlock(&pool->lock);
>> +
>> +	return NULL;
>> +}
>
> So, if a work item is queued on the same CPU and it isn't being
> executed, it will lock, look up the hash, unlock and then lock again?
> If this is something improved by later patch, please explain so.
> There gotta be a better way to do this, right?
>

The caller can't call this function if the work is on a queue;
the caller should call it only when the CWQ bit is not set.

It is commented "lock the pool a given offq work is running on";
sorry, the comment is too short.
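
For example, lock_pool_own_work() from patch 10 already guards on the CWQ
bit before calling it (a sketch of the relevant lines from that patch):

	if (data & WORK_STRUCT_CWQ) {
		cwq = (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
		return lock_pool_queued_work(work, cwq);
	} else if (worker) {
		/* offq: work->data carries a pool id, not a cwq pointer */
		pool_id = data >> WORK_OFFQ_POOL_SHIFT;
		return lock_pool_executing_work(work, pool_id, worker);
	}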


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-02-04 21:28   ` Tejun Heo
@ 2013-02-05 15:00     ` Lai Jiangshan
  2013-02-05 16:45       ` Tejun Heo
  2013-02-05 15:06     ` Lai Jiangshan
  1 sibling, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 15:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On 2013/2/5 5:28, Tejun Heo wrote:
> Hello, again.
>
> On Fri, Feb 01, 2013 at 02:41:26AM +0800, Lai Jiangshan wrote:
>> @@ -110,6 +110,7 @@ struct delayed_work {
>>   	struct work_struct work;
>>   	struct timer_list timer;
>>   	int cpu;
>> +	struct workqueue_struct *wq;
>
> Can't we just replace delayed_work->cpu with delayed_work->cwq?  That
> way, we don't enlarge delayed_work while encoding both wq and cpu in
> delayed_work proper.
>
> Thanks.
>

The @cpu can be WORK_CPU_UNBOUND, and the current code delays deciding
on which cpu this work should be queued until the timer fires;
I didn't want to change this behavior when I wrote the patch.

queue_delayed_work_on(WORK_CPU_UNBOUND, system_wq, dwork, delay);

But there is no "get_cwq(WORK_CPU_UNBOUND, system_wq)".
So we either have to record the @wq in dwork, or determine the queueing cpu
earlier, which changes the current behavior.  I'm OK with this change
if you want.

Thanks,
Lai

PS, again, @cpu and @wq can be saved/encoded in dwork.work.entry if needed.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-02-04 21:28   ` Tejun Heo
  2013-02-05 15:00     ` Lai Jiangshan
@ 2013-02-05 15:06     ` Lai Jiangshan
  1 sibling, 0 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 15:06 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On 2013/2/5 5:28, Tejun Heo wrote:
> Hello, again.
>
> On Fri, Feb 01, 2013 at 02:41:26AM +0800, Lai Jiangshan wrote:
>> @@ -110,6 +110,7 @@ struct delayed_work {
>>   	struct work_struct work;
>>   	struct timer_list timer;
>>   	int cpu;
>> +	struct workqueue_struct *wq;
>
> Can't we just replace delayed_work->cpu with delayed_work->cwq?  That
> way, we don't enlarge delayed_work while encoding both wq and cpu in
> delayed_work proper.
>
> Thanks.
>

The @cpu can be WORK_CPU_UNBOUND, and the current code delays deciding
on which cpu this work should be queued until the timer fires;
I didn't want to change this behavior when I wrote the patch.

queue_delayed_work_on(WORK_CPU_UNBOUND, system_wq, dwork, delay);

But there is no "get_cwq(WORK_CPU_UNBOUND, system_wq)".
So we either have to record the @wq in dwork, or determine the queueing cpu
earlier, which changes the current behavior.  I'm OK with this change.

Thanks,
Lai

PS, again, @cpu and @wq can be saved in dwork.work.entry if needed.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-02-04 22:12   ` Tejun Heo
@ 2013-02-05 15:18     ` Lai Jiangshan
  0 siblings, 0 replies; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 15:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Tue, Feb 5, 2013 at 6:12 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> One more thing.
>
> On Fri, Feb 01, 2013 at 02:41:36AM +0800, Lai Jiangshan wrote:
>> @@ -1216,7 +1198,11 @@ static int try_to_grab_pending(struct work_struct *work, bool is_dwork,
>>               list_del_init(&work->entry);
>>               cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
>>
>> -             clear_work_cwq(work, pool->id);
>> +             /* Does the work is still running? */
>> +             worker = find_worker_executing_work(pool, work);
>> +             worker_gwid = worker ? worker->gwid: WORK_OFFQ_WORKER_NONE;
>> +             clear_work_cwq(work, worker_gwid);
>
> Any chance we can remove the busy_hash?  Having to keep it around
> isn't a big deal but it would be nice if can get rid of it.

It is very possible, but it needs a different trade-off,
and I didn't plan to implement it in this series.
I will discuss it once this series has settled down.

Thanks,
lai

>
> Thanks.
>
> --
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/13] workqueue: enhance locking and record global worker id for work data
  2013-02-04 21:04 ` [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Tejun Heo
@ 2013-02-05 16:19   ` Lai Jiangshan
  2013-02-05 16:50     ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-05 16:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Tue, Feb 5, 2013 at 5:04 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello, Lai.
>
> Generally, I *really* like where you're headed but like before it's a
> bit difficult for me to apply the patches as-is.  Please read on.
>
> On Fri, Feb 01, 2013 at 02:41:23AM +0800, Lai Jiangshan wrote:
>> Better Locking:
>> mainly based on *mb() which is the most dangerous code and bad for readability.
>> This series change the usage of CWQ bit and makes these code simpler.
>>       --PATCH 3,4,5
>
> Yeah, that's one ugly piece of memory barrier magic which has been
> around forever.  I never bothered with it as it was fairly localized
> and not broken.  I *do* like removing it.  A bit on the fence about
> adding another field to delayed_work tho.  The @cpu addition was about
> correctness but this one doesn't really buy us anything other than
> cleaner code.  Folding the wq field into work_struct would be ugly,
> right?  Hmmm....
>
>> We have get_work_pool(), but it requires the caller do the later check and locking,
>> we replace it which 3 better internal locking API. 1) More proper API and
>> 2) merge the duplicated code and 3) simplify the caller.
>>       --PATCH 8,9,10
>
> This mostly leads up to gwid change, right?
>
>> get_work_pool()/get_work_pool_id() are called everywhere, something they are
>> overkill(called idr_find() unneeded) and indirectly(caller knows it is onq or not),
>> we replace them with get_work_cwq()/offq_work_pool_id()/locking APIs.
>>       --PATCH 3,4,5,6,8,9,10
>
> Can't we just make get_work_pool_id() do a fast path if OFFQ than
> requiring the user to distinguish off and on queue cases?

In the old code, get_work_pool_id() is only called when the work is offq.
After the series is applied, offq_work_worker_id() *must* be called only when offq,
and we can't offer a get_work_worker_id().

So removing get_work_pool_id() and using offq_work_pool_id() instead
is a preparation step.
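
In other words, while a work item is queued its data word holds the cwq
pointer and carries no worker id at all, so a generic getter could only
look like this (a sketch; get_work_worker_gwid() is a hypothetical name,
the other names are from the series):

	static int get_work_worker_gwid(struct work_struct *work)
	{
		unsigned long data = atomic_long_read(&work->data);

		if (data & WORK_STRUCT_CWQ)
			return WORK_OFFQ_WORKER_NONE;	/* queued: no gwid recorded */
		return data >> WORK_OFFQ_WORKER_SHIFT;
	}

which buys nothing over calling the offq-only helper in the offq case.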



>
>> Safely/one-step searching and worker id:
>> ----------------------------------------
>>
>> We are planing to add non-std worker_pool, but old get_work_pool() or new
>> lock_pool_executing_work() was not prepared for this plan, idr_find(pool_id)
>> is unsafe when we introduce free-able non-std worker_pool. Although we can
>> fix it by adding rcu to worker_pool. but "recording global worker id for
>> work data and adding rcu to worker" is another way and more straight forward.
>> We implement the later one,  Now, lock_pool_executing_work() is ready for this plan.
>>       --PATCH 12,13
>>
>> When every time we need to find out the running worker from a work,
>> we need two searches: search work_pool from work's data, and search worker
>> from hash. We record global worker id for work data and we only need one search.
>>       --PATCH 13
>
> While I'm a bit worried about capping total number of workers by the
> amount bits left in work->data, if that doesn't cause any practical
> issue (how many do we have available on 32bit?), I think this is the
> better approach.  We couldn't do this before because work -> worker
> relationship could be 1:N but it should now be doable.  Note that we
> need RCU no matter what we index (pool or worker) to avoid locking on
> each lookup.

BUILD_BUG_ON((BITS_PER_LONG != 64) && (WORK_OFFQ_WORKER_SHIFT > 12));

Every worker needs at least 4k of memory for its stack, so the bits are enough
as long as WORK_OFFQ_WORKER_SHIFT <= 12.
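
Back-of-the-envelope for the 32-bit worst case, assuming
WORK_OFFQ_WORKER_SHIFT == 12 (the bound the BUILD_BUG_ON checks):

	bits left for the gwid:	BITS_PER_LONG - WORK_OFFQ_WORKER_SHIFT = 32 - 12 = 20
	gwid values available:	2^20
	worker count bound:	2^32 bytes of address space / 2^12 bytes of stack = 2^20

so the gwid can never outgrow the bits left in work->data.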

>
> So, I like both major changes made by the patchset and most changes
> seem correct, well at least on casual review that is.
>
> The problem is that I'm not very happy with the descriptions and
> comments (what's up with the weird /** formatting?).  At least for me,
> the patchset is quite difficult to follow.  I'm not sure whether it
> has actual organizational issues or the descriptions aren't detailed /
> clear enough yet.
>
> From past experience, I *think* it's gonna be a bit of struggle for
> both of us to get the series in a shape that I would find acceptable
> by reviewing and iterating, so I might just swallow it and regurgitate
> into a form that I like.  Hmm.... dunno.  Will think about it.
>

It is not a nightmare for me!  The work and the discussions will consume most
of my night, so there is no night time left for nightmares.

> Anyways, nice work.
>
I'm glad you like it.  My daughter was born about 3 months ago and I set
the workqueue work aside then.  I think it is time to pick up the old pending
patches.

Thanks,
Lai

>
> --
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-02-05 15:00     ` Lai Jiangshan
@ 2013-02-05 16:45       ` Tejun Heo
  2013-02-06 11:35         ` Lai Jiangshan
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-05 16:45 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hey, Lai.

On Tue, Feb 05, 2013 at 11:00:42PM +0800, Lai Jiangshan wrote:
> The @cpu can be WORK_CPU_UNBOUND, and the current code will delay
> to determine on which cpu should this work be queued until timeout,
> I didn't want to change this behavior when I wrote the patch.

Hmmm?  The current code determines the CPU on queue_delayed_work().

       	/*
	 * If we cannot get the last pool from @work directly,
	 * select the last CPU such that it avoids unnecessarily
	 * triggering non-reentrancy check in __queue_work().
	 */
	lcpu = cpu;
	if (pool)
		lcpu = pool->cpu;
	if (lcpu == WORK_CPU_UNBOUND)
		lcpu = raw_smp_processor_id();

@lcpu is never WORK_CPU_UNBOUND for bound workqueues and for unbound
workqueues we of course have cwq for WORK_CPU_UNBOUND.

> PS, again, @cpu and @wq can be saved/encoded in dwork.work.entry if needed.

If possible, I'd much prefer to keep it inside delayed_work proper.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 00/13] workqueue: enhance locking and record global worker id for work data
  2013-02-05 16:19   ` Lai Jiangshan
@ 2013-02-05 16:50     ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-05 16:50 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hello, Lai.

On Wed, Feb 06, 2013 at 12:19:26AM +0800, Lai Jiangshan wrote:
> > Can't we just make get_work_pool_id() do a fast path if OFFQ than
> > requiring the user to distinguish off and on queue cases?
> 
> old code, get_work_pool_id() is only called when offq.
> after series applied, offq_work_worker_id() *must* be called only when offq,
> and we can't offer get_work_worker_id().

Can you please elaborate why we can't offer get_work_worker_id()?

> > While I'm a bit worried about capping total number of workers by the
> > amount bits left in work->data, if that doesn't cause any practical
> > issue (how many do we have available on 32bit?), I think this is the
> > better approach.  We couldn't do this before because work -> worker
> > relationship could be 1:N but it should now be doable.  Note that we
> > need RCU no matter what we index (pool or worker) to avoid locking on
> > each lookup.
> 
> BUILD_BUG_ON((BITS_PER_LONG != 64) && (WORK_OFFQ_WORKER_SHIFT > 12));
> 
> Every worker needs at least 4k memory for its stack, the bits are enough if
> WORK_OFFQ_WORKER_SHIFT <= 12.

Ah, so that was what that comment was about, so we always have enough
bits.  I have to say the comment is a bit cryptic.

> > From past experience, I *think* it's gonna be a bit of struggle for
> > both of us to get the series in a shape that I would find acceptable
> > by reviewing and iterating, so I might just swallow it and regurgitate
> > into a form that I like.  Hmm.... dunno.  Will think about it.
> 
> It is not nightmare for me! the work and discusses will consume most
> time of my night, no night time for nightmare.

Heh, yeah, I didn't say it was a nightmare.  I do like your work.  I
think most problems are from communication (including description and
comments).  I guess I can help along for a while.

> > Anyways, nice work.
> >
> I'm glad you like it. My daughter was born about 3month ago and I left
> workqueue work then. I think it is time to pick up old pending
> patches.

Big congratulations.  :)

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/13] workqueue: fix work_busy()
  2013-02-05 12:06     ` Lai Jiangshan
@ 2013-02-05 18:53       ` Tejun Heo
  2013-02-06 11:42         ` Lai Jiangshan
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-05 18:53 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hello,

On Tue, Feb 05, 2013 at 08:06:54PM +0800, Lai Jiangshan wrote:
> >>@@ -3453,15 +3451,13 @@ unsigned int work_busy(struct work_struct *work)
> >>  {
> >>  	struct worker_pool *pool = get_work_pool(work);
> >>  	unsigned long flags;
> >>-	unsigned int ret = 0;
> >>+	unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;
> >
> >I'd prefer this as a if() statement.
> >
> >>  	if (!pool)
> >>-		return 0;
> >>+		return ret;
> >
> >I'm a bit confused.  When can we be pending w/o pool?
> >
> 
> grab the pending bits <==time==> really queued
> 			^
> 		this patch considers the work is busy in this time

Given the advisory nature of the function, why do we care?  Is it
needed for later patches?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 08/13] workqueue: add lock_pool_executing_work()
  2013-02-05 12:15     ` Lai Jiangshan
@ 2013-02-05 19:09       ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-05 19:09 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hey,

On Tue, Feb 05, 2013 at 08:15:18PM +0800, Lai Jiangshan wrote:
> >So, if a work item is queued on the same CPU and it isn't being
> >executed, it will lock, look up the hash, unlock and then lock again?
> >If this is something improved by later patch, please explain so.
> >There gotta be a better way to do this, right?
> >
> 
> The caller can't call this function if the work is on queue.
> the caller should call it only when CWQ bit is not set.

Yeah, but when it's off-queue but was queued on the same CPU the last
time, we would first lock, look up the hash, find out it's not
executing, unlock and then lock the same pool again, no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-02-05 16:45       ` Tejun Heo
@ 2013-02-06 11:35         ` Lai Jiangshan
  2013-02-06 14:18           ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-06 11:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Wed, Feb 6, 2013 at 12:45 AM, Tejun Heo <tj@kernel.org> wrote:
> Hey, Lai.
>
> On Tue, Feb 05, 2013 at 11:00:42PM +0800, Lai Jiangshan wrote:
>> The @cpu can be WORK_CPU_UNBOUND, and the current code will delay
>> to determine on which cpu should this work be queued until timeout,
>> I didn't want to change this behavior when I wrote the patch.
>
> Hmmm?  The current code determines the CPU on queue_delayed_work().
>
>         /*
>          * If we cannot get the last pool from @work directly,
>          * select the last CPU such that it avoids unnecessarily
>          * triggering non-reentrancy check in __queue_work().
>          */
>         lcpu = cpu;
>         if (pool)
>                 lcpu = pool->cpu;
>         if (lcpu == WORK_CPU_UNBOUND)
>                 lcpu = raw_smp_processor_id();
>
> @lcpu is never WORK_CPU_UNBOUND for bound workqueues and for unbound
> workqueues we of course have cwq for WORK_CPU_UNBOUND.


Are you sure?
@lcpu is only used when the current code tries to set the cwq in work->data
without breaking the non-reentrancy (nrt) logic.

void delayed_work_timer_fn(unsigned long __data)
{
	struct delayed_work *dwork = (struct delayed_work *)__data;
	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);

	/* should have been called from irqsafe timer with irq already off */
	__queue_work(dwork->cpu, cwq->wq, &dwork->work);
}

>
>> PS, again, @cpu and @wq can be saved/encoded in dwork.work.entry if needed.
>
> If possible, I'd much prefer to keep it inside delayed_work() proper.
>
> Thanks.
>
> --
> tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/13] workqueue: fix work_busy()
  2013-02-05 18:53       ` Tejun Heo
@ 2013-02-06 11:42         ` Lai Jiangshan
  2013-02-06 14:05           ` Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Lai Jiangshan @ 2013-02-06 11:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Wed, Feb 6, 2013 at 2:53 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Tue, Feb 05, 2013 at 08:06:54PM +0800, Lai Jiangshan wrote:
>> >>@@ -3453,15 +3451,13 @@ unsigned int work_busy(struct work_struct *work)
>> >>  {
>> >>    struct worker_pool *pool = get_work_pool(work);
>> >>    unsigned long flags;
>> >>-   unsigned int ret = 0;
>> >>+   unsigned int ret = work_pending(work) ? WORK_BUSY_PENDING : 0;
>> >
>> >I'd prefer this as a if() statement.
>> >
>> >>    if (!pool)
>> >>-           return 0;
>> >>+           return ret;
>> >
>> >I'm a bit confused.  When can we be pending w/o pool?
>> >
>>
>> grab the pending bits <==time==> really queued
>>                       ^
>>               this patch considers the work is busy in this time
>
> Given the advisory nature of the function, why do we care?  Is it
> needed for later patches?
>

Ah, yes.

Thanks,
Lai

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 02/13] workqueue: fix work_busy()
  2013-02-06 11:42         ` Lai Jiangshan
@ 2013-02-06 14:05           ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-06 14:05 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hey, Lai.

On Wed, Feb 6, 2013 at 3:42 AM, Lai Jiangshan <eag0628@gmail.com> wrote:
>> Given the advisory nature of the function, why do we care?  Is it
>> needed for later patches?
>
> Ah, yes.

Please elaborate.  Also, things like that - why the patch needs to
exist or what the benefits are - really should be in the patch
description.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool
  2013-02-06 11:35         ` Lai Jiangshan
@ 2013-02-06 14:18           ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-06 14:18 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hello, Lai.

On Wed, Feb 06, 2013 at 07:35:24PM +0800, Lai Jiangshan wrote:
> > @lcpu is never WORK_CPU_UNBOUND for bound workqueues and for unbound
> > workqueues we of course have cwq for WORK_CPU_UNBOUND.
> 
> 
> Are you sure?
> @lcpu is only used when it tries to set cwq to work data without
> breaking nrt logic.
> 
> void delayed_work_timer_fn(unsigned long __data)
> {
> 	struct delayed_work *dwork = (struct delayed_work *)__data;
> 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
> 
> 	/* should have been called from irqsafe timer with irq already off */
> 	__queue_work(dwork->cpu, cwq->wq, &dwork->work);
> }

Ah, you're right.  I was confused about ->cpu and cwq in ->data.
Yeah, let's just add ->wq as the original patch does.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH wq/for-3.9] workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END
  2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
  2013-02-04 19:49   ` Tejun Heo
@ 2013-02-06 22:34   ` Tejun Heo
  1 sibling, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-06 22:34 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Now that workqueue has moved away from gcwqs, workqueue no longer has
the need to have a CPU identifier indicating "no cpu associated" - we
now use WORK_OFFQ_POOL_NONE instead - and most uses of WORK_CPU_NONE
are gone.

The only left usage is as the end marker for for_each_*wq*()
iterators, where the name WORK_CPU_NONE is confusing w/o actual
WORK_CPU_NONE usages.  Similarly, WORK_CPU_LAST which equals
WORK_CPU_NONE no longer makes sense.

Replace both WORK_CPU_NONE and LAST with WORK_CPU_END.  This patch
doesn't introduce any functional difference.

tj: s/WORK_CPU_LAST/WORK_CPU_END/ and rewrote description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello, Lai.

I massaged the patch and am committing this to wq/for-3.9.  I think
I'm gonna go through the series, massage other patches too and apply
them, mostly because I wanna make progress on custom pool
implementation and it seems like iterating this series the normal way
would take quite some time.

Thanks!

 include/linux/workqueue.h |    3 +--
 kernel/workqueue.c        |   10 +++++-----
 2 files changed, 6 insertions(+), 7 deletions(-)

--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -57,8 +57,7 @@ enum {
 
 	/* special cpu IDs */
 	WORK_CPU_UNBOUND	= NR_CPUS,
-	WORK_CPU_NONE		= NR_CPUS + 1,
-	WORK_CPU_LAST		= WORK_CPU_NONE,
+	WORK_CPU_END		= NR_CPUS + 1,
 
 	/*
 	 * Reserve 7 bits off of cwq pointer w/ debugobjects turned
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -258,7 +258,7 @@ static inline int __next_wq_cpu(int cpu,
 		if (sw & 2)
 			return WORK_CPU_UNBOUND;
 	}
-	return WORK_CPU_NONE;
+	return WORK_CPU_END;
 }
 
 static inline int __next_cwq_cpu(int cpu, const struct cpumask *mask,
@@ -282,17 +282,17 @@ static inline int __next_cwq_cpu(int cpu
  */
 #define for_each_wq_cpu(cpu)						\
 	for ((cpu) = __next_wq_cpu(-1, cpu_possible_mask, 3);		\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_END;					\
 	     (cpu) = __next_wq_cpu((cpu), cpu_possible_mask, 3))
 
 #define for_each_online_wq_cpu(cpu)					\
 	for ((cpu) = __next_wq_cpu(-1, cpu_online_mask, 3);		\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_END;					\
 	     (cpu) = __next_wq_cpu((cpu), cpu_online_mask, 3))
 
 #define for_each_cwq_cpu(cpu, wq)					\
 	for ((cpu) = __next_cwq_cpu(-1, cpu_possible_mask, (wq));	\
-	     (cpu) < WORK_CPU_NONE;					\
+	     (cpu) < WORK_CPU_END;					\
 	     (cpu) = __next_cwq_cpu((cpu), cpu_possible_mask, (wq)))
 
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -3796,7 +3796,7 @@ static int __init init_workqueues(void)
 
 	/* make sure we have enough bits for OFFQ pool ID */
 	BUILD_BUG_ON((1LU << (BITS_PER_LONG - WORK_OFFQ_POOL_SHIFT)) <
-		     WORK_CPU_LAST * NR_STD_WORKER_POOLS);
+		     WORK_CPU_END * NR_STD_WORKER_POOLS);
 
 	cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
 	hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH wq/for-3.9] workqueue: make work_busy() test WORK_STRUCT_PENDING first
  2013-01-31 18:41 ` [PATCH 02/13] workqueue: fix work_busy() Lai Jiangshan
  2013-02-04 19:54   ` Tejun Heo
@ 2013-02-06 23:02   ` Tejun Heo
  1 sibling, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-06 23:02 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Currently, work_busy() first tests whether the work has a pool
associated with it and if not, considers it idle.  This works fine
even for delayed_work.work queued on timer, as __queue_delayed_work()
sets cwq on delayed_work.work - a queued delayed_work always has its
cwq and thus pool associated with it.

However, we're about to update delayed_work queueing and this won't
hold.  Update work_busy() such that it tests WORK_STRUCT_PENDING
before the associated pool.  This doesn't make any noticeable behavior
difference now.

With the work_pending() test moved, the function reads a lot better with
the "if (!pool)" test flipped to positive.  Flip it.

While at it, lose the comment about now non-existent reentrant
workqueues.

tj: Reorganized the function and rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hey, Lai.

Did I get the reason right?  Can you *please* try to explain "why" a
patch is needed / beneficial for future postings like I did above?  I
get that writing in English could be a bit stressful (it's not my
mother tongue either and I struggled with it a lot and still do to a
certain extent) but the language doesn't have to be perfect.  You just
need to communicate the main points of your rationale somehow.

Thanks.

 kernel/workqueue.c |   16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3443,8 +3443,6 @@ EXPORT_SYMBOL_GPL(workqueue_congested);
  * Test whether @work is currently pending or running.  There is no
  * synchronization around this function and the test result is
  * unreliable and only useful as advisory hints or for debugging.
- * Especially for reentrant wqs, the pending state might hide the
- * running state.
  *
  * RETURNS:
  * OR'd bitmask of WORK_BUSY_* bits.
@@ -3455,17 +3453,15 @@ unsigned int work_busy(struct work_struc
 	unsigned long flags;
 	unsigned int ret = 0;
 
-	if (!pool)
-		return 0;
-
-	spin_lock_irqsave(&pool->lock, flags);
-
 	if (work_pending(work))
 		ret |= WORK_BUSY_PENDING;
-	if (find_worker_executing_work(pool, work))
-		ret |= WORK_BUSY_RUNNING;
 
-	spin_unlock_irqrestore(&pool->lock, flags);
+	if (pool) {
+		spin_lock_irqsave(&pool->lock, flags);
+		if (find_worker_executing_work(pool, work))
+			ret |= WORK_BUSY_RUNNING;
+		spin_unlock_irqrestore(&pool->lock, flags);
+	}
 
 	return ret;
 }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH wq/for-3.9 workqueue]: add delayed_work->wq to simplify reentrancy handling
  2013-01-31 18:41 ` [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool Lai Jiangshan
  2013-02-04 21:28   ` Tejun Heo
@ 2013-02-07  0:28   ` Tejun Heo
  1 sibling, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-07  0:28 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

To avoid executing the same work item from multiple CPUs concurrently,
a work_struct records the last pool it was on in its ->data so that,
on the next queueing, the pool can be queried to determine whether the
work item is still executing or not.

A delayed_work goes through timer before actually being queued on the
target workqueue and the timer needs to know the target workqueue and
CPU.  This is currently achieved by modifying delayed_work->work.data
such that it points to the cwq which points to the target workqueue
and the last CPU the work item was on.  __queue_delayed_work()
extracts the last CPU from delayed_work->work.data and then combines
it with the target workqueue to create new work.data.

The only thing this rather ugly hack achieves is encoding the target
workqueue into delayed_work->work.data without using a separate field,
which could be a trade off one can make; unfortunately, this entangles
work->data management between regular workqueue and delayed_work code
by setting cwq pointer before the work item is actually queued and
becomes a hindrance for further improvements of work->data handling.

This can be easily made sane by adding a target workqueue field to
delayed_work.  While delayed_work is used widely in the kernel and
this does make it a bit larger (<5%), I think this is the right
trade-off especially given the prospect of much saner handling of
work->data which currently involves quite tricky memory barrier
dancing, and I don't expect to see any measurable effect.

Add delayed_work->wq and drop the delayed_work->work.data overloading.

tj: Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |    3 +++
 kernel/workqueue.c        |   32 +++-----------------------------
 2 files changed, 6 insertions(+), 29 deletions(-)

--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -109,6 +109,9 @@ struct work_struct {
 struct delayed_work {
 	struct work_struct work;
 	struct timer_list timer;
+
+	/* target workqueue and CPU ->timer uses to queue ->work */
+	struct workqueue_struct *wq;
 	int cpu;
 };
 
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1339,10 +1339,9 @@ EXPORT_SYMBOL_GPL(queue_work);
 void delayed_work_timer_fn(unsigned long __data)
 {
 	struct delayed_work *dwork = (struct delayed_work *)__data;
-	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
 	/* should have been called from irqsafe timer with irq already off */
-	__queue_work(dwork->cpu, cwq->wq, &dwork->work);
+	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
 }
 EXPORT_SYMBOL_GPL(delayed_work_timer_fn);
 
@@ -1351,7 +1350,6 @@ static void __queue_delayed_work(int cpu
 {
 	struct timer_list *timer = &dwork->timer;
 	struct work_struct *work = &dwork->work;
-	unsigned int lcpu;
 
 	WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
 		     timer->data != (unsigned long)dwork);
@@ -1371,30 +1369,7 @@ static void __queue_delayed_work(int cpu
 
 	timer_stats_timer_set_start_info(&dwork->timer);
 
-	/*
-	 * This stores cwq for the moment, for the timer_fn.  Note that the
-	 * work's pool is preserved to allow reentrance detection for
-	 * delayed works.
-	 */
-	if (!(wq->flags & WQ_UNBOUND)) {
-		struct worker_pool *pool = get_work_pool(work);
-
-		/*
-		 * If we cannot get the last pool from @work directly,
-		 * select the last CPU such that it avoids unnecessarily
-		 * triggering non-reentrancy check in __queue_work().
-		 */
-		lcpu = cpu;
-		if (pool)
-			lcpu = pool->cpu;
-		if (lcpu == WORK_CPU_UNBOUND)
-			lcpu = raw_smp_processor_id();
-	} else {
-		lcpu = WORK_CPU_UNBOUND;
-	}
-
-	set_work_cwq(work, get_cwq(lcpu, wq), 0);
-
+	dwork->wq = wq;
 	dwork->cpu = cpu;
 	timer->expires = jiffies + delay;
 
@@ -2944,8 +2919,7 @@ bool flush_delayed_work(struct delayed_w
 {
 	local_irq_disable();
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(dwork->cpu,
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+		__queue_work(dwork->cpu, dwork->wq, &dwork->work);
 	local_irq_enable();
 	return flush_work(&dwork->work);
 }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH wq/for-3.9] workqueue: make work->data point to pool after try_to_grab_pending()
  2013-01-31 18:41 ` [PATCH 04/13] workqueue: clear cwq when cancel the work Lai Jiangshan
@ 2013-02-07  0:53   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-07  0:53 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

We plan to use work->data pointing to cwq as the synchronization
invariant when determining whether a given work item is on a locked
pool or not, which requires work->data pointing to cwq only while the
work item is queued on the associated pool.

With delayed_work updated not to overload work->data for target
workqueue recording, the only case where we still have off-queue
work->data pointing to cwq is try_to_grab_pending() which doesn't
update work->data after stealing a queued work item.  There's no
reason for try_to_grab_pending() to not update work->data to point to
the pool instead of cwq, like the normal execution does.

This patch adds set_work_pool_and_keep_pending() which makes
work->data point to pool instead of cwq but keeps the pending bit
unlike set_work_pool_and_clear_pending() (surprise!).

After this patch, it's guaranteed that only queued work items point to
cwqs.

This patch doesn't introduce any visible behavior change.

tj: Renamed the new helper function to match
    set_work_pool_and_clear_pending() and rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -556,6 +556,13 @@ static void set_work_cwq(struct work_str
 		      WORK_STRUCT_PENDING | WORK_STRUCT_CWQ | extra_flags);
 }
 
+static void set_work_pool_and_keep_pending(struct work_struct *work,
+					   int pool_id)
+{
+	set_work_data(work, (unsigned long)pool_id << WORK_OFFQ_POOL_SHIFT,
+		      WORK_STRUCT_PENDING);
+}
+
 static void set_work_pool_and_clear_pending(struct work_struct *work,
 					    int pool_id)
 {
@@ -1115,6 +1122,9 @@ static int try_to_grab_pending(struct wo
 			cwq_dec_nr_in_flight(get_work_cwq(work),
 				get_work_color(work));
 
+			/* work->data points to cwq iff queued, point to pool */
+			set_work_pool_and_keep_pending(work, pool->id);
+
 			spin_unlock(&pool->lock);
 			return 1;
 		}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH wq/for-3.9] workqueue: simplify is-work-item-queued-here test
  2013-01-31 18:41 ` [PATCH 05/13] workqueue: change queued detection and remove *mb()s Lai Jiangshan
@ 2013-02-07  1:52   ` Tejun Heo
  2013-02-07  2:03     ` [PATCH wq/for-3.9] workqueue: cosmetic update in try_to_grab_pending() Tejun Heo
  0 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-07  1:52 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Currently, determining whether a work item is queued on a locked pool
involves somewhat convoluted memory barrier dancing.  It goes like the
following.

* When a work item is queued on a pool, work->data is updated before
  work->entry is linked to the pending list with a wmb() inbetween.

* When trying to determine whether a work item is currently queued on
  a pool pointed to by work->data, it locks the pool and looks at
  work->entry.  If work->entry is linked, we then do rmb() and then
  check whether work->data points to the current pool.

This works because, work->data can only point to a pool if it
currently is or were on the pool and,

* If it currently is on the pool, the tests would obviously succeed.

* If it left the pool, its work->entry was cleared under pool->lock,
  so if we're seeing non-empty work->entry, it has to be from the work
  item being linked on another pool.  Because work->data is updated
  before work->entry is linked with wmb() inbetween, work->data update
  from another pool is guaranteed to be visible if we do rmb() after
  seeing non-empty work->entry.  So, we either see empty work->entry
  or we see updated work->data pointing to another pool.

While this works, it's convoluted, to put it mildly.  With recent
updates, it's now guaranteed that work->data points to cwq only while
the work item is queued and that updating work->data to point to cwq
or back to pool is done under pool->lock, so we can simply test
whether work->data points to cwq which is associated with the
currently locked pool instead of the convoluted memory barrier
dancing.

This patch replaces the memory barrier based "are you still here,
really?" test with much simpler "does work->data points to me?" test -
if work->data points to a cwq which is associated with the currently
locked pool, the work item is guaranteed to be queued on the pool as
work->data can start and stop pointing to such cwq only under
pool->lock and the start and stop coincide with queue and dequeue.

tj: Rewrote the comments and description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1068,6 +1068,7 @@ static int try_to_grab_pending(struct wo
 			       unsigned long *flags)
 {
 	struct worker_pool *pool;
+	struct cpu_workqueue_struct *cwq;
 
 	local_irq_save(*flags);
 
@@ -1097,14 +1098,17 @@ static int try_to_grab_pending(struct wo
 		goto fail;
 
 	spin_lock(&pool->lock);
-	if (!list_empty(&work->entry)) {
-		/*
-		 * This work is queued, but perhaps we locked the wrong
-		 * pool.  In that case we must see the new value after
-		 * rmb(), see insert_work()->wmb().
-		 */
-		smp_rmb();
-		if (pool == get_work_pool(work)) {
+	/*
+	 * work->data is guaranteed to point to cwq only while the work
+	 * item is queued on cwq->wq, and both updating work->data to point
+	 * to cwq on queueing and to pool on dequeueing are done under
+	 * cwq->pool->lock.  This in turn guarantees that, if work->data
+	 * points to cwq which is associated with a locked pool, the work
+	 * item is currently queued on that pool.
+	 */
+	cwq = get_work_cwq(work);
+	if (cwq) {
+		if (cwq->pool == pool) {
 			debug_work_deactivate(work);
 
 			/*
@@ -1159,13 +1163,6 @@ static void insert_work(struct cpu_workq
 
 	/* we own @work, set data and link */
 	set_work_cwq(work, cwq, extra_flags);
-
-	/*
-	 * Ensure that we get the right work->data if we see the
-	 * result of list_add() below, see try_to_grab_pending().
-	 */
-	smp_wmb();
-
 	list_add_tail(&work->entry, head);
 
 	/*
@@ -2799,15 +2796,10 @@ static bool start_flush_work(struct work
 		return false;
 
 	spin_lock_irq(&pool->lock);
-	if (!list_empty(&work->entry)) {
-		/*
-		 * See the comment near try_to_grab_pending()->smp_rmb().
-		 * If it was re-queued to a different pool under us, we
-		 * are not going to wait.
-		 */
-		smp_rmb();
-		cwq = get_work_cwq(work);
-		if (unlikely(!cwq || pool != cwq->pool))
+	/* see the comment in try_to_grab_pending() with the same code */
+	cwq = get_work_cwq(work);
+	if (cwq) {
+		if (unlikely(cwq->pool != pool))
 			goto already_gone;
 	} else {
 		worker = find_worker_executing_work(pool, work);

* [PATCH wq/for-3.9] workqueue: cosmetic update in try_to_grab_pending()
  2013-02-07  1:52   ` [PATCH wq/for-3.9] workqueue: simplify is-work-item-queued-here test Tejun Heo
@ 2013-02-07  2:03     ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-07  2:03 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

With the recent is-work-queued-here test simplification, the nested
if() in try_to_grab_pending() can be collapsed.  Collapse it.

This patch is purely cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
---
A follow-up cleanup patch.

Thanks.

 kernel/workqueue.c |   38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1107,31 +1107,27 @@ static int try_to_grab_pending(struct wo
 	 * item is currently queued on that pool.
 	 */
 	cwq = get_work_cwq(work);
-	if (cwq) {
-		if (cwq->pool == pool) {
-			debug_work_deactivate(work);
+	if (cwq && cwq->pool == pool) {
+		debug_work_deactivate(work);
 
-			/*
-			 * A delayed work item cannot be grabbed directly
-			 * because it might have linked NO_COLOR work items
-			 * which, if left on the delayed_list, will confuse
-			 * cwq->nr_active management later on and cause
-			 * stall.  Make sure the work item is activated
-			 * before grabbing.
-			 */
-			if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
-				cwq_activate_delayed_work(work);
+		/*
+		 * A delayed work item cannot be grabbed directly because
+		 * it might have linked NO_COLOR work items which, if left
+		 * on the delayed_list, will confuse cwq->nr_active
+		 * management later on and cause stall.  Make sure the work
+		 * item is activated before grabbing.
+		 */
+		if (*work_data_bits(work) & WORK_STRUCT_DELAYED)
+			cwq_activate_delayed_work(work);
 
-			list_del_init(&work->entry);
-			cwq_dec_nr_in_flight(get_work_cwq(work),
-				get_work_color(work));
+		list_del_init(&work->entry);
+		cwq_dec_nr_in_flight(get_work_cwq(work), get_work_color(work));
 
-			/* work->data points to cwq iff queued, point to pool */
-			set_work_pool_and_keep_pending(work, pool->id);
+		/* work->data points to cwq iff queued, point to pool */
+		set_work_pool_and_keep_pending(work, pool->id);
 
-			spin_unlock(&pool->lock);
-			return 1;
-		}
+		spin_unlock(&pool->lock);
+		return 1;
 	}
 	spin_unlock(&pool->lock);
 fail:

* [PATCH wq/for-3.9] workqueue: make get_work_pool_id() cheaper
  2013-01-31 18:41 ` [PATCH 06/13] workqueue: get pool id from work->data directly if it is offq Lai Jiangshan
@ 2013-02-07 20:16   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-07 20:16 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

get_work_pool_id() currently first obtains the pool using get_work_pool()
and then returns pool->id.  For an off-queue work item, this involves
obtaining the pool ID from work->data, performing idr_find() to find the
matching pool and then returning its pool->id, which of course is the
same as the one which went into idr_find().

Just open code the WORK_STRUCT_CWQ case and directly return the pool ID
from work->data.

tj: The original patch dropped on-queue work item handling and renamed
    the function to offq_work_pool_id().  There isn't much benefit in
    doing so.  Handling it only requires a single if(), and we'd need at
    least a BUG_ON(), which is also a branch, even if we dropped on-queue
    handling.  Open code the WORK_STRUCT_CWQ case and keep the function
    in line with get_work_pool().  Rewrote the description.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -626,9 +626,13 @@ static struct worker_pool *get_work_pool
  */
 static int get_work_pool_id(struct work_struct *work)
 {
-	struct worker_pool *pool = get_work_pool(work);
+	unsigned long data = atomic_long_read(&work->data);
 
-	return pool ? pool->id : WORK_OFFQ_POOL_NONE;
+	if (data & WORK_STRUCT_CWQ)
+		return ((struct cpu_workqueue_struct *)
+			(data & WORK_STRUCT_WQ_DATA_MASK))->pool->id;
+
+	return data >> WORK_OFFQ_POOL_SHIFT;
 }
 
 static void mark_work_canceling(struct work_struct *work)

* [PATCH wq/for-3.9] workqueue: pick cwq instead of pool in __queue_work()
  2013-01-31 18:41 ` [PATCH 07/13] workqueue: get pool from wq/cwq Lai Jiangshan
@ 2013-02-07 21:13   ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-07 21:13 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Currently, __queue_work() chooses the pool to queue a work item to and
then determines cwq from the target wq and the chosen pool.  This is a
bit backwards in that we can determine cwq first and simply use
cwq->pool.  This way, we can skip get_std_worker_pool() in the queueing
path, which will be a hurdle when implementing custom worker pools.

Update __queue_work() such that it chooses the target cwq and then uses
cwq->pool instead of the other way around.  While at it, add the missing
{} in an if statement.

This patch doesn't introduce any functional changes.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1193,8 +1193,6 @@ static bool is_chained_work(struct workq
 static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 			 struct work_struct *work)
 {
-	bool highpri = wq->flags & WQ_HIGHPRI;
-	struct worker_pool *pool;
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
 	unsigned int work_flags;
@@ -1215,7 +1213,7 @@ static void __queue_work(unsigned int cp
 	    WARN_ON_ONCE(!is_chained_work(wq)))
 		return;
 
-	/* determine pool to use */
+	/* determine the cwq to use */
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct worker_pool *last_pool;
 
@@ -1228,37 +1226,36 @@ static void __queue_work(unsigned int cp
 		 * work needs to be queued on that cpu to guarantee
 		 * non-reentrancy.
 		 */
-		pool = get_std_worker_pool(cpu, highpri);
+		cwq = get_cwq(cpu, wq);
 		last_pool = get_work_pool(work);
 
-		if (last_pool && last_pool != pool) {
+		if (last_pool && last_pool != cwq->pool) {
 			struct worker *worker;
 
 			spin_lock(&last_pool->lock);
 
 			worker = find_worker_executing_work(last_pool, work);
 
-			if (worker && worker->current_cwq->wq == wq)
-				pool = last_pool;
-			else {
+			if (worker && worker->current_cwq->wq == wq) {
+				cwq = get_cwq(last_pool->cpu, wq);
+			} else {
 				/* meh... not running there, queue here */
 				spin_unlock(&last_pool->lock);
-				spin_lock(&pool->lock);
+				spin_lock(&cwq->pool->lock);
 			}
 		} else {
-			spin_lock(&pool->lock);
+			spin_lock(&cwq->pool->lock);
 		}
 	} else {
-		pool = get_std_worker_pool(WORK_CPU_UNBOUND, highpri);
-		spin_lock(&pool->lock);
+		cwq = get_cwq(WORK_CPU_UNBOUND, wq);
+		spin_lock(&cwq->pool->lock);
 	}
 
-	/* pool determined, get cwq and queue */
-	cwq = get_cwq(pool->cpu, wq);
+	/* cwq determined, queue */
 	trace_workqueue_queue_work(req_cpu, cwq, work);
 
 	if (WARN_ON(!list_empty(&work->entry))) {
-		spin_unlock(&pool->lock);
+		spin_unlock(&cwq->pool->lock);
 		return;
 	}
 
@@ -1276,7 +1273,7 @@ static void __queue_work(unsigned int cp
 
 	insert_work(cwq, work, worklist, work_flags);
 
-	spin_unlock(&pool->lock);
+	spin_unlock(&cwq->pool->lock);
 }
 
 /**

* Re: [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
  2013-02-04 21:40   ` Tejun Heo
  2013-02-04 22:12   ` Tejun Heo
@ 2013-02-07 22:02   ` Tejun Heo
  2013-02-13 22:23     ` Tejun Heo
  2 siblings, 1 reply; 48+ messages in thread
From: Tejun Heo @ 2013-02-07 22:02 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hey, again.

On Fri, Feb 01, 2013 at 02:41:36AM +0800, Lai Jiangshan wrote:
> Every time we need to find the executing worker for a work item, we
> need two steps:
> 	find the pool from the idr by pool id.
> 	find the worker from the pool's hash table.
> 
> Now we merge them into one step: find the worker directly from the idr
> by its worker gwid (in lock_pool_executing_work(); if the work is
> on-queue, we still use the hash table).
> 
> This makes the code more straightforward.
> 
> In the future, we may add a percpu worker gwid <--> worker mapping
> cache if needed.
> 
> We are also planning to add non-std worker_pools.  We still don't know
> how to implement worker_pool_by_id() for a non-std worker_pool; this
> patch solves that.
> 
> This patch slows down the very-slow-path destroy_worker().  If that is
> blamed, we will move the synchronize_rcu() out.
> 
> This patch adds a small overhead (rcu_read_[un]lock) in the fast path
> lock_pool_executing_work().  If that is blamed, we will use rcu_sched
> instead.  (local_irq_disable() implies rcu_read_lock_sched(), so we
> could remove this overhead.)
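
Just to recap what's being proposed, in condensed form (a sketch; the
helper names on the gwid side are placeholders, not the actual ones
from the series):

	/* current: two lookups to get from a work item to its worker */
	pool = worker_pool_by_id(get_work_pool_id(work));  /* idr_find() */
	worker = find_worker_executing_work(pool, work);   /* busy_hash lookup */

	/* proposed: one lookup, keyed by the global worker ID in work->data */
	worker = worker_by_gwid(get_work_gwid(work));      /* single idr_find() */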

So, I've been looking at the last six patches and am not really sure
whether I want this.  At least not in the current form.  We end up
locking, unlocking and then locking again the same thing in the hot
path.  Having a proper locking API is all nice and good *if* it
actually can support the use case at hand.  The proposed one can't.

Also, if the worker is RCU protected then the pool should be too
(otherwise you can't safely do worker->pool->lock), so it doesn't
really simplify that aspect either.
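
Roughly, the pattern that worries me (a sketch with made-up names;
worker_by_gwid() stands in for whatever the lookup ends up being
called):

	rcu_read_lock();
	worker = worker_by_gwid(gwid);	/* @worker itself is RCU protected */
	pool = worker->pool;		/* but nothing here pins @pool */
	spin_lock(&pool->lock);		/* unsafe if the pool can be freed */
	rcu_read_unlock();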

I do like work items pointing back to workers instead of pools, so I
think I'll try that differently.

Thanks.

-- 
tejun

* Re: [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue
  2013-02-07 22:02   ` Tejun Heo
@ 2013-02-13 22:23     ` Tejun Heo
  0 siblings, 0 replies; 48+ messages in thread
From: Tejun Heo @ 2013-02-13 22:23 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, again.

On Thu, Feb 07, 2013 at 02:02:27PM -0800, Tejun Heo wrote:
> I do like work items pointing back to workers instead of pools, so I
> think I'll try that differently.

I tried to convert work->data to point to the last worker and, having
tried it, I'm not too sure about it anymore.

* work->data points to the cwq while queued, which basically is the
  current pool + target workqueue.  Making work->data point to the
  worker while off queue pushes the two modes further apart, which is
  a bit uncomfortable.

* Also, as it currently stands, we can't remove busy_hash because
  on-queue work->data can't point to the last worker and thus we need
  to look up busy_hash before actually executing it.  If we're gonna
  have to keep busy_hash, we might as well use it consistently.

* With the pending idr changes, idr lookups become very efficient if
  the ID range is limited.  The range currently is 256, which should be
  enough to cover all the pools in most configurations.  The idr lookup
  essentially becomes "if (prefix matches) return hint[offset]" (roughly
  as sketched below), so I don't think removing that lookup would save
  anything measurable.  Replacing the idr + busy_hash lookup with a
  single idr last-worker lookup could still be nicer, I think, but
  probably not by much.
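
Roughly, the fast path those idr changes add looks like the following
(a from-memory sketch, not the exact code; field names may differ):

	void *idr_find(struct idr *idr, int id)
	{
		struct idr_layer *hint = rcu_dereference_raw(idr->hint);

		if (hint && (id & ~IDR_MASK) == hint->prefix)
			return rcu_dereference_raw(hint->ary[id & IDR_MASK]);

		return idr_find_slowpath(idr, id);
	}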

So, I'm not really seeing too much benefit in converting work->data to
point to the last worker.  If we can remove busy_hash, maybe, but
short of that, I think I'm gonna keep things the way they're now.

Thanks!

-- 
tejun


Thread overview: 48+ messages
2013-01-31 18:41 [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Lai Jiangshan
2013-01-31 18:41 ` [PATCH 01/13] workqueue: remove WORK_CPU_NONE Lai Jiangshan
2013-02-04 19:49   ` Tejun Heo
2013-02-05 12:03     ` Lai Jiangshan
2013-02-06 22:34   ` [PATCH wq/for-3.9] workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END Tejun Heo
2013-01-31 18:41 ` [PATCH 02/13] workqueue: fix work_busy() Lai Jiangshan
2013-02-04 19:54   ` Tejun Heo
2013-02-05 12:06     ` Lai Jiangshan
2013-02-05 18:53       ` Tejun Heo
2013-02-06 11:42         ` Lai Jiangshan
2013-02-06 14:05           ` Tejun Heo
2013-02-06 23:02   ` [PATCH wq/for-3.9] workqueue: make work_busy() test WORK_STRUCT_PENDING first Tejun Heo
2013-01-31 18:41 ` [PATCH 03/13] workqueue: don't set work cwq until we queued it on pool Lai Jiangshan
2013-02-04 21:28   ` Tejun Heo
2013-02-05 15:00     ` Lai Jiangshan
2013-02-05 16:45       ` Tejun Heo
2013-02-06 11:35         ` Lai Jiangshan
2013-02-06 14:18           ` Tejun Heo
2013-02-05 15:06     ` Lai Jiangshan
2013-02-07  0:28   ` [PATCH wq/for-3.9 workqueue]: add delayed_work->wq to simplify reentrancy handling Tejun Heo
2013-01-31 18:41 ` [PATCH 04/13] workqueue: clear cwq when cancel the work Lai Jiangshan
2013-02-07  0:53   ` [PATCH wq/for-3.9] workqueue: make work->data point to pool after try_to_grab_pending() Tejun Heo
2013-01-31 18:41 ` [PATCH 05/13] workqueue: change queued detection and remove *mb()s Lai Jiangshan
2013-02-07  1:52   ` [PATCH wq/for-3.9] workqueue: simplify is-work-item-queued-here test Tejun Heo
2013-02-07  2:03     ` [PATCH wq/for-3.9] workqueue: cosmetic update in try_to_grab_pending() Tejun Heo
2013-01-31 18:41 ` [PATCH 06/13] workqueue: get pool id from work->data directly if it is offq Lai Jiangshan
2013-02-07 20:16   ` [PATCH wq/for-3.9] workqueue: make get_work_pool_id() cheaper Tejun Heo
2013-01-31 18:41 ` [PATCH 07/13] workqueue: get pool from wq/cwq Lai Jiangshan
2013-02-07 21:13   ` [PATCH wq/for-3.9] workqueue: pick cwq instead of pool in __queue_work() Tejun Heo
2013-01-31 18:41 ` [PATCH 08/13] workqueue: add lock_pool_executing_work() Lai Jiangshan
2013-02-04 21:34   ` Tejun Heo
2013-02-05 12:15     ` Lai Jiangshan
2013-02-05 19:09       ` Tejun Heo
2013-01-31 18:41 ` [PATCH 09/13] workqueue: add lock_pool_queued_work() Lai Jiangshan
2013-01-31 18:41 ` [PATCH 10/13] workqueue: add lock_pool_own_work() and remove get_work_pool() Lai Jiangshan
2013-02-04 21:38   ` Tejun Heo
2013-01-31 18:41 ` [PATCH 11/13] workqueue: allow more work_pool id space Lai Jiangshan
2013-01-31 18:41 ` [PATCH 12/13] workqueue: add worker's global worker ID Lai Jiangshan
2013-02-04 21:39   ` Tejun Heo
2013-01-31 18:41 ` [PATCH 13/13] workqueue: record global worker ID instead of pool ID in work->data when off-queue Lai Jiangshan
2013-02-04 21:40   ` Tejun Heo
2013-02-04 22:12   ` Tejun Heo
2013-02-05 15:18     ` Lai Jiangshan
2013-02-07 22:02   ` Tejun Heo
2013-02-13 22:23     ` Tejun Heo
2013-02-04 21:04 ` [PATCH 00/13] workqueue: enhance locking and record global worker id for work data Tejun Heo
2013-02-05 16:19   ` Lai Jiangshan
2013-02-05 16:50     ` Tejun Heo
