linux-kernel.vger.kernel.org archive mirror
* [PATCH wq/for-3.6-fixes 1/3] workqueue: break out gcwq->lock locking from gcwq_claim/release_management_and_[un]lock()
@ 2012-09-06 20:06 Tejun Heo
  2012-09-06 20:07 ` [PATCH wq/for-3.6-fixes 2/3] workqueue: rename rebind_workers() to gcwq_associate() and let it handle locking and DISASSOCIATED clearing Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-06 20:06 UTC (permalink / raw)
  To: linux-kernel, Lai Jiangshan

From a6d1347ef1a08623b9881b1705ce0df6b213afb1 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 6 Sep 2012 12:50:40 -0700

Releasing management and unlocking gcwq->lock need to be done
separately for the scheduled fix of a subtle idle worker depletion
issue during CPU_ONLINE.  Break out gcwq->lock handling from these
functions.

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
This three patch series fixes the possible idle worker depletion bug
reported by Lai.  The first two patches are prep patches which don't
introduce any functional difference.  The third fixes the problem by
releasing manager_mutexes before releasing idle workers.

Thanks.

 kernel/workqueue.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dc7b845..63ede1f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3394,21 +3394,19 @@ EXPORT_SYMBOL_GPL(work_busy);
  */
 
 /* claim manager positions of all pools */
-static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
+static void gcwq_claim_management(struct global_cwq *gcwq)
 {
 	struct worker_pool *pool;
 
 	for_each_worker_pool(pool, gcwq)
 		mutex_lock_nested(&pool->manager_mutex, pool - gcwq->pools);
-	spin_lock_irq(&gcwq->lock);
 }
 
 /* release manager positions */
-static void gcwq_release_management_and_unlock(struct global_cwq *gcwq)
+static void gcwq_release_management(struct global_cwq *gcwq)
 {
 	struct worker_pool *pool;
 
-	spin_unlock_irq(&gcwq->lock);
 	for_each_worker_pool(pool, gcwq)
 		mutex_unlock(&pool->manager_mutex);
 }
@@ -3423,7 +3421,8 @@ static void gcwq_unbind_fn(struct work_struct *work)
 
 	BUG_ON(gcwq->cpu != smp_processor_id());
 
-	gcwq_claim_management_and_lock(gcwq);
+	gcwq_claim_management(gcwq);
+	spin_lock_irq(&gcwq->lock);
 
 	/*
 	 * We've claimed all manager positions.  Make all workers unbound
@@ -3440,7 +3439,8 @@ static void gcwq_unbind_fn(struct work_struct *work)
 
 	gcwq->flags |= GCWQ_DISASSOCIATED;
 
-	gcwq_release_management_and_unlock(gcwq);
+	spin_unlock_irq(&gcwq->lock);
+	gcwq_release_management(gcwq);
 
 	/*
 	 * Call schedule() so that we cross rq->lock and thus can guarantee
@@ -3496,10 +3496,12 @@ static int __devinit workqueue_cpu_up_callback(struct notifier_block *nfb,
 
 	case CPU_DOWN_FAILED:
 	case CPU_ONLINE:
-		gcwq_claim_management_and_lock(gcwq);
+		gcwq_claim_management(gcwq);
+		spin_lock_irq(&gcwq->lock);
 		gcwq->flags &= ~GCWQ_DISASSOCIATED;
 		rebind_workers(gcwq);
-		gcwq_release_management_and_unlock(gcwq);
+		spin_unlock_irq(&gcwq->lock);
+		gcwq_release_management(gcwq);
 		break;
 	}
 	return NOTIFY_OK;
-- 
1.7.7.3



* [PATCH wq/for-3.6-fixes 2/3] workqueue: rename rebind_workers() to gcwq_associate() and let it handle locking and DISASSOCIATED clearing
  2012-09-06 20:06 [PATCH wq/for-3.6-fixes 1/3] workqueue: break out gcwq->lock locking from gcwq_claim/release_management_and_[un]lock() Tejun Heo
@ 2012-09-06 20:07 ` Tejun Heo
  2012-09-06 20:08   ` [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-06 20:07 UTC (permalink / raw)
  To: linux-kernel, Lai Jiangshan

From 0150a0427dbcc9abbb2575911fa1d72d40451bf Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 6 Sep 2012 12:50:40 -0700

CPU_ONLINE used to handle locking and clearing of DISASSOCIATED and
rebind_workers() just the rebinding.  This patch renames the function
to gcwq_associate() and lets it handle the whole onlining.  This is for
the scheduled fix of a subtle idle worker depletion issue during
CPU_ONLINE.

Note that this removes the unnecessary relock at the end of
gcwq_associate().

This patch doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/workqueue.c |   29 ++++++++++++++---------------
 1 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 63ede1f..b19170b 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -480,6 +480,8 @@ static atomic_t unbound_pool_nr_running[NR_WORKER_POOLS] = {
 };
 
 static int worker_thread(void *__worker);
+static void gcwq_claim_management(struct global_cwq *gcwq);
+static void gcwq_release_management(struct global_cwq *gcwq);
 
 static int worker_pool_pri(struct worker_pool *pool)
 {
@@ -1355,11 +1357,11 @@ static void busy_worker_rebind_fn(struct work_struct *work)
 }
 
 /**
- * rebind_workers - rebind all workers of a gcwq to the associated CPU
+ * gcwq_associate - (re)associate a gcwq to its CPU and rebind its workers
  * @gcwq: gcwq of interest
  *
- * @gcwq->cpu is coming online.  Rebind all workers to the CPU.  Rebinding
- * is different for idle and busy ones.
+ * @gcwq->cpu is coming online.  Clear %GCWQ_DISASSOCIATED and rebind all
+ * workers to the CPU.  Rebinding is different for idle and busy ones.
  *
  * The idle ones should be rebound synchronously and idle rebinding should
  * be complete before any worker starts executing work items with
@@ -1378,8 +1380,7 @@ static void busy_worker_rebind_fn(struct work_struct *work)
  * On return, all workers are guaranteed to either be bound or have rebind
  * work item scheduled.
  */
-static void rebind_workers(struct global_cwq *gcwq)
-	__releases(&gcwq->lock) __acquires(&gcwq->lock)
+static void gcwq_associate(struct global_cwq *gcwq)
 {
 	struct idle_rebind idle_rebind;
 	struct worker_pool *pool;
@@ -1387,10 +1388,10 @@ static void rebind_workers(struct global_cwq *gcwq)
 	struct hlist_node *pos;
 	int i;
 
-	lockdep_assert_held(&gcwq->lock);
+	gcwq_claim_management(gcwq);
+	spin_lock_irq(&gcwq->lock);
 
-	for_each_worker_pool(pool, gcwq)
-		lockdep_assert_held(&pool->manager_mutex);
+	gcwq->flags &= ~GCWQ_DISASSOCIATED;
 
 	/*
 	 * Rebind idle workers.  Interlocked both ways.  We wait for
@@ -1477,8 +1478,11 @@ retry:
 	if (--idle_rebind.cnt) {
 		spin_unlock_irq(&gcwq->lock);
 		wait_for_completion(&idle_rebind.done);
-		spin_lock_irq(&gcwq->lock);
+	} else {
+		spin_unlock_irq(&gcwq->lock);
 	}
+
+	gcwq_release_management(gcwq);
 }
 
 static struct worker *alloc_worker(void)
@@ -3496,12 +3500,7 @@ static int __devinit workqueue_cpu_up_callback(struct notifier_block *nfb,
 
 	case CPU_DOWN_FAILED:
 	case CPU_ONLINE:
-		gcwq_claim_management(gcwq);
-		spin_lock_irq(&gcwq->lock);
-		gcwq->flags &= ~GCWQ_DISASSOCIATED;
-		rebind_workers(gcwq);
-		spin_unlock_irq(&gcwq->lock);
-		gcwq_release_management(gcwq);
+		gcwq_associate(gcwq);
 		break;
 	}
 	return NOTIFY_OK;
-- 
1.7.7.3



* [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-06 20:07 ` [PATCH wq/for-3.6-fixes 2/3] workqueue: rename rebind_workers() to gcwq_associate() and let it handle locking and DISASSOCIATED clearing Tejun Heo
@ 2012-09-06 20:08   ` Tejun Heo
  2012-09-07  1:53     ` Lai Jiangshan
  2012-09-07  3:10     ` Lai Jiangshan
  0 siblings, 2 replies; 17+ messages in thread
From: Tejun Heo @ 2012-09-06 20:08 UTC (permalink / raw)
  To: linux-kernel, Lai Jiangshan

From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Thu, 6 Sep 2012 12:50:41 -0700

To simplify both normal and CPU hotplug paths, while CPU hotplug is in
progress, manager_mutex is held to prevent one of the workers from
becoming a manager and creating or destroying workers; unfortunately,
it currently may lead to idle worker depletion which in turn can lead
to deadlock under extreme circumstances.

Idle workers aren't allowed to become busy if there's no other idle
worker left to create more idle workers, but during CPU_ONLINE
gcwq_associate() is holding all managerships and all the idle workers
can proceed to become busy before gcwq_associate() is finished.
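
The depletion can be modelled with a stand-alone userspace sketch
(illustration only, not kernel code; all names below are made up and
the pthread modelling is an assumption of this example).  The main
thread holds the "manager" mutex the way CPU_ONLINE holds the
manager_mutexes, every worker's trylock fails, and every worker then
proceeds to become busy, leaving no idle worker that could create a
replacement:

#include <pthread.h>
#include <stdio.h>

#define NR_WORKERS	4

static pthread_mutex_t manager_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static int nr_idle = NR_WORKERS;

/* roughly what each idle worker does when it wakes up */
static void *worker_fn(void *arg)
{
	(void)arg;

	/* like manage_workers(): give up when the trylock fails */
	if (pthread_mutex_trylock(&manager_mutex) == 0) {
		/* would create/destroy workers here */
		pthread_mutex_unlock(&manager_mutex);
	}

	/* the worker leaves idle and starts executing work items */
	pthread_mutex_lock(&pool_lock);
	nr_idle--;
	pthread_mutex_unlock(&pool_lock);
	return NULL;
}

int main(void)
{
	pthread_t workers[NR_WORKERS];
	int i;

	/* "CPU_ONLINE" claims management for the whole duration */
	pthread_mutex_lock(&manager_mutex);

	for (i = 0; i < NR_WORKERS; i++)
		pthread_create(&workers[i], NULL, worker_fn, NULL);
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(workers[i], NULL);

	/* prints 0: nothing is left idle and nothing could manage */
	printf("idle workers remaining: %d\n", nr_idle);

	pthread_mutex_unlock(&manager_mutex);
	return 0;
}

In the kernel the consequence is worse than a counter reaching zero:
with every worker busy and management held off, forward progress that
depends on creating a new worker can stall, which is the deadlock
described above.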

This patch fixes the bug by releasing manager_mutexes before letting
the rebound idle workers go.  This ensures that by the time idle
workers check whether management is necessary, CPU_ONLINE already has
released the positions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 kernel/workqueue.c |   20 ++++++++++++++------
 1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index b19170b..74487ef 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1454,10 +1454,19 @@ retry:
 	}
 
 	/*
-	 * All idle workers are rebound and waiting for %WORKER_REBIND to
-	 * be cleared inside idle_worker_rebind().  Clear and release.
-	 * Clearing %WORKER_REBIND from this foreign context is safe
-	 * because these workers are still guaranteed to be idle.
+	 * At this point, each pool is guaranteed to have at least one idle
+	 * worker and all idle workers are waiting for WORKER_REBIND to
+	 * clear.  Release management before releasing idle workers;
+	 * otherwise, they can all go become busy as we're holding the
+	 * manager_mutexes, which can lead to deadlock as we don't actually
+	 * create new workers.
+	 */
+	gcwq_release_management(gcwq);
+
+	/*
+	 * Clear %WORKER_REBIND and release.  Clearing it from this foreign
+	 * context is safe because these workers are still guaranteed to be
+	 * idle.
 	 *
 	 * We need to make sure all idle workers passed WORKER_REBIND wait
 	 * in idle_worker_rebind() before returning; otherwise, workers can
@@ -1467,6 +1476,7 @@ retry:
 	INIT_COMPLETION(idle_rebind.done);
 
 	for_each_worker_pool(pool, gcwq) {
+		WARN_ON_ONCE(list_empty(&pool->idle_list));
 		list_for_each_entry(worker, &pool->idle_list, entry) {
 			worker->flags &= ~WORKER_REBIND;
 			idle_rebind.cnt++;
@@ -1481,8 +1491,6 @@ retry:
 	} else {
 		spin_unlock_irq(&gcwq->lock);
 	}
-
-	gcwq_release_management(gcwq);
 }
 
 static struct worker *alloc_worker(void)
-- 
1.7.7.3



* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-06 20:08   ` [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE Tejun Heo
@ 2012-09-07  1:53     ` Lai Jiangshan
  2012-09-07 19:25       ` Tejun Heo
  2012-09-07  3:10     ` Lai Jiangshan
  1 sibling, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2012-09-07  1:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 09/07/2012 04:08 AM, Tejun Heo wrote:
> From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Thu, 6 Sep 2012 12:50:41 -0700
> 
> To simplify both normal and CPU hotplug paths, while CPU hotplug is in
> progress, manager_mutex is held to prevent one of the workers from
> becoming a manager and creating or destroying workers; unfortunately,
> it currently may lead to idle worker depletion which in turn can lead
> to deadlock under extreme circumstances.
> 
> Idle workers aren't allowed to become busy if there's no other idle
> worker left to create more idle workers, but during CPU_ONLINE
> gcwq_associate() is holding all managerships and all the idle workers
> can proceed to become busy before gcwq_associate() is finished.

Any code which grabs the manage_mutex can cause the bug:
not only rebind_workers(), but also gcwq_unbind_fn();
not only during CPU_ONLINE, but also during CPU_DOWN_PREPARE.

> 
> This patch fixes the bug by releasing manager_mutexes before letting
> the rebound idle workers go.  This ensures that by the time idle
> workers check whether management is necessary, CPU_ONLINE already has
> released the positions.

This can't fix the problem.

+	gcwq_claim_management(gcwq);
+	spin_lock_irq(&gcwq->lock);


If manage_workers() happens between these two lines, the problem occurs!


My non_manager_role_manager_mutex_unlock() approach has the same idea: release manage_mutex before releasing gcwq->lock.
But with the non_manager_role_manager_mutex_unlock() approach, a worker that fails to grab manage_mutex detects the reason for the failure and goes to sleep.
rebind_workers()/gcwq_unbind_fn() will then release manage_mutex and wake those sleepers up before releasing gcwq->lock.


==========================
A "release manage_mutex before release gcwq->lock" approach.(no one likes it, I think)


/* claim manager positions of all pools */
static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
{
	struct worker_pool *pool, *pool_fail;

again:
	spin_lock_irq(&gcwq->lock);
	for_each_worker_pool(pool, gcwq) {
		if (!mutex_trylock(&pool->manager_mutex))
			goto fail;
	}
	return;

fail: /* unlikely, because manage_workers() is a very unlikely path on my box */
	
	for_each_worker_pool(pool_fail, gcwq) {
		if (pool_fail != pool)
			mutex_unlock(&pool_fail->manager_mutex);
		else
			break;
	}
	spin_unlock_irq(&gcwq->lock);
	cpu_relax();
	goto again;
}

/* release manager positions */
static void gcwq_release_management_and_unlock(struct global_cwq *gcwq)
{
	struct worker_pool *pool;

	for_each_worker_pool(pool, gcwq)
		mutex_unlock(&pool->manager_mutex);
	spin_unlock_irq(&gcwq->lock);
}


> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
>  kernel/workqueue.c |   20 ++++++++++++++------
>  1 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index b19170b..74487ef 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1454,10 +1454,19 @@ retry:
>  	}
>  
>  	/*
> -	 * All idle workers are rebound and waiting for %WORKER_REBIND to
> -	 * be cleared inside idle_worker_rebind().  Clear and release.
> -	 * Clearing %WORKER_REBIND from this foreign context is safe
> -	 * because these workers are still guaranteed to be idle.
> +	 * At this point, each pool is guaranteed to have at least one idle
> +	 * worker and all idle workers are waiting for WORKER_REBIND to
> +	 * clear.  Release management before releasing idle workers;
> +	 * otherwise, they can all go become busy as we're holding the
> +	 * manager_mutexes, which can lead to deadlock as we don't actually
> +	 * create new workers.
> +	 */
> +	gcwq_release_management(gcwq);
> +
> +	/*
> +	 * Clear %WORKER_REBIND and release.  Clearing it from this foreign
> +	 * context is safe because these workers are still guaranteed to be
> +	 * idle.
>  	 *
>  	 * We need to make sure all idle workers passed WORKER_REBIND wait
>  	 * in idle_worker_rebind() before returning; otherwise, workers can
> @@ -1467,6 +1476,7 @@ retry:
>  	INIT_COMPLETION(idle_rebind.done);
>  
>  	for_each_worker_pool(pool, gcwq) {
> +		WARN_ON_ONCE(list_empty(&pool->idle_list));
>  		list_for_each_entry(worker, &pool->idle_list, entry) {
>  			worker->flags &= ~WORKER_REBIND;
>  			idle_rebind.cnt++;
> @@ -1481,8 +1491,6 @@ retry:
>  	} else {
>  		spin_unlock_irq(&gcwq->lock);
>  	}
> -
> -	gcwq_release_management(gcwq);
>  }
>  
>  static struct worker *alloc_worker(void)



* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-06 20:08   ` [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE Tejun Heo
  2012-09-07  1:53     ` Lai Jiangshan
@ 2012-09-07  3:10     ` Lai Jiangshan
  2012-09-07 19:29       ` Tejun Heo
  1 sibling, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2012-09-07  3:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

On 09/07/2012 04:08 AM, Tejun Heo wrote:
> From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Thu, 6 Sep 2012 12:50:41 -0700
> 
> To simplify both normal and CPU hotplug paths, while CPU hotplug is in
> progress, manager_mutex is held to prevent one of the workers from
> becoming a manager and creating or destroying workers; unfortunately,
> it currently may lead to idle worker depletion which in turn can lead
> to deadlock under extreme circumstances.
> 
> Idle workers aren't allowed to become busy if there's no other idle
> worker left to create more idle workers, but during CPU_ONLINE
> gcwq_associate() is holding all managerships and all the idle workers
> can proceed to become busy before gcwq_associate() is finished.
> 
> This patch fixes the bug by releasing manager_mutexes before letting
> the rebound idle workers go.  This ensures that by the time idle
> workers check whether management is necessary, CPU_ONLINE already has
> released the positions.
> 

Could you review manage_workers_slowpath() in the V4 patchset?
It has enough changelog and comments.

After the discussion,

We don't move hotplug handling outside the hotplug code; it matches this requirement.

Since we introduced manage_mutex, any place should be allowed to grab it
when its context allows, so this bug is not the hotplug code's responsibility.

manage_workers() just uses mutex_trylock() to grab the lock; it does not try
hard to do its job when needed, and it does not try to find out why the lock
could not be taken.  So I think it is manage_workers()'s responsibility to
handle this bug, and a manage_workers_slowpath() is enough to fix it.

=====
manage_workers_slowpath() adds only a little overhead over manage_workers(),
so we could replace manage_workers() with it entirely; that would reduce the
manage_workers() code and allow more cleanup of the management path.

Thanks,
Lai


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07  1:53     ` Lai Jiangshan
@ 2012-09-07 19:25       ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 19:25 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello, Lai.

On Fri, Sep 07, 2012 at 09:53:25AM +0800, Lai Jiangshan wrote:
> > This patch fixes the bug by releasing manager_mutexes before letting
> > the rebound idle workers go.  This ensures that by the time idle
> > workers check whether management is necessary, CPU_ONLINE already has
> > released the positions.
> 
> This can't fix the problem.
> 
> +	gcwq_claim_management(gcwq);
> +	spin_lock_irq(&gcwq->lock);
> 
> 
> If manage_workers() happens between these two lines, the problem occurs!

Indeed.  I was only looking at rebinding completion.  Hmmm... I
suppose any simple solution is out of the window at this point.  I guess
we'll have to defer the fix to 3.7.  I reverted the posted patches.

> My non_manager_role_manager_mutex_unlock() approach has the same
> idea: release manage_mutex before releasing gcwq->lock.  But with the
> non_manager_role_manager_mutex_unlock() approach, a worker that fails
> to grab manage_mutex detects the reason for the failure and goes to
> sleep.  rebind_workers()/gcwq_unbind_fn() will then release
> manage_mutex and wake those sleepers up before releasing gcwq->lock.

Can you please try to fit the text to 80 column?  It would be much
easier to read.

> A "release manage_mutex before release gcwq->lock" approach.(no one
> likes it, I think)
> 
> 
> /* claim manager positions of all pools */
> static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
> {
> 	struct worker_pool *pool, *pool_fail;
> 
> again:
> 	spin_lock_irq(&gcwq->lock);
> 	for_each_worker_pool(pool, gcwq) {
> 		if (!mutex_trylock(&pool->manager_mutex))
> 			goto fail;
> 	}
> 	return;
> 
> fail: /* unlikely, because manage_workers() are very unlike path in my box */
> 	
> 	for_each_worker_pool(pool_fail, gcwq) {
> 		if (pool_fail != pool)
> 			mutex_unlock(&pool->manager_mutex);
> 		else
> 			break;
> 	}
> 	spin_unlock_irq(&gcwq->lock);
> 	cpu_relax();
> 	goto again;
> }

Yeah, that's kinda ugly and also has the potential to cause an extended
period of busy looping.  Let's think of something else.

Thanks.

-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07  3:10     ` Lai Jiangshan
@ 2012-09-07 19:29       ` Tejun Heo
  2012-09-07 20:22         ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 19:29 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello,

On Fri, Sep 07, 2012 at 11:10:34AM +0800, Lai Jiangshan wrote:
> > This patch fixes the bug by releasing manager_mutexes before letting
> > the rebound idle workers go.  This ensures that by the time idle
> > workers check whether management is necessary, CPU_ONLINE already has
> > released the positions.
> 
> Could you review manage_workers_slowpath() in V4 patchset.
> It has enough changelog and comments.
> 
> After the discussion,
> 
> We don't move hotplug handling outside the hotplug code; it matches this requirement.

Was that the one which deferred calling manager function to a work
item on trylock failure?

> Since we introduced manage_mutex, any place should be allowed to grab it
> when its context allows, so this bug is not the hotplug code's responsibility.
> 
> manage_workers() just uses mutex_trylock() to grab the lock; it does not try
> hard to do its job when needed, and it does not try to find out why the lock
> could not be taken.  So I think it is manage_workers()'s responsibility to
> handle this bug, and a manage_workers_slowpath() is enough to fix it.

It doesn't really matter how the synchronization between the regular
manager and the hotplug path is done.  The point is that the hotplug
path should, as much as possible, be responsible for any complexity it
incurs, so I'd really like to stay away from adding a completely
different path through which the manager can be invoked in the usual
code path, if at all possible.
Let's try to solve this from the hotplug side.

Thanks.

-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 19:29       ` Tejun Heo
@ 2012-09-07 20:22         ` Tejun Heo
  2012-09-07 20:34           ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 20:22 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

Hello again, Lai.

On Fri, Sep 07, 2012 at 12:29:39PM -0700, Tejun Heo wrote:
> > Since we introduced manage_mutex, any place should be allowed to grab it
> > when its context allows, so this bug is not the hotplug code's responsibility.
> > 
> > manage_workers() just uses mutex_trylock() to grab the lock; it does not try
> > hard to do its job when needed, and it does not try to find out why the lock
> > could not be taken.  So I think it is manage_workers()'s responsibility to
> > handle this bug, and a manage_workers_slowpath() is enough to fix it.
> 
> It doesn't really matter how the synchronization between the regular
> manager and the hotplug path is done.  The point is that the hotplug
> path should, as much as possible, be responsible for any complexity it
> incurs, so I'd really like to stay away from adding a completely
> different path through which the manager can be invoked in the usual
> code path, if at all possible.  Let's try to solve this from the
> hotplug side.

So, how about something like the following?

* Make manage_workers() called outside gcwq->lock (or drop gcwq->lock
  after checking MANAGING).  worker_thread() can jump back to woke_up:
  instead.

* Distinguish synchronization among workers and against hotplug.  Was
  this what you tried with non_manager_mutex?  Anyways, revive
  WORKER_MANAGING to synchronize among workers.  If the worker won
  MANAGING, drop gcwq->lock and mutex_lock() gcwq->hotplug_mutex and
  then do other stuff.

This should prevent any idle worker passing through manage_workers()
while hotplug is in progress.  Do you think it would work?

Thanks.

-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 20:22         ` Tejun Heo
@ 2012-09-07 20:34           ` Tejun Heo
  2012-09-07 23:05             ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 20:34 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Fri, Sep 07, 2012 at 01:22:49PM -0700, Tejun Heo wrote:
> So, how about something like the following?
> 
> * Make manage_workers() called outside gcwq->lock (or drop gcwq->lock
>   after checking MANAGING).  worker_thread() can jump back to woke_up:
>   instead.
> 
> * Distinguish synchronization among workers and against hotplug.  Was
>   this what you tried with non_manager_mutex?  Anyways, revive
>   WORKER_MANAGING to synchronize among workers.  If the worker won
>   MANAGING, drop gcwq->lock and mutex_lock() gcwq->hotplug_mutex and
>   then do other stuff.
> 
> This should prevent any idle worker passing through manage_workers()
> while hotplug is in progress.  Do you think it would work?

Something like the following.  Completely untested.  What do you
think?

Thanks.

 kernel/workqueue.c |   63 ++++++++++++++++++++++++-----------------------------
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dc7b845..4c7502d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
+	POOL_MANAGING_WORKERS	= 1 << 1,
 
 	/* worker flags */
 	WORKER_STARTED		= 1 << 0,	/* started */
@@ -165,7 +166,7 @@ struct worker_pool {
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
-	struct mutex		manager_mutex;	/* mutex manager should hold */
+	struct mutex		hotplug_mutex;	/* mutex manager should hold */
 	struct ida		worker_ida;	/* L: for worker IDs */
 };
 
@@ -652,7 +653,7 @@ static bool need_to_manage_workers(struct worker_pool *pool)
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1390,7 +1391,7 @@ static void rebind_workers(struct global_cwq *gcwq)
 	lockdep_assert_held(&gcwq->lock);
 
 	for_each_worker_pool(pool, gcwq)
-		lockdep_assert_held(&pool->manager_mutex);
+		lockdep_assert_held(&pool->hotplug_mutex);
 
 	/*
 	 * Rebind idle workers.  Interlocked both ways.  We wait for
@@ -1713,22 +1714,16 @@ static void gcwq_mayday_timeout(unsigned long __pool)
  * spin_lock_irq(gcwq->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.  Called only from
  * manager.
- *
- * RETURNS:
- * false if no action was taken and gcwq->lock stayed locked, true
- * otherwise.
  */
-static bool maybe_create_worker(struct worker_pool *pool)
-__releases(&gcwq->lock)
-__acquires(&gcwq->lock)
+static void maybe_create_worker(struct worker_pool *pool)
 {
 	struct global_cwq *gcwq = pool->gcwq;
 
+	spin_lock_irq(&gcwq->lock);
 	if (!need_to_create_worker(pool))
-		return false;
+		goto out_unlock;
 restart:
 	spin_unlock_irq(&gcwq->lock);
-
 	/* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
 
@@ -1741,7 +1736,7 @@ restart:
 			spin_lock_irq(&gcwq->lock);
 			start_worker(worker);
 			BUG_ON(need_to_create_worker(pool));
-			return true;
+			goto out_unlock;
 		}
 
 		if (!need_to_create_worker(pool))
@@ -1758,7 +1753,8 @@ restart:
 	spin_lock_irq(&gcwq->lock);
 	if (need_to_create_worker(pool))
 		goto restart;
-	return true;
+out_unlock:
+	spin_unlock_irq(&gcwq->lock);
 }
 
 /**
@@ -1771,15 +1767,9 @@ restart:
  * LOCKING:
  * spin_lock_irq(gcwq->lock) which may be released and regrabbed
  * multiple times.  Called only from manager.
- *
- * RETURNS:
- * false if no action was taken and gcwq->lock stayed locked, true
- * otherwise.
  */
-static bool maybe_destroy_workers(struct worker_pool *pool)
+static void maybe_destroy_workers(struct worker_pool *pool)
 {
-	bool ret = false;
-
 	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
@@ -1793,10 +1783,7 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
 		}
 
 		destroy_worker(worker);
-		ret = true;
 	}
-
-	return ret;
 }
 
 /**
@@ -1820,24 +1807,32 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
  * some action was taken.
  */
 static bool manage_workers(struct worker *worker)
+	__releases(&gcwq->lock) __acquires(&gcwq->lock)
 {
 	struct worker_pool *pool = worker->pool;
-	bool ret = false;
+	struct global_cwq *gcwq = pool->gcwq;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+		return false;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
 
+	spin_unlock_irq(&gcwq->lock);
+
+	/* blah blah */
+	mutex_lock(&pool->hotplug_mutex);
+
 	/*
 	 * Destroy and then create so that may_start_working() is true
 	 * on return.
 	 */
-	ret |= maybe_destroy_workers(pool);
-	ret |= maybe_create_worker(pool);
+	maybe_destroy_workers(pool);
+	maybe_create_worker(pool);
 
-	mutex_unlock(&pool->manager_mutex);
-	return ret;
+	mutex_unlock(&pool->hotplug_mutex);
+
+	spin_lock_irq(&gcwq->lock);
+	return true;
 }
 
 /**
@@ -3399,7 +3394,7 @@ static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
 	struct worker_pool *pool;
 
 	for_each_worker_pool(pool, gcwq)
-		mutex_lock_nested(&pool->manager_mutex, pool - gcwq->pools);
+		mutex_lock_nested(&pool->hotplug_mutex, pool - gcwq->pools);
 	spin_lock_irq(&gcwq->lock);
 }
 
@@ -3410,7 +3405,7 @@ static void gcwq_release_management_and_unlock(struct global_cwq *gcwq)
 
 	spin_unlock_irq(&gcwq->lock);
 	for_each_worker_pool(pool, gcwq)
-		mutex_unlock(&pool->manager_mutex);
+		mutex_unlock(&pool->hotplug_mutex);
 }
 
 static void gcwq_unbind_fn(struct work_struct *work)
@@ -3749,7 +3744,7 @@ static int __init init_workqueues(void)
 			setup_timer(&pool->mayday_timer, gcwq_mayday_timeout,
 				    (unsigned long)pool);
 
-			mutex_init(&pool->manager_mutex);
+			mutex_init(&pool->hotplug_mutex);
 			ida_init(&pool->worker_ida);
 		}
 


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 20:34           ` Tejun Heo
@ 2012-09-07 23:05             ` Tejun Heo
  2012-09-07 23:07               ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 23:05 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

I got it down to the following but it creates a problem where CPU
hotplug queues a work item on worker->scheduled before the execution
loop starts.  :(

Need to think more about it.

 kernel/workqueue.c |   63 ++++++++++++++++++++++++-----------------------------
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index dc7b845..4c7502d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
+	POOL_MANAGING_WORKERS	= 1 << 1,
 
 	/* worker flags */
 	WORKER_STARTED		= 1 << 0,	/* started */
@@ -165,7 +166,7 @@ struct worker_pool {
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
-	struct mutex		manager_mutex;	/* mutex manager should hold */
+	struct mutex		hotplug_mutex;	/* mutex manager should hold */
 	struct ida		worker_ida;	/* L: for worker IDs */
 };
 
@@ -652,7 +653,7 @@ static bool need_to_manage_workers(struct worker_pool *pool)
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1390,7 +1391,7 @@ static void rebind_workers(struct global_cwq *gcwq)
 	lockdep_assert_held(&gcwq->lock);
 
 	for_each_worker_pool(pool, gcwq)
-		lockdep_assert_held(&pool->manager_mutex);
+		lockdep_assert_held(&pool->hotplug_mutex);
 
 	/*
 	 * Rebind idle workers.  Interlocked both ways.  We wait for
@@ -1713,22 +1714,16 @@ static void gcwq_mayday_timeout(unsigned long __pool)
  * spin_lock_irq(gcwq->lock) which may be released and regrabbed
  * multiple times.  Does GFP_KERNEL allocations.  Called only from
  * manager.
- *
- * RETURNS:
- * false if no action was taken and gcwq->lock stayed locked, true
- * otherwise.
  */
-static bool maybe_create_worker(struct worker_pool *pool)
-__releases(&gcwq->lock)
-__acquires(&gcwq->lock)
+static void maybe_create_worker(struct worker_pool *pool)
 {
 	struct global_cwq *gcwq = pool->gcwq;
 
+	spin_lock_irq(&gcwq->lock);
 	if (!need_to_create_worker(pool))
-		return false;
+		goto out_unlock;
 restart:
 	spin_unlock_irq(&gcwq->lock);
-
 	/* if we don't make progress in MAYDAY_INITIAL_TIMEOUT, call for help */
 	mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INITIAL_TIMEOUT);
 
@@ -1741,7 +1736,7 @@ restart:
 			spin_lock_irq(&gcwq->lock);
 			start_worker(worker);
 			BUG_ON(need_to_create_worker(pool));
-			return true;
+			goto out_unlock;
 		}
 
 		if (!need_to_create_worker(pool))
@@ -1758,7 +1753,8 @@ restart:
 	spin_lock_irq(&gcwq->lock);
 	if (need_to_create_worker(pool))
 		goto restart;
-	return true;
+out_unlock:
+	spin_unlock_irq(&gcwq->lock);
 }
 
 /**
@@ -1771,15 +1767,9 @@ restart:
  * LOCKING:
  * spin_lock_irq(gcwq->lock) which may be released and regrabbed
  * multiple times.  Called only from manager.
- *
- * RETURNS:
- * false if no action was taken and gcwq->lock stayed locked, true
- * otherwise.
  */
-static bool maybe_destroy_workers(struct worker_pool *pool)
+static void maybe_destroy_workers(struct worker_pool *pool)
 {
-	bool ret = false;
-
 	while (too_many_workers(pool)) {
 		struct worker *worker;
 		unsigned long expires;
@@ -1793,10 +1783,7 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
 		}
 
 		destroy_worker(worker);
-		ret = true;
 	}
-
-	return ret;
 }
 
 /**
@@ -1820,24 +1807,32 @@ static bool maybe_destroy_workers(struct worker_pool *pool)
  * some action was taken.
  */
 static bool manage_workers(struct worker *worker)
+	__releases(&gcwq->lock) __acquires(&gcwq->lock)
 {
 	struct worker_pool *pool = worker->pool;
-	bool ret = false;
+	struct global_cwq *gcwq = pool->gcwq;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+		return false;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
 
+	spin_unlock_irq(&gcwq->lock);
+
+	/* blah blah */
+	mutex_lock(&pool->hotplug_mutex);
+
 	/*
 	 * Destroy and then create so that may_start_working() is true
 	 * on return.
 	 */
-	ret |= maybe_destroy_workers(pool);
-	ret |= maybe_create_worker(pool);
+	maybe_destroy_workers(pool);
+	maybe_create_worker(pool);
 
-	mutex_unlock(&pool->manager_mutex);
-	return ret;
+	mutex_unlock(&pool->hotplug_mutex);
+
+	spin_lock_irq(&gcwq->lock);
+	return true;
 }
 
 /**
@@ -3399,7 +3394,7 @@ static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
 	struct worker_pool *pool;
 
 	for_each_worker_pool(pool, gcwq)
-		mutex_lock_nested(&pool->manager_mutex, pool - gcwq->pools);
+		mutex_lock_nested(&pool->hotplug_mutex, pool - gcwq->pools);
 	spin_lock_irq(&gcwq->lock);
 }
 
@@ -3410,7 +3405,7 @@ static void gcwq_release_management_and_unlock(struct global_cwq *gcwq)
 
 	spin_unlock_irq(&gcwq->lock);
 	for_each_worker_pool(pool, gcwq)
-		mutex_unlock(&pool->manager_mutex);
+		mutex_unlock(&pool->hotplug_mutex);
 }
 
 static void gcwq_unbind_fn(struct work_struct *work)
@@ -3749,7 +3744,7 @@ static int __init init_workqueues(void)
 			setup_timer(&pool->mayday_timer, gcwq_mayday_timeout,
 				    (unsigned long)pool);
 
-			mutex_init(&pool->manager_mutex);
+			mutex_init(&pool->hotplug_mutex);
 			ida_init(&pool->worker_ida);
 		}
 


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 23:05             ` Tejun Heo
@ 2012-09-07 23:07               ` Tejun Heo
  2012-09-07 23:41                 ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 23:07 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

On Fri, Sep 07, 2012 at 04:05:56PM -0700, Tejun Heo wrote:
> I got it down to the following but it creates a problem where CPU
> hotplug queues a work item on worker->scheduled before the execution
> loop starts.

Oops, wrong patch.  This is the right one.

Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
+	POOL_MANAGING_WORKERS   = 1 << 1,       /* managing workers */
 
 	/* worker flags */
 	WORKER_STARTED		= 1 << 0,	/* started */
@@ -165,7 +166,7 @@ struct worker_pool {
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
-	struct mutex		manager_mutex;	/* mutex manager should hold */
+	struct mutex		manager_mutex;	/* manager <-> CPU hotplug */
 	struct ida		worker_ida;	/* L: for worker IDs */
 };
 
@@ -652,7 +653,7 @@ static bool need_to_manage_workers(struc
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1820,14 +1821,35 @@ static bool maybe_destroy_workers(struct
  * some action was taken.
  */
 static bool manage_workers(struct worker *worker)
+	__releases(&gcwq->lock) __acquires(&gcwq->lock)
 {
 	struct worker_pool *pool = worker->pool;
+	struct global_cwq *gcwq = pool->gcwq;
 	bool ret = false;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+ 		return ret;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
+	pool->flags |= POOL_MANAGING_WORKERS;
+
+	/*
+	 * To simplify both worker management and CPU hotplug, hold off
+	 * management while hotplug is in progress.  CPU hotplug path can't
+	 * grab %POOL_MANAGING_WORKERS to achieve this because that can
+	 * lead to idle worker depletion (all become busy thinking someone
+	 * else is managing) which in turn can result in deadlock under
+	 * extreme circumstances.
+	 *
+	 * manager_mutex would always be free unless CPU hotplug is in
+	 * progress.  trylock first without dropping gcwq->lock.
+	 */
+	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+		spin_unlock_irq(&gcwq->lock);
+		mutex_lock(&pool->manager_mutex);
+		spin_lock_irq(&gcwq->lock);
+		ret = true;
+	}
 
 	/*
 	 * Destroy and then create so that may_start_working() is true
@@ -1836,6 +1858,7 @@ static bool manage_workers(struct worker
 	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
+	pool->flags &= ~POOL_MANAGING_WORKERS;
 	mutex_unlock(&pool->manager_mutex);
 	return ret;
 }
@@ -3393,7 +3416,7 @@ EXPORT_SYMBOL_GPL(work_busy);
  * cpu comes back online.
  */
 
-/* claim manager positions of all pools */
+/* claim manager positions of all pools, see manage_workers() for details */
 static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
 {
 	struct worker_pool *pool;


-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 23:07               ` Tejun Heo
@ 2012-09-07 23:41                 ` Tejun Heo
  2012-09-08 17:18                   ` Lai Jiangshan
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-07 23:41 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: linux-kernel

I think this should do it.  Can you spot any hole with the following
patch?

Thanks.

Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -66,6 +66,7 @@ enum {
 
 	/* pool flags */
 	POOL_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
+	POOL_MANAGING_WORKERS   = 1 << 1,       /* managing workers */
 
 	/* worker flags */
 	WORKER_STARTED		= 1 << 0,	/* started */
@@ -165,7 +166,7 @@ struct worker_pool {
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
-	struct mutex		manager_mutex;	/* mutex manager should hold */
+	struct mutex		manager_mutex;	/* manager <-> CPU hotplug */
 	struct ida		worker_ida;	/* L: for worker IDs */
 };
 
@@ -480,6 +481,7 @@ static atomic_t unbound_pool_nr_running[
 };
 
 static int worker_thread(void *__worker);
+static void process_scheduled_works(struct worker *worker);
 
 static int worker_pool_pri(struct worker_pool *pool)
 {
@@ -652,7 +654,7 @@ static bool need_to_manage_workers(struc
 /* Do we have too many workers and should some go away? */
 static bool too_many_workers(struct worker_pool *pool)
 {
-	bool managing = mutex_is_locked(&pool->manager_mutex);
+	bool managing = pool->flags & POOL_MANAGING_WORKERS;
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
@@ -1820,14 +1822,43 @@ static bool maybe_destroy_workers(struct
  * some action was taken.
  */
 static bool manage_workers(struct worker *worker)
+	__releases(&gcwq->lock) __acquires(&gcwq->lock)
 {
 	struct worker_pool *pool = worker->pool;
+	struct global_cwq *gcwq = pool->gcwq;
 	bool ret = false;
 
-	if (!mutex_trylock(&pool->manager_mutex))
-		return ret;
+	if (pool->flags & POOL_MANAGING_WORKERS)
+ 		return ret;
 
 	pool->flags &= ~POOL_MANAGE_WORKERS;
+	pool->flags |= POOL_MANAGING_WORKERS;
+
+	/*
+	 * To simplify both worker management and CPU hotplug, hold off
+	 * management while hotplug is in progress.  CPU hotplug path can't
+	 * grab %POOL_MANAGING_WORKERS to achieve this because that can
+	 * lead to idle worker depletion (all become busy thinking someone
+	 * else is managing) which in turn can result in deadlock under
+	 * extreme circumstances.
+	 *
+	 * manager_mutex would always be free unless CPU hotplug is in
+	 * progress.  trylock first without dropping gcwq->lock.
+	 */
+	if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
+		spin_unlock_irq(&gcwq->lock);
+		mutex_lock(&pool->manager_mutex);
+		spin_lock_irq(&gcwq->lock);
+
+		/*
+		 * CPU hotplug could have scheduled rebind_work while we're
+		 * waiting for manager_mutex.  Rebind before doing anything
+		 * else.  This has to be handled here.  worker_thread()
+		 * will be confused by the unexpected work item.
+		 */
+		process_scheduled_works(worker);
+		ret = true;
+	}
 
 	/*
 	 * Destroy and then create so that may_start_working() is true
@@ -1836,7 +1867,9 @@ static bool manage_workers(struct worker
 	ret |= maybe_destroy_workers(pool);
 	ret |= maybe_create_worker(pool);
 
+	pool->flags &= ~POOL_MANAGING_WORKERS;
 	mutex_unlock(&pool->manager_mutex);
+
 	return ret;
 }
 
@@ -3393,7 +3426,7 @@ EXPORT_SYMBOL_GPL(work_busy);
  * cpu comes back online.
  */
 
-/* claim manager positions of all pools */
+/* claim manager positions of all pools, see manage_workers() for details */
 static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
 {
 	struct worker_pool *pool;


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-07 23:41                 ` Tejun Heo
@ 2012-09-08 17:18                   ` Lai Jiangshan
  2012-09-08 17:29                     ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2012-09-08 17:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Sat, Sep 8, 2012 at 7:41 AM, Tejun Heo <tj@kernel.org> wrote:
> I think this should do it.  Can you spot any hole with the following
> patch?
>
> Thanks.
>
> Index: work/kernel/workqueue.c
> ===================================================================
> --- work.orig/kernel/workqueue.c
> +++ work/kernel/workqueue.c
> @@ -66,6 +66,7 @@ enum {
>
>         /* pool flags */
>         POOL_MANAGE_WORKERS     = 1 << 0,       /* need to manage workers */
> +       POOL_MANAGING_WORKERS   = 1 << 1,       /* managing workers */
>
>         /* worker flags */
>         WORKER_STARTED          = 1 << 0,       /* started */
> @@ -165,7 +166,7 @@ struct worker_pool {
>         struct timer_list       idle_timer;     /* L: worker idle timeout */
>         struct timer_list       mayday_timer;   /* L: SOS timer for workers */
>
> -       struct mutex            manager_mutex;  /* mutex manager should hold */
> +       struct mutex            manager_mutex;  /* manager <-> CPU hotplug */
>         struct ida              worker_ida;     /* L: for worker IDs */
>  };
>
> @@ -480,6 +481,7 @@ static atomic_t unbound_pool_nr_running[
>  };
>
>  static int worker_thread(void *__worker);
> +static void process_scheduled_works(struct worker *worker);
>
>  static int worker_pool_pri(struct worker_pool *pool)
>  {
> @@ -652,7 +654,7 @@ static bool need_to_manage_workers(struc
>  /* Do we have too many workers and should some go away? */
>  static bool too_many_workers(struct worker_pool *pool)
>  {
> -       bool managing = mutex_is_locked(&pool->manager_mutex);
> +       bool managing = pool->flags & POOL_MANAGING_WORKERS;
>         int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
>         int nr_busy = pool->nr_workers - nr_idle;
>
> @@ -1820,14 +1822,43 @@ static bool maybe_destroy_workers(struct
>   * some action was taken.
>   */
>  static bool manage_workers(struct worker *worker)
> +       __releases(&gcwq->lock) __acquires(&gcwq->lock)
>  {
>         struct worker_pool *pool = worker->pool;
> +       struct global_cwq *gcwq = pool->gcwq;
>         bool ret = false;
>
> -       if (!mutex_trylock(&pool->manager_mutex))
> -               return ret;
> +       if (pool->flags & POOL_MANAGING_WORKERS)
> +               return ret;
>
>         pool->flags &= ~POOL_MANAGE_WORKERS;
> +       pool->flags |= POOL_MANAGING_WORKERS;
> +
> +       /*
> +        * To simplify both worker management and CPU hotplug, hold off
> +        * management while hotplug is in progress.  CPU hotplug path can't
> +        * grab %POOL_MANAGING_WORKERS to achieve this because that can
> +        * lead to idle worker depletion (all become busy thinking someone
> +        * else is managing) which in turn can result in deadlock under
> +        * extreme circumstances.
> +        *
> +        * manager_mutex would always be free unless CPU hotplug is in
> +        * progress.  trylock first without dropping gcwq->lock.
> +        */
> +       if (unlikely(!mutex_trylock(&pool->manager_mutex))) {
> +               spin_unlock_irq(&gcwq->lock);

hotplug can happen here.

> +               mutex_lock(&pool->manager_mutex);
> +               spin_lock_irq(&gcwq->lock);
> +
> +               /*
> +                * CPU hotplug could have scheduled rebind_work while we're
> +                * waiting for manager_mutex.  Rebind before doing anything
> +                * else.  This has to be handled here.  worker_thread()
> +                * will be confused by the unexpected work item.
> +                */
> +               process_scheduled_works(worker);

The hotplug code can't iterate over the manager: the manager gets neither rebind_work() nor UNBOUND.

> +               ret = true;
> +       }
>
>         /*
>          * Destroy and then create so that may_start_working() is true
> @@ -1836,7 +1867,9 @@ static bool manage_workers(struct worker
>         ret |= maybe_destroy_workers(pool);
>         ret |= maybe_create_worker(pool);
>
> +       pool->flags &= ~POOL_MANAGING_WORKERS;
>         mutex_unlock(&pool->manager_mutex);
> +
>         return ret;
>  }
>
> @@ -3393,7 +3426,7 @@ EXPORT_SYMBOL_GPL(work_busy);
>   * cpu comes back online.
>   */
>
> -/* claim manager positions of all pools */
> +/* claim manager positions of all pools, see manage_workers() for details */
>  static void gcwq_claim_management_and_lock(struct global_cwq *gcwq)
>  {
>         struct worker_pool *pool;


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-08 17:18                   ` Lai Jiangshan
@ 2012-09-08 17:29                     ` Tejun Heo
  2012-09-08 17:32                       ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-08 17:29 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

Hello, Lai.

On Sun, Sep 09, 2012 at 01:18:25AM +0800, Lai Jiangshan wrote:
> > +               /*
> > +                * CPU hotplug could have scheduled rebind_work while we're
> > +                * waiting for manager_mutex.  Rebind before doing anything
> > +                * else.  This has to be handled here.  worker_thread()
> > +                * will be confused by the unexpected work item.
> > +                */
> > +               process_scheduled_works(worker);
> 
> The hotplug code can't iterate over the manager: the manager gets neither rebind_work() nor UNBOUND.

Ah, right.  It isn't either on idle or busy list.  Maybe have
pool->manager pointer?
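
E.g. something like this (fragment for illustration only, not from any
posted patch; only the new field is shown, everything else is elided):

struct worker;				/* opaque in this fragment */

struct worker_pool {
	/* ... existing fields: idle_list, manager_mutex, worker_ida, ... */
	struct worker *manager;		/* worker currently acting as manager, or NULL */
};

/*
 * The managing worker would set ->manager before starting to manage and
 * clear it when done, so the hotplug path can find and rebind the manager
 * even though it is on neither the idle nor the busy list.
 */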

Thanks.

-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-08 17:29                     ` Tejun Heo
@ 2012-09-08 17:32                       ` Tejun Heo
  2012-09-08 17:40                         ` Lai Jiangshan
  0 siblings, 1 reply; 17+ messages in thread
From: Tejun Heo @ 2012-09-08 17:32 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

On Sat, Sep 08, 2012 at 10:29:50AM -0700, Tejun Heo wrote:
> > The hotplug code can't iterate over the manager: the manager gets neither rebind_work() nor UNBOUND.
> 
> Ah, right.  It isn't either on idle or busy list.  Maybe have
> pool->manager pointer?

Ooh, this is what you did with the new patchset, right?

-- 
tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-08 17:32                       ` Tejun Heo
@ 2012-09-08 17:40                         ` Lai Jiangshan
  2012-09-08 17:41                           ` Tejun Heo
  0 siblings, 1 reply; 17+ messages in thread
From: Lai Jiangshan @ 2012-09-08 17:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Lai Jiangshan, linux-kernel

On Sun, Sep 9, 2012 at 1:32 AM, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Sep 08, 2012 at 10:29:50AM -0700, Tejun Heo wrote:
>> > The hotplug code can't iterate over the manager: the manager gets neither rebind_work() nor UNBOUND.
>>
>> Ah, right.  It isn't either on idle or busy list.  Maybe have
>> pool->manager pointer?
>
> Ooh, this is what you did with the new patchset, right?

I already did it in the V5 patchset, not in the new patchset.  I just
changed it as you like in V6, and I changed the strategy of calling
may_rebind_manager().

Thanks.
Lai

>
> --
> tejun


* Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker depletion during CPU_ONLINE
  2012-09-08 17:40                         ` Lai Jiangshan
@ 2012-09-08 17:41                           ` Tejun Heo
  0 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2012-09-08 17:41 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Lai Jiangshan, linux-kernel

On Sun, Sep 09, 2012 at 01:40:19AM +0800, Lai Jiangshan wrote:
> On Sun, Sep 9, 2012 at 1:32 AM, Tejun Heo <tj@kernel.org> wrote:
> > On Sat, Sep 08, 2012 at 10:29:50AM -0700, Tejun Heo wrote:
> >> > The hotplug code can't iterate over the manager: the manager gets neither rebind_work() nor UNBOUND.
> >>
> >> Ah, right.  It isn't either on idle or busy list.  Maybe have
> >> pool->manager pointer?
> >
> > Ooh, this is what you did with the new patchset, right?
> 
> I already did it in V5 patchset. not in new patchset. I just change it
> as you like in V6.
> I change the strategy of calling may_rebind_manager().

Yeah, you did that and a lot else too.  Let's please isolate the fixes
and think about restructuring later.

Thanks.

-- 
tejun

