All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] workqueue: not allow recursion run_workqueue
@ 2009-01-21  9:42 Lai Jiangshan
  2009-01-21 10:56 ` Ingo Molnar
  2009-01-21 11:12 ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Lai Jiangshan @ 2009-01-21  9:42 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linux Kernel Mailing List


1) lockdep will complain when recursion run_workqueue
2) works is not run orderly when recursion run_workqueue

3) BUG!
   We use recursion run_workqueue to hidden deadlock when
   keventd trying to flush its own queue.

   It's bug. When flush_workqueue()(nested in a work callback)returns,
   the workqueue is not really flushed, the sequence statement of
   this work callback will do some thing bad.

   So we should not allow workqueue trying to flush its own queue.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2f44583..1129cde 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -48,8 +48,6 @@ struct cpu_workqueue_struct {
 
 	struct workqueue_struct *wq;
 	struct task_struct *thread;
-
-	int run_depth;		/* Detect run_workqueue() recursion depth */
 } ____cacheline_aligned;
 
 /*
@@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 static void run_workqueue(struct cpu_workqueue_struct *cwq)
 {
 	spin_lock_irq(&cwq->lock);
-	cwq->run_depth++;
-	if (cwq->run_depth > 3) {
-		/* morton gets to eat his hat */
-		printk("%s: recursion depth exceeded: %d\n",
-			__func__, cwq->run_depth);
-		dump_stack();
-	}
 	while (!list_empty(&cwq->worklist)) {
 		struct work_struct *work = list_entry(cwq->worklist.next,
 						struct work_struct, entry);
@@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
 		spin_lock_irq(&cwq->lock);
 		cwq->current_work = NULL;
 	}
-	cwq->run_depth--;
 	spin_unlock_irq(&cwq->lock);
 }
 
@@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
 
 static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
 {
-	int active;
+	int active = 0;
+	struct wq_barrier barr;
 
-	if (cwq->thread == current) {
-		/*
-		 * Probably keventd trying to flush its own queue. So simply run
-		 * it by hand rather than deadlocking.
-		 */
-		run_workqueue(cwq);
-		active = 1;
-	} else {
-		struct wq_barrier barr;
+	WARN_ON(cwq->thread == current);
 
-		active = 0;
-		spin_lock_irq(&cwq->lock);
-		if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
-			insert_wq_barrier(cwq, &barr, &cwq->worklist);
-			active = 1;
-		}
-		spin_unlock_irq(&cwq->lock);
-
-		if (active)
-			wait_for_completion(&barr.done);
+	spin_lock_irq(&cwq->lock);
+	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
+		insert_wq_barrier(cwq, &barr, &cwq->worklist);
+		active = 1;
 	}
+	spin_unlock_irq(&cwq->lock);
+
+	if (active)
+		wait_for_completion(&barr.done);
 
 	return active;
 }



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] workqueue: not allow recursion run_workqueue
  2009-01-21  9:42 [PATCH] workqueue: not allow recursion run_workqueue Lai Jiangshan
@ 2009-01-21 10:56 ` Ingo Molnar
  2009-01-21 11:12 ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2009-01-21 10:56 UTC (permalink / raw)
  To: Lai Jiangshan, Peter Zijlstra
  Cc: Oleg Nesterov, Andrew Morton, Linux Kernel Mailing List


* Lai Jiangshan <laijs@cn.fujitsu.com> wrote:

> 1) lockdep will complain when recursion run_workqueue
> 2) works is not run orderly when recursion run_workqueue
> 
> 3) BUG!
>    We use recursion run_workqueue to hidden deadlock when
>    keventd trying to flush its own queue.
> 
>    It's bug. When flush_workqueue()(nested in a work callback)returns,
>    the workqueue is not really flushed, the sequence statement of
>    this work callback will do some thing bad.
> 
>    So we should not allow workqueue trying to flush its own queue.

That's a nice change. I'm wondering what the existing users are though and 
how difficult they are to fix.

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] workqueue: not allow recursion run_workqueue
  2009-01-21  9:42 [PATCH] workqueue: not allow recursion run_workqueue Lai Jiangshan
  2009-01-21 10:56 ` Ingo Molnar
@ 2009-01-21 11:12 ` Peter Zijlstra
  2009-01-21 12:45   ` Oleg Nesterov
  2009-01-22  6:03   ` Lai Jiangshan
  1 sibling, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2009-01-21 11:12 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Wed, 2009-01-21 at 17:42 +0800, Lai Jiangshan wrote:
> 1) lockdep will complain when recursion run_workqueue
> 2) works is not run orderly when recursion run_workqueue
> 
> 3) BUG!
>    We use recursion run_workqueue to hidden deadlock when
>    keventd trying to flush its own queue.
> 
>    It's bug. When flush_workqueue()(nested in a work callback)returns,
>    the workqueue is not really flushed, the sequence statement of
>    this work callback will do some thing bad.
> 
>    So we should not allow workqueue trying to flush its own queue.

The patch looks good, but I'm utterly failing to comprehend this
changelog. What exactly can go wrong (other than the obvious too deep
nest and the fact that lockdep will complain)?

> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2f44583..1129cde 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -48,8 +48,6 @@ struct cpu_workqueue_struct {
>  
>  	struct workqueue_struct *wq;
>  	struct task_struct *thread;
> -
> -	int run_depth;		/* Detect run_workqueue() recursion depth */
>  } ____cacheline_aligned;
>  
>  /*
> @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
>  static void run_workqueue(struct cpu_workqueue_struct *cwq)
>  {
>  	spin_lock_irq(&cwq->lock);
> -	cwq->run_depth++;
> -	if (cwq->run_depth > 3) {
> -		/* morton gets to eat his hat */
> -		printk("%s: recursion depth exceeded: %d\n",
> -			__func__, cwq->run_depth);
> -		dump_stack();
> -	}
>  	while (!list_empty(&cwq->worklist)) {
>  		struct work_struct *work = list_entry(cwq->worklist.next,
>  						struct work_struct, entry);
> @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
>  		spin_lock_irq(&cwq->lock);
>  		cwq->current_work = NULL;
>  	}
> -	cwq->run_depth--;
>  	spin_unlock_irq(&cwq->lock);
>  }
>  
> @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
>  
>  static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
>  {
> -	int active;
> +	int active = 0;
> +	struct wq_barrier barr;
>  
> -	if (cwq->thread == current) {
> -		/*
> -		 * Probably keventd trying to flush its own queue. So simply run
> -		 * it by hand rather than deadlocking.
> -		 */
> -		run_workqueue(cwq);
> -		active = 1;
> -	} else {
> -		struct wq_barrier barr;
> +	WARN_ON(cwq->thread == current);
>  
> -		active = 0;
> -		spin_lock_irq(&cwq->lock);
> -		if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> -			insert_wq_barrier(cwq, &barr, &cwq->worklist);
> -			active = 1;
> -		}
> -		spin_unlock_irq(&cwq->lock);
> -
> -		if (active)
> -			wait_for_completion(&barr.done);
> +	spin_lock_irq(&cwq->lock);
> +	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> +		insert_wq_barrier(cwq, &barr, &cwq->worklist);
> +		active = 1;
>  	}
> +	spin_unlock_irq(&cwq->lock);
> +
> +	if (active)
> +		wait_for_completion(&barr.done);
>  
>  	return active;
>  }
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] workqueue: not allow recursion run_workqueue
  2009-01-21 11:12 ` Peter Zijlstra
@ 2009-01-21 12:45   ` Oleg Nesterov
  2009-01-22  6:03   ` Lai Jiangshan
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2009-01-21 12:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lai Jiangshan, Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On 01/21, Peter Zijlstra wrote:
>
> On Wed, 2009-01-21 at 17:42 +0800, Lai Jiangshan wrote:
> > 1) lockdep will complain when recursion run_workqueue
> > 2) works is not run orderly when recursion run_workqueue
> >
> > 3) BUG!
> >    We use recursion run_workqueue to hidden deadlock when
> >    keventd trying to flush its own queue.
> >
> >    It's bug. When flush_workqueue()(nested in a work callback)returns,
> >    the workqueue is not really flushed, the sequence statement of
> >    this work callback will do some thing bad.
> >
> >    So we should not allow workqueue trying to flush its own queue.
>
> The patch looks good, but I'm utterly failing to comprehend this
> changelog. What exactly can go wrong (other than the obvious too deep
> nest and the fact that lockdep will complain)?

I am confused too.


But the change itself looks good to me, I am only worried if we still
have the callers of flush() from within work->func().

> +	WARN_ON(cwq->thread == current);

probably BUG_ON() is better, we are going to deadlock in this case.

Oleg.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] workqueue: not allow recursion run_workqueue
  2009-01-21 11:12 ` Peter Zijlstra
  2009-01-21 12:45   ` Oleg Nesterov
@ 2009-01-22  6:03   ` Lai Jiangshan
  2009-01-22  9:52     ` Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Lai Jiangshan @ 2009-01-22  6:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

Peter Zijlstra wrote:
> On Wed, 2009-01-21 at 17:42 +0800, Lai Jiangshan wrote:
>> 1) lockdep will complain when recursion run_workqueue
>> 2) works is not run orderly when recursion run_workqueue
>>
>> 3) BUG!
>>    We use recursion run_workqueue to hidden deadlock when
>>    keventd trying to flush its own queue.
>>
>>    It's bug. When flush_workqueue()(nested in a work callback)returns,
>>    the workqueue is not really flushed, the sequence statement of
>>    this work callback will do some thing bad.
>>
>>    So we should not allow workqueue trying to flush its own queue.
> 
> The patch looks good, but I'm utterly failing to comprehend this
> changelog. What exactly can go wrong (other than the obvious too deep
> nest and the fact that lockdep will complain)?

void do_some_cleanup(void)
{
	find_all_queued_work_struct_and_mark_it_old();
	flush_workqueue(workqueue);
	/* we can destroy old work_struct for we have flushed them */
	destroy_old_work_structs();
}

if work->func() called do_some_cleanup(), it's very probably a bug.

> 
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 2f44583..1129cde 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -48,8 +48,6 @@ struct cpu_workqueue_struct {
>>  
>>  	struct workqueue_struct *wq;
>>  	struct task_struct *thread;
>> -
>> -	int run_depth;		/* Detect run_workqueue() recursion depth */
>>  } ____cacheline_aligned;
>>  
>>  /*
>> @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on);
>>  static void run_workqueue(struct cpu_workqueue_struct *cwq)
>>  {
>>  	spin_lock_irq(&cwq->lock);
>> -	cwq->run_depth++;
>> -	if (cwq->run_depth > 3) {
>> -		/* morton gets to eat his hat */
>> -		printk("%s: recursion depth exceeded: %d\n",
>> -			__func__, cwq->run_depth);
>> -		dump_stack();
>> -	}
>>  	while (!list_empty(&cwq->worklist)) {
>>  		struct work_struct *work = list_entry(cwq->worklist.next,
>>  						struct work_struct, entry);
>> @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq)
>>  		spin_lock_irq(&cwq->lock);
>>  		cwq->current_work = NULL;
>>  	}
>> -	cwq->run_depth--;
>>  	spin_unlock_irq(&cwq->lock);
>>  }
>>  
>> @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq,
>>  
>>  static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
>>  {
>> -	int active;
>> +	int active = 0;
>> +	struct wq_barrier barr;
>>  
>> -	if (cwq->thread == current) {
>> -		/*
>> -		 * Probably keventd trying to flush its own queue. So simply run
>> -		 * it by hand rather than deadlocking.
>> -		 */
>> -		run_workqueue(cwq);
>> -		active = 1;
>> -	} else {
>> -		struct wq_barrier barr;
>> +	WARN_ON(cwq->thread == current);
>>  
>> -		active = 0;
>> -		spin_lock_irq(&cwq->lock);
>> -		if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
>> -			insert_wq_barrier(cwq, &barr, &cwq->worklist);
>> -			active = 1;
>> -		}
>> -		spin_unlock_irq(&cwq->lock);
>> -
>> -		if (active)
>> -			wait_for_completion(&barr.done);
>> +	spin_lock_irq(&cwq->lock);
>> +	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
>> +		insert_wq_barrier(cwq, &barr, &cwq->worklist);
>> +		active = 1;
>>  	}
>> +	spin_unlock_irq(&cwq->lock);
>> +
>> +	if (active)
>> +		wait_for_completion(&barr.done);
>>  
>>  	return active;
>>  }
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 
> 
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] workqueue: not allow recursion run_workqueue
  2009-01-22  6:03   ` Lai Jiangshan
@ 2009-01-22  9:52     ` Peter Zijlstra
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2009-01-22  9:52 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Linux Kernel Mailing List

On Thu, 2009-01-22 at 14:03 +0800, Lai Jiangshan wrote:
> 
> void do_some_cleanup(void)
> {
>         find_all_queued_work_struct_and_mark_it_old();
>         flush_workqueue(workqueue);
>         /* we can destroy old work_struct for we have flushed them */
>         destroy_old_work_structs();
> }
> 
> if work->func() called do_some_cleanup(), it's very probably a bug.

Of course it is, if only because calling flush on the same workqueue is
pretty dumb.

But I'm still not getting it, flush_workqueue() provides the guarantee
that all work enqueued previous to the call will be finished thereafter.

The self-flush stuff you propose to rip out doesn't violate that
guarantee afaict.

Suppose we have a workqueue Q, with pending work W1..Wn.

Suppose W5 will have the nested flush, it will then recursively complete
W6..Wn+i, where i accounts for any concurrent worklet additions.

Therefore it will have completed (at least) those worklets that were
enqueued at the time flush got called.

So, to get back at your changelog.

 1) yes lockdep will complain -- for good reasons, and I'm all for
getting rid of this mis-feature.

 2) I've no clue what you're on about

 3) more mystery.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-01-22  9:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-21  9:42 [PATCH] workqueue: not allow recursion run_workqueue Lai Jiangshan
2009-01-21 10:56 ` Ingo Molnar
2009-01-21 11:12 ` Peter Zijlstra
2009-01-21 12:45   ` Oleg Nesterov
2009-01-22  6:03   ` Lai Jiangshan
2009-01-22  9:52     ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.