* [PATCH 2/3] workqueue: not allow recursion run_workqueue @ 2009-01-22 9:14 Lai Jiangshan 2009-01-22 9:30 ` Frederic Weisbecker 0 siblings, 1 reply; 20+ messages in thread From: Lai Jiangshan @ 2009-01-22 9:14 UTC (permalink / raw) To: Oleg Nesterov, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Frédéric Weisbecker, Linux Kernel Mailing List 1) lockdep will complain when recursion run_workqueue 2) works is not run orderly when recursion run_workqueue 3) BUG! We use recursion run_workqueue to hidden deadlock when keventd trying to flush its own queue. It's bug. When flush_workqueue()(nested in a work callback)returns, the workqueue is not really flushed, the sequence statement of this work callback will do some thing bad. So we should not allow workqueue trying to flush its own queue. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 2f44583..1129cde 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { struct workqueue_struct *wq; struct task_struct *thread; - - int run_depth; /* Detect run_workqueue() recursion depth */ } ____cacheline_aligned; /* @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); static void run_workqueue(struct cpu_workqueue_struct *cwq) { spin_lock_irq(&cwq->lock); - cwq->run_depth++; - if (cwq->run_depth > 3) { - /* morton gets to eat his hat */ - printk("%s: recursion depth exceeded: %d\n", - __func__, cwq->run_depth); - dump_stack(); - } while (!list_empty(&cwq->worklist)) { struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry); @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) spin_lock_irq(&cwq->lock); cwq->current_work = NULL; } - cwq->run_depth--; spin_unlock_irq(&cwq->lock); } @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) { - int active; + int active = 0; + struct wq_barrier barr; - if (cwq->thread == current) { - /* - * Probably keventd trying to flush its own queue. So simply run - * it by hand rather than deadlocking. - */ - run_workqueue(cwq); - active = 1; - } else { - struct wq_barrier barr; + BUG_ON(cwq->thread == current); - active = 0; - spin_lock_irq(&cwq->lock); - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { - insert_wq_barrier(cwq, &barr, &cwq->worklist); - active = 1; - } - spin_unlock_irq(&cwq->lock); - - if (active) - wait_for_completion(&barr.done); + spin_lock_irq(&cwq->lock); + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { + insert_wq_barrier(cwq, &barr, &cwq->worklist); + active = 1; } + spin_unlock_irq(&cwq->lock); + + if (active) + wait_for_completion(&barr.done); return active; } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 9:14 [PATCH 2/3] workqueue: not allow recursion run_workqueue Lai Jiangshan @ 2009-01-22 9:30 ` Frederic Weisbecker 2009-01-22 9:36 ` Ingo Molnar ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Frederic Weisbecker @ 2009-01-22 9:30 UTC (permalink / raw) To: Lai Jiangshan Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: > 1) lockdep will complain when recursion run_workqueue > 2) works is not run orderly when recursion run_workqueue > > 3) BUG! > We use recursion run_workqueue to hidden deadlock when > keventd trying to flush its own queue. > > It's bug. When flush_workqueue()(nested in a work callback)returns, > the workqueue is not really flushed, the sequence statement of > this work callback will do some thing bad. > > So we should not allow workqueue trying to flush its own queue. > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > --- > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 2f44583..1129cde 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { > > struct workqueue_struct *wq; > struct task_struct *thread; > - > - int run_depth; /* Detect run_workqueue() recursion depth */ > } ____cacheline_aligned; > > /* > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); > static void run_workqueue(struct cpu_workqueue_struct *cwq) > { > spin_lock_irq(&cwq->lock); > - cwq->run_depth++; > - if (cwq->run_depth > 3) { > - /* morton gets to eat his hat */ > - printk("%s: recursion depth exceeded: %d\n", > - __func__, cwq->run_depth); > - dump_stack(); > - } > while (!list_empty(&cwq->worklist)) { > struct work_struct *work = list_entry(cwq->worklist.next, > struct work_struct, entry); > @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > spin_lock_irq(&cwq->lock); > cwq->current_work = NULL; > } > - cwq->run_depth--; > spin_unlock_irq(&cwq->lock); > } > > @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, > > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > { > - int active; > + int active = 0; > + struct wq_barrier barr; > > - if (cwq->thread == current) { > - /* > - * Probably keventd trying to flush its own queue. So simply run > - * it by hand rather than deadlocking. > - */ > - run_workqueue(cwq); > - active = 1; > - } else { > - struct wq_barrier barr; > + BUG_ON(cwq->thread == current); Hi Lai, BUG_ON seems perhaps a bit too much for such case. The system will run in an endless loop because of a mistake that will not have necessarily a fatal end. WARN_ON should be enough (plus the warn that lockdep will raise too in this case). Thanks. Frederic. > - active = 0; > - spin_lock_irq(&cwq->lock); > - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > - insert_wq_barrier(cwq, &barr, &cwq->worklist); > - active = 1; > - } > - spin_unlock_irq(&cwq->lock); > - > - if (active) > - wait_for_completion(&barr.done); > + spin_lock_irq(&cwq->lock); > + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > + insert_wq_barrier(cwq, &barr, &cwq->worklist); > + active = 1; > } > + spin_unlock_irq(&cwq->lock); > + > + if (active) > + wait_for_completion(&barr.done); > > return active; > } > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 9:30 ` Frederic Weisbecker @ 2009-01-22 9:36 ` Ingo Molnar 2009-01-22 11:06 ` Frédéric Weisbecker 2009-01-22 9:39 ` Frederic Weisbecker 2009-01-22 17:23 ` Oleg Nesterov 2 siblings, 1 reply; 20+ messages in thread From: Ingo Molnar @ 2009-01-22 9:36 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lai Jiangshan, Oleg Nesterov, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List * Frederic Weisbecker <fweisbec@gmail.com> wrote: > On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: > > 1) lockdep will complain when recursion run_workqueue > > 2) works is not run orderly when recursion run_workqueue > > > > 3) BUG! > > We use recursion run_workqueue to hidden deadlock when > > keventd trying to flush its own queue. > > > > It's bug. When flush_workqueue()(nested in a work callback)returns, > > the workqueue is not really flushed, the sequence statement of > > this work callback will do some thing bad. > > > > So we should not allow workqueue trying to flush its own queue. > > > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > > --- > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > > index 2f44583..1129cde 100644 > > --- a/kernel/workqueue.c > > +++ b/kernel/workqueue.c > > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { > > > > struct workqueue_struct *wq; > > struct task_struct *thread; > > - > > - int run_depth; /* Detect run_workqueue() recursion depth */ > > } ____cacheline_aligned; > > > > /* > > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); > > static void run_workqueue(struct cpu_workqueue_struct *cwq) > > { > > spin_lock_irq(&cwq->lock); > > - cwq->run_depth++; > > - if (cwq->run_depth > 3) { > > - /* morton gets to eat his hat */ > > - printk("%s: recursion depth exceeded: %d\n", > > - __func__, cwq->run_depth); > > - dump_stack(); > > - } > > while (!list_empty(&cwq->worklist)) { > > struct work_struct *work = list_entry(cwq->worklist.next, > > struct work_struct, entry); > > @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > > spin_lock_irq(&cwq->lock); > > cwq->current_work = NULL; > > } > > - cwq->run_depth--; > > spin_unlock_irq(&cwq->lock); > > } > > > > @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, > > > > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > > { > > - int active; > > + int active = 0; > > + struct wq_barrier barr; > > > > - if (cwq->thread == current) { > > - /* > > - * Probably keventd trying to flush its own queue. So simply run > > - * it by hand rather than deadlocking. > > - */ > > - run_workqueue(cwq); > > - active = 1; > > - } else { > > - struct wq_barrier barr; > > + BUG_ON(cwq->thread == current); > > Hi Lai, > > BUG_ON seems perhaps a bit too much for such case. The system > will run in an endless loop because of a mistake that will not have > necessarily a fatal end. > WARN_ON should be enough (plus the warn that lockdep will raise > too in this case). WARN_ONCE() is the best method usually - we want a one-time and expressive warning, not just a stack dump. (i.e. not WARN_ON and not WARN_ON_ONCE) Plus some thinking needs to be put into exiting from that function in a way that the system will still be usable enough to report the bug. Ingo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 9:36 ` Ingo Molnar @ 2009-01-22 11:06 ` Frédéric Weisbecker 2009-01-22 11:10 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Frédéric Weisbecker @ 2009-01-22 11:06 UTC (permalink / raw) To: Ingo Molnar Cc: Lai Jiangshan, Oleg Nesterov, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List 2009/1/22 Ingo Molnar <mingo@elte.hu>: > > * Frederic Weisbecker <fweisbec@gmail.com> wrote: > >> On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: >> > 1) lockdep will complain when recursion run_workqueue >> > 2) works is not run orderly when recursion run_workqueue >> > >> > 3) BUG! >> > We use recursion run_workqueue to hidden deadlock when >> > keventd trying to flush its own queue. >> > >> > It's bug. When flush_workqueue()(nested in a work callback)returns, >> > the workqueue is not really flushed, the sequence statement of >> > this work callback will do some thing bad. >> > >> > So we should not allow workqueue trying to flush its own queue. >> > >> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> >> > --- >> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c >> > index 2f44583..1129cde 100644 >> > --- a/kernel/workqueue.c >> > +++ b/kernel/workqueue.c >> > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { >> > >> > struct workqueue_struct *wq; >> > struct task_struct *thread; >> > - >> > - int run_depth; /* Detect run_workqueue() recursion depth */ >> > } ____cacheline_aligned; >> > >> > /* >> > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); >> > static void run_workqueue(struct cpu_workqueue_struct *cwq) >> > { >> > spin_lock_irq(&cwq->lock); >> > - cwq->run_depth++; >> > - if (cwq->run_depth > 3) { >> > - /* morton gets to eat his hat */ >> > - printk("%s: recursion depth exceeded: %d\n", >> > - __func__, cwq->run_depth); >> > - dump_stack(); >> > - } >> > while (!list_empty(&cwq->worklist)) { >> > struct work_struct *work = list_entry(cwq->worklist.next, >> > struct work_struct, entry); >> > @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) >> > spin_lock_irq(&cwq->lock); >> > cwq->current_work = NULL; >> > } >> > - cwq->run_depth--; >> > spin_unlock_irq(&cwq->lock); >> > } >> > >> > @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, >> > >> > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) >> > { >> > - int active; >> > + int active = 0; >> > + struct wq_barrier barr; >> > >> > - if (cwq->thread == current) { >> > - /* >> > - * Probably keventd trying to flush its own queue. So simply run >> > - * it by hand rather than deadlocking. >> > - */ >> > - run_workqueue(cwq); >> > - active = 1; >> > - } else { >> > - struct wq_barrier barr; >> > + BUG_ON(cwq->thread == current); >> >> Hi Lai, >> >> BUG_ON seems perhaps a bit too much for such case. The system >> will run in an endless loop because of a mistake that will not have >> necessarily a fatal end. >> WARN_ON should be enough (plus the warn that lockdep will raise >> too in this case). > > WARN_ONCE() is the best method usually - we want a one-time and expressive > warning, not just a stack dump. (i.e. not WARN_ON and not WARN_ON_ONCE) > > Plus some thinking needs to be put into exiting from that function in a way > that the system will still be usable enough to report the bug. > > Ingo > Ok. Oh but I haven't seen that Oleg said he prefered bug_on, because the system will deadlock instead....hmm... Or perhaps keeping the things like the old way, but with a WARN_ONCE: if (cwq->thread == current) { /* * Don't ever think to flush workqueue from a work */ WARN_ONCE(1); run_workqueue(cwq); active = 1; } And then, the workqueue will flush...so it will behave correctly but will warn on this bad developer idea of flushing from a work. Actually I don't understand when Lai says that it will actually not flush. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 11:06 ` Frédéric Weisbecker @ 2009-01-22 11:10 ` Peter Zijlstra 2009-02-05 8:18 ` Lai Jiangshan 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2009-01-22 11:10 UTC (permalink / raw) To: Frédéric Weisbecker Cc: Ingo Molnar, Lai Jiangshan, Oleg Nesterov, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On Thu, 2009-01-22 at 12:06 +0100, Frédéric Weisbecker wrote: > Ok. > Oh but I haven't seen that Oleg said he prefered bug_on, because the > system will deadlock instead....hmm... > > Or perhaps keeping the things like the old way, but with a WARN_ONCE: > > if (cwq->thread == current) { > /* > * Don't ever think to flush workqueue from a work > */ > WARN_ONCE(1); > > run_workqueue(cwq); > active = 1; > } > > And then, the workqueue will flush...so it will behave correctly but > will warn on this bad developer idea of flushing from a work. lockdep already yells at you for doing that, and developers should run with lockdep enabled -- or at least test stuff with lockdep enabled, so I'm not exactly seeing what this will buy us. > Actually I don't understand when Lai says that it will actually not flush. Yeah, his changelog is an utter mistery to many.. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 11:10 ` Peter Zijlstra @ 2009-02-05 8:18 ` Lai Jiangshan 2009-02-05 13:47 ` Frederic Weisbecker ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Lai Jiangshan @ 2009-02-05 8:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Frédéric Weisbecker, Ingo Molnar, Oleg Nesterov, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List Peter Zijlstra wrote: > On Thu, 2009-01-22 at 12:06 +0100, Frédéric Weisbecker wrote: > >> Actually I don't understand when Lai says that it will actually not flush. > > Yeah, his changelog is an utter mistery to many.. > > ---- Suppose what I wanted to say is A, but sometimes I wrote B for my poor English, and people got C when they read it. Thank you, Peter. ---- "if (cwq->thread == current)" is a narrowed checking. lockdep can perform the proper checking. I think we could hardly write some code which can perform the proper checking when lockdep is off. Why "if (cwq->thread == current)" is a narrowed checking, It hasn't tested "if (brother_cwq->thread == current)". (*brother* cwq) DEADLOCK EXAMPLE for explain my above option: (work_func0() and work_func1() are work callback, and they calls flush_workqueue()) CPU#0 CPU#1 run_workqueue() run_workqueue() work_func0() work_func1() flush_workqueue() flush_workqueue() flush_cpu_workqueue(0) . flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 DEADLOCK! So we do not allow recursion. And "BUG_ON(cwq->thread == current)" is not enough(but it's better than we don't have this line, I think). we should use lockdep to detect recursion when we develop. Answer other email-thread: Peter Zijlstra wrote: > On Thu, 2009-01-22 at 14:03 +0800, Lai Jiangshan wrote: >> void do_some_cleanup(void) >> { >> find_all_queued_work_struct_and_mark_it_old(); >> flush_workqueue(workqueue); >> /* we can destroy old work_struct for we have flushed them */ >> destroy_old_work_structs(); >> } >> >> if work->func() called do_some_cleanup(), it's very probably a bug. > > Of course it is, if only because calling flush on the same workqueue is > pretty dumb. flush_workqueue() should ensure works are finished, but this example shows the work hasn't finished, so flush_workqueue()'s code is not right. See also flush_workqueue()'s doc: * We sleep until all works which were queued on entry have been handled, * but we are not livelocked by new incoming ones. And this example show a bug(destroy the work which still be used) for recursion. So in my changlog: I said it hide deadlock: "We use recursion run_workqueue to hidden deadlock when keventd trying to flush its own queue." I said it will be bug(for flush_workqueue() and it's doc is inconsistent): "It's bug. When flush_workqueue()(nested in a work callback)returns, the workqueue is not really flushed, the sequence statement of this work callback will do some thing bad." And I concluded: "So we should not allow workqueue trying to flush its own queue." If it still mistery, I will explain more. I will change my changlog too, I sincerely hope you help me more. Thanks, Lai > > But I'm still not getting it, flush_workqueue() provides the guarantee > that all work enqueued previous to the call will be finished thereafter. In my example, flush_workqueue() can't guarantee. > > The self-flush stuff you propose to rip out doesn't violate that > guarantee afaict. > > Suppose we have a workqueue Q, with pending work W1..Wn. > > Suppose W5 will have the nested flush, it will then recursively complete > W6..Wn+i, where i accounts for any concurrent worklet additions. > > Therefore it will have completed (at least) those worklets that were > enqueued at the time flush got called. > > So, to get back at your changelog. > > 1) yes lockdep will complain -- for good reasons, and I'm all for > getting rid of this mis-feature. > > 2) I've no clue what you're on about > > 3) more mystery. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 8:18 ` Lai Jiangshan @ 2009-02-05 13:47 ` Frederic Weisbecker 2009-02-05 17:01 ` Oleg Nesterov 2009-02-06 1:46 ` Lai Jiangshan 2 siblings, 0 replies; 20+ messages in thread From: Frederic Weisbecker @ 2009-02-05 13:47 UTC (permalink / raw) To: Lai Jiangshan Cc: Peter Zijlstra, Ingo Molnar, Oleg Nesterov, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List Hi Lai, On Thu, Feb 05, 2009 at 04:18:57PM +0800, Lai Jiangshan wrote: > Peter Zijlstra wrote: > > On Thu, 2009-01-22 at 12:06 +0100, Frédéric Weisbecker wrote: > > > >> Actually I don't understand when Lai says that it will actually not flush. > > > > Yeah, his changelog is an utter mistery to many.. > > > > > > ---- > Suppose what I wanted to say is A, but sometimes I wrote B for my poor > English, and people got C when they read it. Thank you, Peter. > ---- Me too! My poor english takes me double time to explain something :-) > "if (cwq->thread == current)" is a narrowed checking. lockdep can perform > the proper checking. I think we could hardly write some code which can > perform the proper checking when lockdep is off. > > Why "if (cwq->thread == current)" is a narrowed checking, > It hasn't tested "if (brother_cwq->thread == current)". (*brother* cwq) > > DEADLOCK EXAMPLE for explain my above option: > > (work_func0() and work_func1() are work callback, and they > calls flush_workqueue()) > > CPU#0 CPU#1 > run_workqueue() run_workqueue() > work_func0() work_func1() > flush_workqueue() flush_workqueue() > flush_cpu_workqueue(0) . > flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) > waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 Heh you're right, I did not imagine this one. But this race condition should be rare, and still, lockdep should have warned before concerning the recursion flushing, hopefully assuming the developer built lockdep. > DEADLOCK! > So we do not allow recursion. > And "BUG_ON(cwq->thread == current)" is not enough(but it's better > than we don't have this line, I think). we should use lockdep to detect > recursion when we develop. > > Answer other email-thread: > > Peter Zijlstra wrote: > > On Thu, 2009-01-22 at 14:03 +0800, Lai Jiangshan wrote: > >> void do_some_cleanup(void) > >> { > >> find_all_queued_work_struct_and_mark_it_old(); > >> flush_workqueue(workqueue); > >> /* we can destroy old work_struct for we have flushed them */ > >> destroy_old_work_structs(); > >> } > >> > >> if work->func() called do_some_cleanup(), it's very probably a bug. > > > > Of course it is, if only because calling flush on the same workqueue is > > pretty dumb. > > flush_workqueue() should ensure works are finished, but this example shows > the work hasn't finished, so flush_workqueue()'s code is not right. > > See also flush_workqueue()'s doc: > * We sleep until all works which were queued on entry have been handled, > * but we are not livelocked by new incoming ones. > > And this example show a bug(destroy the work which still be used) > for recursion. So in my changlog: > > I said it hide deadlock: > "We use recursion run_workqueue to hidden deadlock when > keventd trying to flush its own queue." > > I said it will be bug(for flush_workqueue() and it's doc is inconsistent): > "It's bug. When flush_workqueue()(nested in a work callback)returns, > the workqueue is not really flushed, the sequence statement of > this work callback will do some thing bad." > > And I concluded: > "So we should not allow workqueue trying to flush its own queue." > > If it still mistery, I will explain more. > I will change my changlog too, I sincerely hope you help me more. > > Thanks, Lai > > > > > But I'm still not getting it, flush_workqueue() provides the guarantee > > that all work enqueued previous to the call will be finished thereafter. > > In my example, flush_workqueue() can't guarantee. > > > > > The self-flush stuff you propose to rip out doesn't violate that > > guarantee afaict. > > > > Suppose we have a workqueue Q, with pending work W1..Wn. > > > > Suppose W5 will have the nested flush, it will then recursively complete > > W6..Wn+i, where i accounts for any concurrent worklet additions. > > > > Therefore it will have completed (at least) those worklets that were > > enqueued at the time flush got called. > > > > So, to get back at your changelog. > > > > 1) yes lockdep will complain -- for good reasons, and I'm all for > > getting rid of this mis-feature. > > > > 2) I've no clue what you're on about > > > > 3) more mystery. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 8:18 ` Lai Jiangshan 2009-02-05 13:47 ` Frederic Weisbecker @ 2009-02-05 17:01 ` Oleg Nesterov 2009-02-05 17:24 ` Frederic Weisbecker 2009-02-06 1:20 ` Lai Jiangshan 2009-02-06 1:46 ` Lai Jiangshan 2 siblings, 2 replies; 20+ messages in thread From: Oleg Nesterov @ 2009-02-05 17:01 UTC (permalink / raw) To: Lai Jiangshan Cc: Peter Zijlstra, Frédéric Weisbecker, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On 02/05, Lai Jiangshan wrote: > > DEADLOCK EXAMPLE for explain my above option: > > (work_func0() and work_func1() are work callback, and they > calls flush_workqueue()) > > CPU#0 CPU#1 > run_workqueue() run_workqueue() > work_func0() work_func1() > flush_workqueue() flush_workqueue() > flush_cpu_workqueue(0) . > flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) > waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 > > DEADLOCK! I am not sure. Note that when work_func0() calls run_workqueue(), it will clear cwq->current_work, so another flush_ on CPU#1 will not wait for work_func0, no? But anyway. Nobody argues, "if (cwq->thread == current) {...}" code in flush_cpu_workqueue() is bad and should die. Otrherwise, we should fix the lockdep warning ;) The only problem: if we still have the users of this hack, they will deadlock. But perhaps it is time to fix them. And, if it was not clear, I do agree with this change. And Peter seems to agree as well. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 17:01 ` Oleg Nesterov @ 2009-02-05 17:24 ` Frederic Weisbecker 2009-02-05 18:00 ` Oleg Nesterov 2009-02-06 1:20 ` Lai Jiangshan 1 sibling, 1 reply; 20+ messages in thread From: Frederic Weisbecker @ 2009-02-05 17:24 UTC (permalink / raw) To: Oleg Nesterov Cc: Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On Thu, Feb 05, 2009 at 06:01:56PM +0100, Oleg Nesterov wrote: > On 02/05, Lai Jiangshan wrote: > > > > DEADLOCK EXAMPLE for explain my above option: > > > > (work_func0() and work_func1() are work callback, and they > > calls flush_workqueue()) > > > > CPU#0 CPU#1 > > run_workqueue() run_workqueue() > > work_func0() work_func1() > > flush_workqueue() flush_workqueue() > > flush_cpu_workqueue(0) . > > flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) > > waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 > > > > DEADLOCK! > > I am not sure. Note that when work_func0() calls run_workqueue(), > it will clear cwq->current_work, so another flush_ on CPU#1 will > not wait for work_func0, no? No but CPU#1 can wait for a completion that will never be done, because CWQ#0 is waiting for CWQ#1. > But anyway. Nobody argues, "if (cwq->thread == current) {...}" code in > flush_cpu_workqueue() is bad and should die. Otrherwise, we should > fix the lockdep warning ;) > > The only problem: if we still have the users of this hack, they will > deadlock. But perhaps it is time to fix them. > > And, if it was not clear, I do agree with this change. And Peter > seems to agree as well. > > Oleg. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 17:24 ` Frederic Weisbecker @ 2009-02-05 18:00 ` Oleg Nesterov 0 siblings, 0 replies; 20+ messages in thread From: Oleg Nesterov @ 2009-02-05 18:00 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On 02/05, Frederic Weisbecker wrote: > > On Thu, Feb 05, 2009 at 06:01:56PM +0100, Oleg Nesterov wrote: > > On 02/05, Lai Jiangshan wrote: > > > > > > DEADLOCK EXAMPLE for explain my above option: > > > > > > (work_func0() and work_func1() are work callback, and they > > > calls flush_workqueue()) > > > > > > CPU#0 CPU#1 > > > run_workqueue() run_workqueue() > > > work_func0() work_func1() > > > flush_workqueue() flush_workqueue() > > > flush_cpu_workqueue(0) . > > > flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) > > > waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 > > > > > > DEADLOCK! > > > > I am not sure. Note that when work_func0() calls run_workqueue(), > > it will clear cwq->current_work, so another flush_ on CPU#1 will > > not wait for work_func0, no? > > No but CPU#1 can wait for a completion that will never be done, because > CWQ#0 is waiting for CWQ#1. Still can't understand. When work_func0()->run_workqueue() returns, we should have no works in ->worklist and ->current_work must be NULL. If we have a barrier which was inserted before - it should be flushed. But yes, deadlock is possible, if other works come after run_workqueue() returns and before work_func1() starts the flush. Just the description is not exactly accurate, imho. And we have other problems. Just to say, nothing can guarantee that run_workqueue() will ever return. It is correct if some work_struct always re-queues itself and should be cancelled before destroy_workqueue(). Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 17:01 ` Oleg Nesterov 2009-02-05 17:24 ` Frederic Weisbecker @ 2009-02-06 1:20 ` Lai Jiangshan 2009-02-06 16:46 ` Oleg Nesterov 1 sibling, 1 reply; 20+ messages in thread From: Lai Jiangshan @ 2009-02-06 1:20 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, Frédéric Weisbecker, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List Oleg Nesterov wrote: > On 02/05, Lai Jiangshan wrote: >> DEADLOCK EXAMPLE for explain my above option: >> >> (work_func0() and work_func1() are work callback, and they >> calls flush_workqueue()) >> >> CPU#0 CPU#1 >> run_workqueue() run_workqueue() >> work_func0() work_func1() >> flush_workqueue() flush_workqueue() >> flush_cpu_workqueue(0) . >> flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) >> waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 >> >> DEADLOCK! > > I am not sure. Note that when work_func0() calls run_workqueue(), > it will clear cwq->current_work, so another flush_ on CPU#1 will > not wait for work_func0, no? cwq->current_work is changed only when !list_empty(&cwq->worklist) in run_workqueue(). so cwq->current_work may not be changed. > > But anyway. Nobody argues, "if (cwq->thread == current) {...}" code in > flush_cpu_workqueue() is bad and should die. Otrherwise, we should > fix the lockdep warning ;) > > The only problem: if we still have the users of this hack, they will > deadlock. But perhaps it is time to fix them. > > And, if it was not clear, I do agree with this change. And Peter > seems to agree as well. > > Oleg. > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-06 1:20 ` Lai Jiangshan @ 2009-02-06 16:46 ` Oleg Nesterov 2009-02-09 7:20 ` Lai Jiangshan 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2009-02-06 16:46 UTC (permalink / raw) To: Lai Jiangshan Cc: Peter Zijlstra, Frédéric Weisbecker, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On 02/06, Lai Jiangshan wrote: > > Oleg Nesterov wrote: > > On 02/05, Lai Jiangshan wrote: > >> DEADLOCK EXAMPLE for explain my above option: > >> > >> (work_func0() and work_func1() are work callback, and they > >> calls flush_workqueue()) > >> > >> CPU#0 CPU#1 > >> run_workqueue() run_workqueue() > >> work_func0() work_func1() > >> flush_workqueue() flush_workqueue() > >> flush_cpu_workqueue(0) . > >> flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) > >> waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 > >> > >> DEADLOCK! > > > > I am not sure. Note that when work_func0() calls run_workqueue(), > > it will clear cwq->current_work, so another flush_ on CPU#1 will > > not wait for work_func0, no? > > cwq->current_work is changed only when > !list_empty(&cwq->worklist) > in run_workqueue(). > > so cwq->current_work may not be changed. Ah, indeed. Thanks for correcting me! Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-06 16:46 ` Oleg Nesterov @ 2009-02-09 7:20 ` Lai Jiangshan 0 siblings, 0 replies; 20+ messages in thread From: Lai Jiangshan @ 2009-02-09 7:20 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, Frédéric Weisbecker, Ingo Molnar, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List Oleg Nesterov wrote: > On 02/06, Lai Jiangshan wrote: >> Oleg Nesterov wrote: >>> On 02/05, Lai Jiangshan wrote: >>>> DEADLOCK EXAMPLE for explain my above option: >>>> >>>> (work_func0() and work_func1() are work callback, and they >>>> calls flush_workqueue()) >>>> >>>> CPU#0 CPU#1 >>>> run_workqueue() run_workqueue() >>>> work_func0() work_func1() >>>> flush_workqueue() flush_workqueue() >>>> flush_cpu_workqueue(0) . >>>> flush_cpu_workqueue(cpu#1) flush_cpu_workqueue(cpu#0) >>>> waiting work_func1() in cpu#1 waiting work_func0 in cpu#0 >>>> >>>> DEADLOCK! >>> I am not sure. Note that when work_func0() calls run_workqueue(), >>> it will clear cwq->current_work, so another flush_ on CPU#1 will >>> not wait for work_func0, no? >> cwq->current_work is changed only when >> !list_empty(&cwq->worklist) >> in run_workqueue(). >> >> so cwq->current_work may not be changed. > > Ah, indeed. > > Thanks for correcting me! > > Oleg. > > > > Thanks, Could you Ack these patches? Lai. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-05 8:18 ` Lai Jiangshan 2009-02-05 13:47 ` Frederic Weisbecker 2009-02-05 17:01 ` Oleg Nesterov @ 2009-02-06 1:46 ` Lai Jiangshan 2009-02-09 19:14 ` Oleg Nesterov 2009-02-10 20:53 ` Andrew Morton 2 siblings, 2 replies; 20+ messages in thread From: Lai Jiangshan @ 2009-02-06 1:46 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar Cc: Frédéric Weisbecker, Oleg Nesterov, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List Hi, Ingo This is new changelog, I didn't change the patch, except use WARN_ON instead BUG_ON. Thanks, Lai From: Lai Jiangshan <laijs@cn.fujitsu.com> 1) lockdep will complain when recursion run_workqueue() 2) The recursive implement of run_workqueue() makes flush_workqueue() and it's doc are inconsistent. It may hide deadlock and other bugs. 3) recursion run_workqueue() will poison cwq->current_work, but flush_work() and __cancel_work_timer() ...etc. need reliable cwq->current_work. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> --- diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 2f44583..1129cde 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { struct workqueue_struct *wq; struct task_struct *thread; - - int run_depth; /* Detect run_workqueue() recursion depth */ } ____cacheline_aligned; /* @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); static void run_workqueue(struct cpu_workqueue_struct *cwq) { spin_lock_irq(&cwq->lock); - cwq->run_depth++; - if (cwq->run_depth > 3) { - /* morton gets to eat his hat */ - printk("%s: recursion depth exceeded: %d\n", - __func__, cwq->run_depth); - dump_stack(); - } while (!list_empty(&cwq->worklist)) { struct work_struct *work = list_entry(cwq->worklist.next, struct work_struct, entry); @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) spin_lock_irq(&cwq->lock); cwq->current_work = NULL; } - cwq->run_depth--; spin_unlock_irq(&cwq->lock); } @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) { - int active; + int active = 0; + struct wq_barrier barr; - if (cwq->thread == current) { - /* - * Probably keventd trying to flush its own queue. So simply run - * it by hand rather than deadlocking. - */ - run_workqueue(cwq); - active = 1; - } else { - struct wq_barrier barr; + WARN_ON(cwq->thread == current); - active = 0; - spin_lock_irq(&cwq->lock); - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { - insert_wq_barrier(cwq, &barr, &cwq->worklist); - active = 1; - } - spin_unlock_irq(&cwq->lock); - - if (active) - wait_for_completion(&barr.done); + spin_lock_irq(&cwq->lock); + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { + insert_wq_barrier(cwq, &barr, &cwq->worklist); + active = 1; } + spin_unlock_irq(&cwq->lock); + + if (active) + wait_for_completion(&barr.done); return active; } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-06 1:46 ` Lai Jiangshan @ 2009-02-09 19:14 ` Oleg Nesterov 2009-02-10 20:53 ` Andrew Morton 1 sibling, 0 replies; 20+ messages in thread From: Oleg Nesterov @ 2009-02-09 19:14 UTC (permalink / raw) To: Lai Jiangshan Cc: Peter Zijlstra, Ingo Molnar, Frédéric Weisbecker, Andrew Morton, Eric Dumazet, Linux Kernel Mailing List On 02/06, Lai Jiangshan wrote: > > 1) lockdep will complain when recursion run_workqueue() > 2) The recursive implement of run_workqueue() makes flush_workqueue() > and it's doc are inconsistent. It may hide deadlock and other bugs. > 3) recursion run_workqueue() will poison cwq->current_work, > but flush_work() and __cancel_work_timer() ...etc. need > reliable cwq->current_work. I think this change is good. If we still have users which call flush from work->func() they should be fixed, imho. And while I knew this recursive flush is bad, I didn't realize how bad it is until Lai spelled this. Thanks. Acked-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > --- > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 2f44583..1129cde 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { > > struct workqueue_struct *wq; > struct task_struct *thread; > - > - int run_depth; /* Detect run_workqueue() recursion depth */ > } ____cacheline_aligned; > > /* > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); > static void run_workqueue(struct cpu_workqueue_struct *cwq) > { > spin_lock_irq(&cwq->lock); > - cwq->run_depth++; > - if (cwq->run_depth > 3) { > - /* morton gets to eat his hat */ > - printk("%s: recursion depth exceeded: %d\n", > - __func__, cwq->run_depth); > - dump_stack(); > - } > while (!list_empty(&cwq->worklist)) { > struct work_struct *work = list_entry(cwq->worklist.next, > struct work_struct, entry); > @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > spin_lock_irq(&cwq->lock); > cwq->current_work = NULL; > } > - cwq->run_depth--; > spin_unlock_irq(&cwq->lock); > } > > @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, > > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > { > - int active; > + int active = 0; > + struct wq_barrier barr; > > - if (cwq->thread == current) { > - /* > - * Probably keventd trying to flush its own queue. So simply run > - * it by hand rather than deadlocking. > - */ > - run_workqueue(cwq); > - active = 1; > - } else { > - struct wq_barrier barr; > + WARN_ON(cwq->thread == current); > > - active = 0; > - spin_lock_irq(&cwq->lock); > - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > - insert_wq_barrier(cwq, &barr, &cwq->worklist); > - active = 1; > - } > - spin_unlock_irq(&cwq->lock); > - > - if (active) > - wait_for_completion(&barr.done); > + spin_lock_irq(&cwq->lock); > + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > + insert_wq_barrier(cwq, &barr, &cwq->worklist); > + active = 1; > } > + spin_unlock_irq(&cwq->lock); > + > + if (active) > + wait_for_completion(&barr.done); > > return active; > } > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-02-06 1:46 ` Lai Jiangshan 2009-02-09 19:14 ` Oleg Nesterov @ 2009-02-10 20:53 ` Andrew Morton 1 sibling, 0 replies; 20+ messages in thread From: Andrew Morton @ 2009-02-10 20:53 UTC (permalink / raw) To: Lai Jiangshan; +Cc: peterz, mingo, fweisbec, oleg, dada1, linux-kernel On Fri, 06 Feb 2009 09:46:29 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote: > Hi, Ingo > > This is new changelog, I didn't change the patch, > except use WARN_ON instead BUG_ON. > > Thanks, Lai > > From: Lai Jiangshan <laijs@cn.fujitsu.com> > > 1) lockdep will complain when recursion run_workqueue() > 2) The recursive implement of run_workqueue() makes flush_workqueue() > and it's doc are inconsistent. It may hide deadlock and other bugs. > 3) recursion run_workqueue() will poison cwq->current_work, > but flush_work() and __cancel_work_timer() ...etc. need > reliable cwq->current_work. > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > --- > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 2f44583..1129cde 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { > > struct workqueue_struct *wq; > struct task_struct *thread; > - > - int run_depth; /* Detect run_workqueue() recursion depth */ > } ____cacheline_aligned; > > /* > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); > static void run_workqueue(struct cpu_workqueue_struct *cwq) > { > spin_lock_irq(&cwq->lock); > - cwq->run_depth++; > - if (cwq->run_depth > 3) { > - /* morton gets to eat his hat */ > - printk("%s: recursion depth exceeded: %d\n", > - __func__, cwq->run_depth); > - dump_stack(); > - } I never did get to eat my hat. Do we know of any cases where there was ever any recursion here? Even single-level recursion? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 9:30 ` Frederic Weisbecker 2009-01-22 9:36 ` Ingo Molnar @ 2009-01-22 9:39 ` Frederic Weisbecker 2009-01-22 17:23 ` Oleg Nesterov 2 siblings, 0 replies; 20+ messages in thread From: Frederic Weisbecker @ 2009-01-22 9:39 UTC (permalink / raw) To: Lai Jiangshan Cc: Oleg Nesterov, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List On Thu, Jan 22, 2009 at 10:30:46AM +0100, Frederic Weisbecker wrote: > On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: > > 1) lockdep will complain when recursion run_workqueue > > 2) works is not run orderly when recursion run_workqueue > > > > 3) BUG! > > We use recursion run_workqueue to hidden deadlock when > > keventd trying to flush its own queue. > > > > It's bug. When flush_workqueue()(nested in a work callback)returns, > > the workqueue is not really flushed, the sequence statement of > > this work callback will do some thing bad. > > > > So we should not allow workqueue trying to flush its own queue. > > > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > > --- > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > > index 2f44583..1129cde 100644 > > --- a/kernel/workqueue.c > > +++ b/kernel/workqueue.c > > @@ -48,8 +48,6 @@ struct cpu_workqueue_struct { > > > > struct workqueue_struct *wq; > > struct task_struct *thread; > > - > > - int run_depth; /* Detect run_workqueue() recursion depth */ > > } ____cacheline_aligned; > > > > /* > > @@ -262,13 +260,6 @@ EXPORT_SYMBOL_GPL(queue_delayed_work_on); > > static void run_workqueue(struct cpu_workqueue_struct *cwq) > > { > > spin_lock_irq(&cwq->lock); > > - cwq->run_depth++; > > - if (cwq->run_depth > 3) { > > - /* morton gets to eat his hat */ > > - printk("%s: recursion depth exceeded: %d\n", > > - __func__, cwq->run_depth); > > - dump_stack(); > > - } > > while (!list_empty(&cwq->worklist)) { > > struct work_struct *work = list_entry(cwq->worklist.next, > > struct work_struct, entry); > > @@ -311,7 +302,6 @@ static void run_workqueue(struct cpu_workqueue_struct *cwq) > > spin_lock_irq(&cwq->lock); > > cwq->current_work = NULL; > > } > > - cwq->run_depth--; > > spin_unlock_irq(&cwq->lock); > > } > > > > @@ -368,29 +358,20 @@ static void insert_wq_barrier(struct cpu_workqueue_struct *cwq, > > > > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > > { > > - int active; > > + int active = 0; > > + struct wq_barrier barr; > > > > - if (cwq->thread == current) { > > - /* > > - * Probably keventd trying to flush its own queue. So simply run > > - * it by hand rather than deadlocking. > > - */ > > - run_workqueue(cwq); > > - active = 1; > > - } else { > > - struct wq_barrier barr; > > + BUG_ON(cwq->thread == current); > > Hi Lai, > > BUG_ON seems perhaps a bit too much for such case. The system > will run in an endless loop because of a mistake that will not have > necessarily a fatal end. > WARN_ON should be enough (plus the warn that lockdep will raise > too in this case). > > Thanks. > Frederic. And perhaps add a comment for the developers who will encounter such a warn, and then fall down in this call site while searching which warned. To easily find the reason of the WARN. cwq->thread == current is perhaps not verbose enough to help the developer finding the source of the problem. They could solve the issue and say "Doh!" more quickly if they find in a one shot sight: /* Never flush a workqueue from a work */ :-) > > > - active = 0; > > - spin_lock_irq(&cwq->lock); > > - if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > > - insert_wq_barrier(cwq, &barr, &cwq->worklist); > > - active = 1; > > - } > > - spin_unlock_irq(&cwq->lock); > > - > > - if (active) > > - wait_for_completion(&barr.done); > > + spin_lock_irq(&cwq->lock); > > + if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > > + insert_wq_barrier(cwq, &barr, &cwq->worklist); > > + active = 1; > > } > > + spin_unlock_irq(&cwq->lock); > > + > > + if (active) > > + wait_for_completion(&barr.done); > > > > return active; > > } > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 9:30 ` Frederic Weisbecker 2009-01-22 9:36 ` Ingo Molnar 2009-01-22 9:39 ` Frederic Weisbecker @ 2009-01-22 17:23 ` Oleg Nesterov 2009-01-22 17:47 ` Frédéric Weisbecker 2 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2009-01-22 17:23 UTC (permalink / raw) To: Frederic Weisbecker Cc: Lai Jiangshan, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List On 01/22, Frederic Weisbecker wrote: > > On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: > > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) > > { > > - int active; > > + int active = 0; > > + struct wq_barrier barr; > > > > - if (cwq->thread == current) { > > - /* > > - * Probably keventd trying to flush its own queue. So simply run > > - * it by hand rather than deadlocking. > > - */ > > - run_workqueue(cwq); > > - active = 1; > > - } else { > > - struct wq_barrier barr; > > + BUG_ON(cwq->thread == current); > > Hi Lai, > > BUG_ON seems perhaps a bit too much for such case. The system > will run in an endless loop because of a mistake that will not have > necessarily a fatal end. Confused. Why do you think the system will run in an endless loop? cwq-thread will exit. > WARN_ON should be enough (plus the warn that lockdep will raise > too in this case). and if cwq-thread proceeds after WARN_ON() it will be "lost" anyway because it will sleep forever. Not that I think BUG_ON() is much better, except it is more "loud". As for the patch itself, I completely agree with Peter. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 17:23 ` Oleg Nesterov @ 2009-01-22 17:47 ` Frédéric Weisbecker 2009-01-22 18:22 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Frédéric Weisbecker @ 2009-01-22 17:47 UTC (permalink / raw) To: Oleg Nesterov Cc: Lai Jiangshan, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List 2009/1/22 Oleg Nesterov <oleg@redhat.com>: > On 01/22, Frederic Weisbecker wrote: >> >> On Thu, Jan 22, 2009 at 05:14:24PM +0800, Lai Jiangshan wrote: >> > static int flush_cpu_workqueue(struct cpu_workqueue_struct *cwq) >> > { >> > - int active; >> > + int active = 0; >> > + struct wq_barrier barr; >> > >> > - if (cwq->thread == current) { >> > - /* >> > - * Probably keventd trying to flush its own queue. So simply run >> > - * it by hand rather than deadlocking. >> > - */ >> > - run_workqueue(cwq); >> > - active = 1; >> > - } else { >> > - struct wq_barrier barr; >> > + BUG_ON(cwq->thread == current); >> >> Hi Lai, >> >> BUG_ON seems perhaps a bit too much for such case. The system >> will run in an endless loop because of a mistake that will not have >> necessarily a fatal end. > > Confused. Why do you think the system will run in an endless loop? > cwq-thread will exit. Because a BUG_ON panics and then spin for ever. Yeah I shoud have said "panic", sorry... It was just to tell that a BUG_ON is the end... > >> WARN_ON should be enough (plus the warn that lockdep will raise >> too in this case). > > and if cwq-thread proceeds after WARN_ON() it will be "lost" anyway > because it will sleep forever. You want to say spin forever? Why would it? cwq->lock is unlocked at this time. If we keep the usual path: if (cwq->thread == current) { run_workqueue(cwq); active = 1; } it shouldn't hurt. > Not that I think BUG_ON() is much better, except it is more "loud". I don't think so IMHO, BUG_ON is for critical issues. Here it is not critical, the workqueue will flush but lockdep will warn because of recursion. That's all. > > As for the patch itself, I completely agree with Peter. > > Oleg. > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] workqueue: not allow recursion run_workqueue 2009-01-22 17:47 ` Frédéric Weisbecker @ 2009-01-22 18:22 ` Oleg Nesterov 0 siblings, 0 replies; 20+ messages in thread From: Oleg Nesterov @ 2009-01-22 18:22 UTC (permalink / raw) To: Frédéric Weisbecker Cc: Lai Jiangshan, Ingo Molnar, Andrew Morton, Peter Zijlstra, Eric Dumazet, Linux Kernel Mailing List On 01/22, Frédéric Weisbecker wrote: > > 2009/1/22 Oleg Nesterov <oleg@redhat.com>: > > On 01/22, Frederic Weisbecker wrote: > >> > >> BUG_ON seems perhaps a bit too much for such case. The system > >> will run in an endless loop because of a mistake that will not have > >> necessarily a fatal end. > > > > Confused. Why do you think the system will run in an endless loop? > > cwq-thread will exit. > > Because a BUG_ON panics and then spin for ever. Yeah I shoud have said "panic", > sorry... It was just to tell that a BUG_ON is the end... BUG_ON() only panics when panic_on_oops == T, no? But let me repeat, this is minor issue. I agree with WARN(). > >> WARN_ON should be enough (plus the warn that lockdep will raise > >> too in this case). > > > > and if cwq-thread proceeds after WARN_ON() it will be "lost" anyway > > because it will sleep forever. > > You want to say spin forever? > Why would it? cwq->lock is unlocked at this time. No, it will sleep forever, unless I missed something. Even if ->worklist is empty, ->current_work != NULL, we are ->current_work. We insert the barrier work and call wait_for_completion(). But nobody can do complete() except us. > If we keep the usual path: > > if (cwq->thread == current) { > run_workqueue(cwq); > active = 1; > } > > it shouldn't hurt. If we keep this path then we have the different patch ;) In that case of course BUG_ON() is overkill. But again, as Peter says, we already have the warning from lockdep. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-02-10 20:54 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-01-22 9:14 [PATCH 2/3] workqueue: not allow recursion run_workqueue Lai Jiangshan 2009-01-22 9:30 ` Frederic Weisbecker 2009-01-22 9:36 ` Ingo Molnar 2009-01-22 11:06 ` Frédéric Weisbecker 2009-01-22 11:10 ` Peter Zijlstra 2009-02-05 8:18 ` Lai Jiangshan 2009-02-05 13:47 ` Frederic Weisbecker 2009-02-05 17:01 ` Oleg Nesterov 2009-02-05 17:24 ` Frederic Weisbecker 2009-02-05 18:00 ` Oleg Nesterov 2009-02-06 1:20 ` Lai Jiangshan 2009-02-06 16:46 ` Oleg Nesterov 2009-02-09 7:20 ` Lai Jiangshan 2009-02-06 1:46 ` Lai Jiangshan 2009-02-09 19:14 ` Oleg Nesterov 2009-02-10 20:53 ` Andrew Morton 2009-01-22 9:39 ` Frederic Weisbecker 2009-01-22 17:23 ` Oleg Nesterov 2009-01-22 17:47 ` Frédéric Weisbecker 2009-01-22 18:22 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).