All of lore.kernel.org
 help / color / mirror / Atom feed
* workqueue destruction BUG_ON
@ 2010-08-24  8:55 Johannes Berg
  2010-08-24 10:24 ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24  8:55 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

Tejun, all,

In our testing with iwlwifi, we keep running into this BUG_ON:

	BUG_ON(cwq->nr_active);

in destroy_workqueue(). This is quite unhelpful, and since the code
flushes the workqueue I really don't see how this could be happening,
unless maybe there's cross-talk between this and other workqueues?

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24  8:55 workqueue destruction BUG_ON Johannes Berg
@ 2010-08-24 10:24 ` Tejun Heo
  2010-08-24 10:37   ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 10:24 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

Hello,

On 08/24/2010 10:55 AM, Johannes Berg wrote:
> In our testing with iwlwifi, we keep running into this BUG_ON:
> 
> 	BUG_ON(cwq->nr_active);
> 
> in destroy_workqueue(). This is quite unhelpful, and since the code
> flushes the workqueue I really don't see how this could be happening,
> unless maybe there's cross-talk between this and other workqueues?

Flushing the workqueue doesn't guarantee that the workqueue stays
empty.  It just flushes the currently pending works.  If there are
requeueing works or something else is queueing new works, workqueue
won't be empty.  The check is new.  Previously, workqueue code didn't
check whether the workqueue is actually empty.  Now that the worklist
is shared, we need such check in place.  There was also a similar
report on ath9k.  I think it's most likely that something is queueing
works on a dying wokrqueue in the wireless common code.  I'll prep a
debug patch to print out some details.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 10:24 ` Tejun Heo
@ 2010-08-24 10:37   ` Johannes Berg
  2010-08-24 12:35     ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 10:37 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

Hi,

> > In our testing with iwlwifi, we keep running into this BUG_ON:
> > 
> > 	BUG_ON(cwq->nr_active);
> > 
> > in destroy_workqueue(). This is quite unhelpful, and since the code
> > flushes the workqueue I really don't see how this could be happening,
> > unless maybe there's cross-talk between this and other workqueues?
> 
> Flushing the workqueue doesn't guarantee that the workqueue stays
> empty.

True.

> It just flushes the currently pending works.  If there are
> requeueing works or something else is queueing new works, workqueue
> won't be empty.  The check is new.  Previously, workqueue code didn't
> check whether the workqueue is actually empty.  Now that the worklist
> is shared, we need such check in place.  There was also a similar
> report on ath9k.  I think it's most likely that something is queueing
> works on a dying wokrqueue in the wireless common code.

I think in my iwlwifi case it's actually destroying the iwlwifi
workqueue (not sure why it even exists though).

>   I'll prep a
> debug patch to print out some details.

That'd be helpful, thanks!

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 10:37   ` Johannes Berg
@ 2010-08-24 12:35     ` Tejun Heo
  2010-08-24 13:04       ` Johannes Berg
  2010-08-24 13:10       ` Johannes Berg
  0 siblings, 2 replies; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 12:35 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

Hello,

On 08/24/2010 12:37 PM, Johannes Berg wrote:
> I think in my iwlwifi case it's actually destroying the iwlwifi
> workqueue (not sure why it even exists though).

Yeah, I'm planning on auditing each workqueue and remove unnecessary
ones.

>> I'll prep a debug patch to print out some details.
> 
> That'd be helpful, thanks!

Can you please apply the following patch and report the result?

Thanks.

>From 492a242b75b0abae3b1c17b4a654ab9ef67e612d Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Tue, 24 Aug 2010 14:22:47 +0200
Subject: [PATCH] workqueue: improve destroy_workqueue() debuggability

Now that the worklist is global, having works pending after wq
destruction can easily lead to oops and destroy_workqueue() have
several BUG_ON()s to catch these cases.  Unfortunately, BUG_ON()
doesn't tell much about how the work became pending after the final
flush_workqueue().

This patch adds WQ_DYING which is set before the final flush begins
and WARN_ON_ONCE() is triggered if a work is requested to be queued on
a dying workqueue and the request is ignored.  This clearly indicates
which caller is trying to queue a work on a dying workqueue and keeps
the system working in most cases.

Locking rule comment is updated such that the 'I' rule includes
modifying the field from destruction path.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/workqueue.h |    2 ++
 kernel/workqueue.c        |    7 ++++++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 4f9d277..c959666 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -241,6 +241,8 @@ enum {
 	WQ_HIGHPRI		= 1 << 4, /* high priority */
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu instensive workqueue */

+	WQ_DYING		= 1 << 6, /* internal: workqueue is dying */
+
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
 	WQ_DFL_ACTIVE		= WQ_MAX_ACTIVE / 2,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index cc3456f..362b50d 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -87,7 +87,8 @@ enum {
 /*
  * Structure fields follow one of the following exclusion rules.
  *
- * I: Set during initialization and read-only afterwards.
+ * I: Modifiable by initialization/destruction paths and read-only for
+ *    everyone else.
  *
  * P: Preemption protected.  Disabling preemption is enough and should
  *    only be modified and accessed from the local cpu.
@@ -944,6 +945,9 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,

 	debug_work_activate(work);

+	if (WARN_ON_ONCE(wq->flags & WQ_DYING))
+		return;
+
 	/* determine gcwq to use */
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct global_cwq *last_gcwq;
@@ -2828,6 +2832,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 {
 	unsigned int cpu;

+	wq->flags |= WQ_DYING;
 	flush_workqueue(wq);

 	/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 12:35     ` Tejun Heo
@ 2010-08-24 13:04       ` Johannes Berg
  2010-08-24 13:10       ` Johannes Berg
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 13:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 14:35 +0200, Tejun Heo wrote:

> Can you please apply the following patch and report the result?
> 
> Thanks.
> 
> From 492a242b75b0abae3b1c17b4a654ab9ef67e612d Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Tue, 24 Aug 2010 14:22:47 +0200
> Subject: [PATCH] workqueue: improve destroy_workqueue() debuggability

Thanks, will do. I couldn't reproduce right away, but I had it quite
frequently yesterday so I'll go back to what I was testing yesterday.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:10       ` Johannes Berg
@ 2010-08-24 13:07         ` Tejun Heo
  2010-08-24 13:17           ` Johannes Berg
  2010-08-24 13:19           ` Johannes Berg
  0 siblings, 2 replies; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 13:07 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

Hello,

On 08/24/2010 03:10 PM, Johannes Berg wrote:
>> From 492a242b75b0abae3b1c17b4a654ab9ef67e612d Mon Sep 17 00:00:00 2001
>> From: Tejun Heo <tj@kernel.org>
>> Date: Tue, 24 Aug 2010 14:22:47 +0200
>> Subject: [PATCH] workqueue: improve destroy_workqueue() debuggability
> 
> With this patch applied I still get just the BUG_ON:

Hmmm... weird.

> [  500.874185] ------------[ cut here ]------------
> [  500.875212] kernel BUG at kernel/workqueue.c:2849!

Are you sure you're running the patched kernel?  With the patch
applied, the BUG_ON() wouldn't be on line 2849 (on both rc1 and 2).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 12:35     ` Tejun Heo
  2010-08-24 13:04       ` Johannes Berg
@ 2010-08-24 13:10       ` Johannes Berg
  2010-08-24 13:07         ` Tejun Heo
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 13:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

Tejun,

> From 492a242b75b0abae3b1c17b4a654ab9ef67e612d Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Tue, 24 Aug 2010 14:22:47 +0200
> Subject: [PATCH] workqueue: improve destroy_workqueue() debuggability

With this patch applied I still get just the BUG_ON:

[  500.874185] ------------[ cut here ]------------
[  500.875212] kernel BUG at kernel/workqueue.c:2849!
[  500.876224] invalid opcode: 0000 [#1] PREEMPT SMP 
[  500.877256] last sysfs file: /sys/module/iwlagn/refcnt
[  500.878267] CPU 1 
[  500.878281] Modules linked in: iwlagn(-) ...
[  500.880074] 
[  500.880074] Pid: 2167, comm: modprobe Not tainted 2.6.36-rc1-wl-p2p-64225-g61066fb-dirty #18 JPTR/Satellite U505
[  500.880074] RIP: 0010:[<ffffffff81078aac>]  [<ffffffff81078aac>] destroy_workqueue+0x19c/0x1e0
[  500.880074] RSP: 0018:ffff88012587fd58  EFLAGS: 00010286
[  500.880074] RAX: 000000000000003c RBX: 0000000000000002 RCX: 000000000000005a
[  500.880074] RDX: ffff880132783d80 RSI: 0000000000000004 RDI: 00000000ffffffff
[  500.880074] RBP: ffff88012587fd68 R08: 0000000000000000 R09: 0000000000000000
[  500.880074] R10: 0000000000000003 R11: 0000000000000000 R12: ffff880123709240
[  500.880074] R13: ffff880133aa6240 R14: ffff880124423258 R15: 0000000000000282
[  500.880074] FS:  00007fcf136b2700(0000) GS:ffff88002cc00000(0000) knlGS:0000000000000000
[  500.880074] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  500.880074] CR2: 00000000006e3aa4 CR3: 00000001231fc000 CR4: 00000000000006e0
[  500.880074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  500.880074] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  500.880074] Process modprobe (pid: 2167, threadinfo ffff88012587e000, task ffff8801336c4780)
[  500.880074] Stack:
[  500.880074]  ffff880124422840 ffff880133aa61b0 ffff88012587fda8 ffffffffa040bc30
[  500.880074] <0> ffff88012587fd98 ffff880133aa6240 ffff880133aa6530 ffffffffa0417980
[  500.880074] <0> ffff880133aa61b0 0000000001c014b8 ffff88012587fdd8 ffffffff812421b4
[  500.880074] Call Trace:
[  500.880074]  [<ffffffffa040bc30>] iwl_pci_remove+0x18c/0x21f [iwlagn]
[  500.880074]  [<ffffffff812421b4>] pci_device_remove+0x54/0x120
[  500.880074]  [<ffffffff812d4ab5>] __device_release_driver+0x75/0xe0
[  500.880074]  [<ffffffff812d4bf8>] driver_detach+0xd8/0xe0
[  500.880074]  [<ffffffff812d3a58>] bus_remove_driver+0x88/0xe0
[  500.880074]  [<ffffffff812d5282>] driver_unregister+0x62/0xa0
[  500.880074]  [<ffffffff81242514>] pci_unregister_driver+0x44/0xc0
[  500.880074]  [<ffffffffa040ba9d>] iwl_exit+0x15/0x1c [iwlagn]
[  500.880074]  [<ffffffff810a1e32>] sys_delete_module+0x1a2/0x280
[  500.880074]  [<ffffffff8100c1b2>] system_call_fastpath+0x16/0x1b
[  500.880074] Code: 19 00 39 05 77 81 81 00 7f 06 41 8b 0c 24 eb b4 83 f8 04 89 c6 77 23 41 8b 0c 24 e9 28 ff ff ff 0f 0b eb fe 31 d2 e9 43 ff ff ff <0f> 0b eb fe 0f 0b eb fe 83 f8 04 89 c6 76 dd 41 8b 04 24 e9 c6 
[  500.880074] RIP  [<ffffffff81078aac>] destroy_workqueue+0x19c/0x1e0
[  500.880074]  RSP <ffff88012587fd58>
[  500.931585] ---[ end trace 06397bea564a72fe ]---

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:17           ` Johannes Berg
@ 2010-08-24 13:15             ` Tejun Heo
  2010-08-24 13:23               ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 13:15 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

Hello,

On 08/24/2010 03:17 PM, Johannes Berg wrote:
> On Tue, 2010-08-24 at 15:07 +0200, Tejun Heo wrote:
> 
>>> [  500.874185] ------------[ cut here ]------------
>>> [  500.875212] kernel BUG at kernel/workqueue.c:2849!
>>
>> Are you sure you're running the patched kernel?  With the patch
>> applied, the BUG_ON() wouldn't be on line 2849 (on both rc1 and 2).
> 
> Yes:
> 
> void destroy_workqueue(struct workqueue_struct *wq)
> {
>         unsigned int cpu;
>    
>         wq->flags |= WQ_DYING;
>         flush_workqueue(wq);
>   
>         /*
>          * wq list is used to freeze wq, remove from list after
>          * flushing is complete in case freeze races us.
>          */
>         spin_lock(&workqueue_lock);
>         list_del(&wq->list);
>         spin_unlock(&workqueue_lock);
> 
>         /* sanity check */
>         for_each_cwq_cpu(cpu, wq) {
>                 struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
>                 int i;
> 
>                 for (i = 0; i < WORK_NR_COLORS; i++)
>                         BUG_ON(cwq->nr_in_flight[i]);
> 2849:           BUG_ON(cwq->nr_active);
>                 BUG_ON(!list_empty(&cwq->delayed_works));
> 
> 
> Applying the patch reported some offset, but the kernel is just rc1 +
> wireless stuff.

I see, thanks for verifying.  I probably got confused about the line
number.  Hmm... weird.  I'll prep further debug patch but can you
please tell me what you did to trigger the bug?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:07         ` Tejun Heo
@ 2010-08-24 13:17           ` Johannes Berg
  2010-08-24 13:15             ` Tejun Heo
  2010-08-24 13:19           ` Johannes Berg
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 13:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 15:07 +0200, Tejun Heo wrote:

> > [  500.874185] ------------[ cut here ]------------
> > [  500.875212] kernel BUG at kernel/workqueue.c:2849!
> 
> Are you sure you're running the patched kernel?  With the patch
> applied, the BUG_ON() wouldn't be on line 2849 (on both rc1 and 2).

Yes:

void destroy_workqueue(struct workqueue_struct *wq)
{
        unsigned int cpu;
   
        wq->flags |= WQ_DYING;
        flush_workqueue(wq);
  
        /*
         * wq list is used to freeze wq, remove from list after
         * flushing is complete in case freeze races us.
         */
        spin_lock(&workqueue_lock);
        list_del(&wq->list);
        spin_unlock(&workqueue_lock);

        /* sanity check */
        for_each_cwq_cpu(cpu, wq) {
                struct cpu_workqueue_struct *cwq = get_cwq(cpu, wq);
                int i;

                for (i = 0; i < WORK_NR_COLORS; i++)
                        BUG_ON(cwq->nr_in_flight[i]);
2849:           BUG_ON(cwq->nr_active);
                BUG_ON(!list_empty(&cwq->delayed_works));


Applying the patch reported some offset, but the kernel is just rc1 +
wireless stuff.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:07         ` Tejun Heo
  2010-08-24 13:17           ` Johannes Berg
@ 2010-08-24 13:19           ` Johannes Berg
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 13:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 15:07 +0200, Tejun Heo wrote:

> > [  500.874185] ------------[ cut here ]------------
> > [  500.875212] kernel BUG at kernel/workqueue.c:2849!
> 
> Are you sure you're running the patched kernel?  With the patch
> applied, the BUG_ON() wouldn't be on line 2849 (on both rc1 and 2).

And this http://paste.pocoo.org/raw/253920/ is the diff between
2.6.35-rc1 and my current workqueue.c

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:15             ` Tejun Heo
@ 2010-08-24 13:23               ` Johannes Berg
  2010-08-24 14:56                 ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 13:23 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 15:15 +0200, Tejun Heo wrote:

> I see, thanks for verifying.  I probably got confused about the line
> number.  Hmm... weird.  I'll prep further debug patch but can you
> please tell me what you did to trigger the bug?

Not really sure. I think I need to trigger a firmware restart in iwlwifi
and then try to unload the module.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 13:23               ` Johannes Berg
@ 2010-08-24 14:56                 ` Tejun Heo
  2010-08-24 15:52                   ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 14:56 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

On 08/24/2010 03:23 PM, Johannes Berg wrote:
>> I see, thanks for verifying.  I probably got confused about the line
>> number.  Hmm... weird.  I'll prep further debug patch but can you
>> please tell me what you did to trigger the bug?
> 
> Not really sure. I think I need to trigger a firmware restart in iwlwifi
> and then try to unload the module.

Does the following patch fix the problem?

Thanks.

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index c959666..f11100f 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -25,18 +25,20 @@ typedef void (*work_func_t)(struct work_struct *work);

 enum {
 	WORK_STRUCT_PENDING_BIT	= 0,	/* work item is pending execution */
-	WORK_STRUCT_CWQ_BIT	= 1,	/* data points to cwq */
-	WORK_STRUCT_LINKED_BIT	= 2,	/* next work is linked to this one */
+	WORK_STRUCT_DELAYED_BIT	= 1,	/* work item is delayed */
+	WORK_STRUCT_CWQ_BIT	= 2,	/* data points to cwq */
+	WORK_STRUCT_LINKED_BIT	= 3,	/* next work is linked to this one */
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
-	WORK_STRUCT_STATIC_BIT	= 3,	/* static initializer (debugobjects) */
-	WORK_STRUCT_COLOR_SHIFT	= 4,	/* color for workqueue flushing */
+	WORK_STRUCT_STATIC_BIT	= 4,	/* static initializer (debugobjects) */
+	WORK_STRUCT_COLOR_SHIFT	= 5,	/* color for workqueue flushing */
 #else
-	WORK_STRUCT_COLOR_SHIFT	= 3,	/* color for workqueue flushing */
+	WORK_STRUCT_COLOR_SHIFT	= 4,	/* color for workqueue flushing */
 #endif

 	WORK_STRUCT_COLOR_BITS	= 4,

 	WORK_STRUCT_PENDING	= 1 << WORK_STRUCT_PENDING_BIT,
+	WORK_STRUCT_DELAYED	= 1 << WORK_STRUCT_DELAYED_BIT,
 	WORK_STRUCT_CWQ		= 1 << WORK_STRUCT_CWQ_BIT,
 	WORK_STRUCT_LINKED	= 1 << WORK_STRUCT_LINKED_BIT,
 #ifdef CONFIG_DEBUG_OBJECTS_WORK
@@ -59,8 +61,8 @@ enum {

 	/*
 	 * Reserve 7 bits off of cwq pointer w/ debugobjects turned
-	 * off.  This makes cwqs aligned to 128 bytes which isn't too
-	 * excessive while allowing 15 workqueue flush colors.
+	 * off.  This makes cwqs aligned to 256 bytes and allows 15
+	 * workqueue flush colors.
 	 */
 	WORK_STRUCT_FLAG_BITS	= WORK_STRUCT_COLOR_SHIFT +
 				  WORK_STRUCT_COLOR_BITS,
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 362b50d..a2dccfc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -941,6 +941,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	struct global_cwq *gcwq;
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
+	unsigned int work_flags;
 	unsigned long flags;

 	debug_work_activate(work);
@@ -990,14 +991,17 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	BUG_ON(!list_empty(&work->entry));

 	cwq->nr_in_flight[cwq->work_color]++;
+	work_flags = work_color_to_flags(cwq->work_color);

 	if (likely(cwq->nr_active < cwq->max_active)) {
 		cwq->nr_active++;
 		worklist = gcwq_determine_ins_pos(gcwq, cwq);
-	} else
+	} else {
+		work_flags |= WORK_STRUCT_DELAYED;
 		worklist = &cwq->delayed_works;
+	}

-	insert_work(cwq, work, worklist, work_color_to_flags(cwq->work_color));
+	insert_work(cwq, work, worklist, work_flags);

 	spin_unlock_irqrestore(&gcwq->lock, flags);
 }
@@ -1666,6 +1670,7 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
 	struct list_head *pos = gcwq_determine_ins_pos(cwq->gcwq, cwq);

 	move_linked_works(work, pos, NULL);
+	__clear_bit(WORK_STRUCT_DELAYED_BIT, work_data_bits(work));
 	cwq->nr_active++;
 }

@@ -1673,6 +1678,7 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
  * cwq_dec_nr_in_flight - decrement cwq's nr_in_flight
  * @cwq: cwq of interest
  * @color: color of work which left the queue
+ * @delayed: for a delayed work
  *
  * A work either has completed or is removed from pending queue,
  * decrement nr_in_flight of its cwq and handle workqueue flushing.
@@ -1680,19 +1686,22 @@ static void cwq_activate_first_delayed(struct cpu_workqueue_struct *cwq)
  * CONTEXT:
  * spin_lock_irq(gcwq->lock).
  */
-static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color)
+static void cwq_dec_nr_in_flight(struct cpu_workqueue_struct *cwq, int color,
+				 bool delayed)
 {
 	/* ignore uncolored works */
 	if (color == WORK_NO_COLOR)
 		return;

 	cwq->nr_in_flight[color]--;
-	cwq->nr_active--;

-	if (!list_empty(&cwq->delayed_works)) {
-		/* one down, submit a delayed one */
-		if (cwq->nr_active < cwq->max_active)
-			cwq_activate_first_delayed(cwq);
+	if (!delayed) {
+		cwq->nr_active--;
+		if (!list_empty(&cwq->delayed_works)) {
+			/* one down, submit a delayed one */
+			if (cwq->nr_active < cwq->max_active)
+				cwq_activate_first_delayed(cwq);
+		}
 	}

 	/* is flush in progress and are we at the flushing tip? */
@@ -1823,7 +1832,7 @@ __acquires(&gcwq->lock)
 	hlist_del_init(&worker->hentry);
 	worker->current_work = NULL;
 	worker->current_cwq = NULL;
-	cwq_dec_nr_in_flight(cwq, work_color);
+	cwq_dec_nr_in_flight(cwq, work_color, false);
 }

 /**
@@ -2388,7 +2397,8 @@ static int try_to_grab_pending(struct work_struct *work)
 			debug_work_deactivate(work);
 			list_del_init(&work->entry);
 			cwq_dec_nr_in_flight(get_work_cwq(work),
-					     get_work_color(work));
+				get_work_color(work),
+				*work_data_bits(work) & WORK_STRUCT_DELAYED);
 			ret = 1;
 		}
 	}

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 15:52                   ` Johannes Berg
@ 2010-08-24 15:47                     ` Tejun Heo
  2010-08-24 15:56                       ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2010-08-24 15:47 UTC (permalink / raw)
  To: Johannes Berg; +Cc: LKML

Hello,

On 08/24/2010 05:52 PM, Johannes Berg wrote:
> On Tue, 2010-08-24 at 16:56 +0200, Tejun Heo wrote:
>> On 08/24/2010 03:23 PM, Johannes Berg wrote:
>>>> I see, thanks for verifying.  I probably got confused about the line
>>>> number.  Hmm... weird.  I'll prep further debug patch but can you
>>>> please tell me what you did to trigger the bug?
>>>
>>> Not really sure. I think I need to trigger a firmware restart in iwlwifi
>>> and then try to unload the module.
>>
>> Does the following patch fix the problem?
> 
> Not sure. I can't seem to reproduce the exact conditions (firmware
> crash) right now... However I just ran into the WARN_ON_ONCE() your
> previous patch added as well, so that might explain the ath9k problem...

Can you please post the log of the WARN_ON_ONCE()?  That's something
which needs to be fixed from the caller.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 14:56                 ` Tejun Heo
@ 2010-08-24 15:52                   ` Johannes Berg
  2010-08-24 15:47                     ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 15:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 16:56 +0200, Tejun Heo wrote:
> On 08/24/2010 03:23 PM, Johannes Berg wrote:
> >> I see, thanks for verifying.  I probably got confused about the line
> >> number.  Hmm... weird.  I'll prep further debug patch but can you
> >> please tell me what you did to trigger the bug?
> > 
> > Not really sure. I think I need to trigger a firmware restart in iwlwifi
> > and then try to unload the module.
> 
> Does the following patch fix the problem?

Not sure. I can't seem to reproduce the exact conditions (firmware
crash) right now... However I just ran into the WARN_ON_ONCE() your
previous patch added as well, so that might explain the ath9k problem...

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: workqueue destruction BUG_ON
  2010-08-24 15:47                     ` Tejun Heo
@ 2010-08-24 15:56                       ` Johannes Berg
  0 siblings, 0 replies; 15+ messages in thread
From: Johannes Berg @ 2010-08-24 15:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML

On Tue, 2010-08-24 at 17:47 +0200, Tejun Heo wrote:

> > Not sure. I can't seem to reproduce the exact conditions (firmware
> > crash) right now... However I just ran into the WARN_ON_ONCE() your
> > previous patch added as well, so that might explain the ath9k problem...
> 
> Can you please post the log of the WARN_ON_ONCE()?  That's something
> which needs to be fixed from the caller.

Yeah, I know, there's a timer that never gets cancelled, I'll fix it.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-08-24 15:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-24  8:55 workqueue destruction BUG_ON Johannes Berg
2010-08-24 10:24 ` Tejun Heo
2010-08-24 10:37   ` Johannes Berg
2010-08-24 12:35     ` Tejun Heo
2010-08-24 13:04       ` Johannes Berg
2010-08-24 13:10       ` Johannes Berg
2010-08-24 13:07         ` Tejun Heo
2010-08-24 13:17           ` Johannes Berg
2010-08-24 13:15             ` Tejun Heo
2010-08-24 13:23               ` Johannes Berg
2010-08-24 14:56                 ` Tejun Heo
2010-08-24 15:52                   ` Johannes Berg
2010-08-24 15:47                     ` Tejun Heo
2010-08-24 15:56                       ` Johannes Berg
2010-08-24 13:19           ` Johannes Berg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.