net/bluetooth: workqueue destruction WARNING in hci_unregister

All of lore.kernel.org
 help / color / mirror / Atom feed

* net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-01-26 11:53 Dmitry Vyukov
  2016-01-26 12:29 ` Dmitry Vyukov
  2016-02-18 14:00 ` Jiri Slaby
  0 siblings, 2 replies; 29+ messages in thread
From: Dmitry Vyukov @ 2016-01-26 11:53 UTC (permalink / raw)
  To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet

Hello,

I've hit the following warning while running syzkaller fuzzer:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 17409 at kernel/workqueue.c:3968
destroy_workqueue+0x172/0x550()
Modules linked in:
CPU: 2 PID: 17409 Comm: syz-executor Not tainted 4.5.0-rc1+ #283
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 00000000ffffffff ffff88003665f8a0 ffffffff8299a06d 0000000000000000
 ffff88003599c740 ffffffff8643f0c0 ffff88003665f8e0 ffffffff8134fcf9
 ffffffff8139d4c2 ffffffff8643f0c0 0000000000000f80 ffff8800630c5ae8
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8299a06d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [<ffffffff8134fcf9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
 [<ffffffff8134ff29>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
 [<ffffffff8139d4c2>] destroy_workqueue+0x172/0x550 kernel/workqueue.c:3968
 [<ffffffff85a714a4>] hci_unregister_dev+0x264/0x700
net/bluetooth/hci_core.c:3162
 [<ffffffff84595ce6>] vhci_release+0x76/0xe0 drivers/bluetooth/hci_vhci.c:341
 [<ffffffff817b2376>] __fput+0x236/0x780 fs/file_table.c:208
 [<ffffffff817b2945>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff813ad760>] task_work_run+0x170/0x210 kernel/task_work.c:115
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff81358da5>] do_exit+0x8b5/0x2c60 kernel/exit.c:748
 [<ffffffff8135b2c8>] do_group_exit+0x108/0x330 kernel/exit.c:878
 [<ffffffff8137e434>] get_signal+0x5e4/0x14f0 kernel/signal.c:2307
 [<ffffffff811a1db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
 [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
arch/x86/entry/common.c:247
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
 [<ffffffff810084ea>] syscall_return_slowpath+0x2ba/0x340
arch/x86/entry/common.c:344
 [<ffffffff863597a2>] int_ret_from_sys_call+0x25/0x9f
arch/x86/entry/entry_64.S:281
---[ end trace f627386faee7426f ]---

Unfortunately I cannot reproduce it in a controlled environment, but
I've hit it twice in different VMs. So maybe if you see something
obvious there. Is it possible that something is submitted into the
workqueue between it is drained and destroyed in hci_unregister_dev?

On commit 92e963f50fc74041b5e9e744c330dca48e04f08d (Jan 24).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-01-26 11:53 net/bluetooth: workqueue destruction WARNING in hci_unregister_dev Dmitry Vyukov
@ 2016-01-26 12:29 ` Dmitry Vyukov
  2016-02-18 14:00 ` Jiri Slaby
  1 sibling, 0 replies; 29+ messages in thread
From: Dmitry Vyukov @ 2016-01-26 12:29 UTC (permalink / raw)
  To: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet

On Tue, Jan 26, 2016 at 12:53 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> I've hit the following warning while running syzkaller fuzzer:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 17409 at kernel/workqueue.c:3968
> destroy_workqueue+0x172/0x550()
> Modules linked in:
> CPU: 2 PID: 17409 Comm: syz-executor Not tainted 4.5.0-rc1+ #283
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88003665f8a0 ffffffff8299a06d 0000000000000000
>  ffff88003599c740 ffffffff8643f0c0 ffff88003665f8e0 ffffffff8134fcf9
>  ffffffff8139d4c2 ffffffff8643f0c0 0000000000000f80 ffff8800630c5ae8
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff8299a06d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff8134fcf9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>  [<ffffffff8134ff29>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>  [<ffffffff8139d4c2>] destroy_workqueue+0x172/0x550 kernel/workqueue.c:3968
>  [<ffffffff85a714a4>] hci_unregister_dev+0x264/0x700
> net/bluetooth/hci_core.c:3162
>  [<ffffffff84595ce6>] vhci_release+0x76/0xe0 drivers/bluetooth/hci_vhci.c:341
>  [<ffffffff817b2376>] __fput+0x236/0x780 fs/file_table.c:208
>  [<ffffffff817b2945>] ____fput+0x15/0x20 fs/file_table.c:244
>  [<ffffffff813ad760>] task_work_run+0x170/0x210 kernel/task_work.c:115
>  [<     inline     >] exit_task_work include/linux/task_work.h:21
>  [<ffffffff81358da5>] do_exit+0x8b5/0x2c60 kernel/exit.c:748
>  [<ffffffff8135b2c8>] do_group_exit+0x108/0x330 kernel/exit.c:878
>  [<ffffffff8137e434>] get_signal+0x5e4/0x14f0 kernel/signal.c:2307
>  [<ffffffff811a1db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
>  [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
> arch/x86/entry/common.c:247
>  [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>  [<ffffffff810084ea>] syscall_return_slowpath+0x2ba/0x340
> arch/x86/entry/common.c:344
>  [<ffffffff863597a2>] int_ret_from_sys_call+0x25/0x9f
> arch/x86/entry/entry_64.S:281
> ---[ end trace f627386faee7426f ]---
>
> Unfortunately I cannot reproduce it in a controlled environment, but
> I've hit it twice in different VMs. So maybe if you see something
> obvious there. Is it possible that something is submitted into the
> workqueue between it is drained and destroyed in hci_unregister_dev?
>
> On commit 92e963f50fc74041b5e9e744c330dca48e04f08d (Jan 24).


Wait, I was able to reproduce it by running the following program in a
parallel loop for several hours:
https://gist.githubusercontent.com/dvyukov/c15675af95e599fe6631/raw/c34117b54d0352f5df4f572d4151a94557780a9b/gistfile1.txt
I think that you just need to open /dev/vhci, because the program does
not seem to do anything useful with the descriptor.
Machine died, full log is here:
https://gist.githubusercontent.com/dvyukov/4584d70c2b4bee7ca875/raw/3794fb9f4c8e0fb2f91261a66725b042b5de4e0f/gistfile1.txt

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-01-26 11:53 net/bluetooth: workqueue destruction WARNING in hci_unregister_dev Dmitry Vyukov
  2016-01-26 12:29 ` Dmitry Vyukov
@ 2016-02-18 14:00 ` Jiri Slaby
  2016-02-18 14:22   ` Dmitry Vyukov
  1 sibling, 1 reply; 29+ messages in thread
From: Jiri Slaby @ 2016-02-18 14:00 UTC (permalink / raw)
  To: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML
  Cc: syzkaller, Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Tejun Heo, Takashi Iwai

[-- Attachment #1: Type: text/plain, Size: 3509 bytes --]

Cc Tejun (workqueues), Takashi (debug patch)

On 01/26/2016, 12:53 PM, Dmitry Vyukov wrote:
> Hello,
> 
> I've hit the following warning while running syzkaller fuzzer:

Hi,

I am hitting it over and over again using syzkaller.

> WARNING: CPU: 2 PID: 17409 at kernel/workqueue.c:3968
> destroy_workqueue+0x172/0x550()

Which of the warnings is it in your case?

I hit "(pwq != wq->dfl_pwq) && (pwq->refcnt > 1)". So I have this in the
code:
if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1))) {
        pr_info("%s: pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d\n",
                        __func__, pwq, wq->dfl_pwq,
                        pwq->refcnt);
        mutex_unlock(&wq->mutex);
        return;
}

if (WARN_ON(pwq->nr_active)) {
...

And the values are:
pwq=ffff88006e271500 wq->dfl_pwq=ffff88006e272400 pwq->refcnt=2
pwq=ffff88002ec48300 wq->dfl_pwq=ffff88002ec4bc00 pwq->refcnt=2
pwq=ffff880020f76400 wq->dfl_pwq=ffff880020f75500 pwq->refcnt=2
pwq=ffff88002a908f00 wq->dfl_pwq=ffff88002a90bc00 pwq->refcnt=1
pwq=ffff8800348e0300 wq->dfl_pwq=ffff8800348e0000 pwq->refcnt=2
pwq=ffff88006e276400 wq->dfl_pwq=ffff88006e275800 pwq->refcnt=2


Note the single "pwq->refcnt=1" in there. So this is perhaps a race?

Takashi also provided me with a debug patch included in the attached
patch. It did not trigger though.

> CPU: 2 PID: 17409 Comm: syz-executor Not tainted 4.5.0-rc1+ #283
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88003665f8a0 ffffffff8299a06d 0000000000000000
>  ffff88003599c740 ffffffff8643f0c0 ffff88003665f8e0 ffffffff8134fcf9
>  ffffffff8139d4c2 ffffffff8643f0c0 0000000000000f80 ffff8800630c5ae8
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff8299a06d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff8134fcf9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>  [<ffffffff8134ff29>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>  [<ffffffff8139d4c2>] destroy_workqueue+0x172/0x550 kernel/workqueue.c:3968
>  [<ffffffff85a714a4>] hci_unregister_dev+0x264/0x700
> net/bluetooth/hci_core.c:3162
>  [<ffffffff84595ce6>] vhci_release+0x76/0xe0 drivers/bluetooth/hci_vhci.c:341
>  [<ffffffff817b2376>] __fput+0x236/0x780 fs/file_table.c:208
>  [<ffffffff817b2945>] ____fput+0x15/0x20 fs/file_table.c:244
>  [<ffffffff813ad760>] task_work_run+0x170/0x210 kernel/task_work.c:115
>  [<     inline     >] exit_task_work include/linux/task_work.h:21
>  [<ffffffff81358da5>] do_exit+0x8b5/0x2c60 kernel/exit.c:748
>  [<ffffffff8135b2c8>] do_group_exit+0x108/0x330 kernel/exit.c:878
>  [<ffffffff8137e434>] get_signal+0x5e4/0x14f0 kernel/signal.c:2307
>  [<ffffffff811a1db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
>  [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
> arch/x86/entry/common.c:247
>  [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>  [<ffffffff810084ea>] syscall_return_slowpath+0x2ba/0x340
> arch/x86/entry/common.c:344
>  [<ffffffff863597a2>] int_ret_from_sys_call+0x25/0x9f
> arch/x86/entry/entry_64.S:281
> ---[ end trace f627386faee7426f ]---
> 
> Unfortunately I cannot reproduce it in a controlled environment, but
> I've hit it twice in different VMs. So maybe if you see something
> obvious there. Is it possible that something is submitted into the
> workqueue between it is drained and destroyed in hci_unregister_dev?
> 
> On commit 92e963f50fc74041b5e9e744c330dca48e04f08d (Jan 24).
> 

thanks,
-- 
js
suse labs

[-- Attachment #2: vhci_debug.patch --]
[-- Type: text/x-patch, Size: 3862 bytes --]

---
 include/linux/workqueue.h |    1 +
 kernel/workqueue.c        |   23 ++++++++++++++++++++---
 net/bluetooth/hci_core.c  |   19 +++++++++++++++++++
 3 files changed, 40 insertions(+), 3 deletions(-)

--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -312,6 +312,7 @@ enum {
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
+	__WQ_DESTROYING		= 1 << 19,
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1366,6 +1366,9 @@ static void __queue_work(int cpu, struct
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
 
+	if (WARN_ON(wq->flags & __WQ_DESTROYING))
+		return;
+
 	/*
 	 * While a work item is PENDING && off queue, a task trying to
 	 * steal the PENDING will busy-loop waiting for it to either get
@@ -4010,6 +4013,7 @@ void destroy_workqueue(struct workqueue_
 	int node;
 
 	/* drain it before proceeding with destruction */
+	wq->flags |= __WQ_DESTROYING;
 	drain_workqueue(wq);
 
 	/* sanity checks */
@@ -4024,9 +4028,22 @@ void destroy_workqueue(struct workqueue_
 			}
 		}
 
-		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
-		    WARN_ON(pwq->nr_active) ||
-		    WARN_ON(!list_empty(&pwq->delayed_works))) {
+		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1))) {
+			pr_info("%s: pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d\n",
+					__func__, pwq, wq->dfl_pwq,
+					pwq->refcnt);
+			mutex_unlock(&wq->mutex);
+			return;
+		}
+
+		if (WARN_ON(pwq->nr_active)) {
+			pr_info("%s: %ps\n", __func__, wq);
+			mutex_unlock(&wq->mutex);
+			return;
+		}
+
+		if (WARN_ON(!list_empty(&pwq->delayed_works))) {
+			pr_info("%s: %ps\n", __func__, wq);
 			mutex_unlock(&wq->mutex);
 			return;
 		}
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2467,6 +2467,8 @@ static void hci_cmd_timeout(struct work_
 	struct hci_dev *hdev = container_of(work, struct hci_dev,
 					    cmd_timer.work);
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
 	if (hdev->sent_cmd) {
 		struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
 		u16 opcode = __le16_to_cpu(sent->opcode);
@@ -3225,6 +3227,11 @@ int hci_recv_frame(struct hci_dev *hdev,
 		return -ENXIO;
 	}
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER))) {
+		kfree_skb(skb);
+		return -ENXIO;
+	}
+
 	if (hci_skb_pkt_type(skb) != HCI_EVENT_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_ACLDATA_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_SCODATA_PKT) {
@@ -3248,6 +3255,9 @@ EXPORT_SYMBOL(hci_recv_frame);
 /* Receive diagnostic message from HCI drivers */
 int hci_recv_diag(struct hci_dev *hdev, struct sk_buff *skb)
 {
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	/* Mark as diagnostic packet */
 	hci_skb_pkt_type(skb) = HCI_DIAG_PKT;
 
@@ -3326,6 +3336,9 @@ int hci_send_cmd(struct hci_dev *hdev, _
 {
 	struct sk_buff *skb;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	BT_DBG("%s opcode 0x%4.4x plen %d", hdev->name, opcode, plen);
 
 	skb = hci_prepare_cmd(hdev, opcode, plen, param);
@@ -3461,6 +3474,9 @@ void hci_send_acl(struct hci_chan *chan,
 {
 	struct hci_dev *hdev = chan->conn->hdev;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
 
 	hci_queue_acl(chan, &chan->data_q, skb, flags);
@@ -3474,6 +3490,9 @@ void hci_send_sco(struct hci_conn *conn,
 	struct hci_dev *hdev = conn->hdev;
 	struct hci_sco_hdr hdr;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s len %d", hdev->name, skb->len);
 
 	hdr.handle = cpu_to_le16(conn->handle);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-02-18 14:00 ` Jiri Slaby
@ 2016-02-18 14:22   ` Dmitry Vyukov
  2016-02-18 17:44     ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Vyukov @ 2016-02-18 14:22 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin, Eric Dumazet, Tejun Heo,
	Takashi Iwai

On Thu, Feb 18, 2016 at 3:00 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> Cc Tejun (workqueues), Takashi (debug patch)
>
> On 01/26/2016, 12:53 PM, Dmitry Vyukov wrote:
>> Hello,
>>
>> I've hit the following warning while running syzkaller fuzzer:
>
> Hi,
>
> I am hitting it over and over again using syzkaller.
>
>> WARNING: CPU: 2 PID: 17409 at kernel/workqueue.c:3968
>> destroy_workqueue+0x172/0x550()
>
> Which of the warnings is it in your case?


The line number points to WARN_ON(pwq->nr_active), but there can well
be a off-by-one error, so I am not sure.
If you are hitting it all the time, so probably it is the same. Or at
least if makes sense to fix yours first since it is more easily
reproducible.


> I hit "(pwq != wq->dfl_pwq) && (pwq->refcnt > 1)". So I have this in the
> code:
> if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1))) {
>         pr_info("%s: pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d\n",
>                         __func__, pwq, wq->dfl_pwq,
>                         pwq->refcnt);
>         mutex_unlock(&wq->mutex);
>         return;
> }
>
> if (WARN_ON(pwq->nr_active)) {
> ...
>
> And the values are:
> pwq=ffff88006e271500 wq->dfl_pwq=ffff88006e272400 pwq->refcnt=2
> pwq=ffff88002ec48300 wq->dfl_pwq=ffff88002ec4bc00 pwq->refcnt=2
> pwq=ffff880020f76400 wq->dfl_pwq=ffff880020f75500 pwq->refcnt=2
> pwq=ffff88002a908f00 wq->dfl_pwq=ffff88002a90bc00 pwq->refcnt=1
> pwq=ffff8800348e0300 wq->dfl_pwq=ffff8800348e0000 pwq->refcnt=2
> pwq=ffff88006e276400 wq->dfl_pwq=ffff88006e275800 pwq->refcnt=2
>
>
> Note the single "pwq->refcnt=1" in there. So this is perhaps a race?
>
> Takashi also provided me with a debug patch included in the attached
> patch. It did not trigger though.
>
>> CPU: 2 PID: 17409 Comm: syz-executor Not tainted 4.5.0-rc1+ #283
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>  00000000ffffffff ffff88003665f8a0 ffffffff8299a06d 0000000000000000
>>  ffff88003599c740 ffffffff8643f0c0 ffff88003665f8e0 ffffffff8134fcf9
>>  ffffffff8139d4c2 ffffffff8643f0c0 0000000000000f80 ffff8800630c5ae8
>> Call Trace:
>>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>>  [<ffffffff8299a06d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>>  [<ffffffff8134fcf9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>>  [<ffffffff8134ff29>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>>  [<ffffffff8139d4c2>] destroy_workqueue+0x172/0x550 kernel/workqueue.c:3968
>>  [<ffffffff85a714a4>] hci_unregister_dev+0x264/0x700
>> net/bluetooth/hci_core.c:3162
>>  [<ffffffff84595ce6>] vhci_release+0x76/0xe0 drivers/bluetooth/hci_vhci.c:341
>>  [<ffffffff817b2376>] __fput+0x236/0x780 fs/file_table.c:208
>>  [<ffffffff817b2945>] ____fput+0x15/0x20 fs/file_table.c:244
>>  [<ffffffff813ad760>] task_work_run+0x170/0x210 kernel/task_work.c:115
>>  [<     inline     >] exit_task_work include/linux/task_work.h:21
>>  [<ffffffff81358da5>] do_exit+0x8b5/0x2c60 kernel/exit.c:748
>>  [<ffffffff8135b2c8>] do_group_exit+0x108/0x330 kernel/exit.c:878
>>  [<ffffffff8137e434>] get_signal+0x5e4/0x14f0 kernel/signal.c:2307
>>  [<ffffffff811a1db3>] do_signal+0x83/0x1c90 arch/x86/kernel/signal.c:712
>>  [<ffffffff81006685>] exit_to_usermode_loop+0x1a5/0x210
>> arch/x86/entry/common.c:247
>>  [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:282
>>  [<ffffffff810084ea>] syscall_return_slowpath+0x2ba/0x340
>> arch/x86/entry/common.c:344
>>  [<ffffffff863597a2>] int_ret_from_sys_call+0x25/0x9f
>> arch/x86/entry/entry_64.S:281
>> ---[ end trace f627386faee7426f ]---
>>
>> Unfortunately I cannot reproduce it in a controlled environment, but
>> I've hit it twice in different VMs. So maybe if you see something
>> obvious there. Is it possible that something is submitted into the
>> workqueue between it is drained and destroyed in hci_unregister_dev?
>>
>> On commit 92e963f50fc74041b5e9e744c330dca48e04f08d (Jan 24).
>>
>
> thanks,
> --
> js
> suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-02-18 14:22   ` Dmitry Vyukov
@ 2016-02-18 17:44     ` Tejun Heo
  2016-02-19 10:20         ` Jiri Slaby
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2016-02-18 17:44 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jiri Slaby, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

[-- Attachment #1: Type: text/plain, Size: 339 bytes --]

Hello,

Can you please do the followings?

1. Remove WQ_MEM_RECLAIM from the affected workqueue and see whether
   the problem is reproducible.  WQ_MEM_RECLAIM on anything bluetooth
   doesn't make sense btw.  Why is it there?

2. If WQ_MEM_RECLAIM makes the issue go away, see whether the attached
   patch works too.

Thanks.

-- 
tejun

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 1183 bytes --]

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7ff5dc7..9824d4f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4012,7 +4012,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 	/* drain it before proceeding with destruction */
 	drain_workqueue(wq);
 
-	/* sanity checks */
+	/* nothing should be in flight */
 	mutex_lock(&wq->mutex);
 	for_each_pwq(pwq, wq) {
 		int i;
@@ -4024,8 +4024,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 			}
 		}
 
-		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
-		    WARN_ON(pwq->nr_active) ||
+		if (WARN_ON(pwq->nr_active) ||
 		    WARN_ON(!list_empty(&pwq->delayed_works))) {
 			mutex_unlock(&wq->mutex);
 			return;
@@ -4046,6 +4045,13 @@ void destroy_workqueue(struct workqueue_struct *wq)
 	if (wq->rescuer)
 		kthread_stop(wq->rescuer->task);
 
+	/* rescuer is gone, everything should be quiescent now */
+	WARN_ON(!list_empty(&wq->maydays));
+	mutex_lock(&wq->mutex);
+	for_each_pwq(pwq, wq)
+		WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1));
+	mutex_unlock(&wq->mutex);
+
 	if (!(wq->flags & WQ_UNBOUND)) {
 		/*
 		 * The base ref is never dropped on per-cpu pwqs.  Directly

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-02-19 10:20         ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-02-19 10:20 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin, Eric Dumazet, Takashi Iwai

On 02/18/2016, 06:44 PM, Tejun Heo wrote:
> Hello,
> 
> Can you please do the followings?
> 
> 1. Remove WQ_MEM_RECLAIM from the affected workqueue and see whether
>    the problem is reproducible.  WQ_MEM_RECLAIM on anything bluetooth
>    doesn't make sense btw.  Why is it there?
> 
> 2. If WQ_MEM_RECLAIM makes the issue go away, see whether the attached
>    patch works too.

Hello,

1. didn't help, the problem persists. So I haven't applied the patch from 2.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-02-19 10:20         ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-02-19 10:20 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On 02/18/2016, 06:44 PM, Tejun Heo wrote:
> Hello,
> 
> Can you please do the followings?
> 
> 1. Remove WQ_MEM_RECLAIM from the affected workqueue and see whether
>    the problem is reproducible.  WQ_MEM_RECLAIM on anything bluetooth
>    doesn't make sense btw.  Why is it there?
> 
> 2. If WQ_MEM_RECLAIM makes the issue go away, see whether the attached
>    patch works too.

Hello,

1. didn't help, the problem persists. So I haven't applied the patch from 2.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-02-19 10:20         ` Jiri Slaby
  (?)
@ 2016-02-19 12:10         ` Jiri Slaby
  2016-03-02 15:45           ` Tejun Heo
  -1 siblings, 1 reply; 29+ messages in thread
From: Jiri Slaby @ 2016-02-19 12:10 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin, Eric Dumazet, Takashi Iwai

On 02/19/2016, 11:20 AM, Jiri Slaby wrote:
> On 02/18/2016, 06:44 PM, Tejun Heo wrote:
>> Hello,
>>
>> Can you please do the followings?
>>
>> 1. Remove WQ_MEM_RECLAIM from the affected workqueue and see whether
>>    the problem is reproducible.  WQ_MEM_RECLAIM on anything bluetooth
>>    doesn't make sense btw.  Why is it there?
>>
>> 2. If WQ_MEM_RECLAIM makes the issue go away, see whether the attached
>>    patch works too.
> 
> Hello,
> 
> 1. didn't help, the problem persists. So I haven't applied the patch from 2.

FWIW I dumped more info about the wq:
wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>

> thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-02-19 12:10         ` Jiri Slaby
@ 2016-03-02 15:45           ` Tejun Heo
  2016-03-03  9:12               ` Jiri Slaby
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2016-03-02 15:45 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hello, Jiri.

On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
> > 1. didn't help, the problem persists. So I haven't applied the patch from 2.
> 
> FWIW I dumped more info about the wq:
> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>

Can you please print out the same info for all pwq's during shutdown?
It looks like we're leaking pwq refcnt but I can't spot a place where
that could happen on an empty pwq.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-03  9:12               ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-03  9:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hi,

On 03/02/2016, 04:45 PM, Tejun Heo wrote:
> On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
>>> 1. didn't help, the problem persists. So I haven't applied the patch from 2.
>>
>> FWIW I dumped more info about the wq:
>> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
>> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>
> 
> Can you please print out the same info for all pwq's during shutdown?
> It looks like we're leaking pwq refcnt but I can't spot a place where
> that could happen on an empty pwq.

I have not done that yet, but today, I see:
destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
   pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
     in-flight: 18568:wq_barrier_func

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-03  9:12               ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-03  9:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev,
	LKML, syzkaller, Kostya Serebryany, Alexander Potapenko,
	Sasha Levin, Eric Dumazet, Takashi Iwai

Hi,

On 03/02/2016, 04:45 PM, Tejun Heo wrote:
> On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
>>> 1. didn't help, the problem persists. So I haven't applied the patch from 2.
>>
>> FWIW I dumped more info about the wq:
>> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
>> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>
> 
> Can you please print out the same info for all pwq's during shutdown?
> It looks like we're leaking pwq refcnt but I can't spot a place where
> that could happen on an empty pwq.

I have not done that yet, but today, I see:
destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
   pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
     in-flight: 18568:wq_barrier_func

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-03-03  9:12               ` Jiri Slaby
  (?)
@ 2016-03-11 17:12               ` Tejun Heo
  2016-03-17 12:00                   ` Jiri Slaby
  -1 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2016-03-11 17:12 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hello,

Sorry about the delay.

On Thu, Mar 03, 2016 at 10:12:01AM +0100, Jiri Slaby wrote:
> On 03/02/2016, 04:45 PM, Tejun Heo wrote:
> > On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
> >>> 1. didn't help, the problem persists. So I haven't applied the patch from 2.
> >>
> >> FWIW I dumped more info about the wq:
> >> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
> >> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>
> > 
> > Can you please print out the same info for all pwq's during shutdown?
> > It looks like we're leaking pwq refcnt but I can't spot a place where
> > that could happen on an empty pwq.
> 
> I have not done that yet, but today, I see:
> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>      in-flight: 18568:wq_barrier_func

So, this means that there's flush_work() racing against workqueue
destruction, which can't be safe. :(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-17 12:00                   ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-17 12:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Hello,

On 03/11/2016, 06:12 PM, Tejun Heo wrote:
> On Thu, Mar 03, 2016 at 10:12:01AM +0100, Jiri Slaby wrote:
>> On 03/02/2016, 04:45 PM, Tejun Heo wrote:
>>> On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
>>>>> 1. didn't help, the problem persists. So I haven't applied the patch from 2.
>>>>
>>>> FWIW I dumped more info about the wq:
>>>> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
>>>> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>
>>>
>>> Can you please print out the same info for all pwq's during shutdown?
>>> It looks like we're leaking pwq refcnt but I can't spot a place where
>>> that could happen on an empty pwq.
>>
>> I have not done that yet, but today, I see:
>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>      in-flight: 18568:wq_barrier_func
> 
> So, this means that there's flush_work() racing against workqueue
> destruction, which can't be safe. :(

But I cannot trigger the WARN_ONs in the attached patch, so I am
confused how this can happen :(. (While I am still seeing the destroy
WARNINGs.)

BTW. what did you mean by dumping the states at shutdown? Is it still
relevant?

thanks,
-- 
js
suse labs

[-- Attachment #2: vhci_debug.patch --]
[-- Type: text/x-patch, Size: 7549 bytes --]

---
 include/linux/workqueue.h        |    1 +
 include/net/bluetooth/hci_core.h |    5 +++++
 kernel/reboot.c                  |    1 +
 kernel/workqueue.c               |   34 +++++++++++++++++++++++++++++++---
 net/bluetooth/hci_core.c         |   39 +++++++++++++++++++++++++++++++++++----
 5 files changed, 73 insertions(+), 7 deletions(-)

--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -312,6 +312,7 @@ enum {
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
+	__WQ_DESTROYING		= 1 << 19,
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -312,6 +312,11 @@ struct hci_dev {
 
 	struct workqueue_struct	*workqueue;
 	struct workqueue_struct	*req_workqueue;
+#define HCI_WQ_A	0
+#define HCI_WQ_D	1
+#define HCI_WQR_A	8
+#define HCI_WQR_D	9
+	unsigned long		wq_status;
 
 	struct work_struct	power_on;
 	struct delayed_work	power_off;
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -231,6 +231,7 @@ static void kernel_shutdown_prepare(enum
 		(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
 	system_state = state;
 	usermodehelper_disable();
+	show_workqueue_state();
 	device_shutdown();
 }
 /**
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1366,6 +1366,9 @@ static void __queue_work(int cpu, struct
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
 
+	if (WARN_ON(wq->flags & __WQ_DESTROYING))
+		return;
+
 	/*
 	 * While a work item is PENDING && off queue, a task trying to
 	 * steal the PENDING will busy-loop waiting for it to either get
@@ -2804,6 +2807,9 @@ static bool start_flush_work(struct work
 		pwq = worker->current_pwq;
 	}
 
+	if (WARN_ON(pwq->wq->flags & __WQ_DESTROYING))
+		return false;
+
 	check_flush_dependency(pwq->wq, work);
 
 	insert_wq_barrier(pwq, barr, work, worker);
@@ -2821,6 +2827,8 @@ static bool start_flush_work(struct work
 		lock_map_acquire_read(&pwq->wq->lockdep_map);
 	lock_map_release(&pwq->wq->lockdep_map);
 
+	WARN_ON(pwq->wq->flags & __WQ_DESTROYING);
+
 	return true;
 already_gone:
 	spin_unlock_irq(&pool->lock);
@@ -3998,6 +4006,8 @@ err_destroy:
 }
 EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
 
+static void show_pwq(struct pool_workqueue *pwq);
+
 /**
  * destroy_workqueue - safely terminate a workqueue
  * @wq: target workqueue
@@ -4010,6 +4020,7 @@ void destroy_workqueue(struct workqueue_
 	int node;
 
 	/* drain it before proceeding with destruction */
+	wq->flags |= __WQ_DESTROYING;
 	drain_workqueue(wq);
 
 	/* sanity checks */
@@ -4024,9 +4035,26 @@ void destroy_workqueue(struct workqueue_
 			}
 		}
 
-		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
-		    WARN_ON(pwq->nr_active) ||
-		    WARN_ON(!list_empty(&pwq->delayed_works))) {
+		if ((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) {
+			pr_info("%s: name='%s' pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d pwq->nr_active=%d delayed_works:",
+					__func__, wq->name, pwq, wq->dfl_pwq,
+					pwq->refcnt, pwq->nr_active);
+
+			show_pwq(pwq);
+
+			mutex_unlock(&wq->mutex);
+			WARN_ON(1);
+			return;
+		}
+
+		if (WARN_ON(pwq->nr_active)) {
+			pr_info("%s: %ps\n", __func__, wq);
+			mutex_unlock(&wq->mutex);
+			return;
+		}
+
+		if (WARN_ON(!list_empty(&pwq->delayed_works))) {
+			pr_info("%s: %ps\n", __func__, wq);
 			mutex_unlock(&wq->mutex);
 			return;
 		}
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1471,7 +1471,11 @@ int hci_dev_open(__u16 dev)
 	 * has finished. This means that error conditions like RFKILL
 	 * or no valid public or static random address apply.
 	 */
+	WARN_ON(test_bit(HCI_WQR_D, &hdev->wq_status));
+	WARN_ON(!test_bit(HCI_WQR_A, &hdev->wq_status));
 	flush_workqueue(hdev->req_workqueue);
+	WARN_ON(test_bit(HCI_WQR_D, &hdev->wq_status));
+	WARN_ON(!test_bit(HCI_WQR_A, &hdev->wq_status));
 
 	/* For controllers not using the management interface and that
 	 * are brought up using legacy ioctl, set the HCI_BONDABLE bit
@@ -2467,6 +2471,8 @@ static void hci_cmd_timeout(struct work_
 	struct hci_dev *hdev = container_of(work, struct hci_dev,
 					    cmd_timer.work);
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
 	if (hdev->sent_cmd) {
 		struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
 		u16 opcode = __le16_to_cpu(sent->opcode);
@@ -3043,15 +3049,17 @@ int hci_register_dev(struct hci_dev *hde
 
 	BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus);
 
-	hdev->workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND |
-					  WQ_MEM_RECLAIM, 1, hdev->name);
+	hdev->workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND
+					  , 1, hdev->name);
+	set_bit(HCI_WQ_A, &hdev->wq_status);
 	if (!hdev->workqueue) {
 		error = -ENOMEM;
 		goto err;
 	}
 
-	hdev->req_workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND |
-					      WQ_MEM_RECLAIM, 1, hdev->name);
+	hdev->req_workqueue = alloc_workqueue("req_%s", WQ_HIGHPRI | WQ_UNBOUND
+					      , 1, hdev->name);
+	set_bit(HCI_WQR_A, &hdev->wq_status);
 	if (!hdev->req_workqueue) {
 		destroy_workqueue(hdev->workqueue);
 		error = -ENOMEM;
@@ -3108,8 +3116,12 @@ int hci_register_dev(struct hci_dev *hde
 	return id;
 
 err_wqueue:
+	set_bit(HCI_WQ_D, &hdev->wq_status);
 	destroy_workqueue(hdev->workqueue);
+	clear_bit(HCI_WQ_A, &hdev->wq_status);
+	set_bit(HCI_WQR_D, &hdev->wq_status);
 	destroy_workqueue(hdev->req_workqueue);
+	clear_bit(HCI_WQR_A, &hdev->wq_status);
 err:
 	ida_simple_remove(&hci_index_ida, hdev->id);
 
@@ -3160,7 +3172,9 @@ void hci_unregister_dev(struct hci_dev *
 	debugfs_remove_recursive(hdev->debugfs);
 
 	destroy_workqueue(hdev->workqueue);
+	set_bit(HCI_WQR_D, &hdev->wq_status);
 	destroy_workqueue(hdev->req_workqueue);
+	clear_bit(HCI_WQR_A, &hdev->wq_status);
 
 	hci_dev_lock(hdev);
 	hci_bdaddr_list_clear(&hdev->blacklist);
@@ -3225,6 +3239,11 @@ int hci_recv_frame(struct hci_dev *hdev,
 		return -ENXIO;
 	}
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER))) {
+		kfree_skb(skb);
+		return -ENXIO;
+	}
+
 	if (hci_skb_pkt_type(skb) != HCI_EVENT_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_ACLDATA_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_SCODATA_PKT) {
@@ -3248,6 +3267,9 @@ EXPORT_SYMBOL(hci_recv_frame);
 /* Receive diagnostic message from HCI drivers */
 int hci_recv_diag(struct hci_dev *hdev, struct sk_buff *skb)
 {
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	/* Mark as diagnostic packet */
 	hci_skb_pkt_type(skb) = HCI_DIAG_PKT;
 
@@ -3326,6 +3348,9 @@ int hci_send_cmd(struct hci_dev *hdev, _
 {
 	struct sk_buff *skb;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	BT_DBG("%s opcode 0x%4.4x plen %d", hdev->name, opcode, plen);
 
 	skb = hci_prepare_cmd(hdev, opcode, plen, param);
@@ -3461,6 +3486,9 @@ void hci_send_acl(struct hci_chan *chan,
 {
 	struct hci_dev *hdev = chan->conn->hdev;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
 
 	hci_queue_acl(chan, &chan->data_q, skb, flags);
@@ -3474,6 +3502,9 @@ void hci_send_sco(struct hci_conn *conn,
 	struct hci_dev *hdev = conn->hdev;
 	struct hci_sco_hdr hdr;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s len %d", hdev->name, skb->len);
 
 	hdr.handle = cpu_to_le16(conn->handle);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-17 12:00                   ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-17 12:00 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev,
	LKML, syzkaller, Kostya Serebryany, Alexander Potapenko,
	Sasha Levin, Eric Dumazet, Takashi Iwai

[-- Attachment #1: Type: text/plain, Size: 1327 bytes --]

Hello,

On 03/11/2016, 06:12 PM, Tejun Heo wrote:
> On Thu, Mar 03, 2016 at 10:12:01AM +0100, Jiri Slaby wrote:
>> On 03/02/2016, 04:45 PM, Tejun Heo wrote:
>>> On Fri, Feb 19, 2016 at 01:10:00PM +0100, Jiri Slaby wrote:
>>>>> 1. didn't help, the problem persists. So I haven't applied the patch from 2.
>>>>
>>>> FWIW I dumped more info about the wq:
>>>> wq->name='hci0' pwq=ffff8800390d7600 wq->dfl_pwq=ffff8800390d5200
>>>> pwq->refcnt=2 pwq->nr_active=0 delayed_works: <nothing>
>>>
>>> Can you please print out the same info for all pwq's during shutdown?
>>> It looks like we're leaking pwq refcnt but I can't spot a place where
>>> that could happen on an empty pwq.
>>
>> I have not done that yet, but today, I see:
>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>      in-flight: 18568:wq_barrier_func
> 
> So, this means that there's flush_work() racing against workqueue
> destruction, which can't be safe. :(

But I cannot trigger the WARN_ONs in the attached patch, so I am
confused how this can happen :(. (While I am still seeing the destroy
WARNINGs.)

BTW. what did you mean by dumping the states at shutdown? Is it still
relevant?

thanks,
-- 
js
suse labs

[-- Attachment #2: vhci_debug.patch --]
[-- Type: text/x-patch, Size: 7549 bytes --]

---
 include/linux/workqueue.h        |    1 +
 include/net/bluetooth/hci_core.h |    5 +++++
 kernel/reboot.c                  |    1 +
 kernel/workqueue.c               |   34 +++++++++++++++++++++++++++++++---
 net/bluetooth/hci_core.c         |   39 +++++++++++++++++++++++++++++++++++----
 5 files changed, 73 insertions(+), 7 deletions(-)

--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -312,6 +312,7 @@ enum {
 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
 	__WQ_LEGACY		= 1 << 18, /* internal: create*_workqueue() */
+	__WQ_DESTROYING		= 1 << 19,
 
 	WQ_MAX_ACTIVE		= 512,	  /* I like 512, better ideas? */
 	WQ_MAX_UNBOUND_PER_CPU	= 4,	  /* 4 * #cpus for unbound wq */
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -312,6 +312,11 @@ struct hci_dev {
 
 	struct workqueue_struct	*workqueue;
 	struct workqueue_struct	*req_workqueue;
+#define HCI_WQ_A	0
+#define HCI_WQ_D	1
+#define HCI_WQR_A	8
+#define HCI_WQR_D	9
+	unsigned long		wq_status;
 
 	struct work_struct	power_on;
 	struct delayed_work	power_off;
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -231,6 +231,7 @@ static void kernel_shutdown_prepare(enum
 		(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
 	system_state = state;
 	usermodehelper_disable();
+	show_workqueue_state();
 	device_shutdown();
 }
 /**
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1366,6 +1366,9 @@ static void __queue_work(int cpu, struct
 	unsigned int work_flags;
 	unsigned int req_cpu = cpu;
 
+	if (WARN_ON(wq->flags & __WQ_DESTROYING))
+		return;
+
 	/*
 	 * While a work item is PENDING && off queue, a task trying to
 	 * steal the PENDING will busy-loop waiting for it to either get
@@ -2804,6 +2807,9 @@ static bool start_flush_work(struct work
 		pwq = worker->current_pwq;
 	}
 
+	if (WARN_ON(pwq->wq->flags & __WQ_DESTROYING))
+		return false;
+
 	check_flush_dependency(pwq->wq, work);
 
 	insert_wq_barrier(pwq, barr, work, worker);
@@ -2821,6 +2827,8 @@ static bool start_flush_work(struct work
 		lock_map_acquire_read(&pwq->wq->lockdep_map);
 	lock_map_release(&pwq->wq->lockdep_map);
 
+	WARN_ON(pwq->wq->flags & __WQ_DESTROYING);
+
 	return true;
 already_gone:
 	spin_unlock_irq(&pool->lock);
@@ -3998,6 +4006,8 @@ err_destroy:
 }
 EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
 
+static void show_pwq(struct pool_workqueue *pwq);
+
 /**
  * destroy_workqueue - safely terminate a workqueue
  * @wq: target workqueue
@@ -4010,6 +4020,7 @@ void destroy_workqueue(struct workqueue_
 	int node;
 
 	/* drain it before proceeding with destruction */
+	wq->flags |= __WQ_DESTROYING;
 	drain_workqueue(wq);
 
 	/* sanity checks */
@@ -4024,9 +4035,26 @@ void destroy_workqueue(struct workqueue_
 			}
 		}
 
-		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
-		    WARN_ON(pwq->nr_active) ||
-		    WARN_ON(!list_empty(&pwq->delayed_works))) {
+		if ((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) {
+			pr_info("%s: name='%s' pwq=%p wq->dfl_pwq=%p pwq->refcnt=%d pwq->nr_active=%d delayed_works:",
+					__func__, wq->name, pwq, wq->dfl_pwq,
+					pwq->refcnt, pwq->nr_active);
+
+			show_pwq(pwq);
+
+			mutex_unlock(&wq->mutex);
+			WARN_ON(1);
+			return;
+		}
+
+		if (WARN_ON(pwq->nr_active)) {
+			pr_info("%s: %ps\n", __func__, wq);
+			mutex_unlock(&wq->mutex);
+			return;
+		}
+
+		if (WARN_ON(!list_empty(&pwq->delayed_works))) {
+			pr_info("%s: %ps\n", __func__, wq);
 			mutex_unlock(&wq->mutex);
 			return;
 		}
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1471,7 +1471,11 @@ int hci_dev_open(__u16 dev)
 	 * has finished. This means that error conditions like RFKILL
 	 * or no valid public or static random address apply.
 	 */
+	WARN_ON(test_bit(HCI_WQR_D, &hdev->wq_status));
+	WARN_ON(!test_bit(HCI_WQR_A, &hdev->wq_status));
 	flush_workqueue(hdev->req_workqueue);
+	WARN_ON(test_bit(HCI_WQR_D, &hdev->wq_status));
+	WARN_ON(!test_bit(HCI_WQR_A, &hdev->wq_status));
 
 	/* For controllers not using the management interface and that
 	 * are brought up using legacy ioctl, set the HCI_BONDABLE bit
@@ -2467,6 +2471,8 @@ static void hci_cmd_timeout(struct work_
 	struct hci_dev *hdev = container_of(work, struct hci_dev,
 					    cmd_timer.work);
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
 	if (hdev->sent_cmd) {
 		struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
 		u16 opcode = __le16_to_cpu(sent->opcode);
@@ -3043,15 +3049,17 @@ int hci_register_dev(struct hci_dev *hde
 
 	BT_DBG("%p name %s bus %d", hdev, hdev->name, hdev->bus);
 
-	hdev->workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND |
-					  WQ_MEM_RECLAIM, 1, hdev->name);
+	hdev->workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND
+					  , 1, hdev->name);
+	set_bit(HCI_WQ_A, &hdev->wq_status);
 	if (!hdev->workqueue) {
 		error = -ENOMEM;
 		goto err;
 	}
 
-	hdev->req_workqueue = alloc_workqueue("%s", WQ_HIGHPRI | WQ_UNBOUND |
-					      WQ_MEM_RECLAIM, 1, hdev->name);
+	hdev->req_workqueue = alloc_workqueue("req_%s", WQ_HIGHPRI | WQ_UNBOUND
+					      , 1, hdev->name);
+	set_bit(HCI_WQR_A, &hdev->wq_status);
 	if (!hdev->req_workqueue) {
 		destroy_workqueue(hdev->workqueue);
 		error = -ENOMEM;
@@ -3108,8 +3116,12 @@ int hci_register_dev(struct hci_dev *hde
 	return id;
 
 err_wqueue:
+	set_bit(HCI_WQ_D, &hdev->wq_status);
 	destroy_workqueue(hdev->workqueue);
+	clear_bit(HCI_WQ_A, &hdev->wq_status);
+	set_bit(HCI_WQR_D, &hdev->wq_status);
 	destroy_workqueue(hdev->req_workqueue);
+	clear_bit(HCI_WQR_A, &hdev->wq_status);
 err:
 	ida_simple_remove(&hci_index_ida, hdev->id);
 
@@ -3160,7 +3172,9 @@ void hci_unregister_dev(struct hci_dev *
 	debugfs_remove_recursive(hdev->debugfs);
 
 	destroy_workqueue(hdev->workqueue);
+	set_bit(HCI_WQR_D, &hdev->wq_status);
 	destroy_workqueue(hdev->req_workqueue);
+	clear_bit(HCI_WQR_A, &hdev->wq_status);
 
 	hci_dev_lock(hdev);
 	hci_bdaddr_list_clear(&hdev->blacklist);
@@ -3225,6 +3239,11 @@ int hci_recv_frame(struct hci_dev *hdev,
 		return -ENXIO;
 	}
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER))) {
+		kfree_skb(skb);
+		return -ENXIO;
+	}
+
 	if (hci_skb_pkt_type(skb) != HCI_EVENT_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_ACLDATA_PKT &&
 	    hci_skb_pkt_type(skb) != HCI_SCODATA_PKT) {
@@ -3248,6 +3267,9 @@ EXPORT_SYMBOL(hci_recv_frame);
 /* Receive diagnostic message from HCI drivers */
 int hci_recv_diag(struct hci_dev *hdev, struct sk_buff *skb)
 {
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	/* Mark as diagnostic packet */
 	hci_skb_pkt_type(skb) = HCI_DIAG_PKT;
 
@@ -3326,6 +3348,9 @@ int hci_send_cmd(struct hci_dev *hdev, _
 {
 	struct sk_buff *skb;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return -ENXIO;
+
 	BT_DBG("%s opcode 0x%4.4x plen %d", hdev->name, opcode, plen);
 
 	skb = hci_prepare_cmd(hdev, opcode, plen, param);
@@ -3461,6 +3486,9 @@ void hci_send_acl(struct hci_chan *chan,
 {
 	struct hci_dev *hdev = chan->conn->hdev;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s chan %p flags 0x%4.4x", hdev->name, chan, flags);
 
 	hci_queue_acl(chan, &chan->data_q, skb, flags);
@@ -3474,6 +3502,9 @@ void hci_send_sco(struct hci_conn *conn,
 	struct hci_dev *hdev = conn->hdev;
 	struct hci_sco_hdr hdr;
 
+	if (WARN_ON(hci_dev_test_flag(hdev, HCI_UNREGISTER)))
+		return;
+
 	BT_DBG("%s len %d", hdev->name, skb->len);
 
 	hdr.handle = cpu_to_le16(conn->handle);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-18 20:52                     ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2016-03-18 20:52 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hello, Jiri.

On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
> >> I have not done that yet, but today, I see:
> >> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
> >> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
> >>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
> >>      in-flight: 18568:wq_barrier_func
> > 
> > So, this means that there's flush_work() racing against workqueue
> > destruction, which can't be safe. :(
> 
> But I cannot trigger the WARN_ONs in the attached patch, so I am
> confused how this can happen :(. (While I am still seeing the destroy
> WARNINGs.)

So, no operations should be in progress when destroy_workqueue() is
called.  If somebody was flushing a work item, the flush call must
have returned before destroy_workqueue() was invoked, which doesn't
seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
the above triggers?  There must be a task which is flushing a work
item there and it shouldn't be difficult to pinpoint what's going on
from it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-18 20:52                     ` Tejun Heo
  0 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2016-03-18 20:52 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev,
	LKML, syzkaller, Kostya Serebryany, Alexander Potapenko,
	Sasha Levin, Eric Dumazet, Takashi Iwai

Hello, Jiri.

On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
> >> I have not done that yet, but today, I see:
> >> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
> >> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
> >>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
> >>      in-flight: 18568:wq_barrier_func
> > 
> > So, this means that there's flush_work() racing against workqueue
> > destruction, which can't be safe. :(
> 
> But I cannot trigger the WARN_ONs in the attached patch, so I am
> confused how this can happen :(. (While I am still seeing the destroy
> WARNINGs.)

So, no operations should be in progress when destroy_workqueue() is
called.  If somebody was flushing a work item, the flush call must
have returned before destroy_workqueue() was invoked, which doesn't
seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
the above triggers?  There must be a task which is flushing a work
item there and it shouldn't be difficult to pinpoint what's going on
from it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-03-18 20:52                     ` Tejun Heo
  (?)
@ 2016-03-21 15:58                     ` Jiri Slaby
  2016-03-22  8:09                         ` Jiri Slaby
  -1 siblings, 1 reply; 29+ messages in thread
From: Jiri Slaby @ 2016-03-21 15:58 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hello,

On 03/18/2016, 09:52 PM, Tejun Heo wrote:
> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>> I have not done that yet, but today, I see:
>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>      in-flight: 18568:wq_barrier_func
>>>
>>> So, this means that there's flush_work() racing against workqueue
>>> destruction, which can't be safe. :(
>>
>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>> confused how this can happen :(. (While I am still seeing the destroy
>> WARNINGs.)
> 
> So, no operations should be in progress when destroy_workqueue() is
> called.  If somebody was flushing a work item, the flush call must
> have returned before destroy_workqueue() was invoked, which doesn't
> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
> the above triggers?  There must be a task which is flushing a work
> item there and it shouldn't be difficult to pinpoint what's going on
> from it.

The output of sysrq-t is here (> 200k), but I cannot see anything
suspicious in it:
http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt

This is what the code does now:
+               if ((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) {
+                       pr_info("%s: name='%s' pwq=%p wq->dfl_pwq=%p
pwq->refcnt=%d pwq->nr_active=%d delayed_works:",
+                                       __func__, wq->name, pwq,
wq->dfl_pwq,
+                                       pwq->refcnt, pwq->nr_active);
+
+                       show_pwq(pwq);
+
+                       mutex_unlock(&wq->mutex);
+                       show_state();
+                       show_workqueue_state();
+                       WARN_ON(1);
+                       return;
+               }

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-22  8:09                         ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-22  8:09 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Sasha Levin, Eric Dumazet, Takashi Iwai

On 03/21/2016, 04:58 PM, Jiri Slaby wrote:
> Hello,
> 
> On 03/18/2016, 09:52 PM, Tejun Heo wrote:
>> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>>> I have not done that yet, but today, I see:
>>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>>      in-flight: 18568:wq_barrier_func
>>>>
>>>> So, this means that there's flush_work() racing against workqueue
>>>> destruction, which can't be safe. :(
>>>
>>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>>> confused how this can happen :(. (While I am still seeing the destroy
>>> WARNINGs.)
>>
>> So, no operations should be in progress when destroy_workqueue() is
>> called.  If somebody was flushing a work item, the flush call must
>> have returned before destroy_workqueue() was invoked, which doesn't
>> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
>> the above triggers?  There must be a task which is flushing a work
>> item there and it shouldn't be difficult to pinpoint what's going on
>> from it.
> 
> The output of sysrq-t is here (> 200k), but I cannot see anything
> suspicious in it:
> http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt

Hmm, so I seem I cannot reproduce with this hunk:
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3139,10 +3139,10 @@ void hci_unregister_dev(struct hci_dev *hdev)
        list_del(&hdev->list);
        write_unlock(&hci_dev_list_lock);

-       hci_dev_do_close(hdev);
-
        cancel_work_sync(&hdev->power_on);

+       hci_dev_do_close(hdev);
+
        if (!test_bit(HCI_INIT, &hdev->flags) &&
            !hci_dev_test_flag(hdev, HCI_SETUP) &&
            !hci_dev_test_flag(hdev, HCI_CONFIG)) {



I cannot explain why though. I do not see how it matters in this
particular case...

Dmitry, could you apply it too? But I don't know how often you see the
warning.

PS. next on the table is the gsm tty warning.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-22  8:09                         ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-03-22  8:09 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On 03/21/2016, 04:58 PM, Jiri Slaby wrote:
> Hello,
> 
> On 03/18/2016, 09:52 PM, Tejun Heo wrote:
>> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>>> I have not done that yet, but today, I see:
>>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>>      in-flight: 18568:wq_barrier_func
>>>>
>>>> So, this means that there's flush_work() racing against workqueue
>>>> destruction, which can't be safe. :(
>>>
>>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>>> confused how this can happen :(. (While I am still seeing the destroy
>>> WARNINGs.)
>>
>> So, no operations should be in progress when destroy_workqueue() is
>> called.  If somebody was flushing a work item, the flush call must
>> have returned before destroy_workqueue() was invoked, which doesn't
>> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
>> the above triggers?  There must be a task which is flushing a work
>> item there and it shouldn't be difficult to pinpoint what's going on
>> from it.
> 
> The output of sysrq-t is here (> 200k), but I cannot see anything
> suspicious in it:
> http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt

Hmm, so I seem I cannot reproduce with this hunk:
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3139,10 +3139,10 @@ void hci_unregister_dev(struct hci_dev *hdev)
        list_del(&hdev->list);
        write_unlock(&hci_dev_list_lock);

-       hci_dev_do_close(hdev);
-
        cancel_work_sync(&hdev->power_on);

+       hci_dev_do_close(hdev);
+
        if (!test_bit(HCI_INIT, &hdev->flags) &&
            !hci_dev_test_flag(hdev, HCI_SETUP) &&
            !hci_dev_test_flag(hdev, HCI_CONFIG)) {



I cannot explain why though. I do not see how it matters in this
particular case...

Dmitry, could you apply it too? But I don't know how often you see the
warning.

PS. next on the table is the gsm tty warning.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-22 12:32                           ` Dmitry Vyukov
  0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Vyukov @ 2016-03-22 12:32 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Tejun Heo, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On Tue, Mar 22, 2016 at 9:09 AM, Jiri Slaby <jslaby@suse.cz> wrote:
> On 03/21/2016, 04:58 PM, Jiri Slaby wrote:
>> Hello,
>>
>> On 03/18/2016, 09:52 PM, Tejun Heo wrote:
>>> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>>>> I have not done that yet, but today, I see:
>>>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>>>      in-flight: 18568:wq_barrier_func
>>>>>
>>>>> So, this means that there's flush_work() racing against workqueue
>>>>> destruction, which can't be safe. :(
>>>>
>>>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>>>> confused how this can happen :(. (While I am still seeing the destroy
>>>> WARNINGs.)
>>>
>>> So, no operations should be in progress when destroy_workqueue() is
>>> called.  If somebody was flushing a work item, the flush call must
>>> have returned before destroy_workqueue() was invoked, which doesn't
>>> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
>>> the above triggers?  There must be a task which is flushing a work
>>> item there and it shouldn't be difficult to pinpoint what's going on
>>> from it.
>>
>> The output of sysrq-t is here (> 200k), but I cannot see anything
>> suspicious in it:
>> http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt
>
> Hmm, so I seem I cannot reproduce with this hunk:
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3139,10 +3139,10 @@ void hci_unregister_dev(struct hci_dev *hdev)
>         list_del(&hdev->list);
>         write_unlock(&hci_dev_list_lock);
>
> -       hci_dev_do_close(hdev);
> -
>         cancel_work_sync(&hdev->power_on);
>
> +       hci_dev_do_close(hdev);
> +
>         if (!test_bit(HCI_INIT, &hdev->flags) &&
>             !hci_dev_test_flag(hdev, HCI_SETUP) &&
>             !hci_dev_test_flag(hdev, HCI_CONFIG)) {
>
>
>
> I cannot explain why though. I do not see how it matters in this
> particular case...
>
> Dmitry, could you apply it too? But I don't know how often you see the
> warning.

I've seen it only several times in several months, so I don't it will
be helpful.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-03-22 12:32                           ` Dmitry Vyukov
  0 siblings, 0 replies; 29+ messages in thread
From: Dmitry Vyukov @ 2016-03-22 12:32 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Tejun Heo, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On Tue, Mar 22, 2016 at 9:09 AM, Jiri Slaby <jslaby-AlSwsSmVLrQ@public.gmane.org> wrote:
> On 03/21/2016, 04:58 PM, Jiri Slaby wrote:
>> Hello,
>>
>> On 03/18/2016, 09:52 PM, Tejun Heo wrote:
>>> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>>>> I have not done that yet, but today, I see:
>>>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>>>      in-flight: 18568:wq_barrier_func
>>>>>
>>>>> So, this means that there's flush_work() racing against workqueue
>>>>> destruction, which can't be safe. :(
>>>>
>>>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>>>> confused how this can happen :(. (While I am still seeing the destroy
>>>> WARNINGs.)
>>>
>>> So, no operations should be in progress when destroy_workqueue() is
>>> called.  If somebody was flushing a work item, the flush call must
>>> have returned before destroy_workqueue() was invoked, which doesn't
>>> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
>>> the above triggers?  There must be a task which is flushing a work
>>> item there and it shouldn't be difficult to pinpoint what's going on
>>> from it.
>>
>> The output of sysrq-t is here (> 200k), but I cannot see anything
>> suspicious in it:
>> http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt
>
> Hmm, so I seem I cannot reproduce with this hunk:
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3139,10 +3139,10 @@ void hci_unregister_dev(struct hci_dev *hdev)
>         list_del(&hdev->list);
>         write_unlock(&hci_dev_list_lock);
>
> -       hci_dev_do_close(hdev);
> -
>         cancel_work_sync(&hdev->power_on);
>
> +       hci_dev_do_close(hdev);
> +
>         if (!test_bit(HCI_INIT, &hdev->flags) &&
>             !hci_dev_test_flag(hdev, HCI_SETUP) &&
>             !hci_dev_test_flag(hdev, HCI_CONFIG)) {
>
>
>
> I cannot explain why though. I do not see how it matters in this
> particular case...
>
> Dmitry, could you apply it too? But I don't know how often you see the
> warning.

I've seen it only several times in several months, so I don't it will
be helpful.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-03-22 12:32                           ` Dmitry Vyukov
  (?)
@ 2016-09-03 10:58                           ` Dmitry Vyukov
  2016-09-05 13:08                             ` Tejun Heo
  -1 siblings, 1 reply; 29+ messages in thread
From: Dmitry Vyukov @ 2016-09-03 10:58 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Tejun Heo, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On Tue, Mar 22, 2016 at 1:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Mar 22, 2016 at 9:09 AM, Jiri Slaby <jslaby@suse.cz> wrote:
>> On 03/21/2016, 04:58 PM, Jiri Slaby wrote:
>>> Hello,
>>>
>>> On 03/18/2016, 09:52 PM, Tejun Heo wrote:
>>>> On Thu, Mar 17, 2016 at 01:00:13PM +0100, Jiri Slaby wrote:
>>>>>>> I have not done that yet, but today, I see:
>>>>>>> destroy_workqueue: name='req_hci0' pwq=ffff88002f590300
>>>>>>> wq->dfl_pwq=ffff88002f591e00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>>>>>>>    pwq 12: cpus=0-1 node=0 flags=0x4 nice=-20 active=0/1
>>>>>>>      in-flight: 18568:wq_barrier_func
>>>>>>
>>>>>> So, this means that there's flush_work() racing against workqueue
>>>>>> destruction, which can't be safe. :(
>>>>>
>>>>> But I cannot trigger the WARN_ONs in the attached patch, so I am
>>>>> confused how this can happen :(. (While I am still seeing the destroy
>>>>> WARNINGs.)
>>>>
>>>> So, no operations should be in progress when destroy_workqueue() is
>>>> called.  If somebody was flushing a work item, the flush call must
>>>> have returned before destroy_workqueue() was invoked, which doesn't
>>>> seem to be the case here.  Can you trigger BUG_ON() or sysrq-t when
>>>> the above triggers?  There must be a task which is flushing a work
>>>> item there and it shouldn't be difficult to pinpoint what's going on
>>>> from it.
>>>
>>> The output of sysrq-t is here (> 200k), but I cannot see anything
>>> suspicious in it:
>>> http://www.fi.muni.cz/~xslaby/sklad/panics/jctl.txt
>>
>> Hmm, so I seem I cannot reproduce with this hunk:
>> --- a/net/bluetooth/hci_core.c
>> +++ b/net/bluetooth/hci_core.c
>> @@ -3139,10 +3139,10 @@ void hci_unregister_dev(struct hci_dev *hdev)
>>         list_del(&hdev->list);
>>         write_unlock(&hci_dev_list_lock);
>>
>> -       hci_dev_do_close(hdev);
>> -
>>         cancel_work_sync(&hdev->power_on);
>>
>> +       hci_dev_do_close(hdev);
>> +
>>         if (!test_bit(HCI_INIT, &hdev->flags) &&
>>             !hci_dev_test_flag(hdev, HCI_SETUP) &&
>>             !hci_dev_test_flag(hdev, HCI_CONFIG)) {
>>
>>
>>
>> I cannot explain why though. I do not see how it matters in this
>> particular case...
>>
>> Dmitry, could you apply it too? But I don't know how often you see the
>> warning.
>
> I've seen it only several times in several months, so I don't it will
> be helpful.


Bad news: I hit it again.
On 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next, so I have
bf389cabb3b8079c23f9762e62b05f291e2d5e99.

WARNING: CPU: 3 PID: 23168 at kernel/workqueue.c:4029
destroy_workqueue+0x172/0x620
CPU: 3 PID: 23168 Comm: syz-executor Not tainted 4.8.0-rc3-next-20160825+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffffffff886b6fe0 ffff880037be77a8 ffffffff82db38d9 ffffffff00000016
 fffffbfff10d6dfc ffffffff8708f000 ffff880037be7880 ffffffff8709be00
 dffffc0000000000 0000000000000009 ffff880037be7870 ffffffff816cf888
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82db38d9>] dump_stack+0x12e/0x185 lib/dump_stack.c:51
 [<ffffffff816cf888>] panic+0x1e4/0x3fb kernel/panic.c:179
 [<ffffffff813863b4>] __warn+0x1c4/0x1e0 kernel/panic.c:542
 [<ffffffff8138659c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [<ffffffff813dc6d2>] destroy_workqueue+0x172/0x620 kernel/workqueue.c:4029
 [<ffffffff8645973f>] hci_unregister_dev+0x3df/0x8a0
net/bluetooth/hci_core.c:3170
 [<ffffffff84cb6926>] vhci_release+0x76/0xf0 drivers/bluetooth/hci_vhci.c:354
 [<ffffffff81868bbc>] __fput+0x28c/0x780 fs/file_table.c:208
 [<ffffffff81869135>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff813ebf63>] task_work_run+0xf3/0x170
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff81394218>] do_exit+0x868/0x2e70 kernel/exit.c:828
 [<ffffffff81396998>] do_group_exit+0x108/0x330 kernel/exit.c:958
 [<ffffffff813ba4aa>] get_signal+0x62a/0x15d0 kernel/signal.c:2307
 [<ffffffff81202333>] do_signal+0x83/0x1f60 arch/x86/kernel/signal.c:805
 [<ffffffff81006345>] exit_to_usermode_loop+0x1a5/0x210
arch/x86/entry/common.c:163
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:198
 [<ffffffff8100869f>] syscall_return_slowpath+0x2bf/0x340
arch/x86/entry/common.c:267
 [<ffffffff86e1079c>] entry_SYSCALL_64_fastpath+0xbf/0xc1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-09-03 10:58                           ` Dmitry Vyukov
@ 2016-09-05 13:08                             ` Tejun Heo
  2016-09-05 13:14                               ` Dmitry Vyukov
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2016-09-05 13:08 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jiri Slaby, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

Hello,

On Sat, Sep 03, 2016 at 12:58:33PM +0200, Dmitry Vyukov wrote:
> > I've seen it only several times in several months, so I don't it will
> > be helpful.
> 
> 
> Bad news: I hit it again.
> On 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next, so I have
> bf389cabb3b8079c23f9762e62b05f291e2d5e99.

Hmmm... we're not getting anywhere.  I've applied the following patch
to wq/for-4.8-fixes so that when this happens the next time we can
actually tell what the hell is going wrong.

Thanks.

------ 8< ------
>From 278930ada88c972d20025b0f20def27b1a09dff7 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Mon, 5 Sep 2016 08:54:06 -0400
Subject: [PATCH] workqueue: dump workqueue state on sanity check failures in
 destroy_workqueue()

destroy_workqueue() performs a number of sanity checks to ensure that
the workqueue is empty before proceeding with destruction.  However,
it's not always easy to tell what's going on just from the warning
message.  Let's dump workqueue state after sanity check failures to
help debugging.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
---
 kernel/workqueue.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ef071ca..4eaec8b8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4021,6 +4021,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 		for (i = 0; i < WORK_NR_COLORS; i++) {
 			if (WARN_ON(pwq->nr_in_flight[i])) {
 				mutex_unlock(&wq->mutex);
+				show_workqueue_state();
 				return;
 			}
 		}
@@ -4029,6 +4030,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 		    WARN_ON(pwq->nr_active) ||
 		    WARN_ON(!list_empty(&pwq->delayed_works))) {
 			mutex_unlock(&wq->mutex);
+			show_workqueue_state();
 			return;
 		}
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-09-05 13:08                             ` Tejun Heo
@ 2016-09-05 13:14                               ` Dmitry Vyukov
  2016-09-10  9:33                                 ` Dmitry Vyukov
  0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Vyukov @ 2016-09-05 13:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Slaby, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Sasha Levin,
	Eric Dumazet, Takashi Iwai

On Mon, Sep 5, 2016 at 3:08 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Sat, Sep 03, 2016 at 12:58:33PM +0200, Dmitry Vyukov wrote:
>> > I've seen it only several times in several months, so I don't it will
>> > be helpful.
>>
>>
>> Bad news: I hit it again.
>> On 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next, so I have
>> bf389cabb3b8079c23f9762e62b05f291e2d5e99.
>
> Hmmm... we're not getting anywhere.  I've applied the following patch
> to wq/for-4.8-fixes so that when this happens the next time we can
> actually tell what the hell is going wrong.
>
> Thanks.
>
> ------ 8< ------
> From 278930ada88c972d20025b0f20def27b1a09dff7 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@kernel.org>
> Date: Mon, 5 Sep 2016 08:54:06 -0400
> Subject: [PATCH] workqueue: dump workqueue state on sanity check failures in
>  destroy_workqueue()
>
> destroy_workqueue() performs a number of sanity checks to ensure that
> the workqueue is empty before proceeding with destruction.  However,
> it's not always easy to tell what's going on just from the warning
> message.  Let's dump workqueue state after sanity check failures to
> help debugging.
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com
> Cc: Dmitry Vyukov <dvyukov@google.com>
> ---
>  kernel/workqueue.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index ef071ca..4eaec8b8 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4021,6 +4021,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
>                 for (i = 0; i < WORK_NR_COLORS; i++) {
>                         if (WARN_ON(pwq->nr_in_flight[i])) {
>                                 mutex_unlock(&wq->mutex);
> +                               show_workqueue_state();
>                                 return;
>                         }
>                 }
> @@ -4029,6 +4030,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
>                     WARN_ON(pwq->nr_active) ||
>                     WARN_ON(!list_empty(&pwq->delayed_works))) {
>                         mutex_unlock(&wq->mutex);
> +                       show_workqueue_state();
>                         return;
>                 }
>         }


Applied this to my test tree.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-09-05 13:14                               ` Dmitry Vyukov
@ 2016-09-10  9:33                                 ` Dmitry Vyukov
  2016-09-13 15:35                                   ` Tejun Heo
  0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Vyukov @ 2016-09-10  9:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jiri Slaby, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Takashi Iwai

On Mon, Sep 5, 2016 at 3:14 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Mon, Sep 5, 2016 at 3:08 PM, Tejun Heo <tj@kernel.org> wrote:
>> Hello,
>>
>> On Sat, Sep 03, 2016 at 12:58:33PM +0200, Dmitry Vyukov wrote:
>>> > I've seen it only several times in several months, so I don't it will
>>> > be helpful.
>>>
>>>
>>> Bad news: I hit it again.
>>> On 0f98f121e1670eaa2a2fbb675e07d6ba7f0e146f of linux-next, so I have
>>> bf389cabb3b8079c23f9762e62b05f291e2d5e99.
>>
>> Hmmm... we're not getting anywhere.  I've applied the following patch
>> to wq/for-4.8-fixes so that when this happens the next time we can
>> actually tell what the hell is going wrong.
>>
>> Thanks.
>>
>> ------ 8< ------
>> From 278930ada88c972d20025b0f20def27b1a09dff7 Mon Sep 17 00:00:00 2001
>> From: Tejun Heo <tj@kernel.org>
>> Date: Mon, 5 Sep 2016 08:54:06 -0400
>> Subject: [PATCH] workqueue: dump workqueue state on sanity check failures in
>>  destroy_workqueue()
>>
>> destroy_workqueue() performs a number of sanity checks to ensure that
>> the workqueue is empty before proceeding with destruction.  However,
>> it's not always easy to tell what's going on just from the warning
>> message.  Let's dump workqueue state after sanity check failures to
>> help debugging.
>>
>> Signed-off-by: Tejun Heo <tj@kernel.org>
>> Link: http://lkml.kernel.org/r/CACT4Y+Zs6vkjHo9qHb4TrEiz3S4+quvvVQ9VWvj2Mx6pETGb9Q@mail.gmail.com
>> Cc: Dmitry Vyukov <dvyukov@google.com>
>> ---
>>  kernel/workqueue.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index ef071ca..4eaec8b8 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4021,6 +4021,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
>>                 for (i = 0; i < WORK_NR_COLORS; i++) {
>>                         if (WARN_ON(pwq->nr_in_flight[i])) {
>>                                 mutex_unlock(&wq->mutex);
>> +                               show_workqueue_state();
>>                                 return;
>>                         }
>>                 }
>> @@ -4029,6 +4030,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
>>                     WARN_ON(pwq->nr_active) ||
>>                     WARN_ON(!list_empty(&pwq->delayed_works))) {
>>                         mutex_unlock(&wq->mutex);
>> +                       show_workqueue_state();
>>                         return;
>>                 }
>>         }
>
>
> Applied this to my test tree.


Hit the WARNING with the patch. It showed "Showing busy workqueues and
worker pools:" after the WARNING, but then no queue info. Was it
already destroyed and removed from the list?...



[ 186.994195] blk_update_request: critical target error, dev sr0, sector 0
[ 187.031368] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 187.032172] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 187.032816] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 187.033432] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 02 00
[ 187.034124] blk_update_request: critical target error, dev sr0, sector 0
[ 187.069408] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 187.070127] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 187.070744] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 187.071368] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 04 00
[ 187.071982] blk_update_request: critical target error, dev sr0, sector 0
[ 187.098397] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 187.099194] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 187.099863] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 187.100527] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 04 00
[ 187.101206] blk_update_request: critical target error, dev sr0, sector 0
[ 187.104486] VFS: Dirty inode writeback failed for block device sr0 (err=-5).
[ 198.113128] ------------[ cut here ]------------
[ 198.113838] WARNING: CPU: 2 PID: 26691 at kernel/workqueue.c:4042
destroy_workqueue+0x17b/0x630
[ 198.114628] Modules linked in:
[ 198.115009] CPU: 2 PID: 26691 Comm: syz-executor Not tainted
4.8.0-rc5-next-20160909+ #23
[ 198.115615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[ 198.116032] ffff880060f4f880 ffffffff82d88e39 ffffffff87098a20
fffffbfff10d6ad0
[ 198.116032] 0000000000000000 0000000000000000 ffffffff87098a20
ffffffff813b5abb
[ 198.116032] 0000000000000009 ffff880060f4f8c8 ffffffff81354c3f
ffff880060f4f8e8
[ 198.116032] Call Trace:
[ 198.116032] [<ffffffff82d88e39>] dump_stack+0x12e/0x185
[ 198.116032] [<ffffffff813b5abb>] ? destroy_workqueue+0x17b/0x630
[ 198.116032] [<ffffffff81354c3f>] __warn+0x19f/0x1e0
[ 198.116032] [<ffffffff81354e4c>] warn_slowpath_null+0x2c/0x40
[ 198.116032] [<ffffffff813b5abb>] destroy_workqueue+0x17b/0x630
[ 198.116032] [<ffffffff8642b43f>] hci_unregister_dev+0x3df/0x8a0
[ 198.116032] [<ffffffff8195bc70>] ? fcntl_setlk+0xc30/0xc30
[ 198.116032] [<ffffffff84c86b70>] ? vhci_flush+0x50/0x50
[ 198.116032] [<ffffffff84c86be6>] vhci_release+0x76/0xf0
[ 198.116032] [<ffffffff8183c71c>] __fput+0x28c/0x780
[ 198.116032] [<ffffffff8183cc95>] ____fput+0x15/0x20
[ 198.116032] [<ffffffff813bc6d3>] task_work_run+0xf3/0x170
[ 198.116032] [<ffffffff81364c58>] do_exit+0x868/0x2de0
[ 198.116032] [<ffffffff81934de0>] ? userfaultfd_ioctl+0x27b0/0x27b0
[ 198.116032] [<ffffffff8146c180>] ? debug_check_no_locks_freed+0x3c0/0x3c0
[ 198.116032] [<ffffffff813643f0>] ? mm_update_next_owner+0x640/0x640
[ 198.116032] [<ffffffff819323d0>] ? userfaultfd_release+0x620/0x620
[ 198.116032] [<ffffffff8137fd32>] ? __dequeue_signal+0x132/0x540
[ 198.116032] [<ffffffff8137f896>] ? recalc_sigpending_tsk+0xd6/0x180
[ 198.116032] [<ffffffff81367348>] do_group_exit+0x108/0x330
[ 198.116032] [<ffffffff8138adea>] get_signal+0x62a/0x15d0
[ 198.116032] [<ffffffff811d0333>] do_signal+0x83/0x1e20
[ 198.116032] [<ffffffff81761920>] ? __pmd_alloc+0x3f0/0x3f0
[ 198.116032] [<ffffffff811d02b0>] ? setup_sigcontext+0x7d0/0x7d0
[ 198.116032] [<ffffffff8125d3c5>] ? kvm_clock_get_cycles+0x25/0x30
[ 198.116032] [<ffffffff817720d0>] ? find_vma+0x30/0x150
[ 198.116032] [<ffffffff8126b128>] ? __do_page_fault+0x198/0xbb0
[ 198.116032] [<ffffffff81510500>] ? do_futex+0x17d0/0x17d0
[ 198.116032] [<ffffffff81006335>] exit_to_usermode_loop+0x1a5/0x210
[ 198.116032] [<ffffffff81008384>] prepare_exit_to_usermode+0x154/0x190
[ 198.116032] [<ffffffff86df20b4>] retint_user+0x8/0x23
[ 198.137668] ---[ end trace 1beeef55c15054c4 ]---
[ 198.138229] Showing busy workqueues and worker pools:
[ 198.219611] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 198.220424] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 198.221207] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 198.222086] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 04 00
[ 198.222872] blk_update_request: critical target error, dev sr0, sector 0
[ 198.223637] buffer_io_error: 78 callbacks suppressed
[ 198.224152] Buffer I/O error on dev sr0, logical block 0, lost async
page write
[ 198.225336] Buffer I/O error on dev sr0, logical block 1, lost async
page write
[ 198.226211] Buffer I/O error on dev sr0, logical block 2, lost async
page write
[ 198.227031] Buffer I/O error on dev sr0, logical block 3, lost async
page write
[ 198.227887] Buffer I/O error on dev sr0, logical block 4, lost async
page write
[ 198.228841] Buffer I/O error on dev sr0, logical block 5, lost async
page write
[ 198.229653] Buffer I/O error on dev sr0, logical block 6, lost async
page write
[ 198.230344] Buffer I/O error on dev sr0, logical block 7, lost async
page write
[ 198.231156] Buffer I/O error on dev sr0, logical block 8, lost async
page write
[ 198.231973] Buffer I/O error on dev sr0, logical block 9, lost async
page write
[ 198.244460] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 198.245272] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 198.245939] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 198.246554] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 02 00
[ 198.247265] blk_update_request: critical target error, dev sr0, sector 0
[ 198.275421] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 198.276249] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 198.276865] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 198.277466] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 04 00
[ 198.277998] blk_update_request: critical target error, dev sr0, sector 0
[ 198.333417] sr 1:0:0:0: [sr0] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[ 198.334161] sr 1:0:0:0: [sr0] tag#0 Sense Key : Illegal Request [current]
[ 198.334734] sr 1:0:0:0: [sr0] tag#0 Add. Sense: Invalid command operation code
[ 198.335433] sr 1:0:0:0: [sr0] tag#0 CDB: Write(10) 2a 00 00 00 00 00
00 00 04 00
[ 198.336095] blk_update_request: critical target error, dev sr0, sector 0
[ 198.337116] VFS: Dirty inode writeback failed for block device sr0 (err=-5).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-09-10  9:33                                 ` Dmitry Vyukov
@ 2016-09-13 15:35                                   ` Tejun Heo
  2016-09-13 18:14                                       ` Jiri Slaby
  0 siblings, 1 reply; 29+ messages in thread
From: Tejun Heo @ 2016-09-13 15:35 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Jiri Slaby, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Takashi Iwai

Hello,

On Sat, Sep 10, 2016 at 11:33:48AM +0200, Dmitry Vyukov wrote:
> Hit the WARNING with the patch. It showed "Showing busy workqueues and
> worker pools:" after the WARNING, but then no queue info. Was it
> already destroyed and removed from the list?...

Hmm...  It either means that the work item which was in flight when
WARN_ON() ran finished by the time the debug printout got to it or
that it's something unrelated to busy work items.

> [ 198.113838] WARNING: CPU: 2 PID: 26691 at kernel/workqueue.c:4042
> destroy_workqueue+0x17b/0x630

I don't seem to have the same source code that you have.  Which exact
WARN_ON() is this?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-09-13 18:14                                       ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-09-13 18:14 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Takashi Iwai

On 09/13/2016, 05:35 PM, Tejun Heo wrote:
> Hello,
> 
> On Sat, Sep 10, 2016 at 11:33:48AM +0200, Dmitry Vyukov wrote:
>> Hit the WARNING with the patch. It showed "Showing busy workqueues and
>> worker pools:" after the WARNING, but then no queue info. Was it
>> already destroyed and removed from the list?...
> 
> Hmm...  It either means that the work item which was in flight when
> WARN_ON() ran finished by the time the debug printout got to it or
> that it's something unrelated to busy work items.
> 
>> [ 198.113838] WARNING: CPU: 2 PID: 26691 at kernel/workqueue.c:4042
>> destroy_workqueue+0x17b/0x630
> 
> I don't seem to have the same source code that you have.  Which exact
> WARN_ON() is this?

I assume Dmitry sees the same what I am still seeing, so I reported this
some time ago:
https://lkml.org/lkml/2016/3/21/492

This warning is trigerred there and still occurs with "HEAD":
  (pwq != wq->dfl_pwq) && (pwq->refcnt > 1)
and the state dump is in the log empty too:
destroy_workqueue: name='hci0' pwq=ffff88006b5c8f00
wq->dfl_pwq=ffff88006b5c9b00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
  pwq 13:
 cpus=2-3 node=1 flags=0x4 nice=-20 active=0/1
    in-flight: 2669:wq_barrier_func

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
@ 2016-09-13 18:14                                       ` Jiri Slaby
  0 siblings, 0 replies; 29+ messages in thread
From: Jiri Slaby @ 2016-09-13 18:14 UTC (permalink / raw)
  To: Tejun Heo, Dmitry Vyukov
  Cc: Marcel Holtmann, Gustavo Padovan, Johan Hedberg, David S. Miller,
	linux-bluetooth, netdev, LKML, syzkaller, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Takashi Iwai

On 09/13/2016, 05:35 PM, Tejun Heo wrote:
> Hello,
> 
> On Sat, Sep 10, 2016 at 11:33:48AM +0200, Dmitry Vyukov wrote:
>> Hit the WARNING with the patch. It showed "Showing busy workqueues and
>> worker pools:" after the WARNING, but then no queue info. Was it
>> already destroyed and removed from the list?...
> 
> Hmm...  It either means that the work item which was in flight when
> WARN_ON() ran finished by the time the debug printout got to it or
> that it's something unrelated to busy work items.
> 
>> [ 198.113838] WARNING: CPU: 2 PID: 26691 at kernel/workqueue.c:4042
>> destroy_workqueue+0x17b/0x630
> 
> I don't seem to have the same source code that you have.  Which exact
> WARN_ON() is this?

I assume Dmitry sees the same what I am still seeing, so I reported this
some time ago:
https://lkml.org/lkml/2016/3/21/492

This warning is trigerred there and still occurs with "HEAD":
  (pwq != wq->dfl_pwq) && (pwq->refcnt > 1)
and the state dump is in the log empty too:
destroy_workqueue: name='hci0' pwq=ffff88006b5c8f00
wq->dfl_pwq=ffff88006b5c9b00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
  pwq 13:
 cpus=2-3 node=1 flags=0x4 nice=-20 active=0/1
    in-flight: 2669:wq_barrier_func

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: net/bluetooth: workqueue destruction WARNING in hci_unregister_dev
  2016-09-13 18:14                                       ` Jiri Slaby
  (?)
@ 2016-09-16 20:24                                       ` Tejun Heo
  -1 siblings, 0 replies; 29+ messages in thread
From: Tejun Heo @ 2016-09-16 20:24 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Dmitry Vyukov, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	David S. Miller, linux-bluetooth, netdev, LKML, syzkaller,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet,
	Takashi Iwai

Hello,

On Tue, Sep 13, 2016 at 08:14:40PM +0200, Jiri Slaby wrote:
> I assume Dmitry sees the same what I am still seeing, so I reported this
> some time ago:
> https://lkml.org/lkml/2016/3/21/492
> 
> This warning is trigerred there and still occurs with "HEAD":
>   (pwq != wq->dfl_pwq) && (pwq->refcnt > 1)
> and the state dump is in the log empty too:
> destroy_workqueue: name='hci0' pwq=ffff88006b5c8f00
> wq->dfl_pwq=ffff88006b5c9b00 pwq->refcnt=2 pwq->nr_active=0 delayed_works:
>   pwq 13:
>  cpus=2-3 node=1 flags=0x4 nice=-20 active=0/1
>     in-flight: 2669:wq_barrier_func

Hmmm... I think it could be from rescuer holding reference on the pwq.
Both cases have WQ_MEM_RECLAIM and it could be that rescuer was still
in flight (even without work items pending) when the sanity checks
were done.  The following patch moves the sanity checks after rescuer
destruction.  Dmitry, Jiri, can you please see whether the warning
goes away with this patch?

Thanks.

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 984f6ff..e8046a1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4042,8 +4042,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 			}
 		}
 
-		if (WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt > 1)) ||
-		    WARN_ON(pwq->nr_active) ||
+		if (WARN_ON(pwq->nr_active) ||
 		    WARN_ON(!list_empty(&pwq->delayed_works))) {
 			mutex_unlock(&wq->mutex);
 			show_workqueue_state();
@@ -4080,6 +4079,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 		for_each_node(node) {
 			pwq = rcu_access_pointer(wq->numa_pwq_tbl[node]);
 			RCU_INIT_POINTER(wq->numa_pwq_tbl[node], NULL);
+			WARN_ON((pwq != wq->dfl_pwq) && (pwq->refcnt != 1));
 			put_pwq_unlocked(pwq);
 		}
 
@@ -4089,6 +4089,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
 		 */
 		pwq = wq->dfl_pwq;
 		wq->dfl_pwq = NULL;
+		WARN_ON(pwq->refcnt != 1);
 		put_pwq_unlocked(pwq);
 	}
 }

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-09-16 20:24 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-26 11:53 net/bluetooth: workqueue destruction WARNING in hci_unregister_dev Dmitry Vyukov
2016-01-26 12:29 ` Dmitry Vyukov
2016-02-18 14:00 ` Jiri Slaby
2016-02-18 14:22   ` Dmitry Vyukov
2016-02-18 17:44     ` Tejun Heo
2016-02-19 10:20       ` Jiri Slaby
2016-02-19 10:20         ` Jiri Slaby
2016-02-19 12:10         ` Jiri Slaby
2016-03-02 15:45           ` Tejun Heo
2016-03-03  9:12             ` Jiri Slaby
2016-03-03  9:12               ` Jiri Slaby
2016-03-11 17:12               ` Tejun Heo
2016-03-17 12:00                 ` Jiri Slaby
2016-03-17 12:00                   ` Jiri Slaby
2016-03-18 20:52                   ` Tejun Heo
2016-03-18 20:52                     ` Tejun Heo
2016-03-21 15:58                     ` Jiri Slaby
2016-03-22  8:09                       ` Jiri Slaby
2016-03-22  8:09                         ` Jiri Slaby
2016-03-22 12:32                         ` Dmitry Vyukov
2016-03-22 12:32                           ` Dmitry Vyukov
2016-09-03 10:58                           ` Dmitry Vyukov
2016-09-05 13:08                             ` Tejun Heo
2016-09-05 13:14                               ` Dmitry Vyukov
2016-09-10  9:33                                 ` Dmitry Vyukov
2016-09-13 15:35                                   ` Tejun Heo
2016-09-13 18:14                                     ` Jiri Slaby
2016-09-13 18:14                                       ` Jiri Slaby
2016-09-16 20:24                                       ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.