* [syzbot] upstream test error: WARNING in __queue_work
@ 2022-08-30  2:07 syzbot
  2022-08-30 14:08 ` Lai Jiangshan
  2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
  0 siblings, 2 replies; 15+ messages in thread
From: syzbot @ 2022-08-30  2:07 UTC (permalink / raw)
  To: jiangshanlai, linux-kernel, syzkaller-bugs, tj

Hello,

syzbot found the following issue on:

HEAD commit:    4c612826bec1 Merge tag 'net-6.0-rc3' of git://git.kernel.o..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=120ebce7080000
kernel config:  https://syzkaller.appspot.com/x/.config?x=312be25752c7fe30
dashboard link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+243b7d89777f90f7613b@syzkaller.appspotmail.com

Bluetooth: hci0: command 0x0409 tx timeout
------------[ cut here ]------------
WARNING: CPU: 0 PID: 52 at kernel/workqueue.c:1438 __queue_work+0xe3f/0x1210 kernel/workqueue.c:1438
Modules linked in:
CPU: 0 PID: 52 Comm: kworker/0:2 Not tainted 6.0.0-rc2-syzkaller-00159-g4c612826bec1 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: events hci_cmd_timeout
RIP: 0010:__queue_work+0xe3f/0x1210 kernel/workqueue.c:1438
Code: e0 07 83 c0 03 38 d0 7c 09 84 d2 74 05 e8 29 09 79 00 8b 5b 2c 31 ff 83 e3 20 89 de e8 9a 5f 2d 00 85 db 75 42 e8 d1 62 2d 00 <0f> 0b e9 41 f8 ff ff e8 c5 62 2d 00 0f 0b e9 d3 f7 ff ff e8 b9 62
RSP: 0018:ffffc90000947c60 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff88802c83e200 RCX: 0000000000000000
RDX: ffff88801538a180 RSI: ffffffff814dd75f RDI: ffff88802c83e208
RBP: 0000000000000008 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000200000 R11: 0000000000000000 R12: ffff8880266b4c70
R13: 0000000000000000 R14: ffff888014b1e000 R15: ffff888014b1e000
FS:  0000000000000000(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0003d1e80 CR3: 00000000155b2000 CR4: 0000000000150ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 queue_work_on+0xee/0x110 kernel/workqueue.c:1545
 process_one_work+0x991/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e4/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: [syzbot] upstream test error: WARNING in __queue_work
  2022-08-30  2:07 [syzbot] upstream test error: WARNING in __queue_work syzbot
@ 2022-08-30 14:08 ` Lai Jiangshan
  2022-08-30 17:37   ` Luiz Augusto von Dentz
  2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
  1 sibling, 1 reply; 15+ messages in thread
From: Lai Jiangshan @ 2022-08-30 14:08 UTC (permalink / raw)
  To: syzbot
  Cc: LKML, syzkaller-bugs, Tejun Heo, Marcel Holtmann, Johan Hedberg,
	Luiz Augusto von Dentz, linux-bluetooth

CC: BLUETOOTH SUBSYSTEM

It seems that hci_cmd_timeout() queues a work to a destroyed workqueue.


* Re: [syzbot] upstream test error: WARNING in __queue_work
  2022-08-30 14:08 ` Lai Jiangshan
@ 2022-08-30 17:37   ` Luiz Augusto von Dentz
  2022-09-02 12:28     ` Aleksandr Nogikh
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2022-08-30 17:37 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: syzbot, LKML, syzkaller-bugs, Tejun Heo, Marcel Holtmann,
	Johan Hedberg, linux-bluetooth

Hi Lai,

On Tue, Aug 30, 2022 at 7:08 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> CC: BLUETOOTH SUBSYSTEM
>
> It seems that hci_cmd_timeout() queues a work to a destroyed workqueue.

Are there any traces or a way to reproduce the problem?

-- 
Luiz Augusto von Dentz


* [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-08-30  2:07 [syzbot] upstream test error: WARNING in __queue_work syzbot
  2022-08-30 14:08 ` Lai Jiangshan
@ 2022-09-02 11:23 ` Tetsuo Handa
  2022-09-02 12:00   ` bluez.test.bot
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Tetsuo Handa @ 2022-09-02 11:23 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, Luiz Augusto von Dentz, Schspa Shi
  Cc: syzbot, syzkaller-bugs, jiangshanlai, tj, linux-bluetooth

syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
WQ into hdev->workqueue WQ which is under draining operation [1], for
commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
destruction") does not allow such operation.

The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
queue is drained, only queue chained work") was incomplete.

Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
the queuing operation with RCU read lock in order to avoid calling
queue_delayed_work() after cancel_delayed_work() completed.

Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
Reported-by: syzbot <syzbot+243b7d89777f90f7613b@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
---
This is a difficult to trigger race condition, and therefore reproducer is
not available. Please do logical check in addition to automated testing.

 net/bluetooth/hci_core.c  | 15 +++++++++++++--
 net/bluetooth/hci_event.c |  6 ++++--
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index b3a5a3cc9372..9873d2e67988 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
 
 	/* Cancel these to avoid queueing non-chained pending work */
 	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
+	/* Wait for
+	 *
+	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
+	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
+	 *
+	 * inside RCU section to see the flag or complete scheduling.
+	 */
+	synchronize_rcu();
+	/* Explicitly cancel works in case scheduled after setting the flag. */
 	cancel_delayed_work(&hdev->cmd_timer);
 	cancel_delayed_work(&hdev->ncmd_timer);
 
@@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
 			if (res < 0)
 				__hci_cmd_sync_cancel(hdev, -res);
 
+			rcu_read_lock();
 			if (test_bit(HCI_RESET, &hdev->flags) ||
 			    hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
 				cancel_delayed_work(&hdev->cmd_timer);
 			else
-				schedule_delayed_work(&hdev->cmd_timer,
-						      HCI_CMD_TIMEOUT);
+				queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
+						   HCI_CMD_TIMEOUT);
+			rcu_read_unlock();
 		} else {
 			skb_queue_head(&hdev->cmd_q, skb);
 			queue_work(hdev->workqueue, &hdev->cmd_work);
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 6643c9c20fa4..d6f0e6ca0e7e 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
 {
 	cancel_delayed_work(&hdev->cmd_timer);
 
+	rcu_read_lock();
 	if (!test_bit(HCI_RESET, &hdev->flags)) {
 		if (ncmd) {
 			cancel_delayed_work(&hdev->ncmd_timer);
 			atomic_set(&hdev->cmd_cnt, 1);
 		} else {
 			if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
-				schedule_delayed_work(&hdev->ncmd_timer,
-						      HCI_NCMD_TIMEOUT);
+				queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
+						   HCI_NCMD_TIMEOUT);
 		}
 	}
+	rcu_read_unlock();
 }
 
 static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
-- 
2.18.4



* RE: Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
@ 2022-09-02 12:00   ` bluez.test.bot
  2022-09-02 18:45   ` [PATCH] " Luiz Augusto von Dentz
  2022-09-19 17:30   ` patchwork-bot+bluetooth
  2 siblings, 0 replies; 15+ messages in thread
From: bluez.test.bot @ 2022-09-02 12:00 UTC (permalink / raw)
  To: linux-bluetooth, penguin-kernel

[-- Attachment #1: Type: text/plain, Size: 2414 bytes --]

This is an automated email; please do not reply to it!

Dear submitter,

Thank you for submitting the patches to the Linux Bluetooth mailing list.
These are the CI test results for your patch series:
PW Link: https://patchwork.kernel.org/project/bluetooth/list/?series=673604

---Test result---

Test Summary:
CheckPatch                    FAIL      1.66 seconds
GitLint                       PASS      0.82 seconds
SubjectPrefix                 PASS      0.67 seconds
BuildKernel                   PASS      34.07 seconds
BuildKernel32                 PASS      30.21 seconds
Incremental Build with patches PASS     44.33 seconds
TestRunner: Setup             PASS      512.05 seconds
TestRunner: l2cap-tester      PASS      16.75 seconds
TestRunner: iso-tester        PASS      15.58 seconds
TestRunner: bnep-tester       PASS      6.27 seconds
TestRunner: mgmt-tester       PASS      100.77 seconds
TestRunner: rfcomm-tester     PASS      9.46 seconds
TestRunner: sco-tester        PASS      9.39 seconds
TestRunner: smp-tester        PASS      9.49 seconds
TestRunner: userchan-tester   PASS      6.46 seconds

Details
##############################
Test: CheckPatch - FAIL - 1.66 seconds
Run checkpatch.pl script with rule in .checkpatch.conf
Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit fatal: unsaf ("ace/src' is owned by someone else)")'
#66: 
commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
destruction") does not allow such operation.

ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit fatal: unsaf ("ace/src' is owned by someone else)")'
#69: 
The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
queue is drained, only queue chained work") was incomplete.

total: 2 errors, 0 warnings, 0 checks, 51 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

/github/workspace/src/12964073.patch has style problems, please review.

NOTE: Ignored message types: UNKNOWN_COMMIT_ID

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.




---
Regards,
Linux Bluetooth



* Re: [syzbot] upstream test error: WARNING in __queue_work
  2022-08-30 17:37   ` Luiz Augusto von Dentz
@ 2022-09-02 12:28     ` Aleksandr Nogikh
  0 siblings, 0 replies; 15+ messages in thread
From: Aleksandr Nogikh @ 2022-09-02 12:28 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Lai Jiangshan, syzbot, LKML,
	'Aleksandr Nogikh' via syzkaller-bugs, Tejun Heo,
	Marcel Holtmann, Johan Hedberg, linux-bluetooth

Hi,

This one has so far happened only once on syzbot, so it is probably either
an extremely rare issue or one that was already solved.

On Tue, Aug 30, 2022 at 7:37 PM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Lai,
>
> On Tue, Aug 30, 2022 at 7:08 AM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
> >
> > CC: BLUETOOTH SUBSYSTEM
> >
> > It seems that hci_cmd_timeout() queues a work to a destroyed workqueue.
>
> Are there any traces or a way to reproduce the problem?

You can take a look at the console log provided in the original bug report:

console output: https://syzkaller.appspot.com/x/log.txt?x=120ebce7080000

Re. reproduction -- syzbot records a test error when it fails to complete
the following sequence of steps:
1) Boot a VM and establish an SSH connection to it
2) Upload fuzzer binaries
3) Start fuzzer binaries; these binaries will set up the fuzzing
environment (networking devices, etc)
4) Execute a simple mmap program to check if coverage collection works fine

mmap(0x1ffff000, 0x1000, 0x0, 0x32, 0xffffffffffffffff, 0x0)
mmap(0x20000000, 0x1000000, 0x7, 0x32, 0xffffffffffffffff, 0x0)
mmap(0x21000000, 0x1000, 0x0, 0x32, 0xffffffffffffffff, 0x0)

It's probably easiest to start syzkaller locally on this exact kernel
revision and see if the fuzzing is able to start. It will perform the
same steps and report an error, if the issue persists.
I've just tried to reproduce this particular bug myself on
4c612826bec1 and everything booted absolutely fine. So probably it was
just a flake.

FWIW syzbot can also perform patch testing for the reported bugs and
output console logs, so it should also simplify the debugging of such
bugs. More details are here:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches

Patch testing can normally be done only if there's a repro. I've just sent
a PR (https://github.com/google/syzkaller/pull/3355) to add test errors to
the exception list, so we can retest this one without a repro.

Best Regards,
Aleksandr
>
> --
> Luiz Augusto von Dentz
>


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
  2022-09-02 12:00   ` bluez.test.bot
@ 2022-09-02 18:45   ` Luiz Augusto von Dentz
  2022-09-02 21:31     ` Luiz Augusto von Dentz
  2022-09-19 17:30   ` patchwork-bot+bluetooth
  2 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2022-09-02 18:45 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Marcel Holtmann, Johan Hedberg, Schspa Shi, syzbot,
	syzkaller-bugs, Lai Jiangshan, Tejun Heo, linux-bluetooth

Hi Tetsuo,

On Fri, Sep 2, 2022 at 4:23 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> WQ into hdev->workqueue WQ which is under draining operation [1], for
> commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> destruction") does not allow such operation.
>
> The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> queue is drained, only queue chained work") was incomplete.
>
> Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
> hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
> the queuing operation with RCU read lock in order to avoid calling
> queue_delayed_work() after cancel_delayed_work() completed.

Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
after the cancel pattern? I wonder if wouldn't be better to introduce
some function that disables/enables the workqueue so we don't have to
do extra tracking in the driver/subsystem?

> Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
> Reported-by: syzbot <syzbot+243b7d89777f90f7613b@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
> ---
> This is a difficult to trigger race condition, and therefore reproducer is
> not available. Please do logical check in addition to automated testing.
>
>  net/bluetooth/hci_core.c  | 15 +++++++++++++--
>  net/bluetooth/hci_event.c |  6 ++++--
>  2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b3a5a3cc9372..9873d2e67988 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
>
>         /* Cancel these to avoid queueing non-chained pending work */
>         hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> +       /* Wait for
> +        *
> +        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> +        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> +        *
> +        * inside RCU section to see the flag or complete scheduling.
> +        */
> +       synchronize_rcu();
> +       /* Explicitly cancel works in case scheduled after setting the flag. */
>         cancel_delayed_work(&hdev->cmd_timer);
>         cancel_delayed_work(&hdev->ncmd_timer);
>
> @@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
>                         if (res < 0)
>                                 __hci_cmd_sync_cancel(hdev, -res);
>
> +                       rcu_read_lock();
>                         if (test_bit(HCI_RESET, &hdev->flags) ||
>                             hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>                                 cancel_delayed_work(&hdev->cmd_timer);
>                         else
> -                               schedule_delayed_work(&hdev->cmd_timer,
> -                                                     HCI_CMD_TIMEOUT);
> +                               queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
> +                                                  HCI_CMD_TIMEOUT);
> +                       rcu_read_unlock();
>                 } else {
>                         skb_queue_head(&hdev->cmd_q, skb);
>                         queue_work(hdev->workqueue, &hdev->cmd_work);
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 6643c9c20fa4..d6f0e6ca0e7e 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
>  {
>         cancel_delayed_work(&hdev->cmd_timer);
>
> +       rcu_read_lock();
>         if (!test_bit(HCI_RESET, &hdev->flags)) {
>                 if (ncmd) {
>                         cancel_delayed_work(&hdev->ncmd_timer);
>                         atomic_set(&hdev->cmd_cnt, 1);
>                 } else {
>                         if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> -                               schedule_delayed_work(&hdev->ncmd_timer,
> -                                                     HCI_NCMD_TIMEOUT);
> +                               queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
> +                                                  HCI_NCMD_TIMEOUT);
>                 }
>         }
> +       rcu_read_unlock();
>  }
>
>  static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
> --
> 2.18.4
>


-- 
Luiz Augusto von Dentz


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-02 18:45   ` [PATCH] " Luiz Augusto von Dentz
@ 2022-09-02 21:31     ` Luiz Augusto von Dentz
  2022-09-03  6:49       ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2022-09-02 21:31 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Marcel Holtmann, Johan Hedberg, Schspa Shi, syzbot,
	syzkaller-bugs, Lai Jiangshan, Tejun Heo, linux-bluetooth

Hi Tetsuo,

On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Tetsuo,
>
> On Fri, Sep 2, 2022 at 4:23 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> > WQ into hdev->workqueue WQ which is under draining operation [1], for
> > commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> > destruction") does not allow such operation.
> >
> > The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> > queue is drained, only queue chained work") was incomplete.
> >
> > Use hdev->workqueue WQ when queuing hdev->{cmd,ncmd}_timer works because
> > hci_{cmd,ncmd}_timeout() calls queue_work(hdev->workqueue). Also, protect
> > the queuing operation with RCU read lock in order to avoid calling
> > queue_delayed_work() after cancel_delayed_work() completed.
>
> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
> after the cancel pattern? I wonder if wouldn't be better to introduce
> some function that disables/enables the workqueue so we don't have to
> do extra tracking in the driver/subsystem?
>
> > Link: https://syzkaller.appspot.com/bug?extid=243b7d89777f90f7613b [1]
> > Reported-by: syzbot <syzbot+243b7d89777f90f7613b@syzkaller.appspotmail.com>
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Fixes: 877afadad2dce8aa ("Bluetooth: When HCI work queue is drained, only queue chained work")
> > ---
> > This is a difficult to trigger race condition, and therefore reproducer is
> > not available. Please do logical check in addition to automated testing.
> >
> >  net/bluetooth/hci_core.c  | 15 +++++++++++++--
> >  net/bluetooth/hci_event.c |  6 ++++--
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index b3a5a3cc9372..9873d2e67988 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -597,6 +597,15 @@ static int hci_dev_do_reset(struct hci_dev *hdev)
> >
> >         /* Cancel these to avoid queueing non-chained pending work */
> >         hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> > +       /* Wait for
> > +        *
> > +        *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > +        *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> > +        *
> > +        * inside RCU section to see the flag or complete scheduling.
> > +        */
> > +       synchronize_rcu();
> > +       /* Explicitly cancel works in case scheduled after setting the flag. */
> >         cancel_delayed_work(&hdev->cmd_timer);
> >         cancel_delayed_work(&hdev->ncmd_timer);
> >
> > @@ -4056,12 +4065,14 @@ static void hci_cmd_work(struct work_struct *work)
> >                         if (res < 0)
> >                                 __hci_cmd_sync_cancel(hdev, -res);
> >
> > +                       rcu_read_lock();
> >                         if (test_bit(HCI_RESET, &hdev->flags) ||
> >                             hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> >                                 cancel_delayed_work(&hdev->cmd_timer);
> >                         else
> > -                               schedule_delayed_work(&hdev->cmd_timer,
> > -                                                     HCI_CMD_TIMEOUT);
> > +                               queue_delayed_work(hdev->workqueue, &hdev->cmd_timer,
> > +                                                  HCI_CMD_TIMEOUT);
> > +                       rcu_read_unlock();
> >                 } else {
> >                         skb_queue_head(&hdev->cmd_q, skb);
> >                         queue_work(hdev->workqueue, &hdev->cmd_work);
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 6643c9c20fa4..d6f0e6ca0e7e 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -3766,16 +3766,18 @@ static inline void handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd)
> >  {
> >         cancel_delayed_work(&hdev->cmd_timer);
> >
> > +       rcu_read_lock();
> >         if (!test_bit(HCI_RESET, &hdev->flags)) {
> >                 if (ncmd) {
> >                         cancel_delayed_work(&hdev->ncmd_timer);
> >                         atomic_set(&hdev->cmd_cnt, 1);
> >                 } else {
> >                         if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> > -                               schedule_delayed_work(&hdev->ncmd_timer,
> > -                                                     HCI_NCMD_TIMEOUT);
> > +                               queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer,
> > +                                                  HCI_NCMD_TIMEOUT);
> >                 }
> >         }
> > +       rcu_read_unlock();
> >  }
> >
> >  static u8 hci_cc_le_read_buffer_size_v2(struct hci_dev *hdev, void *data,
> > --
> > 2.18.4
> >

I was thinking on doing something like the following:

https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16

Since there is no reason to queue any command if we are draining and
are gonna reset at the end it is pretty useless to queue commands at
that point.

>
> --
> Luiz Augusto von Dentz



-- 
Luiz Augusto von Dentz


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-02 21:31     ` Luiz Augusto von Dentz
@ 2022-09-03  6:49       ` Tetsuo Handa
  2022-09-04  2:11         ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-09-03  6:49 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Marcel Holtmann, Johan Hedberg, Schspa Shi, syzbot,
	syzkaller-bugs, Lai Jiangshan, Tejun Heo, linux-bluetooth

On 2022/09/03 6:31, Luiz Augusto von Dentz wrote:
> Hi Tetsuo,
> 
> On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote:
>> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
>> after the cancel pattern?

HCI_CMD_DRAIN_WORKQUEUE does not help for this case.

What extid=243b7d89777f90f7613b is reporting is

  hci_cmd_timeout() {                             hci_dev_do_reset() {
    starts sleeping due to e.g. preemption
                                                    hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag
                                                    cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running
                                                    cancel_delayed_work(&hdev->ncmd_timer);
                                                    drain_workqueue(hdev->workqueue) {
                                                      sets __WQ_DRAINING flag on hdev->workqueue
                                                      starts waiting for completion of all works on hdev->workqueue
    finishes sleeping due to e.g. preemption
    queue_work(hdev->workqueue,  &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue
  }
                                                      finishes waiting for completion of all works on hdev->workqueue
                                                      clears __WQ_DRAINING flag
                                                    }
                                                  }

race condition. Notice that cancel_delayed_work() does not wait for
completion of already started hci_cmd_timeout() callback.

If you need to wait for completion of already started callback,
you need to use _sync version (e.g. cancel_delayed_work_sync()).
And watch out for locking dependency when using _sync version.
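
The timeline above can be replayed deterministically in a small userspace
model. This is only an illustrative sketch, not kernel code: struct model_wq,
model_queue_work() and replay_timeout_race() are hypothetical names, and the
"chained work only while draining" rule is modelled on my reading of the
__WQ_DRAINING check in __queue_work(). It shows why a callback running on
system_wq gets the warning, and why the same callback running on
hdev->workqueue itself would count as chained work.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical miniature model; none of these names are kernel APIs. */
struct model_wq {
	const char *name;
	bool draining;	/* stands in for __WQ_DRAINING */
	int queued;	/* number of accepted work items */
	int warnings;	/* number of rejected (warned) queue attempts */
};

/*
 * Queue a work item onto @target from a callback currently running on
 * @current_wq. While @target is draining, only chained work (queued from a
 * work item running on @target itself) is accepted; anything else is
 * dropped with a warning.
 */
static bool model_queue_work(struct model_wq *target,
			     const struct model_wq *current_wq)
{
	bool chained = (current_wq == target);

	if (target->draining && !chained) {
		target->warnings++;
		fprintf(stderr, "WARNING: non-chained work on draining %s\n",
			target->name);
		return false;
	}
	target->queued++;
	return true;
}

/*
 * Replay the race: the timeout callback has already started on @timer_wq
 * when the reset path begins draining @hdev_wq. Returns true if the
 * callback was still allowed to queue cmd_work.
 */
static bool replay_timeout_race(struct model_wq *hdev_wq,
				const struct model_wq *timer_wq)
{
	/* 1. Timeout callback is running on @timer_wq and gets preempted. */
	/* 2. Reset path: the drain starts, so the draining flag is set. */
	hdev_wq->draining = true;
	/* 3. Callback resumes and queues cmd_work onto hdev->workqueue. */
	return model_queue_work(hdev_wq, timer_wq);
}
```

With timer_wq pointing at a separate model of system_wq the replay returns
false and bumps the warning counter; with timer_wq == hdev_wq it returns
true, which is the effect the patch aims for by moving the timers onto
hdev->workqueue.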

>>                           I wonder if wouldn't be better to introduce
>> some function that disables/enables the workqueue so we don't have to
>> do extra tracking in the driver/subsystem?
>>
> 
> I was thinking on doing something like the following:
> 
> https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16

That patch does not close race, for

@@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work)
        BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
               atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
 
+       /* Don't queue while draining */
+       if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
+               return;
        /*
         * BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG.
         * ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM.
         */
+
        /* Send queued commands */
        if (atomic_read(&hdev->cmd_cnt)) {
                skb = skb_dequeue(&hdev->cmd_q);

. In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting.

If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism.
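
As a rough userspace illustration of that pattern (a sketch under stated
assumptions, not the kernel implementation): below, a C11 atomic reader
counter stands in for rcu_read_lock()/rcu_read_unlock(), and a spin on that
counter stands in for synchronize_rcu(). All names (struct model_dev,
model_arm_timer(), model_begin_drain()) are hypothetical. The point is only
the ordering: set the flag, wait for every in-flight "test and do something"
section, then cancel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; these are not kernel types or functions. */
struct model_dev {
	atomic_bool draining;	 /* stands in for HCI_CMD_DRAIN_WORKQUEUE */
	atomic_bool timer_armed; /* stands in for a pending cmd_timer */
	atomic_int readers;	 /* callers inside the read-side section */
};

/*
 * Read side: test the flag and arm the timer inside one section that
 * never sleeps. Returns true if the timer was armed.
 */
static bool model_arm_timer(struct model_dev *dev)
{
	bool armed = false;

	atomic_fetch_add(&dev->readers, 1);		/* ~ rcu_read_lock() */
	if (!atomic_load(&dev->draining)) {
		atomic_store(&dev->timer_armed, true);	/* ~ queue_delayed_work() */
		armed = true;
	}
	atomic_fetch_sub(&dev->readers, 1);		/* ~ rcu_read_unlock() */
	return armed;
}

/*
 * Write side: publish the flag, wait until no reader is inside its
 * section, then cancel whatever a reader armed before it saw the flag.
 */
static void model_begin_drain(struct model_dev *dev)
{
	atomic_store(&dev->draining, true);
	while (atomic_load(&dev->readers) != 0)		/* ~ synchronize_rcu() */
		;
	atomic_store(&dev->timer_armed, false);		/* ~ cancel_delayed_work() */
}
```

Any reader that observed the flag as clear has finished arming the timer by
the time the wait loop ends, so the cancel that follows cannot be overtaken
by a late arm. Unlike synchronize_rcu(), this stand-in also waits for readers
that entered after the flag was set, which is harmless here but is one of
the ways it differs from real RCU.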

> 
> Since there is no reason to queue any command if we are draining and
> are gonna reset at the end it is pretty useless to queue commands at
> that point.

Then, you can add that check.



* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-03  6:49       ` Tetsuo Handa
@ 2022-09-04  2:11         ` Luiz Augusto von Dentz
  2022-09-04  2:20           ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2022-09-04  2:11 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Marcel Holtmann, Johan Hedberg, Schspa Shi, syzbot,
	syzkaller-bugs, Lai Jiangshan, Tejun Heo, linux-bluetooth

Hi Tetsuo,

On Fri, Sep 2, 2022 at 11:49 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2022/09/03 6:31, Luiz Augusto von Dentz wrote:
> > Hi Tetsuo,
> >
> > On Fri, Sep 2, 2022 at 11:45 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote:
> >> Didn't we introduce HCI_CMD_DRAIN_WORKQUEUE exactly to avoid queuing
> >> after the cancel pattern?
>
> HCI_CMD_DRAIN_WORKQUEUE does not help for this case.
>
> What extid=243b7d89777f90f7613b is reporting is
>
>   hci_cmd_timeout() {                             hci_dev_do_reset() {
>     starts sleeping due to e.g. preemption
>                                                     hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE); // Sets HCI_CMD_DRAIN_WORKQUEUE flag
>                                                     cancel_delayed_work(&hdev->cmd_timer); // does nothing because hci_cmd_timeout() is already running
>                                                     cancel_delayed_work(&hdev->ncmd_timer);
>                                                     drain_workqueue(hdev->workqueue) {
>                                                       sets __WQ_DRAINING flag on hdev->workqueue
>                                                       starts waiting for completion of all works on hdev->workqueue
>     finishes sleeping due to e.g. preemption
>     queue_work(hdev->workqueue,  &hdev->cmd_work) // <= complains attempt to queue work from system_wq into __WQ_DRAINING hdev->workqueue

Can we check for __WQ_DRAINING, then? Checking HCI_CMD_DRAIN_WORKQUEUE
alone seems useless, so we have to check whether queue_work() can be
used or not.

>   }
>                                                       finishes waiting for completion of all works on hdev->workqueue
>                                                       clears __WQ_DRAINING flag
>                                                     }
>                                                   }
>
> race condition. Notice that cancel_delayed_work() does not wait for
> completion of an already started hci_cmd_timeout() callback.
>
> If you need to wait for completion of an already started callback,
> you need to use the _sync version (e.g. cancel_delayed_work_sync()).
> And watch out for locking dependencies when using the _sync version.
>
> >>                           I wonder if it wouldn't be better to introduce
> >> some function that disables/enables the workqueue so we don't have to
> >> do extra tracking in the driver/subsystem?
> >>
> >
> > I was thinking of doing something like the following:
> >
> > https://gist.github.com/Vudentz/a2288015fedbed366fcdb612264a9d16
>
> That patch does not close the race, for
>
> @@ -4037,6 +4038,10 @@ static void hci_cmd_work(struct work_struct *work)
>         BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name,
>                atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q));
>
> +       /* Don't queue while draining */
> +       if (hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> +               return;
>         /*
>          * BUG: WE ARE FREE TO SLEEP FOR ARBITRARY DURATION IMMEDIATELY AFTER CHECKING THE FLAG.
>          * ANY "TEST AND DO SOMETHING" NEEDS TO BE PROTECTED BY A LOCK MECHANISM.
>          */

Then we need a lock, not a flag.

>         /* Send queued commands */
>         if (atomic_read(&hdev->cmd_cnt)) {
>                 skb = skb_dequeue(&hdev->cmd_q);
>
> . In other words, HCI_CMD_DRAIN_WORKQUEUE does not fix what extid=63bed493aebbf6872647 is reporting.
>
> If "TEST AND DO SOMETHING" does not sleep, RCU is a handy lock mechanism.
>
> >
> > Since there is no reason to queue any command if we are draining and
> > are gonna reset at the end it is pretty useless to queue commands at
> > that point.
>
> Then, you can add that check.
>
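The "test and do something" point above can be sketched in plain userspace C. This is only a minimal model, assuming a pthread mutex stands in for whatever lock the subsystem would actually use; `struct toy_dev`, `toy_init()`, `toy_try_queue()` and `toy_begin_drain()` are hypothetical names made up for illustration and are not part of the Bluetooth or workqueue code. The flag test and the queuing step run under the same lock, so a drain cannot begin between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a device with a command queue. */
struct toy_dev {
	pthread_mutex_t lock;   /* protects draining and queued */
	bool draining;          /* models HCI_CMD_DRAIN_WORKQUEUE */
	int queued;             /* number of work items accepted so far */
};

void toy_init(struct toy_dev *d)
{
	assert(d != NULL);
	pthread_mutex_init(&d->lock, NULL);
	d->draining = false;
	d->queued = 0;
}

/* Test the flag and queue under one lock; returns true if work was queued. */
bool toy_try_queue(struct toy_dev *d)
{
	bool ok;

	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	ok = !d->draining;
	if (ok)
		d->queued++;    /* the "do something" step, still under the lock */
	pthread_mutex_unlock(&d->lock);
	return ok;
}

/* Mark the device as draining; later toy_try_queue() calls are refused. */
void toy_begin_drain(struct toy_dev *d)
{
	assert(d != NULL);
	pthread_mutex_lock(&d->lock);
	d->draining = true;
	pthread_mutex_unlock(&d->lock);
}
```

With a bare flag and no lock, the caller could be preempted between reading `draining` and incrementing `queued`, which is the window described in the quoted comment.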


-- 
Luiz Augusto von Dentz


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-04  2:11         ` Luiz Augusto von Dentz
@ 2022-09-04  2:20           ` Tejun Heo
  2022-09-05  8:24             ` Schspa Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2022-09-04  2:20 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Tetsuo Handa, Marcel Holtmann, Johan Hedberg, Schspa Shi, syzbot,
	syzkaller-bugs, Lai Jiangshan, linux-bluetooth

Hello,

On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
> And we can check for __WQ_DRAINING? Anyway checking

Please don't do that. That's an internal flag. It shouldn't be *that*
difficult to avoid this without peeking into wq internal state.

Thanks.

-- 
tejun


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-04  2:20           ` Tejun Heo
@ 2022-09-05  8:24             ` Schspa Shi
  2022-09-05 11:23               ` Tetsuo Handa
  0 siblings, 1 reply; 15+ messages in thread
From: Schspa Shi @ 2022-09-05  8:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Luiz Augusto von Dentz, Tetsuo Handa, Marcel Holtmann,
	Johan Hedberg, syzbot, syzkaller-bugs, Lai Jiangshan,
	linux-bluetooth


Tejun Heo <tj@kernel.org> writes:

> Hello,
>
> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>> And we can check for __WQ_DRAINING? Anyway checking
>
> Please don't do that. That's an internal flag. It shouldn't be *that*
> difficult to avoid this without peeking into wq internal state.
>
> Thanks.

It seems we only need to queue hdev->{cmd,ncmd}_timer on
hdev->workqueue; then there will be no race, because drain_workqueue will
flush all pending work internally.
Any new timeout work will see the HCI_CMD_DRAIN_WORKQUEUE flag after we
have cancelled and flushed all the delayed work.

-- 
BRs
Schspa Shi


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-05  8:24             ` Schspa Shi
@ 2022-09-05 11:23               ` Tetsuo Handa
  2022-09-05 12:26                 ` Schspa Shi
  0 siblings, 1 reply; 15+ messages in thread
From: Tetsuo Handa @ 2022-09-05 11:23 UTC (permalink / raw)
  To: Schspa Shi
  Cc: Luiz Augusto von Dentz, Marcel Holtmann, Johan Hedberg, syzbot,
	syzkaller-bugs, Lai Jiangshan, linux-bluetooth, Tejun Heo

On 2022/09/05 17:24, Schspa Shi wrote:
> 
> Tejun Heo <tj@kernel.org> writes:
> 
>> Hello,
>>
>> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>>> And we can check for __WQ_DRAINING? Anyway checking
>>
>> Please don't do that. That's an internal flag. It shouldn't be *that*
>> difficult to avoid this without peeking into wq internal state.
>>
>> Thanks.
> 
> It seems we only need to queue hdev->{cmd,ncmd}_timer on
> hdev->workqueue; then there will be no race, because drain_workqueue will
> flush all pending work internally.

True for queue_work(), not always true for queue_delayed_work(). Explained below.

> Any new timeout work will see the HCI_CMD_DRAIN_WORKQUEUE flag after we
> have cancelled and flushed all the delayed work.
> 

That is fine if you don't mind calling

  queue_work(&hdev->cmd_work) followed by hci_cmd_work() (case A below)

and/or

  queue_delayed_work(&hdev->ncmd_timer) potentially followed by hci_ncmd_timeout()/hci_reset_dev() (case B and C below)

after observing the HCI_CMD_DRAIN_WORKQUEUE flag.
If you do mind either of these, we need to use RCU protection.



Case A:

hci_dev_do_reset() {
                                      hci_cmd_work() {
                                        if (test_bit(HCI_RESET, &hdev->flags) ||
                                          hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
                                            cancel_delayed_work(&hdev->cmd_timer);
                                          else
                                            queue_delayed_work(hdev->workqueue, &hdev->cmd_timer, HCI_CMD_TIMEOUT);
                                        } else {
                                          skb_queue_head(&hdev->cmd_q, skb);
  hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
  cancel_delayed_work(&hdev->cmd_timer);
  cancel_delayed_work(&hdev->ncmd_timer);
                                          queue_work(hdev->workqueue, &hdev->cmd_work); // Queuing after setting HCI_CMD_DRAIN_WORKQUEUE despite the intent of HCI_CMD_DRAIN_WORKQUEUE...
  drain_workqueue(hdev->workqueue); // Will wait for hci_cmd_work() queued by queue_work() to complete.

                                        }

  // Actual flush() happens here.

  hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}



Case B:

hci_dev_do_reset() {
                                          handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
                                            if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
  hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
  cancel_delayed_work(&hdev->cmd_timer);
                                              queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
  cancel_delayed_work(&hdev->ncmd_timer); // May or may not cancel hci_ncmd_timeout() queued by queue_delayed_work().
  drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if cancel_delayed_work() failed to cancel.

                                          }

  // Actual flush() happens here.

  hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}



Case C:

hci_dev_do_reset() {
                                          handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
                                            if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
  hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
  cancel_delayed_work(&hdev->cmd_timer);
  cancel_delayed_work(&hdev->ncmd_timer); // Does nothing.
                                              queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
  drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if delay timer has expired.

                                          }

  // Actual flush() happens here, but hci_ncmd_timeout() queued by queue_delayed_work() can be running if delay timer has not expired as of calling drain_workqueue().

  hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
}
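
Case C can be replayed with a small single-threaded model. This is only a sketch under assumptions: `struct toy_dwork` and the `toy_*` helpers are hypothetical and merely imitate the relevant behaviour of delayed work (a timer that, on expiry, places the work on the queue); they are not the kernel workqueue API. The model shows that a drain only waits for work already on the queue, so a timer armed after the cancel is still pending when the drain returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one delayed work item and its workqueue. */
struct toy_dwork {
	bool timer_armed;  /* delay timer pending, work not yet on the queue */
	bool on_queue;     /* work sits on the queue waiting to run */
	int runs;          /* how many times the callback has run */
};

/* Models queue_delayed_work(): arm the timer unless already pending. */
bool toy_queue_delayed(struct toy_dwork *w)
{
	assert(w != NULL);
	if (w->timer_armed || w->on_queue)
		return false;
	w->timer_armed = true;
	return true;
}

/* Models cancel_delayed_work(): drop a pending item without waiting. */
bool toy_cancel_delayed(struct toy_dwork *w)
{
	bool was_pending;

	assert(w != NULL);
	was_pending = w->timer_armed || w->on_queue;
	w->timer_armed = false;
	w->on_queue = false;
	return was_pending;
}

/* Models timer expiry: the work moves from the timer to the queue. */
void toy_timer_expire(struct toy_dwork *w)
{
	assert(w != NULL);
	if (w->timer_armed) {
		w->timer_armed = false;
		w->on_queue = true;
	}
}

/* Models drain_workqueue(): runs only what is on the queue right now. */
void toy_drain(struct toy_dwork *w)
{
	assert(w != NULL);
	if (w->on_queue) {
		w->on_queue = false;
		w->runs++;
	}
}

/* Replays Case C; returns true if the timer is still armed after the drain. */
bool toy_replay_case_c(struct toy_dwork *w)
{
	toy_cancel_delayed(w);   /* does nothing: nothing is pending yet */
	toy_queue_delayed(w);    /* the racing queue_delayed_work() arms the timer */
	toy_drain(w);            /* queue is empty, so the drain finishes at once */
	return w->timer_armed;   /* the timer survives the drain */
}
```

In this model the callback has not run when the drain returns, and it runs only after a later `toy_timer_expire()` followed by another pass over the queue, i.e. after the point where the flag would already have been cleared.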



* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-05 11:23               ` Tetsuo Handa
@ 2022-09-05 12:26                 ` Schspa Shi
  0 siblings, 0 replies; 15+ messages in thread
From: Schspa Shi @ 2022-09-05 12:26 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Luiz Augusto von Dentz, Marcel Holtmann, Johan Hedberg, syzbot,
	syzkaller-bugs, Lai Jiangshan, linux-bluetooth, Tejun Heo


Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> On 2022/09/05 17:24, Schspa Shi wrote:
>> 
>> Tejun Heo <tj@kernel.org> writes:
>> 
>>> Hello,
>>>
>>> On Sat, Sep 03, 2022 at 07:11:18PM -0700, Luiz Augusto von Dentz wrote:
>>>> And we can check for __WQ_DRAINING? Anyway checking
>>>
>>> Please don't do that. That's an internal flag. It shouldn't be *that*
>>> difficult to avoid this without peeking into wq internal state.
>>>
>>> Thanks.
>> 
>> It seems we only need to queue hdev->{cmd,ncmd}_timer on
>> hdev->workqueue; then there will be no race, because drain_workqueue will
>> flush all pending work internally.
>
> True for queue_work(), not always true for queue_delayed_work(). Explained below.
>

Ok, you are right, got it now.

>> Any new timeout work will see the HCI_CMD_DRAIN_WORKQUEUE flag after we
>> have cancelled and flushed all the delayed work.
>> 
>
> That is fine if you don't mind calling
>
>   queue_work(&hdev->cmd_work) followed by hci_cmd_work() (case A below)
>
> and/or
>
>   queue_delayed_work(&hdev->ncmd_timer) potentially followed by hci_ncmd_timeout()/hci_reset_dev() (case B and C below)
>
> after observing the HCI_CMD_DRAIN_WORKQUEUE flag.
> If you do mind either of these, we need to use RCU protection.
>
>
>
> Case A:
>
> hci_dev_do_reset() {
>                                       hci_cmd_work() {
>                                         if (test_bit(HCI_RESET, &hdev->flags) ||
>                                           hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>                                             cancel_delayed_work(&hdev->cmd_timer);
>                                           else
>                                             queue_delayed_work(hdev->workqueue, &hdev->cmd_timer, HCI_CMD_TIMEOUT);
>                                         } else {
>                                           skb_queue_head(&hdev->cmd_q, skb);
>   hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>   cancel_delayed_work(&hdev->cmd_timer);
>   cancel_delayed_work(&hdev->ncmd_timer);
>                                           queue_work(hdev->workqueue, &hdev->cmd_work); // Queuing after setting HCI_CMD_DRAIN_WORKQUEUE despite the intent of HCI_CMD_DRAIN_WORKQUEUE...
>   drain_workqueue(hdev->workqueue); // Will wait for hci_cmd_work() queued by queue_work() to complete.
>
>                                         }
>
>   // Actual flush() happens here.
>
>   hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }
>
>
>
> Case B:
>
> hci_dev_do_reset() {
>                                           handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
>                                             if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>   hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>   cancel_delayed_work(&hdev->cmd_timer);
>                                               queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
>   cancel_delayed_work(&hdev->ncmd_timer); // May or may not cancel hci_ncmd_timeout() queued by queue_delayed_work().
>   drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if cancel_delayed_work() failed to cancel.
>
>                                           }
>
>   // Actual flush() happens here.
>
>   hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }
>
>
>
> Case C:
>
> hci_dev_do_reset() {
>                                           handle_cmd_cnt_and_timer(struct hci_dev *hdev, u8 ncmd) {
>                                             if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
>   hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
>   cancel_delayed_work(&hdev->cmd_timer);
>   cancel_delayed_work(&hdev->ncmd_timer); // Does nothing.
>                                               queue_delayed_work(hdev->workqueue, &hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
>   drain_workqueue(hdev->workqueue); // Will wait for hci_ncmd_timeout() queued by queue_delayed_work() to complete if delay timer has expired.
>
>                                           }
>
>   // Actual flush() happens here, but hci_ncmd_timeout() queued by queue_delayed_work() can be running if delay timer has not expired as of calling drain_workqueue().
>
>   hci_dev_clear_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> }

-- 
BRs
Schspa Shi


* Re: [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
  2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
  2022-09-02 12:00   ` bluez.test.bot
  2022-09-02 18:45   ` [PATCH] " Luiz Augusto von Dentz
@ 2022-09-19 17:30   ` patchwork-bot+bluetooth
  2 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+bluetooth @ 2022-09-19 17:30 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: marcel, johan.hedberg, luiz.dentz, schspa,
	syzbot+243b7d89777f90f7613b, syzkaller-bugs, jiangshanlai, tj,
	linux-bluetooth

Hello:

This patch was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <luiz.von.dentz@intel.com>:

On Fri, 2 Sep 2022 20:23:48 +0900 you wrote:
> syzbot is reporting attempt to schedule hdev->cmd_work work from system_wq
> WQ into hdev->workqueue WQ which is under draining operation [1], for
> commit c8efcc2589464ac7 ("workqueue: allow chained queueing during
> destruction") does not allow such operation.
> 
> The check introduced by commit 877afadad2dce8aa ("Bluetooth: When HCI work
> queue is drained, only queue chained work") was incomplete.
> 
> [...]

Here is the summary with links:
  - Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works
    https://git.kernel.org/bluetooth/bluetooth-next/c/deee93d13d38

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Thread overview: 15+ messages
2022-08-30  2:07 [syzbot] upstream test error: WARNING in __queue_work syzbot
2022-08-30 14:08 ` Lai Jiangshan
2022-08-30 17:37   ` Luiz Augusto von Dentz
2022-09-02 12:28     ` Aleksandr Nogikh
2022-09-02 11:23 ` [PATCH] Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works Tetsuo Handa
2022-09-02 12:00   ` bluez.test.bot
2022-09-02 18:45   ` [PATCH] " Luiz Augusto von Dentz
2022-09-02 21:31     ` Luiz Augusto von Dentz
2022-09-03  6:49       ` Tetsuo Handa
2022-09-04  2:11         ` Luiz Augusto von Dentz
2022-09-04  2:20           ` Tejun Heo
2022-09-05  8:24             ` Schspa Shi
2022-09-05 11:23               ` Tetsuo Handa
2022-09-05 12:26                 ` Schspa Shi
2022-09-19 17:30   ` patchwork-bot+bluetooth
