* [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
@ 2021-12-08 20:36 syzbot
2021-12-09 1:59 ` syzbot
0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2021-12-08 20:36 UTC (permalink / raw)
To: changbin.du, christian.brauner, davem, edumazet, kuba,
linux-kernel, netdev, syzkaller-bugs, yajun.deng
Hello,
syzbot found the following issue on:
HEAD commit: 4eee8d0b64ec Add linux-next specific files for 20211208
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13d1329db00000
kernel config: https://syzkaller.appspot.com/x/.config?x=20b74d9da4ce1ef1
dashboard link: https://syzkaller.appspot.com/bug?extid=485cc00ea7cf41dfdbf1
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13e7e955b00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed3641b00000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+485cc00ea7cf41dfdbf1@syzkaller.appspotmail.com
BUG: sleeping function called from invalid context at kernel/workqueue.c:3039
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8, name: kworker/u4:0
preempt_count: 101, expected: 0
RCU nest depth: 0, expected: 0
5 locks held by kworker/u4:0/8:
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:635 [inline]
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:662 [inline]
#0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2289
#1: ffffc90000cd7db0 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2293
#2: ffffffff8d308150 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x9b/0xb00 net/core/net_namespace.c:555
#3: ffff88806f0ab8f0 (&ent->pde_unload_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
#3: ffff88806f0ab8f0 (&ent->pde_unload_lock){+.+.}-{2:2}, at: proc_entry_rundown+0xe7/0x1d0 fs/proc/inode.c:266
#4: ffffc90000dc0d70 ((&dum_hcd->timer)){+.-.}-{0:0}, at: lockdep_copy_map include/linux/lockdep.h:35 [inline]
#4: ffffc90000dc0d70 ((&dum_hcd->timer)){+.-.}-{0:0}, at: call_timer_fn+0xd5/0x6b0 kernel/time/timer.c:1411
irq event stamp: 268125
hardirqs last enabled at (268124): [<ffffffff8956ab6f>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (268124): [<ffffffff8956ab6f>] _raw_spin_unlock_irq+0x1f/0x40 kernel/locking/spinlock.c:202
hardirqs last disabled at (268125): [<ffffffff8956a99e>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (268125): [<ffffffff8956a99e>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last enabled at (268062): [<ffffffff885ab700>] spin_unlock_bh include/linux/spinlock.h:399 [inline]
softirqs last enabled at (268062): [<ffffffff885ab700>] rxrpc_release_sock net/rxrpc/af_rxrpc.c:876 [inline]
softirqs last enabled at (268062): [<ffffffff885ab700>] rxrpc_release+0x1f0/0x5a0 net/rxrpc/af_rxrpc.c:917
softirqs last disabled at (268119): [<ffffffff8146b723>] invoke_softirq kernel/softirq.c:432 [inline]
softirqs last disabled at (268119): [<ffffffff8146b723>] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 5.16.0-rc4-next-20211208-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9583
start_flush_work kernel/workqueue.c:3039 [inline]
__flush_work+0x109/0xb10 kernel/workqueue.c:3103
__cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3191
hci_cmd_sync_cancel net/bluetooth/hci_sync.c:346 [inline]
hci_cmd_sync_cancel+0xe1/0x170 net/bluetooth/hci_sync.c:338
btusb_intr_complete+0x3d3/0x4a0 drivers/bluetooth/btusb.c:969
__usb_hcd_giveback_urb+0x2b0/0x5c0 drivers/usb/core/hcd.c:1656
usb_hcd_giveback_urb+0x367/0x410 drivers/usb/core/hcd.c:1726
dummy_timer+0x11f9/0x32b0 drivers/usb/gadget/udc/dummy_hcd.c:1987
call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
expire_timers kernel/time/timer.c:1466 [inline]
__run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
__run_timers kernel/time/timer.c:1715 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:lock_acquire+0x1ef/0x510 kernel/locking/lockdep.c:5607
Code: 9d a5 7e 83 f8 01 0f 85 b4 02 00 00 9c 58 f6 c4 02 0f 85 9f 02 00 00 48 83 7c 24 08 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c3 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 48 8b 84 24
RSP: 0000:ffffc90000cd7928 EFLAGS: 00000206
RAX: dffffc0000000000 RBX: 1ffff9200019af27 RCX: 0000000000000001
RDX: 1ffff110021cfc44 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: 00000000000c1948 R09: 0000000000000001
R10: fffffbfff2024ea1 R11: 1ffffffff1ee5241 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88806f0ab8f0 R15: 0000000000000000
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:354 [inline]
proc_entry_rundown+0xe7/0x1d0 fs/proc/inode.c:266
remove_proc_subtree+0x25c/0x500 fs/proc/generic.c:767
proc_remove fs/proc/generic.c:790 [inline]
proc_remove+0x66/0x90 fs/proc/generic.c:787
afs_proc_cleanup+0x34/0x70 fs/afs/proc.c:703
afs_net_exit+0x17d/0x320 fs/afs/main.c:159
ops_exit_list+0xb0/0x160 net/core/net_namespace.c:168
cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:593
process_one_work+0x9b2/0x1690 kernel/workqueue.c:2318
worker_thread+0x658/0x11f0 kernel/workqueue.c:2465
kthread+0x405/0x4f0 kernel/kthread.c:345
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
----------------
Code disassembly (best guess):
0: 9d popfq
1: a5 movsl %ds:(%rsi),%es:(%rdi)
2: 7e 83 jle 0xffffff87
4: f8 clc
5: 01 0f add %ecx,(%rdi)
7: 85 b4 02 00 00 9c 58 test %esi,0x589c0000(%rdx,%rax,1)
e: f6 c4 02 test $0x2,%ah
11: 0f 85 9f 02 00 00 jne 0x2b6
17: 48 83 7c 24 08 00 cmpq $0x0,0x8(%rsp)
1d: 74 01 je 0x20
1f: fb sti
20: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
27: fc ff df
* 2a: 48 01 c3 add %rax,%rbx <-- trapping instruction
2d: 48 c7 03 00 00 00 00 movq $0x0,(%rbx)
34: 48 c7 43 08 00 00 00 movq $0x0,0x8(%rbx)
3b: 00
3c: 48 rex.W
3d: 8b .byte 0x8b
3e: 84 .byte 0x84
3f: 24 .byte 0x24
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
2021-12-08 20:36 [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel syzbot
@ 2021-12-09 1:59 ` syzbot
2021-12-09 10:06 ` Oliver Neukum
0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2021-12-09 1:59 UTC (permalink / raw)
To: Thinh.Nguyen, bberg, changbin.du, christian.brauner, davem,
edumazet, gregkh, johan.hedberg, kuba, linux-bluetooth,
linux-kernel, linux-usb, luiz.dentz, luiz.von.dentz, marcel,
mathias.nyman, netdev, stern, syzkaller-bugs, yajun.deng
syzbot has bisected this issue to:
commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
Author: Benjamin Berg <bberg@redhat.com>
Date: Fri Dec 3 14:59:02 2021 +0000
Bluetooth: btusb: Cancel sync commands for certain URB errors
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10a0fdc5b00000
start commit: 4eee8d0b64ec Add linux-next specific files for 20211208
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=12a0fdc5b00000
console output: https://syzkaller.appspot.com/x/log.txt?x=14a0fdc5b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=20b74d9da4ce1ef1
dashboard link: https://syzkaller.appspot.com/bug?extid=485cc00ea7cf41dfdbf1
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13e7e955b00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10ed3641b00000
Reported-by: syzbot+485cc00ea7cf41dfdbf1@syzkaller.appspotmail.com
Fixes: c97a747efc93 ("Bluetooth: btusb: Cancel sync commands for certain URB errors")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
2021-12-09 1:59 ` syzbot
@ 2021-12-09 10:06 ` Oliver Neukum
2021-12-09 12:46 ` Benjamin Berg
0 siblings, 1 reply; 5+ messages in thread
From: Oliver Neukum @ 2021-12-09 10:06 UTC (permalink / raw)
To: syzbot, Thinh.Nguyen, bberg, changbin.du, christian.brauner,
davem, edumazet, gregkh, johan.hedberg, kuba, linux-bluetooth,
linux-kernel, linux-usb, luiz.dentz, luiz.von.dentz, marcel,
mathias.nyman, netdev, stern, syzkaller-bugs, yajun.deng
On 09.12.21 02:59, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
> Author: Benjamin Berg <bberg@redhat.com>
> Date: Fri Dec 3 14:59:02 2021 +0000
>
> Bluetooth: btusb: Cancel sync commands for certain URB errors
Hi,
looking at the patch, it sleeps in an interrupt handler (or equivalent)
in two places:
@@ -933,6 +933,8 @@ static void btusb_intr_complete(struct urb *urb)
if (err != -EPERM && err != -ENODEV)
bt_dev_err(hdev, "urb %p failed to resubmit (%d)",
urb, -err);
+ if (err != -EPERM)
+ hci_cmd_sync_cancel(hdev, -err);
@@ -1331,10 +1335,13 @@ static void btusb_tx_complete(struct urb *urb)
if (!test_bit(HCI_RUNNING, &hdev->flags))
goto done;
- if (!urb->status)
+ if (!urb->status) {
hdev->stat.byte_tx += urb->transfer_buffer_length;
- else
+ } else {
+ if (hci_skb_pkt_type(skb) == HCI_COMMAND_PKT)
+ hci_cmd_sync_cancel(hdev, -urb->status);
As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
just not
an approach you can take. It looks like asynchronously canceling the
scheduled work
would result in a race, so I would for now just revert.
What issue exactly is this trying to fix or improve?
Regards
Oliver
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
2021-12-09 10:06 ` Oliver Neukum
@ 2021-12-09 12:46 ` Benjamin Berg
2021-12-09 13:21 ` Oliver Neukum
0 siblings, 1 reply; 5+ messages in thread
From: Benjamin Berg @ 2021-12-09 12:46 UTC (permalink / raw)
To: Oliver Neukum, syzbot, Thinh.Nguyen, changbin.du,
christian.brauner, davem, edumazet, gregkh, johan.hedberg, kuba,
linux-bluetooth, linux-kernel, linux-usb, luiz.dentz,
luiz.von.dentz, marcel, mathias.nyman, netdev, stern,
syzkaller-bugs, yajun.deng
[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]
Hi,
On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
> just not
> an approach you can take. It looks like asynchronously canceling the
> scheduled work
> would result in a race, so I would for now just revert.
Right, so this needs to be pushed into a workqueue instead, I suppose.
> What issue exactly is this trying to fix or improve?
The problem is aborting long-running synchronous operations. i.e.
without this patchset, USB enumeration will hang for 10s if a USB
bluetooth device disappears during firmware loading. This is because
even though the USB device is gone and all URB submissions fail, the
operation will only be aborted after the internal timeout happens.
The device in turn disappears because an rfkill switch is blocked and
the platform removes it from the bus. Overall, this can lead to
graphical login to hang as fprintd cannot initialise as it hangs in USB
enumeration.
Benjamin
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
2021-12-09 12:46 ` Benjamin Berg
@ 2021-12-09 13:21 ` Oliver Neukum
0 siblings, 0 replies; 5+ messages in thread
From: Oliver Neukum @ 2021-12-09 13:21 UTC (permalink / raw)
To: Benjamin Berg, Oliver Neukum, syzbot, Thinh.Nguyen, changbin.du,
christian.brauner, davem, edumazet, gregkh, johan.hedberg, kuba,
linux-bluetooth, linux-kernel, linux-usb, luiz.dentz,
luiz.von.dentz, marcel, mathias.nyman, netdev, stern,
syzkaller-bugs, yajun.deng
On 09.12.21 13:46, Benjamin Berg wrote:
Hi,
> On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
>> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
>> just not
>> an approach you can take. It looks like asynchronously canceling the
>> scheduled work
>> would result in a race, so I would for now just revert.
> Right, so this needs to be pushed into a workqueue instead, I suppose.
It looks like overkill, but I have no good alternative to offer either.
>
>> What issue exactly is this trying to fix or improve?
> The problem is aborting long-running synchronous operations. i.e.
> without this patchset, USB enumeration will hang for 10s if a USB
> bluetooth device disappears during firmware loading. This is because
> even though the USB device is gone and all URB submissions fail, the
> operation will only be aborted after the internal timeout happens.
>
I see. Something ought to be done. The issue is in good hands.
Thanks
Oliver
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-12-09 13:21 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08 20:36 [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel syzbot
2021-12-09 1:59 ` syzbot
2021-12-09 10:06 ` Oliver Neukum
2021-12-09 12:46 ` Benjamin Berg
2021-12-09 13:21 ` Oliver Neukum
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.