All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
@ 2021-12-08 20:36 syzbot
  2021-12-09  1:59 ` syzbot
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2021-12-08 20:36 UTC (permalink / raw)
  To: changbin.du, christian.brauner, davem, edumazet, kuba,
	linux-kernel, netdev, syzkaller-bugs, yajun.deng

Hello,

syzbot found the following issue on:

HEAD commit:    4eee8d0b64ec Add linux-next specific files for 20211208
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13d1329db00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=20b74d9da4ce1ef1
dashboard link: https://syzkaller.appspot.com/bug?extid=485cc00ea7cf41dfdbf1
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e7e955b00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed3641b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+485cc00ea7cf41dfdbf1@syzkaller.appspotmail.com

BUG: sleeping function called from invalid context at kernel/workqueue.c:3039
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 8, name: kworker/u4:0
preempt_count: 101, expected: 0
RCU nest depth: 0, expected: 0
5 locks held by kworker/u4:0/8:
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:635 [inline]
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:662 [inline]
 #0: ffff8880155d9938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2289
 #1: ffffc90000cd7db0 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2293
 #2: ffffffff8d308150 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x9b/0xb00 net/core/net_namespace.c:555
 #3: ffff88806f0ab8f0 (&ent->pde_unload_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
 #3: ffff88806f0ab8f0 (&ent->pde_unload_lock){+.+.}-{2:2}, at: proc_entry_rundown+0xe7/0x1d0 fs/proc/inode.c:266
 #4: ffffc90000dc0d70 ((&dum_hcd->timer)){+.-.}-{0:0}, at: lockdep_copy_map include/linux/lockdep.h:35 [inline]
 #4: ffffc90000dc0d70 ((&dum_hcd->timer)){+.-.}-{0:0}, at: call_timer_fn+0xd5/0x6b0 kernel/time/timer.c:1411
irq event stamp: 268125
hardirqs last  enabled at (268124): [<ffffffff8956ab6f>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last  enabled at (268124): [<ffffffff8956ab6f>] _raw_spin_unlock_irq+0x1f/0x40 kernel/locking/spinlock.c:202
hardirqs last disabled at (268125): [<ffffffff8956a99e>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (268125): [<ffffffff8956a99e>] _raw_spin_lock_irqsave+0x4e/0x50 kernel/locking/spinlock.c:162
softirqs last  enabled at (268062): [<ffffffff885ab700>] spin_unlock_bh include/linux/spinlock.h:399 [inline]
softirqs last  enabled at (268062): [<ffffffff885ab700>] rxrpc_release_sock net/rxrpc/af_rxrpc.c:876 [inline]
softirqs last  enabled at (268062): [<ffffffff885ab700>] rxrpc_release+0x1f0/0x5a0 net/rxrpc/af_rxrpc.c:917
softirqs last disabled at (268119): [<ffffffff8146b723>] invoke_softirq kernel/softirq.c:432 [inline]
softirqs last disabled at (268119): [<ffffffff8146b723>] __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 5.16.0-rc4-next-20211208-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9583
 start_flush_work kernel/workqueue.c:3039 [inline]
 __flush_work+0x109/0xb10 kernel/workqueue.c:3103
 __cancel_work_timer+0x3f9/0x570 kernel/workqueue.c:3191
 hci_cmd_sync_cancel net/bluetooth/hci_sync.c:346 [inline]
 hci_cmd_sync_cancel+0xe1/0x170 net/bluetooth/hci_sync.c:338
 btusb_intr_complete+0x3d3/0x4a0 drivers/bluetooth/btusb.c:969
 __usb_hcd_giveback_urb+0x2b0/0x5c0 drivers/usb/core/hcd.c:1656
 usb_hcd_giveback_urb+0x367/0x410 drivers/usb/core/hcd.c:1726
 dummy_timer+0x11f9/0x32b0 drivers/usb/gadget/udc/dummy_hcd.c:1987
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421
 expire_timers kernel/time/timer.c:1466 [inline]
 __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734
 __run_timers kernel/time/timer.c:1715 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:lock_acquire+0x1ef/0x510 kernel/locking/lockdep.c:5607
Code: 9d a5 7e 83 f8 01 0f 85 b4 02 00 00 9c 58 f6 c4 02 0f 85 9f 02 00 00 48 83 7c 24 08 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c3 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 48 8b 84 24
RSP: 0000:ffffc90000cd7928 EFLAGS: 00000206
RAX: dffffc0000000000 RBX: 1ffff9200019af27 RCX: 0000000000000001
RDX: 1ffff110021cfc44 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: 00000000000c1948 R09: 0000000000000001
R10: fffffbfff2024ea1 R11: 1ffffffff1ee5241 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88806f0ab8f0 R15: 0000000000000000
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:354 [inline]
 proc_entry_rundown+0xe7/0x1d0 fs/proc/inode.c:266
 remove_proc_subtree+0x25c/0x500 fs/proc/generic.c:767
 proc_remove fs/proc/generic.c:790 [inline]
 proc_remove+0x66/0x90 fs/proc/generic.c:787
 afs_proc_cleanup+0x34/0x70 fs/afs/proc.c:703
 afs_net_exit+0x17d/0x320 fs/afs/main.c:159
 ops_exit_list+0xb0/0x160 net/core/net_namespace.c:168
 cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:593
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2318
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2465
 kthread+0x405/0x4f0 kernel/kthread.c:345
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:	9d                   	popfq
   1:	a5                   	movsl  %ds:(%rsi),%es:(%rdi)
   2:	7e 83                	jle    0xffffff87
   4:	f8                   	clc
   5:	01 0f                	add    %ecx,(%rdi)
   7:	85 b4 02 00 00 9c 58 	test   %esi,0x589c0000(%rdx,%rax,1)
   e:	f6 c4 02             	test   $0x2,%ah
  11:	0f 85 9f 02 00 00    	jne    0x2b6
  17:	48 83 7c 24 08 00    	cmpq   $0x0,0x8(%rsp)
  1d:	74 01                	je     0x20
  1f:	fb                   	sti
  20:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  27:	fc ff df
* 2a:	48 01 c3             	add    %rax,%rbx <-- trapping instruction
  2d:	48 c7 03 00 00 00 00 	movq   $0x0,(%rbx)
  34:	48 c7 43 08 00 00 00 	movq   $0x0,0x8(%rbx)
  3b:	00
  3c:	48                   	rex.W
  3d:	8b                   	.byte 0x8b
  3e:	84                   	.byte 0x84
  3f:	24                   	.byte 0x24


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
  2021-12-08 20:36 [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel syzbot
@ 2021-12-09  1:59 ` syzbot
  2021-12-09 10:06   ` Oliver Neukum
  0 siblings, 1 reply; 5+ messages in thread
From: syzbot @ 2021-12-09  1:59 UTC (permalink / raw)
  To: Thinh.Nguyen, bberg, changbin.du, christian.brauner, davem,
	edumazet, gregkh, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, linux-usb, luiz.dentz, luiz.von.dentz, marcel,
	mathias.nyman, netdev, stern, syzkaller-bugs, yajun.deng

syzbot has bisected this issue to:

commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
Author: Benjamin Berg <bberg@redhat.com>
Date:   Fri Dec 3 14:59:02 2021 +0000

    Bluetooth: btusb: Cancel sync commands for certain URB errors

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=10a0fdc5b00000
start commit:   4eee8d0b64ec Add linux-next specific files for 20211208
git tree:       linux-next
final oops:     https://syzkaller.appspot.com/x/report.txt?x=12a0fdc5b00000
console output: https://syzkaller.appspot.com/x/log.txt?x=14a0fdc5b00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=20b74d9da4ce1ef1
dashboard link: https://syzkaller.appspot.com/bug?extid=485cc00ea7cf41dfdbf1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e7e955b00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10ed3641b00000

Reported-by: syzbot+485cc00ea7cf41dfdbf1@syzkaller.appspotmail.com
Fixes: c97a747efc93 ("Bluetooth: btusb: Cancel sync commands for certain URB errors")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
  2021-12-09  1:59 ` syzbot
@ 2021-12-09 10:06   ` Oliver Neukum
  2021-12-09 12:46     ` Benjamin Berg
  0 siblings, 1 reply; 5+ messages in thread
From: Oliver Neukum @ 2021-12-09 10:06 UTC (permalink / raw)
  To: syzbot, Thinh.Nguyen, bberg, changbin.du, christian.brauner,
	davem, edumazet, gregkh, johan.hedberg, kuba, linux-bluetooth,
	linux-kernel, linux-usb, luiz.dentz, luiz.von.dentz, marcel,
	mathias.nyman, netdev, stern, syzkaller-bugs, yajun.deng


On 09.12.21 02:59, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit c97a747efc93f94a4ad6c707972dfbf8d774edf9
> Author: Benjamin Berg <bberg@redhat.com>
> Date:   Fri Dec 3 14:59:02 2021 +0000
>
>     Bluetooth: btusb: Cancel sync commands for certain URB errors

Hi,

looking at the patch, it sleeps in an interrupt handler (or equivalent)
in two places:

@@ -933,6 +933,8 @@ static void btusb_intr_complete(struct urb *urb)
                if (err != -EPERM && err != -ENODEV)
                        bt_dev_err(hdev, "urb %p failed to resubmit (%d)",
                                   urb, -err);
+               if (err != -EPERM)
+                       hci_cmd_sync_cancel(hdev, -err);


@@ -1331,10 +1335,13 @@ static void btusb_tx_complete(struct urb *urb)
        if (!test_bit(HCI_RUNNING, &hdev->flags))
                goto done;
 
-       if (!urb->status)
+       if (!urb->status) {
                hdev->stat.byte_tx += urb->transfer_buffer_length;
-       else
+       } else {
+               if (hci_skb_pkt_type(skb) == HCI_COMMAND_PKT)
+                       hci_cmd_sync_cancel(hdev, -urb->status);

As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
just not
an approach you can take. It looks like asynchronously canceling the
scheduled work
would result in a race, so I would for now just revert.

What issue exactly is this trying to fix or improve?

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
  2021-12-09 10:06   ` Oliver Neukum
@ 2021-12-09 12:46     ` Benjamin Berg
  2021-12-09 13:21       ` Oliver Neukum
  0 siblings, 1 reply; 5+ messages in thread
From: Benjamin Berg @ 2021-12-09 12:46 UTC (permalink / raw)
  To: Oliver Neukum, syzbot, Thinh.Nguyen, changbin.du,
	christian.brauner, davem, edumazet, gregkh, johan.hedberg, kuba,
	linux-bluetooth, linux-kernel, linux-usb, luiz.dentz,
	luiz.von.dentz, marcel, mathias.nyman, netdev, stern,
	syzkaller-bugs, yajun.deng

[-- Attachment #1: Type: text/plain, Size: 1013 bytes --]

Hi,

On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
> just not
> an approach you can take. It looks like asynchronously canceling the
> scheduled work
> would result in a race, so I would for now just revert.

Right, so this needs to be pushed into a workqueue instead, I suppose.

> What issue exactly is this trying to fix or improve?

The problem is aborting long-running synchronous operations. i.e.
without this patchset, USB enumeration will hang for 10s if a USB
bluetooth device disappears during firmware loading. This is because
even though the USB device is gone and all URB submissions fail, the
operation will only be aborted after the internal timeout happens.

The device in turn disappears because an rfkill switch is blocked and
the platform removes it from the bus. Overall, this can lead to
graphical login to hang as fprintd cannot initialise as it hangs in USB
enumeration.

Benjamin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel
  2021-12-09 12:46     ` Benjamin Berg
@ 2021-12-09 13:21       ` Oliver Neukum
  0 siblings, 0 replies; 5+ messages in thread
From: Oliver Neukum @ 2021-12-09 13:21 UTC (permalink / raw)
  To: Benjamin Berg, Oliver Neukum, syzbot, Thinh.Nguyen, changbin.du,
	christian.brauner, davem, edumazet, gregkh, johan.hedberg, kuba,
	linux-bluetooth, linux-kernel, linux-usb, luiz.dentz,
	luiz.von.dentz, marcel, mathias.nyman, netdev, stern,
	syzkaller-bugs, yajun.deng


On 09.12.21 13:46, Benjamin Berg wrote:
Hi,
> On Thu, 2021-12-09 at 11:06 +0100, Oliver Neukum wrote:
>> As __cancel_work_timer can be called from hci_cmd_sync_cancel() this is
>> just not
>> an approach you can take. It looks like asynchronously canceling the
>> scheduled work
>> would result in a race, so I would for now just revert.
> Right, so this needs to be pushed into a workqueue instead, I suppose.
It looks like overkill, but I have no good alternative to offer either.
>
>> What issue exactly is this trying to fix or improve?
> The problem is aborting long-running synchronous operations. i.e.
> without this patchset, USB enumeration will hang for 10s if a USB
> bluetooth device disappears during firmware loading. This is because
> even though the USB device is gone and all URB submissions fail, the
> operation will only be aborted after the internal timeout happens.
>
I see. Something ought to be done. The issue is in good hands.

    Thanks
        Oliver


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-09 13:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08 20:36 [syzbot] BUG: sleeping function called from invalid context in hci_cmd_sync_cancel syzbot
2021-12-09  1:59 ` syzbot
2021-12-09 10:06   ` Oliver Neukum
2021-12-09 12:46     ` Benjamin Berg
2021-12-09 13:21       ` Oliver Neukum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.