* INFO: task hung in nbd_ioctl @ 2021-09-16 2:43 Hao Sun 0 siblings, 0 replies; 13+ messages in thread From: Hao Sun @ 2021-09-16 2:43 UTC (permalink / raw) To: Jens Axboe, linux-kernel; +Cc: Josef Bacik, linux-block, nbd Hello, When using Healer to fuzz the latest Linux kernel, the following crash was triggered. HEAD commit: 6880fa6c5660 Linux 5.15-rc1 git tree: upstream console output: https://drive.google.com/file/d/1LfSHVsXZBF1k8KjBkz5OauavDE0rMs7D/view?usp=sharing kernel config: https://drive.google.com/file/d/1rUzyMbe5vcs6khA3tL9EHTLJvsUdWcgB/view?usp=sharing Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> INFO: task syz-executor:24965 blocked for more than 143 seconds. Not tainted 5.15.0-rc1 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:27880 pid:24965 ppid: 24302 flags:0x00004004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xcd9/0x2530 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425 __mutex_lock_common kernel/locking/mutex.c:669 [inline] __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729 nbd_start_device_ioctl drivers/block/nbd.c:1361 [inline] __nbd_ioctl drivers/block/nbd.c:1422 [inline] nbd_ioctl+0x58b/0x9c0 drivers/block/nbd.c:1462 blkdev_ioctl+0x2a4/0x720 block/ioctl.c:589 block_ioctl+0xfa/0x140 block/fops.c:477 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4739cd RSP: 002b:00007fd1b9ddec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000059c0a0 
RCX: 00000000004739cd RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000008 RBP: 00000000004ebd80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c0a0 R13: 00007ffd8476ee1f R14: 00007ffd8476efc0 R15: 00007fd1b9ddedc0 INFO: task syz-executor:24976 blocked for more than 143 seconds. Not tainted 5.15.0-rc1 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:28400 pid:24976 ppid: 24302 flags:0x00000004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xcd9/0x2530 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 blk_mq_freeze_queue_wait+0x114/0x160 block/blk-mq.c:151 nbd_add_socket+0x102/0x7c0 drivers/block/nbd.c:1050 __nbd_ioctl drivers/block/nbd.c:1405 [inline] nbd_ioctl+0x391/0x9c0 drivers/block/nbd.c:1462 blkdev_ioctl+0x2a4/0x720 block/ioctl.c:589 block_ioctl+0xfa/0x140 block/fops.c:477 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4739cd RSP: 002b:00007fd1b9d7bc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000059c2c8 RCX: 00000000004739cd RDX: 0000000000000006 RSI: 000000000000ab00 RDI: 0000000000000004 RBP: 00000000004ebd80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c2c8 R13: 00007ffd8476ee1f R14: 00007ffd8476efc0 R15: 00007fd1b9d7bdc0 Showing all locks held in the system: 1 lock held by khungtaskd/39: #0: ffffffff8b97e9a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 1 lock held by in:imklog/15673: #0: ffff88801eeab570 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990 1 lock 
held by syz-executor/24965: #0: ffff88801a0f4208 (&nbd->config_lock){+.+.}-{3:3}, at: nbd_start_device_ioctl drivers/block/nbd.c:1361 [inline] #0: ffff88801a0f4208 (&nbd->config_lock){+.+.}-{3:3}, at: __nbd_ioctl drivers/block/nbd.c:1422 [inline] #0: ffff88801a0f4208 (&nbd->config_lock){+.+.}-{3:3}, at: nbd_ioctl+0x58b/0x9c0 drivers/block/nbd.c:1462 1 lock held by syz-executor/24976: #0: ffff88801a0f4208 (&nbd->config_lock){+.+.}-{3:3}, at: nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455 ============================================= NMI backtrace for cpu 2 CPU: 2 PID: 39 Comm: khungtaskd Not tainted 5.15.0-rc1 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x1e1/0x220 lib/nmi_backtrace.c:62 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline] watchdog+0xcc8/0x1010 kernel/hung_task.c:295 kthread+0x3e5/0x4d0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 2 to CPUs 0-1,3: NMI backtrace for cpu 1 CPU: 1 PID: 15674 Comm: rs:main Q:Reg Not tainted 5.15.0-rc1 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__lock_acquire+0xdc5/0x57e0 kernel/locking/lockdep.c:4885 Code: bc e9 0d e9 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 80 3c 02 00 0f 85 76 2a 00 00 49 81 3e 40 34 f0 8e <0f> 84 16 f3 ff ff 83 fd 01 0f 87 1e f3 ff ff 89 eb 0f 87 3d 39 00 RSP: 0018:ffffc90007fbf628 EFLAGS: 00000087 RAX: dffffc0000000000 RBX: 1ffff92000ff7ef5 RCX: 0000000000000000 RDX: 1ffffffff1757c40 RSI: 0000000000000000 RDI: ffffffff8babe200 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff888135c32a0b R11: ffffed1026b86541 R12: 0000000000000000 R13: 
ffff88810287d580 R14: ffffffff8babe200 R15: 0000000000000000 FS: 00007fe40dfd0700(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000056f1d8 CR3: 000000001911a000 CR4: 0000000000350ee0 Call Trace: lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5590 fs_reclaim_acquire+0xd2/0x160 mm/page_alloc.c:4556 might_alloc include/linux/sched/mm.h:198 [inline] slab_pre_alloc_hook mm/slab.h:492 [inline] slab_alloc_node mm/slub.c:3120 [inline] slab_alloc mm/slub.c:3214 [inline] kmem_cache_alloc+0x42/0x340 mm/slub.c:3219 kmem_cache_zalloc include/linux/slab.h:711 [inline] jbd2_alloc_handle include/linux/jbd2.h:1603 [inline] new_handle fs/jbd2/transaction.c:481 [inline] jbd2__journal_start fs/jbd2/transaction.c:508 [inline] jbd2__journal_start+0x191/0x920 fs/jbd2/transaction.c:490 __ext4_journal_start_sb+0x3a8/0x4a0 fs/ext4/ext4_jbd2.c:105 __ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline] ext4_da_write_begin+0x4c5/0x1180 fs/ext4/inode.c:3002 generic_perform_write+0x1fe/0x510 mm/filemap.c:3770 ext4_buffered_write_iter+0x206/0x4c0 fs/ext4/file.c:269 ext4_file_write_iter+0x42e/0x14a0 fs/ext4/file.c:680 call_write_iter include/linux/fs.h:2163 [inline] new_sync_write+0x414/0x640 fs/read_write.c:507 vfs_write+0x67a/0xae0 fs/read_write.c:594 ksys_write+0x12d/0x250 fs/read_write.c:647 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fe410a141cd Code: c2 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae fc ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 f7 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007fe40dfcf590 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fe404027260 RCX: 00007fe410a141cd RDX: 000000000000005c RSI: 00007fe404027260 RDI: 0000000000000009 RBP: 0000000000000000 
R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 00007fe404026fe0 R13: 00007fe40dfcf5b0 R14: 000055dcd82aa800 R15: 000000000000005c NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline] NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline] NMI backtrace for cpu 0 skipped: idling at default_idle+0xb/0x10 arch/x86/kernel/process.c:716 NMI backtrace for cpu 3 CPU: 3 PID: 3017 Comm: systemd-journal Not tainted 5.15.0-rc1 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:lockdep_hardirqs_off+0x3b/0xd0 kernel/locking/lockdep.c:4372 Code: 2b 47 cf 76 a9 00 00 f0 00 55 53 48 89 fb 74 49 8b 15 f9 2f f2 06 85 d2 74 0e 65 8b 05 6a 4e cf 76 85 c0 75 4e 5b 5d c3 9c 58 <f6> c4 02 74 eb e8 5b fa ac fa 85 c0 74 ed 8b 05 b9 46 3b 04 85 c0 RSP: 0018:ffffc90000edf900 EFLAGS: 00000046 RAX: 0000000000000046 RBX: ffffffff81ccee3d RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90000edf9b0 R08: ffffffff817c49c9 R09: 0000000000000000 R10: 0000000000000007 R11: ffffed1026ba6541 R12: 0000000000000200 R13: 0000000000000000 R14: ffff888109853900 R15: 0000000000000000 FS: 00007fdba43168c0(0000) GS:ffff888135d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdb9ff28000 CR3: 000000001bdb3000 CR4: 0000000000350ee0 Call Trace: trace_hardirqs_off+0x13/0x1b0 kernel/trace/trace_preemptirq.c:76 seqcount_lockdep_reader_access include/linux/seqlock.h:102 [inline] set_root+0x39d/0x560 fs/namei.c:940 nd_jump_root+0x38d/0x520 fs/namei.c:961 path_init+0xf81/0x1700 fs/namei.c:2359 path_openat+0x18e/0x2710 fs/namei.c:3556 do_filp_open+0x1c1/0x290 fs/namei.c:3588 do_sys_openat2+0x61b/0x9a0 fs/open.c:1200 do_sys_open+0xc3/0x140 fs/open.c:1216 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 
arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fdba38a685d Code: bb 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e f6 ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 67 f6 ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007ffd7e798f70 EFLAGS: 00000293 ORIG_RAX: 0000000000000002 RAX: ffffffffffffffda RBX: 00007ffd7e799280 RCX: 00007fdba38a685d RDX: 00000000000001a0 RSI: 0000000000080042 RDI: 000055ef99869060 RBP: 000000000000000d R08: 000000000000ffc0 R09: 00000000ffffffff R10: 0000000000000069 R11: 0000000000000293 R12: 00000000ffffffff R13: 000055ef99865040 R14: 00007ffd7e799240 R15: 000055ef99870c40 ---------------- Code disassembly (best guess), 1 bytes skipped: 0: e9 0d e9 f4 01 jmpq 0x1f4e912 5: 00 00 add %al,(%rax) 7: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax e: fc ff df 11: 4c 89 f2 mov %r14,%rdx 14: 48 c1 ea 03 shr $0x3,%rdx 18: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) 1c: 0f 85 76 2a 00 00 jne 0x2a98 22: 49 81 3e 40 34 f0 8e cmpq $0xffffffff8ef03440,(%r14) * 29: 0f 84 16 f3 ff ff je 0xfffff345 <-- trapping instruction 2f: 83 fd 01 cmp $0x1,%ebp 32: 0f 87 1e f3 ff ff ja 0xfffff356 38: 89 eb mov %ebp,%ebx 3a: 0f .byte 0xf 3b: 87 .byte 0x87 3c: 3d .byte 0x3d 3d: 39 00 cmp %eax,(%rax)
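A note for readers of the report above, not part of the original message: on x86-64 the register dump of each blocked task carries the syscall number in ORIG_RAX and the first three arguments in RDI, RSI and RDX, so the two hung ioctl calls can be identified from the dump alone. The sketch below assumes the NBD request codes are `_IO(0xab, nr)` as in include/uapi/linux/nbd.h; the helper name `decode_ioctl` and the partial lookup table are made up for illustration.

```python
import re

# Partial table of NBD ioctl request codes, _IO(0xab, nr), as defined in
# include/uapi/linux/nbd.h. Only the two codes seen in this report are listed.
NBD_IOCTLS = {0xAB00: "NBD_SET_SOCK", 0xAB03: "NBD_DO_IT"}

# x86-64 syscall ABI: ORIG_RAX = syscall number, RDI/RSI/RDX = arguments 1-3.
REG = re.compile(r"\b(ORIG_RAX|RDI|RSI|RDX): ([0-9a-f]{16})\b")
SYS_IOCTL = 16  # 0x10 on x86-64


def decode_ioctl(dump):
    """Return (fd, request name, arg) for an ioctl register dump, else None."""
    regs = {name: int(value, 16) for name, value in REG.findall(dump)}
    if regs.get("ORIG_RAX") != SYS_IOCTL:
        return None
    request = regs["RSI"]
    return regs["RDI"], NBD_IOCTLS.get(request, hex(request)), regs["RDX"]


# Register values copied from the two blocked tasks in the report above.
task_24965 = ("ORIG_RAX: 0000000000000010 RDX: 0000000000000000 "
              "RSI: 000000000000ab03 RDI: 0000000000000008")
task_24976 = ("ORIG_RAX: 0000000000000010 RDX: 0000000000000006 "
              "RSI: 000000000000ab00 RDI: 0000000000000004")

print(decode_ioctl(task_24965))
print(decode_ioctl(task_24976))
```

Read this way, pid 24965 was in NBD_DO_IT on fd 8 and pid 24976 was in NBD_SET_SOCK on fd 4 with argument 6, which matches the nbd_start_device_ioctl and nbd_add_socket frames in the two call traces.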
* INFO: task hung in nbd_ioctl @ 2021-09-18 1:34 Hao Sun 0 siblings, 0 replies; 13+ messages in thread From: Hao Sun @ 2021-09-18 1:34 UTC (permalink / raw) To: Jens Axboe, Linux Kernel Mailing List; +Cc: Josef Bacik, linux-block, nbd Hello, When using Healer to fuzz the latest Linux kernel, the following crash was triggered. HEAD commit: ff1ffd71d5f0 Merge tag 'hyperv-fixes-signed-20210915 git tree: upstream console output: https://drive.google.com/file/d/1Htx96ZZ5dAxLIr-4jNJ62iQdstmHnliH/view?usp=sharing kernel config: https://drive.google.com/file/d/1zXpDhs-IdE7tX17B7MhaYP0VGUfP6m9B/view?usp=sharing Sorry, I don't have a reproducer for this crash, hope the symbolized report can help. If you fix this issue, please add the following tag to the commit: Reported-by: Hao Sun <sunhao.th@gmail.com> INFO: task syz-executor:25816 blocked for more than 143 seconds. Not tainted 5.15.0-rc1+ #6 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:27872 pid:25816 ppid: 24814 flags:0x00000004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xcd9/0x2530 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425 __mutex_lock_common kernel/locking/mutex.c:669 [inline] __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729 nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455 blkdev_ioctl+0x2a4/0x720 block/ioctl.c:589 block_ioctl+0xfa/0x140 block/fops.c:477 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4739cd RSP: 002b:00007fe430645c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000059c0a0 RCX: 00000000004739cd RDX: 0000000000000000 RSI: 000000000000ab03 
RDI: 0000000000000007 RBP: 00000000004ebd80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c0a0 R13: 00007ffcaa94abdf R14: 00007ffcaa94ad80 R15: 00007fe430645dc0 INFO: task syz-executor:25822 blocked for more than 143 seconds. Not tainted 5.15.0-rc1+ #6 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor state:D stack:28400 pid:25822 ppid: 24814 flags:0x00000004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xcd9/0x2530 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 blk_mq_freeze_queue_wait+0x114/0x160 block/blk-mq.c:151 nbd_add_socket+0x102/0x7c0 drivers/block/nbd.c:1050 __nbd_ioctl drivers/block/nbd.c:1405 [inline] nbd_ioctl+0x391/0x9c0 drivers/block/nbd.c:1462 blkdev_ioctl+0x2a4/0x720 block/ioctl.c:589 block_ioctl+0xfa/0x140 block/fops.c:477 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:874 [inline] __se_sys_ioctl fs/ioctl.c:860 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4739cd RSP: 002b:00007fe430603c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000059c210 RCX: 00000000004739cd RDX: 0000000000000005 RSI: 000000000000ab00 RDI: 0000000000000004 RBP: 00000000004ebd80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c210 R13: 00007ffcaa94abdf R14: 00007ffcaa94ad80 R15: 00007fe430603dc0 INFO: task syz-executor:25823 blocked for more than 143 seconds. Not tainted 5.15.0-rc1+ #6 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
task:syz-executor state:D stack:27408 pid:25823 ppid: 24814 flags:0x00000004 Call Trace: context_switch kernel/sched/core.c:4940 [inline] __schedule+0xcd9/0x2530 kernel/sched/core.c:6287 schedule+0xd3/0x270 kernel/sched/core.c:6366 blk_queue_enter+0x956/0xdb0 block/blk-core.c:462 bio_queue_enter block/blk-core.c:477 [inline] __submit_bio_noacct_mq block/blk-core.c:989 [inline] submit_bio_noacct+0xd32/0x1460 block/blk-core.c:1031 submit_bio+0x10a/0x460 block/blk-core.c:1093 submit_bio_wait+0x106/0x230 block/bio.c:1248 blkdev_issue_flush+0xd7/0x120 block/blk-flush.c:458 blkdev_fsync+0x8e/0xd0 block/fops.c:420 vfs_fsync_range+0x13a/0x220 fs/sync.c:200 vfs_fsync fs/sync.c:214 [inline] do_fsync+0x4d/0x90 fs/sync.c:224 __do_sys_fsync fs/sync.c:232 [inline] __se_sys_fsync fs/sync.c:230 [inline] __x64_sys_fsync+0x2f/0x40 fs/sync.c:230 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x2000014c RSP: 002b:00007fe4305e2bb8 EFLAGS: 00000213 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000002000014c RDX: 0000000000004c01 RSI: 0000000000000003 RDI: 0000000000000003 RBP: 00000000000000b9 R08: 0000000000000005 R09: 0000000000000006 R10: 0000000000000007 R11: 0000000000000213 R12: 000000000000000b R13: 000000000000000c R14: 000000000000000d R15: 00007fe4305e2dc0 Showing all locks held in the system: 1 lock held by khungtaskd/39: #0: ffffffff8b97e9a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 1 lock held by in:imklog/6298: #0: ffff88801c9d19f0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990 3 locks held by kworker/u8:2/6743: 1 lock held by syz-executor/25816: #0: ffff88801ae63208 (&nbd->config_lock){+.+.}-{3:3}, at: nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455 1 lock held by syz-executor/25822: #0: ffff88801ae63208 (&nbd->config_lock){+.+.}-{3:3}, at: 
nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455 ============================================= NMI backtrace for cpu 1 CPU: 1 PID: 39 Comm: khungtaskd Not tainted 5.15.0-rc1+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105 nmi_trigger_cpumask_backtrace+0x1e1/0x220 lib/nmi_backtrace.c:62 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline] watchdog+0xcc8/0x1010 kernel/hung_task.c:295 kthread+0x3e5/0x4d0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Sending NMI from CPU 1 to CPUs 0,2-3: NMI backtrace for cpu 0 CPU: 0 PID: 3022 Comm: systemd-journal Not tainted 5.15.0-rc1+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:__orc_find+0x0/0xf0 arch/x86/kernel/unwind_orc.c:35 Code: 7f 8b e8 63 a6 c8 02 e9 60 fb ff ff e8 b9 96 8a 00 e9 cf fb ff ff cc cc cc cc 48 8b 07 c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 <41> 57 89 d0 41 56 41 55 41 54 4c 8d 64 87 fc 55 53 48 83 ec 10 85 RSP: 0018:ffffc9000121f980 EFLAGS: 00000212 RAX: 000000000002c858 RBX: 1ffff92000243f39 RCX: ffffffff81bdbadf RDX: 000000000000000b RSI: ffffffff8df4036c RDI: ffffffff8d82b228 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff8df4036c R10: ffffc9000121fadf R11: 0000000000086088 R12: ffffc9000121fac8 R13: ffffc9000121fab5 R14: ffffc9000121fa80 R15: ffffffff81bdbadf FS: 00007f13812868c0(0000) GS:ffff888063e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f137ce35000 CR3: 0000000018bf1000 CR4: 0000000000350ef0 Call Trace: orc_find arch/x86/kernel/unwind_orc.c:173 [inline] unwind_next_frame+0x33a/0x1770 arch/x86/kernel/unwind_orc.c:443 arch_stack_walk+0x7d/0xe0 arch/x86/kernel/stacktrace.c:25 
stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:121 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360 ____kasan_slab_free mm/kasan/common.c:366 [inline] ____kasan_slab_free mm/kasan/common.c:328 [inline] __kasan_slab_free+0x100/0x140 mm/kasan/common.c:374 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:1700 [inline] slab_free_freelist_hook mm/slub.c:1725 [inline] slab_free mm/slub.c:3483 [inline] kmem_cache_free+0xa0/0x670 mm/slub.c:3499 putname+0xfe/0x140 fs/namei.c:270 do_mkdirat+0x18a/0x2b0 fs/namei.c:3920 __do_sys_mkdir fs/namei.c:3931 [inline] __se_sys_mkdir fs/namei.c:3929 [inline] __x64_sys_mkdir+0x61/0x80 fs/namei.c:3929 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f1380542687 Code: 00 b8 ff ff ff ff c3 0f 1f 40 00 48 8b 05 09 d8 2b 00 64 c7 00 5f 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 d7 2b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffcb3a1e5b8 EFLAGS: 00000293 ORIG_RAX: 0000000000000053 RAX: ffffffffffffffda RBX: 00007ffcb3a214d0 RCX: 00007f1380542687 RDX: 00007f1380fb3a00 RSI: 00000000000001ed RDI: 00005567696f38a0 RBP: 00007ffcb3a1e5f0 R08: 000000000000c000 R09: 0000000000000000 R10: 0000000000000069 R11: 0000000000000293 R12: 0000000000000000 R13: 0000000000000000 R14: 00007ffcb3a214d0 R15: 00007ffcb3a1eae0 NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline] NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline] NMI backtrace for cpu 3 skipped: idling at default_idle+0xb/0x10 arch/x86/kernel/process.c:716 NMI backtrace for cpu 2 CPU: 2 PID: 6743 Comm: kworker/u8:2 Not tainted 5.15.0-rc1+ #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 
04/01/2014 Workqueue: netns cleanup_net RIP: 0010:check_wait_context kernel/locking/lockdep.c:4693 [inline] RIP: 0010:__lock_acquire+0x4b6/0x57e0 kernel/locking/lockdep.c:4965 Code: 06 49 81 c7 40 fd cf 8f 45 84 e4 0f 84 f8 02 00 00 48 8d 7d 21 48 b8 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 0f b6 04 01 <48> 89 f9 83 e1 07 38 c8 7f 08 84 c0 0f 85 62 33 00 00 44 0f b6 4d RSP: 0018:ffffc9000308f8f0 EFLAGS: 00000012 RAX: 0000000000000000 RBX: 0000000000000007 RCX: 1ffff11005adb152 RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88802d6d8a91 RBP: ffff88802d6d8a70 R08: 0000000000000001 R09: fffffbfff1f9ff25 R10: ffffffff8fcff927 R11: fffffbfff1f9ff24 R12: 0000000000000002 R13: ffff88802d6d8000 R14: ffffffff8b97e9a0 R15: ffffffff8fd00280 FS: 0000000000000000(0000) GS:ffff888063f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002a6ac48 CR3: 000000000b68e000 CR4: 0000000000350ee0 Call Trace: lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5590 rcu_lock_acquire include/linux/rcupdate.h:267 [inline] rcu_read_lock include/linux/rcupdate.h:687 [inline] inet_twsk_purge+0x117/0x7b0 net/ipv4/inet_timewait_sock.c:268 ops_exit_list.isra.0+0x103/0x150 net/core/net_namespace.c:171 cleanup_net+0x511/0xa90 net/core/net_namespace.c:591 process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297 worker_thread+0x90/0xed0 kernel/workqueue.c:2444 kthread+0x3e5/0x4d0 kernel/kthread.c:319 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 ---------------- Code disassembly (best guess): 0: 7f 8b jg 0xffffff8d 2: e8 63 a6 c8 02 callq 0x2c8a66a 7: e9 60 fb ff ff jmpq 0xfffffb6c c: e8 b9 96 8a 00 callq 0x8a96ca 11: e9 cf fb ff ff jmpq 0xfffffbe5 16: cc int3 17: cc int3 18: cc int3 19: cc int3 1a: 48 8b 07 mov (%rdi),%rax 1d: c3 retq 1e: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 25: 00 00 00 00 29: 90 nop * 2a: 41 57 push %r15 <-- trapping instruction 2c: 89 d0 mov %edx,%eax 2e: 
41 56 push %r14 30: 41 55 push %r13 32: 41 54 push %r12 34: 4c 8d 64 87 fc lea -0x4(%rdi,%rax,4),%r12 39: 55 push %rbp 3a: 53 push %rbx 3b: 48 83 ec 10 sub $0x10,%rsp 3f: 85 .byte 0x85
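A note for readers of the report above, not part of the original message: in the "Showing all locks held" section, lockdep lists a mutex for a task as soon as the task starts acquiring it, so a task still sleeping in __mutex_lock shows up next to the real owner. Grouping the entries by lock address is a quick way to spot the contended lock. The following is a minimal sketch under that assumption; the function name `shared_locks` and the regular expression are illustrative, and the sample text is an excerpt copied from the second report.

```python
import re
from collections import defaultdict

# Matches either a "N lock(s) held by task/pid:" header or one "#N: addr (name){"
# lock entry. Works on the flattened, single-line form of the report.
TOKEN = re.compile(
    r"\d+ locks? held by (?P<task>\S+)/(?P<pid>\d+):"
    r"|#\d+: (?P<addr>[0-9a-f]{16}) \((?P<name>.+?)\)\{"
)


def shared_locks(report):
    """Map (address, name) -> sorted tasks, for locks listed under 2+ tasks."""
    tasks_by_lock = defaultdict(set)
    current = None
    for m in TOKEN.finditer(report):
        if m.group("task"):
            current = f'{m.group("task")}/{m.group("pid")}'
        elif current:
            # Inline frames repeat the same "#0:" entry; the set absorbs them.
            tasks_by_lock[(m.group("addr"), m.group("name"))].add(current)
    return {k: sorted(v) for k, v in tasks_by_lock.items() if len(v) > 1}


sample = (
    "1 lock held by khungtaskd/39: #0: ffffffff8b97e9a0 (rcu_read_lock){....}-{1:2}, "
    "at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446 "
    "3 locks held by kworker/u8:2/6743: "
    "1 lock held by syz-executor/25816: #0: ffff88801ae63208 "
    "(&nbd->config_lock){+.+.}-{3:3}, at: nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455 "
    "1 lock held by syz-executor/25822: #0: ffff88801ae63208 "
    "(&nbd->config_lock){+.+.}-{3:3}, at: nbd_ioctl+0x14f/0x9c0 drivers/block/nbd.c:1455"
)

for (addr, name), tasks in shared_locks(sample).items():
    print(addr, name, tasks)
```

For this report the only shared entry is &nbd->config_lock, listed under pids 25816 and 25822. Going by the call traces, 25816 is the one sleeping in __mutex_lock, while 25822 appears to own the lock and is itself stuck in blk_mq_freeze_queue_wait.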
* INFO: task hung in nbd_ioctl @ 2019-09-30 22:39 syzbot 2019-10-01 17:48 ` Mike Christie 2019-10-01 21:19 ` Mike Christie 0 siblings, 2 replies; 13+ messages in thread From: syzbot @ 2019-09-30 22:39 UTC (permalink / raw) To: axboe, josef, linux-block, linux-kernel, mchristi, nbd, syzkaller-bugs Hello, syzbot found the following crash on: HEAD commit: bb2aee77 Add linux-next specific files for 20190926 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 dashboard link: https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a compiler: gcc (GCC) 9.0.0 20181231 (experimental) syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 The bug was bisected to: commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 Author: Mike Christie <mchristi@redhat.com> Date: Sun Aug 4 19:10:06 2019 +0000 nbd: fix max number of supported devs bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") INFO: task syz-executor390:8778 can't die for more than 143 seconds. 
syz-executor390 D27432 8778 8777 0x00004004 Call Trace: context_switch kernel/sched/core.c:3384 [inline] __schedule+0x828/0x1c20 kernel/sched/core.c:4065 schedule+0xd9/0x260 kernel/sched/core.c:4132 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 do_wait_for_common kernel/sched/completion.c:83 [inline] __wait_for_common kernel/sched/completion.c:104 [inline] wait_for_common kernel/sched/completion.c:115 [inline] wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] __nbd_ioctl drivers/block/nbd.c:1347 [inline] nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 __blkdev_driver_ioctl block/ioctl.c:304 [inline] blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 block_ioctl+0xee/0x130 fs/block_dev.c:1954 vfs_ioctl fs/ioctl.c:47 [inline] file_ioctl fs/ioctl.c:539 [inline] do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 __do_sys_ioctl fs/ioctl.c:750 [inline] __se_sys_ioctl fs/ioctl.c:748 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4452d9 Code: Bad RIP value. RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 INFO: task syz-executor390:8778 blocked for more than 143 seconds. Not tainted 5.3.0-next-20190926 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
syz-executor390 D27432 8778 8777 0x00004004 Call Trace: context_switch kernel/sched/core.c:3384 [inline] __schedule+0x828/0x1c20 kernel/sched/core.c:4065 schedule+0xd9/0x260 kernel/sched/core.c:4132 schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 do_wait_for_common kernel/sched/completion.c:83 [inline] __wait_for_common kernel/sched/completion.c:104 [inline] wait_for_common kernel/sched/completion.c:115 [inline] wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] __nbd_ioctl drivers/block/nbd.c:1347 [inline] nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 __blkdev_driver_ioctl block/ioctl.c:304 [inline] blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 block_ioctl+0xee/0x130 fs/block_dev.c:1954 vfs_ioctl fs/ioctl.c:47 [inline] file_ioctl fs/ioctl.c:539 [inline] do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 __do_sys_ioctl fs/ioctl.c:750 [inline] __se_sys_ioctl fs/ioctl.c:748 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4452d9 Code: Bad RIP value. 
RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 Showing all locks held in the system: 1 lock held by khungtaskd/1066: #0: ffffffff88faad80 (rcu_read_lock){....}, at: debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5337 2 locks held by kworker/u5:0/1525: #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: __write_once_size include/linux/compiler.h:226 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:40 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: set_work_data kernel/workqueue.c:620 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline] #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: process_one_work+0x88b/0x1740 kernel/workqueue.c:2240 #1: ffff8880a63b7dc0 ((work_completion)(&args->work)){+.+.}, at: process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244 1 lock held by rsyslogd/8659: 2 locks held by getty/8749: #0: ffff888098c08090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8750: #0: ffff88808f10b090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f2d2e0 
(&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8751: #0: ffff88809a6be090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8752: #0: ffff8880a48af090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8753: #0: ffff88808c599090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8754: #0: ffff88808f1a8090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 2 locks held by getty/8755: #0: ffff88809ab33090 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 ============================================= NMI backtrace for cpu 1 CPU: 1 PID: 1066 Comm: khungtaskd Not tainted 5.3.0-next-20190926 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline] watchdog+0xc99/0x1360 kernel/hung_task.c:353 kthread+0x361/0x430 
kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:60 --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this bug report. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this bug, for details see: https://goo.gl/tpsmEJ#testing-patches
* Re: INFO: task hung in nbd_ioctl 2019-09-30 22:39 syzbot @ 2019-10-01 17:48 ` Mike Christie 2019-10-01 21:19 ` Mike Christie 1 sibling, 0 replies; 13+ messages in thread From: Mike Christie @ 2019-10-01 17:48 UTC (permalink / raw) To: syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On 09/30/2019 05:39 PM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: bb2aee77 Add linux-next specific files for 20190926 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 > kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 > dashboard link: > https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 > > The bug was bisected to: > > commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 > Author: Mike Christie <mchristi@redhat.com> > Date: Sun Aug 4 19:10:06 2019 +0000 > > nbd: fix max number of supported devs > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 > final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 > console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com > Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") > > INFO: task syz-executor390:8778 can't die for more than 143 seconds. 
> syz-executor390 D27432 8778 8777 0x00004004 > Call Trace: > context_switch kernel/sched/core.c:3384 [inline] > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > schedule+0xd9/0x260 kernel/sched/core.c:4132 > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > do_wait_for_common kernel/sched/completion.c:83 [inline] > __wait_for_common kernel/sched/completion.c:104 [inline] > wait_for_common kernel/sched/completion.c:115 [inline] > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > vfs_ioctl fs/ioctl.c:47 [inline] > file_ioctl fs/ioctl.c:539 [inline] > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > __do_sys_ioctl fs/ioctl.c:750 [inline] > __se_sys_ioctl fs/ioctl.c:748 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4452d9 > Code: Bad RIP value. > RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > INFO: task syz-executor390:8778 blocked for more than 143 seconds. > Not tainted 5.3.0-next-20190926 #0 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> syz-executor390 D27432 8778 8777 0x00004004 > Call Trace: > context_switch kernel/sched/core.c:3384 [inline] > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > schedule+0xd9/0x260 kernel/sched/core.c:4132 > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > do_wait_for_common kernel/sched/completion.c:83 [inline] > __wait_for_common kernel/sched/completion.c:104 [inline] > wait_for_common kernel/sched/completion.c:115 [inline] > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > vfs_ioctl fs/ioctl.c:47 [inline] > file_ioctl fs/ioctl.c:539 [inline] > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > __do_sys_ioctl fs/ioctl.c:750 [inline] > __se_sys_ioctl fs/ioctl.c:748 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4452d9 > Code: Bad RIP value. > RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > I will send a fix for this. I had assumed that for every socket type a kernel_sock_shutdown would break us out of sock_recvmsg call, but it looks like that's not the case. ^ permalink raw reply [flat|nested] 13+ messages in thread
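The assumption described above can be probed from userspace. What follows is a minimal, Linux-only sketch and not code from this thread: it calls shutdown(2) on an AF_UNIX stream socket and on an AF_NETLINK socket and reports the result as 0 or a negative errno. The helper names (try_shutdown, probe_unix, probe_netlink) are made up for illustration. As far as I know, userspace shutdown(2) ends up in the same per-family sock->ops->shutdown callout that kernel_sock_shutdown uses, so a family without a real shutdown implementation should fail in the same way here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/*
 * Call shutdown(SHUT_RDWR) on an open socket.
 * Returns 0 on success or a negative errno value, following the
 * kernel's return convention.
 */
int try_shutdown(int fd)
{
	assert(fd >= 0); /* caller must pass an open socket */
	if (shutdown(fd, SHUT_RDWR) == 0)
		return 0;
	return -errno;
}

/*
 * Probe a connected AF_UNIX stream socket pair.
 * Returns the try_shutdown() result, or -errno if the pair
 * could not be created.
 */
int probe_unix(void)
{
	int sv[2];
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -errno;
	ret = try_shutdown(sv[0]);
	close(sv[0]);
	close(sv[1]);
	return ret;
}

/*
 * Probe an AF_NETLINK socket. *created is set to 0 when the
 * socket could not be created at all (for example in a
 * restricted sandbox), in which case -errno from socket() is
 * returned instead of a shutdown result.
 */
int probe_netlink(int *created)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	int ret;

	*created = (fd >= 0);
	if (fd < 0)
		return -errno;
	ret = try_shutdown(fd);
	close(fd);
	return ret;
}
```

Calling probe_unix() and probe_netlink() from a small main() and printing the two return values with strerror(-ret) is enough to compare the families. On a stock Linux kernel I would expect the AF_UNIX call to succeed and the AF_NETLINK call to fail with EOPNOTSUPP, but that is worth confirming on the kernel under test.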
* Re: INFO: task hung in nbd_ioctl 2019-09-30 22:39 syzbot 2019-10-01 17:48 ` Mike Christie @ 2019-10-01 21:19 ` Mike Christie 2019-10-17 14:03 ` Richard W.M. Jones 1 sibling, 1 reply; 13+ messages in thread
From: Mike Christie @ 2019-10-01 21:19 UTC (permalink / raw)
To: syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs

Hey Josef and nbd list,

I have a question: are there any socket family restrictions for nbd?

The bug here is that some socket families do not support the sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown their callout returns -EOPNOTSUPP. That leaves recv_work stuck in nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a flush_workqueue call, so for socket families like AF_NETLINK in this bug we hang as shown below.

I can just remove the flush_workqueue call in that code path since it's not needed there, but that leaves the original bug my patch was addressing: recv_work is left running, which can result in leaked resources or possible use-after-free crashes, and you still get the hang if you remove the module.

It looks like we have used kernel_sock_shutdown for a while, so I thought we might never have supported sockets that do not support the callout. Is that correct? If so, I can just add a check for this in nbd_add_socket and fix that bug too.
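To make the proposed nbd_add_socket check concrete, here is a small userspace model of the idea. It is only a sketch under stated assumptions, not the driver code or an actual patch: every name prefixed with model_ is hypothetical and merely stands in for the kernel's socket and ops structures. The model assumes that a family without shutdown support installs a shared stub that returns -EOPNOTSUPP, so setup can reject such a socket up front by comparing the callout against that stub, before any receive work is started.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's per-family ops table and socket. */
struct model_proto_ops {
	int (*shutdown)(void *sock, int how);
};

struct model_socket {
	const struct model_proto_ops *ops;
};

/* Stub used by families that have no shutdown support. */
int model_no_shutdown(void *sock, int how)
{
	(void)sock;
	(void)how;
	return -EOPNOTSUPP;
}

/* A family with a working shutdown callout. */
int model_stream_shutdown(void *sock, int how)
{
	(void)sock;
	(void)how;
	return 0;
}

/* Ops tables for a netlink-like and a unix-like family in this model. */
const struct model_proto_ops model_netlink_ops = {
	.shutdown = model_no_shutdown,
};

const struct model_proto_ops model_unix_ops = {
	.shutdown = model_stream_shutdown,
};

/*
 * Setup-time check: returns 0 if the socket can be shut down later,
 * or -EINVAL if it is missing or only provides the "not supported"
 * stub. Rejecting here means teardown never has to wait on a
 * receiver that cannot be woken up.
 */
int model_check_socket(const struct model_socket *sock)
{
	if (!sock || !sock->ops || !sock->ops->shutdown)
		return -EINVAL;
	if (sock->ops->shutdown == model_no_shutdown)
		return -EINVAL;
	return 0;
}

/* Tiny self-check showing the intended use of the model. */
void model_selftest(void)
{
	struct model_socket nl = { .ops = &model_netlink_ops };
	struct model_socket un = { .ops = &model_unix_ops };

	assert(model_check_socket(&nl) == -EINVAL);
	assert(model_check_socket(&un) == 0);
}
```

The point of checking at setup rather than at teardown is that the failure is reported to the caller of the ioctl while nothing is running yet, instead of surfacing later as a flush_workqueue that never returns.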
On 09/30/2019 05:39 PM, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit: bb2aee77 Add linux-next specific files for 20190926 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 > kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 > dashboard link: > https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 > > The bug was bisected to: > > commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 > Author: Mike Christie <mchristi@redhat.com> > Date: Sun Aug 4 19:10:06 2019 +0000 > > nbd: fix max number of supported devs > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 > final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 > console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com > Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") > > INFO: task syz-executor390:8778 can't die for more than 143 seconds. 
> syz-executor390 D27432 8778 8777 0x00004004 > Call Trace: > context_switch kernel/sched/core.c:3384 [inline] > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > schedule+0xd9/0x260 kernel/sched/core.c:4132 > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > do_wait_for_common kernel/sched/completion.c:83 [inline] > __wait_for_common kernel/sched/completion.c:104 [inline] > wait_for_common kernel/sched/completion.c:115 [inline] > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > vfs_ioctl fs/ioctl.c:47 [inline] > file_ioctl fs/ioctl.c:539 [inline] > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > __do_sys_ioctl fs/ioctl.c:750 [inline] > __se_sys_ioctl fs/ioctl.c:748 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4452d9 > Code: Bad RIP value. > RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > INFO: task syz-executor390:8778 blocked for more than 143 seconds. > Not tainted 5.3.0-next-20190926 #0 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> syz-executor390 D27432 8778 8777 0x00004004 > Call Trace: > context_switch kernel/sched/core.c:3384 [inline] > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > schedule+0xd9/0x260 kernel/sched/core.c:4132 > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > do_wait_for_common kernel/sched/completion.c:83 [inline] > __wait_for_common kernel/sched/completion.c:104 [inline] > wait_for_common kernel/sched/completion.c:115 [inline] > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > vfs_ioctl fs/ioctl.c:47 [inline] > file_ioctl fs/ioctl.c:539 [inline] > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > __do_sys_ioctl fs/ioctl.c:750 [inline] > __se_sys_ioctl fs/ioctl.c:748 [inline] > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x4452d9 > Code: Bad RIP value. 
> RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > > Showing all locks held in the system: > 1 lock held by khungtaskd/1066: > #0: ffffffff88faad80 (rcu_read_lock){....}, at: > debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5337 > 2 locks held by kworker/u5:0/1525: > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > __write_once_size include/linux/compiler.h:226 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > atomic_long_set include/asm-generic/atomic-long.h:40 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > set_work_data kernel/workqueue.c:620 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline] > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > process_one_work+0x88b/0x1740 kernel/workqueue.c:2240 > #1: ffff8880a63b7dc0 ((work_completion)(&args->work)){+.+.}, at: > process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244 > 1 lock held by rsyslogd/8659: > 2 locks held by getty/8749: > #0: ffff888098c08090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8750: > #0: ffff88808f10b090 (&tty->ldisc_sem){++++}, at: > 
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8751: > #0: ffff88809a6be090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8752: > #0: ffff8880a48af090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8753: > #0: ffff88808c599090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8754: > #0: ffff88808f1a8090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > 2 locks held by getty/8755: > #0: ffff88809ab33090 (&tty->ldisc_sem){++++}, at: > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > ============================================= > > NMI backtrace for cpu 1 > CPU: 1 PID: 1066 Comm: khungtaskd Not tainted 5.3.0-next-20190926 #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > Google 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0x172/0x1f0 lib/dump_stack.c:113 > nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 > nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62 > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 > trigger_all_cpu_backtrace 
include/linux/nmi.h:146 [inline] > check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline] > watchdog+0xc99/0x1360 kernel/hung_task.c:353 > kthread+0x361/0x430 kernel/kthread.c:255 > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 > Sending NMI from CPU 1 to CPUs 0: > NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 > arch/x86/include/asm/irqflags.h:60 > > > --- > This bug is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this bug report. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > For information about bisection process see: > https://goo.gl/tpsmEJ#bisection > syzbot can test patches for this bug, for details see: > https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: INFO: task hung in nbd_ioctl 2019-10-01 21:19 ` Mike Christie @ 2019-10-17 14:03 ` Richard W.M. Jones 2019-10-17 15:47 ` Mike Christie 2019-10-30 8:39 ` Wouter Verhelst 0 siblings, 2 replies; 13+ messages in thread From: Richard W.M. Jones @ 2019-10-17 14:03 UTC (permalink / raw) To: Mike Christie Cc: syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > Hey Josef and nbd list, > > I had a question about if there are any socket family restrictions for nbd? In normal circumstances, in userspace, the NBD protocol would only be used over AF_UNIX or AF_INET/AF_INET6. There's a bit of confusion because netlink is used by nbd-client to configure the NBD device, setting things like block size and timeouts (instead of ioctl which is deprecated). I think you don't mean this use of netlink? > The bug here is that some socket familys do not support the > sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown > their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in > nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a > flush_workqueue call, so for socket familys like AF_NETLINK in this bug > we hang like we see below. > > I can just remove the flush_workqueue call in that code path since it's > not needed there, but it leaves the original bug my patch was hitting > where we leave the recv_work running which can then result in leaked > resources, or possible use after free crashes and you still get the hang > if you remove the module. > > It looks like we have used kernel_sock_shutdown for a while so I thought > we might never have supported sockets that did not support the callout. > Is that correct? If so then I can just add a check for this in > nbd_add_socket and fix that bug too. Rich. 
> On 09/30/2019 05:39 PM, syzbot wrote: > > Hello, > > > > syzbot found the following crash on: > > > > HEAD commit: bb2aee77 Add linux-next specific files for 20190926 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 > > dashboard link: > > https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a > > compiler: gcc (GCC) 9.0.0 20181231 (experimental) > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 > > > > The bug was bisected to: > > > > commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 > > Author: Mike Christie <mchristi@redhat.com> > > Date: Sun Aug 4 19:10:06 2019 +0000 > > > > nbd: fix max number of supported devs > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 > > final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > > Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com > > Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") > > > > INFO: task syz-executor390:8778 can't die for more than 143 seconds. 
> > syz-executor390 D27432 8778 8777 0x00004004 > > Call Trace: > > context_switch kernel/sched/core.c:3384 [inline] > > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > > schedule+0xd9/0x260 kernel/sched/core.c:4132 > > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > > do_wait_for_common kernel/sched/completion.c:83 [inline] > > __wait_for_common kernel/sched/completion.c:104 [inline] > > wait_for_common kernel/sched/completion.c:115 [inline] > > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > > vfs_ioctl fs/ioctl.c:47 [inline] > > file_ioctl fs/ioctl.c:539 [inline] > > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > > __do_sys_ioctl fs/ioctl.c:750 [inline] > > __se_sys_ioctl fs/ioctl.c:748 [inline] > > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > RIP: 0033:0x4452d9 > > Code: Bad RIP value. > > RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > > INFO: task syz-executor390:8778 blocked for more than 143 seconds. > > Not tainted 5.3.0-next-20190926 #0 > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > syz-executor390 D27432 8778 8777 0x00004004 > > Call Trace: > > context_switch kernel/sched/core.c:3384 [inline] > > __schedule+0x828/0x1c20 kernel/sched/core.c:4065 > > schedule+0xd9/0x260 kernel/sched/core.c:4132 > > schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 > > do_wait_for_common kernel/sched/completion.c:83 [inline] > > __wait_for_common kernel/sched/completion.c:104 [inline] > > wait_for_common kernel/sched/completion.c:115 [inline] > > wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 > > flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 > > nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] > > __nbd_ioctl drivers/block/nbd.c:1347 [inline] > > nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 > > __blkdev_driver_ioctl block/ioctl.c:304 [inline] > > blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 > > block_ioctl+0xee/0x130 fs/block_dev.c:1954 > > vfs_ioctl fs/ioctl.c:47 [inline] > > file_ioctl fs/ioctl.c:539 [inline] > > do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 > > ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 > > __do_sys_ioctl fs/ioctl.c:750 [inline] > > __se_sys_ioctl fs/ioctl.c:748 [inline] > > __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 > > do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 > > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > RIP: 0033:0x4452d9 > > Code: Bad RIP value. 
> > RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 > > RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 > > RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 > > R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 > > > > Showing all locks held in the system: > > 1 lock held by khungtaskd/1066: > > #0: ffffffff88faad80 (rcu_read_lock){....}, at: > > debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5337 > > 2 locks held by kworker/u5:0/1525: > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > __write_once_size include/linux/compiler.h:226 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > atomic_long_set include/asm-generic/atomic-long.h:40 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > set_work_data kernel/workqueue.c:620 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline] > > #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: > > process_one_work+0x88b/0x1740 kernel/workqueue.c:2240 > > #1: ffff8880a63b7dc0 ((work_completion)(&args->work)){+.+.}, at: > > process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244 > > 1 lock held by rsyslogd/8659: > > 2 locks held by getty/8749: > > #0: ffff888098c08090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by 
getty/8750: > > #0: ffff88808f10b090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by getty/8751: > > #0: ffff88809a6be090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by getty/8752: > > #0: ffff8880a48af090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by getty/8753: > > #0: ffff88808c599090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by getty/8754: > > #0: ffff88808f1a8090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > 2 locks held by getty/8755: > > #0: ffff88809ab33090 (&tty->ldisc_sem){++++}, at: > > ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 > > #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: > > n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 > > > > ============================================= > > > > NMI backtrace for cpu 1 > > CPU: 1 PID: 1066 Comm: khungtaskd Not tainted 5.3.0-next-20190926 #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS > > Google 01/01/2011 > > Call Trace: > > __dump_stack lib/dump_stack.c:77 [inline] > > dump_stack+0x172/0x1f0 lib/dump_stack.c:113 > > nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 > > 
nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62 > > arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 > > trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline] > > check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline] > > watchdog+0xc99/0x1360 kernel/hung_task.c:353 > > kthread+0x361/0x430 kernel/kthread.c:255 > > ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 > > Sending NMI from CPU 1 to CPUs 0: > > NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10 > > arch/x86/include/asm/irqflags.h:60 > > > > > > --- > > This bug is generated by a bot. It may contain errors. > > See https://goo.gl/tpsmEJ for more information about syzbot. > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > syzbot will keep track of this bug report. See: > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > For information about bisection process see: > > https://goo.gl/tpsmEJ#bisection > > syzbot can test patches for this bug, for details see: > > https://goo.gl/tpsmEJ#testing-patches -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: INFO: task hung in nbd_ioctl 2019-10-17 14:03 ` Richard W.M. Jones @ 2019-10-17 15:47 ` Mike Christie 2019-10-17 16:28 ` Richard W.M. Jones 2019-10-30 8:39 ` Wouter Verhelst 1 sibling, 1 reply; 13+ messages in thread
From: Mike Christie @ 2019-10-17 15:47 UTC (permalink / raw)
To: Richard W.M. Jones
Cc: syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs

[-- Attachment #1: Type: text/plain, Size: 11884 bytes --]

On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
>> Hey Josef and nbd list,
>>
>> I had a question about if there are any socket family restrictions for nbd?
>
> In normal circumstances, in userspace, the NBD protocol would only be
> used over AF_UNIX or AF_INET/AF_INET6.
>
> There's a bit of confusion because netlink is used by nbd-client to
> configure the NBD device, setting things like block size and timeouts
> (instead of ioctl which is deprecated). I think you don't mean this
> use of netlink?

I didn't. It looks like it is just a bad test.

The automated test in this thread created an AF_NETLINK socket and passed it into the NBD_SET_SOCK ioctl, so that is the socket that got used for the NBD_DO_IT ioctl. I was not sure whether the test creator picked any old socket and it just happened to be one nbd never supported, or whether it was trying to simulate sockets that do not support the shutdown method.

I attached the automated test that got run (test.c).

>
>> The bug here is that some socket familys do not support the
>> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
>> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
>> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
>> flush_workqueue call, so for socket familys like AF_NETLINK in this bug
>> we hang like we see below.
>> >> I can just remove the flush_workqueue call in that code path since it's >> not needed there, but it leaves the original bug my patch was hitting >> where we leave the recv_work running which can then result in leaked >> resources, or possible use after free crashes and you still get the hang >> if you remove the module. >> >> It looks like we have used kernel_sock_shutdown for a while so I thought >> we might never have supported sockets that did not support the callout. >> Is that correct? If so then I can just add a check for this in >> nbd_add_socket and fix that bug too. > > Rich. > >> On 09/30/2019 05:39 PM, syzbot wrote: >>> Hello, >>> >>> syzbot found the following crash on: >>> >>> HEAD commit: bb2aee77 Add linux-next specific files for 20190926 >>> git tree: linux-next >>> console output: https://syzkaller.appspot.com/x/log.txt?x=13385ca3600000 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=e60af4ac5a01e964 >>> dashboard link: >>> https://syzkaller.appspot.com/bug?extid=24c12fa8d218ed26011a >>> compiler: gcc (GCC) 9.0.0 20181231 (experimental) >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12abc2a3600000 >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11712c05600000 >>> >>> The bug was bisected to: >>> >>> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 >>> Author: Mike Christie <mchristi@redhat.com> >>> Date: Sun Aug 4 19:10:06 2019 +0000 >>> >>> nbd: fix max number of supported devs >>> >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1226f3c5600000 >>> final crash: https://syzkaller.appspot.com/x/report.txt?x=1126f3c5600000 >>> console output: https://syzkaller.appspot.com/x/log.txt?x=1626f3c5600000 >>> >>> IMPORTANT: if you fix the bug, please add the following tag to the commit: >>> Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com >>> Fixes: e9e006f5fcf2 ("nbd: fix max number of supported devs") >>> >>> INFO: task syz-executor390:8778 can't die for more than 143 seconds. 
>>> syz-executor390 D27432 8778 8777 0x00004004 >>> Call Trace: >>> context_switch kernel/sched/core.c:3384 [inline] >>> __schedule+0x828/0x1c20 kernel/sched/core.c:4065 >>> schedule+0xd9/0x260 kernel/sched/core.c:4132 >>> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 >>> do_wait_for_common kernel/sched/completion.c:83 [inline] >>> __wait_for_common kernel/sched/completion.c:104 [inline] >>> wait_for_common kernel/sched/completion.c:115 [inline] >>> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 >>> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 >>> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] >>> __nbd_ioctl drivers/block/nbd.c:1347 [inline] >>> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 >>> __blkdev_driver_ioctl block/ioctl.c:304 [inline] >>> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 >>> block_ioctl+0xee/0x130 fs/block_dev.c:1954 >>> vfs_ioctl fs/ioctl.c:47 [inline] >>> file_ioctl fs/ioctl.c:539 [inline] >>> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 >>> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 >>> __do_sys_ioctl fs/ioctl.c:750 [inline] >>> __se_sys_ioctl fs/ioctl.c:748 [inline] >>> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 >>> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> RIP: 0033:0x4452d9 >>> Code: Bad RIP value. >>> RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 >>> RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 >>> RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 >>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 >>> R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 >>> INFO: task syz-executor390:8778 blocked for more than 143 seconds. >>> Not tainted 5.3.0-next-20190926 #0 >>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
>>> syz-executor390 D27432 8778 8777 0x00004004 >>> Call Trace: >>> context_switch kernel/sched/core.c:3384 [inline] >>> __schedule+0x828/0x1c20 kernel/sched/core.c:4065 >>> schedule+0xd9/0x260 kernel/sched/core.c:4132 >>> schedule_timeout+0x717/0xc50 kernel/time/timer.c:1871 >>> do_wait_for_common kernel/sched/completion.c:83 [inline] >>> __wait_for_common kernel/sched/completion.c:104 [inline] >>> wait_for_common kernel/sched/completion.c:115 [inline] >>> wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136 >>> flush_workqueue+0x40f/0x14c0 kernel/workqueue.c:2826 >>> nbd_start_device_ioctl drivers/block/nbd.c:1272 [inline] >>> __nbd_ioctl drivers/block/nbd.c:1347 [inline] >>> nbd_ioctl+0xb2e/0xc44 drivers/block/nbd.c:1387 >>> __blkdev_driver_ioctl block/ioctl.c:304 [inline] >>> blkdev_ioctl+0xedb/0x1c20 block/ioctl.c:606 >>> block_ioctl+0xee/0x130 fs/block_dev.c:1954 >>> vfs_ioctl fs/ioctl.c:47 [inline] >>> file_ioctl fs/ioctl.c:539 [inline] >>> do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726 >>> ksys_ioctl+0xab/0xd0 fs/ioctl.c:743 >>> __do_sys_ioctl fs/ioctl.c:750 [inline] >>> __se_sys_ioctl fs/ioctl.c:748 [inline] >>> __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748 >>> do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290 >>> entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> RIP: 0033:0x4452d9 >>> Code: Bad RIP value. 
>>> RSP: 002b:00007ffde928d288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 >>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004452d9 >>> RDX: 0000000000000000 RSI: 000000000000ab03 RDI: 0000000000000004 >>> RBP: 0000000000000000 R08: 00000000004025b0 R09: 00000000004025b0 >>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402520 >>> R13: 00000000004025b0 R14: 0000000000000000 R15: 0000000000000000 >>> >>> Showing all locks held in the system: >>> 1 lock held by khungtaskd/1066: >>> #0: ffffffff88faad80 (rcu_read_lock){....}, at: >>> debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5337 >>> 2 locks held by kworker/u5:0/1525: >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> __write_once_size include/linux/compiler.h:226 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> atomic_long_set include/asm-generic/atomic-long.h:40 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> set_work_data kernel/workqueue.c:620 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> set_work_pool_and_clear_pending kernel/workqueue.c:647 [inline] >>> #0: ffff8880923d0d28 ((wq_completion)knbd0-recv){+.+.}, at: >>> process_one_work+0x88b/0x1740 kernel/workqueue.c:2240 >>> #1: ffff8880a63b7dc0 ((work_completion)(&args->work)){+.+.}, at: >>> process_one_work+0x8c1/0x1740 kernel/workqueue.c:2244 >>> 1 lock held by rsyslogd/8659: >>> 2 locks held by getty/8749: >>> #0: ffff888098c08090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f112e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by 
getty/8750: >>> #0: ffff88808f10b090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f2d2e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by getty/8751: >>> #0: ffff88809a6be090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f192e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by getty/8752: >>> #0: ffff8880a48af090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f352e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by getty/8753: >>> #0: ffff88808c599090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f212e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by getty/8754: >>> #0: ffff88808f1a8090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f392e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> 2 locks held by getty/8755: >>> #0: ffff88809ab33090 (&tty->ldisc_sem){++++}, at: >>> ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:340 >>> #1: ffffc90005f012e0 (&ldata->atomic_read_lock){+.+.}, at: >>> n_tty_read+0x232/0x1c10 drivers/tty/n_tty.c:2156 >>> >>> ============================================= >>> >>> NMI backtrace for cpu 1 >>> CPU: 1 PID: 1066 Comm: khungtaskd Not tainted 5.3.0-next-20190926 #0 >>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >>> Google 01/01/2011 >>> Call Trace: >>> __dump_stack lib/dump_stack.c:77 [inline] >>> dump_stack+0x172/0x1f0 lib/dump_stack.c:113 >>> nmi_cpu_backtrace.cold+0x70/0xb2 lib/nmi_backtrace.c:101 >>> 
nmi_trigger_cpumask_backtrace+0x23b/0x28b lib/nmi_backtrace.c:62
>>> arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>>> trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>>> check_hung_uninterruptible_tasks kernel/hung_task.c:269 [inline]
>>> watchdog+0xc99/0x1360 kernel/hung_task.c:353
>>> kthread+0x361/0x430 kernel/kthread.c:255
>>> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
>>> Sending NMI from CPU 1 to CPUs 0:
>>> NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10
>>> arch/x86/include/asm/irqflags.h:60
>>>
>>>
>>> ---
>>> This bug is generated by a bot. It may contain errors.
>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>>
>>> syzbot will keep track of this bug report. See:
>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>> For information about bisection process see:
>>> https://goo.gl/tpsmEJ#bisection
>>> syzbot can test patches for this bug, for details see:
>>> https://goo.gl/tpsmEJ#testing-patches
>

[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 5247 bytes --]

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static __thread int skip_segv;
static __thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* ctx)
{
  uintptr_t addr = (uintptr_t)info->si_addr;
  const uintptr_t prog_start = 1 << 20;
  const uintptr_t prog_end = 100 << 20;
  if (__atomic_load_n(&skip_segv, __ATOMIC_RELAXED) &&
      (addr < prog_start || addr > prog_end)) {
    _longjmp(segv_env, 1);
  }
  exit(sig);
}

static void install_segv_handler(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...) \
  { \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST); \
    if (_setjmp(segv_env) == 0) { \
      __VA_ARGS__; \
    } \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST); \
  }

static void sleep_ms(uint64_t ms)
{
  usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts))
    exit(1);
  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
  char buf[1024];
  va_list args;
  va_start(args, what);
  vsnprintf(buf, sizeof(buf), what, args);
  va_end(args);
  buf[sizeof(buf) - 1] = 0;
  int len = strlen(buf);
  int fd = open(file, O_WRONLY | O_CLOEXEC);
  if (fd == -1)
    return false;
  if (write(fd, buf, len) != len) {
    int err = errno;
    close(fd);
    errno = err;
    return false;
  }
  close(fd);
  return true;
}

static long syz_open_dev(volatile long a0, volatile long a1, volatile long a2)
{
  if (a0 == 0xc || a0 == 0xb) {
    char buf[128];
    sprintf(buf, "/dev/%s/%d:%d", a0 == 0xc ? "char" : "block", (uint8_t)a1,
            (uint8_t)a2);
    return open(buf, O_RDWR, 0);
  } else {
    char buf[1024];
    char* hash;
    NONFAILING(strncpy(buf, (char*)a0, sizeof(buf) - 1));
    buf[sizeof(buf) - 1] = 0;
    while ((hash = strchr(buf, '#'))) {
      *hash = '0' + (char)(a1 % 10);
      a1 /= 10;
    }
    return open(buf, a2, 0);
  }
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  int i;
  for (i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter;
  for (iter = 0; iter < 1; iter++) {
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      setup_test();
      execute_one();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5 * 1000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
  }
}

uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

void execute_one(void)
{
  intptr_t res = 0;
  res = syscall(__NR_socket, 0x10, 2, 2);
  if (res != -1)
    r[0] = res;
  NONFAILING(memcpy((void*)0x20000080, "/dev/nbd#\000", 10));
  res = syz_open_dev(0x20000080, 0, 0);
  if (res != -1)
    r[1] = res;
  res = syz_open_dev(0, 0, 0);
  if (res != -1)
    r[2] = res;
  syscall(__NR_ioctl, r[2], 0xab00, r[0]);
  syscall(__NR_ioctl, r[1], 0xab03, 0);
}
int main(void)
{
  syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
  install_segv_handler();
  loop();
  return 0;
}

^ permalink raw reply [flat|nested] 13+ messages in thread
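Editorial note: the raw numbers in execute_one() appear to correspond to the names used elsewhere in this thread (an AF_NETLINK socket handed to NBD_SET_SOCK, followed by NBD_DO_IT). The following is a minimal sketch that checks this against the system headers, assuming the kernel uapi header linux/nbd.h is installed. The helper names nbd_ioctl_name and reproducer_constants_match are made up for illustration and are not part of test.c; nothing here opens /dev/nbd*.

```c
#include <linux/nbd.h>   /* NBD_SET_SOCK, NBD_DO_IT */
#include <sys/ioctl.h>   /* _IO(), which the NBD_* macros expand to */
#include <sys/socket.h>  /* AF_NETLINK */

/* Hypothetical helper (not part of test.c): name an NBD ioctl request. */
static const char *nbd_ioctl_name(unsigned long req)
{
    switch (req) {
    case NBD_SET_SOCK: return "NBD_SET_SOCK";
    case NBD_DO_IT:    return "NBD_DO_IT";
    default:           return "unknown";
    }
}

/* Hypothetical helper: returns 1 if the raw numbers used by execute_one()
 * match the symbolic names mentioned in the thread, 0 otherwise. */
static int reproducer_constants_match(void)
{
    return 0x10 == AF_NETLINK      /* socket(0x10, 2, 2) */
        && 0xab00 == NBD_SET_SOCK  /* ioctl(r[2], 0xab00, r[0]) */
        && 0xab03 == NBD_DO_IT;    /* ioctl(r[1], 0xab03, 0) */
}
```

Calling nbd_ioctl_name(0xab00) and nbd_ioctl_name(0xab03) from a small main() is enough to label the two ioctl calls when reading the reproducer.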
* Re: INFO: task hung in nbd_ioctl
  2019-10-17 15:47 ` Mike Christie
@ 2019-10-17 16:28 ` Richard W.M. Jones
  2019-10-17 16:36 ` Eric Biggers
  0 siblings, 1 reply; 13+ messages in thread
From: Richard W.M. Jones @ 2019-10-17 16:28 UTC (permalink / raw)
  To: Mike Christie
  Cc: syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs

On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> >> Hey Josef and nbd list,
> >>
> >> I had a question about if there are any socket family restrictions for nbd?
> >
> > In normal circumstances, in userspace, the NBD protocol would only be
> > used over AF_UNIX or AF_INET/AF_INET6.
> >
> > There's a bit of confusion because netlink is used by nbd-client to
> > configure the NBD device, setting things like block size and timeouts
> > (instead of ioctl which is deprecated). I think you don't mean this
> > use of netlink?
>
> I didn't. It looks like it is just a bad test.
>
> For the automated test in this thread the test created a AF_NETLINK
> socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> for the NBD_DO_IT ioctl.
>
> I was not sure if the test creator picked any old socket and it just
> happened to pick one nbd never supported, or it was trying to simulate
> sockets that did not support the shutdown method.
>
> I attached the automated test that got run (test.c).

I'd say it sounds like a bad test, but I'm not familiar with syzkaller
nor how / from where it generates these tests. Did someone report a
bug and then syzkaller wrote this test?

Rich.

> >
> >> The bug here is that some socket familys do not support the
> >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug
> >> we hang like we see below.
> >>
> >> I can just remove the flush_workqueue call in that code path since it's
> >> not needed there, but it leaves the original bug my patch was hitting
> >> where we leave the recv_work running which can then result in leaked
> >> resources, or possible use after free crashes and you still get the hang
> >> if you remove the module.
> >>
> >> It looks like we have used kernel_sock_shutdown for a while so I thought
> >> we might never have supported sockets that did not support the callout.
> >> Is that correct? If so then I can just add a check for this in
> >> nbd_add_socket and fix that bug too.
> >
> > Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: INFO: task hung in nbd_ioctl
  2019-10-17 16:28 ` Richard W.M. Jones
@ 2019-10-17 16:36 ` Eric Biggers
  2019-10-17 16:49 ` Richard W.M. Jones
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2019-10-17 16:36 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Mike Christie, syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs

On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote:
> On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote:
> > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote:
> > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote:
> > >> Hey Josef and nbd list,
> > >>
> > >> I had a question about if there are any socket family restrictions for nbd?
> > >
> > > In normal circumstances, in userspace, the NBD protocol would only be
> > > used over AF_UNIX or AF_INET/AF_INET6.
> > >
> > > There's a bit of confusion because netlink is used by nbd-client to
> > > configure the NBD device, setting things like block size and timeouts
> > > (instead of ioctl which is deprecated). I think you don't mean this
> > > use of netlink?
> >
> > I didn't. It looks like it is just a bad test.
> >
> > For the automated test in this thread the test created a AF_NETLINK
> > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used
> > for the NBD_DO_IT ioctl.
> >
> > I was not sure if the test creator picked any old socket and it just
> > happened to pick one nbd never supported, or it was trying to simulate
> > sockets that did not support the shutdown method.
> >
> > I attached the automated test that got run (test.c).
>
> I'd say it sounds like a bad test, but I'm not familiar with syzkaller
> nor how / from where it generates these tests. Did someone report a
> bug and then syzkaller wrote this test?
>
> Rich.
>
> > >
> > >> The bug here is that some socket familys do not support the
> > >> sock->ops->shutdown callout, and when nbd calls kernel_sock_shutdown
> > >> their callout returns -EOPNOTSUPP. That then leaves recv_work stuck in
> > >> nbd_read_stat -> sock_xmit -> sock_recvmsg. My patch added a
> > >> flush_workqueue call, so for socket familys like AF_NETLINK in this bug
> > >> we hang like we see below.
> > >>
> > >> I can just remove the flush_workqueue call in that code path since it's
> > >> not needed there, but it leaves the original bug my patch was hitting
> > >> where we leave the recv_work running which can then result in leaked
> > >> resources, or possible use after free crashes and you still get the hang
> > >> if you remove the module.
> > >>
> > >> It looks like we have used kernel_sock_shutdown for a while so I thought
> > >> we might never have supported sockets that did not support the callout.
> > >> Is that correct? If so then I can just add a check for this in
> > >> nbd_add_socket and fix that bug too.
> > >
> > > Rich.
> > >

It's an automatically generated fuzz test.

There's rarely any such thing as a "bad" fuzz test. If userspace can do
something that causes the kernel to crash or hang, it's a kernel bug, with
very few exceptions (e.g. like writing to /dev/mem).

If there are cases that aren't supported, like sockets that don't support a
certain function or whatever, then the code needs to check for those cases and
return an error, not hang the kernel.

- Eric

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: INFO: task hung in nbd_ioctl 2019-10-17 16:36 ` Eric Biggers @ 2019-10-17 16:49 ` Richard W.M. Jones 2019-10-17 21:26 ` Mike Christie 0 siblings, 1 reply; 13+ messages in thread From: Richard W.M. Jones @ 2019-10-17 16:49 UTC (permalink / raw) To: Mike Christie, syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On Thu, Oct 17, 2019 at 09:36:34AM -0700, Eric Biggers wrote: > On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote: > > On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote: > > > On 10/17/2019 09:03 AM, Richard W.M. Jones wrote: > > > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > > > >> Hey Josef and nbd list, > > > >> > > > >> I had a question about if there are any socket family restrictions for nbd? > > > > > > > > In normal circumstances, in userspace, the NBD protocol would only be > > > > used over AF_UNIX or AF_INET/AF_INET6. > > > > > > > > There's a bit of confusion because netlink is used by nbd-client to > > > > configure the NBD device, setting things like block size and timeouts > > > > (instead of ioctl which is deprecated). I think you don't mean this > > > > use of netlink? > > > > > > I didn't. It looks like it is just a bad test. > > > > > > For the automated test in this thread the test created a AF_NETLINK > > > socket and passed it into the NBD_SET_SOCK ioctl. That is what got used > > > for the NBD_DO_IT ioctl. > > > > > > I was not sure if the test creator picked any old socket and it just > > > happened to pick one nbd never supported, or it was trying to simulate > > > sockets that did not support the shutdown method. > > > > > > I attached the automated test that got run (test.c). > > > > I'd say it sounds like a bad test, but I'm not familiar with syzkaller > > nor how / from where it generates these tests. Did someone report a > > bug and then syzkaller wrote this test? > > It's an automatically generated fuzz test. 
> > There's rarely any such thing as a "bad" fuzz test. If userspace > can do something that causes the kernel to crash or hang, it's a > kernel bug, with very few exceptions (e.g. like writing to > /dev/mem). > > If there are cases that aren't supported, like sockets that don't > support a certain function or whatever, then the code needs to check > for those cases and return an error, not hang the kernel. Oh I see. In that case I agree, although I believe this is a root-only API and root has a lot of ways to crash the kernel, but sure it could be fixed to restrict sockets to one of: - AF_LOCAL or AF_UNIX - AF_INET or AF_INET6 - AF_INET*_SDP (? no idea what this is, but it's used by nbd-client) Here are some ways NBD is used in real code: libnbd$ git grep AF_ fuzzing/libnbd-fuzz-wrapper.c: if (socketpair (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, sv) == -1) { generator/states-connect-socket-activation.c: s = socket (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0); generator/states-connect-socket-activation.c: addr.sun_family = AF_UNIX; generator/states-connect.c: fd = socket (AF_UNIX, SOCK_STREAM|SOCK_NONBLOCK|SOCK_CLOEXEC, 0); generator/states-connect.c: struct sockaddr_un sun = { .sun_family = AF_UNIX }; generator/states-connect.c: if (socketpair (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, sv) == -1) { nbdkit$ git grep AF_ plugins/info/info.c: case AF_INET: plugins/info/info.c: if (inet_ntop (AF_INET, &addr->sin_addr, plugins/info/info.c: case AF_INET6: plugins/info/info.c: if (inet_ntop (AF_INET6, &addr6->sin6_addr, plugins/info/info.c: case AF_UNIX: plugins/nbd/nbd-standalone.c: struct sockaddr_un sock = { .sun_family = AF_UNIX }; plugins/nbd/nbd-standalone.c: fd = socket (AF_UNIX, SOCK_STREAM, 0); server/sockets.c: sock = socket (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0); server/sockets.c: sock = set_cloexec (socket (AF_UNIX, SOCK_STREAM, 0)); server/sockets.c: addr.sun_family = AF_UNIX; tests/test-layers.c: if (socketpair (AF_LOCAL, SOCK_STREAM, 0, sfd) == -1) { 
tests/test-socket-activation.c: sock = socket (AF_UNIX, SOCK_STREAM /* NB do not use SOCK_CLOEXEC */, 0); tests/test-socket-activation.c: addr.sun_family = AF_UNIX; tests/test-socket-activation.c: sock = socket (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0); tests/web-server.c: listen_sock = socket (AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0); tests/web-server.c: addr.sun_family = AF_UNIX; nbd$ git grep AF_ gznbd/gznbd.c: if(socketpair(AF_UNIX, SOCK_STREAM, 0, pr)){ nbd-client.c: if (ai->ai_family == AF_INET) nbd-client.c: ai->ai_family = AF_INET_SDP; nbd-client.c: else (ai->ai_family == AF_INET6) nbd-client.c: ai->ai_family = AF_INET6_SDP; nbd-client.c: un_addr.sun_family = AF_UNIX; nbd-client.c: if ((sock = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) { nbd-client.c: if (socketpair(AF_UNIX, SOCK_STREAM, 0, plainfd) < 0) nbd-server.c: if(netaddr.ss_family == AF_UNIX) { nbd-server.c: client->clientaddr.ss_family = AF_UNIX; nbd-server.c: if(client->clientaddr.ss_family == AF_UNIX) { nbd-server.c: assert((ai->ai_family == AF_INET) || (ai->ai_family == AF_INET6)); nbd-server.c: if(ai->ai_family == AF_INET) { nbd-server.c: } else if(ai->ai_family == AF_INET6) { nbd-server.c: socketpair(AF_UNIX, SOCK_STREAM, 0, sockets); nbd-server.c: sa.sun_family = AF_UNIX; nbd-server.c: sock = socket(AF_UNIX, SOCK_STREAM, 0); nbdsrv.c: int addrlen = addr->sa_family == AF_INET ? 
4 : 16; nbdsrv.c: assert(addr->sa_family == AF_INET || addr->sa_family == AF_INET6); nbdsrv.c: case AF_INET: nbdsrv.c: case AF_INET6: tests/code/trim.c: socketpair(AF_UNIX, SOCK_STREAM, AF_UNIX, spair); tests/run/nbd-tester-client.c: if (socketpair(AF_UNIX, SOCK_STREAM, 0, plainfd) < 0) { tests/run/nbd-tester-client.c: if ((sock = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { tests/run/nbd-tester-client.c: addr.sun_family = AF_UNIX; tests/run/nbd-tester-client.c: addr.sin_family = AF_INET; tests/run/nbd-tester-client.c: if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) { qemu-nbd is a bit hard to grep like this, but it only supports Unix domain sockets or TCP/IP. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
* Re: INFO: task hung in nbd_ioctl 2019-10-17 16:49 ` Richard W.M. Jones @ 2019-10-17 21:26 ` Mike Christie 0 siblings, 0 replies; 13+ messages in thread From: Mike Christie @ 2019-10-17 21:26 UTC (permalink / raw) To: Richard W.M. Jones, syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On 10/17/2019 11:49 AM, Richard W.M. Jones wrote: > On Thu, Oct 17, 2019 at 09:36:34AM -0700, Eric Biggers wrote: >> On Thu, Oct 17, 2019 at 05:28:29PM +0100, Richard W.M. Jones wrote: >>> On Thu, Oct 17, 2019 at 10:47:59AM -0500, Mike Christie wrote: >>>> On 10/17/2019 09:03 AM, Richard W.M. Jones wrote: >>>>> On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: >>>>>> Hey Josef and nbd list, >>>>>> >>>>>> I had a question about if there are any socket family restrictions for nbd? >>>>> >>>>> In normal circumstances, in userspace, the NBD protocol would only be >>>>> used over AF_UNIX or AF_INET/AF_INET6. >>>>> >>>>> There's a bit of confusion because netlink is used by nbd-client to >>>>> configure the NBD device, setting things like block size and timeouts >>>>> (instead of ioctl which is deprecated). I think you don't mean this >>>>> use of netlink? >>>> >>>> I didn't. It looks like it is just a bad test. >>>> >>>> For the automated test in this thread the test created a AF_NETLINK >>>> socket and passed it into the NBD_SET_SOCK ioctl. That is what got used >>>> for the NBD_DO_IT ioctl. >>>> >>>> I was not sure if the test creator picked any old socket and it just >>>> happened to pick one nbd never supported, or it was trying to simulate >>>> sockets that did not support the shutdown method. >>>> >>>> I attached the automated test that got run (test.c). >>> >>> I'd say it sounds like a bad test, but I'm not familiar with syzkaller >>> nor how / from where it generates these tests. Did someone report a >>> bug and then syzkaller wrote this test? >> >> It's an automatically generated fuzz test. 
>> >> There's rarely any such thing as a "bad" fuzz test. If userspace >> can do something that causes the kernel to crash or hang, it's a >> kernel bug, with very few exceptions (e.g. like writing to >> /dev/mem). >> >> If there are cases that aren't supported, like sockets that don't >> support a certain function or whatever, then the code needs to check >> for those cases and return an error, not hang the kernel. > > Oh I see. In that case I agree, although I believe this is a > root-only API and root has a lot of ways to crash the kernel, but sure > it could be fixed to restrict sockets to one of: > > - AF_LOCAL or AF_UNIX > - AF_INET or AF_INET6 > - AF_INET*_SDP (? no idea what this is, but it's used by nbd-client) > This one was for an InfiniBand-related socket family that never made it upstream. It did support the shutdown callout, so I made my patch check that the passed-in socket supports that callout instead of hard-coding the family names, just in case some user was still using it.
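The approach described in this reply, checking for the capability rather than for family names, can be modeled in plain C. Everything below is a toy model under stated assumptions, not the actual patch: the struct, the stub, and the function names are all invented for illustration, and the real check would operate on the kernel's own socket operations table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a per-family socket operations table (hypothetical). */
struct model_sock_ops {
    int (*shutdown)(int how);
};

/* Stub used by families that cannot shut down: it always fails. */
static int model_no_shutdown(int how)
{
    (void)how;
    return -EOPNOTSUPP;
}

/* Shutdown callout of a family that does support it. */
static int model_stream_shutdown(int how)
{
    (void)how;
    return 0;
}

/* Two example tables: one like the netlink case, one like a stream socket. */
static const struct model_sock_ops model_netlink_ops = {
    .shutdown = model_no_shutdown,
};
static const struct model_sock_ops model_unix_ops = {
    .shutdown = model_stream_shutdown,
};

/*
 * Setup-time check: refuse any socket whose shutdown callout is missing
 * or is the "not supported" stub, so that the failure is reported as an
 * error at setup instead of as a hang later.
 */
static int model_add_socket(const struct model_sock_ops *ops)
{
    assert(ops != NULL);

    if (ops->shutdown == NULL || ops->shutdown == model_no_shutdown)
        return -EINVAL;
    return 0;
}
```

Comparing the callout against the stub keeps any family that implements shutdown usable, including out-of-tree ones, without maintaining a list of family names.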
* Re: INFO: task hung in nbd_ioctl 2019-10-17 14:03 ` Richard W.M. Jones 2019-10-17 15:47 ` Mike Christie @ 2019-10-30 8:39 ` Wouter Verhelst 2019-10-30 8:41 ` Wouter Verhelst 1 sibling, 1 reply; 13+ messages in thread From: Wouter Verhelst @ 2019-10-30 8:39 UTC (permalink / raw) To: Richard W.M. Jones Cc: Mike Christie, syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On Thu, Oct 17, 2019 at 03:03:30PM +0100, Richard W.M. Jones wrote: > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > > Hey Josef and nbd list, > > > > I had a question about if there are any socket family restrictions for nbd? > > In normal circumstances, in userspace, the NBD protocol would only be > used over AF_UNIX or AF_INET/AF_INET6. Note that someone once also did work to make it work over SCTP. I incorporated the patch into nbd-client and nbd-server, but never actually tested it myself. I have no way of knowing if it even still works anymore... [...] -- To the thief who stole my anti-depressants: I hope you're happy -- seen somewhere on the Internet on a photo of a billboard
* Re: INFO: task hung in nbd_ioctl 2019-10-30 8:39 ` Wouter Verhelst @ 2019-10-30 8:41 ` Wouter Verhelst 0 siblings, 0 replies; 13+ messages in thread From: Wouter Verhelst @ 2019-10-30 8:41 UTC (permalink / raw) To: Richard W.M. Jones Cc: Mike Christie, syzbot, axboe, josef, linux-block, linux-kernel, nbd, syzkaller-bugs On Wed, Oct 30, 2019 at 10:39:57AM +0200, Wouter Verhelst wrote: > On Thu, Oct 17, 2019 at 03:03:30PM +0100, Richard W.M. Jones wrote: > > On Tue, Oct 01, 2019 at 04:19:25PM -0500, Mike Christie wrote: > > > Hey Josef and nbd list, > > > > > > I had a question about if there are any socket family restrictions for nbd? > > > > In normal circumstances, in userspace, the NBD protocol would only be > > used over AF_UNIX or AF_INET/AF_INET6. > > Note that someone once also did work to make it work over SCTP. I > incorporated the patch into nbd-client and nbd-server, but never > actually tested it myself. I have no way of knowing if it even still > works anymore... Actually, I meant SDP (as you pointed out downthread). Sorry for the confusion ;-) (I should probably kick that out though, indeed) -- To the thief who stole my anti-depressants: I hope you're happy -- seen somewhere on the Internet on a photo of a billboard
end of thread, other threads:[~2021-09-18 1:35 UTC | newest]

Thread overview: 13+ messages
2021-09-16 2:43 INFO: task hung in nbd_ioctl Hao Sun
-- strict thread matches above, loose matches on Subject: below --
2021-09-18 1:34 Hao Sun
2019-09-30 22:39 syzbot
2019-10-01 17:48 ` Mike Christie
2019-10-01 21:19 ` Mike Christie
2019-10-17 14:03 ` Richard W.M. Jones
2019-10-17 15:47 ` Mike Christie
2019-10-17 16:28 ` Richard W.M. Jones
2019-10-17 16:36 ` Eric Biggers
2019-10-17 16:49 ` Richard W.M. Jones
2019-10-17 21:26 ` Mike Christie
2019-10-30 8:39 ` Wouter Verhelst
2019-10-30 8:41 ` Wouter Verhelst