* INFO: task hung in blk_queue_enter
@ 2018-04-28 11:24 syzbot
  2018-05-15 11:45 ` Tetsuo Handa
  0 siblings, 1 reply; 32+ messages in thread
From: syzbot @ 2018-04-28 11:24 UTC (permalink / raw)
  To: axboe, linux-block, linux-kernel, syzkaller-bugs

Hello,

syzbot hit the following crash on upstream commit
d8a332730e757129e70675679f2b2a03f1ecf65e (Fri Apr 27 17:39:38 2018 +0000)
Merge tag 'char-misc-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=c4f9cebf9d651f6e54de

So far this crash happened 13 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=4727023951937536
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6091190189424640
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5018098448990208
Kernel config: https://syzkaller.appspot.com/x/.config?id=7043958930931867332
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

random: sshd: uninitialized urandom read (32 bytes read)
print_req_error: I/O error, dev loop0, sector 7680
print_req_error: I/O error, dev loop0, sector 10240
print_req_error: I/O error, dev loop0, sector 7680
print_req_error: I/O error, dev loop0, sector 12800
INFO: task kworker/u4:2:31 blocked for more than 120 seconds.
      Not tainted 4.17.0-rc2+ #20
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:2 D12840 31 2 0x80000000 Workqueue: writeback wb_workfn (flush-7:0) Call Trace: context_switch kernel/sched/core.c:2848 [inline] __schedule+0x801/0x1e30 kernel/sched/core.c:3490 schedule+0xef/0x430 kernel/sched/core.c:3549 blk_queue_enter+0x8da/0xdf0 block/blk-core.c:953 generic_make_request+0x144/0x1510 block/blk-core.c:2395 submit_bio+0xba/0x460 block/blk-core.c:2568 submit_bh_wbc+0x5e8/0x7c0 fs/buffer.c:3076 __block_write_full_page+0x82e/0xef0 fs/buffer.c:1758 block_write_full_page+0x1ff/0x250 fs/buffer.c:2944 blkdev_writepage+0x24/0x30 fs/block_dev.c:566 __writepage+0x69/0xe0 mm/page-writeback.c:2302 write_cache_pages+0x920/0x1690 mm/page-writeback.c:2240 generic_writepages+0xcc/0x120 mm/page-writeback.c:2326 blkdev_writepages+0x1d/0x30 fs/block_dev.c:1951 do_writepages+0x9a/0x1a0 mm/page-writeback.c:2341 __writeback_single_inode+0x1e4/0x15c0 fs/fs-writeback.c:1323 writeback_sb_inodes+0x6f0/0x11a0 fs/fs-writeback.c:1587 __writeback_inodes_wb+0x1b0/0x320 fs/fs-writeback.c:1656 wb_writeback+0x9e4/0xf50 fs/fs-writeback.c:1765 wb_check_background_flush fs/fs-writeback.c:1833 [inline] wb_do_writeback fs/fs-writeback.c:1921 [inline] wb_workfn+0xd2f/0x1770 fs/fs-writeback.c:1949 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Showing all locks held in the system: 3 locks held by kworker/u4:2/31: #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: __write_once_size include/linux/compiler.h:215 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:57 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: 
set_work_data kernel/workqueue.c:617 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline] #0: (ptrval) ((wq_completion)"writeback"){+.+.}, at: process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116 #1: (ptrval) ((work_completion)(&(&wb->dwork)->work)){+.+.}, at: process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120 #2: (ptrval) (&type->s_umount_key#35){.+.+}, at: trylock_super+0x22/0x110 fs/super.c:399 2 locks held by khungtaskd/891: #0: (ptrval) (rcu_read_lock){....}, at: check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] #0: (ptrval) (rcu_read_lock){....}, at: watchdog+0x1ff/0xf60 kernel/hung_task.c:249 #1: (ptrval) (tasklist_lock){.+.+}, at: debug_show_all_locks+0xde/0x34a kernel/locking/lockdep.c:4470 2 locks held by getty/4512: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4513: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4514: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4515: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4516: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4517: #0: (ptrval) (&tty->ldisc_sem){++++}, at: 
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 2 locks held by getty/4518: #0: (ptrval) (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (ptrval) (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0x321/0x1cc0 drivers/tty/n_tty.c:2131 1 lock held by syz-executor516/4581: #0: (ptrval) (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2130 drivers/block/loop.c:1391 ============================================= NMI backtrace for cpu 1 CPU: 1 PID: 891 Comm: khungtaskd Not tainted 4.17.0-rc2+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103 nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38 trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline] check_hung_task kernel/hung_task.c:132 [inline] check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline] watchdog+0xc10/0xf60 kernel/hung_task.c:249 kthread+0x345/0x410 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.17.0-rc2+ #20 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:ksoftirqd_running kernel/softirq.c:88 [inline] RIP: 0010:invoke_softirq kernel/softirq.c:355 [inline] RIP: 0010:irq_exit+0x125/0x200 kernel/softirq.c:405 RSP: 0018:ffff8801dae07cd8 EFLAGS: 00000046 RAX: dffffc0000000000 RBX: ffff8801d9a9c200 RCX: 0000000000000000 RDX: 1ffff1003b353842 RSI: 0000000000000000 RDI: ffff8801d9a9c210 RBP: ffff8801dae07ce0 R08: ffffed003b5c46c3 R09: ffffed003b5c46c2 R10: ffffed003b5c46c2 R11: ffff8801dae23613 R12: 
ffff8801dae2c580 R13: ffffffff88c75dc0 R14: ffff8801dae07e68 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600400 CR3: 00000001b5553000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> scheduler_ipi+0x52b/0xa30 kernel/sched/core.c:1778 smp_reschedule_interrupt+0xed/0x660 arch/x86/kernel/smp.c:277 reschedule_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:887 </IRQ> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54 RSP: 0018:ffffffff88c07bc0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff02 RAX: dffffc0000000000 RBX: 1ffffffff1180f7b RCX: 0000000000000000 RDX: 1ffffffff11a3160 RSI: 0000000000000001 RDI: ffffffff88d18b00 RBP: ffffffff88c07bc0 R08: ffffed003b5c46c3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffffffff88c07c78 R14: ffffffff897bf060 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0xc2/0x440 arch/x86/kernel/process.c:354 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x395/0x560 kernel/sched/idle.c:262 cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368 rest_init+0xe1/0xe4 init/main.c:441 start_kernel+0x906/0x92d init/main.c:737 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:445 x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:426 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242 Code: 00 01 5b 5d c3 65 48 8b 1d 89 a4 ba 7e 48 85 db 74 2c 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 10 48 89 fa 48 c1 ea 03 80 3c 02 00 <0f> 85 c2 00 00 00 48 8b 43 10 48 85 c0 0f 84 31 ff ff ff 48 c7 --- This bug is generated by a dumb bot. It may contain errors. 
See https://goo.gl/tpsmEJ for details. Direct all questions to syzkaller@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title If you want to test a patch for this bug, please reply with: #syz test: git://repo/address.git branch and provide the patch inline or as an attachment. To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter
  2018-04-28 11:24 INFO: task hung in blk_queue_enter syzbot
@ 2018-05-15 11:45 ` Tetsuo Handa
  2018-05-16 13:05   ` Tetsuo Handa
  0 siblings, 1 reply; 32+ messages in thread
From: Tetsuo Handa @ 2018-05-15 11:45 UTC (permalink / raw)
  To: syzbot, axboe, linux-block, syzkaller-bugs
  Cc: linux-kernel, linux-ext4, dvyukov

I managed to obtain SysRq-t when khungtaskd fires using debug printk()
( https://groups.google.com/forum/#!topic/syzkaller-bugs/OTuOsVebAiE ).
Only the 4 threads shown below seem to be relevant to this problem.

[  246.929688]   task                        PC stack   pid father
[  249.888937] jbd2/sda1-8     D17736  2271      2 0x80000000
[  249.894586] Call Trace:
[  249.897193]  __schedule+0x801/0x1e30
[  249.900954]  schedule+0xef/0x430
[  249.904356]  io_schedule+0x1c/0x70
[  249.907935]  bit_wait_io+0x18/0x90
[  249.911492]  __wait_on_bit+0xb3/0x130
[  249.915313]  out_of_line_wait_on_bit+0x204/0x3a0
[  249.920105]  __wait_on_buffer+0x76/0x90
[  249.924102]  jbd2_journal_commit_transaction+0x655b/0x8c18
[  249.929893]  kjournald2+0x26c/0xb30
[  249.933579]  kthread+0x345/0x410
[  249.936966]  ret_from_fork+0x3a/0x50

[  254.408972] syz-executor7   D18976  5010   4828 0x00000004
[  254.414639] Call Trace:
[  254.417263]  __schedule+0x801/0x1e30
[  254.421070]  schedule+0xef/0x430
[  254.424467]  blk_queue_enter+0x8da/0xdf0
[  254.428584]  generic_make_request+0x144/0x1510
[  254.433217]  blk_queue_split+0x374/0x2090
[  254.437425]  blk_mq_make_request+0x2d0/0x25c0
[  254.442004]  generic_make_request+0x795/0x1510
[  254.446663]  submit_bio+0xba/0x460
[  254.451104]  mpage_readpages+0x6d7/0x970
[  254.455224]  blkdev_readpages+0x2c/0x40
[  254.459220]  __do_page_cache_readahead+0x79a/0xdc0
[  254.464205]  ondemand_readahead+0x550/0xc40
[  254.468559]  page_cache_sync_readahead+0xd1/0x110
[  254.473430]  generic_file_read_iter+0x1a74/0x2f00
[  254.478423]  blkdev_read_iter+0x120/0x190
[  254.482594]  generic_file_splice_read+0x552/0x910
[  254.487484]  do_splice_to+0x12e/0x190
[  254.491311]  splice_direct_to_actor+0x268/0x8d0
[  254.496017]  do_splice_direct+0x2cc/0x400
[  254.500224]  do_sendfile+0x60f/0xe00
[  254.503971]  __x64_sys_sendfile64+0x155/0x240
[  254.508507]  do_syscall_64+0x1b1/0x800
[  254.512431]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  254.568779] syz-executor7   D25408  5021   4828 0x00000004
[  254.574429] Call Trace:
[  254.577125]  __schedule+0x801/0x1e30
[  254.580882]  schedule+0xef/0x430
[  254.584297]  blk_mq_freeze_queue_wait+0x1ce/0x460
[  254.589192]  blk_freeze_queue+0x4a/0x80
[  254.593209]  blk_mq_freeze_queue+0x15/0x20
[  254.597464]  loop_clr_fd+0x226/0xb80
[  254.601208]  lo_ioctl+0x642/0x2130
[  254.604774]  blkdev_ioctl+0x9b6/0x2020
[  254.608715]  block_ioctl+0xee/0x130
[  254.612362]  do_vfs_ioctl+0x1cf/0x16a0
[  254.616294]  ksys_ioctl+0xa9/0xd0
[  254.619770]  __x64_sys_ioctl+0x73/0xb0
[  254.623674]  do_syscall_64+0x1b1/0x800
[  254.627596]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  255.022839] 1 lock held by syz-executor7/5021:
[  255.028209] #0: (ptrval) (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2130

[  254.719133] blkid           D23880  5071   2713 0x00000000
[  254.724791] Call Trace:
[  254.727402]  __schedule+0x801/0x1e30
[  254.731159]  schedule+0xef/0x430
[  254.734562]  schedule_preempt_disabled+0x10/0x20
[  254.739333]  __mutex_lock+0xe38/0x17f0
[  254.743280]  mutex_lock_killable_nested+0x16/0x20
[  254.748135]  lo_ioctl+0x8d/0x2130
[  254.751611]  blkdev_ioctl+0x9b6/0x2020
[  254.755550]  block_ioctl+0xee/0x130
[  254.759200]  do_vfs_ioctl+0x1cf/0x16a0
[  254.763123]  ksys_ioctl+0xa9/0xd0
[  254.766601]  __x64_sys_ioctl+0x73/0xb0
[  254.770506]  do_syscall_64+0x1b1/0x800
[  254.774425]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  255.036399] 1 lock held by blkid/5071:
[  255.041075] #0: (ptrval) (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2130

syz-executor7(PID=5021) is doing ioctl(LOOP_CLR_FD). blkid(PID=5071) is doing
ioctl() but is blocked on lo_ctl_mutex, which is held by syz-executor7(PID=5021).
Therefore, I assume that blkid(PID=5071) is irrelevant to this hang.

syz-executor7(PID=5021) is stuck at blk_mq_freeze_queue_wait() from
blk_freeze_queue() from blk_mq_freeze_queue() from loop_clr_fd().

/*
 * Guarantee no request is in use, so we can change any data structure of
 * the queue afterward.
 */
void blk_freeze_queue(struct request_queue *q)
{
	/*
	 * In the !blk_mq case we are only calling this to kill the
	 * q_usage_counter, otherwise this increases the freeze depth
	 * and waits for it to return to zero.  For this reason there is
	 * no blk_unfreeze_queue(), and blk_freeze_queue() is not
	 * exported to drivers as the only user for unfreeze is blk_mq.
	 */
	blk_freeze_queue_start(q);
	if (!q->mq_ops)
		blk_drain_queue(q);
	blk_mq_freeze_queue_wait(q);
}

q->mq_freeze_depth is incremented at blk_freeze_queue_start() and the caller
waits until q->q_usage_counter becomes 0.

void blk_freeze_queue_start(struct request_queue *q)
{
	int freeze_depth;

	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
	if (freeze_depth == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		if (q->mq_ops)
			blk_mq_run_hw_queues(q, false);
	}
}

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}

I couldn't check whether freeze_depth in blk_freeze_queue_start() was 1, but
presumably q->mq_freeze_depth > 0 because syz-executor7(PID=5010) is stuck at
wait_event() in blk_queue_enter().
syz-executor7   D18976  5010   4828 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2859 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3501
 schedule+0xef/0x430 kernel/sched/core.c:3545
 blk_queue_enter+0x8da/0xdf0 block/blk-core.c:953
 generic_make_request+0x144/0x1510 block/blk-core.c:2395
 blk_queue_split+0x374/0x2090 block/blk-merge.c:215
 blk_mq_make_request+0x2d0/0x25c0 block/blk-mq.c:1861
 generic_make_request+0x795/0x1510 block/blk-core.c:2460
 submit_bio+0xba/0x460 block/blk-core.c:2568
 mpage_bio_submit fs/mpage.c:66 [inline]
 mpage_readpages+0x6d7/0x970 fs/mpage.c:393
 blkdev_readpages+0x2c/0x40 fs/block_dev.c:577
 read_pages mm/readahead.c:121 [inline]
 __do_page_cache_readahead+0x79a/0xdc0 mm/readahead.c:199
 ra_submit mm/internal.h:66 [inline]
 ondemand_readahead+0x550/0xc40 mm/readahead.c:478
 page_cache_sync_readahead+0xd1/0x110 mm/readahead.c:510
 generic_file_buffered_read mm/filemap.c:2092 [inline]
 generic_file_read_iter+0x1a74/0x2f00 mm/filemap.c:2362
 blkdev_read_iter+0x120/0x190 fs/block_dev.c:1930
 call_read_iter include/linux/fs.h:1778 [inline]
 generic_file_splice_read+0x552/0x910 fs/splice.c:307
 do_splice_to+0x12e/0x190 fs/splice.c:880
 splice_direct_to_actor+0x268/0x8d0 fs/splice.c:952
 do_splice_direct+0x2cc/0x400 fs/splice.c:1061
 do_sendfile+0x60f/0xe00 fs/read_write.c:1440
 __do_sys_sendfile64 fs/read_write.c:1495 [inline]
 __se_sys_sendfile64 fs/read_write.c:1487 [inline]
 __x64_sys_sendfile64+0x155/0x240 fs/read_write.c:1487
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Since flags == 0, preempt == false. Since stuck at wait_event(), success == false.
Thus, atomic_read(&q->mq_freeze_depth) > 0 if blk_queue_dying(q) == false. And I
guess blk_queue_dying(q) == false because we are just trying to freeze/unfreeze.
/**
 * blk_queue_enter() - try to increase q->q_usage_counter
 * @q: request queue pointer
 * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
 */
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;

	while (true) {
		bool success = false;

		rcu_read_lock();
		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
			/*
			 * The code that sets the PREEMPT_ONLY flag is
			 * responsible for ensuring that that flag is globally
			 * visible before the queue is unfrozen.
			 */
			if (preempt || !blk_queue_preempt_only(q)) {
				success = true;
			} else {
				percpu_ref_put(&q->q_usage_counter);
			}
		}
		rcu_read_unlock();

		if (success)
			return 0;

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/*
		 * read pair of barrier in blk_freeze_queue_start(),
		 * we need to order reading __PERCPU_REF_DEAD flag of
		 * .q_usage_counter and reading .mq_freeze_depth or
		 * queue dying flag, otherwise the following wait may
		 * never return if the two reads are reordered.
		 */
		smp_rmb();

		wait_event(q->mq_freeze_wq,
			   (atomic_read(&q->mq_freeze_depth) == 0 &&
			    (preempt || !blk_queue_preempt_only(q))) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}

jbd2/sda1-8(PID=2271) is stuck waiting for the journal commit operation.
I don't know how this thread is involved in this problem.
jbd2/sda1-8     D17736  2271      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2859 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3501
 schedule+0xef/0x430 kernel/sched/core.c:3545
 io_schedule+0x1c/0x70 kernel/sched/core.c:5165
 bit_wait_io+0x18/0x90 kernel/sched/wait_bit.c:207
 __wait_on_bit+0xb3/0x130 kernel/sched/wait_bit.c:48
 out_of_line_wait_on_bit+0x204/0x3a0 kernel/sched/wait_bit.c:63
 wait_on_bit_io include/linux/wait_bit.h:101 [inline]
 __wait_on_buffer+0x76/0x90 fs/buffer.c:118
 wait_on_buffer include/linux/buffer_head.h:356 [inline]
 journal_wait_on_commit_record fs/jbd2/commit.c:174 [inline]
 jbd2_journal_commit_transaction+0x655b/0x8c18 fs/jbd2/commit.c:865
 kjournald2+0x26c/0xb30 fs/jbd2/journal.c:229
 kthread+0x345/0x410 kernel/kthread.c:240
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Can we find the culprit from these hints?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter
  2018-05-15 11:45 ` Tetsuo Handa
@ 2018-05-16 13:05   ` Tetsuo Handa
  2018-05-16 14:56     ` Bart Van Assche
  2018-05-16 17:33     ` Alan Jenkins
  0 siblings, 2 replies; 32+ messages in thread
From: Tetsuo Handa @ 2018-05-16 13:05 UTC (permalink / raw)
  To: syzbot, linux-block, syzkaller-bugs, Dan Williams, Jens Axboe
  Cc: linux-kernel, linux-ext4, dvyukov, Alan Jenkins, Bart Van Assche,
      Christoph Hellwig, Hannes Reinecke, Johannes Thumshirn, Keith Busch,
      Martin K. Petersen, Martin Steigerwald, Ming Lei,
      Oleksandr Natalenko, Ross Zwisler

Tetsuo Handa wrote:
> I couldn't check whether freeze_depth in blk_freeze_queue_start() was 1,
> but presumably q->mq_freeze_depth > 0 because syz-executor7(PID=5010) is
> stuck at wait_event() in blk_queue_enter().
>
> Since flags == 0, preempt == false. Since stuck at wait_event(), success == false.
> Thus, atomic_read(&q->mq_freeze_depth) > 0 if blk_queue_dying(q) == false. And I
> guess blk_queue_dying(q) == false because we are just trying to freeze/unfreeze.

I was able to reproduce the hang using a modified reproducer, and obtained
the values below using the following debug printk() patch.
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -950,10 +950,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		 */
 		smp_rmb();
 
-		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
-			   blk_queue_dying(q));
+		while (wait_event_timeout(q->mq_freeze_wq,
+					  (atomic_read(&q->mq_freeze_depth) == 0 &&
+					   (preempt || !blk_queue_preempt_only(q))) ||
+					  blk_queue_dying(q), 10 * HZ) == 0)
+			printk("%s(%u): q->mq_freeze_depth=%d preempt=%d blk_queue_preempt_only(q)=%d blk_queue_dying(q)=%d\n",
+			       current->comm, current->pid, atomic_read(&q->mq_freeze_depth), preempt, blk_queue_preempt_only(q), blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
 	}

[   75.869126] print_req_error: I/O error, dev loop0, sector 0
[   85.983146] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[   96.222884] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  106.463322] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0
[  116.702912] a.out(8838): q->mq_freeze_depth=1 preempt=0 blk_queue_preempt_only(q)=0 blk_queue_dying(q)=0

One or more threads are waiting for q->mq_freeze_depth to become 0. But the
thread which incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from
blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore, the
atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in
blk_queue_enter() will never be satisfied. But what does that wait_event()
want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true?
Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is
about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0
as if blk_queue_dying(q) == true?
That is, something like below:

diff --git a/block/blk-core.c b/block/blk-core.c
index 85909b4..59e2496 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		smp_rmb();
 
 		wait_event(q->mq_freeze_wq,
-			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			   atomic_read(&q->mq_freeze_depth) ||
+			   (preempt || !blk_queue_preempt_only(q)) ||
 			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+		if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q))
 			return -ENODEV;
 	}
 }

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter @ 2018-05-16 14:56 ` Bart Van Assche 0 siblings, 0 replies; 32+ messages in thread From: Bart Van Assche @ 2018-05-16 14:56 UTC (permalink / raw) To: syzbot+c4f9cebf9d651f6e54de, syzkaller-bugs, dan.j.williams, linux-block, penguin-kernel, axboe Cc: linux-kernel, jthumshirn, alan.christopher.jenkins, hch, martin.petersen, dvyukov, ming.lei, martin, oleksandr, hare, ross.zwisler, keith.busch, linux-ext4 On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote: > One ore more threads are waiting for q->mq_freeze_depth to become 0. But the > thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from > blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore, > atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in > blk_queue_enter() will never be satisfied. But what does that wait_event() > want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true? > Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is > about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0 > as if blk_queue_dying(q) == true? That is, something like below: > > diff --git a/block/blk-core.c b/block/blk-core.c > index 85909b4..59e2496 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) > smp_rmb(); > > wait_event(q->mq_freeze_wq, > - (atomic_read(&q->mq_freeze_depth) == 0 && > - (preempt || !blk_queue_preempt_only(q))) || > + atomic_read(&q->mq_freeze_depth) || > + (preempt || !blk_queue_preempt_only(q)) || > blk_queue_dying(q)); > - if (blk_queue_dying(q)) > + if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)) > return -ENODEV; > } > } That change looks wrong to me. Additionally, I think that you are looking in the wrong direction. 
Since blk_mq_freeze_queue_wait() and blk_queue_enter() work fine for all block drivers except the loop driver I think that you should have a closer look at how the loop driver uses this block layer functionality. Thanks, Bart. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-05-16 14:56 ` Bart Van Assche @ 2018-05-16 15:16 ` Dmitry Vyukov -1 siblings, 0 replies; 32+ messages in thread From: Dmitry Vyukov @ 2018-05-16 15:16 UTC (permalink / raw) To: Bart Van Assche Cc: syzbot+c4f9cebf9d651f6e54de, syzkaller-bugs, dan.j.williams, linux-block, penguin-kernel@I-love.SAKURA.ne.jp, axboe, linux-kernel, jthumshirn, alan.christopher.jenkins, hch, martin.petersen, ming.lei, martin, oleksandr, hare, ross.zwisler, keith.busch, linux-ext4 On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote: >> One ore more threads are waiting for q->mq_freeze_depth to become 0. But the >> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from >> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore, >> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in >> blk_queue_enter() will never be satisfied. But what does that wait_event() >> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true? >> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is >> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0 >> as if blk_queue_dying(q) == true? That is, something like below: >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 85909b4..59e2496 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) >> smp_rmb(); >> >> wait_event(q->mq_freeze_wq, >> - (atomic_read(&q->mq_freeze_depth) == 0 && >> - (preempt || !blk_queue_preempt_only(q))) || >> + atomic_read(&q->mq_freeze_depth) || >> + (preempt || !blk_queue_preempt_only(q)) || >> blk_queue_dying(q)); >> - if (blk_queue_dying(q)) >> + if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)) >> return -ENODEV; >> } >> } > > That change looks wrong to me. 
Hi Bart, Why does it look wrong to you? > Additionally, I think that you are looking in > the wrong direction. Since blk_mq_freeze_queue_wait() and blk_queue_enter() > work fine for all block drivers except the loop driver I think that you should > have a closer look at how the loop driver uses this block layer functionality. > > Thanks, > > Bart. > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter @ 2018-05-16 15:16 ` Dmitry Vyukov 0 siblings, 0 replies; 32+ messages in thread From: Dmitry Vyukov @ 2018-05-16 15:16 UTC (permalink / raw) To: Bart Van Assche Cc: syzbot+c4f9cebf9d651f6e54de, syzkaller-bugs, dan.j.williams, linux-block, penguin-kernel@I-love.SAKURA.ne.jp, axboe, linux-kernel, jthumshirn, alan.christopher.jenkins, hch, martin.petersen, ming.lei, martin, On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote: >> One ore more threads are waiting for q->mq_freeze_depth to become 0. But the >> thread who incremented q->mq_freeze_depth at blk_freeze_queue_start(q) from >> blk_freeze_queue() is waiting at blk_mq_freeze_queue_wait(). Therefore, >> atomic_read(&q->mq_freeze_depth) == 0 condition for wait_event() in >> blk_queue_enter() will never be satisfied. But what does that wait_event() >> want to do? Isn't "start freezing" a sort of blk_queue_dying(q) == true? >> Since percpu_ref_tryget_live(&q->q_usage_counter) failed and the queue is >> about to be frozen, shouldn't we treat atomic_read(&q->mq_freeze_depth) != 0 >> as if blk_queue_dying(q) == true? That is, something like below: >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 85909b4..59e2496 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) >> smp_rmb(); >> >> wait_event(q->mq_freeze_wq, >> - (atomic_read(&q->mq_freeze_depth) == 0 && >> - (preempt || !blk_queue_preempt_only(q))) || >> + atomic_read(&q->mq_freeze_depth) || >> + (preempt || !blk_queue_preempt_only(q)) || >> blk_queue_dying(q)); >> - if (blk_queue_dying(q)) >> + if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)) >> return -ENODEV; >> } >> } > > That change looks wrong to me. Hi Bart, Why does it look wrong to you? > Additionally, I think that you are looking in > the wrong direction. 
Since blk_mq_freeze_queue_wait() and blk_queue_enter() > work fine for all block drivers except the loop driver, I think that you should > have a closer look at how the loop driver uses this block layer functionality. > > Thanks, > > Bart. > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter @ 2018-05-16 15:37 ` Bart Van Assche 0 siblings, 0 replies; 32+ messages in thread From: Bart Van Assche @ 2018-05-16 15:37 UTC (permalink / raw) To: dvyukov Cc: linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, penguin-kernel, martin.petersen, axboe, dan.j.williams, hch, oleksandr, ming.lei, martin, On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote: > On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote: > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > index 85909b4..59e2496 100644 > > > --- a/block/blk-core.c > > > +++ b/block/blk-core.c > > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) > > > smp_rmb(); > > > > > > wait_event(q->mq_freeze_wq, > > > - (atomic_read(&q->mq_freeze_depth) == 0 && > > > - (preempt || !blk_queue_preempt_only(q))) || > > > + atomic_read(&q->mq_freeze_depth) || > > > + (preempt || !blk_queue_preempt_only(q)) || > > > blk_queue_dying(q)); > > > - if (blk_queue_dying(q)) > > > + if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)) > > > return -ENODEV; > > > } > > > } > > > > That change looks wrong to me. > > Hi Bart, > > Why does it look wrong to you? Because that change conflicts with the purpose of queue freezing and also because that change would inject I/O errors in code paths that shouldn't inject I/O errors. Please have a look at e.g. generic_make_request(). From the start of that function: if (blk_queue_enter(q, flags) < 0) { if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else bio_io_error(bio); return ret; } The above patch changes the behavior of blk_queue_enter() code from waiting while q->mq_freeze_depth != 0 into returning -ENODEV while the request queue is frozen. 
That will cause generic_make_request() to call bio_io_error(bio) while a request queue is frozen if REQ_NOWAIT has not been set, which is the default behavior. So any operation that freezes the queue temporarily, e.g. changing the queue depth, concurrently with I/O processing can cause I/O to fail with -ENODEV. As you probably know, failure of write requests has very annoying consequences. It e.g. causes filesystems to go into read-only mode. That's why I think that the above change is completely wrong. Bart. ^ permalink raw reply [flat|nested] 32+ messages in thread
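Bart's objection condenses to a small decision table. The sketch below is a hedged userspace model of the error path he quotes, with invented names (toy_failed_enter_action and the TOY_ constants); the real logic lives at the top of generic_make_request(). It shows that with the proposed change, a temporarily frozen queue plus a default bio (no REQ_NOWAIT) lands in the fatal bio_io_error() branch.

```c
enum toy_action { TOY_WOULDBLOCK, TOY_IO_ERROR };

/* What happens to a bio when blk_queue_enter() fails, per the snippet
 * quoted above: only a live queue plus REQ_NOWAIT gets the retryable
 * bio_wouldblock_error(); every other combination is a hard I/O error. */
static enum toy_action toy_failed_enter_action(int dying, int nowait)
{
    if (!dying && nowait)
        return TOY_WOULDBLOCK;   /* bio_wouldblock_error(): caller may retry */
    return TOY_IO_ERROR;         /* bio_io_error(): fatal to the request */
}
```

In other words, the only non-fatal outcome requires REQ_NOWAIT, which ordinary buffered writeback does not set.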
* Re: INFO: task hung in blk_queue_enter 2018-05-16 15:37 ` Bart Van Assche @ 2018-05-21 21:52 ` Tetsuo Handa 2018-05-22 11:20 ` Tetsuo Handa -1 siblings, 1 reply; 32+ messages in thread From: Tetsuo Handa @ 2018-05-21 21:52 UTC (permalink / raw) To: Bart.VanAssche, dvyukov Cc: linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, axboe, dan.j.williams, hch, oleksandr, ming.lei, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 Bart Van Assche wrote: > On Wed, 2018-05-16 at 17:16 +0200, Dmitry Vyukov wrote: > > On Wed, May 16, 2018 at 4:56 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote: > > > On Wed, 2018-05-16 at 22:05 +0900, Tetsuo Handa wrote: > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > > index 85909b4..59e2496 100644 > > > > --- a/block/blk-core.c > > > > +++ b/block/blk-core.c > > > > @@ -951,10 +951,10 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) > > > > smp_rmb(); > > > > > > > > wait_event(q->mq_freeze_wq, > > > > - (atomic_read(&q->mq_freeze_depth) == 0 && > > > > - (preempt || !blk_queue_preempt_only(q))) || > > > > + atomic_read(&q->mq_freeze_depth) || > > > > + (preempt || !blk_queue_preempt_only(q)) || > > > > blk_queue_dying(q)); > > > > - if (blk_queue_dying(q)) > > > > + if (atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)) > > > > return -ENODEV; > > > > } > > > > } > > > > > > That change looks wrong to me. > > > > Hi Bart, > > > > Why does it look wrong to you? > > Because that change conflicts with the purpose of queue freezing and also because > that change would inject I/O errors in code paths that shouldn't inject I/O errors. But waiting there until atomic_read(&q->mq_freeze_depth) becomes 0 is causing a deadlock. A wait_event() that never returns is a bug. I think we should not wait for q->mq_freeze_depth. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-05-21 21:52 ` Tetsuo Handa @ 2018-05-22 11:20 ` Tetsuo Handa 2018-06-01 10:10 ` Tetsuo Handa 0 siblings, 1 reply; 32+ messages in thread From: Tetsuo Handa @ 2018-05-22 11:20 UTC (permalink / raw) To: Bart.VanAssche, dvyukov Cc: linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, axboe, dan.j.williams, hch, oleksandr, ming.lei, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 I checked counter values using debug printk() patch shown below, and found that q->q_usage_counter.count == 1 when this deadlock occurs. Since sum of percpu_count did not change after percpu_ref_kill(), this is not a race condition while folding percpu counter values into atomic counter value. That is, for some reason, someone who is responsible for calling percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is unable to call percpu_ref_put(). diff --git a/block/blk-core.c b/block/blk-core.c index 85909b4..6933020 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -908,6 +908,12 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask) } EXPORT_SYMBOL(blk_alloc_queue); +static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref) +{ + return (unsigned long __percpu *) + (ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC_DEAD); +} + /** * blk_queue_enter() - try to increase q->q_usage_counter * @q: request queue pointer @@ -950,10 +956,22 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) */ smp_rmb(); - wait_event(q->mq_freeze_wq, - (atomic_read(&q->mq_freeze_depth) == 0 && - (preempt || !blk_queue_preempt_only(q))) || - blk_queue_dying(q)); + while (wait_event_timeout(q->mq_freeze_wq, + (atomic_read(&q->mq_freeze_depth) == 0 && + (preempt || !blk_queue_preempt_only(q))) || + blk_queue_dying(q), 3 * HZ) == 0) { + struct percpu_ref *ref = &q->q_usage_counter; + unsigned long __percpu *percpu_count = percpu_count_ptr(ref); 
+ unsigned long count = 0; + int cpu; + + for_each_possible_cpu(cpu) + count += *per_cpu_ptr(percpu_count, cpu); + + printk("%s(%d): %px %ld %ld\n", current->comm, current->pid, + ref, atomic_long_read(&ref->count), count); + } + if (blk_queue_dying(q)) return -ENODEV; } diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index 9f96fa7..72773ce 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -133,8 +133,8 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu) for_each_possible_cpu(cpu) count += *per_cpu_ptr(percpu_count, cpu); - pr_debug("global %ld percpu %ld", - atomic_long_read(&ref->count), (long)count); + printk("%px global %ld percpu %ld\n", ref, + atomic_long_read(&ref->count), (long)count); /* * It's crucial that we sum the percpu counters _before_ adding the sum @@ -150,6 +150,8 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu) */ atomic_long_add((long)count - PERCPU_COUNT_BIAS, &ref->count); + printk("%px global %ld\n", ref, atomic_long_read(&ref->count)); + WARN_ONCE(atomic_long_read(&ref->count) <= 0, "percpu ref (%pf) <= 0 (%ld) after switching to atomic", ref->release, atomic_long_read(&ref->count)); If I change blk_queue_enter() not to wait at wait_event() if q->mq_freeze_depth != 0, this deadlock problem does not occur. Also, I found that if blk_freeze_queue_start() tries to wait for counters to become 1 before calling percpu_ref_kill() (like shown below), this deadlock problem does not occur. 
diff --git a/block/blk-mq.c b/block/blk-mq.c index 9ce9cac..4bff534 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -134,12 +134,36 @@ void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part, blk_mq_queue_tag_busy_iter(q, blk_mq_check_inflight_rw, &mi); } +#define PERCPU_COUNT_BIAS (1LU << (BITS_PER_LONG - 1)) + +static unsigned long __percpu *percpu_count_ptr(struct percpu_ref *ref) +{ + return (unsigned long __percpu *) + (ref->percpu_count_ptr & ~__PERCPU_REF_ATOMIC_DEAD); +} + void blk_freeze_queue_start(struct request_queue *q) { int freeze_depth; freeze_depth = atomic_inc_return(&q->mq_freeze_depth); if (freeze_depth == 1) { + int i; + for (i = 0; i < 10; i++) { + struct percpu_ref *ref = &q->q_usage_counter; + unsigned long __percpu *percpu_count = percpu_count_ptr(ref); + unsigned long count = 0; + int cpu; + + for_each_possible_cpu(cpu) + count += *per_cpu_ptr(percpu_count, cpu); + + if (atomic_long_read(&ref->count) + count - PERCPU_COUNT_BIAS == 1) + break; + printk("%s(%d):! %px %ld %ld\n", current->comm, current->pid, + ref, atomic_long_read(&ref->count), count); + schedule_timeout_uninterruptible(HZ / 10); + } percpu_ref_kill(&q->q_usage_counter); if (q->mq_ops) blk_mq_run_hw_queues(q, false); But I don't know how to find someone who is failing to call percpu_ref_put()... ^ permalink raw reply related [flat|nested] 32+ messages in thread
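The fold that the debug patch above prints can be modeled in a few lines of userspace C. This is a toy sketch, not lib/percpu-refcount.c: NCPU, the bias value, and the toy_ref_* helpers are all invented for illustration. It illustrates why a sum that stays stable after the kill points at a leaked reference rather than a folding race: once no gets or puts are in flight, the folded total is exactly the number of unmatched gets.

```c
#include <stddef.h>

#define NCPU 4
/* Stands in for PERCPU_COUNT_BIAS = 1UL << (BITS_PER_LONG - 1); the bias
 * keeps the shared counter positive while per-CPU deltas run ahead of it. */
#define BIAS (1L << 30)

struct toy_ref {
    long percpu[NCPU];   /* per-CPU delta of gets minus puts */
    long atomic_count;   /* starts at BIAS + 1 (the initial reference) */
};

static void toy_ref_init(struct toy_ref *ref)
{
    for (size_t i = 0; i < NCPU; i++)
        ref->percpu[i] = 0;
    ref->atomic_count = BIAS + 1;
}

static void toy_ref_get(struct toy_ref *ref, int cpu) { ref->percpu[cpu]++; }
static void toy_ref_put(struct toy_ref *ref, int cpu) { ref->percpu[cpu]--; }

/* Mimics the switch-to-atomic fold quoted above: add the per-CPU sum,
 * subtract the bias, then drop the initial reference (as the kill path
 * does).  Returns the number of references still outstanding. */
static long toy_ref_kill(struct toy_ref *ref)
{
    long sum = 0;
    for (size_t i = 0; i < NCPU; i++)
        sum += ref->percpu[i];
    ref->atomic_count += sum - BIAS;
    ref->atomic_count--;        /* the initial reference is put at kill time */
    return ref->atomic_count;
}
```

With this model a balanced workload folds to 0, while one get whose put never happened folds to 1 — the same value the debug output above reports for q->q_usage_counter.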
* Re: INFO: task hung in blk_queue_enter 2018-05-22 11:20 ` Tetsuo Handa @ 2018-06-01 10:10 ` Tetsuo Handa 2018-06-01 17:52 ` Jens Axboe 2018-06-04 11:46 ` Dmitry Vyukov 0 siblings, 2 replies; 32+ messages in thread From: Tetsuo Handa @ 2018-06-01 10:10 UTC (permalink / raw) To: Bart.VanAssche, dvyukov Cc: linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, axboe, dan.j.williams, hch, oleksandr, ming.lei, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 Tetsuo Handa wrote: > Since sum of percpu_count did not change after percpu_ref_kill(), this is > not a race condition while folding percpu counter values into atomic counter > value. That is, for some reason, someone who is responsible for calling > percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is > unable to call percpu_ref_put(). > But I don't know how to find someone who is failing to call percpu_ref_put()... I found the someone. It was already there in the backtrace... ---------------------------------------- [ 62.065852] a.out D 0 4414 4337 0x00000000 [ 62.067677] Call Trace: [ 62.068545] __schedule+0x40b/0x860 [ 62.069726] schedule+0x31/0x80 [ 62.070796] schedule_timeout+0x1c1/0x3c0 [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 [ 62.073670] blk_queue_enter+0x218/0x520 [ 62.074985] ? remove_wait_queue+0x70/0x70 [ 62.076361] generic_make_request+0x3d/0x540 [ 62.077785] ? __bio_clone_fast+0x6b/0x80 [ 62.079147] ? bio_clone_fast+0x2c/0x70 [ 62.080456] blk_queue_split+0x29b/0x560 [ 62.081772] ? blk_queue_split+0x29b/0x560 [ 62.083162] blk_mq_make_request+0x7c/0x430 [ 62.084562] generic_make_request+0x276/0x540 [ 62.086034] submit_bio+0x6e/0x140 [ 62.087185] ? submit_bio+0x6e/0x140 [ 62.088384] ? guard_bio_eod+0x9d/0x1d0 [ 62.089681] do_mpage_readpage+0x328/0x730 [ 62.091045] ? __add_to_page_cache_locked+0x12e/0x1a0 [ 62.092726] mpage_readpages+0x120/0x190 [ 62.094034] ? check_disk_change+0x70/0x70 [ 62.095454] ? 
check_disk_change+0x70/0x70 [ 62.096849] ? alloc_pages_current+0x65/0xd0 [ 62.098277] blkdev_readpages+0x18/0x20 [ 62.099568] __do_page_cache_readahead+0x298/0x360 [ 62.101157] ondemand_readahead+0x1f6/0x490 [ 62.102546] ? ondemand_readahead+0x1f6/0x490 [ 62.103995] page_cache_sync_readahead+0x29/0x40 [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 [ 62.107067] ? futex_wait+0x221/0x240 [ 62.108303] ? trace_hardirqs_on+0xd/0x10 [ 62.109654] blkdev_read_iter+0x30/0x40 [ 62.110954] generic_file_splice_read+0xc5/0x140 [ 62.112538] do_splice_to+0x74/0x90 [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 [ 62.116773] do_splice_direct+0x8a/0xb0 [ 62.118056] do_sendfile+0x1aa/0x390 [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 [ 62.120666] do_syscall_64+0x6e/0x210 [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe ---------------------------------------- The someone is blk_queue_split() from blk_mq_make_request() who depends on an assumption that blk_queue_enter() from recursively called generic_make_request() does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. ---------------------------------------- generic_make_request(struct bio *bio) { if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */ if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else bio_io_error(bio); return ret; } (...snipped...) ret = q->make_request_fn(q, bio); (...snipped...) if (q) blk_queue_exit(q); } ---------------------------------------- where q->make_request_fn == blk_mq_make_request which does ---------------------------------------- blk_mq_make_request(struct request_queue *q, struct bio *bio) { blk_queue_split(q, &bio); } blk_queue_split(struct request_queue *q, struct bio **bio) { generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. 
*/ } ---------------------------------------- and meanwhile atomic_inc_return(&q->mq_freeze_depth) and percpu_ref_kill() are called by blk_freeze_queue_start()... Now, it is up to you how to fix this race problem. ^ permalink raw reply [flat|nested] 32+ messages in thread
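The circular wait described above can be condensed into a toy state machine. This is hypothetical userspace code, not kernel code: the toy_* names are invented, and blocking is modeled as a failure return instead of an actual wait_event(). It captures the cycle: the outer generic_make_request() holds a usage reference, the freeze that starts in between makes the recursive blk_queue_enter() block, and the freeze itself waits for the very reference that the blocked caller will only drop after returning.

```c
#include <stdbool.h>

struct toy_queue {
    int  freeze_depth;   /* models q->mq_freeze_depth */
    long usage;          /* models q->q_usage_counter (live references) */
};

/* blk_queue_enter(): succeeds only while no freeze is in progress;
 * returns false where the real code would sleep in wait_event(). */
static bool toy_enter(struct toy_queue *q)
{
    if (q->freeze_depth > 0)
        return false;            /* would block until freeze_depth == 0 */
    q->usage++;
    return true;
}

static void toy_freeze_start(struct toy_queue *q) { q->freeze_depth++; }

/* blk_mq_freeze_queue_wait() can finish only once all references are gone. */
static bool toy_freeze_complete(struct toy_queue *q) { return q->usage == 0; }

/* Replays the hang: outer entry succeeds, a freeze starts, then the
 * recursive re-entry from blk_queue_split() blocks while the freeze
 * waits for the outer reference. */
static bool toy_replay_hang(struct toy_queue *q)
{
    bool outer = toy_enter(q);   /* outer generic_make_request() */
    toy_freeze_start(q);         /* blk_freeze_queue_start() races in */
    bool inner = toy_enter(q);   /* recursion via blk_queue_split() */
    return outer && !inner && !toy_freeze_complete(q);
}
```

The outer reference is only released by blk_queue_exit() after the recursive call returns, so neither side can make progress.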
* Re: INFO: task hung in blk_queue_enter 2018-06-01 10:10 ` Tetsuo Handa @ 2018-06-01 17:52 ` Jens Axboe 2018-06-01 23:49 ` Ming Lei 2018-06-05 0:27 ` Tetsuo Handa 2018-06-04 11:46 ` Dmitry Vyukov 1 sibling, 2 replies; 32+ messages in thread From: Jens Axboe @ 2018-06-01 17:52 UTC (permalink / raw) To: Tetsuo Handa, Bart.VanAssche, dvyukov Cc: linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, ming.lei, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On 6/1/18 4:10 AM, Tetsuo Handa wrote: > Tetsuo Handa wrote: >> Since sum of percpu_count did not change after percpu_ref_kill(), this is >> not a race condition while folding percpu counter values into atomic counter >> value. That is, for some reason, someone who is responsible for calling >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is >> unable to call percpu_ref_put(). >> But I don't know how to find someone who is failing to call percpu_ref_put()... > > I found the someone. It was already there in the backtrace... > > ---------------------------------------- > [ 62.065852] a.out D 0 4414 4337 0x00000000 > [ 62.067677] Call Trace: > [ 62.068545] __schedule+0x40b/0x860 > [ 62.069726] schedule+0x31/0x80 > [ 62.070796] schedule_timeout+0x1c1/0x3c0 > [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 > [ 62.073670] blk_queue_enter+0x218/0x520 > [ 62.074985] ? remove_wait_queue+0x70/0x70 > [ 62.076361] generic_make_request+0x3d/0x540 > [ 62.077785] ? __bio_clone_fast+0x6b/0x80 > [ 62.079147] ? bio_clone_fast+0x2c/0x70 > [ 62.080456] blk_queue_split+0x29b/0x560 > [ 62.081772] ? blk_queue_split+0x29b/0x560 > [ 62.083162] blk_mq_make_request+0x7c/0x430 > [ 62.084562] generic_make_request+0x276/0x540 > [ 62.086034] submit_bio+0x6e/0x140 > [ 62.087185] ? submit_bio+0x6e/0x140 > [ 62.088384] ? guard_bio_eod+0x9d/0x1d0 > [ 62.089681] do_mpage_readpage+0x328/0x730 > [ 62.091045] ? 
__add_to_page_cache_locked+0x12e/0x1a0 > [ 62.092726] mpage_readpages+0x120/0x190 > [ 62.094034] ? check_disk_change+0x70/0x70 > [ 62.095454] ? check_disk_change+0x70/0x70 > [ 62.096849] ? alloc_pages_current+0x65/0xd0 > [ 62.098277] blkdev_readpages+0x18/0x20 > [ 62.099568] __do_page_cache_readahead+0x298/0x360 > [ 62.101157] ondemand_readahead+0x1f6/0x490 > [ 62.102546] ? ondemand_readahead+0x1f6/0x490 > [ 62.103995] page_cache_sync_readahead+0x29/0x40 > [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 > [ 62.107067] ? futex_wait+0x221/0x240 > [ 62.108303] ? trace_hardirqs_on+0xd/0x10 > [ 62.109654] blkdev_read_iter+0x30/0x40 > [ 62.110954] generic_file_splice_read+0xc5/0x140 > [ 62.112538] do_splice_to+0x74/0x90 > [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 > [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 > [ 62.116773] do_splice_direct+0x8a/0xb0 > [ 62.118056] do_sendfile+0x1aa/0x390 > [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 > [ 62.120666] do_syscall_64+0x6e/0x210 > [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe > ---------------------------------------- > > The someone is blk_queue_split() from blk_mq_make_request() who depends on an > assumption that blk_queue_enter() from recursively called generic_make_request() > does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. > > ---------------------------------------- > generic_make_request(struct bio *bio) { > if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */ > if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) > bio_wouldblock_error(bio); > else > bio_io_error(bio); > return ret; > } > (...snipped...) > ret = q->make_request_fn(q, bio); > (...snipped...) 
> if (q) > blk_queue_exit(q); > } > ---------------------------------------- > > where q->make_request_fn == blk_mq_make_request which does > > ---------------------------------------- > blk_mq_make_request(struct request_queue *q, struct bio *bio) { > blk_queue_split(q, &bio); > } > > blk_queue_split(struct request_queue *q, struct bio **bio) { > generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */ > } > ---------------------------------------- > > and meanwhile atomic_inc_return(&q->mq_freeze_depth) and > percpu_ref_kill() are called by blk_freeze_queue_start()... > > Now, it is up to you about how to fix this race problem. Ahh, nicely spotted! One idea would be the one below. For this case, we're recursing, so we can either do a non-block queue enter, or we can just do a live enter. diff --git a/block/blk-core.c b/block/blk-core.c index 85909b431eb0..7de12e0dcc3d 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) if (bio->bi_opf & REQ_NOWAIT) flags = BLK_MQ_REQ_NOWAIT; - if (blk_queue_enter(q, flags) < 0) { + if (bio->bi_opf & REQ_QUEUE_ENTERED) + blk_queue_enter_live(q); + else if (blk_queue_enter(q, flags) < 0) { if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else diff --git a/block/blk-merge.c b/block/blk-merge.c index 782940c65d8a..e1063e8f7eda 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) /* there isn't chance to merge the splitted bio */ split->bi_opf |= REQ_NOMERGE; + /* + * Since we're recursing into make_request here, ensure + * that we mark this bio as already having entered the queue. + * If not, and the queue is going away, we can get stuck + * forever on waiting for the queue reference to drop. But + * that will never happen, as we're already holding a + * reference to it. 
+ */ + (*bio)->bi_opf |= REQ_QUEUE_ENTERED; + bio_chain(split, *bio); trace_block_split(q, split, (*bio)->bi_iter.bi_sector); generic_make_request(*bio); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 17b18b91ebac..a6bc789b83e7 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -282,6 +282,8 @@ enum req_flag_bits { /* command specific flags for REQ_OP_WRITE_ZEROES: */ __REQ_NOUNMAP, /* do not free blocks when zeroing */ + __REQ_QUEUE_ENTERED, /* queue already entered for this request */ + /* for driver use */ __REQ_DRV, @@ -302,9 +304,8 @@ enum req_flag_bits { #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) - #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) - +#define REQ_QUEUE_ENTERED (1ULL << __REQ_QUEUE_ENTERED) #define REQ_DRV (1ULL << __REQ_DRV) #define REQ_FAILFAST_MASK \ -- Jens Axboe ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-01 17:52 ` Jens Axboe @ 2018-06-01 23:49 ` Ming Lei 2018-06-02 0:49 ` Jens Axboe 2018-06-05 0:27 ` Tetsuo Handa 1 sibling, 1 reply; 32+ messages in thread From: Ming Lei @ 2018-06-01 23:49 UTC (permalink / raw) To: Jens Axboe Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On Fri, Jun 01, 2018 at 11:52:45AM -0600, Jens Axboe wrote: > On 6/1/18 4:10 AM, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is > >> not a race condition while folding percpu counter values into atomic counter > >> value. That is, for some reason, someone who is responsible for calling > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is > >> unable to call percpu_ref_put(). > >> But I don't know how to find someone who is failing to call percpu_ref_put()... > > > > I found the someone. It was already there in the backtrace... > > > > ---------------------------------------- > > [ 62.065852] a.out D 0 4414 4337 0x00000000 > > [ 62.067677] Call Trace: > > [ 62.068545] __schedule+0x40b/0x860 > > [ 62.069726] schedule+0x31/0x80 > > [ 62.070796] schedule_timeout+0x1c1/0x3c0 > > [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 > > [ 62.073670] blk_queue_enter+0x218/0x520 > > [ 62.074985] ? remove_wait_queue+0x70/0x70 > > [ 62.076361] generic_make_request+0x3d/0x540 > > [ 62.077785] ? __bio_clone_fast+0x6b/0x80 > > [ 62.079147] ? bio_clone_fast+0x2c/0x70 > > [ 62.080456] blk_queue_split+0x29b/0x560 > > [ 62.081772] ? blk_queue_split+0x29b/0x560 > > [ 62.083162] blk_mq_make_request+0x7c/0x430 > > [ 62.084562] generic_make_request+0x276/0x540 > > [ 62.086034] submit_bio+0x6e/0x140 > > [ 62.087185] ? submit_bio+0x6e/0x140 > > [ 62.088384] ? 
guard_bio_eod+0x9d/0x1d0 > > [ 62.089681] do_mpage_readpage+0x328/0x730 > > [ 62.091045] ? __add_to_page_cache_locked+0x12e/0x1a0 > > [ 62.092726] mpage_readpages+0x120/0x190 > > [ 62.094034] ? check_disk_change+0x70/0x70 > > [ 62.095454] ? check_disk_change+0x70/0x70 > > [ 62.096849] ? alloc_pages_current+0x65/0xd0 > > [ 62.098277] blkdev_readpages+0x18/0x20 > > [ 62.099568] __do_page_cache_readahead+0x298/0x360 > > [ 62.101157] ondemand_readahead+0x1f6/0x490 > > [ 62.102546] ? ondemand_readahead+0x1f6/0x490 > > [ 62.103995] page_cache_sync_readahead+0x29/0x40 > > [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 > > [ 62.107067] ? futex_wait+0x221/0x240 > > [ 62.108303] ? trace_hardirqs_on+0xd/0x10 > > [ 62.109654] blkdev_read_iter+0x30/0x40 > > [ 62.110954] generic_file_splice_read+0xc5/0x140 > > [ 62.112538] do_splice_to+0x74/0x90 > > [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 > > [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 > > [ 62.116773] do_splice_direct+0x8a/0xb0 > > [ 62.118056] do_sendfile+0x1aa/0x390 > > [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 > > [ 62.120666] do_syscall_64+0x6e/0x210 > > [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > ---------------------------------------- > > > > The someone is blk_queue_split() from blk_mq_make_request() who depends on an > > assumption that blk_queue_enter() from recursively called generic_make_request() > > does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. > > > > ---------------------------------------- > > generic_make_request(struct bio *bio) { > > if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */ > > if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) > > bio_wouldblock_error(bio); > > else > > bio_io_error(bio); > > return ret; > > } > > (...snipped...) > > ret = q->make_request_fn(q, bio); > > (...snipped...) 
> > if (q) > > blk_queue_exit(q); > > } > > ---------------------------------------- > > > > where q->make_request_fn == blk_mq_make_request which does > > > > ---------------------------------------- > > blk_mq_make_request(struct request_queue *q, struct bio *bio) { > > blk_queue_split(q, &bio); > > } > > > > blk_queue_split(struct request_queue *q, struct bio **bio) { > > generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */ > > } > > ---------------------------------------- > > > > and meanwhile atomic_inc_return(&q->mq_freeze_depth) and > > percpu_ref_kill() are called by blk_freeze_queue_start()... > > > > Now, it is up to you about how to fix this race problem. > > Ahh, nicely spotted! One idea would be the one below. For this case, > we're recursing, so we can either do a non-block queue enter, or we > can just do a live enter. > > diff --git a/block/blk-core.c b/block/blk-core.c > index 85909b431eb0..7de12e0dcc3d 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) > > if (bio->bi_opf & REQ_NOWAIT) > flags = BLK_MQ_REQ_NOWAIT; > - if (blk_queue_enter(q, flags) < 0) { > + if (bio->bi_opf & REQ_QUEUE_ENTERED) > + blk_queue_enter_live(q); > + else if (blk_queue_enter(q, flags) < 0) { > if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) > bio_wouldblock_error(bio); > else > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 782940c65d8a..e1063e8f7eda 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) > /* there isn't chance to merge the splitted bio */ > split->bi_opf |= REQ_NOMERGE; > > + /* > + * Since we're recursing into make_request here, ensure > + * that we mark this bio as already having entered the queue. 
> + * If not, and the queue is going away, we can get stuck > + * forever on waiting for the queue reference to drop. But > + * that will never happen, as we're already holding a > + * reference to it. > + */ > + (*bio)->bi_opf |= REQ_QUEUE_ENTERED; > + > bio_chain(split, *bio); > trace_block_split(q, split, (*bio)->bi_iter.bi_sector); > generic_make_request(*bio); > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > index 17b18b91ebac..a6bc789b83e7 100644 > --- a/include/linux/blk_types.h > +++ b/include/linux/blk_types.h > @@ -282,6 +282,8 @@ enum req_flag_bits { > /* command specific flags for REQ_OP_WRITE_ZEROES: */ > __REQ_NOUNMAP, /* do not free blocks when zeroing */ > > + __REQ_QUEUE_ENTERED, /* queue already entered for this request */ > + > /* for driver use */ > __REQ_DRV, > > @@ -302,9 +304,8 @@ enum req_flag_bits { > #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) > #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) > #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) > - > #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) > - > +#define REQ_QUEUE_ENTERED (1ULL << __REQ_QUEUE_ENTERED) > #define REQ_DRV (1ULL << __REQ_DRV) > > #define REQ_FAILFAST_MASK \ bio clone inherits .bi_opf, so this way may not work for cloned bio. Given most of bio_split() is run in .make_request_fn(), holding the queue ref may not be needed in case of bio splitting, so another candidate fix might be to introduce one extra parameter of 'need_queue_ref' to generic_make_request(). Thanks, Ming ^ permalink raw reply [flat|nested] 32+ messages in thread
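The race Tetsuo describes can be reduced to a toy model (illustrative Python, not kernel code): `ToyQueue`, `enter()` and `enter_live()` stand in for `q->q_usage_counter` plus `q->mq_freeze_depth`, `blk_queue_enter()` and `blk_queue_enter_live()`. Where the real `blk_queue_enter()` sleeps on a frozen queue (the reported hang), the model simply returns False.

```python
# Toy model (not kernel code) of the blk_queue_enter deadlock.
class ToyQueue:
    def __init__(self):
        self.usage = 0          # models q->q_usage_counter
        self.freeze_depth = 0   # models q->mq_freeze_depth

    def enter(self):
        """Models blk_queue_enter(): refuses once a freeze has started.
        The real code sleeps here instead, which is the reported hang."""
        if self.freeze_depth > 0:
            return False
        self.usage += 1
        return True

    def enter_live(self):
        """Models blk_queue_enter_live(): unconditionally takes a
        reference; safe only because the caller already holds one."""
        self.usage += 1

q = ToyQueue()
assert q.enter()          # outer generic_make_request() succeeds
q.freeze_depth += 1       # blk_freeze_queue_start() runs concurrently
assert not q.enter()      # recursive enter from blk_queue_split():
                          # in the kernel this blocks forever
q.enter_live()            # proposed fix: the outer reference already
                          # guarantees the counter cannot reach zero
assert q.usage == 2
```

The key invariant is that the recursive caller is nested inside a successful `enter()`, so bumping the counter without checking the freeze state cannot race with teardown.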
* Re: INFO: task hung in blk_queue_enter 2018-06-01 23:49 ` Ming Lei @ 2018-06-02 0:49 ` Jens Axboe 2018-06-02 0:56 ` Jens Axboe 2018-06-02 2:36 ` Ming Lei 0 siblings, 2 replies; 32+ messages in thread From: Jens Axboe @ 2018-06-02 0:49 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On 6/1/18 5:49 PM, Ming Lei wrote: > On Fri, Jun 01, 2018 at 11:52:45AM -0600, Jens Axboe wrote: >> On 6/1/18 4:10 AM, Tetsuo Handa wrote: >>> Tetsuo Handa wrote: >>>> Since sum of percpu_count did not change after percpu_ref_kill(), this is >>>> not a race condition while folding percpu counter values into atomic counter >>>> value. That is, for some reason, someone who is responsible for calling >>>> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is >>>> unable to call percpu_ref_put(). >>>> But I don't know how to find someone who is failing to call percpu_ref_put()... >>> >>> I found the someone. It was already there in the backtrace... >>> >>> ---------------------------------------- >>> [ 62.065852] a.out D 0 4414 4337 0x00000000 >>> [ 62.067677] Call Trace: >>> [ 62.068545] __schedule+0x40b/0x860 >>> [ 62.069726] schedule+0x31/0x80 >>> [ 62.070796] schedule_timeout+0x1c1/0x3c0 >>> [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 >>> [ 62.073670] blk_queue_enter+0x218/0x520 >>> [ 62.074985] ? remove_wait_queue+0x70/0x70 >>> [ 62.076361] generic_make_request+0x3d/0x540 >>> [ 62.077785] ? __bio_clone_fast+0x6b/0x80 >>> [ 62.079147] ? bio_clone_fast+0x2c/0x70 >>> [ 62.080456] blk_queue_split+0x29b/0x560 >>> [ 62.081772] ? blk_queue_split+0x29b/0x560 >>> [ 62.083162] blk_mq_make_request+0x7c/0x430 >>> [ 62.084562] generic_make_request+0x276/0x540 >>> [ 62.086034] submit_bio+0x6e/0x140 >>> [ 62.087185] ? 
submit_bio+0x6e/0x140 >>> [ 62.088384] ? guard_bio_eod+0x9d/0x1d0 >>> [ 62.089681] do_mpage_readpage+0x328/0x730 >>> [ 62.091045] ? __add_to_page_cache_locked+0x12e/0x1a0 >>> [ 62.092726] mpage_readpages+0x120/0x190 >>> [ 62.094034] ? check_disk_change+0x70/0x70 >>> [ 62.095454] ? check_disk_change+0x70/0x70 >>> [ 62.096849] ? alloc_pages_current+0x65/0xd0 >>> [ 62.098277] blkdev_readpages+0x18/0x20 >>> [ 62.099568] __do_page_cache_readahead+0x298/0x360 >>> [ 62.101157] ondemand_readahead+0x1f6/0x490 >>> [ 62.102546] ? ondemand_readahead+0x1f6/0x490 >>> [ 62.103995] page_cache_sync_readahead+0x29/0x40 >>> [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 >>> [ 62.107067] ? futex_wait+0x221/0x240 >>> [ 62.108303] ? trace_hardirqs_on+0xd/0x10 >>> [ 62.109654] blkdev_read_iter+0x30/0x40 >>> [ 62.110954] generic_file_splice_read+0xc5/0x140 >>> [ 62.112538] do_splice_to+0x74/0x90 >>> [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 >>> [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 >>> [ 62.116773] do_splice_direct+0x8a/0xb0 >>> [ 62.118056] do_sendfile+0x1aa/0x390 >>> [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 >>> [ 62.120666] do_syscall_64+0x6e/0x210 >>> [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe >>> ---------------------------------------- >>> >>> The someone is blk_queue_split() from blk_mq_make_request() who depends on an >>> assumption that blk_queue_enter() from recursively called generic_make_request() >>> does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. >>> >>> ---------------------------------------- >>> generic_make_request(struct bio *bio) { >>> if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */ >>> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) >>> bio_wouldblock_error(bio); >>> else >>> bio_io_error(bio); >>> return ret; >>> } >>> (...snipped...) >>> ret = q->make_request_fn(q, bio); >>> (...snipped...) 
>>> if (q) >>> blk_queue_exit(q); >>> } >>> ---------------------------------------- >>> >>> where q->make_request_fn == blk_mq_make_request which does >>> >>> ---------------------------------------- >>> blk_mq_make_request(struct request_queue *q, struct bio *bio) { >>> blk_queue_split(q, &bio); >>> } >>> >>> blk_queue_split(struct request_queue *q, struct bio **bio) { >>> generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */ >>> } >>> ---------------------------------------- >>> >>> and meanwhile atomic_inc_return(&q->mq_freeze_depth) and >>> percpu_ref_kill() are called by blk_freeze_queue_start()... >>> >>> Now, it is up to you about how to fix this race problem. >> >> Ahh, nicely spotted! One idea would be the one below. For this case, >> we're recursing, so we can either do a non-block queue enter, or we >> can just do a live enter. >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 85909b431eb0..7de12e0dcc3d 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) >> >> if (bio->bi_opf & REQ_NOWAIT) >> flags = BLK_MQ_REQ_NOWAIT; >> - if (blk_queue_enter(q, flags) < 0) { >> + if (bio->bi_opf & REQ_QUEUE_ENTERED) >> + blk_queue_enter_live(q); >> + else if (blk_queue_enter(q, flags) < 0) { >> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) >> bio_wouldblock_error(bio); >> else >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> index 782940c65d8a..e1063e8f7eda 100644 >> --- a/block/blk-merge.c >> +++ b/block/blk-merge.c >> @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) >> /* there isn't chance to merge the splitted bio */ >> split->bi_opf |= REQ_NOMERGE; >> >> + /* >> + * Since we're recursing into make_request here, ensure >> + * that we mark this bio as already having entered the queue. 
>> + * If not, and the queue is going away, we can get stuck >> + * forever on waiting for the queue reference to drop. But >> + * that will never happen, as we're already holding a >> + * reference to it. >> + */ >> + (*bio)->bi_opf |= REQ_QUEUE_ENTERED; >> + >> bio_chain(split, *bio); >> trace_block_split(q, split, (*bio)->bi_iter.bi_sector); >> generic_make_request(*bio); >> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >> index 17b18b91ebac..a6bc789b83e7 100644 >> --- a/include/linux/blk_types.h >> +++ b/include/linux/blk_types.h >> @@ -282,6 +282,8 @@ enum req_flag_bits { >> /* command specific flags for REQ_OP_WRITE_ZEROES: */ >> __REQ_NOUNMAP, /* do not free blocks when zeroing */ >> >> + __REQ_QUEUE_ENTERED, /* queue already entered for this request */ >> + >> /* for driver use */ >> __REQ_DRV, >> >> @@ -302,9 +304,8 @@ enum req_flag_bits { >> #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) >> #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) >> #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) >> - >> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) >> - >> +#define REQ_QUEUE_ENTERED (1ULL << __REQ_QUEUE_ENTERED) >> #define REQ_DRV (1ULL << __REQ_DRV) >> >> #define REQ_FAILFAST_MASK \ > > bio clone inherits .bi_opf, so this way may not work for cloned bio. > > Given most of bio_split() is run in .make_request_fn(), holding the > queue ref may not be needed in case of bio splitting, so another > candidate fix might be to introduce one extra parameter of > 'need_queue_ref' to generic_make_request(). I did consider that, but that's pretty silly for a corner case and you'd have to change a crap ton of calls to it. I think it'd be cleaner to clear the bit when we need to, potentially even adding a debug check to blk_queue_enter_live() that complains if the ref was not already elevated. Though that would be expensive, compared to the percpu inc now. 
Not saying the bit is necessarily the best way forward, but I do like it a LOT more than adding an argument to generic_make_request. It should have been a bio flag to begin with, that better captures the scope and also avoids clones inheriting anything they shouldn't. It's also not applicable to a request. diff --git a/block/blk-core.c b/block/blk-core.c index 85909b431eb0..8f1063f50863 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) if (bio->bi_opf & REQ_NOWAIT) flags = BLK_MQ_REQ_NOWAIT; - if (blk_queue_enter(q, flags) < 0) { + if (bio->bi_flags & BIO_QUEUE_ENTERED) + blk_queue_enter_live(q); + else if (blk_queue_enter(q, flags) < 0) { if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else diff --git a/block/blk-merge.c b/block/blk-merge.c index 782940c65d8a..9ee9474e579c 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) /* there isn't chance to merge the splitted bio */ split->bi_opf |= REQ_NOMERGE; + /* + * Since we're recursing into make_request here, ensure + * that we mark this bio as already having entered the queue. + * If not, and the queue is going away, we can get stuck + * forever on waiting for the queue reference to drop. But + * that will never happen, as we're already holding a + * reference to it. + */ + (*bio)->bi_flags |= BIO_QUEUE_ENTERED; + bio_chain(split, *bio); trace_block_split(q, split, (*bio)->bi_iter.bi_sector); generic_make_request(*bio); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 17b18b91ebac..4687d7c99076 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -186,6 +186,8 @@ struct bio { * throttling rules. Don't do it again. */ #define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion * of this bio. 
*/ +#define BIO_QUEUE_ENTERED 11 /* can use blk_queue_enter_live() */ + /* See BVEC_POOL_OFFSET below before adding new flags */ /* -- Jens Axboe ^ permalink raw reply related [flat|nested] 32+ messages in thread
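Why a bio flag rather than a `req_flag_bits` flag matters can be sketched as follows: `.bi_opf` travels with clones (as Ming points out below), while per-bio state in `.bi_flags` does not. A toy model with a deliberately simplified clone (not the real `__bio_clone_fast()`):

```python
# Sketch only: field names follow the kernel structs, but the clone
# logic is simplified. The point is that bi_opf is inherited by clones
# while per-bio state flags in bi_flags are not.
REQ_QUEUE_ENTERED = 1 << 0   # op-flag approach from the first patch
BIO_QUEUE_ENTERED = 1 << 0   # bio state-flag approach from the second

class Bio:
    def __init__(self, opf=0, flags=0):
        self.bi_opf = opf      # operation + flags, copied on clone
        self.bi_flags = flags  # per-bio state, NOT copied on clone

def clone_bio(src):
    # Like a fast clone: inherits bi_opf, starts with fresh state.
    return Bio(opf=src.bi_opf, flags=0)

orig = Bio()
orig.bi_opf |= REQ_QUEUE_ENTERED      # first-patch marking
orig.bi_flags |= BIO_QUEUE_ENTERED    # second-patch marking

c = clone_bio(orig)
assert c.bi_opf & REQ_QUEUE_ENTERED            # leaks into the clone
assert not (c.bi_flags & BIO_QUEUE_ENTERED)    # state stays behind
```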
* Re: INFO: task hung in blk_queue_enter 2018-06-02 0:49 ` Jens Axboe @ 2018-06-02 0:56 ` Jens Axboe 2018-06-02 2:36 ` Ming Lei 1 sibling, 0 replies; 32+ messages in thread From: Jens Axboe @ 2018-06-02 0:56 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On 6/1/18 6:49 PM, Jens Axboe wrote: > On 6/1/18 5:49 PM, Ming Lei wrote: >> On Fri, Jun 01, 2018 at 11:52:45AM -0600, Jens Axboe wrote: >>> On 6/1/18 4:10 AM, Tetsuo Handa wrote: >>>> Tetsuo Handa wrote: >>>>> Since sum of percpu_count did not change after percpu_ref_kill(), this is >>>>> not a race condition while folding percpu counter values into atomic counter >>>>> value. That is, for some reason, someone who is responsible for calling >>>>> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is >>>>> unable to call percpu_ref_put(). >>>>> But I don't know how to find someone who is failing to call percpu_ref_put()... >>>> >>>> I found the someone. It was already there in the backtrace... >>>> >>>> ---------------------------------------- >>>> [ 62.065852] a.out D 0 4414 4337 0x00000000 >>>> [ 62.067677] Call Trace: >>>> [ 62.068545] __schedule+0x40b/0x860 >>>> [ 62.069726] schedule+0x31/0x80 >>>> [ 62.070796] schedule_timeout+0x1c1/0x3c0 >>>> [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 >>>> [ 62.073670] blk_queue_enter+0x218/0x520 >>>> [ 62.074985] ? remove_wait_queue+0x70/0x70 >>>> [ 62.076361] generic_make_request+0x3d/0x540 >>>> [ 62.077785] ? __bio_clone_fast+0x6b/0x80 >>>> [ 62.079147] ? bio_clone_fast+0x2c/0x70 >>>> [ 62.080456] blk_queue_split+0x29b/0x560 >>>> [ 62.081772] ? 
blk_queue_split+0x29b/0x560 >>>> [ 62.083162] blk_mq_make_request+0x7c/0x430 >>>> [ 62.084562] generic_make_request+0x276/0x540 >>>> [ 62.086034] submit_bio+0x6e/0x140 >>>> [ 62.087185] ? submit_bio+0x6e/0x140 >>>> [ 62.088384] ? guard_bio_eod+0x9d/0x1d0 >>>> [ 62.089681] do_mpage_readpage+0x328/0x730 >>>> [ 62.091045] ? __add_to_page_cache_locked+0x12e/0x1a0 >>>> [ 62.092726] mpage_readpages+0x120/0x190 >>>> [ 62.094034] ? check_disk_change+0x70/0x70 >>>> [ 62.095454] ? check_disk_change+0x70/0x70 >>>> [ 62.096849] ? alloc_pages_current+0x65/0xd0 >>>> [ 62.098277] blkdev_readpages+0x18/0x20 >>>> [ 62.099568] __do_page_cache_readahead+0x298/0x360 >>>> [ 62.101157] ondemand_readahead+0x1f6/0x490 >>>> [ 62.102546] ? ondemand_readahead+0x1f6/0x490 >>>> [ 62.103995] page_cache_sync_readahead+0x29/0x40 >>>> [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 >>>> [ 62.107067] ? futex_wait+0x221/0x240 >>>> [ 62.108303] ? trace_hardirqs_on+0xd/0x10 >>>> [ 62.109654] blkdev_read_iter+0x30/0x40 >>>> [ 62.110954] generic_file_splice_read+0xc5/0x140 >>>> [ 62.112538] do_splice_to+0x74/0x90 >>>> [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 >>>> [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 >>>> [ 62.116773] do_splice_direct+0x8a/0xb0 >>>> [ 62.118056] do_sendfile+0x1aa/0x390 >>>> [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 >>>> [ 62.120666] do_syscall_64+0x6e/0x210 >>>> [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe >>>> ---------------------------------------- >>>> >>>> The someone is blk_queue_split() from blk_mq_make_request() who depends on an >>>> assumption that blk_queue_enter() from recursively called generic_make_request() >>>> does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. >>>> >>>> ---------------------------------------- >>>> generic_make_request(struct bio *bio) { >>>> if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. 
*/ >>>> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) >>>> bio_wouldblock_error(bio); >>>> else >>>> bio_io_error(bio); >>>> return ret; >>>> } >>>> (...snipped...) >>>> ret = q->make_request_fn(q, bio); >>>> (...snipped...) >>>> if (q) >>>> blk_queue_exit(q); >>>> } >>>> ---------------------------------------- >>>> >>>> where q->make_request_fn == blk_mq_make_request which does >>>> >>>> ---------------------------------------- >>>> blk_mq_make_request(struct request_queue *q, struct bio *bio) { >>>> blk_queue_split(q, &bio); >>>> } >>>> >>>> blk_queue_split(struct request_queue *q, struct bio **bio) { >>>> generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */ >>>> } >>>> ---------------------------------------- >>>> >>>> and meanwhile atomic_inc_return(&q->mq_freeze_depth) and >>>> percpu_ref_kill() are called by blk_freeze_queue_start()... >>>> >>>> Now, it is up to you about how to fix this race problem. >>> >>> Ahh, nicely spotted! One idea would be the one below. For this case, >>> we're recursing, so we can either do a non-block queue enter, or we >>> can just do a live enter. 
>>> >>> diff --git a/block/blk-core.c b/block/blk-core.c >>> index 85909b431eb0..7de12e0dcc3d 100644 >>> --- a/block/blk-core.c >>> +++ b/block/blk-core.c >>> @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) >>> >>> if (bio->bi_opf & REQ_NOWAIT) >>> flags = BLK_MQ_REQ_NOWAIT; >>> - if (blk_queue_enter(q, flags) < 0) { >>> + if (bio->bi_opf & REQ_QUEUE_ENTERED) >>> + blk_queue_enter_live(q); >>> + else if (blk_queue_enter(q, flags) < 0) { >>> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) >>> bio_wouldblock_error(bio); >>> else >>> diff --git a/block/blk-merge.c b/block/blk-merge.c >>> index 782940c65d8a..e1063e8f7eda 100644 >>> --- a/block/blk-merge.c >>> +++ b/block/blk-merge.c >>> @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) >>> /* there isn't chance to merge the splitted bio */ >>> split->bi_opf |= REQ_NOMERGE; >>> >>> + /* >>> + * Since we're recursing into make_request here, ensure >>> + * that we mark this bio as already having entered the queue. >>> + * If not, and the queue is going away, we can get stuck >>> + * forever on waiting for the queue reference to drop. But >>> + * that will never happen, as we're already holding a >>> + * reference to it. 
>>> + */ >>> + (*bio)->bi_opf |= REQ_QUEUE_ENTERED; >>> + >>> bio_chain(split, *bio); >>> trace_block_split(q, split, (*bio)->bi_iter.bi_sector); >>> generic_make_request(*bio); >>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h >>> index 17b18b91ebac..a6bc789b83e7 100644 >>> --- a/include/linux/blk_types.h >>> +++ b/include/linux/blk_types.h >>> @@ -282,6 +282,8 @@ enum req_flag_bits { >>> /* command specific flags for REQ_OP_WRITE_ZEROES: */ >>> __REQ_NOUNMAP, /* do not free blocks when zeroing */ >>> >>> + __REQ_QUEUE_ENTERED, /* queue already entered for this request */ >>> + >>> /* for driver use */ >>> __REQ_DRV, >>> >>> @@ -302,9 +304,8 @@ enum req_flag_bits { >>> #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) >>> #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) >>> #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) >>> - >>> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) >>> - >>> +#define REQ_QUEUE_ENTERED (1ULL << __REQ_QUEUE_ENTERED) >>> #define REQ_DRV (1ULL << __REQ_DRV) >>> >>> #define REQ_FAILFAST_MASK \ >> >> bio clone inherits .bi_opf, so this way may not work for cloned bio. >> >> Given most of bio_split() is run in .make_request_fn(), holding the >> queue ref may not be needed in case of bio splitting, so another >> candidate fix might be to introduce one extra parameter of >> 'need_queue_ref' to generic_make_request(). > > I did consider that, but that's pretty silly for a corner case and you'd > have to change a crap ton of calls to it. I think it'd be cleaner to > clear the bit when we need to, potentially even adding a debug check to > blk_queue_enter_live() that complains if the ref was not already > elevated. Though that would be expensive, compared to the percpu inc > now. Not saying the bit is necessarily the best way forward, but I do > like it a LOT more than adding an argument to generic_make_request. 
> > It should have been a bio flag to begin with, that better captures the > scope and also avoids clones inheriting anything they shouldn't. It's > also not applicable to a request. A little too much inheritance from the previous patch, better version below. diff --git a/block/blk-core.c b/block/blk-core.c index 85909b431eb0..71cb228d7374 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) if (bio->bi_opf & REQ_NOWAIT) flags = BLK_MQ_REQ_NOWAIT; - if (blk_queue_enter(q, flags) < 0) { + if (bio_flagged(bio, BIO_QUEUE_ENTERED)) + blk_queue_enter_live(q); + else if (blk_queue_enter(q, flags) < 0) { if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) bio_wouldblock_error(bio); else diff --git a/block/blk-merge.c b/block/blk-merge.c index 782940c65d8a..481dc02668f9 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) /* there isn't chance to merge the splitted bio */ split->bi_opf |= REQ_NOMERGE; + /* + * Since we're recursing into make_request here, ensure + * that we mark this bio as already having entered the queue. + * If not, and the queue is going away, we can get stuck + * forever on waiting for the queue reference to drop. But + * that will never happen, as we're already holding a + * reference to it. + */ + bio_set_flag(*bio, BIO_QUEUE_ENTERED); + bio_chain(split, *bio); trace_block_split(q, split, (*bio)->bi_iter.bi_sector); generic_make_request(*bio); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 17b18b91ebac..1602bf4ab4cd 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -186,6 +186,8 @@ struct bio { * throttling rules. Don't do it again. */ #define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion * of this bio. 
*/ +#define BIO_QUEUE_ENTERED 11 /* can use blk_queue_enter_live() */ + /* See BVEC_POOL_OFFSET below before adding new flags */ /* -- Jens Axboe ^ permalink raw reply related [flat|nested] 32+ messages in thread
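One thing the "better version" fixes (presumably part of the "too much inheritance"): `BIO_*` values are bit numbers, not masks, so the first bio-flag patch's raw test `bio->bi_flags & BIO_QUEUE_ENTERED` would have tested bits 0, 1 and 3 (11 is 0b1011) rather than bit 11. `bio_flagged()`/`bio_set_flag()` shift the bit into place. A minimal model of those helpers (atomicity of the real kernel helpers is ignored):

```python
# Minimal, non-atomic model of the bio flag helpers used in the
# corrected patch. BIO_QUEUE_ENTERED is a bit NUMBER, not a mask.
BIO_QUEUE_ENTERED = 11

def bio_set_flag(bio, bit):
    bio["bi_flags"] |= (1 << bit)

def bio_flagged(bio, bit):
    return bool(bio["bi_flags"] & (1 << bit))

bio = {"bi_flags": 0}
assert not bio_flagged(bio, BIO_QUEUE_ENTERED)
bio_set_flag(bio, BIO_QUEUE_ENTERED)
assert bio_flagged(bio, BIO_QUEUE_ENTERED)
# The raw test from the earlier patch misses bit 11 entirely:
assert not (bio["bi_flags"] & BIO_QUEUE_ENTERED)
```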
* Re: INFO: task hung in blk_queue_enter 2018-06-02 0:49 ` Jens Axboe 2018-06-02 0:56 ` Jens Axboe @ 2018-06-02 2:36 ` Ming Lei 2018-06-02 4:31 ` Jens Axboe 1 sibling, 1 reply; 32+ messages in thread From: Ming Lei @ 2018-06-02 2:36 UTC (permalink / raw) To: Jens Axboe Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On Fri, Jun 01, 2018 at 06:49:10PM -0600, Jens Axboe wrote: > On 6/1/18 5:49 PM, Ming Lei wrote: > > On Fri, Jun 01, 2018 at 11:52:45AM -0600, Jens Axboe wrote: > >> On 6/1/18 4:10 AM, Tetsuo Handa wrote: > >>> Tetsuo Handa wrote: > >>>> Since sum of percpu_count did not change after percpu_ref_kill(), this is > >>>> not a race condition while folding percpu counter values into atomic counter > >>>> value. That is, for some reason, someone who is responsible for calling > >>>> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is > >>>> unable to call percpu_ref_put(). > >>>> But I don't know how to find someone who is failing to call percpu_ref_put()... > >>> > >>> I found the someone. It was already there in the backtrace... > >>> > >>> ---------------------------------------- > >>> [ 62.065852] a.out D 0 4414 4337 0x00000000 > >>> [ 62.067677] Call Trace: > >>> [ 62.068545] __schedule+0x40b/0x860 > >>> [ 62.069726] schedule+0x31/0x80 > >>> [ 62.070796] schedule_timeout+0x1c1/0x3c0 > >>> [ 62.072159] ? __next_timer_interrupt+0xd0/0xd0 > >>> [ 62.073670] blk_queue_enter+0x218/0x520 > >>> [ 62.074985] ? remove_wait_queue+0x70/0x70 > >>> [ 62.076361] generic_make_request+0x3d/0x540 > >>> [ 62.077785] ? __bio_clone_fast+0x6b/0x80 > >>> [ 62.079147] ? bio_clone_fast+0x2c/0x70 > >>> [ 62.080456] blk_queue_split+0x29b/0x560 > >>> [ 62.081772] ? 
blk_queue_split+0x29b/0x560 > >>> [ 62.083162] blk_mq_make_request+0x7c/0x430 > >>> [ 62.084562] generic_make_request+0x276/0x540 > >>> [ 62.086034] submit_bio+0x6e/0x140 > >>> [ 62.087185] ? submit_bio+0x6e/0x140 > >>> [ 62.088384] ? guard_bio_eod+0x9d/0x1d0 > >>> [ 62.089681] do_mpage_readpage+0x328/0x730 > >>> [ 62.091045] ? __add_to_page_cache_locked+0x12e/0x1a0 > >>> [ 62.092726] mpage_readpages+0x120/0x190 > >>> [ 62.094034] ? check_disk_change+0x70/0x70 > >>> [ 62.095454] ? check_disk_change+0x70/0x70 > >>> [ 62.096849] ? alloc_pages_current+0x65/0xd0 > >>> [ 62.098277] blkdev_readpages+0x18/0x20 > >>> [ 62.099568] __do_page_cache_readahead+0x298/0x360 > >>> [ 62.101157] ondemand_readahead+0x1f6/0x490 > >>> [ 62.102546] ? ondemand_readahead+0x1f6/0x490 > >>> [ 62.103995] page_cache_sync_readahead+0x29/0x40 > >>> [ 62.105539] generic_file_read_iter+0x7d0/0x9d0 > >>> [ 62.107067] ? futex_wait+0x221/0x240 > >>> [ 62.108303] ? trace_hardirqs_on+0xd/0x10 > >>> [ 62.109654] blkdev_read_iter+0x30/0x40 > >>> [ 62.110954] generic_file_splice_read+0xc5/0x140 > >>> [ 62.112538] do_splice_to+0x74/0x90 > >>> [ 62.113726] splice_direct_to_actor+0xa4/0x1f0 > >>> [ 62.115209] ? generic_pipe_buf_nosteal+0x10/0x10 > >>> [ 62.116773] do_splice_direct+0x8a/0xb0 > >>> [ 62.118056] do_sendfile+0x1aa/0x390 > >>> [ 62.119255] __x64_sys_sendfile64+0x4e/0xc0 > >>> [ 62.120666] do_syscall_64+0x6e/0x210 > >>> [ 62.121909] entry_SYSCALL_64_after_hwframe+0x49/0xbe > >>> ---------------------------------------- > >>> > >>> The someone is blk_queue_split() from blk_mq_make_request() who depends on an > >>> assumption that blk_queue_enter() from recursively called generic_make_request() > >>> does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure. > >>> > >>> ---------------------------------------- > >>> generic_make_request(struct bio *bio) { > >>> if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. 
*/ > >>> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) > >>> bio_wouldblock_error(bio); > >>> else > >>> bio_io_error(bio); > >>> return ret; > >>> } > >>> (...snipped...) > >>> ret = q->make_request_fn(q, bio); > >>> (...snipped...) > >>> if (q) > >>> blk_queue_exit(q); > >>> } > >>> ---------------------------------------- > >>> > >>> where q->make_request_fn == blk_mq_make_request which does > >>> > >>> ---------------------------------------- > >>> blk_mq_make_request(struct request_queue *q, struct bio *bio) { > >>> blk_queue_split(q, &bio); > >>> } > >>> > >>> blk_queue_split(struct request_queue *q, struct bio **bio) { > >>> generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits until atomic_read(&q->mq_freeze_depth) becomes 0. */ > >>> } > >>> ---------------------------------------- > >>> > >>> and meanwhile atomic_inc_return(&q->mq_freeze_depth) and > >>> percpu_ref_kill() are called by blk_freeze_queue_start()... > >>> > >>> Now, it is up to you about how to fix this race problem. > >> > >> Ahh, nicely spotted! One idea would be the one below. For this case, > >> we're recursing, so we can either do a non-block queue enter, or we > >> can just do a live enter. 
> >> > >> diff --git a/block/blk-core.c b/block/blk-core.c > >> index 85909b431eb0..7de12e0dcc3d 100644 > >> --- a/block/blk-core.c > >> +++ b/block/blk-core.c > >> @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio) > >> > >> if (bio->bi_opf & REQ_NOWAIT) > >> flags = BLK_MQ_REQ_NOWAIT; > >> - if (blk_queue_enter(q, flags) < 0) { > >> + if (bio->bi_opf & REQ_QUEUE_ENTERED) > >> + blk_queue_enter_live(q); > >> + else if (blk_queue_enter(q, flags) < 0) { > >> if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) > >> bio_wouldblock_error(bio); > >> else > >> diff --git a/block/blk-merge.c b/block/blk-merge.c > >> index 782940c65d8a..e1063e8f7eda 100644 > >> --- a/block/blk-merge.c > >> +++ b/block/blk-merge.c > >> @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) > >> /* there isn't chance to merge the splitted bio */ > >> split->bi_opf |= REQ_NOMERGE; > >> > >> + /* > >> + * Since we're recursing into make_request here, ensure > >> + * that we mark this bio as already having entered the queue. > >> + * If not, and the queue is going away, we can get stuck > >> + * forever on waiting for the queue reference to drop. But > >> + * that will never happen, as we're already holding a > >> + * reference to it. 
> >> + */ > >> + (*bio)->bi_opf |= REQ_QUEUE_ENTERED; > >> + > >> bio_chain(split, *bio); > >> trace_block_split(q, split, (*bio)->bi_iter.bi_sector); > >> generic_make_request(*bio); > >> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h > >> index 17b18b91ebac..a6bc789b83e7 100644 > >> --- a/include/linux/blk_types.h > >> +++ b/include/linux/blk_types.h > >> @@ -282,6 +282,8 @@ enum req_flag_bits { > >> /* command specific flags for REQ_OP_WRITE_ZEROES: */ > >> __REQ_NOUNMAP, /* do not free blocks when zeroing */ > >> > >> + __REQ_QUEUE_ENTERED, /* queue already entered for this request */ > >> + > >> /* for driver use */ > >> __REQ_DRV, > >> > >> @@ -302,9 +304,8 @@ enum req_flag_bits { > >> #define REQ_RAHEAD (1ULL << __REQ_RAHEAD) > >> #define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) > >> #define REQ_NOWAIT (1ULL << __REQ_NOWAIT) > >> - > >> #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) > >> - > >> +#define REQ_QUEUE_ENTERED (1ULL << __REQ_QUEUE_ENTERED) > >> #define REQ_DRV (1ULL << __REQ_DRV) > >> > >> #define REQ_FAILFAST_MASK \ > > > > bio clone inherits .bi_opf, so this way may not work for cloned bio. > > > > Given most of bio_split() is run in .make_request_fn(), holding the > > queue ref may not be needed in case of bio splitting, so another > > candidate fix might be to introduce one extra parameter of > > 'need_queue_ref' to generic_make_request(). > > I did consider that, but that's pretty silly for a corner case and you'd > have to change a crap ton of calls to it. I think it'd be cleaner to Seems not necessary to change any call to it, since all blk_queue_split() are called from q->make_request_fn() now. > clear the bit when we need to, potentially even adding a debug check to > blk_queue_enter_live() that complains if the ref was not already > elevated. Though that would be expensive, compared to the percpu inc > now. 
Not saying the bit is necessarily the best way forward, but I do > like it a LOT more than adding an argument to generic_make_request. We may introduce generic_make_request_with_ref() for blk_queue_split(), what do you think of the following patch? diff --git a/block/blk-core.c b/block/blk-core.c index cee03cad99f2..e10ee3f9f338 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -2363,7 +2363,7 @@ generic_make_request_checks(struct bio *bio) * a lower device by calling into generic_make_request recursively, which * means the bio should NOT be touched after the call to ->make_request_fn. */ -blk_qc_t generic_make_request(struct bio *bio) +blk_qc_t __generic_make_request(struct bio *bio, bool got_ref) { /* * bio_list_on_stack[0] contains bios submitted by the current @@ -2379,12 +2379,14 @@ blk_qc_t generic_make_request(struct bio *bio) if (bio->bi_opf & REQ_NOWAIT) flags = BLK_MQ_REQ_NOWAIT; - if (blk_queue_enter(q, flags) < 0) { - if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) - bio_wouldblock_error(bio); - else - bio_io_error(bio); - return ret; + if (!got_ref) { + if (blk_queue_enter(q, flags) < 0) { + if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT)) + bio_wouldblock_error(bio); + else + bio_io_error(bio); + return ret; + } } if (!generic_make_request_checks(bio)) @@ -2426,8 +2428,9 @@ blk_qc_t generic_make_request(struct bio *bio) bool enter_succeeded = true; if (unlikely(q != bio->bi_disk->queue)) { - if (q) + if (q && !got_ref) blk_queue_exit(q); + got_ref = false; q = bio->bi_disk->queue; flags = 0; if (bio->bi_opf & REQ_NOWAIT) @@ -2476,7 +2479,7 @@ blk_qc_t generic_make_request(struct bio *bio) blk_queue_exit(q); return ret; } -EXPORT_SYMBOL(generic_make_request); +EXPORT_SYMBOL(__generic_make_request); /** * direct_make_request - hand a buffer directly to its device driver for I/O diff --git a/block/blk-merge.c b/block/blk-merge.c index 5573d0fbec53..997616f1c85e 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -180,6 +180,9 @@ 
static struct bio *blk_bio_segment_split(struct request_queue *q, return do_split ? new : NULL; } +/* + * This function has to be called from q->make_request_fn(). + */ void blk_queue_split(struct request_queue *q, struct bio **bio) { struct bio *split, *res; @@ -212,7 +215,7 @@ void blk_queue_split(struct request_queue *q, struct bio **bio) bio_chain(split, *bio); trace_block_split(q, split, (*bio)->bi_iter.bi_sector); - generic_make_request(*bio); + generic_make_request_with_ref(*bio); *bio = split; } } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 4efd9af62e25..051018e0126e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -956,9 +956,18 @@ static inline void rq_flush_dcache_pages(struct request *rq) } #endif +blk_qc_t generic_make_request(struct bio *bio) +{ + return __generic_make_request(bio, false); +} + +blk_qc_t generic_make_request_with_ref(struct bio *bio) +{ + return __generic_make_request(bio, true); +} + extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); -extern blk_qc_t generic_make_request(struct bio *bio); extern blk_qc_t direct_make_request(struct bio *bio); extern void blk_rq_init(struct request_queue *q, struct request *rq); extern void blk_init_request_from_bio(struct request *req, struct bio *bio); > > It should have been a bio flag to begin with, that better captures the > scope and also avoids clones inheriting anything they shouldn't. It's > also not applicable to a request. 
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 85909b431eb0..8f1063f50863 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2392,7 +2392,9 @@ blk_qc_t generic_make_request(struct bio *bio)
>
>  	if (bio->bi_opf & REQ_NOWAIT)
>  		flags = BLK_MQ_REQ_NOWAIT;
> -	if (blk_queue_enter(q, flags) < 0) {
> +	if (bio->bi_flags & BIO_QUEUE_ENTERED)
> +		blk_queue_enter_live(q);
> +	else if (blk_queue_enter(q, flags) < 0) {
>  		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
>  			bio_wouldblock_error(bio);
>  		else
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 782940c65d8a..9ee9474e579c 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -210,6 +210,16 @@ void blk_queue_split(struct request_queue *q, struct bio **bio)
>  		/* there isn't chance to merge the splitted bio */
>  		split->bi_opf |= REQ_NOMERGE;
>
> +		/*
> +		 * Since we're recursing into make_request here, ensure
> +		 * that we mark this bio as already having entered the queue.
> +		 * If not, and the queue is going away, we can get stuck
> +		 * forever on waiting for the queue reference to drop. But
> +		 * that will never happen, as we're already holding a
> +		 * reference to it.
> +		 */
> +		(*bio)->bi_flags |= BIO_QUEUE_ENTERED;
> +
>  		bio_chain(split, *bio);
>  		trace_block_split(q, split, (*bio)->bi_iter.bi_sector);
>  		generic_make_request(*bio);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 17b18b91ebac..4687d7c99076 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -186,6 +186,8 @@ struct bio {
>  				 * throttling rules. Don't do it again. */
>  #define BIO_TRACE_COMPLETION 10	/* bio_endio() should trace the final completion
>  				 * of this bio. */
> +#define BIO_QUEUE_ENTERED 11	/* can use blk_queue_enter_live() */
> +
>  /* See BVEC_POOL_OFFSET below before adding new flags */

In DM or MD, there is the following use case:

1) bio comes from fs, then one clone is made from this bio

2) the cloned bio is submitted via generic_make_request() from
another context, such as wq.

3) then the flag of BIO_QUEUE_ENTERED is inherited and causes
the queue usage counter isn't held.

Thanks,
Ming

^ permalink raw reply related	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-02 2:36 ` Ming Lei @ 2018-06-02 4:31 ` Jens Axboe 2018-06-02 4:54 ` Ming Lei 0 siblings, 1 reply; 32+ messages in thread From: Jens Axboe @ 2018-06-02 4:31 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On 6/1/18 8:36 PM, Ming Lei wrote: >> clear the bit when we need to, potentially even adding a debug check to >> blk_queue_enter_live() that complains if the ref was not already >> elevated. Though that would be expensive, compared to the percpu inc >> now. Not saying the bit is necessarily the best way forward, but I do >> like it a LOT more than adding an argument to generic_make_request. > > We may introduce generic_make_request_with_ref() for blk_queue_split(), > what do you think of the following patch? Functionally it'll work as well. > In DM or MD, there is the following use case: > > 1) bio comes from fs, then one clone is made from this bio > > 2) the cloned bio is submitted via generic_make_request() from > another context, such as wq. > > 3) then the flag of BIO_QUEUE_ENTERED is inherited and causes > the queue usage counter isn't held. We don't copy flags on a clone. -- Jens Axboe ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-02 4:31 ` Jens Axboe @ 2018-06-02 4:54 ` Ming Lei 2018-06-02 8:07 ` Martin Steigerwald 2018-06-02 13:48 ` Jens Axboe 0 siblings, 2 replies; 32+ messages in thread From: Ming Lei @ 2018-06-02 4:54 UTC (permalink / raw) To: Jens Axboe Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On Fri, Jun 01, 2018 at 10:31:13PM -0600, Jens Axboe wrote: > On 6/1/18 8:36 PM, Ming Lei wrote: > >> clear the bit when we need to, potentially even adding a debug check to > >> blk_queue_enter_live() that complains if the ref was not already > >> elevated. Though that would be expensive, compared to the percpu inc > >> now. Not saying the bit is necessarily the best way forward, but I do > >> like it a LOT more than adding an argument to generic_make_request. > > > > We may introduce generic_make_request_with_ref() for blk_queue_split(), > > what do you think of the following patch? > > Functionally it'll work as well. If you are fine with this one, I'm glad to prepare for a normal one, so that we can save one precious bio flag, and the total number is just 13. > > > In DM or MD, there is the following use case: > > > > 1) bio comes from fs, then one clone is made from this bio > > > > 2) the cloned bio is submitted via generic_make_request() from > > another context, such as wq. > > > > 3) then the flag of BIO_QUEUE_ENTERED is inherited and causes > > the queue usage counter isn't held. > > We don't copy flags on a clone. Yeah, you are right, seems I ignored that. Thanks, Ming ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-02 4:54 ` Ming Lei @ 2018-06-02 8:07 ` Martin Steigerwald 0 siblings, 0 replies; 32+ messages in thread From: Martin Steigerwald @ 2018-06-02 8:07 UTC (permalink / raw) To: Ming Lei Cc: Jens Axboe, Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 Ming Lei - 02.06.18, 06:54: > On Fri, Jun 01, 2018 at 10:31:13PM -0600, Jens Axboe wrote: > > On 6/1/18 8:36 PM, Ming Lei wrote: > > >> clear the bit when we need to, potentially even adding a debug > > >> check to blk_queue_enter_live() that complains if the ref was > > >> not already elevated. Though that would be expensive, compared > > >> to the percpu inc now. Not saying the bit is necessarily the > > >> best way forward, but I do like it a LOT more than adding an > > >> argument to generic_make_request. > > > > > We may introduce generic_make_request_with_ref() for > > > blk_queue_split(), what do you think of the following patch? > > > > Functionally it'll work as well. > > If you are fine with this one, I'm glad to prepare for a normal one, > so that we can save one precious bio flag, and the total number is > just 13. […] I am completely lost about what you are talking about. That is fine with me, but in case there is anything to test… let me know. I do not promise quick responses though, as I have been quite a bit overwhelmed with stuff lately. Currently I run 4.17-rc7 without any blk-mq related patches and it appears to be free of hangs or stalls related to blk-mq. Thanks, -- Martin ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-02 4:54 ` Ming Lei 2018-06-02 8:07 ` Martin Steigerwald @ 2018-06-02 13:48 ` Jens Axboe 2018-06-02 13:48 ` Jens Axboe 1 sibling, 0 replies; 32+ messages in thread From: Jens Axboe @ 2018-06-02 13:48 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 > On Jun 1, 2018, at 10:54 PM, Ming Lei <ming.lei@redhat.com> wrote: > >> On Fri, Jun 01, 2018 at 10:31:13PM -0600, Jens Axboe wrote: >> On 6/1/18 8:36 PM, Ming Lei wrote: >>>> clear the bit when we need to, potentially even adding a debug check to >>>> blk_queue_enter_live() that complains if the ref was not already >>>> elevated. Though that would be expensive, compared to the percpu inc >>>> now. Not saying the bit is necessarily the best way forward, but I do >>>> like it a LOT more than adding an argument to generic_make_request. >>> >>> We may introduce generic_make_request_with_ref() for blk_queue_split(), >>> what do you think of the following patch? >> >> Functionally it'll work as well. > > If you are fine with this one, I'm glad to prepare for a normal one, > so that we can save one precious bio flag, and the total number is > just 13. We've never had much issue with bio specific flags, I prefer that approach to a wrapper around generic_make_request(). ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-01 17:52 ` Jens Axboe 2018-06-01 23:49 ` Ming Lei @ 2018-06-05 0:27 ` Tetsuo Handa 2018-06-05 0:41 ` Ming Lei 1 sibling, 1 reply; 32+ messages in thread From: Tetsuo Handa @ 2018-06-05 0:27 UTC (permalink / raw) To: Jens Axboe Cc: Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, ming.lei, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 Jens Axboe wrote: > On 6/1/18 4:10 AM, Tetsuo Handa wrote: > > Tetsuo Handa wrote: > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is > >> not a race condition while folding percpu counter values into atomic counter > >> value. That is, for some reason, someone who is responsible for calling > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is > >> unable to call percpu_ref_put(). > >> But I don't know how to find someone who is failing to call percpu_ref_put()... > > > > I found the someone. It was already there in the backtrace... > > > > Ahh, nicely spotted! One idea would be the one below. For this case, > we're recursing, so we can either do a non-block queue enter, or we > can just do a live enter. > While "block: don't use blocking queue entered for recursive bio submits" was already applied, syzbot is still reporting a hung task with same signature but different trace. https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000 ---------------------------------------- [ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds. [ 492.519604] Not tainted 4.17.0+ #83 [ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[ 492.531787] syz-executor1   D23384 20263   4574 0x00000004
[ 492.537443] Call Trace:
[ 492.540041]  __schedule+0x801/0x1e30
[ 492.580958]  schedule+0xef/0x430
[ 492.610154]  blk_queue_enter+0x8da/0xdf0
[ 492.716327]  generic_make_request+0x651/0x1790
[ 492.765680]  submit_bio+0xba/0x460
[ 492.793198]  submit_bio_wait+0x134/0x1e0
[ 492.801891]  blkdev_issue_flush+0x204/0x300
[ 492.806236]  blkdev_fsync+0x93/0xd0
[ 492.813620]  vfs_fsync_range+0x140/0x220
[ 492.817702]  vfs_fsync+0x29/0x30
[ 492.821081]  __loop_update_dio+0x4de/0x6a0
[ 492.825341]  lo_ioctl+0xd28/0x2190
[ 492.833442]  blkdev_ioctl+0x9b6/0x2020
[ 492.872146]  block_ioctl+0xee/0x130
[ 492.880139]  do_vfs_ioctl+0x1cf/0x16a0
[ 492.927550]  ksys_ioctl+0xa9/0xd0
[ 492.931036]  __x64_sys_ioctl+0x73/0xb0
[ 492.934952]  do_syscall_64+0x1b1/0x800
[ 492.963624]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 493.212768] 1 lock held by syz-executor1/20263:
[ 493.217448]  #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190
----------------------------------------

Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and
blk_mq_unfreeze_queue(), given that vfs_fsync() from __loop_update_dio() calls
blk_queue_enter() after blk_mq_freeze_queue() started blocking
blk_queue_enter() by calling atomic_inc_return() and percpu_ref_kill()?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-05 0:27 ` Tetsuo Handa @ 2018-06-05 0:41 ` Ming Lei 2018-06-07 3:29 ` Ming Lei 0 siblings, 1 reply; 32+ messages in thread From: Ming Lei @ 2018-06-05 0:41 UTC (permalink / raw) To: Tetsuo Handa Cc: Jens Axboe, Bart.VanAssche, dvyukov, linux-kernel, linux-block, jthumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, martin.petersen, dan.j.williams, hch, oleksandr, martin, hare, syzkaller-bugs, ross.zwisler, keith.busch, linux-ext4 On Tue, Jun 05, 2018 at 09:27:41AM +0900, Tetsuo Handa wrote: > Jens Axboe wrote: > > On 6/1/18 4:10 AM, Tetsuo Handa wrote: > > > Tetsuo Handa wrote: > > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is > > >> not a race condition while folding percpu counter values into atomic counter > > >> value. That is, for some reason, someone who is responsible for calling > > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is > > >> unable to call percpu_ref_put(). > > >> But I don't know how to find someone who is failing to call percpu_ref_put()... > > > > > > I found the someone. It was already there in the backtrace... > > > > > > > Ahh, nicely spotted! One idea would be the one below. For this case, > > we're recursing, so we can either do a non-block queue enter, or we > > can just do a live enter. > > > > While "block: don't use blocking queue entered for recursive bio submits" was > already applied, syzbot is still reporting a hung task with same signature but > different trace. > > https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000 > ---------------------------------------- > [ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds. > [ 492.519604] Not tainted 4.17.0+ #83 > [ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [ 492.531787] syz-executor1 D23384 20263 4574 0x00000004 > [ 492.537443] Call Trace: > [ 492.540041] __schedule+0x801/0x1e30 > [ 492.580958] schedule+0xef/0x430 > [ 492.610154] blk_queue_enter+0x8da/0xdf0 > [ 492.716327] generic_make_request+0x651/0x1790 > [ 492.765680] submit_bio+0xba/0x460 > [ 492.793198] submit_bio_wait+0x134/0x1e0 > [ 492.801891] blkdev_issue_flush+0x204/0x300 > [ 492.806236] blkdev_fsync+0x93/0xd0 > [ 492.813620] vfs_fsync_range+0x140/0x220 > [ 492.817702] vfs_fsync+0x29/0x30 > [ 492.821081] __loop_update_dio+0x4de/0x6a0 > [ 492.825341] lo_ioctl+0xd28/0x2190 > [ 492.833442] blkdev_ioctl+0x9b6/0x2020 > [ 492.872146] block_ioctl+0xee/0x130 > [ 492.880139] do_vfs_ioctl+0x1cf/0x16a0 > [ 492.927550] ksys_ioctl+0xa9/0xd0 > [ 492.931036] __x64_sys_ioctl+0x73/0xb0 > [ 492.934952] do_syscall_64+0x1b1/0x800 > [ 492.963624] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 493.212768] 1 lock held by syz-executor1/20263: > [ 493.217448] #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190 > ---------------------------------------- > > Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and > blk_mq_unfreeze_queue(), for vfs_fsync() from __loop_update_dio() is calling > blk_queue_enter() after blk_mq_freeze_queue() started blocking blk_queue_enter() > by caling atomic_inc_return() and percpu_ref_kill() ? > The vfs_fsync() isn't necessary in loop_update_dio() since both generic_file_write_iter() and generic_file_read_iter() can handle buffered io vs dio well. I will send one patch to remove the vfs_sync() later. Thanks, Ming ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-05 0:41 ` Ming Lei @ 2018-06-07 3:29 ` Ming Lei 0 siblings, 0 replies; 32+ messages in thread From: Ming Lei @ 2018-06-07 3:29 UTC (permalink / raw) To: Ming Lei Cc: Tetsuo Handa, Jens Axboe, Bart Van Assche, Dmitry Vyukov, Linux Kernel Mailing List, linux-block, Johannes Thumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, Martin K. Petersen, Dan Williams, Christoph Hellwig, =Oleksandr Natalenko, martin, Hannes Reinecke, syzkaller-bugs, Ross Zwisler, Keith Busch, open list:EXT4 FILE SYSTEM On Tue, Jun 5, 2018 at 8:41 AM, Ming Lei <ming.lei@redhat.com> wrote: > On Tue, Jun 05, 2018 at 09:27:41AM +0900, Tetsuo Handa wrote: >> Jens Axboe wrote: >> > On 6/1/18 4:10 AM, Tetsuo Handa wrote: >> > > Tetsuo Handa wrote: >> > >> Since sum of percpu_count did not change after percpu_ref_kill(), this is >> > >> not a race condition while folding percpu counter values into atomic counter >> > >> value. That is, for some reason, someone who is responsible for calling >> > >> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is >> > >> unable to call percpu_ref_put(). >> > >> But I don't know how to find someone who is failing to call percpu_ref_put()... >> > > >> > > I found the someone. It was already there in the backtrace... >> > > >> > >> > Ahh, nicely spotted! One idea would be the one below. For this case, >> > we're recursing, so we can either do a non-block queue enter, or we >> > can just do a live enter. >> > >> >> While "block: don't use blocking queue entered for recursive bio submits" was >> already applied, syzbot is still reporting a hung task with same signature but >> different trace. >> >> https://syzkaller.appspot.com/text?tag=CrashLog&x=1432cedf800000 >> ---------------------------------------- >> [ 492.512243] INFO: task syz-executor1:20263 blocked for more than 120 seconds. 
>> [ 492.519604] Not tainted 4.17.0+ #83 >> [ 492.523793] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> [ 492.531787] syz-executor1 D23384 20263 4574 0x00000004 >> [ 492.537443] Call Trace: >> [ 492.540041] __schedule+0x801/0x1e30 >> [ 492.580958] schedule+0xef/0x430 >> [ 492.610154] blk_queue_enter+0x8da/0xdf0 >> [ 492.716327] generic_make_request+0x651/0x1790 >> [ 492.765680] submit_bio+0xba/0x460 >> [ 492.793198] submit_bio_wait+0x134/0x1e0 >> [ 492.801891] blkdev_issue_flush+0x204/0x300 >> [ 492.806236] blkdev_fsync+0x93/0xd0 >> [ 492.813620] vfs_fsync_range+0x140/0x220 >> [ 492.817702] vfs_fsync+0x29/0x30 >> [ 492.821081] __loop_update_dio+0x4de/0x6a0 >> [ 492.825341] lo_ioctl+0xd28/0x2190 >> [ 492.833442] blkdev_ioctl+0x9b6/0x2020 >> [ 492.872146] block_ioctl+0xee/0x130 >> [ 492.880139] do_vfs_ioctl+0x1cf/0x16a0 >> [ 492.927550] ksys_ioctl+0xa9/0xd0 >> [ 492.931036] __x64_sys_ioctl+0x73/0xb0 >> [ 492.934952] do_syscall_64+0x1b1/0x800 >> [ 492.963624] entry_SYSCALL_64_after_hwframe+0x49/0xbe >> [ 493.212768] 1 lock held by syz-executor1/20263: >> [ 493.217448] #0: 00000000956bf5a3 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8d/0x2190 >> ---------------------------------------- >> >> Is it OK to call [__]loop_update_dio() between blk_mq_freeze_queue() and >> blk_mq_unfreeze_queue(), for vfs_fsync() from __loop_update_dio() is calling >> blk_queue_enter() after blk_mq_freeze_queue() started blocking blk_queue_enter() >> by caling atomic_inc_return() and percpu_ref_kill() ? >> > > The vfs_fsync() isn't necessary in loop_update_dio() since both > generic_file_write_iter() and generic_file_read_iter() can handle > buffered io vs dio well. > > I will send one patch to remove the vfs_sync() later. Hi Tetsuo, The issue might be fixed by removing this vfs_sync(), but I'd like to understand the idea behind since vfs_sync() shouldn't have caused any IO to this loop queue. 
I also tried to do the test via the following c syzbot, but can't reproduce it yet after running it for several hours. https://syzkaller.appspot.com/x/repro.c?id=4727023951937536 Could you share us how you reproduce it? Thanks, Ming Lei ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-07 3:29 ` Ming Lei @ 2018-06-07 13:19 ` Tetsuo Handa -1 siblings, 0 replies; 32+ messages in thread From: Tetsuo Handa @ 2018-06-07 13:19 UTC (permalink / raw) To: Ming Lei, Ming Lei Cc: Jens Axboe, Bart Van Assche, Dmitry Vyukov, Linux Kernel Mailing List, linux-block, Johannes Thumshirn, alan.christopher.jenkins, syzbot+c4f9cebf9d651f6e54de, Martin K. Petersen, Dan Williams, Christoph Hellwig, Oleksandr Natalenko, martin, Hannes Reinecke, syzkaller-bugs, Ross Zwisler, Keith Busch, open list:EXT4 FILE SYSTEM

On 2018/06/07 12:29, Ming Lei wrote:
>> The vfs_fsync() isn't necessary in loop_update_dio() since both
>> generic_file_write_iter() and generic_file_read_iter() can handle
>> buffered io vs dio well.
>>
>> I will send one patch to remove the vfs_sync() later.
>
> Hi Tetsuo,
>
> The issue might be fixed by removing this vfs_sync(), but I'd like to
> understand the idea behind since vfs_sync() shouldn't have caused
> any IO to this loop queue.
>
> I also tried to do the test via the following c syzbot, but can't reproduce
> it yet after running it for several hours.
>
> https://syzkaller.appspot.com/x/repro.c?id=4727023951937536
>
> Could you share us how you reproduce it?

I just reported that syzbot started hitting the vfs_sync() case. I haven't
succeeded in reproducing the vfs_sync() case in my environment.

But in general, I modify "any reproducer provided by syzbot that loops forever"
to call exit() if open() etc. failed, since continuing the loop is unlikely
to help after hitting a "Too many open files" error.

 	res = syscall(__NR_memfd_create, 0x200002c0, 0);
 	if (res != -1)
 		r[1] = res;
+	else
+		exit(1);

Thus, I restart the reproducer in a loop from shell rather than keep it
running for hours.

while :; do echo -n .; ./a.out; done

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter 2018-06-01 10:10 ` Tetsuo Handa 2018-06-01 17:52 ` Jens Axboe @ 2018-06-04 11:46 ` Dmitry Vyukov 2018-06-04 13:13 ` Tetsuo Handa 1 sibling, 1 reply; 32+ messages in thread From: Dmitry Vyukov @ 2018-06-04 11:46 UTC (permalink / raw) To: Tetsuo Handa Cc: Bart Van Assche, LKML, linux-block, Johannes Thumshirn, Alan Jenkins, syzbot, Martin K. Petersen, Jens Axboe, Dan Williams, Christoph Hellwig, oleksandr, ming.lei, martin, Hannes Reinecke, syzkaller-bugs, Ross Zwisler, keith.busch, linux-ext4

On Fri, Jun 1, 2018 at 12:10 PM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Tetsuo Handa wrote:
>> Since sum of percpu_count did not change after percpu_ref_kill(), this is
>> not a race condition while folding percpu counter values into atomic counter
>> value. That is, for some reason, someone who is responsible for calling
>> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
>> unable to call percpu_ref_put().
>> But I don't know how to find someone who is failing to call percpu_ref_put()...
>
> I found the someone. It was already there in the backtrace...

Nice!

Do I understand it correctly that this bug is probably the root cause
of a whole lot of syzbot "task hung" reports? E.g. this one too?
https://syzkaller.appspot.com/bug?id=cdc4add60bb95a4da3fec27c5fe6d75196b7f976
I guess we will need to sweep close everything related to
filesystems/block devices when this is committed?

> ----------------------------------------
> [   62.065852] a.out           D    0  4414   4337 0x00000000
> [   62.067677] Call Trace:
> [   62.068545]  __schedule+0x40b/0x860
> [   62.069726]  schedule+0x31/0x80
> [   62.070796]  schedule_timeout+0x1c1/0x3c0
> [   62.072159]  ? __next_timer_interrupt+0xd0/0xd0
> [   62.073670]  blk_queue_enter+0x218/0x520
> [   62.074985]  ? remove_wait_queue+0x70/0x70
> [   62.076361]  generic_make_request+0x3d/0x540
> [   62.077785]  ? __bio_clone_fast+0x6b/0x80
> [   62.079147]  ? bio_clone_fast+0x2c/0x70
> [   62.080456]  blk_queue_split+0x29b/0x560
> [   62.081772]  ? blk_queue_split+0x29b/0x560
> [   62.083162]  blk_mq_make_request+0x7c/0x430
> [   62.084562]  generic_make_request+0x276/0x540
> [   62.086034]  submit_bio+0x6e/0x140
> [   62.087185]  ? submit_bio+0x6e/0x140
> [   62.088384]  ? guard_bio_eod+0x9d/0x1d0
> [   62.089681]  do_mpage_readpage+0x328/0x730
> [   62.091045]  ? __add_to_page_cache_locked+0x12e/0x1a0
> [   62.092726]  mpage_readpages+0x120/0x190
> [   62.094034]  ? check_disk_change+0x70/0x70
> [   62.095454]  ? check_disk_change+0x70/0x70
> [   62.096849]  ? alloc_pages_current+0x65/0xd0
> [   62.098277]  blkdev_readpages+0x18/0x20
> [   62.099568]  __do_page_cache_readahead+0x298/0x360
> [   62.101157]  ondemand_readahead+0x1f6/0x490
> [   62.102546]  ? ondemand_readahead+0x1f6/0x490
> [   62.103995]  page_cache_sync_readahead+0x29/0x40
> [   62.105539]  generic_file_read_iter+0x7d0/0x9d0
> [   62.107067]  ? futex_wait+0x221/0x240
> [   62.108303]  ? trace_hardirqs_on+0xd/0x10
> [   62.109654]  blkdev_read_iter+0x30/0x40
> [   62.110954]  generic_file_splice_read+0xc5/0x140
> [   62.112538]  do_splice_to+0x74/0x90
> [   62.113726]  splice_direct_to_actor+0xa4/0x1f0
> [   62.115209]  ? generic_pipe_buf_nosteal+0x10/0x10
> [   62.116773]  do_splice_direct+0x8a/0xb0
> [   62.118056]  do_sendfile+0x1aa/0x390
> [   62.119255]  __x64_sys_sendfile64+0x4e/0xc0
> [   62.120666]  do_syscall_64+0x6e/0x210
> [   62.121909]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> ----------------------------------------
>
> The someone is blk_queue_split() from blk_mq_make_request() who depends on an
> assumption that blk_queue_enter() from recursively called generic_make_request()
> does not get blocked due to percpu_ref_tryget_live(&q->q_usage_counter) failure.
>
> ----------------------------------------
> generic_make_request(struct bio *bio) {
> 	if (blk_queue_enter(q, flags) < 0) { /* <= percpu_ref_tryget_live() succeeds. */
> 		if (!blk_queue_dying(q) && (bio->bi_opf & REQ_NOWAIT))
> 			bio_wouldblock_error(bio);
> 		else
> 			bio_io_error(bio);
> 		return ret;
> 	}
> 	(...snipped...)
> 	ret = q->make_request_fn(q, bio);
> 	(...snipped...)
> 	if (q)
> 		blk_queue_exit(q);
> }
> ----------------------------------------
>
> where q->make_request_fn == blk_mq_make_request which does
>
> ----------------------------------------
> blk_mq_make_request(struct request_queue *q, struct bio *bio) {
> 	blk_queue_split(q, &bio);
> }
>
> blk_queue_split(struct request_queue *q, struct bio **bio) {
> 	generic_make_request(*bio); /* <= percpu_ref_tryget_live() fails and waits
> 	                                  until atomic_read(&q->mq_freeze_depth) becomes 0. */
> }
> ----------------------------------------
>
> and meanwhile atomic_inc_return(&q->mq_freeze_depth) and
> percpu_ref_kill() are called by blk_freeze_queue_start()...
>
> Now, it is up to you about how to fix this race problem.
>

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter
  2018-06-04 11:46           ` Dmitry Vyukov
@ 2018-06-04 13:13             ` Tetsuo Handa
  0 siblings, 0 replies; 32+ messages in thread
From: Tetsuo Handa @ 2018-06-04 13:13 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Bart Van Assche, LKML, linux-block, Johannes Thumshirn,
	Alan Jenkins, syzbot, Martin K. Petersen, Jens Axboe, Dan Williams,
	Christoph Hellwig, oleksandr, ming.lei, martin, Hannes Reinecke,
	syzkaller-bugs, Ross Zwisler, keith.busch, linux-ext4

On 2018/06/04 20:46, Dmitry Vyukov wrote:
> On Fri, Jun 1, 2018 at 12:10 PM, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> Tetsuo Handa wrote:
>>> Since sum of percpu_count did not change after percpu_ref_kill(), this is
>>> not a race condition while folding percpu counter values into atomic counter
>>> value. That is, for some reason, someone who is responsible for calling
>>> percpu_ref_put(&q->q_usage_counter) (presumably via blk_queue_exit()) is
>>> unable to call percpu_ref_put().
>>> But I don't know how to find someone who is failing to call percpu_ref_put()...
>>
>> I found the someone. It was already there in the backtrace...
>
> Nice!
>
> Do I understand it correctly that this bug is probably the root cause
> of a whole lot of syzbot "task hung" reports? E.g. this one too?
> https://syzkaller.appspot.com/bug?id=cdc4add60bb95a4da3fec27c5fe6d75196b7f976
> I guess we will need to sweep close everything related to
> filesystems/block devices when this is committed?
>

I can't tell. We still have a lockdep warning for the loop module. Also, I
suspect that the hung tasks waiting for a page bit might be related to the
binder module. Since we are about to merge the "kernel/hung_task.c: show all
hung tasks before panic" patch, we might be able to get more hints after
4.18-rc1.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: INFO: task hung in blk_queue_enter
  2018-05-16 13:05   ` Tetsuo Handa
  2018-05-16 14:56     ` Bart Van Assche
@ 2018-05-16 17:33     ` Alan Jenkins
  1 sibling, 0 replies; 32+ messages in thread
From: Alan Jenkins @ 2018-05-16 17:33 UTC (permalink / raw)
To: Tetsuo Handa, syzkaller-bugs
Cc: syzbot, linux-block, Dan Williams, Jens Axboe, linux-kernel,
	linux-ext4, dvyukov, Bart Van Assche, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn, Keith Busch,
	Martin K. Petersen, Martin Steigerwald, Ming Lei,
	Oleksandr Natalenko, Ross Zwisler

> jbd2/sda1-8(PID=2271) is stuck waiting for journal commit operation.
> I don't know how this thread is involved to this problem.

It feels like it should be a necessary link in the chain. This is the
filesystem underneath the loop device. If that hangs, we would expect the
loop device to hang, but not vice versa.

AIUI, you couldn't reproduce this hang on your own machine. Don't you
think your fuzzer has just found a nice exploit for reiserfs?

Alan

(It's interesting that you found this hang just after we fixed
blk_queue_enter() to wait in D state instead of S state.[1] I don't know
syzkaller, so I assume from this that it only flagged this case because of
the hung task warning, i.e. syzkaller didn't have its own watchdog that
would have failed the test anyway.

[1] the commit that caused you to CC me: "block: do not use interruptible
wait anywhere"
https://github.com/torvalds/linux/commit/1dc3039bc87ae7d19a990c3ee71cfd8a9068f428 )

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2018-06-07 13:19 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-28 11:24 INFO: task hung in blk_queue_enter syzbot
2018-05-15 11:45 ` Tetsuo Handa
2018-05-16 13:05 ` Tetsuo Handa
2018-05-16 14:56 ` Bart Van Assche
2018-05-16 14:56 ` Bart Van Assche
2018-05-16 15:16 ` Dmitry Vyukov
2018-05-16 15:16 ` Dmitry Vyukov
2018-05-16 15:37 ` Bart Van Assche
2018-05-16 15:37 ` Bart Van Assche
2018-05-16 15:37 ` Bart Van Assche
2018-05-21 21:52 ` Tetsuo Handa
2018-05-22 11:20 ` Tetsuo Handa
2018-06-01 10:10 ` Tetsuo Handa
2018-06-01 17:52 ` Jens Axboe
2018-06-01 23:49 ` Ming Lei
2018-06-02  0:49 ` Jens Axboe
2018-06-02  0:56 ` Jens Axboe
2018-06-02  2:36 ` Ming Lei
2018-06-02  4:31 ` Jens Axboe
2018-06-02  4:54 ` Ming Lei
2018-06-02  8:07 ` Martin Steigerwald
2018-06-02  8:07 ` Martin Steigerwald
2018-06-02 13:48 ` Jens Axboe
2018-06-02 13:48 ` Jens Axboe
2018-06-05  0:27 ` Tetsuo Handa
2018-06-05  0:41 ` Ming Lei
2018-06-07  3:29 ` Ming Lei
2018-06-07  3:29 ` Ming Lei
2018-06-07 13:19 ` Tetsuo Handa
2018-06-04 11:46 ` Dmitry Vyukov
2018-06-04 13:13 ` Tetsuo Handa
2018-05-16 17:33 ` Alan Jenkins