* blk-mq + bfq IO hangs after writing partition table @ 2017-12-01 6:52 ivan 2017-12-01 8:50 ` Ming Lei 0 siblings, 1 reply; 6+ messages in thread From: ivan @ 2017-12-01 6:52 UTC (permalink / raw) To: axboe, paolo.valente; +Cc: linux-block, linux-kernel, oleksandr Hi, I think I am triggering a blk-mq + bfq bug that I can reproduce 100% of the time by using gdisk (1.0.1-1 in Debian stretch) to write a partition table to a USB flash drive. After it is triggered, IO hangs forever to that device and the machine cannot be shut down cleanly. I have reproduced this on two very different amd64 machines and two different USB drives. I don't know if this affects other storage devices. This happens *only* with blk-mq + bfq, never with mq-deadline or kyber. I built df8ba95c572a187ed2aa7403e97a7a7f58c01f00 (2017-11-30) from Linus's tree with make-kpkg, without patches of any sort. My cmdline was: scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y apparmor=1 security=apparmor .config file: https://gist.githubusercontent.com/ivan/35935783e3153878ce650ab105c1695f/raw/b3de6c85eabd342118b5fecf2b4fab362bde7aa6/config To reproduce: boot with blk-mq plug in a USB stick without any data you want to keep echo bfq > /sys/block/sdX/queue/scheduler gdisk /dev/sdX delete some partitions or add some partitions "w" to write the partition table observe IO hang and call trace (below) in the journal after 2 minutes Note the log below does not show "bfq" because it was loaded earlier. If it does not reproduce, try another USB flash drive; if that does not reproduce, cat /dev/zero over it first. Best regards, Ivan Dec 01 05:12:17 kernel: usb 3-1: new high-speed USB device number 2 using xhci_hcd Dec 01 05:12:18 kernel: usb 3-1: New USB device found, idVendor=26bf, idProduct=201a Dec 01 05:12:18 kernel: usb 3-1: New USB device strings: Mfr=0, Product=3, SerialNumber=4 Dec 01 05:12:18 kernel: usb 3-1: Product: STORAGE DEVICE Dec 01 05:12:18 kernel: usb 3-1: SerialNumber: 44705421454402634814 Dec 01 05:12:18 mtp-probe[3289]: checking bus 3, device 2: "/sys/devices/pci0000:00/0000:00:1c.7/0000:26:00.0/usb3/3-1" Dec 01 05:12:18 mtp-probe[3289]: bus: 3, device: 2 was not an MTP device Dec 01 05:12:18 kernel: usb-storage 3-1:1.0: USB Mass Storage device detected Dec 01 05:12:18 kernel: scsi host6: usb-storage 3-1:1.0 Dec 01 05:12:18 kernel: usbcore: registered new interface driver usb-storage Dec 01 05:12:18 kernel: usbcore: registered new interface driver uas Dec 01 05:12:19 kernel: scsi 6:0:0:0: Direct-Access Specific STORAGE DEVICE 0009 PQ: 0 ANSI: 5 Dec 01 05:12:19 kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0 Dec 01 05:12:19 kernel: sd 6:0:0:0: [sdc] 7864320 512-byte logical blocks: (4.03 GB/3.75 GiB) Dec 01 05:12:19 kernel: sd 6:0:0:0: [sdc] Attached SCSI removable disk Dec 01 05:12:19 kernel: sdc: Dec 01 05:12:24 kernel: sdc: Dec 01 05:12:35 kernel: sdc: sdc1 sdc2 Dec 01 05:12:36 kernel: sdc: sdc1 sdc2 Dec 01 05:12:40 kernel: io scheduler kyber registered Dec 01 05:12:41 kernel: sdc: sdc1 sdc2 Dec 01 05:12:52 kernel: sdc: sdc1 Dec 01 05:12:53 kernel: sdc: sdc1 Dec 01 05:12:59 kernel: sdc: sdc1 Dec 01 05:14:00 systemd-udevd[331]: seq 2520 '/devices/pci0000:00/0000:00:1c.7/0000:26:00.0/usb3/3-1/3-1:1.0/host6/target6:0:0/6:0:0:0/block/sdc' is taking a long time Dec 01 05:14:05 systemd[1]: Started Getty on tty3. Dec 01 05:14:09 login[3339]: pam_unix(login:session): session opened for user root by LOGIN(uid=0) Dec 01 05:14:09 systemd[1]: Created slice User Slice of root. Dec 01 05:14:09 systemd[1]: Starting User Manager for UID 0... Dec 01 05:14:09 systemd[3344]: pam_unix(systemd-user:session): session opened for user root by (uid=0) Dec 01 05:14:09 systemd-logind[545]: New session 3 of user root. Dec 01 05:14:09 systemd[1]: Started Session 3 of user root. Dec 01 05:14:09 systemd[3344]: Reached target Paths. Dec 01 05:14:09 systemd[3344]: Starting D-Bus User Message Bus Socket. Dec 01 05:14:09 systemd[3344]: Listening on GnuPG cryptographic agent and passphrase cache (restricted). Dec 01 05:14:09 systemd[3344]: Listening on GnuPG cryptographic agent and passphrase cache. Dec 01 05:14:09 systemd[3344]: Listening on GnuPG cryptographic agent (access for web browsers). Dec 01 05:14:09 systemd[3344]: Listening on GnuPG cryptographic agent (ssh-agent emulation). Dec 01 05:14:09 systemd[3344]: Reached target Timers. Dec 01 05:14:09 systemd[3344]: Listening on D-Bus User Message Bus Socket. Dec 01 05:14:09 systemd[3344]: Reached target Sockets. Dec 01 05:14:09 systemd[3344]: Reached target Basic System. Dec 01 05:14:09 systemd[3344]: Reached target Default. Dec 01 05:14:09 systemd[3344]: Startup finished in 13ms. Dec 01 05:14:09 systemd[1]: Started User Manager for UID 0. Dec 01 05:14:09 login[3350]: ROOT LOGIN on '/dev/tty3' Dec 01 05:15:23 kernel: INFO: task kworker/2:1:34 blocked for more than 120 seconds. Dec 01 05:15:23 kernel: Not tainted 4.15.0-rc1+ #2 Dec 01 05:15:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 01 05:15:23 kernel: kworker/2:1 D 0 34 2 0x80000000 Dec 01 05:15:23 kernel: Workqueue: events_freezable_power_ disk_events_workfn Dec 01 05:15:23 kernel: Call Trace: Dec 01 05:15:23 kernel: ? __schedule+0x3cb/0x850 Dec 01 05:15:23 kernel: schedule+0x32/0x80 Dec 01 05:15:23 kernel: io_schedule+0x12/0x40 Dec 01 05:15:23 kernel: blk_mq_get_tag+0x15e/0x250 Dec 01 05:15:23 kernel: ? remove_wait_queue+0x60/0x60 Dec 01 05:15:23 kernel: blk_mq_get_request+0xcb/0x390 Dec 01 05:15:23 kernel: blk_mq_alloc_request+0x7d/0xd0 Dec 01 05:15:23 kernel: blk_get_request_flags+0x40/0x190 Dec 01 05:15:23 kernel: scsi_execute+0x38/0x250 [scsi_mod] Dec 01 05:15:23 kernel: scsi_test_unit_ready+0x75/0xf0 [scsi_mod] Dec 01 05:15:23 kernel: sd_check_events+0x125/0x1a0 [sd_mod] Dec 01 05:15:23 kernel: disk_check_events+0x63/0x150 Dec 01 05:15:23 kernel: process_one_work+0x184/0x380 Dec 01 05:15:23 kernel: worker_thread+0x4d/0x3c0 Dec 01 05:15:23 kernel: kthread+0xf5/0x130 Dec 01 05:15:23 kernel: ? process_one_work+0x380/0x380 Dec 01 05:15:23 kernel: ? kthread_create_worker_on_cpu+0x70/0x70 Dec 01 05:15:23 kernel: ret_from_fork+0x1f/0x30 Dec 01 05:15:23 kernel: INFO: task kworker/u16:3:131 blocked for more than 120 seconds. Dec 01 05:15:23 kernel: Not tainted 4.15.0-rc1+ #2 Dec 01 05:15:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 01 05:15:23 kernel: kworker/u16:3 D 0 131 2 0x80000000 Dec 01 05:15:23 kernel: Workqueue: writeback wb_workfn (flush-8:32) Dec 01 05:15:23 kernel: Call Trace: Dec 01 05:15:23 kernel: ? __schedule+0x3cb/0x850 Dec 01 05:15:23 kernel: schedule+0x32/0x80 Dec 01 05:15:23 kernel: io_schedule+0x12/0x40 Dec 01 05:15:23 kernel: blk_mq_get_tag+0x15e/0x250 Dec 01 05:15:23 kernel: ? elv_merge+0x69/0xe0 Dec 01 05:15:23 kernel: ? remove_wait_queue+0x60/0x60 Dec 01 05:15:23 kernel: blk_mq_get_request+0xcb/0x390 Dec 01 05:15:23 kernel: blk_mq_make_request+0x12b/0x5e0 Dec 01 05:15:23 kernel: generic_make_request+0x11e/0x2f0 Dec 01 05:15:23 kernel: ? submit_bio+0x6c/0x140 Dec 01 05:15:23 kernel: submit_bio+0x6c/0x140 Dec 01 05:15:23 kernel: ? guard_bio_eod+0x26/0xe0 Dec 01 05:15:23 kernel: submit_bh_wbc+0x14b/0x170 Dec 01 05:15:23 kernel: __block_write_full_page+0x17c/0x400 Dec 01 05:15:23 kernel: ? I_BDEV+0x10/0x10 Dec 01 05:15:23 kernel: ? page_referenced_one+0x160/0x160 Dec 01 05:15:23 kernel: __writepage+0x13/0x50 Dec 01 05:15:23 kernel: write_cache_pages+0x1ed/0x430 Dec 01 05:15:23 kernel: ? __wb_calc_thresh+0x130/0x130 Dec 01 05:15:23 kernel: ? bit_clear+0x110/0x110 Dec 01 05:15:23 kernel: ? drm_fb_helper_cfb_fillrect+0x12/0x30 [drm_kms_helper] Dec 01 05:15:23 kernel: generic_writepages+0x61/0xa0 Dec 01 05:15:23 kernel: ? do_writepages+0x17/0x60 Dec 01 05:15:23 kernel: do_writepages+0x17/0x60 Dec 01 05:15:23 kernel: __writeback_single_inode+0x3d/0x310 Dec 01 05:15:23 kernel: writeback_sb_inodes+0x221/0x4b0 Dec 01 05:15:23 kernel: __writeback_inodes_wb+0x87/0xb0 Dec 01 05:15:23 kernel: wb_writeback+0x282/0x310 Dec 01 05:15:23 kernel: ? cpumask_next+0x16/0x20 Dec 01 05:15:23 kernel: ? wb_workfn+0x345/0x450 Dec 01 05:15:23 kernel: wb_workfn+0x345/0x450 Dec 01 05:15:23 kernel: process_one_work+0x184/0x380 Dec 01 05:15:23 kernel: worker_thread+0x4d/0x3c0 Dec 01 05:15:23 kernel: kthread+0xf5/0x130 Dec 01 05:15:23 kernel: ? process_one_work+0x380/0x380 Dec 01 05:15:23 kernel: ? kthread_create_worker_on_cpu+0x70/0x70 Dec 01 05:15:23 kernel: ret_from_fork+0x1f/0x30 Dec 01 05:15:23 kernel: INFO: task gdisk:3337 blocked for more than 120 seconds. Dec 01 05:15:23 kernel: Not tainted 4.15.0-rc1+ #2 Dec 01 05:15:23 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 01 05:15:23 kernel: gdisk D 0 3337 3253 0x00000000 Dec 01 05:15:23 kernel: Call Trace: Dec 01 05:15:23 kernel: ? __schedule+0x3cb/0x850 Dec 01 05:15:23 kernel: schedule+0x32/0x80 Dec 01 05:15:23 kernel: io_schedule+0x12/0x40 Dec 01 05:15:23 kernel: blk_mq_get_tag+0x15e/0x250 Dec 01 05:15:23 kernel: ? elv_merge+0x69/0xe0 Dec 01 05:15:23 kernel: ? remove_wait_queue+0x60/0x60 Dec 01 05:15:23 kernel: blk_mq_get_request+0xcb/0x390 Dec 01 05:15:23 kernel: blk_mq_make_request+0x12b/0x5e0 Dec 01 05:15:23 kernel: generic_make_request+0x11e/0x2f0 Dec 01 05:15:23 kernel: ? submit_bio+0x6c/0x140 Dec 01 05:15:23 kernel: submit_bio+0x6c/0x140 Dec 01 05:15:23 kernel: ? guard_bio_eod+0x26/0xe0 Dec 01 05:15:23 kernel: submit_bh_wbc+0x14b/0x170 Dec 01 05:15:23 kernel: __block_write_full_page+0x17c/0x400 Dec 01 05:15:23 kernel: ? I_BDEV+0x10/0x10 Dec 01 05:15:23 kernel: ? page_referenced_one+0x160/0x160 Dec 01 05:15:23 kernel: __writepage+0x13/0x50 Dec 01 05:15:23 kernel: write_cache_pages+0x1ed/0x430 Dec 01 05:15:23 kernel: ? __wb_calc_thresh+0x130/0x130 Dec 01 05:15:23 kernel: ? _cond_resched+0x16/0x40 Dec 01 05:15:23 kernel: ? flush_work+0x43/0x1d0 Dec 01 05:15:23 kernel: ? pick_next_task_fair+0x308/0x5d0 Dec 01 05:15:23 kernel: generic_writepages+0x61/0xa0 Dec 01 05:15:23 kernel: ? fdatawait_one_bdev+0x20/0x20 Dec 01 05:15:23 kernel: ? do_writepages+0x17/0x60 Dec 01 05:15:23 kernel: do_writepages+0x17/0x60 Dec 01 05:15:23 kernel: __filemap_fdatawrite_range+0xc8/0x100 Dec 01 05:15:23 kernel: ? _xfs_log_force+0x8a/0x2a0 [xfs] Dec 01 05:15:23 kernel: iterate_bdevs+0x140/0x150 Dec 01 05:15:23 kernel: sys_sync+0x6e/0xb0 Dec 01 05:15:23 kernel: entry_SYSCALL_64_fastpath+0x1e/0x81 Dec 01 05:15:23 kernel: RIP: 0033:0x7f0b4eaeb5e7 Dec 01 05:15:23 kernel: RSP: 002b:00007ffdaa3c1db8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2 Dec 01 05:15:23 kernel: RAX: ffffffffffffffda RBX: 00007ffdaa3c3878 RCX: 00007f0b4eaeb5e7 Dec 01 05:15:23 kernel: RDX: 00007ffdaa3c1cf0 RSI: 00007ffdaa3c1cf0 RDI: 0000000000000003 Dec 01 05:15:23 kernel: RBP: 00007ffdaa3c1b80 R08: 00005640d1adf940 R09: 0000000000000210 Dec 01 05:15:23 kernel: R10: 0000000000016830 R11: 0000000000000202 R12: 0000000000000001 Dec 01 05:15:23 kernel: R13: 00007ffdaa3c3878 R14: 0000000000000000 R15: 00007ffdaa3c26a2 Dec 01 05:16:00 systemd-udevd[331]: seq 2520 '/devices/pci0000:00/0000:00:1c.7/0000:26:00.0/usb3/3-1/3-1:1.0/host6/target6:0:0/6:0:0:0/block/sdc' killed ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: blk-mq + bfq IO hangs after writing partition table 2017-12-01 6:52 blk-mq + bfq IO hangs after writing partition table ivan @ 2017-12-01 8:50 ` Ming Lei 2017-12-01 12:18 ` Ivan Kozik 0 siblings, 1 reply; 6+ messages in thread From: Ming Lei @ 2017-12-01 8:50 UTC (permalink / raw) To: ivan; +Cc: axboe, paolo.valente, linux-block, linux-kernel, oleksandr On Fri, Dec 01, 2017 at 06:52:37AM +0000, ivan@ludios.org wrote: > > Hi, > > I think I am triggering a blk-mq + bfq bug that I can reproduce 100% > of the time by using gdisk (1.0.1-1 in Debian stretch) to write a > partition table to a USB flash drive. After it is triggered, IO hangs > forever to that device and the machine cannot be shut down cleanly. > I have reproduced this on two very different amd64 machines and two > different USB drives. I don't know if this affects other storage > devices. This happens *only* with blk-mq + bfq, never with mq-deadline > or kyber. > > I built df8ba95c572a187ed2aa7403e97a7a7f58c01f00 (2017-11-30) from > Linus's tree with make-kpkg, without patches of any sort. > > My cmdline was: > scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y apparmor=1 security=apparmor > > .config file: > https://gist.githubusercontent.com/ivan/35935783e3153878ce650ab105c1695f/raw/b3de6c85eabd342118b5fecf2b4fab362bde7aa6/config > > To reproduce: > boot with blk-mq > plug in a USB stick without any data you want to keep > echo bfq > /sys/block/sdX/queue/scheduler > gdisk /dev/sdX > delete some partitions or add some partitions > "w" to write the partition table > observe IO hang and call trace (below) in the journal after 2 minutes > > Note the log below does not show "bfq" because it was loaded earlier. > > If it does not reproduce, try another USB flash drive; if that does not > reproduce, cat /dev/zero over it first. Hi, The trick of USB flash is just that 'can_queue' is one, I have tried to simulate your test with scsi_debug by setting 'can_queue' as one, looks can't reproduce your issue. Could you run the following script[1] and provide us the result after the IO hang is triggered? #./dump-blk-info /dev/sdX #/dev/sdX is name of your USB disk [1] http://people.redhat.com/minlei/tests/tools/dump-blk-info Thanks, Ming ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: blk-mq + bfq IO hangs after writing partition table 2017-12-01 8:50 ` Ming Lei @ 2017-12-01 12:18 ` Ivan Kozik 2017-12-01 13:35 ` Ming Lei 0 siblings, 1 reply; 6+ messages in thread From: Ivan Kozik @ 2017-12-01 12:18 UTC (permalink / raw) To: Ming Lei; +Cc: axboe, paolo.valente, linux-block, linux-kernel, oleksandr [-- Attachment #1: Type: text/plain, Size: 6530 bytes --] On Fri, Dec 1, 2017 at 8:50 AM, Ming Lei <ming.lei@redhat.com> wrote: > Could you run the following script[1] and provide us the result after > the IO hang is triggered? > > #./dump-blk-info /dev/sdX #/dev/sdX is name of your USB disk > > [1] http://people.redhat.com/minlei/tests/tools/dump-blk-info Hi Ming, Thanks for taking a look. I have pasted and attached (in case of gmail corruption) these files: Samsung-128GB-before: Samsung 128GB USB 3.0 Fit plugged into HP 8460p, before gdisk write Samsung-128GB-after: Samsung 128GB USB 3.0 Fit plugged into HP 8460p, after gdisk write Generic-4GB-after: some awful generic but working 4GB USB flash drive plugged into HP 8460p, after gdisk write I have reproduced this again on the same kernel revision and blk-mq bfq with: minimized cmdline: scsi_mod.use_blk_mq=y gdisk writing the partition table as-is, no changes and confirmed again that kyber never triggers this. Thanks, Ivan Samsung-128GB-before =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 3 14 $dispatched 3 14 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 159 76 $dispatched 159 76 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 84 57 $dispatched 84 57 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 154 54 $dispatched 155 54 $merged 1 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 8 1 701 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 602 $run 843 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=1 bits_per_word=64 map_nr=1 alloc_hint={0, 1, 0, 0, 0, 1, 1, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 01 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 Samsung-128GB-after =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 3 14 $dispatched 3 14 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 159 95 $dispatched 159 96 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 84 58 $dispatched 84 58 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 154 54 $dispatched 155 54 $merged 1 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 8 1 721 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 623 $run 887 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=2 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 1, 1, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 03 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 Generic-4GB-after =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 94 32 $dispatched 94 33 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 561 113 $dispatched 562 113 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 251 179 $dispatched 251 179 $merged 4 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 228 63 $dispatched 228 63 $merged 0 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 21 1 1707 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 1523 $run 2000 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=2 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 1, 1, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 03 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 [-- Attachment #2: Generic-4GB-after --] [-- Type: application/octet-stream, Size: 1691 bytes --] =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 94 32 $dispatched 94 33 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 561 113 $dispatched 562 113 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 251 179 $dispatched 251 179 $merged 4 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 228 63 $dispatched 228 63 $merged 0 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 21 1 1707 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 1523 $run 2000 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=2 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 1, 1, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 03 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 [-- Attachment #3: Samsung-128GB-after --] [-- Type: application/octet-stream, Size: 1679 bytes --] =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 3 14 $dispatched 3 14 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 159 95 $dispatched 159 96 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 84 58 $dispatched 84 58 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 154 54 $dispatched 155 54 $merged 1 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 8 1 721 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 623 $run 887 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=2 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 1, 1, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=active}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 03 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 [-- Attachment #4: Samsung-128GB-before --] [-- Type: application/octet-stream, Size: 1685 bytes --] =============sdc/hctx0================== $active 0 $busy $/sys/kernel/debug/block/sdc//hctx0/cpu0 $completed 3 14 $dispatched 3 14 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu1 $completed 159 76 $dispatched 159 76 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu2 $completed 84 57 $dispatched 84 57 $merged 0 $rq_list $/sys/kernel/debug/block/sdc//hctx0/cpu3 $completed 154 54 $dispatched 155 54 $merged 1 $rq_list $ctx_map 00000000: 00 $dispatch $dispatched 0 8 1 701 2 0 4 0 8 0 16 0 32+ 0 $flags alloc_policy=FIFO SHOULD_MERGE|SG_MERGE $io_poll considered=0 invoked=0 success=0 $queued 602 $run 843 $sched_tags nr_tags=2 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=2 busy=1 bits_per_word=64 map_nr=1 alloc_hint={0, 1, 0, 0, 0, 1, 1, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $sched_tags_bitmap 00000000: 01 $state $tags nr_tags=1 nr_reserved_tags=0 active_queues=0 bitmap_tags: depth=1 busy=0 bits_per_word=64 map_nr=1 alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} wake_batch=1 wake_index=0 ws={ {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, {.wait_cnt=1, .wait=inactive}, } round_robin=0 $tags_bitmap 00000000: 00 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: blk-mq + bfq IO hangs after writing partition table 2017-12-01 12:18 ` Ivan Kozik @ 2017-12-01 13:35 ` Ming Lei 2017-12-01 14:46 ` Ivan Kozik 0 siblings, 1 reply; 6+ messages in thread From: Ming Lei @ 2017-12-01 13:35 UTC (permalink / raw) To: Ivan Kozik; +Cc: axboe, paolo.valente, linux-block, linux-kernel, oleksandr On Fri, Dec 01, 2017 at 12:18:25PM +0000, Ivan Kozik wrote: > On Fri, Dec 1, 2017 at 8:50 AM, Ming Lei <ming.lei@redhat.com> wrote: > > Could you run the following script[1] and provide us the result after > > the IO hang is triggered? > > > > #./dump-blk-info /dev/sdX #/dev/sdX is name of your USB disk > > > > [1] http://people.redhat.com/minlei/tests/tools/dump-blk-info > > Hi Ming, > > Thanks for taking a look. I have pasted and attached (in case of > gmail corruption) these files: > > Samsung-128GB-before: Samsung 128GB USB 3.0 Fit plugged into HP 8460p, > before gdisk write > > Samsung-128GB-after: Samsung 128GB USB 3.0 Fit plugged into HP 8460p, > after gdisk write > > Generic-4GB-after: some awful generic but working 4GB USB flash > drive plugged into HP 8460p, after gdisk write > > I have reproduced this again on the same kernel revision and blk-mq bfq with: > > minimized cmdline: scsi_mod.use_blk_mq=y > gdisk writing the partition table as-is, no changes > > and confirmed again that kyber never triggers this. > > Thanks, > > Ivan > > > > > Samsung-128GB-before > =============sdc/hctx0================== > $active > 0 > > $busy > > $/sys/kernel/debug/block/sdc//hctx0/cpu0 > $completed > 3 14 > $dispatched > 3 14 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu1 > $completed > 159 76 > $dispatched > 159 76 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu2 > $completed > 84 57 > $dispatched > 84 57 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu3 > $completed > 154 54 > $dispatched > 155 54 > $merged > 1 > $rq_list > > $ctx_map > 00000000: 00 > > $dispatch > > $dispatched > 0 8 > 1 701 > 2 0 > 4 0 > 8 0 > 16 0 > 32+ 0 > > $flags > alloc_policy=FIFO SHOULD_MERGE|SG_MERGE > > $io_poll > considered=0 > invoked=0 > success=0 > > $queued > 602 > > $run > 843 > > $sched_tags > nr_tags=2 > nr_reserved_tags=0 > active_queues=0 > > bitmap_tags: > depth=2 > busy=1 > bits_per_word=64 > map_nr=1 > alloc_hint={0, 1, 0, 0, 0, 1, 1, 0} > wake_batch=1 > wake_index=0 > ws={ > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > } > round_robin=0 > > $sched_tags_bitmap > 00000000: 01 > > $state > > > $tags > nr_tags=1 > nr_reserved_tags=0 > active_queues=0 > > bitmap_tags: > depth=1 > busy=0 > bits_per_word=64 > map_nr=1 > alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} > wake_batch=1 > wake_index=0 > ws={ > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > } > round_robin=0 > > $tags_bitmap > 00000000: 00 > > > > > Samsung-128GB-after > =============sdc/hctx0================== > $active > 0 > > $busy > > $/sys/kernel/debug/block/sdc//hctx0/cpu0 > $completed > 3 14 > $dispatched > 3 14 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu1 > $completed > 159 95 > $dispatched > 159 96 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu2 > $completed > 84 58 > $dispatched > 84 58 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu3 > $completed > 154 54 > $dispatched > 155 54 > $merged > 1 > $rq_list > > $ctx_map > 00000000: 00 > > $dispatch > > $dispatched > 0 8 > 1 721 > 2 0 > 4 0 > 8 0 > 16 0 > 32+ 0 > > $flags > alloc_policy=FIFO SHOULD_MERGE|SG_MERGE > > $io_poll > considered=0 > invoked=0 > success=0 > > $queued > 623 > > $run > 887 > > $sched_tags > nr_tags=2 > nr_reserved_tags=0 > active_queues=0 > > bitmap_tags: > depth=2 > busy=2 > bits_per_word=64 > map_nr=1 > alloc_hint={0, 0, 0, 0, 0, 1, 1, 0} > wake_batch=1 > wake_index=0 > ws={ > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > } > round_robin=0 > > $sched_tags_bitmap > 00000000: 03 > > $state > > > $tags > nr_tags=1 > nr_reserved_tags=0 > active_queues=0 > > bitmap_tags: > depth=1 > busy=0 > bits_per_word=64 > map_nr=1 > alloc_hint={0, 0, 0, 0, 0, 0, 0, 0} > wake_batch=1 > wake_index=0 > ws={ > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > } > round_robin=0 > > $tags_bitmap > 00000000: 00 > > > > > Generic-4GB-after > =============sdc/hctx0================== > $active > 0 > > $busy > > $/sys/kernel/debug/block/sdc//hctx0/cpu0 > $completed > 94 32 > $dispatched > 94 33 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu1 > $completed > 561 113 > $dispatched > 562 113 > $merged > 0 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu2 > $completed > 251 179 > $dispatched > 251 179 > $merged > 4 > $rq_list > > $/sys/kernel/debug/block/sdc//hctx0/cpu3 > $completed > 228 63 > $dispatched > 228 63 > $merged > 0 > $rq_list > > $ctx_map > 00000000: 00 > > $dispatch > > $dispatched > 0 21 > 1 1707 > 2 0 > 4 0 > 8 0 > 16 0 > 32+ 0 > > $flags > alloc_policy=FIFO SHOULD_MERGE|SG_MERGE > > $io_poll > considered=0 > invoked=0 > success=0 > > $queued > 1523 > > $run > 2000 > > $sched_tags > nr_tags=2 > nr_reserved_tags=0 > active_queues=0 > > bitmap_tags: > depth=2 > busy=2 > bits_per_word=64 > map_nr=1 > alloc_hint={0, 0, 0, 0, 1, 1, 0, 0} > wake_batch=1 > wake_index=0 > ws={ > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=active}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, > {.wait_cnt=1, .wait=inactive}, Hi Ivan, Thanks for your post and test! Basically all IO hang happen inside get_request(), and seems this issue isn't related with recent change in V4.15, could you run V4.14 to see if there is such issue? I guess it might be related with sbitmap_queue's wakeup because SCSI always run queue after one request is completed, and there isn't any request in hctx->dispatch, and for all requests in scheduler queue, there is always chance to dispatch them from SCSI's restart(scsi_end_request). Maybe need Jens/Omar to take a look. Thanks, Ming ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: blk-mq + bfq IO hangs after writing partition table 2017-12-01 13:35 ` Ming Lei @ 2017-12-01 14:46 ` Ivan Kozik 2017-12-01 15:33 ` Ming Lei 0 siblings, 1 reply; 6+ messages in thread From: Ivan Kozik @ 2017-12-01 14:46 UTC (permalink / raw) To: Ming Lei; +Cc: axboe, paolo.valente, linux-block, linux-kernel, oleksandr On Fri, Dec 1, 2017 at 1:35 PM, Ming Lei <ming.lei@redhat.com> wrote: > Basically all IO hang happen inside get_request(), and seems this issue > isn't related with recent change in V4.15, could you run V4.14 to see if > there is such issue? I tried several times and I could not reproduce the problem on linus's v4.14. Also, while testing, I noticed that udev rules can switch out the IO scheduler after gdisk writes partitions, but this appears to be unrelated to the problem, just something to remember when testing. Thanks, Ivan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: blk-mq + bfq IO hangs after writing partition table 2017-12-01 14:46 ` Ivan Kozik @ 2017-12-01 15:33 ` Ming Lei 0 siblings, 0 replies; 6+ messages in thread From: Ming Lei @ 2017-12-01 15:33 UTC (permalink / raw) To: Ivan Kozik; +Cc: axboe, paolo.valente, linux-block, linux-kernel, oleksandr [-- Attachment #1: Type: text/plain, Size: 1043 bytes --] On Fri, Dec 01, 2017 at 02:46:39PM +0000, Ivan Kozik wrote: > On Fri, Dec 1, 2017 at 1:35 PM, Ming Lei <ming.lei@redhat.com> wrote: > > Basically all IO hang happen inside get_request(), and seems this issue > > isn't related with recent change in V4.15, could you run V4.14 to see if > > there is such issue? > > I tried several times and I could not reproduce the problem on linus's v4.14. > > Also, while testing, I noticed that udev rules can switch out the IO > scheduler after gdisk writes partitions, but this appears to be > unrelated to the problem, just something to remember when testing. Now I can reproduce the issue on usb-storage. After applying the attached debug patch, the log[1] can be generated. >From the log, there are requests inserted to BFQ queue, but these requests can't be retrieved by e->type->ops.mq.dispatch_request() in blk_mq_do_dispatch_sched(), then IO hang is caused. Not see this issue on mq-deadline, looks a BFQ specific issue. [1] http://people.redhat.com/minlei/tests/logs/bfq-trace -- Ming [-- Attachment #2: us-bfq.patch --] [-- Type: text/plain, Size: 1689 bytes --] diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index bcb6d21baf12..2db8ac732fba 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -4362,6 +4362,10 @@ static void bfq_insert_requests(struct blk_mq_hw_ctx *hctx, rq = list_first_entry(list, struct request, queuelist); list_del_init(&rq->queuelist); + + if (hctx->queue->tag_set->flags & 0x8) + trace_printk("insert rq->%d\n", rq->internal_tag); + bfq_insert_request(hctx, rq, at_head); } } diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index c117bd8fd1f6..62efca331797 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -107,8 +107,16 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) rq = e->type->ops.mq.dispatch_request(hctx); if (!rq) { + if (q->tag_set->flags & 0x8) + trace_printk("not get rq, %d\n", + e->type->ops.mq.has_work(hctx)); blk_mq_put_dispatch_budget(hctx); break; + } else { + if (q->tag_set->flags & 0x8) + trace_printk("get rq->%d, %d\n", + rq->internal_tag, + e->type->ops.mq.has_work(hctx)); } /* diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 1cbc497e00bd..db9556662e27 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2268,6 +2268,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) shost->tag_set.cmd_size = cmd_size; shost->tag_set.numa_node = NUMA_NO_NODE; shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; + if (!strcmp(shost->hostt->name, "usb-storage")) + shost->tag_set.flags |= 0x8; shost->tag_set.flags |= BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy); shost->tag_set.driver_data = shost; ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-12-01 15:33 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-01 6:52 blk-mq + bfq IO hangs after writing partition table ivan 2017-12-01 8:50 ` Ming Lei 2017-12-01 12:18 ` Ivan Kozik 2017-12-01 13:35 ` Ming Lei 2017-12-01 14:46 ` Ivan Kozik 2017-12-01 15:33 ` Ming Lei
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.