All of lore.kernel.org
* v4.15 and I/O hang with BFQ
@ 2018-01-30  8:05 Oleksandr Natalenko
  2018-01-30  8:19 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Oleksandr Natalenko @ 2018-01-30  8:05 UTC (permalink / raw)
  To: paolo.valente
  Cc: Ivan Kozik, Ming Lei, linux-block, linux-kernel, bfq-iosched,
	Jens Axboe, linus.walleij

Hi, Paolo, Ivan, Ming et al.

It looks like I've just encountered the issue Ivan has already described 
in [1]. Since I'm able to reproduce it reliably in a VM, I'd like to 
draw more attention to it.

First, I'm using v4.15 kernel with all pending BFQ fixes:

===
2ad909a300c4 bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
83c97a310f83 block, bfq: release oom-queue ref to root group on exit
5b9eb4716af1 block, bfq: put async queues for root bfq groups too
3c5529454a27 block, bfq: limit sectors served with interactive weight raising
e6c72be3486b block, bfq: limit tags for writes and async I/O
e579b91d96ce block, bfq: increase threshold to deem I/O as random
f6cbc16aac88 block, bfq: remove superfluous check in queue-merging setup
8045d8575183 block, bfq: let a queue be merged only shortly after starting I/O
242954975f5e block, bfq: check low_latency flag in bfq_bfqq_save_state()
8349c1bddd95 block, bfq: add missing rq_pos_tree update on rq removal
558200440cb9 block, bfq: fix occurrences of request finish method's old name
6ed2f47ee870 block, bfq: consider also past I/O in soft real-time detection
e5f295dd18f2 block, bfq: remove batches of confusing ifdefs
===

Next, I boot an Arch VM with this kernel and an emulated USB stick 
attached:

===
qemu-system-x86_64 -display gtk,gl=on -machine q35,accel=kvm \
    -cpu host,+vmx -enable-kvm \
    -drive if=pflash,format=raw,readonly,file=/mnt/vms/ovmf/code.img \
    -drive if=pflash,format=raw,file=/mnt/vms/ovmf/vars.img \
    -cdrom /mnt/vms/ovmf/shell.iso \
    -netdev user,id=user.0 -device virtio-net,netdev=user.0 \
    -usb -device nec-usb-xhci,id=xhci -device usb-tablet,bus=xhci.0 \
    -serial stdio -m 512 -hda sda.img -hdb sdb.img -smp 4 \
    -drive if=none,id=stick,file=usb.img \
    -device usb-storage,bus=xhci.0,drive=stick
===

Within the VM itself I use a udev rule to set the I/O scheduler:

===
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="bfq"
===
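Whether the rule actually took effect can be checked at runtime: the
kernel brackets the currently active scheduler in the sysfs scheduler
file. A small sketch (the device name and the sample line below are
assumptions, not output from this VM):

```shell
# Parse a sysfs scheduler line such as "mq-deadline kyber [bfq] none";
# the bracketed entry is the scheduler currently in use.
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# At runtime one would feed it the real sysfs contents, e.g.:
#   active_sched "$(cat /sys/block/sda/queue/scheduler)"
active_sched "mq-deadline kyber [bfq] none"   # prints: bfq
```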

Things boot and work fine until I try to create a partition on the 
emulated USB stick with cfdisk. On writing a new partition table it 
blocks, and I get the following stacktrace:

===
[  225.670702] INFO: task cfdisk:416 blocked for more than 20 seconds.
[  225.681427]       Not tainted 4.15.0-pf1 #1
[  225.685341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  225.691910] cfdisk          D    0   416    407 0x00000000
[  225.700777] Call Trace:
[  225.703654]  ? __schedule+0x35f/0x1000
[  225.706990]  ? __switch_to_asm+0x40/0x70
[  225.709943]  ? __switch_to_asm+0x34/0x70
[  225.712224]  ? __switch_to_asm+0x40/0x70
[  225.714225]  ? __switch_to_asm+0x40/0x70
[  225.716790]  schedule+0x32/0xc0
[  225.718355]  io_schedule+0x12/0x40
[  225.719747]  wait_on_page_bit+0xea/0x130
[  225.721266]  ? add_to_page_cache_lru+0xe0/0xe0
[  225.722622]  __filemap_fdatawait_range+0xbf/0x110
[  225.724625]  ? preempt_count_sub+0x50/0x90
[  225.726591]  ? sync_inodes_one_sb+0x20/0x20
[  225.727507]  filemap_fdatawait_keep_errors+0x1a/0x40
[  225.728491]  iterate_bdevs+0xa7/0x140
[  225.729304]  sys_sync+0x7c/0xb0
[  225.730095]  entry_SYSCALL_64_fastpath+0x20/0x83
[  225.732420] RIP: 0033:0x7f2631cf4a17
===

I've tried it several times with the same result. Next, I reboot the 
system, change the scheduler to Kyber, and then I can create a new 
partition successfully.

I also went further. With Kyber active, so that the emulated USB stick 
works properly, I create RAID10 on it, then an LVM PV, then a VG, then 
an LV, then XFS on top of that. Then I set the I/O scheduler back to 
BFQ and reboot, and on reboot mdadm gets blocked while discovering 
devices, producing the following stacktrace:

===
[   41.350763] INFO: task mdadm:354 blocked for more than 20 seconds.
[   41.356154]       Not tainted 4.15.0-pf1 #1
[   41.358674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[   41.363046] mdadm           D    0   354    352 0x00000100
[   41.368700] Call Trace:
[   41.370417]  ? __schedule+0x35f/0x1000
[   41.372668]  ? blk_queue_exit+0x3e/0x60
[   41.374816]  ? generic_make_request+0x12f/0x2d0
[   41.377363]  schedule+0x32/0xc0
[   41.380416]  io_schedule+0x12/0x40
[   41.382698]  __blkdev_direct_IO_simple+0x206/0x360
[   41.382707]  ? bdget+0x120/0x120
[   41.382714]  ? blkdev_direct_IO+0x3a5/0x3f0
[   41.382718]  blkdev_direct_IO+0x3a5/0x3f0
[   41.382724]  ? current_time+0x18/0x70
[   41.382731]  ? __atime_needs_update+0x7f/0x190
[   41.382743]  ? generic_file_read_iter+0x8c/0x9d0
[   41.382747]  generic_file_read_iter+0x8c/0x9d0
[   41.382759]  ? __seccomp_filter+0x3b/0x260
[   41.382765]  __vfs_read+0xf5/0x170
[   41.382772]  vfs_read+0x91/0x130
[   41.382778]  SyS_read+0x52/0xc0
[   41.382819]  do_syscall_64+0x67/0x1a0
[   41.382828]  entry_SYSCALL64_slow_path+0x25/0x25
[   41.382833] RIP: 0033:0x7f4b8088a3a1
[   41.382836] RSP: 002b:00007ffdd681def8 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
[   41.382841] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007f4b8088a3a1
[   41.382844] RDX: 0000000000001000 RSI: 00000000007f0000 RDI: 
0000000000000003
[   41.382846] RBP: 0000000000000003 R08: 0000000000000003 R09: 
00007f4b80b55050
[   41.382848] R10: 000000000000038b R11: 0000000000000246 R12: 
00000000007eeb10
[   41.382850] R13: 0000000000000000 R14: 00000000007f0000 R15: 
00000000007eeb10
===

If I change the I/O scheduler back to Kyber, the system boots okay.
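The second reproducer can be sketched as a script. This is a dry-run
sketch rather than the exact commands used; the device names and the
two-device RAID10 geometry are assumptions:

```shell
# Dry-run wrapper: print each step instead of executing it.
# To actually run the (destructive!) steps, change this to: run() { "$@"; }
run() { printf '+ %s\n' "$*"; }

run mdadm --create /dev/md0 --level=10 --raid-devices=2 /dev/sdc /dev/sdd
run pvcreate /dev/md0
run vgcreate vg0 /dev/md0
run lvcreate -l 100%FREE -n lv0 vg0
run mkfs.xfs /dev/vg0/lv0
# switch the underlying disk back to bfq, then reboot to hit the hang
run sh -c 'echo bfq > /sys/block/sdc/queue/scheduler'
```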

Again, since I'm able to reproduce the issue reliably, let me know what 
information I should also provide, and as usual I'm ready to test any 
patches or suggestions.

Important note: I was unable to reproduce this on v4.14 with the same 
set of BFQ patches (exactly the same case as Ivan has already 
described).

Thanks.

Regards,
   Oleksandr

[1] https://lkml.org/lkml/2017/12/1/80

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: v4.15 and I/O hang with BFQ
  2018-01-30  8:05 v4.15 and I/O hang with BFQ Oleksandr Natalenko
@ 2018-01-30  8:19 ` Ming Lei
  2018-01-30 14:30   ` Oleksandr Natalenko
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2018-01-30  8:19 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: paolo.valente, Ivan Kozik, linux-block, linux-kernel,
	bfq-iosched, Jens Axboe, linus.walleij

Hi,

On Tue, Jan 30, 2018 at 09:05:26AM +0100, Oleksandr Natalenko wrote:
> Hi, Paolo, Ivan, Ming et al.
> 
> It looks like I've just encountered the issue Ivan has already described in
> [1]. Since I'm able to reproduce it reliably in a VM, I'd like to draw more
> attention to it.
> 
> First, I'm using v4.15 kernel with all pending BFQ fixes:
> 
> ===
> 2ad909a300c4 bfq-iosched: don't call bfqg_and_blkg_put for
> !CONFIG_BFQ_GROUP_IOSCHED
> 83c97a310f83 block, bfq: release oom-queue ref to root group on exit
> 5b9eb4716af1 block, bfq: put async queues for root bfq groups too
> 3c5529454a27 block, bfq: limit sectors served with interactive weight
> raising
> e6c72be3486b block, bfq: limit tags for writes and async I/O
> e579b91d96ce block, bfq: increase threshold to deem I/O as random
> f6cbc16aac88 block, bfq: remove superfluous check in queue-merging setup
> 8045d8575183 block, bfq: let a queue be merged only shortly after starting
> I/O
> 242954975f5e block, bfq: check low_latency flag in bfq_bfqq_save_state()
> 8349c1bddd95 block, bfq: add missing rq_pos_tree update on rq removal
> 558200440cb9 block, bfq: fix occurrences of request finish method's old name
> 6ed2f47ee870 block, bfq: consider also past I/O in soft real-time detection
> e5f295dd18f2 block, bfq: remove batches of confusing ifdefs
> ===
> 
> Next, I boot an Arch VM with this kernel and emulated USB stick attached:
> 
> ===
> qemu-system-x86_64 -display gtk,gl=on -machine q35,accel=kvm -cpu host,+vmx
> -enable-kvm -drive if=pflash,format=raw,readonly,file=/mnt/vms/ovmf/code.img
> -drive if=pflash,format=raw,file=/mnt/vms/ovmf/vars.img -cdrom
> /mnt/vms/ovmf/shell.iso -netdev user,id=user.0 -device
> virtio-net,netdev=user.0 -usb -device nec-usb-xhci,id=xhci -device
> usb-tablet,bus=xhci.0 -serial stdio -m 512 -hda sda.img -hdb sdb.img -smp 4
> -drive if=none,id=stick,file=usb.img -device
> usb-storage,bus=xhci.0,drive=stick
> ===
> 
> Within the VM itself I use udev rule to set the I/O scheduler:
> 
> ===
> ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="bfq"
> ===

We know of an I/O hang issue with BFQ on USB storage with blk-mq, and 
last time I found it to be inside BFQ. You can try the debug patch in 
the following link [1] to see whether it is the same as the previous 
report [1][2]:

[1] https://marc.info/?l=linux-block&m=151214241518562&w=2
[2] https://bugzilla.kernel.org/show_bug.cgi?id=198023

If you aren't sure whether they are the same, please post the trace 
somewhere, and I can check whether it is a new bug.

Or Paolo should know if the issue is fixed or not in V4.15.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: v4.15 and I/O hang with BFQ
  2018-01-30  8:19 ` Ming Lei
@ 2018-01-30 14:30   ` Oleksandr Natalenko
  2018-01-30 14:40     ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Oleksandr Natalenko @ 2018-01-30 14:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: paolo.valente, Ivan Kozik, linux-block, linux-kernel,
	bfq-iosched, Jens Axboe, linus.walleij

Hi.

30.01.2018 09:19, Ming Lei wrote:
> Hi,
> We know of an I/O hang issue with BFQ on USB storage with blk-mq, and
> last time I found it to be inside BFQ. You can try the debug patch in
> the following link [1] to see whether it is the same as the previous
> report [1][2]:
> 
> [1] https://marc.info/?l=linux-block&m=151214241518562&w=2
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=198023
> 
> If you aren't sure whether they are the same, please post the trace
> somewhere, and I can check whether it is a new bug.

OK. First, I got two more stacktraces without tracing; then I rebooted 
with your patch and captured the tracing output.

Before reboot, cfdisk:

===
[  266.630386] INFO: task cfdisk:437 blocked for more than 20 seconds.
[  266.640950]       Not tainted 4.15.0-pf1 #1
[  266.645131] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  266.651789] cfdisk          D    0   437    410 0x00000000
[  266.661331] Call Trace:
[  266.664517]  ? __schedule+0x35f/0x1000
[  266.668660]  ? bio_alloc_bioset+0xc7/0x1e0
[  266.672330]  ? submit_bh_wbc+0x162/0x190
[  266.678034]  schedule+0x32/0xc0
[  266.681293]  io_schedule+0x12/0x40
[  266.685230]  wait_on_page_bit+0xea/0x130
[  266.689015]  ? add_to_page_cache_lru+0xe0/0xe0
[  266.693456]  ? blkdev_writepages+0x30/0x30
[  266.695521]  do_read_cache_page+0x167/0x350
[  266.697160]  read_dev_sector+0x28/0xc0
[  266.698685]  scsi_bios_ptable+0x4e/0x120 [scsi_mod]
[  266.700156]  scsicam_bios_param+0x16/0x1a0 [scsi_mod]
[  266.701868]  sd_getgeo+0xbf/0xd0 [sd_mod]
[  266.703388]  blkdev_ioctl+0x178/0x990
[  266.704818]  block_ioctl+0x39/0x40
[  266.706381]  do_vfs_ioctl+0xa4/0x630
[  266.708584]  ? __fput+0x131/0x1e0
[  266.710184]  ? preempt_count_add+0x68/0xa0
[  266.711762]  ? _raw_spin_unlock_irq+0x1d/0x30
[  266.713304]  SyS_ioctl+0x74/0x80
[  266.714502]  ? exit_to_usermode_loop+0x39/0xa0
[  266.717352]  entry_SYSCALL_64_fastpath+0x20/0x83
[  266.718857] RIP: 0033:0x7fc482064d87
===

Blocked kworker:

===
[  389.511083] INFO: task kworker/u8:5:178 blocked for more than 20 
seconds.
[  389.516454]       Not tainted 4.15.0-pf1 #1
[  389.520091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  389.524726] kworker/u8:5    D    0   178      2 0x80000000
[  389.528321] Workqueue: events_freezable_power_ disk_events_workfn
[  389.532228] Call Trace:
[  389.534909]  ? __schedule+0x35f/0x1000
[  389.541906]  ? blk_mq_sched_dispatch_requests+0x116/0x190
[  389.546561]  ? __sbitmap_get_word+0x2a/0x80
[  389.551167]  ? sbitmap_get_shallow+0x5c/0xa0
[  389.555633]  schedule+0x32/0xc0
[  389.559803]  io_schedule+0x12/0x40
[  389.564504]  blk_mq_get_tag+0x181/0x2a0
[  389.572541]  ? wait_woken+0x80/0x80
[  389.576008]  blk_mq_get_request+0xf9/0x410
[  389.579998]  blk_mq_alloc_request+0x7e/0xe0
[  389.584824]  blk_get_request_flags+0x40/0x160
[  389.588917]  scsi_execute+0x38/0x1e0 [scsi_mod]
[  389.593079]  scsi_test_unit_ready+0x5d/0xd0 [scsi_mod]
[  389.596966]  sd_check_events+0xf5/0x1a0 [sd_mod]
[  389.602558]  disk_check_events+0x69/0x150
[  389.604822]  process_one_work+0x1df/0x420
[  389.606584]  worker_thread+0x2b/0x3d0
[  389.608175]  ? process_one_work+0x420/0x420
[  389.609833]  kthread+0x113/0x130
[  389.611327]  ? kthread_create_on_node+0x70/0x70
[  389.612852]  ret_from_fork+0x35/0x40
===

After reboot, tracing info:

===
    systemd-udevd-269   [000] ...1     4.309198: blk_mq_do_dispatch_sched: get rq->0, 1
     kworker/0:1H-174   [000] ....     4.309380: blk_mq_do_dispatch_sched: not get rq, 1
     kworker/u8:2-167   [000] ....     4.309562: bfq_insert_requests: insert rq->0
     kworker/u8:2-167   [000] ...1     4.309563: blk_mq_do_dispatch_sched: get rq->0, 1
     kworker/0:1H-174   [000] ....     4.309694: blk_mq_do_dispatch_sched: not get rq, 1
     kworker/u8:2-167   [000] ....     4.309879: bfq_insert_requests: insert rq->0
     kworker/u8:2-167   [000] ...1     4.309880: blk_mq_do_dispatch_sched: get rq->0, 1
     kworker/0:1H-174   [000] ....     4.310001: blk_mq_do_dispatch_sched: not get rq, 1
    systemd-udevd-271   [000] ....     4.311033: bfq_insert_requests: insert rq->0
    systemd-udevd-271   [000] ...1     4.311037: blk_mq_do_dispatch_sched: not get rq, 1
           cfdisk-408   [000] ....    13.484220: bfq_insert_requests: insert rq->1
     kworker/0:1H-174   [000] ....    13.484253: blk_mq_do_dispatch_sched: not get rq, 1
===
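The hang signature in this trace is an insertion with no subsequent
successful dispatch (rq->1 from cfdisk is inserted, but every later
dispatch attempt comes back empty). Assuming the one-line-per-event
message format from Ming's debug patch, the tally can be sketched as:

```shell
# Summarize inserts vs successful dispatches in the debug trace.
# "get rq" lines mean the scheduler handed out a request to dispatch;
# "not get rq" means dispatch found nothing, despite pending inserts.
summarize() {
    awk '/bfq_insert_requests: insert/ { ins++ }
         /blk_mq_do_dispatch_sched: get rq/ { got++ }
         END { printf "inserted=%d dispatched=%d\n", ins + 0, got + 0 }'
}

# A stuck queue shows more insertions than dispatches:
printf '%s\n' \
    'cfdisk-408 bfq_insert_requests: insert rq->1' \
    'kworker/0:1H-174 blk_mq_do_dispatch_sched: not get rq, 1' | summarize
# prints: inserted=1 dispatched=0
```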

Looks the same, right?

> Or Paolo should know if the issue is fixed or not in V4.15.

So, Paolo :)?

Regards,
   Oleksandr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: v4.15 and I/O hang with BFQ
  2018-01-30 14:30   ` Oleksandr Natalenko
@ 2018-01-30 14:40     ` Ming Lei
  2018-01-30 15:40         ` Paolo Valente
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2018-01-30 14:40 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: paolo.valente, Ivan Kozik, linux-block, linux-kernel,
	bfq-iosched, Jens Axboe, linus.walleij

On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
> Hi.
> 
...
>    systemd-udevd-271   [000] ....     4.311033: bfq_insert_requests: insert
> rq->0
>    systemd-udevd-271   [000] ...1     4.311037: blk_mq_do_dispatch_sched:
> not get rq, 1
>           cfdisk-408   [000] ....    13.484220: bfq_insert_requests: insert
> rq->1
>     kworker/0:1H-174   [000] ....    13.484253: blk_mq_do_dispatch_sched:
> not get rq, 1
> ===
> 
> Looks the same, right?

Yeah, same as before.

-- 
Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: v4.15 and I/O hang with BFQ
  2018-01-30 14:40     ` Ming Lei
@ 2018-01-30 15:40         ` Paolo Valente
  0 siblings, 0 replies; 8+ messages in thread
From: Paolo Valente @ 2018-01-30 15:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oleksandr Natalenko, Ivan Kozik, linux-block,
	Linux Kernel Mailing List,
	'Paolo Valente' via bfq-iosched, Jens Axboe,
	Linus Walleij, SERENA ZIVIANI



> On 30 Jan 2018, at 15:40, Ming Lei <ming.lei@redhat.com> wrote:
> 
> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>> Hi.
>> 
> ...
>>   systemd-udevd-271   [000] ....     4.311033: bfq_insert_requests: insert
>> rq->0
>>   systemd-udevd-271   [000] ...1     4.311037: blk_mq_do_dispatch_sched:
>> not get rq, 1
>>          cfdisk-408   [000] ....    13.484220: bfq_insert_requests: insert
>> rq->1
>>    kworker/0:1H-174   [000] ....    13.484253: blk_mq_do_dispatch_sched:
>> not get rq, 1
>> ===
>> 
>> Looks the same, right?
> 
> Yeah, same as before.
> 

Hi guys,
sorry for the delay with this fix.  We are proceeding very slowly on
this, because I'm super busy.  Anyway, now I can at least explain in
more detail the cause that leads to this hang.  Commit 'a6a252e64914
("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")'
makes all non-flush re-prepared requests be re-inserted into the I/O
scheduler.  With this change, I/O schedulers may get the same request
inserted again, even several times, without a finish_request invoked
on the request before each re-insertion.

For the I/O scheduler, every such re-prepared request is equivalent
to the insertion of a new request. For schedulers like mq-deadline
or kyber this fact causes no problems. In contrast, it confuses a stateful
scheduler like BFQ, which preserves state for an I/O request until
finish_request is invoked on it. In particular, BFQ has no way
to know that the above re-insertions concern the same, already dispatched
request. So it may get stuck waiting for the completion of these
re-inserted requests forever, thus preventing any other queue of
requests from being served.
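As a toy illustration of that bookkeeping mismatch (a sketch, not BFQ
code): a stateful scheduler that counts a request on every insertion
but uncounts it only on finish_request is left with a phantom in-flight
request after a single re-insertion, and waits on it forever:

```shell
# Toy model of a stateful scheduler's bookkeeping (not actual BFQ code):
# every insertion counts a request, only finish_request uncounts it.
inflight=0
insert_request() { inflight=$((inflight + 1)); }
finish_request() { inflight=$((inflight - 1)); }

insert_request        # request inserted and dispatched, then re-prepared...
insert_request        # ...and re-inserted by blk-mq: counted a second time
finish_request        # the one real completion
echo "inflight=$inflight"   # prints: inflight=1 -- a phantom request remains
```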

We are trying to address this issue by adding the hook requeue_request
to the bfq interface.

Unfortunately, with our current implementation of requeue_request in
place, bfq eventually gets to an incoherent state.  This is apparently
caused by a requeue of an I/O request, immediately followed by a
completion of the same request.  This seems rather absurd, and drives
bfq crazy.  But this is something for which we don't have definite
results yet.

We're working on it, sorry again for the delay.

Thanks,
Paolo

> -- 
> Ming

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: v4.15 and I/O hang with BFQ
  2018-01-30 15:40         ` Paolo Valente
@ 2018-02-05 19:07           ` Paolo Valente
  -1 siblings, 0 replies; 8+ messages in thread
From: Paolo Valente @ 2018-02-05 19:07 UTC (permalink / raw)
  To: bfq-iosched
  Cc: Ming Lei, Oleksandr Natalenko, Ivan Kozik, linux-block,
	Linux Kernel Mailing List, Jens Axboe, Linus Walleij,
	SERENA ZIVIANI



> On 30 Jan 2018, at 16:40, Paolo Valente <paolo.valente@linaro.org> wrote:
> 
> 
> 
>> On 30 Jan 2018, at 15:40, Ming Lei <ming.lei@redhat.com> wrote:
>> 
>> On Tue, Jan 30, 2018 at 03:30:28PM +0100, Oleksandr Natalenko wrote:
>>> Hi.
>>> 
>> ...
>>>  systemd-udevd-271   [000] ....     4.311033: bfq_insert_requests: insert
>>> rq->0
>>>  systemd-udevd-271   [000] ...1     4.311037: blk_mq_do_dispatch_sched:
>>> not get rq, 1
>>>         cfdisk-408   [000] ....    13.484220: bfq_insert_requests: insert
>>> rq->1
>>>   kworker/0:1H-174   [000] ....    13.484253: blk_mq_do_dispatch_sched:
>>> not get rq, 1
>>> ===
>>> 
>>> Looks the same, right?
>> 
>> Yeah, same as before.
>> 
> 
> Hi guys,
> sorry for the delay with this fix.  We are proceeding very slowly on
> this, because I'm super busy.  Anyway, now I can at least explain in
> more detail the cause that leads to this hang.  Commit 'a6a252e64914
> ("blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ")'
> makes all non-flush re-prepared requests be re-inserted into the I/O
> scheduler.  With this change, I/O schedulers may get the same request
> inserted again, even several times, without a finish_request invoked
> on the request before each re-insertion.
> 
> For the I/O scheduler, every such re-prepared request is equivalent
> to the insertion of a new request. For schedulers like mq-deadline
> or kyber this fact causes no problems. In contrast, it confuses a stateful
> scheduler like BFQ, which preserves state for an I/O request until
> finish_request is invoked on it. In particular, BFQ has no way
> to know that the above re-insertions concern the same, already dispatched
> request. So it may get stuck waiting for the completion of these
> re-inserted requests forever, thus preventing any other queue of
> requests from being served.
> 
> We are trying to address this issue by adding the hook requeue_request
> to the bfq interface.
> 
> Unfortunately, with our current implementation of requeue_request in
> place, bfq eventually gets to an incoherent state.  This is apparently
> caused by a requeue of an I/O request, immediately followed by a
> completion of the same request.  This seems rather absurd, and drives
> bfq crazy.  But this is something for which we don't have definite
> results yet.
> 
> We're working on it, sorry again for the delay.
> 

Ok, patch arriving ... Please test it.

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> -- 
>> Ming
> 
> -- 
> You received this message because you are subscribed to the Google Groups "bfq-iosched" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to bfq-iosched+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-02-05 19:07 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-30  8:05 v4.15 and I/O hang with BFQ Oleksandr Natalenko
2018-01-30  8:19 ` Ming Lei
2018-01-30 14:30   ` Oleksandr Natalenko
2018-01-30 14:40     ` Ming Lei
2018-01-30 15:40       ` Paolo Valente
2018-01-30 15:40         ` Paolo Valente
2018-02-05 19:07         ` Paolo Valente
2018-02-05 19:07           ` Paolo Valente
