* v4.11-rc blk-mq lockup?
@ 2017-03-27 21:44 Bart Van Assche
  2017-03-28 14:06 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2017-03-27 21:44 UTC (permalink / raw)
  To: axboe; +Cc: linux-block

Hello Jens,

If I leave the srp-test software running for a few minutes using the
following command:

# while ~bart/software/infiniband/srp-test/run_tests -d -r 30; do :; done

then after some time the following complaint appears for multiple
kworkers:

INFO: task kworker/9:0:65 blocked for more than 480 seconds.
      Tainted: G          I     4.11.0-rc4-dbg+ #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/9:0     D    0    65      2 0x00000000
Workqueue: dio/dm-0 dio_aio_complete_work
Call Trace:
 __schedule+0x3df/0xc10
 schedule+0x38/0x90
 rwsem_down_write_failed+0x2c4/0x4c0
 call_rwsem_down_write_failed+0x17/0x30
 down_write+0x5a/0x70
 __generic_file_fsync+0x43/0x90
 ext4_sync_file+0x2d0/0x550
 vfs_fsync_range+0x46/0xa0
 dio_complete+0x181/0x1b0
 dio_aio_complete_work+0x17/0x20
 process_one_work+0x208/0x6a0
 worker_thread+0x49/0x4a0
 kthread+0x107/0x140
 ret_from_fork+0x2e/0x40
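For anyone trying to reproduce this, a sketch along these lines (assuming root, and that the kernel was built with CONFIG_DETECT_HUNG_TASK) shortens the hung-task detector interval so blocked-task reports show up sooner than the 480 seconds above:

```shell
# Sketch, assuming root: shorten the hung-task detector interval so
# blocked-task reports appear sooner than the 480 seconds seen above.
# Writing 0 instead disables the check, as the message itself notes.
timeout_file=/proc/sys/kernel/hung_task_timeout_secs
if [ -w "$timeout_file" ]; then
    echo 60 > "$timeout_file"
else
    echo "need root (and CONFIG_DETECT_HUNG_TASK) to adjust $timeout_file"
fi
```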

I had not yet observed this behavior with kernel v4.10 or older. If this
happens and I check the queue state with the following script:

#!/bin/bash

cd /sys/class/block || exit $?
for dev in *; do
    if [ -e "$dev/mq" ]; then
        echo "$dev"
        for f in "$dev"/mq/*/{pending,*/rq_list}; do
            [ -e "$f" ] || continue
            if { read -r line1 && read -r line2; } <"$f"; then
                echo "$f"
                echo "$line1 $line2" >/dev/null
                head -n 9 "$f"
            fi
        done
        (
            cd /sys/kernel/debug/block >&/dev/null &&
            for d in "$dev"/mq/*; do
                [ ! -d "$d" ] && continue
                grep -q '^busy=0$' "$d/tags" && continue
                for f in "$d"/{dispatch,tags*,cpu*/rq_list}; do
                    [ -e "$f" ] && grep -aH '' "$f"
                done
            done
        )
    fi
done

then the following output appears:

sda
sdb
sdc
sdd
sdd/mq/3/dispatch:ffff880401655d00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=59, .internal_tag=-1}
sdd/mq/3/tags:nr_tags=62
sdd/mq/3/tags:nr_reserved_tags=0
sdd/mq/3/tags:active_queues=0
sdd/mq/3/tags:
sdd/mq/3/tags:bitmap_tags:
sdd/mq/3/tags:depth=62
sdd/mq/3/tags:busy=31
sdd/mq/3/tags:bits_per_word=8
sdd/mq/3/tags:map_nr=8
sdd/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37, 43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sdd/mq/3/tags:wake_batch=7
sdd/mq/3/tags:wake_index=0
sdd/mq/3/tags:ws={
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:}
sdd/mq/3/tags:round_robin=0
sdd/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdd/mq/3/cpu5/rq_list:ffff880401657440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=60, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba0000 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=0, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba1740 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=1, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba2e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=2, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba45c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=3, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=4, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba7440 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=5, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=6, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abaa2c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=7, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037ababa00 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=8, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abad140 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=9, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abae880 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=10, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369900000 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=11, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369901740 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=12, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369902e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=13, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8803699045c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=14, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369905d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=15, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369907440 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=16, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369908b80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=17, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990a2c0 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=18, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990ba00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=19, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990d140 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=20, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990e880 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=21, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b0000 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=22, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b1740 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=23, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b2e80 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=24, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b45c0 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=25, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=26, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b7440 {.cmd_flags=0xca01, .rq_flags=0x2040, .tag=27, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040, .tag=28, .internal_tag=-1}
sde
sde/mq/3/tags:nr_tags=62
sde/mq/3/tags:nr_reserved_tags=0
sde/mq/3/tags:active_queues=0
sde/mq/3/tags:
sde/mq/3/tags:bitmap_tags:
sde/mq/3/tags:depth=62
sde/mq/3/tags:busy=31
sde/mq/3/tags:bits_per_word=8
sde/mq/3/tags:map_nr=8
sde/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37, 43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sde/mq/3/tags:wake_batch=7
sde/mq/3/tags:wake_index=0
sde/mq/3/tags:ws={
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:	{.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:}
sde/mq/3/tags:round_robin=0
sde/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdf
sdg
sdh
sdi
sdj
sr0
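As a small aid when eyeballing dumps like the one above, a helper along these lines (hypothetical, not part of the script earlier in this mail) pulls just the busy count out of a raw debugfs "tags" file:

```shell
# Hypothetical helper: print the busy count from a blk-mq debugfs
# "tags" file supplied on stdin (raw file contents, i.e. lines such
# as "depth=62" and "busy=31" without a filename prefix).
busy_count() {
    sed -n 's/^busy=\([0-9][0-9]*\)$/\1/p'
}

# Example with an inline snippet in the same format:
printf 'depth=62\nbusy=31\n' | busy_count    # prints "31"
```

On a live system this would be used as, e.g., busy_count < /sys/kernel/debug/block/sdd/mq/3/tags.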

I am using the "none" scheduler:

# cat /sys/class/block/sdd/queue/scheduler
[none]
# cat /sys/class/block/sde/queue/scheduler
[none]

What is remarkable is that I see pending requests for the sd* devices
but not for any dm* device, and also that the number of busy requests (31)
is exactly half the queue depth (62). Could this indicate that the
block layer stopped processing these blk-mq queues?
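Spelled out with the numbers from the dumps above (the check itself is only illustrative):

```shell
# Illustrative check: with depth=62 and busy=31 taken from the sdd/sde
# tags dumps above, the busy count is exactly half the queue depth.
depth=62
busy=31
if [ $((busy * 2)) -eq "$depth" ]; then
    echo "busy ($busy) is exactly half of depth ($depth)"
fi
```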

If this happens and I run the following command to trigger SRP logout:

# for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete; done

then the test that was running finishes and reports that removing the
multipath device failed, and echo w >/proc/sysrq-trigger produces the
following output:

sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
systemd-udevd   D    0 14490    508 0x00000106
Call Trace:
 __schedule+0x3df/0xc10
 schedule+0x38/0x90
 io_schedule+0x11/0x40
 __lock_page+0x10c/0x140
 truncate_inode_pages_range+0x45d/0x780
 truncate_inode_pages+0x10/0x20
 kill_bdev+0x30/0x40
 __blkdev_put+0x71/0x220
 blkdev_put+0x49/0x170
 blkdev_close+0x20/0x30
 __fput+0xe8/0x1f0
 ____fput+0x9/0x10
 task_work_run+0x80/0xb0
 do_exit+0x30c/0xc70
 do_group_exit+0x4b/0xc0
 get_signal+0x2c2/0x930
 do_signal+0x23/0x670
 exit_to_usermode_loop+0x5d/0xa0
 do_syscall_64+0xd5/0x140
 entry_SYSCALL64_slow_path+0x25/0x25

Does this indicate that truncate_inode_pages_range() is waiting
because a block layer queue got stuck?

The kernel tree I used in my tests is the result of merging the
following commits:
* commit 3dca2c2f3d3b from git://git.kernel.dk/linux-block.git
  ("Merge branch 'for-4.12/block' into for-next")
* commit f88ab0c4b481 from git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
  ("scsi: libsas: fix ata xfer length")
* commit ad0376eb1483 from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  ("Merge tag 'edac_for_4.11_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp")

Please let me know if you need more information.

Thanks,

Bart.

Thread overview: 5+ messages
2017-03-27 21:44 v4.11-rc blk-mq lockup? Bart Van Assche
2017-03-28 14:06 ` Jens Axboe
2017-03-28 16:25   ` Bart Van Assche
2017-03-28 16:30     ` Jens Axboe
2017-03-29 20:36       ` Bart Van Assche
