All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Bart Van Assche <Bart.VanAssche@wdc.com>,
	"ming.lei@redhat.com" <ming.lei@redhat.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"osandov@fb.com" <osandov@fb.com>,
	"john.garry@huawei.com" <john.garry@huawei.com>,
	"loberman@redhat.com" <loberman@redhat.com>
Subject: Re: [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget()
Date: Tue, 7 Nov 2017 15:06:31 -0700	[thread overview]
Message-ID: <a6a505f5-d374-2878-6c55-116255795346@kernel.dk> (raw)
In-Reply-To: <292e3f7d-cfe3-840a-5a5e-d00e01239152@kernel.dk>

On 11/07/2017 10:36 AM, Jens Axboe wrote:
> On 11/07/2017 10:10 AM, Jens Axboe wrote:
>> On 11/07/2017 09:29 AM, Jens Axboe wrote:
>>> On 11/07/2017 09:20 AM, Bart Van Assche wrote:
>>>> On Tue, 2017-11-07 at 10:11 +0800, Ming Lei wrote:
>>>>> If you can reproduce, please provide me at least the following log
>>>>> first:
>>>>>
>>>>> 	find /sys/kernel/debug/block -name tags | xargs cat | grep busy
>>>>>
>>>>> If any pending requests arn't completed, please provide the related
>>>>> info in dbgfs about where is the request.
>>>>
>>>> Every time I ran the above or a similar command its output was empty. I
>>>> assume that's because the hang usually occurs in a phase where these debugfs
>>>> attributes either have not yet been created or have already disappeared.
>>>
>>> Bart, do you still see a hang with the patch that fixes the tag leak when
>>> we fail to get a dispatch budget?
>>>
>>> https://marc.info/?l=linux-block&m=151004881411480&w=2
>>>
>>> I've run a lot of stability testing here, and I haven't run into any
>>> issues. This is with shared tags as well. So if you still see the failure
>>> case with the current tree AND the above patch, then I'll try and get
>>> a test case setup that hits it too so we can get to the bottom of this.
>>
>> Ming, I managed to reproduce the hang using null_blk. Note this is
>> WITHOUT the patch mentioned above, running with that now.
>>
>> # modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 submit_queues=1 hw_queue_depth=1
>>
>> and using this fio file:
>>
>> [global]
>> bs=4k
>> rw=randread
>> norandommap
>> direct=1
>> ioengine=libaio
>> iodepth=4
>>
>> [nullb0]
>> filename=/dev/nullb0
>> [nullb1]
>> filename=/dev/nullb1
>> [nullb2]
>> filename=/dev/nullb2
>> [nullb3]
>> filename=/dev/nullb3
>>
>> it seemed to keep running, but it hung when exiting. The troublesome
>> device was nullb1:
>>
>> [  492.513374] INFO: task fio:3263 blocked for more than 120 seconds.
>> [  492.520782]       Not tainted 4.14.0-rc7+ #499
>> [  492.526247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  492.535904] fio             D13208  3263   3211 0x00000000
>> [  492.542535] Call Trace:
>> [  492.545764]  __schedule+0x279/0x720
>> [  492.550151]  schedule+0x33/0x90
>> [  492.554145]  io_schedule+0x16/0x40
>> [  492.558435]  blk_mq_get_tag+0x148/0x250
>> [  492.563211]  ? finish_wait+0x90/0x90
>> [  492.567693]  blk_mq_get_request+0xf0/0x3e0
>> [  492.572760]  blk_mq_make_request+0xe2/0x690
>> [  492.577913]  generic_make_request+0xfc/0x2f0
>> [  492.583177]  submit_bio+0x64/0x120
>> [  492.587475]  ? set_page_dirty_lock+0x4b/0x60
>> [  492.592736]  ? submit_bio+0x64/0x120
>> [  492.597309]  ? bio_set_pages_dirty+0x55/0x60
>> [  492.602570]  blkdev_direct_IO+0x388/0x3c0
>> [  492.607546]  ? free_ioctx_users+0xe0/0xe0
>> [  492.612511]  ? blk_mq_dispatch_rq_list+0x238/0x3a0
>> [  492.618353]  ? _raw_spin_unlock+0xe/0x20
>> [  492.623227]  generic_file_read_iter+0xb3/0xa00
>> [  492.628682]  ? generic_file_read_iter+0xb3/0xa00
>> [  492.634334]  ? security_file_permission+0x9b/0xc0
>> [  492.640114]  blkdev_read_iter+0x35/0x40
>> [  492.644877]  aio_read+0xc5/0x120
>> [  492.648973]  ? aio_read_events+0x24c/0x340
>> [  492.654124]  ? __might_sleep+0x4a/0x80
>> [  492.658800]  do_io_submit+0x47c/0x5e0
>> [  492.663373]  ? do_io_submit+0x47c/0x5e0
>> [  492.668234]  SyS_io_submit+0x10/0x20
>> [  492.672715]  ? SyS_io_submit+0x10/0x20
>> [  492.677394]  entry_SYSCALL_64_fastpath+0x13/0x94
>> [  492.683039] RIP: 0033:0x7f83d1871717
>> [  492.687521] RSP: 002b:00007ffd38fe5a88 EFLAGS: 00000202 ORIG_RAX: 00000000000000d1
>> [  492.696969] RAX: ffffffffffffffda RBX: 00007f83b6972b50 RCX: 00007f83d1871717
>> [  492.705423] RDX: 0000000001f41418 RSI: 0000000000000001 RDI: 00007f83e4d36000
>> [  492.713889] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000001f3f2e0
>> [  492.722352] R10: 0000000000001000 R11: 0000000000000202 R12: 00007ffd38fe5be0
>> [  492.730827] R13: 00007f83b6972b01 R14: 00007f83b69824b8 R15: 00007f83b6982368
>>
>> and if we look at the debug entries, it's waiting on a scheduler tag:
>>
>> sched_tags:nr_tags=2
>> sched_tags:nr_reserved_tags=0
>> sched_tags:active_queues=0
>> sched_tags:bitmap_tags:
>> sched_tags:depth=2
>> sched_tags:busy=2
>> sched_tags:bits_per_word=64
>> sched_tags:map_nr=1
>> sched_tags:alloc_hint={0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0}
>> sched_tags:wake_batch=1
>> sched_tags:wake_index=4
>> sched_tags:ws={
>> sched_tags:	{.wait_cnt=-102, .wait=inactive},
>> sched_tags:	{.wait_cnt=-126, .wait=inactive},
>> sched_tags:	{.wait_cnt=-72, .wait=inactive},
>> sched_tags:	{.wait_cnt=-96, .wait=inactive},
>> sched_tags:	{.wait_cnt=-134, .wait=inactive},
>> sched_tags:	{.wait_cnt=-116, .wait=inactive},
>> sched_tags:	{.wait_cnt=-90, .wait=inactive},
>> sched_tags:	{.wait_cnt=-115, .wait=active},
>> sched_tags:}
>> sched_tags:round_robin=0
>> sched_tags_bitmap:00000000: 03
>>
>> with SCHED_RESTART being set:
>>
>> state:SCHED_RESTART
>>
>> and with the driver tags being idle:
>>
>> tags:nr_tags=1
>> tags:nr_reserved_tags=0
>> tags:active_queues=0
>> tags:bitmap_tags:
>> tags:depth=1
>> tags:busy=0
>> tags:bits_per_word=64
>> tags:map_nr=1
>> tags:alloc_hint={0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
>> tags:wake_batch=1
>> tags:wake_index=3
>> tags:ws={
>> tags:	{.wait_cnt=-48, .wait=inactive},
>> tags:	{.wait_cnt=-39, .wait=inactive},
>> tags:	{.wait_cnt=-50, .wait=inactive},
>> tags:	{.wait_cnt=-47, .wait=inactive},
>> tags:	{.wait_cnt=-25, .wait=inactive},
>> tags:	{.wait_cnt=-24, .wait=inactive},
>> tags:	{.wait_cnt=-47, .wait=inactive},
>> tags:	{.wait_cnt=-47, .wait=inactive},
>> tags:}
> 
> Unsurprisingly (since this is sched_tags starvation) this still happens
> with the patch. Same trace as above. Note that dispatch has two requests
> sitting ready:
> 
> ffff881ff5ab1000 {.op=READ, .cmd_flags=, .rq_flags=STARTED|IO_STAT, .atomic_flags=COMPLETE, .tag=-1, .internal_tag=0}
> ffff881ff5ab1240 {.op=READ, .cmd_flags=, .rq_flags=STARTED|IO_STAT, .atomic_flags=COMPLETE, .tag=-1, .internal_tag=1}
> 
> while we're blocked trying to get a new sched tag off the regular
> blk_mq_make_request() path. It looks like a missing restart of the hctx.

Just to keep everyone in the loop, this bug is not new to
for-4.15/block, nor is it new to the current 4.41-rc or 4.13. So it's
probably different to what Bart is hitting, but it's a bug none the
less...

To me, this looks like a race between marking the queue as needing a
restart, and the driver tag being released and re-running the queue. If
I do:

# echo run > /sys/kernel/debug/block/nullb1/state

then it starts just fine. So there must be a race between marking the
need for a restart (when getting the driver tag fails), and when one of
the shared tag queues finishes a request and releases the driver tag.

-- 
Jens Axboe

  reply	other threads:[~2017-11-07 22:06 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-04  1:55 [PATCH] SCSI: don't get target/host busy_count in scsi_mq_get_budget() Ming Lei
2017-11-04 14:19 ` Jens Axboe
2017-11-06 19:45   ` Bart Van Assche
2017-11-06 19:45     ` Bart Van Assche
2017-11-07  2:11     ` Ming Lei
2017-11-07 16:20       ` Bart Van Assche
2017-11-07 16:20         ` Bart Van Assche
2017-11-07 16:29         ` Jens Axboe
2017-11-07 17:10           ` Jens Axboe
2017-11-07 17:36             ` Jens Axboe
2017-11-07 22:06               ` Jens Axboe [this message]
2017-11-07 22:34                 ` Bart Van Assche
2017-11-07 22:34                   ` Bart Van Assche
2017-11-07 22:39                   ` Jens Axboe
2017-11-08  0:50                   ` Ming Lei
2017-11-08  1:03                 ` Ming Lei
2017-11-08  3:01                   ` Jens Axboe
2017-11-08  3:12                     ` Ming Lei
2017-11-08  3:17                       ` Jens Axboe
2017-11-08  3:17                         ` Jens Axboe
2017-11-08  6:20                         ` Ming Lei
2017-11-08 15:59                           ` Ming Lei
2017-11-08 18:19                             ` Jens Axboe
2017-11-07 17:34           ` Bart Van Assche
2017-11-07 17:34             ` Bart Van Assche
2017-11-08  0:53             ` Ming Lei
2017-11-08  2:06               ` Ming Lei
2017-11-08  0:39         ` Ming Lei
2017-11-08  2:55           ` Jens Axboe
2017-11-08  2:58             ` Ming Lei
2017-11-08  3:06               ` Jens Axboe
2017-11-08 16:41                 ` Bart Van Assche
2017-11-08 16:41                   ` Bart Van Assche
2017-11-08 17:57                   ` Jens Axboe
2017-11-08 18:22                     ` Laurence Oberman
2017-11-08 18:22                       ` Laurence Oberman
2017-11-08 18:28                       ` Jens Axboe
2017-11-09  4:02                     ` Ming Lei
2017-11-09  2:05                   ` Ming Lei
2017-11-07 10:15     ` Ming Lei
2017-11-07 16:17       ` Bart Van Assche
2017-11-07 16:17         ` Bart Van Assche
2017-11-08  3:12         ` Jens Axboe
2017-11-06 18:04 ` Bart Van Assche
2017-11-06 18:04   ` Bart Van Assche
2017-11-07  2:19   ` Ming Lei
2017-11-07  3:53     ` Martin K. Petersen
2017-11-07  3:53       ` Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6a505f5-d374-2878-6c55-116255795346@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=Bart.VanAssche@wdc.com \
    --cc=hch@infradead.org \
    --cc=john.garry@huawei.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=loberman@redhat.com \
    --cc=ming.lei@redhat.com \
    --cc=osandov@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.