linux-block.vger.kernel.org archive mirror
From: John Garry <john.garry@huawei.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	"Hannes Reinecke" <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Keith Busch <keith.busch@intel.com>
Subject: Re: [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug
Date: Mon, 28 Oct 2019 11:55:42 +0000
Message-ID: <a5e25466-c4db-c254-be37-45a9ca85851c@huawei.com>
In-Reply-To: <20191028104238.GA14008@ming.t460p>

>>
>> For the SCSI commands which timeout, I notice that
>> scsi_set_blocked(reason=SCSI_MLQUEUE_EH_RETRY) was called 30 seconds
>> earlier.
>>
>>   scsi_set_blocked+0x20/0xb8
>>   __scsi_queue_insert+0x40/0x90
>>   scsi_softirq_done+0x164/0x1c8
>>   __blk_mq_complete_request_remote+0x18/0x20
>>   flush_smp_call_function_queue+0xa8/0x150
>>   generic_smp_call_function_single_interrupt+0x10/0x18
>>   handle_IPI+0xec/0x1a8
>>   arch_cpu_idle+0x10/0x18
>>   do_idle+0x1d0/0x2b0
>>   cpu_startup_entry+0x24/0x40
>>   secondary_start_kernel+0x1b4/0x208
> 
> Could you investigate a bit why the timeout is triggered?

Yeah, it does seem a strange coincidence that the SCSI command failed at
all and had to be retried, since these should be uncommon events. I'll
look into this LLDD error.

> 
> In particular, we are supposed to drain all in-flight requests before the
> last CPU of this hctx goes offline, and the timeout shouldn't be caused by
> the hctx becoming dead, so I still need you to confirm that all
> in-flight requests are really drained in your test.

ok
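A minimal sketch of the ordering being relied on here, assuming a single-threaded userspace model rather than kernel code: when the last online CPU of an hctx is about to go offline, the hctx is marked stopped (standing in for BLK_MQ_S_INTERNAL_STOPPED), new dispatch is refused, and the CPU is only allowed to go once the in-flight count has reached zero. All `model_*`/`MODEL_*` names are hypothetical; a real implementation would wait in the offline path instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical userspace model of one hctx; not the blk-mq structure. */
struct model_hctx {
    bool mapped[MODEL_NR_CPUS]; /* CPUs that map to this hctx */
    int inflight;               /* dispatched to the LLDD, not yet completed */
    bool stopped;               /* stands in for BLK_MQ_S_INTERNAL_STOPPED */
};

static bool model_online[MODEL_NR_CPUS];

/* Is 'cpu' the last online CPU mapped to 'h'? */
static bool model_last_cpu(const struct model_hctx *h, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (!h->mapped[cpu])
        return false;
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        if (i != cpu && h->mapped[i] && model_online[i])
            return false;
    return true;
}

/* New dispatch is refused once the hctx has been stopped. */
static bool model_dispatch(struct model_hctx *h)
{
    if (h->stopped)
        return false;
    h->inflight++;
    return true;
}

/* The LLDD completes one request. */
static void model_complete(struct model_hctx *h)
{
    assert(h->inflight > 0);
    h->inflight--;
}

/* Try to take 'cpu' offline. If it is the hctx's last online CPU, stop the
 * hctx first and only let the CPU go once nothing is in flight.
 * Returns false while the drain is still pending. */
static bool model_try_offline(struct model_hctx *h, int cpu)
{
    if (model_last_cpu(h, cpu)) {
        h->stopped = true;
        if (h->inflight > 0)
            return false;
    }
    model_online[cpu] = false;
    return true;
}
```

In this model a request that is still counted as in flight blocks the offline step, so a completion (or requeue) arriving after the CPU has gone would indicate that some request escaped the in-flight accounting.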

> Or is it still possible to dispatch to the LLDD after BLK_MQ_S_INTERNAL_STOPPED
> is set?

It shouldn't be. However, it would seem that this IO was dispatched to
the LLDD, the hctx then died, and we then attempted to requeue on that hctx.

> 
> In theory, it shouldn't be possible, given that we drain in-flight requests
> on the last CPU of this hctx.
> 
> Or blk_mq_hctx_next_cpu() may still fall back to a WORK_CPU_UNBOUND schedule
> after all CPUs are offline; could you add a debug message in that branch?

ok
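As a rough illustration of the branch being asked about, a simplified userspace model of the CPU-selection logic: pick the next online CPU mapped to the hctx in round-robin order, and when none is left, emit a debug message and fall back to an unbound schedule. All `model_*`/`MODEL_*` names, including the sentinel value, are hypothetical; the real blk_mq_hctx_next_cpu() has additional details (such as batching) that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8
#define MODEL_WORK_CPU_UNBOUND (-1) /* hypothetical "no bound CPU" sentinel */

/* Hypothetical userspace model of an hctx's CPU-selection state. */
struct model_hctx {
    bool mapped[MODEL_NR_CPUS]; /* CPUs mapped to this hctx */
    int next_cpu;               /* last CPU chosen, for round-robin */
    unsigned long unbound_hits; /* how often the fallback branch ran */
};

static bool model_online[MODEL_NR_CPUS];

/* Pick the next online CPU mapped to h, searching round-robin after
 * h->next_cpu. If none is online, log and fall back to unbound. */
static int model_hctx_next_cpu(struct model_hctx *h)
{
    assert(h->next_cpu >= 0 && h->next_cpu < MODEL_NR_CPUS);
    for (int step = 1; step <= MODEL_NR_CPUS; step++) {
        int cpu = (h->next_cpu + step) % MODEL_NR_CPUS;
        if (h->mapped[cpu] && model_online[cpu]) {
            h->next_cpu = cpu;
            return cpu;
        }
    }
    /* The branch to instrument: every CPU of this hctx is offline, yet
     * work is still being scheduled for it. */
    h->unbound_hits++;
    fprintf(stderr, "hctx_next_cpu: no online CPU, using unbound (hits=%lu)\n",
            h->unbound_hits);
    return MODEL_WORK_CPU_UNBOUND;
}
```

A non-zero `unbound_hits` after the drain has completed would suggest that work for the hctx was still being kicked after its last CPU went away.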

> 
>>
>> I also notice that the __scsi_queue_insert() call, above, seems to try to
>> requeue the request on a dead request queue, via
>> __scsi_queue_insert()->blk_mq_requeue_request()->__blk_mq_requeue_request()
>> (marked ***):
>>
>> [ 1185.235243] psci: CPU1 killed.
>> [ 1185.238610] blk_mq_hctx_notify_dead cpu1 dead request_queue=0xffff0023ace24f60 (id=19)
>> [ 1185.246530] blk_mq_hctx_notify_dead cpu1 dead request_queue=0xffff0023ace23f80 (id=17)
>> [ 1185.254443] blk_mq_hctx_notify_dead cpu1 dead request_queue=0xffff0023ace22fa0 (id=15)
>> [ 1185.262356] blk_mq_hctx_notify_dead cpu1 dead request_queue=0xffff0023ace21fc0 (id=13) ***
>> [ 1185.270271] blk_mq_hctx_notify_dead cpu1 dead request_queue=0xffff0023ace20fe0 (id=11)
>> [ 1185.939451] scsi_softirq_done NEEDS_RETRY rq=0xffff0023b7416000
>> [ 1185.945359] scsi_set_blocked reason=0x1057
>> [ 1185.949444] __blk_mq_requeue_request request_queue=0xffff0023ace21fc0 id=13 rq=0xffff0023b7416000 ***
>>
>> [...]
>>
>> [ 1214.903455] scsi_timeout req=0xffff0023add29000 reserved=0
>> [ 1214.908946] scsi_timeout req=0xffff0023add29300 reserved=0
>> [ 1214.914424] scsi_timeout req=0xffff0023add29600 reserved=0
>> [ 1214.919909] scsi_timeout req=0xffff0023add29900 reserved=0
>>
>> I guess that we're retrying because the SCSI command failed in the LLDD for
>> some reason.
>>
>> So could this be the problem: we're attempting to requeue on a dead request
>> queue?
> 
> If there are any in-flight requests originating from an hctx which is about
> to become dead, they should have been drained before the CPU goes offline.

Sure, but we seem to be hitting a corner case here.
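To make the suspected corner case concrete, a hypothetical userspace model of a requeue-time check: an hctx with no online mapped CPU is treated as dead, a requeue targeting it is logged (the pattern the id=13 log entries suggest), and the request is redirected to a live hctx instead, in the spirit of the "re-submit IO in case that hctx is dead" patch in this series. All `model_*`/`MODEL_*` names are assumptions, and the actual patch may handle this quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 8
#define MODEL_NR_HCTX 2

/* Hypothetical userspace model of an hctx; not the blk-mq structure. */
struct model_hctx {
    bool mapped[MODEL_NR_CPUS]; /* CPUs that map to this hctx */
    int queued;                 /* requests waiting on this hctx */
};

static bool model_online[MODEL_NR_CPUS];

/* In this model an hctx is dead when none of its mapped CPUs is online. */
static bool model_hctx_dead(const struct model_hctx *h)
{
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        if (h->mapped[i] && model_online[i])
            return false;
    return true;
}

/* Requeue a request that was issued from hctx[idx]. If that hctx has died,
 * log it and move the request to the first live hctx.
 * Returns the index used, or -1 if every hctx is dead. */
static int model_requeue(struct model_hctx hctx[MODEL_NR_HCTX], int idx)
{
    assert(idx >= 0 && idx < MODEL_NR_HCTX);
    if (!model_hctx_dead(&hctx[idx])) {
        hctx[idx].queued++;
        return idx;
    }
    fprintf(stderr, "requeue: hctx %d is dead, redirecting\n", idx);
    for (int i = 0; i < MODEL_NR_HCTX; i++) {
        if (!model_hctx_dead(&hctx[i])) {
            hctx[i].queued++;
            return i;
        }
    }
    return -1;
}
```

Even if a redirect like this papers over the symptom, the logged message would still point at a request that completed after its hctx had supposedly been drained.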

Thanks,
John

> 
> Thanks,
> Ming
> 
> .
> 


Thread overview: 32+ messages
2019-10-14  1:50 Ming Lei
2019-10-14  1:50 ` [PATCH V4 1/5] blk-mq: add new state of BLK_MQ_S_INTERNAL_STOPPED Ming Lei
2019-10-14  1:50 ` [PATCH V4 2/5] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2019-10-14  1:50 ` [PATCH V4 3/5] blk-mq: stop to handle IO and drain IO before hctx becomes dead Ming Lei
2019-11-28  9:29   ` John Garry
2019-10-14  1:50 ` [PATCH V4 4/5] blk-mq: re-submit IO in case that hctx is dead Ming Lei
2019-10-14  1:50 ` [PATCH V4 5/5] blk-mq: handle requests dispatched from IO scheduler " Ming Lei
2019-10-16  8:58 ` [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug John Garry
2019-10-16 12:07   ` Ming Lei
2019-10-16 16:19     ` John Garry
     [not found]       ` <55a84ea3-647d-0a76-596c-c6c6b2fc1b75@huawei.com>
2019-10-20 10:14         ` Ming Lei
2019-10-21  9:19           ` John Garry
2019-10-21  9:34             ` Ming Lei
2019-10-21  9:47               ` John Garry
2019-10-21 10:24                 ` Ming Lei
2019-10-21 11:49                   ` John Garry
2019-10-21 12:53                     ` Ming Lei
2019-10-21 14:02                       ` John Garry
2019-10-22  0:16                         ` Ming Lei
2019-10-22 11:19                           ` John Garry
2019-10-22 13:45                             ` Ming Lei
2019-10-25 16:33             ` John Garry
2019-10-28 10:42               ` Ming Lei
2019-10-28 11:55                 ` John Garry [this message]
2019-10-29  1:50                   ` Ming Lei
2019-10-29  9:22                     ` John Garry
2019-10-29 10:05                       ` Ming Lei
2019-10-29 17:54                         ` John Garry
2019-10-31 16:28                         ` John Garry
2019-11-28  1:09 ` chenxiang (M)
2019-11-28  2:02   ` Ming Lei
2019-11-28 10:45     ` John Garry
