All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Bart Van Assche <bvanassche@acm.org>,
	"Hannes Reinecke" <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Keith Busch <keith.busch@intel.com>
Subject: Re: [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug
Date: Thu, 31 Oct 2019 16:28:51 +0000	[thread overview]
Message-ID: <1ad4288f-ebf3-196d-ddd1-9b584ebcba68@huawei.com> (raw)
In-Reply-To: <20191029100509.GC20854@ming.t460p>

On 29/10/2019 10:05, Ming Lei wrote:
>>>> ok
>>>>
>>>> Or is it still
>>>>> possible to dispatch to LDD after BLK_MQ_S_INTERNAL_STOPPED is set?
>>>> It shouldn't be. However it would seem that this IO had been dispatched to
>>>> the LLDD, the hctx dies, and then we attempt to requeue on that hctx.
>>> But this patch does wait for completion of in-flight request before
>>> shutdown the last CPU of this hctx.
>>>
>> Hi Ming,
>>
>> It may actually be a request from a hctx which is not shut down which errors
>> and causes the time out. I'm still checking.
> If that is the case, blk_mq_hctx_drain_inflight_rqs() will wait for
> completion of this request.
> 
> The only chance it is missed is that the last CPU of this hctx becomes
> offline just when this request stays in request list after it is
> retried from EH.
> 
>> BTW, Can you let me know exactly where you want the debug for "Or
>> blk_mq_hctx_next_cpu() may still run WORK_CPU_UNBOUND schedule after
>> all CPUs are offline, could you add debug message in that branch?"
> You can add the following debug message, then reproduce the issue and
> see if the debug log is dumped.
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 06081966549f..5a98a7b79c0d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1452,6 +1452,10 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>   		 */
>   		hctx->next_cpu = next_cpu;
>   		hctx->next_cpu_batch = 1;
> +
> +		printk(KERN_WARNING "CPU %d, schedule from (dead) hctx %s\n",
> +			raw_smp_processor_id(),
> +			cpumask_empty(hctx->cpumask) ? "inactive": "active");
>   		return WORK_CPU_UNBOUND;
>   	}
>   


On 28/10/2019 10:42, Ming Lei wrote:
 > +++ b/block/blk-mq.c
 > @@ -679,6 +679,8 @@ void blk_mq_start_request(struct request *rq)
 >   {
 >          struct request_queue *q = rq->q;
 >
 > +       WARN_ON_ONCE(test_bit(BLK_MQ_S_INTERNAL_STOPPED,
 > &rq->mq_hctx->state));
 > +
 >          trace_block_rq_issue(q, rq);
 >
 >          if (test_bit(QUEUE_FLAG_STATS, &q->q


I have now hit this once:

[ 2960.943684] CPU7: shutdown
[ 2960.946424] psci: CPU7 killed.
[ 2961.030546] ------------[ cut here ]------------
[ 2961.035169] WARNING: CPU: 6 PID: 6127 at block/blk-mq.c:682 
blk_mq_start_request+0x1a8/0x1b0
[ 2961.043594] Modules linked in:
[ 2961.046642] CPU: 6 PID: 6127 Comm: fio Not tainted 
5.4.0-rc1-00129-gc61cbc373a43-dirty #457
[ 2961.054981] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI 
RC0 - V1.16.01 03/15/2019
[ 2961.063493] pstate: 00400009 (nzcv daif +PAN -UAO)
[ 2961.068274] pc : blk_mq_start_request+0x1a8/0x1b0
[ 2961.072967] lr : blk_mq_start_request+0x34/0x1b0
[ 2961.077572] sp : ffff0022ff5e71d0
[ 2961.080875] x29: ffff0022ff5e71d0 x28: ffff00232b8c6028
[ 2961.086176] x27: ffff00232b1ad6f0 x26: ffff00232b1addd0
[ 2961.091478] x25: ffff0022ff5e7300 x24: ffff00232b1ad500
[ 2961.096778] x23: ffff00232b8c6118 x22: ffff00232b379200
[ 2961.102079] x21: ffff00232b8c6010 x20: ffff00232b1d0998
[ 2961.107380] x19: ffff00232b8c6000 x18: 0000000000000000
[ 2961.112681] x17: 0000000000000000 x16: 0000000000000000
[ 2961.117981] x15: 0000000000000000 x14: 0000000000000000
[ 2961.123281] x13: 0000000000000006 x12: 1fffe00465718c56
[ 2961.128582] x11: ffff800465718c56 x10: dfffa00000000000
[ 2961.133883] x9 : 0000000000000000 x8 : ffff00232b8c62b8
[ 2961.139183] x7 : 0000000000000000 x6 : 000000000000003f
[ 2961.144484] x5 : 1fffe00465718c40 x4 : ffff00232b379228
[ 2961.149786] x3 : 0000000000000000 x2 : dfffa00000000000
[ 2961.155087] x1 : 0000000000000007 x0 : 000000000000000a
[ 2961.160387] Call trace:
[ 2961.162825]  blk_mq_start_request+0x1a8/0x1b0
[ 2961.167173]  scsi_queue_rq+0x808/0xf38
[ 2961.170911]  __blk_mq_try_issue_directly+0x238/0x310
[ 2961.175865]  blk_mq_request_issue_directly+0xb4/0xf8
[ 2961.180819]  blk_mq_try_issue_list_directly+0xa8/0x160
[ 2961.185947]  blk_mq_sched_insert_requests+0x2e4/0x390
[ 2961.190987]  blk_mq_flush_plug_list+0x254/0x2e0
[ 2961.195508]  blk_flush_plug_list+0x1d0/0x218
[ 2961.199767]  blk_finish_plug+0x48/0x240
[ 2961.203593]  blkdev_direct_IO+0x71c/0x7b0
[ 2961.207593]  generic_file_read_iter+0x16c/0xf28
[ 2961.212111]  blkdev_read_iter+0x80/0xb0
[ 2961.212961.227500]  el0_svc_common.constprop.1+0xa4/0x1d8
[ 2961.232281]  el0_svc_handler+0x34/0xb0
[ 2961.236019]  el0_svc+0x8/0xc
[ 2961.238888] ---[ end trace aa13b886f74a98be ]---
[ 2961.465018] blk_mq_hctx_notify_dead cpu6 dead 
request_queue=0xffff00232b1d5658 (id=19) hctx=0xffff00232abf9200 
hctx->cpumask=6-11 cpu_online_mask=0-5,13-95
[ 2961.478967] blk_mq_hctx_notify_dead cpu6 dead 
request_queue=0xffff00232b1d4328 (id=17) hctx=0xffff00232aa11200 
hctx->cpumask=6-11 cpu_online_mask=0-5,13-95

But this did not seem to trigger my timeout.

John

> 
> Thanks,
> Ming
> 
> .


  parent reply	other threads:[~2019-10-31 16:28 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-14  1:50 [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug Ming Lei
2019-10-14  1:50 ` [PATCH V4 1/5] blk-mq: add new state of BLK_MQ_S_INTERNAL_STOPPED Ming Lei
2019-10-14  1:50 ` [PATCH V4 2/5] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2019-10-14  1:50 ` [PATCH V4 3/5] blk-mq: stop to handle IO and drain IO before hctx becomes dead Ming Lei
2019-11-28  9:29   ` John Garry
2019-10-14  1:50 ` [PATCH V4 4/5] blk-mq: re-submit IO in case that hctx is dead Ming Lei
2019-10-14  1:50 ` [PATCH V4 5/5] blk-mq: handle requests dispatched from IO scheduler " Ming Lei
2019-10-16  8:58 ` [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug John Garry
2019-10-16 12:07   ` Ming Lei
2019-10-16 16:19     ` John Garry
     [not found]       ` <55a84ea3-647d-0a76-596c-c6c6b2fc1b75@huawei.com>
2019-10-20 10:14         ` Ming Lei
2019-10-21  9:19           ` John Garry
2019-10-21  9:34             ` Ming Lei
2019-10-21  9:47               ` John Garry
2019-10-21 10:24                 ` Ming Lei
2019-10-21 11:49                   ` John Garry
2019-10-21 12:53                     ` Ming Lei
2019-10-21 14:02                       ` John Garry
2019-10-22  0:16                         ` Ming Lei
2019-10-22 11:19                           ` John Garry
2019-10-22 13:45                             ` Ming Lei
2019-10-25 16:33             ` John Garry
2019-10-28 10:42               ` Ming Lei
2019-10-28 11:55                 ` John Garry
2019-10-29  1:50                   ` Ming Lei
2019-10-29  9:22                     ` John Garry
2019-10-29 10:05                       ` Ming Lei
2019-10-29 17:54                         ` John Garry
2019-10-31 16:28                         ` John Garry [this message]
2019-11-28  1:09 ` chenxiang (M)
2019-11-28  2:02   ` Ming Lei
2019-11-28 10:45     ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ad4288f-ebf3-196d-ddd1-9b584ebcba68@huawei.com \
    --to=john.garry@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.