All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: John Garry <john.garry@huawei.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Bart Van Assche <bvanassche@acm.org>,
	Hannes Reinecke <hare@suse.com>, Christoph Hellwig <hch@lst.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Keith Busch <keith.busch@intel.com>
Subject: Re: [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug
Date: Sun, 20 Oct 2019 18:14:08 +0800	[thread overview]
Message-ID: <20191020101404.GA5103@ming.t460p> (raw)
In-Reply-To: <55a84ea3-647d-0a76-596c-c6c6b2fc1b75@huawei.com>

On Thu, Oct 17, 2019 at 04:40:12PM +0100, John Garry wrote:
> On 16/10/2019 17:19, John Garry wrote:
> > On 16/10/2019 13:07, Ming Lei wrote:
> > > On Wed, Oct 16, 2019 at 09:58:27AM +0100, John Garry wrote:
> > > > On 14/10/2019 02:50, Ming Lei wrote:
> > > > > Hi,
> > > > > 
> > > > > Thomas mentioned:
> > > > >     "
> > > > >      That was the constraint of managed interrupts from the very
> > > > > beginning:
> > > > > 
> > > > >       The driver/subsystem has to quiesce the interrupt line and the
> > > > > associated
> > > > >       queue _before_ it gets shutdown in CPU unplug and not fiddle
> > > > > with it
> > > > >       until it's restarted by the core when the CPU is plugged in
> > > > > again.
> > > > >     "
> > > > > 
> > > > > But no drivers or blk-mq do that before one hctx becomes dead(all
> > > > > CPUs for one hctx are offline), and even it is worse, blk-mq stills
> > > > > tries
> > > > > to run hw queue after hctx is dead, see blk_mq_hctx_notify_dead().
> > > > > 
> > > > > This patchset tries to address the issue by two stages:
> > > > > 
> > > > > 1) add one new cpuhp state of CPUHP_AP_BLK_MQ_ONLINE
> > > > > 
> > > > > - mark the hctx as internal stopped, and drain all in-flight requests
> > > > > if the hctx is going to be dead.
> > > > > 
> > > > > 2) re-submit IO in the state of CPUHP_BLK_MQ_DEAD after the hctx
> > > > > becomes dead
> > > > > 
> > > > > - steal bios from the request, and resubmit them via
> > > > > generic_make_request(),
> > > > > then these IO will be mapped to other live hctx for dispatch
> > > > > 
> > > > > Please comment & review, thanks!
> > > > > 
> > > > > John, I don't add your tested-by tag since V3 have some changes,
> > > > > and I appreciate if you may run your test on V3.
> > > > 
> > > > Hi Ming,
> > > > 
> > > > So I got around to doing some testing. The good news is that issue
> > > > which we
> > > > were experiencing in v3 series seems to have has gone away - alot more
> > > > stable.
> > > > 
> > > > However, unfortunately, I did notice some SCSI timeouts:
> > > > 
> > > > 15508.615074] CPU2: shutdown
> > > > [15508.617778] psci: CPU2 killed.
> > > > [15508.651220] CPU1: shutdown
> > > > [15508.653924] psci: CPU1 killed.
> > > > [15518.406229] sas: Enter sas_scsi_recover_host busy: 63 failed: 63
> > > > Jobs: 1 (f=1): [R] [0.0% done] [0[15518.412239] sas: sas_scsi_find_task:
> > > > aborting task 0x00000000a7159744
> > > > KB/0KB/0KB /s] [0/0/0 iops] [eta [15518.421708] sas:
> > > > sas_eh_handle_sas_errors: task 0x00000000a7159744 is done
> > > > [15518.431266] sas: sas_scsi_find_task: aborting task 0x00000000d39731eb
> > > > [15518.442539] sas: sas_eh_handle_sas_errors: task 0x00000000d39731eb is
> > > > done
> > > > [15518.449407] sas: sas_scsi_find_task: aborting task 0x000000009f77c9bd
> > > > [15518.455899] sas: sas_eh_handle_sas_errors: task 0x000000009f77c9bd is
> > > > done
> > > > 
> > > > A couple of things to note:
> > > > - I added some debug prints in blk_mq_hctx_drain_inflight_rqs() for when
> > > > inflights rqs !=0, and I don't see them for this timeout
> > > > - 0 datarate reported from fio
> > > > 
> > > > I'll have a look...
> > > 
> > > What is the output of the following command?
> > > 
> > > (cd /sys/kernel/debug/block/$SAS_DISK && find . -type f -exec grep -aH
> > > . {} \;)
> > I assume that you want this run at about the time SCSI EH kicks in for
> > the timeout, right? If so, I need to find some simple sysfs entry to
> > tell me of this occurrence, to trigger the capture. Or add something. My
> > script is pretty dump.
> > 
> > BTW, I did notice that we the dump_stack in __blk_mq_run_hw_queue()
> > pretty soon before the problem happens - maybe a clue or maybe coincidence.
> > 
> 
> I finally got to capture that debugfs dump at the point the SCSI IOs
> timeout, as attached. Let me know if any problem receiving it.
> 
> Here's a kernel log snippet at that point (I added some print for the
> timeout):
> 
> 609] psci: CPU6 killed.
> [  547.722217] CPU5: shutdown
> [  547.724926] psci: CPU5 killed.
> [  547.749951] irq_shutdown
> [  547.752701] IRQ 800: no longer affine to CPU4
> [  547.757265] CPU4: shutdown
> [  547.759971] psci: CPU4 killed.
> [  547.790348] CPU3: shutdown
> [  547.793052] psci: CPU3 killed.
> [  547.818330] CPU2: shutdown
> [  547.821033] psci: CPU2 killed.
> [  547.854285] CPU1: shutdown
> [  547.856989] psci: CPU1 killed.
> [  575.925307] scsi_timeout req=0xffff0023b0dd9c00 reserved=0
> [  575.930794] scsi_timeout req=0xffff0023b0df2700 reserved=0

From the debugfs log, 66 requests are dumped, and 63 of them has
been submitted to device, and the other 3 is in ->dispatch list
via requeue after timeout is handled.

You mentioned that:

" - I added some debug prints in blk_mq_hctx_drain_inflight_rqs() for when
 inflights rqs !=0, and I don't see them for this timeout"

There might be two reasons:

1) You are still testing a multiple reply-queue device? As I
mentioned last times, it is hard to map reply-queue into blk-mq
hctx correctly.

2) concurrent dispatch to device, which can be observed by the
following patch.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 06081966549f..3590f6f947eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -679,6 +679,8 @@ void blk_mq_start_request(struct request *rq)
 {
        struct request_queue *q = rq->q;

+       WARN_ON_ONCE(test_bit(BLK_MQ_S_INTERNAL_STOPPED, &rq->mq_hctx->state));
+
        trace_block_rq_issue(q, rq);

        if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {

However, I think it is hard to be 2#, since the current CPU is the last
CPU in hctx->cpu_mask.


Thanks,
Ming


  parent reply	other threads:[~2019-10-20 10:14 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-14  1:50 [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug Ming Lei
2019-10-14  1:50 ` [PATCH V4 1/5] blk-mq: add new state of BLK_MQ_S_INTERNAL_STOPPED Ming Lei
2019-10-14  1:50 ` [PATCH V4 2/5] blk-mq: prepare for draining IO when hctx's all CPUs are offline Ming Lei
2019-10-14  1:50 ` [PATCH V4 3/5] blk-mq: stop to handle IO and drain IO before hctx becomes dead Ming Lei
2019-11-28  9:29   ` John Garry
2019-10-14  1:50 ` [PATCH V4 4/5] blk-mq: re-submit IO in case that hctx is dead Ming Lei
2019-10-14  1:50 ` [PATCH V4 5/5] blk-mq: handle requests dispatched from IO scheduler " Ming Lei
2019-10-16  8:58 ` [PATCH V4 0/5] blk-mq: improvement on handling IO during CPU hotplug John Garry
2019-10-16 12:07   ` Ming Lei
2019-10-16 16:19     ` John Garry
     [not found]       ` <55a84ea3-647d-0a76-596c-c6c6b2fc1b75@huawei.com>
2019-10-20 10:14         ` Ming Lei [this message]
2019-10-21  9:19           ` John Garry
2019-10-21  9:34             ` Ming Lei
2019-10-21  9:47               ` John Garry
2019-10-21 10:24                 ` Ming Lei
2019-10-21 11:49                   ` John Garry
2019-10-21 12:53                     ` Ming Lei
2019-10-21 14:02                       ` John Garry
2019-10-22  0:16                         ` Ming Lei
2019-10-22 11:19                           ` John Garry
2019-10-22 13:45                             ` Ming Lei
2019-10-25 16:33             ` John Garry
2019-10-28 10:42               ` Ming Lei
2019-10-28 11:55                 ` John Garry
2019-10-29  1:50                   ` Ming Lei
2019-10-29  9:22                     ` John Garry
2019-10-29 10:05                       ` Ming Lei
2019-10-29 17:54                         ` John Garry
2019-10-31 16:28                         ` John Garry
2019-11-28  1:09 ` chenxiang (M)
2019-11-28  2:02   ` Ming Lei
2019-11-28 10:45     ` John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191020101404.GA5103@ming.t460p \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=john.garry@huawei.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.