linux-block.vger.kernel.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	linux-block@vger.kernel.org, Ming Lin <mlin@kernel.org>,
	Chao Leng <lengchao@huawei.com>
Subject: Re: [PATCH v5 1/2] blk-mq: add tagset quiesce interface
Date: Tue, 28 Jul 2020 10:17:44 +0800
Message-ID: <20200728021744.GB1305646@T590>
In-Reply-To: <1d119df0-c3af-2dfa-d569-17109733ac80@kernel.dk>

On Mon, Jul 27, 2020 at 07:51:16PM -0600, Jens Axboe wrote:
> On 7/27/20 7:40 PM, Ming Lei wrote:
> > On Mon, Jul 27, 2020 at 04:10:21PM -0700, Sagi Grimberg wrote:
> >> Drivers that have shared tagsets may need to quiesce potentially a lot
> >> of request queues that all share a single tagset (e.g. nvme). Add an interface
> >> to quiesce all the queues on a given tagset. This interface is useful because
> >> it can speed up the quiesce by doing it in parallel.
> >>
> >> For tagsets that have BLK_MQ_F_BLOCKING set, we use call_srcu on all hctxs
> >> in parallel such that all of them wait for the same rcu elapsed period with
> >> a per-hctx heap allocated rcu_synchronize. For tagsets that don't have
> >> BLK_MQ_F_BLOCKING set, we simply call a single synchronize_rcu as this is
> >> sufficient.
> >>
> >> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> >> ---
> >>  block/blk-mq.c         | 66 ++++++++++++++++++++++++++++++++++++++++++
> >>  include/linux/blk-mq.h |  4 +++
> >>  2 files changed, 70 insertions(+)
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index abcf590f6238..c37e37354330 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -209,6 +209,42 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
> >>  }
> >>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
> >>  
> >> +static void blk_mq_quiesce_blocking_queue_async(struct request_queue *q)
> >> +{
> >> +	struct blk_mq_hw_ctx *hctx;
> >> +	unsigned int i;
> >> +
> >> +	blk_mq_quiesce_queue_nowait(q);
> >> +
> >> +	queue_for_each_hw_ctx(q, hctx, i) {
> >> +		WARN_ON_ONCE(!(hctx->flags & BLK_MQ_F_BLOCKING));
> >> +		hctx->rcu_sync = kmalloc(sizeof(*hctx->rcu_sync), GFP_KERNEL);
> >> +		if (!hctx->rcu_sync)
> >> +			continue;
> > 
> > This approach of quiescing/unquiescing a tagset is a good abstraction.
> > 
> > Just one more thing, please allocate a rcu_sync array because hctx is
> > not supposed to store scratch data.
> 
> I'd be all for not stuffing this in the hctx, but how would that work?
> The only thing I can think of that would work reliably is batching the
> queue+wait into units of N. We could potentially have many thousands of
> queues, and it could get iffy (and/or unreliable) in terms of allocation
> size. Looks like rcu_synchronize is 48 bytes on my local install, and it
> doesn't take a lot of devices at current CPU counts to make an alloc
> covering all of it huge. Let's say 64 threads, and 32 devices, then
> we're already at 64*32*48 bytes which is an order 5 allocation. Not
> friendly, and not going to be reliable when you need it. And if we start
> batching in reasonable counts, then we're _almost_ back to doing a queue
> or two at a time... 32 * 48 is 1536 bytes, so we could only do two at
> a time for single page allocations.

We can convert this to order 0 allocations by adding one extra indirect array.
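
To make the idea concrete, here is a minimal userspace sketch of what an
indirect array could look like. It is not taken from the patch: the names
(sync_slot, slot_table, slot_table_alloc/get/free) are hypothetical, a
48-byte struct stands in for rcu_synchronize, and a 4 KiB page is assumed.
Each chunk is one page-sized allocation holding 85 slots, and a small
top-level array points at the chunks. For the 64*32 = 2048 slot example
above that is 25 chunks plus a 200-byte pointer array (with 8-byte
pointers), so no single allocation exceeds one page. The pointer array
itself only stays within a page up to 512 chunks, so this is a sketch for
moderate counts rather than a general answer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages, so one chunk corresponds to one order-0 allocation. */
#define CHUNK_BYTES 4096u

/* Hypothetical stand-in for a 48-byte per-hctx wait object. */
struct sync_slot {
	unsigned char pad[48];
};

/* 4096 / 48 = 85 slots fit in one chunk. */
#define SLOTS_PER_CHUNK (CHUNK_BYTES / sizeof(struct sync_slot))

struct slot_table {
	size_t nr_slots;
	size_t nr_chunks;
	struct sync_slot **chunks;	/* the extra indirect array */
};

/* Release every chunk, then the pointer array itself. */
void slot_table_free(struct slot_table *t)
{
	size_t i;

	if (!t->chunks)
		return;
	for (i = 0; i < t->nr_chunks; i++)
		free(t->chunks[i]);	/* free(NULL) is a no-op */
	free(t->chunks);
	t->chunks = NULL;
	t->nr_chunks = 0;
	t->nr_slots = 0;
}

/* Allocate enough page-sized chunks for nr_slots; returns 0 or -1. */
int slot_table_alloc(struct slot_table *t, size_t nr_slots)
{
	size_t i;

	t->nr_slots = nr_slots;
	t->nr_chunks = (nr_slots + SLOTS_PER_CHUNK - 1) / SLOTS_PER_CHUNK;
	/* calloc zeroes the pointers, so a partial failure can be unwound. */
	t->chunks = calloc(t->nr_chunks ? t->nr_chunks : 1, sizeof(*t->chunks));
	if (!t->chunks)
		return -1;
	for (i = 0; i < t->nr_chunks; i++) {
		t->chunks[i] = malloc(CHUNK_BYTES);
		if (!t->chunks[i]) {
			slot_table_free(t);
			return -1;
		}
	}
	return 0;
}

/* Map a flat slot index to its chunk and offset; NULL if out of range. */
struct sync_slot *slot_table_get(struct slot_table *t, size_t idx)
{
	if (idx >= t->nr_slots)
		return NULL;
	return &t->chunks[idx / SLOTS_PER_CHUNK][idx % SLOTS_PER_CHUNK];
}
```

A caller would do slot_table_alloc(&t, nr_hctxs) once, use
slot_table_get(&t, i) wherever a flat array would have been indexed with
[i], and call slot_table_free(&t) when the wait is over.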


Thanks,
Ming


Thread overview: 40+ messages
2020-07-27 23:10 [PATCH v5 0/2] improve nvme quiesce time for large amount of namespaces Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 1/2] blk-mq: add tagset quiesce interface Sagi Grimberg
2020-07-27 23:32   ` Keith Busch
2020-07-28  0:12     ` Sagi Grimberg
2020-07-28  1:40   ` Ming Lei
2020-07-28  1:51     ` Jens Axboe
2020-07-28  2:17       ` Ming Lei [this message]
2020-07-28  2:23         ` Jens Axboe
2020-07-28  2:28           ` Ming Lei
2020-07-28  2:32             ` Jens Axboe
2020-07-28  3:29               ` Sagi Grimberg
2020-07-28  3:25     ` Sagi Grimberg
2020-07-28  7:18   ` Christoph Hellwig
2020-07-28  7:48     ` Sagi Grimberg
2020-07-28  9:16     ` Ming Lei
2020-07-28  9:24       ` Sagi Grimberg
2020-07-28  9:33         ` Ming Lei
2020-07-28  9:37           ` Sagi Grimberg
2020-07-28  9:43             ` Sagi Grimberg
2020-07-28 10:10               ` Ming Lei
2020-07-28 10:57                 ` Christoph Hellwig
2020-07-28 14:13                 ` Paul E. McKenney
2020-07-28 10:58             ` Christoph Hellwig
2020-07-28 16:25               ` Sagi Grimberg
2020-07-28 13:54         ` Paul E. McKenney
2020-07-28 23:46           ` Sagi Grimberg
2020-07-29  0:31             ` Paul E. McKenney
2020-07-29  0:43               ` Sagi Grimberg
2020-07-29  0:59                 ` Keith Busch
2020-07-29  4:39                   ` Sagi Grimberg
2020-08-07  9:04                     ` Chao Leng
2020-08-07  9:24                       ` Ming Lei
2020-08-07  9:35                         ` Chao Leng
2020-07-29  4:10                 ` Paul E. McKenney
2020-07-29  4:37                   ` Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 2/2] nvme: use blk_mq_[un]quiesce_tagset Sagi Grimberg
2020-07-28  0:54   ` Sagi Grimberg
2020-07-28  3:21     ` Chao Leng
2020-07-28  3:34       ` Sagi Grimberg
2020-07-28  3:51         ` Chao Leng
