linux-block.vger.kernel.org archive mirror
From: Sagi Grimberg <sagi@grimberg.me>
To: Jens Axboe <axboe@kernel.dk>, Ming Lei <ming.lei@redhat.com>
Cc: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	linux-block@vger.kernel.org, Ming Lin <mlin@kernel.org>,
	Chao Leng <lengchao@huawei.com>
Subject: Re: [PATCH v5 1/2] blk-mq: add tagset quiesce interface
Date: Mon, 27 Jul 2020 20:29:43 -0700	[thread overview]
Message-ID: <0af89fcf-3505-acb1-6c91-1fff8e53b146@grimberg.me> (raw)
In-Reply-To: <baede23a-94c1-1494-bcca-964e1396f253@kernel.dk>


>>>>>>> +static void blk_mq_quiesce_blocking_queue_async(struct request_queue *q)
>>>>>>> +{
>>>>>>> +	struct blk_mq_hw_ctx *hctx;
>>>>>>> +	unsigned int i;
>>>>>>> +
>>>>>>> +	blk_mq_quiesce_queue_nowait(q);
>>>>>>> +
>>>>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>>>>> +		WARN_ON_ONCE(!(hctx->flags & BLK_MQ_F_BLOCKING));
>>>>>>> +		hctx->rcu_sync = kmalloc(sizeof(*hctx->rcu_sync), GFP_KERNEL);
>>>>>>> +		if (!hctx->rcu_sync)
>>>>>>> +			continue;
>>>>>>
>>>>>> This approach of quiesce/unquiesce tagset is good abstraction.
>>>>>>
>>>>>> Just one more thing: please allocate an rcu_sync array separately,
>>>>>> since the hctx isn't supposed to store scratch data.
>>>>>
>>>>> I'd be all for not stuffing this in the hctx, but how would that work?
>>>>> The only thing I can think of that would work reliably is batching the
>>>>> queue+wait into units of N. We could potentially have many thousands of
>>>>> queues, and it could get iffy (and/or unreliable) in terms of allocation
>>>>> size. Looks like rcu_synchronize is 48-bytes on my local install, and it
>>>>> doesn't take a lot of devices at current CPU counts to make an alloc
>>>>> covering all of it huge. Let's say 64 threads, and 32 devices, then
>>>>> we're already at 64*32*48 bytes which is an order 5 allocation. Not
>>>>> friendly, and not going to be reliable when you need it. And if we start
>>>>> batching in reasonable counts, then we're _almost_ back to doing a queue
>>>>> or two at a time... 32 * 48 is 1536 bytes, so we could only do two at
>>>>> a time for single-page allocations.
>>>>
>>>> We can convert this to order-0 allocations by adding one extra indirection array.
>>>
>>> I guess that could work, and would just be one extra alloc + free if we
>>> still retain the batch. That'd take it to 16 devices (at 32 CPUs) per
>>> round, potentially way less of course if we have more CPUs. So still
>>> somewhat limiting, rather than do all at once.
>>
>> With the approach used in blk_mq_alloc_rqs(), each allocated page can be
>> added to a list, so the indirect array can be avoided. It then becomes
>> possible to allocate for any number of queues/devices, since each
>> allocation is just a single page made only when one is needed; no
>> pre-calculation is required.
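[A userspace sketch of the page-list idea Ming describes: page-sized chunks are chained on a list, each holding as many fixed-size sync slots as fit, so only order-0 allocations are ever made and no indirection array is needed. The struct names and the 48-byte slot size are illustrative stand-ins, not the kernel's actual layout:]

```c
#include <stdlib.h>

#define PAGE_SIZE 4096UL

struct rcu_sync_stub { char pad[48]; };	/* stand-in for struct rcu_synchronize */

/* One page-sized chunk; chunks are chained so no indirect array is needed. */
struct sync_page {
	struct sync_page *next;
	unsigned int used;
	struct rcu_sync_stub syncs[];
};

#define SYNCS_PER_PAGE \
	((PAGE_SIZE - sizeof(struct sync_page)) / sizeof(struct rcu_sync_stub))

/* Hand out the next free slot, allocating a fresh order-0 page on demand. */
static struct rcu_sync_stub *alloc_sync(struct sync_page **head)
{
	struct sync_page *p = *head;

	if (!p || p->used == SYNCS_PER_PAGE) {
		p = calloc(1, PAGE_SIZE);
		if (!p)
			return NULL;
		p->next = *head;
		*head = p;
	}
	return &p->syncs[p->used++];
}

static void free_syncs(struct sync_page *head)
{
	while (head) {
		struct sync_page *next = head->next;

		free(head);
		head = next;
	}
}
```

[The failure path stays simple: an allocation failure mid-loop just means fewer slots were handed out, and free_syncs() tears down whatever was chained so far.]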
> 
> As long as we watch the complexity, I don't think we need to go overboard
> here at the risk of adding issues in the failure path.

No, we don't. I'd prefer not to. And if this turns out to be that bad,
we can convert it to a more complicated page vector later.

I'll move forward with this approach.


Thread overview: 40+ messages
2020-07-27 23:10 [PATCH v5 0/2] improve nvme quiesce time for large amount of namespaces Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 1/2] blk-mq: add tagset quiesce interface Sagi Grimberg
2020-07-27 23:32   ` Keith Busch
2020-07-28  0:12     ` Sagi Grimberg
2020-07-28  1:40   ` Ming Lei
2020-07-28  1:51     ` Jens Axboe
2020-07-28  2:17       ` Ming Lei
2020-07-28  2:23         ` Jens Axboe
2020-07-28  2:28           ` Ming Lei
2020-07-28  2:32             ` Jens Axboe
2020-07-28  3:29               ` Sagi Grimberg [this message]
2020-07-28  3:25     ` Sagi Grimberg
2020-07-28  7:18   ` Christoph Hellwig
2020-07-28  7:48     ` Sagi Grimberg
2020-07-28  9:16     ` Ming Lei
2020-07-28  9:24       ` Sagi Grimberg
2020-07-28  9:33         ` Ming Lei
2020-07-28  9:37           ` Sagi Grimberg
2020-07-28  9:43             ` Sagi Grimberg
2020-07-28 10:10               ` Ming Lei
2020-07-28 10:57                 ` Christoph Hellwig
2020-07-28 14:13                 ` Paul E. McKenney
2020-07-28 10:58             ` Christoph Hellwig
2020-07-28 16:25               ` Sagi Grimberg
2020-07-28 13:54         ` Paul E. McKenney
2020-07-28 23:46           ` Sagi Grimberg
2020-07-29  0:31             ` Paul E. McKenney
2020-07-29  0:43               ` Sagi Grimberg
2020-07-29  0:59                 ` Keith Busch
2020-07-29  4:39                   ` Sagi Grimberg
2020-08-07  9:04                     ` Chao Leng
2020-08-07  9:24                       ` Ming Lei
2020-08-07  9:35                         ` Chao Leng
2020-07-29  4:10                 ` Paul E. McKenney
2020-07-29  4:37                   ` Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 2/2] nvme: use blk_mq_[un]quiesce_tagset Sagi Grimberg
2020-07-28  0:54   ` Sagi Grimberg
2020-07-28  3:21     ` Chao Leng
2020-07-28  3:34       ` Sagi Grimberg
2020-07-28  3:51         ` Chao Leng
