From: "Paul E. McKenney" <paulmck@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Ming Lei <ming.lei@redhat.com>, Christoph Hellwig <hch@lst.de>,
	Jens Axboe <axboe@kernel.dk>,
	linux-nvme@lists.infradead.org, linux-block@vger.kernel.org,
	Chao Leng <lengchao@huawei.com>, Keith Busch <kbusch@kernel.org>,
	Ming Lin <mlin@kernel.org>
Subject: Re: [PATCH v5 1/2] blk-mq: add tagset quiesce interface
Date: Tue, 28 Jul 2020 17:31:24 -0700
Message-ID: <20200729003124.GT9247@paulmck-ThinkPad-P72>
In-Reply-To: <d1ba2009-130a-d423-1389-c7af72e25a6a@grimberg.me>

On Tue, Jul 28, 2020 at 04:46:23PM -0700, Sagi Grimberg wrote:
> Hey Paul,
> 
> > Indeed you cannot.  And if you build with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > it will yell at you when you try.
> > 
> > You -can- pass on-stack rcu_head structures to call_srcu(), though,
> > if that helps.  You of course must have some way of waiting for the
> > callback to be invoked before exiting that function.  This should be
> > easy for me to package into an API, maybe using one of the existing
> > reference-counting APIs.
> > 
> > So, do you have a separate stack frame for each of the desired call_srcu()
> > invocations?  If not, do you know at build time how many rcu_head
> > structures you need?  If the answer to both of these is "no", then
> > it is likely that there needs to be an rcu_head in each of the relevant
> > data structures, as was noted earlier in this thread.
> > 
> > Yeah, I should go read the code.  But I would need to know where it is
> > and it is still early in the morning over here!  ;-)
> > 
> > I probably should also have read the remainder of the thread before
> > replying, as well.  But what is the fun in that?
> 
> The use-case is to quiesce submissions to queues. This is the flow
> where we tear things down, and we can potentially have thousands of
> queues, each of which needs to be quiesced.
> 
> Each queue (hctx) uses either RCU or SRCU, depending on whether it
> may sleep during submission.
> 
> The goal is for the overall quiesce to be fast, so we want to wait
> for all of these queues' grace periods to elapse roughly once, in
> parallel, instead of synchronizing each one serially as is done today.
> 
> The guys here are resisting adding an rcu_synchronize to each and
> every hctx because it would consume roughly 32 bytes in each of
> thousands of hctxs.
> 
> Dynamically allocating each one is possible but not very scalable.
> 
> The question is whether there is some way we can do this with an
> on-stack or a single on-heap rcu_head (or equivalent) that achieves
> the same effect.
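
To make the earlier point about waiting for an on-stack rcu_head
concrete, here is a minimal (untested) helper pairing the rcu_head
with a completion; this is essentially what the in-kernel
rcu_synchronize / wakeme_after_rcu pair does.  All names below
(gp_wait, gp_wakeup, wait_one_srcu_gp) are made up for illustration:

#include <linux/completion.h>
#include <linux/rcupdate.h>
#include <linux/srcu.h>

/* Pair an rcu_head with a completion so the caller can wait on it. */
struct gp_wait {
	struct rcu_head head;
	struct completion done;
};

/* Grace-period callback: wake up whoever is waiting. */
static void gp_wakeup(struct rcu_head *head)
{
	complete(&container_of(head, struct gp_wait, head)->done);
}

/* Wait out one SRCU grace period using an on-stack rcu_head. */
static void wait_one_srcu_gp(struct srcu_struct *ssp)
{
	struct gp_wait w;

	init_rcu_head_on_stack(&w.head);
	init_completion(&w.done);
	call_srcu(ssp, &w.head, gp_wakeup);
	wait_for_completion(&w.done);	/* must not return before this */
	destroy_rcu_head_on_stack(&w.head);
}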

If the hctx structures are guaranteed to stay put, you could count
them and then do a single allocation of an array of rcu_head structures
(or some larger structure containing an rcu_head structure, if needed).
You could then sequence through this array, consuming one rcu_head per
hctx as you processed it.  Once all the callbacks had been invoked,
it would be safe to free the array.
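
Concretely, a sketch of that (untested, and reusing the gp_wait helper
above) might look like the following.  The blk-mq names it leans on
(tag_list, nr_hw_queues, queue_for_each_hw_ctx(), BLK_MQ_F_BLOCKING,
hctx->srcu, tag_list_lock) are taken from the patch under discussion
and are assumptions here:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>
#include <linux/slab.h>

/*
 * Untested sketch: one allocation covering all hctxs, and every grace
 * period started before any of them is waited on, so that they all
 * elapse in parallel.  A real version would hold set->tag_list_lock
 * across the traversals.
 */
static int quiesce_tagset_in_parallel(struct blk_mq_tag_set *set)
{
	struct request_queue *q;
	struct blk_mq_hw_ctx *hctx;
	struct gp_wait *waits;
	int i, n = 0;

	/* Pass 1: count the hctxs so a single allocation suffices. */
	list_for_each_entry(q, &set->tag_list, tag_list)
		n += q->nr_hw_queues;

	waits = kmalloc_array(n, sizeof(*waits), GFP_KERNEL);
	if (!waits)
		return -ENOMEM;

	/* Pass 2: start one grace period per hctx, waiting on none yet. */
	n = 0;
	list_for_each_entry(q, &set->tag_list, tag_list) {
		queue_for_each_hw_ctx(q, hctx, i) {
			init_completion(&waits[n].done);
			if (hctx->flags & BLK_MQ_F_BLOCKING)
				call_srcu(hctx->srcu, &waits[n].head,
					  gp_wakeup);
			else
				call_rcu(&waits[n].head, gp_wakeup);
			n++;
		}
	}

	/* Pass 3: the grace periods overlap, so this waits ~once total. */
	for (i = 0; i < n; i++)
		wait_for_completion(&waits[i].done);

	kfree(waits);
	return 0;
}

The total wait is then on the order of a single grace period rather
than one grace period per hctx.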

Sounds too simple, though.  So what am I missing?

							Thanx, Paul

Thread overview:
2020-07-27 23:10 [PATCH v5 0/2] improve nvme quiesce time for large amount of namespaces Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 1/2] blk-mq: add tagset quiesce interface Sagi Grimberg
2020-07-27 23:32   ` Keith Busch
2020-07-28  0:12     ` Sagi Grimberg
2020-07-28  1:40   ` Ming Lei
2020-07-28  1:51     ` Jens Axboe
2020-07-28  2:17       ` Ming Lei
2020-07-28  2:23         ` Jens Axboe
2020-07-28  2:28           ` Ming Lei
2020-07-28  2:32             ` Jens Axboe
2020-07-28  3:29               ` Sagi Grimberg
2020-07-28  3:25     ` Sagi Grimberg
2020-07-28  7:18   ` Christoph Hellwig
2020-07-28  7:48     ` Sagi Grimberg
2020-07-28  9:16     ` Ming Lei
2020-07-28  9:24       ` Sagi Grimberg
2020-07-28  9:33         ` Ming Lei
2020-07-28  9:37           ` Sagi Grimberg
2020-07-28  9:43             ` Sagi Grimberg
2020-07-28 10:10               ` Ming Lei
2020-07-28 10:57                 ` Christoph Hellwig
2020-07-28 14:13                 ` Paul E. McKenney
2020-07-28 10:58             ` Christoph Hellwig
2020-07-28 16:25               ` Sagi Grimberg
2020-07-28 13:54         ` Paul E. McKenney
2020-07-28 23:46           ` Sagi Grimberg
2020-07-29  0:31             ` Paul E. McKenney [this message]
2020-07-29  0:43               ` Sagi Grimberg
2020-07-29  0:59                 ` Keith Busch
2020-07-29  4:39                   ` Sagi Grimberg
2020-08-07  9:04                     ` Chao Leng
2020-08-07  9:24                       ` Ming Lei
2020-08-07  9:35                         ` Chao Leng
2020-07-29  4:10                 ` Paul E. McKenney
2020-07-29  4:37                   ` Sagi Grimberg
2020-07-27 23:10 ` [PATCH v5 2/2] nvme: use blk_mq_[un]quiesce_tagset Sagi Grimberg
2020-07-28  0:54   ` Sagi Grimberg
2020-07-28  3:21     ` Chao Leng
2020-07-28  3:34       ` Sagi Grimberg
2020-07-28  3:51         ` Chao Leng
