* [PATCH v3 0/2] improve quiesce time for large amount of namespaces
@ 2020-07-26  0:22 Sagi Grimberg
  2020-07-26  0:23 ` [PATCH v3 1/2] blk-mq: add async quiesce interface Sagi Grimberg
  2020-07-26  0:23 ` [PATCH v3 2/2] nvme: improve quiesce time for large amount of namespaces Sagi Grimberg
  0 siblings, 2 replies; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-26  0:22 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch, Jens Axboe
  Cc: linux-block, Chao Leng

This set attempts to improve the quiesce time when using a large set of
namespaces, which also improves I/O failover time in a multipath environment.

We improve for both non-blocking hctxs (e.g. pci, fc, rdma) and blocking
hctxs (e.g. tcp) by splitting queue quiesce into an async call_(s)rcu and
an async wait for it to complete.
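
The usage in the driver becomes (this is what patch 2 does for nvme):

	/* trigger quiesce on all queues */
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_mq_quiesce_queue_async(ns->queue);
	/* then wait for all of them in parallel */
	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_mq_quiesce_queue_async_wait(ns->queue);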

Changes from v2:
- made blk_mq_quiesce_queue_async operate on both blocking and
  non-blocking hctxs.
- removed separation between blocking vs. non-blocking queues
- dropped patch from Chao
- dropped nvme-rdma test patch

Changes from v1:
- trivial typo fixes

Sagi Grimberg (2):
  blk-mq: add async quiesce interface
  nvme: improve quiesce time for large amount of namespaces

 block/blk-mq.c           | 32 ++++++++++++++++++++++++++++++++
 drivers/nvme/host/core.c |  4 +++-
 include/linux/blk-mq.h   |  4 ++++
 3 files changed, 39 insertions(+), 1 deletion(-)

-- 
2.25.1


* [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-26  0:22 [PATCH v3 0/2] improve quiesce time for large amount of namespaces Sagi Grimberg
@ 2020-07-26  0:23 ` Sagi Grimberg
  2020-07-26  9:31   ` Ming Lei
  2020-07-26  0:23 ` [PATCH v3 2/2] nvme: improve quiesce time for large amount of namespaces Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-26  0:23 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch, Jens Axboe
  Cc: linux-block, Chao Leng

Drivers may have to quiesce a large number of request queues at once
(e.g. during a controller or adapter reset). Such drivers would benefit
from an async quiesce interface, such that they can trigger quiesce
asynchronously and wait for all of the queues in parallel.

This leaves the synchronization responsibility to the driver, but adds
a convenient interface to quiesce asynchronously and wait in a single
pass.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-mq.c         | 32 ++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  4 ++++
 2 files changed, 36 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index abcf590f6238..60d137265bd9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,6 +209,38 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
+void blk_mq_quiesce_queue_async(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	blk_mq_quiesce_queue_nowait(q);
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		init_completion(&hctx->rcu_sync.completion);
+		init_rcu_head(&hctx->rcu_sync.head);
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
+				wakeme_after_rcu);
+		else
+			call_rcu(&hctx->rcu_sync.head,
+				wakeme_after_rcu);
+	}
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
+
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	queue_for_each_hw_ctx(q, hctx, i) {
+		wait_for_completion(&hctx->rcu_sync.completion);
+		destroy_rcu_head(&hctx->rcu_sync.head);
+	}
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
+
 /**
  * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
  * @q: request queue.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 23230c1d031e..5536e434311a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,7 @@
 #include <linux/blkdev.h>
 #include <linux/sbitmap.h>
 #include <linux/srcu.h>
+#include <linux/rcupdate_wait.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
 	 */
 	struct list_head	hctx_list;
 
+	struct rcu_synchronize	rcu_sync;
 	/**
 	 * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
 	 * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
@@ -532,6 +534,8 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap);
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);
 
 void blk_mq_quiesce_queue_nowait(struct request_queue *q);
+void blk_mq_quiesce_queue_async(struct request_queue *q);
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q);
 
 unsigned int blk_mq_rq_cpu(struct request *rq);
 
-- 
2.25.1


* [PATCH v3 2/2] nvme: improve quiesce time for large amount of namespaces
  2020-07-26  0:22 [PATCH v3 0/2] improve quiesce time for large amount of namespaces Sagi Grimberg
  2020-07-26  0:23 ` [PATCH v3 1/2] blk-mq: add async quiesce interface Sagi Grimberg
@ 2020-07-26  0:23 ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-26  0:23 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Keith Busch, Jens Axboe
  Cc: linux-block, Chao Leng

nvme currently synchronizes queue quiesce for each namespace, one at a
time. This can slow down failover (which first quiesces all ns queues)
when there is a large number of namespaces. Instead, use an async
interface and quiesce the namespace queues in parallel rather than
serially.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c16bfdff2953..45f1559ee160 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4535,7 +4535,9 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 	down_read(&ctrl->namespaces_rwsem);
 	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		blk_mq_quiesce_queue_async(ns->queue);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_quiesce_queue_async_wait(ns->queue);
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
-- 
2.25.1


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-26  0:23 ` [PATCH v3 1/2] blk-mq: add async quiesce interface Sagi Grimberg
@ 2020-07-26  9:31   ` Ming Lei
  2020-07-26 16:27     ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-07-26  9:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, linux-nvme, linux-block, Chao Leng, Keith Busch,
	Christoph Hellwig

Hi Sagi,

On Sat, Jul 25, 2020 at 05:23:00PM -0700, Sagi Grimberg wrote:
> Drivers may have to quiesce a large number of request queues at once
> (e.g. during a controller or adapter reset). Such drivers would benefit
> from an async quiesce interface, such that they can trigger quiesce
> asynchronously and wait for all of the queues in parallel.
> 
> This leaves the synchronization responsibility to the driver, but adds
> a convenient interface to quiesce asynchronously and wait in a single
> pass.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  block/blk-mq.c         | 32 ++++++++++++++++++++++++++++++++
>  include/linux/blk-mq.h |  4 ++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index abcf590f6238..60d137265bd9 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -209,6 +209,38 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
>  
> +void blk_mq_quiesce_queue_async(struct request_queue *q)
> +{
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned int i;
> +
> +	blk_mq_quiesce_queue_nowait(q);
> +
> +	queue_for_each_hw_ctx(q, hctx, i) {
> +		init_completion(&hctx->rcu_sync.completion);
> +		init_rcu_head(&hctx->rcu_sync.head);
> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
> +				wakeme_after_rcu);
> +		else
> +			call_rcu(&hctx->rcu_sync.head,
> +				wakeme_after_rcu);
> +	}

Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
synchronize_rcu() is OK for all hctx during waiting.
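
i.e. the driver can simply do something like the following (just a
sketch):

	list_for_each_entry(ns, &ctrl->namespaces, list)
		blk_mq_quiesce_queue_nowait(ns->queue);
	/* one grace period covers all the non-blocking hctxs */
	synchronize_rcu();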

> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
> +
> +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
> +{
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned int i;
> +
> +	queue_for_each_hw_ctx(q, hctx, i) {
> +		wait_for_completion(&hctx->rcu_sync.completion);
> +		destroy_rcu_head(&hctx->rcu_sync.head);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
> +
>  /**
>   * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
>   * @q: request queue.
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 23230c1d031e..5536e434311a 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -5,6 +5,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/sbitmap.h>
>  #include <linux/srcu.h>
> +#include <linux/rcupdate_wait.h>
>  
>  struct blk_mq_tags;
>  struct blk_flush_queue;
> @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
>  	 */
>  	struct list_head	hctx_list;
>  
> +	struct rcu_synchronize	rcu_sync;
 
The above struct takes at least 5 words, and I'd suggest to avoid it,
and the hctx->srcu should be re-used for waiting BLK_MQ_F_BLOCKING.
Meantime !BLK_MQ_F_BLOCKING doesn't need it.
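
For reference, the struct being embedded here is (from
include/linux/rcupdate_wait.h):

	struct rcu_synchronize {
		struct rcu_head head;		/* two pointers */
		struct completion completion;	/* done count + wait queue */
	};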


Thanks,
Ming


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-26  9:31   ` Ming Lei
@ 2020-07-26 16:27     ` Sagi Grimberg
  2020-07-27  2:08       ` Ming Lei
  0 siblings, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-26 16:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-nvme, linux-block, Chao Leng, Keith Busch,
	Christoph Hellwig


>> +void blk_mq_quiesce_queue_async(struct request_queue *q)
>> +{
>> +	struct blk_mq_hw_ctx *hctx;
>> +	unsigned int i;
>> +
>> +	blk_mq_quiesce_queue_nowait(q);
>> +
>> +	queue_for_each_hw_ctx(q, hctx, i) {
>> +		init_completion(&hctx->rcu_sync.completion);
>> +		init_rcu_head(&hctx->rcu_sync.head);
>> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
>> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
>> +				wakeme_after_rcu);
>> +		else
>> +			call_rcu(&hctx->rcu_sync.head,
>> +				wakeme_after_rcu);
>> +	}
> 
> Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
> synchronize_rcu() is OK for all hctx during waiting.

That's true, but I want a single interface for both. v2 had exactly
that, but I decided that this approach is better.

Also, having the driver call a single synchronize_rcu isn't great
layering (as quiesce can possibly use a different mechanism in the 
future). So driver assumptions like:

         /*
          * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
          * calling synchronize_rcu() once is enough.
          */
         WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);

         if (!ret)
                 synchronize_rcu();

Are not great...

>> +}
>> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
>> +
>> +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
>> +{
>> +	struct blk_mq_hw_ctx *hctx;
>> +	unsigned int i;
>> +
>> +	queue_for_each_hw_ctx(q, hctx, i) {
>> +		wait_for_completion(&hctx->rcu_sync.completion);
>> +		destroy_rcu_head(&hctx->rcu_sync.head);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
>> +
>>   /**
>>    * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
>>    * @q: request queue.
>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>> index 23230c1d031e..5536e434311a 100644
>> --- a/include/linux/blk-mq.h
>> +++ b/include/linux/blk-mq.h
>> @@ -5,6 +5,7 @@
>>   #include <linux/blkdev.h>
>>   #include <linux/sbitmap.h>
>>   #include <linux/srcu.h>
>> +#include <linux/rcupdate_wait.h>
>>   
>>   struct blk_mq_tags;
>>   struct blk_flush_queue;
>> @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
>>   	 */
>>   	struct list_head	hctx_list;
>>   
>> +	struct rcu_synchronize	rcu_sync;
>   
> The above struct takes at least 5 words, and I'd suggest to avoid it,
> and the hctx->srcu should be re-used for waiting BLK_MQ_F_BLOCKING.
> Meantime !BLK_MQ_F_BLOCKING doesn't need it.

It is at the end and contains exactly what is needed to synchronize. Not
sure what you mean by reuse hctx->srcu?

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-26 16:27     ` Sagi Grimberg
@ 2020-07-27  2:08       ` Ming Lei
  2020-07-27  3:33         ` Chao Leng
  2020-07-27 18:36         ` Sagi Grimberg
  0 siblings, 2 replies; 19+ messages in thread
From: Ming Lei @ 2020-07-27  2:08 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, linux-nvme, linux-block, Chao Leng, Keith Busch,
	Christoph Hellwig

On Sun, Jul 26, 2020 at 09:27:56AM -0700, Sagi Grimberg wrote:
> 
> > > +void blk_mq_quiesce_queue_async(struct request_queue *q)
> > > +{
> > > +	struct blk_mq_hw_ctx *hctx;
> > > +	unsigned int i;
> > > +
> > > +	blk_mq_quiesce_queue_nowait(q);
> > > +
> > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > +		init_completion(&hctx->rcu_sync.completion);
> > > +		init_rcu_head(&hctx->rcu_sync.head);
> > > +		if (hctx->flags & BLK_MQ_F_BLOCKING)
> > > +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
> > > +				wakeme_after_rcu);
> > > +		else
> > > +			call_rcu(&hctx->rcu_sync.head,
> > > +				wakeme_after_rcu);
> > > +	}
> > 
> > Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
> > synchronize_rcu() is OK for all hctx during waiting.
> 
> That's true, but I want a single interface for both. v2 had exactly
> that, but I decided that this approach is better.

Not sure one new interface is needed, and one simple way is to:

1) call blk_mq_quiesce_queue_nowait() for each request queue

2) wait in driver specific way

Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
then you may add per-tagset APIs for the waiting.

> 
> Also, having the driver call a single synchronize_rcu isn't great

Too many drivers are using synchronize_rcu():

	$ git grep -n synchronize_rcu ./drivers/ | wc
	    186     524   11384

> layering (as quiesce can possibly use a different mechanism in the future).

What is the different mechanism?

> So driver assumptions like:
> 
>         /*
>          * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
>          * calling synchronize_rcu() once is enough.
>          */
>         WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
> 
>         if (!ret)
>                 synchronize_rcu();
> 
> Are not great...

Both rcu read lock/unlock and synchronize_rcu are global interfaces, so
it is reasonable to avoid unnecessary synchronize_rcu() calls.

> 
> > > +}
> > > +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
> > > +
> > > +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
> > > +{
> > > +	struct blk_mq_hw_ctx *hctx;
> > > +	unsigned int i;
> > > +
> > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > +		wait_for_completion(&hctx->rcu_sync.completion);
> > > +		destroy_rcu_head(&hctx->rcu_sync.head);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
> > > +
> > >   /**
> > >    * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
> > >    * @q: request queue.
> > > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > > index 23230c1d031e..5536e434311a 100644
> > > --- a/include/linux/blk-mq.h
> > > +++ b/include/linux/blk-mq.h
> > > @@ -5,6 +5,7 @@
> > >   #include <linux/blkdev.h>
> > >   #include <linux/sbitmap.h>
> > >   #include <linux/srcu.h>
> > > +#include <linux/rcupdate_wait.h>
> > >   struct blk_mq_tags;
> > >   struct blk_flush_queue;
> > > @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
> > >   	 */
> > >   	struct list_head	hctx_list;
> > > +	struct rcu_synchronize	rcu_sync;
> > The above struct takes at least 5 words, and I'd suggest to avoid it,
> > and the hctx->srcu should be re-used for waiting BLK_MQ_F_BLOCKING.
> > Meantime !BLK_MQ_F_BLOCKING doesn't need it.
> 
> It is at the end and contains exactly what is needed to synchronize. Not

The sync is simply single global synchronize_rcu(), and why bother to add
extra >=40bytes for each hctx.

> sure what you mean by reuse hctx->srcu?

You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
to each hctx for just simulating one single synchronize_rcu().


Thanks,
Ming


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  2:08       ` Ming Lei
@ 2020-07-27  3:33         ` Chao Leng
  2020-07-27  3:50           ` Ming Lei
  2020-07-27 18:36         ` Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread
From: Chao Leng @ 2020-07-27  3:33 UTC (permalink / raw)
  To: Ming Lei, Sagi Grimberg
  Cc: Keith Busch, Jens Axboe, linux-block, Christoph Hellwig, linux-nvme



On 2020/7/27 10:08, Ming Lei wrote:
>> It is at the end and contains exactly what is needed to synchronize. Not
> The sync is simply single global synchronize_rcu(), and why bother to add
> extra >=40bytes for each hctx.
> 
>> sure what you mean by reuse hctx->srcu?
> You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
> to each hctx for just simulating one single synchronize_rcu().

To sync the srcu instances together, the extra bytes are needed;
separating blocking and non-blocking queues into two hctx types may not
be a good choice.

There are two choices: add the struct rcu_synchronize in the hctx, or
in the srcu. Though adding rcu_synchronize to the srcu has a weakness
(the extra bytes are not needed by users that never batch srcu syncs),
I still think it's better for SRCU to provide the batch synchronization
interface.

We can check ctrl->tagset->flags to provide the same interface for both
blocking and non-blocking queues. The code for TINY_SRCU:

---
  block/blk-mq.c           | 29 +++++++++++++++++++++++++++++
  drivers/nvme/host/core.c |  9 ++++++++-
  include/linux/blk-mq.h   |  2 ++
  include/linux/srcu.h     |  2 ++
  include/linux/srcutiny.h |  1 +
  kernel/rcu/srcutiny.c    | 16 ++++++++++++++++
  6 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4e0d173beaa3..3117fc3082ff 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -235,6 +235,35 @@ void blk_mq_quiesce_queue(struct request_queue *q)
  }
  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);

+void blk_mq_quiesce_queue_async(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	blk_mq_quiesce_queue_nowait(q);
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			synchronize_srcu_async(hctx->srcu);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
+
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
+{
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+
+	if (q == NULL) {
+		synchronize_rcu();
+		return;
+	}
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		if (hctx->flags & BLK_MQ_F_BLOCKING)
+			synchronize_srcu_async_wait(hctx->srcu);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
+
  /*
   * blk_mq_unquiesce_queue() - counterpart of blk_mq_quiesce_queue()
   * @q: request queue.
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a3b1157561f5..f13aa447ab64 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4322,7 +4322,14 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)

  	down_read(&ctrl->namespaces_rwsem);
  	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_quiesce_queue(ns->queue);
+		blk_mq_quiesce_queue_async(ns->queue);
+
+	if (ctrl->tagset->flags & BLK_MQ_F_BLOCKING) {
+		list_for_each_entry(ns, &ctrl->namespaces, list)
+			blk_mq_quiesce_queue_async_wait(ns->queue);
+	} else {
+		blk_mq_quiesce_queue_async_wait(NULL);
+	}
  	up_read(&ctrl->namespaces_rwsem);
  }
  EXPORT_SYMBOL_GPL(nvme_stop_queues);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d6fcae17da5a..092470c63558 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -515,6 +515,8 @@ void blk_mq_start_hw_queues(struct request_queue *q);
  void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
  void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
  void blk_mq_quiesce_queue(struct request_queue *q);
+void blk_mq_quiesce_queue_async(struct request_queue *q);
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q);
  void blk_mq_unquiesce_queue(struct request_queue *q);
  void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
  void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index e432cc92c73d..7e006e51ccf9 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -60,6 +60,8 @@ void cleanup_srcu_struct(struct srcu_struct *ssp);
  int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp);
  void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp);
  void synchronize_srcu(struct srcu_struct *ssp);
+void synchronize_srcu_async(struct srcu_struct *ssp);
+void synchronize_srcu_async_wait(struct srcu_struct *ssp);

  #ifdef CONFIG_DEBUG_LOCK_ALLOC

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 5a5a1941ca15..3d7d871bef61 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -23,6 +23,7 @@ struct srcu_struct {
  	struct rcu_head *srcu_cb_head;	/* Pending callbacks: Head. */
  	struct rcu_head **srcu_cb_tail;	/* Pending callbacks: Tail. */
  	struct work_struct srcu_work;	/* For driving grace periods. */
+	struct rcu_synchronize rcu_sync;
  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  	struct lockdep_map dep_map;
  #endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1dae5c9..6e1468175a45 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -190,6 +190,22 @@ void synchronize_srcu(struct srcu_struct *ssp)
  }
  EXPORT_SYMBOL_GPL(synchronize_srcu);

+void synchronize_srcu_async(struct srcu_struct *ssp)
+{
+	init_rcu_head(&ssp->rcu_sync.head);
+	init_completion(&ssp->rcu_sync.completion);
+	call_srcu(ssp, &ssp->rcu_sync.head, wakeme_after_rcu);
+
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu_async);
+
+void synchronize_srcu_async_wait(struct srcu_struct *ssp)
+{
+	wait_for_completion(&ssp->rcu_sync.completion);
+	destroy_rcu_head(&ssp->rcu_sync.head);
+}
+EXPORT_SYMBOL_GPL(synchronize_srcu_async_wait);
+
  /* Lockdep diagnostics.  */
  void __init rcu_scheduler_starting(void)
  {
-- 
2.16.4

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  3:33         ` Chao Leng
@ 2020-07-27  3:50           ` Ming Lei
  2020-07-27  5:55             ` Chao Leng
  2020-07-27 18:38             ` Sagi Grimberg
  0 siblings, 2 replies; 19+ messages in thread
From: Ming Lei @ 2020-07-27  3:50 UTC (permalink / raw)
  To: Chao Leng
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig

On Mon, Jul 27, 2020 at 11:33:43AM +0800, Chao Leng wrote:
> 
> 
> On 2020/7/27 10:08, Ming Lei wrote:
> > > It is at the end and contains exactly what is needed to synchronize. Not
> > The sync is simply single global synchronize_rcu(), and why bother to add
> > extra >=40bytes for each hctx.
> > 
> > > sure what you mean by reuse hctx->srcu?
> > You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
> > to each hctx for just simulating one single synchronize_rcu().
> 
> To sync the srcu instances together, the extra bytes are needed;
> separating blocking and non-blocking queues into two hctx types may not
> be a good choice.
> 
> There are two choices: add the struct rcu_synchronize in the hctx, or
> in the srcu. Though adding rcu_synchronize to the srcu has a weakness
> (the extra bytes are not needed by users that never batch srcu syncs),
> I still think it's better for SRCU to provide the batch synchronization
> interface.

The 'struct rcu_synchronize' can be allocated from the heap or even the
stack (if there aren't too many NSs); it is just a one-shot sync and the
API is supposed to be called in the slow path. There is no need to
allocate it as a long-lifetime variable, especially since 'struct
srcu_struct' is already too fat.


Thanks, 
Ming


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  3:50           ` Ming Lei
@ 2020-07-27  5:55             ` Chao Leng
  2020-07-27  6:32               ` Ming Lei
  2020-07-27 18:38             ` Sagi Grimberg
  1 sibling, 1 reply; 19+ messages in thread
From: Chao Leng @ 2020-07-27  5:55 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig



On 2020/7/27 11:50, Ming Lei wrote:
> On Mon, Jul 27, 2020 at 11:33:43AM +0800, Chao Leng wrote:
>>
>>
>> On 2020/7/27 10:08, Ming Lei wrote:
>>>> It is at the end and contains exactly what is needed to synchronize. Not
>>> The sync is simply single global synchronize_rcu(), and why bother to add
>>> extra >=40bytes for each hctx.
>>>
>>>> sure what you mean by reuse hctx->srcu?
>>> You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
>>> to each hctx for just simulating one single synchronize_rcu().
>>
>> To sync the srcu instances together, the extra bytes are needed;
>> separating blocking and non-blocking queues into two hctx types may not
>> be a good choice.
>>
>> There are two choices: add the struct rcu_synchronize in the hctx, or
>> in the srcu. Though adding rcu_synchronize to the srcu has a weakness
>> (the extra bytes are not needed by users that never batch srcu syncs),
>> I still think it's better for SRCU to provide the batch synchronization
>> interface.
> 
> The 'struct rcu_synchronize' can be allocated from the heap or even the
> stack (if there aren't too many NSs); it is just a one-shot sync and the
> API is supposed to be called in the slow path. There is no need to
> allocate it as a long-lifetime variable, especially since 'struct
> srcu_struct' is already too fat.

The stack is not suitable: it cannot provide that much space for
many namespaces. The heap may be a choice, but then allocation
failure needs special handling, so the I/O pause time cannot be
bounded. So the extra space may well be needed for a batched srcu sync.

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  5:55             ` Chao Leng
@ 2020-07-27  6:32               ` Ming Lei
  2020-07-27 18:40                 ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2020-07-27  6:32 UTC (permalink / raw)
  To: Chao Leng
  Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block, Keith Busch,
	Christoph Hellwig

On Mon, Jul 27, 2020 at 01:55:53PM +0800, Chao Leng wrote:
> 
> 
> On 2020/7/27 11:50, Ming Lei wrote:
> > On Mon, Jul 27, 2020 at 11:33:43AM +0800, Chao Leng wrote:
> > > 
> > > 
> > > On 2020/7/27 10:08, Ming Lei wrote:
> > > > > It is at the end and contains exactly what is needed to synchronize. Not
> > > > The sync is simply single global synchronize_rcu(), and why bother to add
> > > > extra >=40bytes for each hctx.
> > > > 
> > > > > sure what you mean by reuse hctx->srcu?
> > > > You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
> > > > to each hctx for just simulating one single synchronize_rcu().
> > > 
> > > To sync the srcu instances together, the extra bytes are needed;
> > > separating blocking and non-blocking queues into two hctx types may not
> > > be a good choice.
> > > 
> > > There are two choices: add the struct rcu_synchronize in the hctx, or
> > > in the srcu. Though adding rcu_synchronize to the srcu has a weakness
> > > (the extra bytes are not needed by users that never batch srcu syncs),
> > > I still think it's better for SRCU to provide the batch synchronization
> > > interface.
> > 
> > The 'struct rcu_synchronize' can be allocated from the heap or even the
> > stack (if there aren't too many NSs); it is just a one-shot sync and the
> > API is supposed to be called in the slow path. There is no need to
> > allocate it as a long-lifetime variable, especially since 'struct
> > srcu_struct' is already too fat.
> 
> The stack is not suitable: it cannot provide that much space for

Stack is fine if the number of NSs is small; for example, most of the
time there is just one NS.

> many namespaces. The heap may be a choice, but then allocation
> failure needs special handling, so the I/O pause time cannot be
> bounded. So the extra space may well be needed for a batched srcu sync.

In case of allocation failure, you can simply fall back to synchronize_srcu().

Thanks, 
Ming


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  2:08       ` Ming Lei
  2020-07-27  3:33         ` Chao Leng
@ 2020-07-27 18:36         ` Sagi Grimberg
  2020-07-27 20:37           ` Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-27 18:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-nvme, linux-block, Chao Leng, Keith Busch,
	Christoph Hellwig


>>>> +void blk_mq_quiesce_queue_async(struct request_queue *q)
>>>> +{
>>>> +	struct blk_mq_hw_ctx *hctx;
>>>> +	unsigned int i;
>>>> +
>>>> +	blk_mq_quiesce_queue_nowait(q);
>>>> +
>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>> +		init_completion(&hctx->rcu_sync.completion);
>>>> +		init_rcu_head(&hctx->rcu_sync.head);
>>>> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
>>>> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
>>>> +				wakeme_after_rcu);
>>>> +		else
>>>> +			call_rcu(&hctx->rcu_sync.head,
>>>> +				wakeme_after_rcu);
>>>> +	}
>>>
>>> Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
>>> synchronize_rcu() is OK for all hctx during waiting.
>>
>> That's true, but I want a single interface for both. v2 had exactly
>> that, but I decided that this approach is better.
> 
> Not sure one new interface is needed, and one simple way is to:
> 
> 1) call blk_mq_quiesce_queue_nowait() for each request queue
> 
> 2) wait in driver specific way
> 
> Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
> then you may add per-tagset APIs for the waiting.

Because it puts assumptions on how quiesce works, which is something
I'd like to avoid because I think this is cleaner. What do others think?
Jens? Christoph?

>> Also, having the driver call a single synchronize_rcu isn't great
> 
> Too many drivers are using synchronize_rcu():
> 
> 	$ git grep -n synchronize_rcu ./drivers/ | wc
> 	    186     524   11384

Wasn't talking about the usage of synchronize_rcu, was referring to
the hidden assumption that quiesce is an rcu driven operation.

>> layering (as quiesce can possibly use a different mechanism in the future).
> 
> What is the different mechanism?

Nothing specific, just said that having drivers assume that quiesce is
synchronizing rcu or srcu is not great.

>> So driver assumptions like:
>>
>>          /*
>>           * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
>>           * calling synchronize_rcu() once is enough.
>>           */
>>          WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
>>
>>          if (!ret)
>>                  synchronize_rcu();
>>
>> Are not great...
> 
> Both rcu read lock/unlock and synchronize_rcu are global interfaces, so
> it is reasonable to avoid unnecessary synchronize_rcu() calls.

Again, the fact that quiesce translates to synchronize rcu/srcu based
on the underlying tagset is implicit.

>>>> +}
>>>> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
>>>> +
>>>> +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
>>>> +{
>>>> +	struct blk_mq_hw_ctx *hctx;
>>>> +	unsigned int i;
>>>> +
>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>> +		wait_for_completion(&hctx->rcu_sync.completion);
>>>> +		destroy_rcu_head(&hctx->rcu_sync.head);
>>>> +	}
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
>>>> +
>>>>    /**
>>>>     * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
>>>>     * @q: request queue.
>>>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>>>> index 23230c1d031e..5536e434311a 100644
>>>> --- a/include/linux/blk-mq.h
>>>> +++ b/include/linux/blk-mq.h
>>>> @@ -5,6 +5,7 @@
>>>>    #include <linux/blkdev.h>
>>>>    #include <linux/sbitmap.h>
>>>>    #include <linux/srcu.h>
>>>> +#include <linux/rcupdate_wait.h>
>>>>    struct blk_mq_tags;
>>>>    struct blk_flush_queue;
>>>> @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
>>>>    	 */
>>>>    	struct list_head	hctx_list;
>>>> +	struct rcu_synchronize	rcu_sync;
>>> The above struct takes at least 5 words, and I'd suggest to avoid it,
>>> and the hctx->srcu should be re-used for waiting BLK_MQ_F_BLOCKING.
>>> Meantime !BLK_MQ_F_BLOCKING doesn't need it.
>>
>> It is at the end and contains exactly what is needed to synchronize. Not
> 
> The sync is simply single global synchronize_rcu(), and why bother to add
> extra >=40bytes for each hctx.

We can use the heap for this, but it will slow down the operation. Not
sure if this is really meaningful given that it is at the end of the
struct...

We cannot use the stack, because we do the wait asynchronously.

>> sure what you mean by reuse hctx->srcu?
> 
> You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
> to each hctx for just simulating one single synchronize_rcu().

That is my preference; I don't want nvme or other drivers to take a
different route for blocking vs. non-blocking based on the underlying
tagset.

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  3:50           ` Ming Lei
  2020-07-27  5:55             ` Chao Leng
@ 2020-07-27 18:38             ` Sagi Grimberg
  1 sibling, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-27 18:38 UTC (permalink / raw)
  To: Ming Lei, Chao Leng
  Cc: Keith Busch, Jens Axboe, linux-block, Christoph Hellwig, linux-nvme


>> On 2020/7/27 10:08, Ming Lei wrote:
>>>> It is at the end and contains exactly what is needed to synchronize. Not
>>> The sync is simply single global synchronize_rcu(), and why bother to add
>>> extra >=40bytes for each hctx.
>>>
>>>> sure what you mean by reuse hctx->srcu?
>>> You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
>>> to each hctx for just simulating one single synchronize_rcu().
>>
>> To sync the srcu instances together, the extra bytes are needed;
>> separating blocking and non-blocking queues into two hctx types may not
>> be a good choice.
>>
>> There are two choices: add the struct rcu_synchronize in the hctx, or
>> in the srcu. Though adding rcu_synchronize to the srcu has a weakness
>> (the extra bytes are not needed by users that never batch srcu syncs),
>> I still think it's better for SRCU to provide the batch synchronization
>> interface.
> 
> The 'struct rcu_synchronize' can be allocated from the heap or even the
> stack (if there aren't too many NSs); it is just a one-shot sync and the
> API is supposed to be called in the slow path. There is no need to
> allocate it as a long-lifetime variable, especially since 'struct
> srcu_struct' is already too fat.

stack is not possible, and there is zero justification for this to go to 
every srcu user out there...

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27  6:32               ` Ming Lei
@ 2020-07-27 18:40                 ` Sagi Grimberg
  0 siblings, 0 replies; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-27 18:40 UTC (permalink / raw)
  To: Ming Lei, Chao Leng
  Cc: Keith Busch, Jens Axboe, linux-block, Christoph Hellwig, linux-nvme


>>>>>> It is at the end and contains exactly what is needed to synchronize. Not
>>>>> The sync is simply single global synchronize_rcu(), and why bother to add
>>>>> extra >=40bytes for each hctx.
>>>>>
>>>>>> sure what you mean by reuse hctx->srcu?
>>>>> You already reuses hctx->srcu, but not see reason to add extra rcu_synchronize
>>>>> to each hctx for just simulating one single synchronize_rcu().
>>>>
>>>> To sync the srcu instances together, the extra bytes are needed;
>>>> separating blocking and non-blocking queues into two hctx types may not
>>>> be a good choice.
>>>>
>>>> There are two choices: add the struct rcu_synchronize in the hctx, or
>>>> in the srcu. Though adding rcu_synchronize to the srcu has a weakness
>>>> (the extra bytes are not needed by users that never batch srcu syncs),
>>>> I still think it's better for SRCU to provide the batch synchronization
>>>> interface.
>>>
>>> The 'struct rcu_synchronize' can be allocated from the heap or even the
>>> stack (if there aren't too many NSs); it is just a one-shot sync and the
>>> API is supposed to be called in the slow path. There is no need to
>>> allocate it as a long-lifetime variable, especially since 'struct
>>> srcu_struct' is already too fat.
>>
>> The stack is not suitable: it cannot provide that much space for
> 
> Stack is fine if the number of NSs is small; for example, most of the
> time there is just one NS.

I prefer a single code-path, so stack is not good.

>> many namespaces. The heap may be a choice, but then allocation
>> failure needs special handling, so the I/O pause time cannot be
>> bounded. So the extra space may well be needed for a batched srcu sync.
> 
> In case of allocation failure, you can simply fall back to synchronize_srcu().

that is possible, but I still prefer to support both srcu and rcu hctx
in a single interface so I don't need to have different code paths in
nvme (or other drivers).

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 18:36         ` Sagi Grimberg
@ 2020-07-27 20:37           ` Jens Axboe
  2020-07-27 21:00             ` Sagi Grimberg
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2020-07-27 20:37 UTC (permalink / raw)
  To: Sagi Grimberg, Ming Lei
  Cc: Keith Busch, linux-block, Christoph Hellwig, linux-nvme, Chao Leng

On 7/27/20 12:36 PM, Sagi Grimberg wrote:
> 
>>>>> +void blk_mq_quiesce_queue_async(struct request_queue *q)
>>>>> +{
>>>>> +	struct blk_mq_hw_ctx *hctx;
>>>>> +	unsigned int i;
>>>>> +
>>>>> +	blk_mq_quiesce_queue_nowait(q);
>>>>> +
>>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>>> +		init_completion(&hctx->rcu_sync.completion);
>>>>> +		init_rcu_head(&hctx->rcu_sync.head);
>>>>> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
>>>>> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
>>>>> +				wakeme_after_rcu);
>>>>> +		else
>>>>> +			call_rcu(&hctx->rcu_sync.head,
>>>>> +				wakeme_after_rcu);
>>>>> +	}
>>>>
>>>> Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
>>>> synchronize_rcu() is OK for all hctx during waiting.
>>>
>>> That's true, but I want a single interface for both. v2 had exactly
>>> that, but I decided that this approach is better.
>>
>> Not sure one new interface is needed, and one simple way is to:
>>
>> 1) call blk_mq_quiesce_queue_nowait() for each request queue
>>
>> 2) wait in driver specific way
>>
>> Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
>> then you may add per-tagset APIs for the waiting.
> 
> Because it puts assumptions on how quiesce works, which is something
> I'd like to avoid because I think this is cleaner. What do others think?
> Jens? Christoph?

I'd prefer to have it in a helper, and just have blk_mq_quiesce_queue()
call that.

-- 
Jens Axboe


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 20:37           ` Jens Axboe
@ 2020-07-27 21:00             ` Sagi Grimberg
  2020-07-27 21:05               ` Jens Axboe
  2020-07-28  1:09               ` Ming Lei
  0 siblings, 2 replies; 19+ messages in thread
From: Sagi Grimberg @ 2020-07-27 21:00 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei
  Cc: Keith Busch, linux-block, Christoph Hellwig, linux-nvme, Chao Leng


>>>>>> +void blk_mq_quiesce_queue_async(struct request_queue *q)
>>>>>> +{
>>>>>> +	struct blk_mq_hw_ctx *hctx;
>>>>>> +	unsigned int i;
>>>>>> +
>>>>>> +	blk_mq_quiesce_queue_nowait(q);
>>>>>> +
>>>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>>>> +		init_completion(&hctx->rcu_sync.completion);
>>>>>> +		init_rcu_head(&hctx->rcu_sync.head);
>>>>>> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
>>>>>> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
>>>>>> +				wakeme_after_rcu);
>>>>>> +		else
>>>>>> +			call_rcu(&hctx->rcu_sync.head,
>>>>>> +				wakeme_after_rcu);
>>>>>> +	}
>>>>>
>>>>> Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
>>>>> synchronize_rcu() is OK for all hctx during waiting.
>>>>
>>>> That's true, but I want a single interface for both. v2 had exactly
>>>> that, but I decided that this approach is better.
>>>
>>> Not sure one new interface is needed, and one simple way is to:
>>>
>>> 1) call blk_mq_quiesce_queue_nowait() for each request queue
>>>
>>> 2) wait in driver specific way
>>>
>>> Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
>>> then you may add per-tagset APIs for the waiting.
>>
>> Because it puts assumptions on how quiesce works, which is something
>> I'd like to avoid because I think its cleaner, what do others think?
>> Jens? Christoph?
> 
> I'd prefer to have it in a helper, and just have blk_mq_quiesce_queue()
> call that.

I agree with this approach as well.

Jens, this means that we use the call_rcu mechanism also for non-blocking
hctxs, because the caller will call it for multiple request queues (see
patch 2) and we don't want to call synchronize_rcu for every request
queue serially; we want it to happen in parallel.

Which leaves us with the patchset as it is, just to convert the
rcu_synchronize structure to be dynamically allocated on the heap
rather than keeping it statically allocated in the hctx.

This is how it looks:
--
diff --git a/block/blk-mq.c b/block/blk-mq.c
index abcf590f6238..d913924117d2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,6 +209,52 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
  }
  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);

+void blk_mq_quiesce_queue_async(struct request_queue *q)
+{
+       struct blk_mq_hw_ctx *hctx;
+       unsigned int i;
+       int rcu = false;
+
+       blk_mq_quiesce_queue_nowait(q);
+
+       queue_for_each_hw_ctx(q, hctx, i) {
+               hctx->rcu_sync = kmalloc(sizeof(*hctx->rcu_sync), GFP_KERNEL);
+               if (!hctx->rcu_sync) {
+                       /* fallback to serial rcu sync */
+                       if (hctx->flags & BLK_MQ_F_BLOCKING)
+                               synchronize_srcu(hctx->srcu);
+                       else
+                               rcu = true;
+               } else {
+                       init_completion(&hctx->rcu_sync->completion);
+                       init_rcu_head(&hctx->rcu_sync->head);
+                       if (hctx->flags & BLK_MQ_F_BLOCKING)
+                               call_srcu(hctx->srcu, &hctx->rcu_sync->head,
+                                       wakeme_after_rcu);
+                       else
+                               call_rcu(&hctx->rcu_sync->head,
+                                       wakeme_after_rcu);
+               }
+       }
+       if (rcu)
+               synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
+
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
+{
+       struct blk_mq_hw_ctx *hctx;
+       unsigned int i;
+
+       queue_for_each_hw_ctx(q, hctx, i) {
+               if (!hctx->rcu_sync)
+                       continue;
+               wait_for_completion(&hctx->rcu_sync->completion);
+               destroy_rcu_head(&hctx->rcu_sync->head);
+       }
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
+
  /**
   * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
   * @q: request queue.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 23230c1d031e..7213ce56bb31 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -5,6 +5,7 @@
  #include <linux/blkdev.h>
  #include <linux/sbitmap.h>
  #include <linux/srcu.h>
+#include <linux/rcupdate_wait.h>

  struct blk_mq_tags;
  struct blk_flush_queue;
@@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
          */
         struct list_head        hctx_list;

+       struct rcu_synchronize  *rcu_sync;
         /**
          * @srcu: Sleepable RCU. Use as lock when type of the hardware queue is
          * blocking (BLK_MQ_F_BLOCKING). Must be the last member - see also
@@ -532,6 +534,8 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap);
  void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues);

  void blk_mq_quiesce_queue_nowait(struct request_queue *q);
+void blk_mq_quiesce_queue_async(struct request_queue *q);
+void blk_mq_quiesce_queue_async_wait(struct request_queue *q);

  unsigned int blk_mq_rq_cpu(struct request *rq);
--

and in nvme:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 05aa568a60af..e8cc728dee46 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4561,7 +4561,9 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)

         down_read(&ctrl->namespaces_rwsem);
         list_for_each_entry(ns, &ctrl->namespaces, list)
-               blk_mq_quiesce_queue(ns->queue);
+               blk_mq_quiesce_queue_async(ns->queue);
+       list_for_each_entry(ns, &ctrl->namespaces, list)
+               blk_mq_quiesce_queue_async_wait(ns->queue);
         up_read(&ctrl->namespaces_rwsem);
  }
  EXPORT_SYMBOL_GPL(nvme_stop_queues);
--

Agreed on this?

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 21:00             ` Sagi Grimberg
@ 2020-07-27 21:05               ` Jens Axboe
  2020-07-27 21:21                 ` Keith Busch
  2020-07-28  1:09               ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2020-07-27 21:05 UTC (permalink / raw)
  To: Sagi Grimberg, Ming Lei
  Cc: Keith Busch, linux-block, Christoph Hellwig, linux-nvme, Chao Leng

On 7/27/20 3:00 PM, Sagi Grimberg wrote:
> 
>>>>>>> +void blk_mq_quiesce_queue_async(struct request_queue *q)
>>>>>>> +{
>>>>>>> +	struct blk_mq_hw_ctx *hctx;
>>>>>>> +	unsigned int i;
>>>>>>> +
>>>>>>> +	blk_mq_quiesce_queue_nowait(q);
>>>>>>> +
>>>>>>> +	queue_for_each_hw_ctx(q, hctx, i) {
>>>>>>> +		init_completion(&hctx->rcu_sync.completion);
>>>>>>> +		init_rcu_head(&hctx->rcu_sync.head);
>>>>>>> +		if (hctx->flags & BLK_MQ_F_BLOCKING)
>>>>>>> +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
>>>>>>> +				wakeme_after_rcu);
>>>>>>> +		else
>>>>>>> +			call_rcu(&hctx->rcu_sync.head,
>>>>>>> +				wakeme_after_rcu);
>>>>>>> +	}
>>>>>>
>>>>>> Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
>>>>>> synchronize_rcu() is OK for all hctx during waiting.
>>>>>
>>>>> That's true, but I want a single interface for both. v2 had exactly
>>>>> that, but I decided that this approach is better.
>>>>
>>>> Not sure one new interface is needed, and one simple way is to:
>>>>
>>>> 1) call blk_mq_quiesce_queue_nowait() for each request queue
>>>>
>>>> 2) wait in driver specific way
>>>>
>>>> Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
>>>> then you may add per-tagset APIs for the waiting.
>>>
>>> Because it puts assumptions on how quiesce works, which is something
>>> I'd like to avoid because I think this is cleaner. What do others think?
>>> Jens? Christoph?
>>
>> I'd prefer to have it in a helper, and just have blk_mq_quiesce_queue()
>> call that.
> 
> I agree with this approach as well.
> 
> Jens, this means that we use the call_rcu mechanism also for non-blocking
> hctxs, because the caller will call it for multiple request queues (see
> patch 2) and we don't want to call synchronize_rcu for every request
> queue serially; we want it to happen in parallel.
> 
> Which leaves us with the patchset as it is, just to convert the
> rcu_synchronize structure to be dynamically allocated on the heap
> rather than keeping it statically allocated in the hctx.
> 
> This is how it looks:

OK, so maybe I'm not fully up-to-date on the thread, but why aren't we
just having a blk_mq_quiesce_queue_wait() that does something ala:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 667155f752f7..b4ceaaed2f9c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,32 +209,37 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
-/**
- * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
- * @q: request queue.
- *
- * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Once this function is returned, we make
- * sure no dispatch can happen until the queue is unquiesced via
- * blk_mq_unquiesce_queue().
- */
-void blk_mq_quiesce_queue(struct request_queue *q)
+void blk_mq_quiesce_queue_wait(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 	bool rcu = false;
 
-	blk_mq_quiesce_queue_nowait(q);
-
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->flags & BLK_MQ_F_BLOCKING)
 			synchronize_srcu(hctx->srcu);
 		else
 			rcu = true;
 	}
+
 	if (rcu)
 		synchronize_rcu();
 }
+
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Once this function is returned, we make
+ * sure no dispatch can happen until the queue is unquiesced via
+ * blk_mq_unquiesce_queue().
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+	blk_mq_quiesce_queue_nowait(q);
+	blk_mq_quiesce_queue_wait(q);
+}
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
 /*

Do we care about BLK_MQ_F_BLOCKING runtime at all?

-- 
Jens Axboe


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 21:05               ` Jens Axboe
@ 2020-07-27 21:21                 ` Keith Busch
  2020-07-27 21:30                   ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Keith Busch @ 2020-07-27 21:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sagi Grimberg, linux-nvme, Ming Lei, linux-block, Chao Leng,
	Christoph Hellwig

On Mon, Jul 27, 2020 at 03:05:40PM -0600, Jens Axboe wrote:
> +void blk_mq_quiesce_queue_wait(struct request_queue *q)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	unsigned int i;
>  	bool rcu = false;
>  
> -	blk_mq_quiesce_queue_nowait(q);
> -
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (hctx->flags & BLK_MQ_F_BLOCKING)
>  			synchronize_srcu(hctx->srcu);
>  		else
>  			rcu = true;
>  	}
> +
>  	if (rcu)
>  		synchronize_rcu();
>  }

Either all the hctx's are blocking or none of them are: we don't need to
iterate the hctx's to see which sync method to use. We can add at the
very beginning (and get rid of 'bool rcu'):

	if (!(q->tag_set->flags & BLK_MQ_F_BLOCKING)) {
		synchronize_rcu();
		return;
	}


But the issue Sagi is trying to address is quiescing a lot of
request queues sharing a tagset, where synchronize_rcu() is too time
consuming to do repeatedly. He wants to synchronize once for the entire
tagset rather than per-request_queue, so I think he needs an API taking
a 'struct blk_mq_tag_set' instead of a 'struct request_queue'.
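
Something like this, perhaps (a rough sketch of the idea, completely
untested; blk_mq_quiesce_tagset() is a made-up name here, and the
blocking leg reuses the blk_mq_quiesce_queue_wait() helper Jens sketched
above):

void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
{
	struct request_queue *q;

	mutex_lock(&set->tag_list_lock);
	list_for_each_entry(q, &set->tag_list, tag_set_list)
		blk_mq_quiesce_queue_nowait(q);
	if (set->flags & BLK_MQ_F_BLOCKING) {
		/* blocking hctxs still need a per-queue srcu sync */
		list_for_each_entry(q, &set->tag_list, tag_set_list)
			blk_mq_quiesce_queue_wait(q);
	} else {
		/* one grace period covers every non-blocking hctx */
		synchronize_rcu();
	}
	mutex_unlock(&set->tag_list_lock);
}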

* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 21:21                 ` Keith Busch
@ 2020-07-27 21:30                   ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2020-07-27 21:30 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, linux-nvme, Ming Lei, linux-block, Chao Leng,
	Christoph Hellwig

On 7/27/20 3:21 PM, Keith Busch wrote:
> On Mon, Jul 27, 2020 at 03:05:40PM -0600, Jens Axboe wrote:
>> +void blk_mq_quiesce_queue_wait(struct request_queue *q)
>>  {
>>  	struct blk_mq_hw_ctx *hctx;
>>  	unsigned int i;
>>  	bool rcu = false;
>>  
>> -	blk_mq_quiesce_queue_nowait(q);
>> -
>>  	queue_for_each_hw_ctx(q, hctx, i) {
>>  		if (hctx->flags & BLK_MQ_F_BLOCKING)
>>  			synchronize_srcu(hctx->srcu);
>>  		else
>>  			rcu = true;
>>  	}
>> +
>>  	if (rcu)
>>  		synchronize_rcu();
>>  }
> 
> Either all the hctx's are blocking or none of them are: we don't need to
> iterate the hctx's to see which sync method to use. We can add at the
> very beginning (and get rid of 'bool rcu'):
> 
> 	if (!(q->tag_set->flags & BLK_MQ_F_BLOCKING)) {
> 		synchronize_rcu();
> 		return;
> 	}

Agree, was just copy/pasting the existing code.

> But the issue Sagi is trying to address is quiescing a lot of
> request queues sharing a tagset, where synchronize_rcu() is too time
> consuming to do repeatedly. He wants to synchronize once for the entire
> tagset rather than per-request_queue, so I think he needs an API taking
> a 'struct blk_mq_tag_set' instead of a 'struct request_queue'.

Gotcha, yeah that won't work for multiple queues obviously.

Are all these queues sharing a tag set? If so, yes that seems like the
right abstraction. And the pointer addition is a much better idea than
including a full srcu/rcu struct.

-- 
Jens Axboe


* Re: [PATCH v3 1/2] blk-mq: add async quiesce interface
  2020-07-27 21:00             ` Sagi Grimberg
  2020-07-27 21:05               ` Jens Axboe
@ 2020-07-28  1:09               ` Ming Lei
  1 sibling, 0 replies; 19+ messages in thread
From: Ming Lei @ 2020-07-28  1:09 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jens Axboe, linux-nvme, linux-block, Chao Leng, Keith Busch,
	Christoph Hellwig

On Mon, Jul 27, 2020 at 02:00:15PM -0700, Sagi Grimberg wrote:
> 
> > > > > > > +void blk_mq_quiesce_queue_async(struct request_queue *q)
> > > > > > > +{
> > > > > > > +	struct blk_mq_hw_ctx *hctx;
> > > > > > > +	unsigned int i;
> > > > > > > +
> > > > > > > +	blk_mq_quiesce_queue_nowait(q);
> > > > > > > +
> > > > > > > +	queue_for_each_hw_ctx(q, hctx, i) {
> > > > > > > +		init_completion(&hctx->rcu_sync.completion);
> > > > > > > +		init_rcu_head(&hctx->rcu_sync.head);
> > > > > > > +		if (hctx->flags & BLK_MQ_F_BLOCKING)
> > > > > > > +			call_srcu(hctx->srcu, &hctx->rcu_sync.head,
> > > > > > > +				wakeme_after_rcu);
> > > > > > > +		else
> > > > > > > +			call_rcu(&hctx->rcu_sync.head,
> > > > > > > +				wakeme_after_rcu);
> > > > > > > +	}
> > > > > > 
> > > > > > Looks not necessary to do anything in case of !BLK_MQ_F_BLOCKING, and single
> > > > > > synchronize_rcu() is OK for all hctx during waiting.
> > > > > 
> > > > > That's true, but I want a single interface for both. v2 had exactly
> > > > > that, but I decided that this approach is better.
> > > > 
> > > > Not sure one new interface is needed, and one simple way is to:
> > > > 
> > > > 1) call blk_mq_quiesce_queue_nowait() for each request queue
> > > > 
> > > > 2) wait in driver specific way
> > > > 
> > > > Or just wondering why nvme doesn't use set->tag_list to retrieve NS,
> > > > then you may add per-tagset APIs for the waiting.
> > > 
> > > Because it puts assumptions on how quiesce works, which is something
> > > I'd like to avoid because I think this is cleaner. What do others think?
> > > Jens? Christoph?
> > 
> > I'd prefer to have it in a helper, and just have blk_mq_quiesce_queue()
> > call that.
> 
> I agree with this approach as well.
> 
> Jens, this means that we use the call_rcu mechanism also for non-blocking
> hctxs, because the caller will call it for multiple request queues (see
> patch 2) and we don't want to call synchronize_rcu for every request
> queue serially; we want it to happen in parallel.
> 
> Which leaves us with the patchset as it is, just to convert the
> rcu_synchronize structure to be dynamically allocated on the heap
> rather than keeping it statically allocated in the hctx.
> 
> This is how it looks:
> --
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index abcf590f6238..d913924117d2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -209,6 +209,52 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
> 
> +void blk_mq_quiesce_queue_async(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +       int rcu = false;
> +
> +       blk_mq_quiesce_queue_nowait(q);
> +
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               hctx->rcu_sync = kmalloc(sizeof(*hctx->rcu_sync), GFP_KERNEL);
> +               if (!hctx->rcu_sync) {
> +                       /* fallback to serial rcu sync */
> +                       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +                               synchronize_srcu(hctx->srcu);
> +                       else
> +                               rcu = true;
> +               } else {
> +                       init_completion(&hctx->rcu_sync->completion);
> +                       init_rcu_head(&hctx->rcu_sync->head);
> +                       if (hctx->flags & BLK_MQ_F_BLOCKING)
> +                               call_srcu(hctx->srcu, &hctx->rcu_sync->head,
> +                                       wakeme_after_rcu);
> +                       else
> +                               call_rcu(&hctx->rcu_sync->head,
> +                                       wakeme_after_rcu);
> +               }
> +       }
> +       if (rcu)
> +               synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async);
> +
> +void blk_mq_quiesce_queue_async_wait(struct request_queue *q)
> +{
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int i;
> +
> +       queue_for_each_hw_ctx(q, hctx, i) {
> +               if (!hctx->rcu_sync)
> +                       continue;
> +               wait_for_completion(&hctx->rcu_sync->completion);
> +               destroy_rcu_head(&hctx->rcu_sync->head);
> +       }
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_async_wait);
> +
>  /**
>   * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
>   * @q: request queue.
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 23230c1d031e..7213ce56bb31 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -5,6 +5,7 @@
>  #include <linux/blkdev.h>
>  #include <linux/sbitmap.h>
>  #include <linux/srcu.h>
> +#include <linux/rcupdate_wait.h>
> 
>  struct blk_mq_tags;
>  struct blk_flush_queue;
> @@ -170,6 +171,7 @@ struct blk_mq_hw_ctx {
>          */
>         struct list_head        hctx_list;
> 
> +       struct rcu_synchronize  *rcu_sync;

The above pointer needn't be added to blk_mq_hw_ctx; the storage can be
allocated on the heap and passed to the waiting helper.
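
i.e. something like this (just a sketch, hypothetical signatures), with
the caller passing storage for one rcu_synchronize per hctx:

	void blk_mq_quiesce_queue_async(struct request_queue *q,
					struct rcu_synchronize *syncs);
	void blk_mq_quiesce_queue_async_wait(struct request_queue *q,
					     struct rcu_synchronize *syncs);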


Thanks,
Ming

