* per-tagset SRCU struct and quiesce
@ 2022-10-20 10:56 Christoph Hellwig
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Hi all,

this series moves the SRCU struct used for quiescing to the tag_set,
as the SRCU critical sections for dispatch take about the same time
on all queues anyway, and then adopts the series from Chao that provides
tagset-wide quiesce on top of that infrastructure.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
                     ` (4 more replies)
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 5 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

nvme and xen-blkfront are already doing this to stop buffered writes from
creating dirty pages that can't be written out later.  Move it to the
common code.  Note that this follows the xen-blkfront version, which does
not send a uevent, as the uevent is a bit confusing when the device is
about to go away a little later, and the size change only exists to stop
buffered writes faster.

This also removes the comment about the ordering from nvme: not only is
bd_mutex gone entirely, but it had also stopped being used for locking
updates to the disk size long before that, so the ordering requirement
documented there no longer applies.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c                | 3 +++
 drivers/block/xen-blkfront.c | 1 -
 drivers/nvme/host/core.c     | 7 +------
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 17b33c62423df..2877b5f905579 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -555,6 +555,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
 {
 	set_bit(GD_DEAD, &disk->state);
 	blk_queue_start_drain(disk->queue);
+
+	/* stop buffered writers from dirtying pages that can't written out */
+	set_capacity(disk, 0);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 35b9bcad9db90..b28489290323f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2129,7 +2129,6 @@ static void blkfront_closing(struct blkfront_info *info)
 	if (info->rq && info->gd) {
 		blk_mq_stop_hw_queues(info->rq);
 		blk_mark_disk_dead(info->gd);
-		set_capacity(info->gd, 0);
 	}
 
 	for_each_rinfo(info, rinfo, i) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 059737c1a2c19..44a5321743128 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5106,10 +5106,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 /*
  * Prepare a queue for teardown.
  *
- * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
- * the capacity to 0 after that to avoid blocking dispatchers that may be
- * holding bd_butex.  This will end buffered writers dirtying pages that can't
- * be synced.
+ * This must forcibly unquiesce queues to avoid blocking dispatch.
  */
 static void nvme_set_queue_dying(struct nvme_ns *ns)
 {
@@ -5118,8 +5115,6 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
 
 	blk_mark_disk_dead(ns->disk);
 	nvme_start_ns_queue(ns);
-
-	set_capacity_and_notify(ns->disk, 0);
 }
 
 /**
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
                     ` (3 more replies)
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 4 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

For submit_bio-based queues there is no (S)RCU critical section during
I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
so skip doing any synchronization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 33292c01875d5..df967c8af9fee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
 void blk_mq_quiesce_queue(struct request_queue *q)
 {
 	blk_mq_quiesce_queue_nowait(q);
-	blk_mq_wait_quiesce_done(q);
+	/* nothing to wait for non-mq queues */
+	if (queue_is_mq(q))
+		blk_mq_wait_quiesce_done(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
                     ` (5 more replies)
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 6 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

All I/O submissions have fairly similar latencies, and a tagset-wide
quiesce is a fairly common operation.  Because there are far fewer
tagsets there is also no need for the variable-size allocation trick.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       | 27 +++++----------------------
 block/blk-mq.c         | 25 +++++++++++++++++--------
 block/blk-mq.h         | 14 +++++++-------
 block/blk-sysfs.c      |  9 ++-------
 block/blk.h            |  9 +--------
 block/genhd.c          |  2 +-
 include/linux/blk-mq.h |  4 ++++
 include/linux/blkdev.h |  9 ---------
 8 files changed, 37 insertions(+), 62 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 17667159482e0..3a2ed8dadf738 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -65,7 +65,6 @@ DEFINE_IDA(blk_queue_ida);
  * For queue allocation
  */
 struct kmem_cache *blk_requestq_cachep;
-struct kmem_cache *blk_requestq_srcu_cachep;
 
 /*
  * Controlling structure to kblockd
@@ -373,26 +372,20 @@ static void blk_timeout_work(struct work_struct *work)
 {
 }
 
-struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
+struct request_queue *blk_alloc_queue(int node_id)
 {
 	struct request_queue *q;
 
-	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
-			GFP_KERNEL | __GFP_ZERO, node_id);
+	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
+				  node_id);
 	if (!q)
 		return NULL;
 
-	if (alloc_srcu) {
-		blk_queue_flag_set(QUEUE_FLAG_HAS_SRCU, q);
-		if (init_srcu_struct(q->srcu) != 0)
-			goto fail_q;
-	}
-
 	q->last_merge = NULL;
 
 	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
 	if (q->id < 0)
-		goto fail_srcu;
+		goto fail_q;
 
 	q->stats = blk_alloc_queue_stats();
 	if (!q->stats)
@@ -435,11 +428,8 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
 	blk_free_queue_stats(q->stats);
 fail_id:
 	ida_free(&blk_queue_ida, q->id);
-fail_srcu:
-	if (alloc_srcu)
-		cleanup_srcu_struct(q->srcu);
 fail_q:
-	kmem_cache_free(blk_get_queue_kmem_cache(alloc_srcu), q);
+	kmem_cache_free(blk_requestq_cachep, q);
 	return NULL;
 }
 
@@ -1184,9 +1174,6 @@ int __init blk_dev_init(void)
 			sizeof_field(struct request, cmd_flags));
 	BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 *
 			sizeof_field(struct bio, bi_opf));
-	BUILD_BUG_ON(ALIGN(offsetof(struct request_queue, srcu),
-			   __alignof__(struct request_queue)) !=
-		     sizeof(struct request_queue));
 
 	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
 	kblockd_workqueue = alloc_workqueue("kblockd",
@@ -1197,10 +1184,6 @@ int __init blk_dev_init(void)
 	blk_requestq_cachep = kmem_cache_create("request_queue",
 			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
 
-	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
-			sizeof(struct request_queue) +
-			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);
-
 	blk_debugfs_root = debugfs_create_dir("block", NULL);
 
 	return 0;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df967c8af9fee..4a81a2da43328 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
  */
 void blk_mq_wait_quiesce_done(struct request_queue *q)
 {
-	if (blk_queue_has_srcu(q))
-		synchronize_srcu(q->srcu);
+	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
+		synchronize_srcu(&q->tag_set->srcu);
 	else
 		synchronize_rcu();
 }
@@ -3971,7 +3971,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
 	struct request_queue *q;
 	int ret;
 
-	q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING);
+	q = blk_alloc_queue(set->numa_node);
 	if (!q)
 		return ERR_PTR(-ENOMEM);
 	q->queuedata = queuedata;
@@ -4138,9 +4138,6 @@ static void blk_mq_update_poll_flag(struct request_queue *q)
 int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		struct request_queue *q)
 {
-	WARN_ON_ONCE(blk_queue_has_srcu(q) !=
-			!!(set->flags & BLK_MQ_F_BLOCKING));
-
 	/* mark the queue as mq asap */
 	q->mq_ops = set->ops;
 
@@ -4398,9 +4395,16 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	 */
 	if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
 		set->nr_hw_queues = nr_cpu_ids;
+ 
+	if (set->flags & BLK_MQ_F_BLOCKING) {
+		ret = init_srcu_struct(&set->srcu);
+		if (ret)
+			return ret;
+	}
 
-	if (blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues) < 0)
-		return -ENOMEM;
+	ret = blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues);
+	if (ret)
+		goto out_free_srcu;
 
 	ret = -ENOMEM;
 	for (i = 0; i < set->nr_maps; i++) {
@@ -4430,6 +4434,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	}
 	kfree(set->tags);
 	set->tags = NULL;
+out_free_srcu:
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(&set->srcu);
 	return ret;
 }
 EXPORT_SYMBOL(blk_mq_alloc_tag_set);
@@ -4469,6 +4476,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 
 	kfree(set->tags);
 	set->tags = NULL;
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(&set->srcu);
 }
 EXPORT_SYMBOL(blk_mq_free_tag_set);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 0b2870839cdd6..06eb46d1d7a76 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -377,17 +377,17 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 /* run the code block in @dispatch_ops with rcu/srcu read lock held */
 #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
 do {								\
-	if (!blk_queue_has_srcu(q)) {				\
-		rcu_read_lock();				\
-		(dispatch_ops);					\
-		rcu_read_unlock();				\
-	} else {						\
+	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
 		int srcu_idx;					\
 								\
 		might_sleep_if(check_sleep);			\
-		srcu_idx = srcu_read_lock((q)->srcu);		\
+		srcu_idx = srcu_read_lock(&((q)->tag_set->srcu)); \
 		(dispatch_ops);					\
-		srcu_read_unlock((q)->srcu, srcu_idx);		\
+		srcu_read_unlock(&((q)->tag_set->srcu), srcu_idx); \
+	} else {						\
+		rcu_read_lock();				\
+		(dispatch_ops);					\
+		rcu_read_unlock();				\
 	}							\
 } while (0)
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e71b3b43927c0..e7871665825a3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -739,10 +739,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
 
 static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 {
-	struct request_queue *q = container_of(rcu_head, struct request_queue,
-					       rcu_head);
-
-	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
+	kmem_cache_free(blk_requestq_cachep,
+			container_of(rcu_head, struct request_queue, rcu_head));
 }
 
 /**
@@ -779,9 +777,6 @@ static void blk_release_queue(struct kobject *kobj)
 	if (queue_is_mq(q))
 		blk_mq_release(q);
 
-	if (blk_queue_has_srcu(q))
-		cleanup_srcu_struct(q->srcu);
-
 	ida_free(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/block/blk.h b/block/blk.h
index 5350bf363035e..b25e2d22f3725 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -27,7 +27,6 @@ struct blk_flush_queue {
 };
 
 extern struct kmem_cache *blk_requestq_cachep;
-extern struct kmem_cache *blk_requestq_srcu_cachep;
 extern struct kobj_type blk_queue_ktype;
 extern struct ida blk_queue_ida;
 
@@ -420,13 +419,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
 		struct page *page, unsigned int len, unsigned int offset,
 		unsigned int max_sectors, bool *same_page);
 
-static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
-{
-	if (srcu)
-		return blk_requestq_srcu_cachep;
-	return blk_requestq_cachep;
-}
-struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
+struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
 
diff --git a/block/genhd.c b/block/genhd.c
index 2877b5f905579..fd0b13d6175a3 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1410,7 +1410,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_alloc_queue(node, false);
+	q = blk_alloc_queue(node);
 	if (!q)
 		return NULL;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ba18e9bdb799b..f040a7cab5dbf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -7,6 +7,7 @@
 #include <linux/lockdep.h>
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
+#include <linux/srcu.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -501,6 +502,8 @@ enum hctx_type {
  * @tag_list_lock: Serializes tag_list accesses.
  * @tag_list:	   List of the request queues that use this tag set. See also
  *		   request_queue.tag_set_list.
+ * @srcu:	   Use as lock when type of the request queue is blocking
+ *		   (BLK_MQ_F_BLOCKING). Must be the last member
  */
 struct blk_mq_tag_set {
 	struct blk_mq_queue_map	map[HCTX_MAX_TYPES];
@@ -521,6 +524,7 @@ struct blk_mq_tag_set {
 
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
+	struct srcu_struct	srcu;
 };
 
 /**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 50e358a19d986..b15b6a011c028 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -22,7 +22,6 @@
 #include <linux/blkzoned.h>
 #include <linux/sched.h>
 #include <linux/sbitmap.h>
-#include <linux/srcu.h>
 #include <linux/uuid.h>
 #include <linux/xarray.h>
 
@@ -543,18 +542,11 @@ struct request_queue {
 	struct mutex		debugfs_mutex;
 
 	bool			mq_sysfs_init_done;
-
-	/**
-	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
-	 * is blocking (BLK_MQ_F_BLOCKING). Must be the last member
-	 */
-	struct srcu_struct	srcu[];
 };
 
 /* Keep blk_queue_flag_name[] in sync with the definitions below */
 #define QUEUE_FLAG_STOPPED	0	/* queue is stopped */
 #define QUEUE_FLAG_DYING	1	/* queue being torn down */
-#define QUEUE_FLAG_HAS_SRCU	2	/* SRCU is allocated */
 #define QUEUE_FLAG_NOMERGES     3	/* disable merge attempts */
 #define QUEUE_FLAG_SAME_COMP	4	/* complete on same CPU-group */
 #define QUEUE_FLAG_FAIL_IO	5	/* fake timeout */
@@ -590,7 +582,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 
 #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
 #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
-#define blk_queue_has_srcu(q)	test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags)
 #define blk_queue_init_done(q)	test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags)
 #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
 #define blk_queue_noxmerges(q)	\
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
                     ` (4 more replies)
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 5 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
pass the tagset, and move the non-mq check into the only caller that
needs it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c           | 10 +++++-----
 drivers/nvme/host/core.c |  4 ++--
 drivers/scsi/scsi_lib.c  |  2 +-
 include/linux/blk-mq.h   |  2 +-
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4a81a2da43328..cf8f9f9a96c35 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -254,15 +254,15 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
 /**
  * blk_mq_wait_quiesce_done() - wait until in-progress quiesce is done
- * @q: request queue.
+ * @set: tag_set to wait on
  *
  * Note: it is driver's responsibility for making sure that quiesce has
  * been started.
  */
-void blk_mq_wait_quiesce_done(struct request_queue *q)
+void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set)
 {
-	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
-		synchronize_srcu(&q->tag_set->srcu);
+	if (set->flags & BLK_MQ_F_BLOCKING)
+		synchronize_srcu(&set->srcu);
 	else
 		synchronize_rcu();
 }
@@ -282,7 +282,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
 	blk_mq_quiesce_queue_nowait(q);
 	/* nothing to wait for non-mq queues */
 	if (queue_is_mq(q))
-		blk_mq_wait_quiesce_done(q);
+		blk_mq_wait_quiesce_done(q->tag_set);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 44a5321743128..a74212a4f1a5f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5100,7 +5100,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
 		blk_mq_quiesce_queue(ns->queue);
 	else
-		blk_mq_wait_quiesce_done(ns->queue);
+		blk_mq_wait_quiesce_done(ns->queue->tag_set);
 }
 
 /*
@@ -5216,7 +5216,7 @@ void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
 	if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
 		blk_mq_quiesce_queue(ctrl->admin_q);
 	else
-		blk_mq_wait_quiesce_done(ctrl->admin_q);
+		blk_mq_wait_quiesce_done(ctrl->admin_q->tag_set);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8b89fab7c4206..249757ddd8fea 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2735,7 +2735,7 @@ static void scsi_stop_queue(struct scsi_device *sdev, bool nowait)
 			blk_mq_quiesce_queue(sdev->request_queue);
 	} else {
 		if (!nowait)
-			blk_mq_wait_quiesce_done(sdev->request_queue);
+			blk_mq_wait_quiesce_done(sdev->request_queue->tag_set);
 	}
 }
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f040a7cab5dbf..dfd565c4fb84e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -881,7 +881,7 @@ void blk_mq_start_hw_queues(struct request_queue *q);
 void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
-void blk_mq_wait_quiesce_done(struct request_queue *q);
+void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:24   ` Sagi Grimberg
                     ` (4 more replies)
  2022-10-20 10:56 ` [PATCH 6/8] nvme: move the NS_DEAD flag to the controller Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 5 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

From: Chao Leng <lengchao@huawei.com>

Drivers with shared tagsets may need to quiesce a potentially large
number of request queues that all share a single tagset (e.g. nvme). Add
an interface to quiesce all the queues on a given tagset. This interface
is useful because it can speed up the quiesce by doing it in parallel.

Some queues should not be quiesced when quiescing the tagset (e.g. the
nvme connect_q), so introduce a QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag that
makes the tagset quiesce interface skip the queue.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: simplify for the per-tag_set srcu_struct]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c         | 25 +++++++++++++++++++++++++
 include/linux/blk-mq.h |  2 ++
 include/linux/blkdev.h |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index cf8f9f9a96c35..b0d2dd56bfdf2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -313,6 +313,31 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
+void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list) {
+		if (!blk_queue_skip_tagset_quiesce(q))
+			blk_mq_quiesce_queue_nowait(q);
+	}
+	blk_mq_wait_quiesce_done(set);
+	mutex_unlock(&set->tag_list_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);
+
+void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set)
+{
+	struct request_queue *q;
+
+	mutex_lock(&set->tag_list_lock);
+	list_for_each_entry(q, &set->tag_list, tag_set_list)
+		blk_mq_unquiesce_queue(q);
+	mutex_unlock(&set->tag_list_lock);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unquiesce_tagset);
+
 void blk_mq_wake_waiters(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index dfd565c4fb84e..35747949f7739 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -882,6 +882,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
 void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
 void blk_mq_quiesce_queue(struct request_queue *q);
 void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
+void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set);
+void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set);
 void blk_mq_unquiesce_queue(struct request_queue *q);
 void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
 void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b15b6a011c028..854b4745cdd1f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -571,6 +571,7 @@ struct request_queue {
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
 #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
+#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE	31 /* quiesce_tagset skip the queue*/
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
 				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
@@ -610,6 +611,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
+#define blk_queue_skip_tagset_quiesce(q) \
+	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:30   ` Sagi Grimberg
                     ` (2 more replies)
  2022-10-20 10:56 ` [PATCH 7/8] nvme: remove nvme_set_queue_dying Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 3 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

The NVME_NS_DEAD flag is only set in nvme_set_queue_dying, which is
called in a loop over all namespaces in nvme_kill_queues.  Switch it
to a controller flag checked and set outside said loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 16 +++++++---------
 drivers/nvme/host/nvme.h |  2 +-
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index a74212a4f1a5f..fa7fdb744979c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4330,7 +4330,7 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
 	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
 
-	if (test_bit(NVME_NS_DEAD, &ns->flags))
+	if (test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
 		goto out;
 
 	ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
@@ -4404,7 +4404,8 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 
 	down_write(&ctrl->namespaces_rwsem);
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
+		if (ns->head->ns_id > nsid ||
+		    test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
 			list_move_tail(&ns->list, &rm_list);
 	}
 	up_write(&ctrl->namespaces_rwsem);
@@ -5110,9 +5111,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
  */
 static void nvme_set_queue_dying(struct nvme_ns *ns)
 {
-	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-		return;
-
 	blk_mark_disk_dead(ns->disk);
 	nvme_start_ns_queue(ns);
 }
@@ -5129,14 +5127,14 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		nvme_start_admin_queue(ctrl);
 
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_set_queue_dying(ns);
-
+	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
+		list_for_each_entry(ns, &ctrl->namespaces, list)
+			nvme_set_queue_dying(ns);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee65..82989a3322130 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -237,6 +237,7 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_FAILFAST_EXPIRED	= 0,
 	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
 	NVME_CTRL_STARTED_ONCE		= 2,
+	NVME_CTRL_NS_DEAD     		= 3,
 };
 
 struct nvme_ctrl {
@@ -483,7 +484,6 @@ struct nvme_ns {
 	unsigned long features;
 	unsigned long flags;
 #define NVME_NS_REMOVING	0
-#define NVME_NS_DEAD     	1
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 6/8] nvme: move the NS_DEAD flag to the controller Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:10   ` Sagi Grimberg
                     ` (2 more replies)
  2022-10-20 10:56 ` [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

This helper is pretty pointless now, and also in the way of per-tagset
quiesce.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fa7fdb744979c..0ab3a18fd9f85 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5104,17 +5104,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
 		blk_mq_wait_quiesce_done(ns->queue->tag_set);
 }
 
-/*
- * Prepare a queue for teardown.
- *
- * This must forcibly unquiesce queues to avoid blocking dispatch.
- */
-static void nvme_set_queue_dying(struct nvme_ns *ns)
-{
-	blk_mark_disk_dead(ns->disk);
-	nvme_start_ns_queue(ns);
-}
-
 /**
  * nvme_kill_queues(): Ends all namespace queues
  * @ctrl: the dead controller that needs to end
@@ -5130,10 +5119,11 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		nvme_start_admin_queue(ctrl);
-
 	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
-		list_for_each_entry(ns, &ctrl->namespaces, list)
-			nvme_set_queue_dying(ns);
+		list_for_each_entry(ns, &ctrl->namespaces, list) {
+			blk_mark_disk_dead(ns->disk);
+			nvme_start_ns_queue(ns);
+		}
 	}
 	up_read(&ctrl->namespaces_rwsem);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 7/8] nvme: remove nvme_set_queue_dying Christoph Hellwig
@ 2022-10-20 10:56 ` Christoph Hellwig
  2022-10-20 13:35   ` Sagi Grimberg
                     ` (2 more replies)
  2022-10-20 13:16 ` per-tagset SRCU struct and quiesce Sagi Grimberg
  2022-10-21 18:06 ` Keith Busch
  9 siblings, 3 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-20 10:56 UTC (permalink / raw)
  To: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

From: Chao Leng <lengchao@huawei.com>

All controller namespaces share the same tagset, so we can use this
interface, which performs the optimal operation for a parallel quiesce
based on the tagset type (e.g. blocking vs. non-blocking tagsets).

The nvme connect_q should not be quiesced when quiescing the tagset, so
set the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag on it when initializing
connect_q.

Currently we use NVME_NS_STOPPED to ensure that quiesce and unquiesce
are paired.  With blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED is no
longer valid, so introduce NVME_CTRL_STOPPED to replace it.  In
addition, we never really quiesce a single namespace, so it is a better
choice to move the flag from the namespace to the controller.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: rebased on top of prep patches]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 38 +++++++++-----------------------------
 drivers/nvme/host/nvme.h |  2 +-
 2 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0ab3a18fd9f85..cc71f1001144f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4891,6 +4891,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 			ret = PTR_ERR(ctrl->connect_q);
 			goto out_free_tag_set;
 		}
+		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, ctrl->connect_q);
 	}
 
 	ctrl->tagset = set;
@@ -5090,20 +5091,6 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(nvme_init_ctrl);
 
-static void nvme_start_ns_queue(struct nvme_ns *ns)
-{
-	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_unquiesce_queue(ns->queue);
-}
-
-static void nvme_stop_ns_queue(struct nvme_ns *ns)
-{
-	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
-		blk_mq_quiesce_queue(ns->queue);
-	else
-		blk_mq_wait_quiesce_done(ns->queue->tag_set);
-}
-
 /**
  * nvme_kill_queues(): Ends all namespace queues
  * @ctrl: the dead controller that needs to end
@@ -5120,10 +5107,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
 		nvme_start_admin_queue(ctrl);
 	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
-		list_for_each_entry(ns, &ctrl->namespaces, list) {
+		list_for_each_entry(ns, &ctrl->namespaces, list)
 			blk_mark_disk_dead(ns->disk);
-			nvme_start_ns_queue(ns);
-		}
+		nvme_start_queues(ctrl);
 	}
 	up_read(&ctrl->namespaces_rwsem);
 }
@@ -5179,23 +5165,17 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
 
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_stop_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_quiesce_tagset(ctrl->tagset);
+	else
+		blk_mq_wait_quiesce_done(ctrl->tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_stop_queues);
 
 void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
-	struct nvme_ns *ns;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		nvme_start_ns_queue(ns);
-	up_read(&ctrl->namespaces_rwsem);
+	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
+		blk_mq_unquiesce_tagset(ctrl->tagset);
 }
 EXPORT_SYMBOL_GPL(nvme_start_queues);
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 82989a3322130..54d8127fb6dc5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -238,6 +238,7 @@ enum nvme_ctrl_flags {
 	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
 	NVME_CTRL_STARTED_ONCE		= 2,
 	NVME_CTRL_NS_DEAD     		= 3,
+	NVME_CTRL_STOPPED		= 4,
 };
 
 struct nvme_ctrl {
@@ -487,7 +488,6 @@ struct nvme_ns {
 #define NVME_NS_ANA_PENDING	2
 #define NVME_NS_FORCE_RO	3
 #define NVME_NS_READY		4
-#define NVME_NS_STOPPED		5
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-20 10:56 ` [PATCH 7/8] nvme: remove nvme_set_queue_dying Christoph Hellwig
@ 2022-10-20 13:10   ` Sagi Grimberg
  2022-10-21 13:29     ` Christoph Hellwig
  2022-10-21  2:50   ` Chao Leng
  2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 1 reply; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:10 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block


> This helper is pretty pointless now, and also in the way of per-tagset
> quiesce.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 18 ++++--------------
>   1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fa7fdb744979c..0ab3a18fd9f85 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5104,17 +5104,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   		blk_mq_wait_quiesce_done(ns->queue->tag_set);
>   }
>   
> -/*
> - * Prepare a queue for teardown.
> - *
> - * This must forcibly unquiesce queues to avoid blocking dispatch.
> - */
> -static void nvme_set_queue_dying(struct nvme_ns *ns)
> -{
> -	blk_mark_disk_dead(ns->disk);
> -	nvme_start_ns_queue(ns);
> -}
> -
>   /**
>    * nvme_kill_queues(): Ends all namespace queues
>    * @ctrl: the dead controller that needs to end
> @@ -5130,10 +5119,11 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	/* Forcibly unquiesce queues to avoid blocking dispatch */
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
> -
>   	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> -		list_for_each_entry(ns, &ctrl->namespaces, list)
> -			nvme_set_queue_dying(ns);
> +		list_for_each_entry(ns, &ctrl->namespaces, list) {
> +			blk_mark_disk_dead(ns->disk);
> +			nvme_start_ns_queue(ns);
> +		}

I have to say that I always found the nvme_kill_queues interface
somewhat odd: it is a core function that unquiesces the admin/io
queues, assuming that they were stopped at some point by the driver.

If there is now no dependency between the unquiesce and
blk_mark_disk_dead, maybe it would be a good idea to move the
unquiescing to the drivers, where it can pair with the quiesce itself,
and rename this to nvme_mark_namespaces_dead() or something?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: per-tagset SRCU struct and quiesce
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-10-20 10:56 ` [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
@ 2022-10-20 13:16 ` Sagi Grimberg
  2022-10-21 18:06 ` Keith Busch
  9 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block


> Hi all,
> 
> this series moves the SRCU struct used for quiescing to the tag_set
> as the SRCU critical sections for dispatch take about the same time
> on all queues anyway, and then adopts the series from Chao that provides
> tagset-wide quiesce to use that infrastructure.

Looks nice, should be easy enough to modify scsi_host_block() to
use this as well.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
@ 2022-10-20 13:16   ` Sagi Grimberg
  2022-10-21  1:09   ` Ming Lei
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
@ 2022-10-20 13:16   ` Sagi Grimberg
  2022-10-21  1:13   ` Ming Lei
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
@ 2022-10-20 13:23   ` Sagi Grimberg
  2022-10-20 17:26   ` Keith Busch
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:23 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
@ 2022-10-20 13:23   ` Sagi Grimberg
  2022-10-21  1:46   ` Ming Lei
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:23 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
@ 2022-10-20 13:24   ` Sagi Grimberg
  2022-10-21  1:53   ` Ming Lei
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:24 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-20 10:56 ` [PATCH 6/8] nvme: move the NS_DEAD flag to the controller Christoph Hellwig
@ 2022-10-20 13:30   ` Sagi Grimberg
  2022-10-21 13:28     ` Christoph Hellwig
  2022-10-21  2:49   ` Chao Leng
  2022-10-21  6:51   ` Hannes Reinecke
  2 siblings, 1 reply; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:30 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block



On 10/20/22 13:56, Christoph Hellwig wrote:
> The NVME_NS_DEAD flag is only set in nvme_set_queue_dying, which is
> called in a loop over all namespaces in nvme_kill_queues.  Switch it
> to a controller flag checked and set outside said loop.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 16 +++++++---------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a74212a4f1a5f..fa7fdb744979c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4330,7 +4330,7 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
>   {
>   	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
>   
> -	if (test_bit(NVME_NS_DEAD, &ns->flags))
> +	if (test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
>   		goto out;
>   
>   	ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
> @@ -4404,7 +4404,8 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   
>   	down_write(&ctrl->namespaces_rwsem);
>   	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
> +		if (ns->head->ns_id > nsid ||
> +		    test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
>   			list_move_tail(&ns->list, &rm_list);
>   	}
>   	up_write(&ctrl->namespaces_rwsem);
> @@ -5110,9 +5111,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>    */
>   static void nvme_set_queue_dying(struct nvme_ns *ns)
>   {
> -	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
> -		return;
> -
>   	blk_mark_disk_dead(ns->disk);
>   	nvme_start_ns_queue(ns);
>   }
> @@ -5129,14 +5127,14 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -
>   	/* Forcibly unquiesce queues to avoid blocking dispatch */
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
>   
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		nvme_set_queue_dying(ns);
> -
> +	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> +		list_for_each_entry(ns, &ctrl->namespaces, list)
> +			nvme_set_queue_dying(ns);
> +	}

Looking at it now, I'm not sure I understand the need for this flag. It
seems to make nvme_kill_queues reentrant safe, but the admin queue
unquiesce can still end up unbalanced under reentrance?

How is this not broken today (or ever since quiesce/unquiesce started
accounting)? Maybe I lost some context on the exact subtlety of how
nvme-pci uses this interface...
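To restate the structure Sagi is questioning, here is an illustrative sketch (not the actual kernel code) of the shape of nvme_kill_queues() after this series; whether the admin-queue unquiesce actually ends up unbalanced depends on the NVME_CTRL_ADMIN_Q_STOPPED accounting inside nvme_start_admin_queue():

```c
/* Illustrative sketch of the control flow under discussion. */
void nvme_kill_queues(struct nvme_ctrl *ctrl)
{
	down_read(&ctrl->namespaces_rwsem);

	/*
	 * Reached by every caller; relies on the accounting inside
	 * nvme_start_admin_queue() to stay balanced on reentrance.
	 */
	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
		nvme_start_admin_queue(ctrl);

	/* Guarded: only the first caller walks the namespaces. */
	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
		/* ... mark disks dead and unquiesce the ns queues ... */
	}
	up_read(&ctrl->namespaces_rwsem);
}
```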

>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_kill_queues);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a29877217ee65..82989a3322130 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -237,6 +237,7 @@ enum nvme_ctrl_flags {
>   	NVME_CTRL_FAILFAST_EXPIRED	= 0,
>   	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
>   	NVME_CTRL_STARTED_ONCE		= 2,
> +	NVME_CTRL_NS_DEAD     		= 3,
>   };
>   
>   struct nvme_ctrl {
> @@ -483,7 +484,6 @@ struct nvme_ns {
>   	unsigned long features;
>   	unsigned long flags;
>   #define NVME_NS_REMOVING	0
> -#define NVME_NS_DEAD     	1
>   #define NVME_NS_ANA_PENDING	2
>   #define NVME_NS_FORCE_RO	3
>   #define NVME_NS_READY		4

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset
  2022-10-20 10:56 ` [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
@ 2022-10-20 13:35   ` Sagi Grimberg
  2022-10-21  2:50   ` Chao Leng
  2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-20 13:35 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block



On 10/20/22 13:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> All controller namespaces share the same tagset, so we can use this
> interface, which performs the optimal operation for a parallel quiesce
> based on the tagset type (e.g. blocking vs. non-blocking tagsets).
> 
> The nvme connect_q should not be quiesced when quiescing the tagset, so
> set the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag on it when initializing
> connect_q.
> 
> Currently we use NVME_NS_STOPPED to ensure that quiesce and unquiesce
> are paired.  With blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED is no
> longer valid, so introduce NVME_CTRL_STOPPED to replace it.  In
> addition, we never really quiesce a single namespace, so it is a better
> choice to move the flag from the namespace to the controller.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: rebased on top of prep patches]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 38 +++++++++-----------------------------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 10 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0ab3a18fd9f85..cc71f1001144f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4891,6 +4891,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   			ret = PTR_ERR(ctrl->connect_q);
>   			goto out_free_tag_set;
>   		}
> +		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, ctrl->connect_q);
>   	}
>   
>   	ctrl->tagset = set;
> @@ -5090,20 +5091,6 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   }
>   EXPORT_SYMBOL_GPL(nvme_init_ctrl);
>   
> -static void nvme_start_ns_queue(struct nvme_ns *ns)
> -{
> -	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
> -		blk_mq_unquiesce_queue(ns->queue);
> -}
> -
> -static void nvme_stop_ns_queue(struct nvme_ns *ns)
> -{
> -	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
> -		blk_mq_quiesce_queue(ns->queue);
> -	else
> -		blk_mq_wait_quiesce_done(ns->queue->tag_set);
> -}
> -
>   /**
>    * nvme_kill_queues(): Ends all namespace queues
>    * @ctrl: the dead controller that needs to end
> @@ -5120,10 +5107,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
>   	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> -		list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		list_for_each_entry(ns, &ctrl->namespaces, list)
>   			blk_mark_disk_dead(ns->disk);
> -			nvme_start_ns_queue(ns);
> -		}
> +		nvme_start_queues(ctrl);
>   	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
> @@ -5179,23 +5165,17 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
>   
>   void nvme_stop_queues(struct nvme_ctrl *ctrl)
>   {
> -	struct nvme_ns *ns;
> -
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		nvme_stop_ns_queue(ns);
> -	up_read(&ctrl->namespaces_rwsem);
> +	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
> +		blk_mq_quiesce_tagset(ctrl->tagset);
> +	else
> +		blk_mq_wait_quiesce_done(ctrl->tagset);

Doesn't blk_mq_quiesce_tagset already wait for the (S)RCU grace
period? Is this to make a concurrent caller also wait?

I wish we would sort out all the concurrency issues in
the driver(s) instead of making core functions reentrant safe...
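For context, the branch being discussed can be sketched as follows (illustrative pseudocode derived from the patch above, not the final code); the explicit wait only matters for a second, concurrent caller that loses the test_and_set_bit() race:

```c
/* Illustrative pseudocode, not the final kernel code. */
void nvme_stop_queues(struct nvme_ctrl *ctrl)
{
	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags)) {
		/* First caller: quiesce and wait internally. */
		blk_mq_quiesce_tagset(ctrl->tagset);
	} else {
		/*
		 * A concurrent caller lost the race; it still must
		 * not return before the (S)RCU grace period has
		 * elapsed, hence the explicit wait in question.
		 */
		blk_mq_wait_quiesce_done(ctrl->tagset);
	}
}
```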

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
@ 2022-10-20 17:26   ` Keith Busch
  2022-10-21 13:20     ` Christoph Hellwig
  2022-10-21  1:41   ` Ming Lei
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 61+ messages in thread
From: Keith Busch @ 2022-10-20 17:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Chao Leng, Ming Lei, linux-nvme, linux-block

On Thu, Oct 20, 2022 at 12:56:03PM +0200, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.  Becuase there are a lot less

s/Becuase/Because

> @@ -501,6 +502,8 @@ enum hctx_type {
>   * @tag_list_lock: Serializes tag_list accesses.
>   * @tag_list:	   List of the request queues that use this tag set. See also
>   *		   request_queue.tag_set_list.
> + * @srcu:	   Use as lock when type of the request queue is blocking
> + *		   (BLK_MQ_F_BLOCKING). Must be the last member

Since you're not dealing with flexible arrays anymore, I don't think
srcu strictly needs to be the last member.

The code looks great, though!

Reviewed-by: Keith Busch <kbusch@kernel.org>

>   */
>  struct blk_mq_tag_set {
>  	struct blk_mq_queue_map	map[HCTX_MAX_TYPES];
> @@ -521,6 +524,7 @@ struct blk_mq_tag_set {
>  
>  	struct mutex		tag_list_lock;
>  	struct list_head	tag_list;
> +	struct srcu_struct	srcu;
>  };

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
@ 2022-10-21  1:09   ` Ming Lei
  2022-10-21 13:11     ` Christoph Hellwig
  2022-10-21  1:53   ` Chao Leng
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Ming Lei @ 2022-10-21  1:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Thu, Oct 20, 2022 at 12:56:01PM +0200, Christoph Hellwig wrote:
> nvme and xen-blkfront are already doing this to stop buffered writes from
> creating dirty pages that can't be written out later.  Move it to the
> common code.  Note that this follows the xen-blkfront version that does
> not send and uevent as the uevent is a bit confusing when the device is
> about to go away a little later, and the the size change is just to stop
> buffered writes faster.
> 
> This also removes the comment about the ordering from nvme, as bd_mutex
> not only is gone entirely, but also hasn't been used for locking updates
> to the disk size long before that, and thus the ordering requirement
> documented there doesn't apply any more.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/genhd.c                | 3 +++
>  drivers/block/xen-blkfront.c | 1 -
>  drivers/nvme/host/core.c     | 7 +------
>  3 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 17b33c62423df..2877b5f905579 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -555,6 +555,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
>  {
>  	set_bit(GD_DEAD, &disk->state);
>  	blk_queue_start_drain(disk->queue);
> +
> +	/* stop buffered writers from dirtying pages that can't written out */
> +	set_capacity(disk, 0);

The idea makes sense:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Just one small issue with mtip32xx, which may call blk_mark_disk_dead()
in irq context, while ->bd_size_lock is actually not irq safe.

But mtip32xx is already broken there, since blk_queue_start_drain()
needs a mutex; maybe mtip32xx isn't actively used at all.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
@ 2022-10-21  1:13   ` Ming Lei
  2022-10-21 13:19     ` Christoph Hellwig
  2022-10-21  2:47   ` Chao Leng
  2022-10-21  6:49   ` Hannes Reinecke
  3 siblings, 1 reply; 61+ messages in thread
From: Ming Lei @ 2022-10-21  1:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Thu, Oct 20, 2022 at 12:56:02PM +0200, Christoph Hellwig wrote:
> For submit_bio based queues there is no (S)RCU critical section during
> I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
> so skip doing any synchronization.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 33292c01875d5..df967c8af9fee 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
>  void blk_mq_quiesce_queue(struct request_queue *q)
>  {
>  	blk_mq_quiesce_queue_nowait(q);
> -	blk_mq_wait_quiesce_done(q);
> +	/* nothing to wait for non-mq queues */
> +	if (queue_is_mq(q))
> +		blk_mq_wait_quiesce_done(q);

This interface can't work as expected for bio drivers; the only user
should be del_gendisk().  But anyway, the patch is fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>

thanks,
Ming


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
  2022-10-20 17:26   ` Keith Busch
@ 2022-10-21  1:41   ` Ming Lei
  2022-10-21  2:49   ` Chao Leng
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 61+ messages in thread
From: Ming Lei @ 2022-10-21  1:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Thu, Oct 20, 2022 at 12:56:03PM +0200, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.  Becuase there are a lot less
> tagsets there is also no need for the variable size allocation trick.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-core.c       | 27 +++++----------------------
>  block/blk-mq.c         | 25 +++++++++++++++++--------
>  block/blk-mq.h         | 14 +++++++-------
>  block/blk-sysfs.c      |  9 ++-------
>  block/blk.h            |  9 +--------
>  block/genhd.c          |  2 +-
>  include/linux/blk-mq.h |  4 ++++
>  include/linux/blkdev.h |  9 ---------
>  8 files changed, 37 insertions(+), 62 deletions(-)

q->tag_set is supposed to be live when blk_mq_quiesce_queue and
blk_mq_unquiesce_queue are called, especially as del_gendisk() quiesces
the request queue, so this looks fine:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
@ 2022-10-21  1:46   ` Ming Lei
  2022-10-21 13:23     ` Christoph Hellwig
  2022-10-21  2:49   ` Chao Leng
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Ming Lei @ 2022-10-21  1:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Thu, Oct 20, 2022 at 12:56:04PM +0200, Christoph Hellwig wrote:
> Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
> pass the tagset, and move the non-mq check into the only caller that
> needs it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c           | 10 +++++-----
>  drivers/nvme/host/core.c |  4 ++--
>  drivers/scsi/scsi_lib.c  |  2 +-
>  include/linux/blk-mq.h   |  2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4a81a2da43328..cf8f9f9a96c35 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -254,15 +254,15 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
>  
>  /**
>   * blk_mq_wait_quiesce_done() - wait until in-progress quiesce is done
> - * @q: request queue.
> + * @set: tag_set to wait on
>   *
>   * Note: it is driver's responsibility for making sure that quiesce has
>   * been started.
>   */
> -void blk_mq_wait_quiesce_done(struct request_queue *q)
> +void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set)

The change is fine, but the interface could confuse people: it looks
like it waits for the whole tagset to be quiesced, but the caller needs
to mark all request queues as quiesced first; otherwise it only waits
for one specific queue's quiesce.

So I suggest documenting this.
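The calling convention Ming wants documented could be sketched roughly like this (illustrative only; blk_mq_quiesce_tagset() in patch 5 follows this pattern internally):

```c
/*
 * Sketch: blk_mq_wait_quiesce_done() waits for a single (S)RCU grace
 * period, so every queue in the set must already be marked quiesced,
 * otherwise only the queues marked so far are covered by the wait.
 */
struct request_queue *q;

list_for_each_entry(q, &set->tag_list, tag_set_list)
	blk_mq_quiesce_queue_nowait(q);	/* mark all queues first */
blk_mq_wait_quiesce_done(set);		/* then wait once for the set */
```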


Thanks,
Ming


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
  2022-10-21  1:09   ` Ming Lei
@ 2022-10-21  1:53   ` Chao Leng
  2022-10-21  6:49   ` Hannes Reinecke
  2022-10-21 21:12   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  1:53 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> nvme and xen-blkfront are already doing this to stop buffered writes from
> creating dirty pages that can't be written out later.  Move it to the
> common code.  Note that this follows the xen-blkfront version that does
> not send and uevent as the uevent is a bit confusing when the device is
> about to go away a little later, and the the size change is just to stop
> buffered writes faster.
> 
> This also removes the comment about the ordering from nvme, as bd_mutex
> not only is gone entirely, but also hasn't been used for locking updates
> to the disk size long before that, and thus the ordering requirement
> documented there doesn't apply any more.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/genhd.c                | 3 +++
>   drivers/block/xen-blkfront.c | 1 -
>   drivers/nvme/host/core.c     | 7 +------
>   3 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 17b33c62423df..2877b5f905579 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -555,6 +555,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
>   {
>   	set_bit(GD_DEAD, &disk->state);
>   	blk_queue_start_drain(disk->queue);
> +
> +	/* stop buffered writers from dirtying pages that can't written out */
> +	set_capacity(disk, 0);
>   }
>   EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>   
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 35b9bcad9db90..b28489290323f 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2129,7 +2129,6 @@ static void blkfront_closing(struct blkfront_info *info)
>   	if (info->rq && info->gd) {
>   		blk_mq_stop_hw_queues(info->rq);
>   		blk_mark_disk_dead(info->gd);
> -		set_capacity(info->gd, 0);
>   	}
>   
>   	for_each_rinfo(info, rinfo, i) {
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 059737c1a2c19..44a5321743128 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5106,10 +5106,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   /*
>    * Prepare a queue for teardown.
>    *
> - * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
> - * the capacity to 0 after that to avoid blocking dispatchers that may be
> - * holding bd_butex.  This will end buffered writers dirtying pages that can't
> - * be synced.
> + * This must forcibly unquiesce queues to avoid blocking dispatch.
>    */
>   static void nvme_set_queue_dying(struct nvme_ns *ns)
>   {
> @@ -5118,8 +5115,6 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
>   
>   	blk_mark_disk_dead(ns->disk);
>   	nvme_start_ns_queue(ns);
> -
> -	set_capacity_and_notify(ns->disk, 0);
>   }
>   
>   /**
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
  2022-10-20 13:24   ` Sagi Grimberg
@ 2022-10-21  1:53   ` Ming Lei
  2022-10-21  2:49   ` Chao Leng
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Ming Lei @ 2022-10-21  1:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Thu, Oct 20, 2022 at 12:56:05PM +0200, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> Drivers that have shared tagsets may need to quiesce potentially a lot
> of request queues that all share a single tagset (e.g. nvme). Add an
> interface to quiesce all the queues on a given tagset. This interface is
> useful because it can speed up the quiesce by doing it in parallel.
> 
> Some queues (e.g. the nvme connect_q) should not be quiesced when
> quiescing the tagset, so introduce QUEUE_FLAG_SKIP_TAGSET_QUIESCE to
> let the tagset quiesce interface skip such queues.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: simplify for the per-tag_set srcu_struct]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming



* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
  2022-10-20 13:16   ` Sagi Grimberg
  2022-10-21  1:13   ` Ming Lei
@ 2022-10-21  2:47   ` Chao Leng
  2022-10-21  3:16     ` Chao Leng
  2022-10-21  6:49   ` Hannes Reinecke
  3 siblings, 1 reply; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:47 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block



On 2022/10/20 18:56, Christoph Hellwig wrote:
> For submit_bio based queues there is no (S)RCU critical section during
> I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
> so skip doing any synchronization.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 33292c01875d5..df967c8af9fee 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
>   void blk_mq_quiesce_queue(struct request_queue *q)
>   {
>   	blk_mq_quiesce_queue_nowait(q);
> -	blk_mq_wait_quiesce_done(q);
> +	/* nothing to wait for non-mq queues */
> +	if (queue_is_mq(q))
> +		blk_mq_wait_quiesce_done(q);
For the non-mq queues, maybe we should add a synchronize_rcu.
	if (queue_is_mq(q))
		blk_mq_wait_quiesce_done(q);
	else
		synchronize_rcu();
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>   
> 

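The distinction Chao raises can be modeled in a few lines: only blk-mq queues have an (S)RCU read-side section around dispatch, so only they need a grace period in the wait path. The sketch below is a userspace model with illustrative names (a counter stands in for the `synchronize_(s)rcu()` call); it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of blk_mq_quiesce_queue() after the patch: mark the
 * queue quiesced, then wait for a grace period only for mq queues. */
struct model_queue {
	bool is_mq;
	bool quiesced;
	int grace_periods;	/* stands in for synchronize_(s)rcu() calls */
};

static void model_wait_quiesce_done(struct model_queue *q)
{
	q->grace_periods++;
}

static void model_quiesce_queue(struct model_queue *q)
{
	q->quiesced = true;		/* blk_mq_quiesce_queue_nowait() */
	if (q->is_mq)			/* nothing to wait for otherwise */
		model_wait_quiesce_done(q);
}
```

Whether the non-mq branch should still do a plain `synchronize_rcu()`, as suggested above, is exactly the open question in this subthread; the model only shows the behavior the patch as posted implements.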

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
                     ` (2 preceding siblings ...)
  2022-10-21  1:41   ` Ming Lei
@ 2022-10-21  2:49   ` Chao Leng
  2022-10-21  6:50   ` Hannes Reinecke
  2022-10-21  7:16   ` Chao Leng
  5 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.  Because there are far fewer
> tagsets, there is also no need for the variable size allocation trick.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c       | 27 +++++----------------------
>   block/blk-mq.c         | 25 +++++++++++++++++--------
>   block/blk-mq.h         | 14 +++++++-------
>   block/blk-sysfs.c      |  9 ++-------
>   block/blk.h            |  9 +--------
>   block/genhd.c          |  2 +-
>   include/linux/blk-mq.h |  4 ++++
>   include/linux/blkdev.h |  9 ---------
>   8 files changed, 37 insertions(+), 62 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 17667159482e0..3a2ed8dadf738 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -65,7 +65,6 @@ DEFINE_IDA(blk_queue_ida);
>    * For queue allocation
>    */
>   struct kmem_cache *blk_requestq_cachep;
> -struct kmem_cache *blk_requestq_srcu_cachep;
>   
>   /*
>    * Controlling structure to kblockd
> @@ -373,26 +372,20 @@ static void blk_timeout_work(struct work_struct *work)
>   {
>   }
>   
> -struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
> +struct request_queue *blk_alloc_queue(int node_id)
>   {
>   	struct request_queue *q;
>   
> -	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
> -			GFP_KERNEL | __GFP_ZERO, node_id);
> +	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
> +				  node_id);
>   	if (!q)
>   		return NULL;
>   
> -	if (alloc_srcu) {
> -		blk_queue_flag_set(QUEUE_FLAG_HAS_SRCU, q);
> -		if (init_srcu_struct(q->srcu) != 0)
> -			goto fail_q;
> -	}
> -
>   	q->last_merge = NULL;
>   
>   	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
>   	if (q->id < 0)
> -		goto fail_srcu;
> +		goto fail_q;
>   
>   	q->stats = blk_alloc_queue_stats();
>   	if (!q->stats)
> @@ -435,11 +428,8 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
>   	blk_free_queue_stats(q->stats);
>   fail_id:
>   	ida_free(&blk_queue_ida, q->id);
> -fail_srcu:
> -	if (alloc_srcu)
> -		cleanup_srcu_struct(q->srcu);
>   fail_q:
> -	kmem_cache_free(blk_get_queue_kmem_cache(alloc_srcu), q);
> +	kmem_cache_free(blk_requestq_cachep, q);
>   	return NULL;
>   }
>   
> @@ -1184,9 +1174,6 @@ int __init blk_dev_init(void)
>   			sizeof_field(struct request, cmd_flags));
>   	BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 *
>   			sizeof_field(struct bio, bi_opf));
> -	BUILD_BUG_ON(ALIGN(offsetof(struct request_queue, srcu),
> -			   __alignof__(struct request_queue)) !=
> -		     sizeof(struct request_queue));
>   
>   	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
>   	kblockd_workqueue = alloc_workqueue("kblockd",
> @@ -1197,10 +1184,6 @@ int __init blk_dev_init(void)
>   	blk_requestq_cachep = kmem_cache_create("request_queue",
>   			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
>   
> -	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
> -			sizeof(struct request_queue) +
> -			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);
> -
>   	blk_debugfs_root = debugfs_create_dir("block", NULL);
>   
>   	return 0;
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df967c8af9fee..4a81a2da43328 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
>    */
>   void blk_mq_wait_quiesce_done(struct request_queue *q)
>   {
> -	if (blk_queue_has_srcu(q))
> -		synchronize_srcu(q->srcu);
> +	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
> +		synchronize_srcu(&q->tag_set->srcu);
>   	else
>   		synchronize_rcu();
>   }
> @@ -3971,7 +3971,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
>   	struct request_queue *q;
>   	int ret;
>   
> -	q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING);
> +	q = blk_alloc_queue(set->numa_node);
>   	if (!q)
>   		return ERR_PTR(-ENOMEM);
>   	q->queuedata = queuedata;
> @@ -4138,9 +4138,6 @@ static void blk_mq_update_poll_flag(struct request_queue *q)
>   int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>   		struct request_queue *q)
>   {
> -	WARN_ON_ONCE(blk_queue_has_srcu(q) !=
> -			!!(set->flags & BLK_MQ_F_BLOCKING));
> -
>   	/* mark the queue as mq asap */
>   	q->mq_ops = set->ops;
>   
> @@ -4398,9 +4395,16 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	 */
>   	if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
>   		set->nr_hw_queues = nr_cpu_ids;
> +
> +	if (set->flags & BLK_MQ_F_BLOCKING) {
> +		ret = init_srcu_struct(&set->srcu);
> +		if (ret)
> +			return ret;
> +	}
>   
> -	if (blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues) < 0)
> -		return -ENOMEM;
> +	ret = blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues);
> +	if (ret)
> +		goto out_free_srcu;
>   
>   	ret = -ENOMEM;
>   	for (i = 0; i < set->nr_maps; i++) {
> @@ -4430,6 +4434,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	}
>   	kfree(set->tags);
>   	set->tags = NULL;
> +out_free_srcu:
> +	if (set->flags & BLK_MQ_F_BLOCKING)
> +		cleanup_srcu_struct(&set->srcu);
>   	return ret;
>   }
>   EXPORT_SYMBOL(blk_mq_alloc_tag_set);
> @@ -4469,6 +4476,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>   
>   	kfree(set->tags);
>   	set->tags = NULL;
> +	if (set->flags & BLK_MQ_F_BLOCKING)
> +		cleanup_srcu_struct(&set->srcu);
>   }
>   EXPORT_SYMBOL(blk_mq_free_tag_set);
>   
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 0b2870839cdd6..06eb46d1d7a76 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -377,17 +377,17 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>   /* run the code block in @dispatch_ops with rcu/srcu read lock held */
>   #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
>   do {								\
> -	if (!blk_queue_has_srcu(q)) {				\
> -		rcu_read_lock();				\
> -		(dispatch_ops);					\
> -		rcu_read_unlock();				\
> -	} else {						\
> +	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
>   		int srcu_idx;					\
>   								\
>   		might_sleep_if(check_sleep);			\
> -		srcu_idx = srcu_read_lock((q)->srcu);		\
> +		srcu_idx = srcu_read_lock(&((q)->tag_set->srcu)); \
>   		(dispatch_ops);					\
> -		srcu_read_unlock((q)->srcu, srcu_idx);		\
> +		srcu_read_unlock(&((q)->tag_set->srcu), srcu_idx); \
> +	} else {						\
> +		rcu_read_lock();				\
> +		(dispatch_ops);					\
> +		rcu_read_unlock();				\
>   	}							\
>   } while (0)
>   
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index e71b3b43927c0..e7871665825a3 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -739,10 +739,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
>   
>   static void blk_free_queue_rcu(struct rcu_head *rcu_head)
>   {
> -	struct request_queue *q = container_of(rcu_head, struct request_queue,
> -					       rcu_head);
> -
> -	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
> +	kmem_cache_free(blk_requestq_cachep,
> +			container_of(rcu_head, struct request_queue, rcu_head));
>   }
>   
>   /**
> @@ -779,9 +777,6 @@ static void blk_release_queue(struct kobject *kobj)
>   	if (queue_is_mq(q))
>   		blk_mq_release(q);
>   
> -	if (blk_queue_has_srcu(q))
> -		cleanup_srcu_struct(q->srcu);
> -
>   	ida_free(&blk_queue_ida, q->id);
>   	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>   }
> diff --git a/block/blk.h b/block/blk.h
> index 5350bf363035e..b25e2d22f3725 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -27,7 +27,6 @@ struct blk_flush_queue {
>   };
>   
>   extern struct kmem_cache *blk_requestq_cachep;
> -extern struct kmem_cache *blk_requestq_srcu_cachep;
>   extern struct kobj_type blk_queue_ktype;
>   extern struct ida blk_queue_ida;
>   
> @@ -420,13 +419,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>   		struct page *page, unsigned int len, unsigned int offset,
>   		unsigned int max_sectors, bool *same_page);
>   
> -static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
> -{
> -	if (srcu)
> -		return blk_requestq_srcu_cachep;
> -	return blk_requestq_cachep;
> -}
> -struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
> +struct request_queue *blk_alloc_queue(int node_id);
>   
>   int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
>   
> diff --git a/block/genhd.c b/block/genhd.c
> index 2877b5f905579..fd0b13d6175a3 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1410,7 +1410,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
>   	struct request_queue *q;
>   	struct gendisk *disk;
>   
> -	q = blk_alloc_queue(node, false);
> +	q = blk_alloc_queue(node);
>   	if (!q)
>   		return NULL;
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index ba18e9bdb799b..f040a7cab5dbf 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -7,6 +7,7 @@
>   #include <linux/lockdep.h>
>   #include <linux/scatterlist.h>
>   #include <linux/prefetch.h>
> +#include <linux/srcu.h>
>   
>   struct blk_mq_tags;
>   struct blk_flush_queue;
> @@ -501,6 +502,8 @@ enum hctx_type {
>    * @tag_list_lock: Serializes tag_list accesses.
>    * @tag_list:	   List of the request queues that use this tag set. See also
>    *		   request_queue.tag_set_list.
> + * @srcu:	   Use as lock when type of the request queue is blocking
> + *		   (BLK_MQ_F_BLOCKING). Must be the last member
>    */
>   struct blk_mq_tag_set {
>   	struct blk_mq_queue_map	map[HCTX_MAX_TYPES];
> @@ -521,6 +524,7 @@ struct blk_mq_tag_set {
>   
>   	struct mutex		tag_list_lock;
>   	struct list_head	tag_list;
> +	struct srcu_struct	srcu;
>   };
>   
>   /**
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 50e358a19d986..b15b6a011c028 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -22,7 +22,6 @@
>   #include <linux/blkzoned.h>
>   #include <linux/sched.h>
>   #include <linux/sbitmap.h>
> -#include <linux/srcu.h>
>   #include <linux/uuid.h>
>   #include <linux/xarray.h>
>   
> @@ -543,18 +542,11 @@ struct request_queue {
>   	struct mutex		debugfs_mutex;
>   
>   	bool			mq_sysfs_init_done;
> -
> -	/**
> -	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
> -	 * is blocking (BLK_MQ_F_BLOCKING). Must be the last member
> -	 */
> -	struct srcu_struct	srcu[];
>   };
>   
>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
>   #define QUEUE_FLAG_STOPPED	0	/* queue is stopped */
>   #define QUEUE_FLAG_DYING	1	/* queue being torn down */
> -#define QUEUE_FLAG_HAS_SRCU	2	/* SRCU is allocated */
>   #define QUEUE_FLAG_NOMERGES     3	/* disable merge attempts */
>   #define QUEUE_FLAG_SAME_COMP	4	/* complete on same CPU-group */
>   #define QUEUE_FLAG_FAIL_IO	5	/* fake timeout */
> @@ -590,7 +582,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>   
>   #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
>   #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
> -#define blk_queue_has_srcu(q)	test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags)
>   #define blk_queue_init_done(q)	test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags)
>   #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
>   #define blk_queue_noxmerges(q)	\
> 

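One detail worth noting in the `blk_mq_alloc_tag_set()` hunk above is the unwind ordering: the per-tagset SRCU is initialized before the tags, and the new `out_free_srcu` label cleans it up if the tag allocation fails. A userspace model of that goto-based unwinding pattern (all names illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_F_BLOCKING 1u

/* Userspace model of the allocation ordering in the patch: set up the
 * per-tagset SRCU first, then the tags, and unwind the SRCU if the
 * later allocation fails. */
struct model_tag_set {
	unsigned int flags;
	bool srcu_initialized;
	bool tags_allocated;
};

static int model_alloc_tag_set(struct model_tag_set *set, bool tags_fail)
{
	if (set->flags & MODEL_F_BLOCKING)
		set->srcu_initialized = true;	/* init_srcu_struct() */

	if (tags_fail)				/* blk_mq_alloc_tag_set_tags() */
		goto out_free_srcu;
	set->tags_allocated = true;
	return 0;

out_free_srcu:
	if (set->flags & MODEL_F_BLOCKING)
		set->srcu_initialized = false;	/* cleanup_srcu_struct() */
	return -1;
}
```

The same cleanup runs in `blk_mq_free_tag_set()`, so every successfully initialized SRCU struct is torn down exactly once on both the error and the normal teardown paths.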

* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
  2022-10-20 13:23   ` Sagi Grimberg
  2022-10-21  1:46   ` Ming Lei
@ 2022-10-21  2:49   ` Chao Leng
  2022-10-21  6:50   ` Hannes Reinecke
  2022-10-21 21:18   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
> pass the tagset, and move the non-mq check into the only caller that
> needs it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c           | 10 +++++-----
>   drivers/nvme/host/core.c |  4 ++--
>   drivers/scsi/scsi_lib.c  |  2 +-
>   include/linux/blk-mq.h   |  2 +-
>   4 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4a81a2da43328..cf8f9f9a96c35 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -254,15 +254,15 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
>   
>   /**
>    * blk_mq_wait_quiesce_done() - wait until in-progress quiesce is done
> - * @q: request queue.
> + * @set: tag_set to wait on
>    *
>    * Note: it is driver's responsibility for making sure that quiesce has
>    * been started.
>    */
> -void blk_mq_wait_quiesce_done(struct request_queue *q)
> +void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set)
>   {
> -	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
> -		synchronize_srcu(&q->tag_set->srcu);
> +	if (set->flags & BLK_MQ_F_BLOCKING)
> +		synchronize_srcu(&set->srcu);
>   	else
>   		synchronize_rcu();
>   }
> @@ -282,7 +282,7 @@ void blk_mq_quiesce_queue(struct request_queue *q)
>   	blk_mq_quiesce_queue_nowait(q);
>   	/* nothing to wait for non-mq queues */
>   	if (queue_is_mq(q))
> -		blk_mq_wait_quiesce_done(q);
> +		blk_mq_wait_quiesce_done(q->tag_set);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>   
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 44a5321743128..a74212a4f1a5f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5100,7 +5100,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
>   		blk_mq_quiesce_queue(ns->queue);
>   	else
> -		blk_mq_wait_quiesce_done(ns->queue);
> +		blk_mq_wait_quiesce_done(ns->queue->tag_set);
>   }
>   
>   /*
> @@ -5216,7 +5216,7 @@ void nvme_stop_admin_queue(struct nvme_ctrl *ctrl)
>   	if (!test_and_set_bit(NVME_CTRL_ADMIN_Q_STOPPED, &ctrl->flags))
>   		blk_mq_quiesce_queue(ctrl->admin_q);
>   	else
> -		blk_mq_wait_quiesce_done(ctrl->admin_q);
> +		blk_mq_wait_quiesce_done(ctrl->admin_q->tag_set);
>   }
>   EXPORT_SYMBOL_GPL(nvme_stop_admin_queue);
>   
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 8b89fab7c4206..249757ddd8fea 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2735,7 +2735,7 @@ static void scsi_stop_queue(struct scsi_device *sdev, bool nowait)
>   			blk_mq_quiesce_queue(sdev->request_queue);
>   	} else {
>   		if (!nowait)
> -			blk_mq_wait_quiesce_done(sdev->request_queue);
> +			blk_mq_wait_quiesce_done(sdev->request_queue->tag_set);
>   	}
>   }
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index f040a7cab5dbf..dfd565c4fb84e 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -881,7 +881,7 @@ void blk_mq_start_hw_queues(struct request_queue *q);
>   void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
>   void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
>   void blk_mq_quiesce_queue(struct request_queue *q);
> -void blk_mq_wait_quiesce_done(struct request_queue *q);
> +void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
>   void blk_mq_unquiesce_queue(struct request_queue *q);
>   void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
>   void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
> 


* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
  2022-10-20 13:24   ` Sagi Grimberg
  2022-10-21  1:53   ` Ming Lei
@ 2022-10-21  2:49   ` Chao Leng
  2022-10-21  6:51   ` Hannes Reinecke
  2022-10-21 21:22   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> Drivers that have shared tagsets may need to quiesce potentially a lot
> of request queues that all share a single tagset (e.g. nvme). Add an
> interface to quiesce all the queues on a given tagset. This interface is
> useful because it can speed up the quiesce by doing it in parallel.
> 
> Some queues (e.g. the nvme connect_q) should not be quiesced when
> quiescing the tagset, so introduce QUEUE_FLAG_SKIP_TAGSET_QUIESCE to
> let the tagset quiesce interface skip such queues.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: simplify for the per-tag_set srcu_struct]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c         | 25 +++++++++++++++++++++++++
>   include/linux/blk-mq.h |  2 ++
>   include/linux/blkdev.h |  3 +++
>   3 files changed, 30 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index cf8f9f9a96c35..b0d2dd56bfdf2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -313,6 +313,31 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
>   
> +void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
> +{
> +	struct request_queue *q;
> +
> +	mutex_lock(&set->tag_list_lock);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list) {
> +		if (!blk_queue_skip_tagset_quiesce(q))
> +			blk_mq_quiesce_queue_nowait(q);
> +	}
> +	blk_mq_wait_quiesce_done(set);
> +	mutex_unlock(&set->tag_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);
> +
> +void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set)
> +{
> +	struct request_queue *q;
> +
> +	mutex_lock(&set->tag_list_lock);
> +	list_for_each_entry(q, &set->tag_list, tag_set_list)
> +		blk_mq_unquiesce_queue(q);
> +	mutex_unlock(&set->tag_list_lock);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_unquiesce_tagset);
> +
>   void blk_mq_wake_waiters(struct request_queue *q)
>   {
>   	struct blk_mq_hw_ctx *hctx;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index dfd565c4fb84e..35747949f7739 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -882,6 +882,8 @@ void blk_mq_start_stopped_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
>   void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async);
>   void blk_mq_quiesce_queue(struct request_queue *q);
>   void blk_mq_wait_quiesce_done(struct blk_mq_tag_set *set);
> +void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set);
> +void blk_mq_unquiesce_tagset(struct blk_mq_tag_set *set);
>   void blk_mq_unquiesce_queue(struct request_queue *q);
>   void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
>   void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b15b6a011c028..854b4745cdd1f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -571,6 +571,7 @@ struct request_queue {
>   #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
>   #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
>   #define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
> +#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE	31 /* quiesce_tagset skip the queue*/
>   
>   #define QUEUE_FLAG_MQ_DEFAULT	((1UL << QUEUE_FLAG_IO_STAT) |		\
>   				 (1UL << QUEUE_FLAG_SAME_COMP) |	\
> @@ -610,6 +611,8 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>   #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
>   #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
>   #define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
> +#define blk_queue_skip_tagset_quiesce(q) \
> +	test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
>   
>   extern void blk_set_pm_only(struct request_queue *q);
>   extern void blk_clear_pm_only(struct request_queue *q);
> 

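The win of the tagset-wide interface is visible in miniature: every queue is marked quiesced with the nowait variant, queues carrying the skip flag are left alone, and then a single grace-period wait covers all of them instead of one wait per queue. A userspace model with illustrative names (not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of blk_mq_quiesce_tagset(): mark every queue that is
 * not flagged to be skipped, then wait once for the shared grace
 * period instead of once per queue. */
struct model_queue2 {
	bool skip_tagset_quiesce;	/* QUEUE_FLAG_SKIP_TAGSET_QUIESCE */
	bool quiesced;
};

struct model_set {
	struct model_queue2 *queues;
	size_t nr_queues;
	int waits;	/* stands in for blk_mq_wait_quiesce_done() */
};

static void model_quiesce_tagset(struct model_set *set)
{
	for (size_t i = 0; i < set->nr_queues; i++)
		if (!set->queues[i].skip_tagset_quiesce)
			set->queues[i].quiesced = true;
	set->waits++;	/* a single grace period covers all queues */
}
```

With many namespaces sharing one tagset, collapsing N sequential `synchronize_srcu()` calls into one is where the parallel-quiesce speedup comes from.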

* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-20 10:56 ` [PATCH 6/8] nvme: move the NS_DEAD flag to the controller Christoph Hellwig
  2022-10-20 13:30   ` Sagi Grimberg
@ 2022-10-21  2:49   ` Chao Leng
  2022-10-21  6:51   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> The NVME_NS_DEAD flag is only set in nvme_set_queue_dying, which is
> called in a loop over all namespaces in nvme_kill_queues.  Switch it
> to a controller flag checked and set outside said loop.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 16 +++++++---------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index a74212a4f1a5f..fa7fdb744979c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4330,7 +4330,7 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_info *info)
>   {
>   	int ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
>   
> -	if (test_bit(NVME_NS_DEAD, &ns->flags))
> +	if (test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
>   		goto out;
>   
>   	ret = NVME_SC_INVALID_NS | NVME_SC_DNR;
> @@ -4404,7 +4404,8 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
>   
>   	down_write(&ctrl->namespaces_rwsem);
>   	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id > nsid || test_bit(NVME_NS_DEAD, &ns->flags))
> +		if (ns->head->ns_id > nsid ||
> +		    test_bit(NVME_CTRL_NS_DEAD, &ns->ctrl->flags))
>   			list_move_tail(&ns->list, &rm_list);
>   	}
>   	up_write(&ctrl->namespaces_rwsem);
> @@ -5110,9 +5111,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>    */
>   static void nvme_set_queue_dying(struct nvme_ns *ns)
>   {
> -	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
> -		return;
> -
>   	blk_mark_disk_dead(ns->disk);
>   	nvme_start_ns_queue(ns);
>   }
> @@ -5129,14 +5127,14 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	struct nvme_ns *ns;
>   
>   	down_read(&ctrl->namespaces_rwsem);
> -
>   	/* Forcibly unquiesce queues to avoid blocking dispatch */
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
>   
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		nvme_set_queue_dying(ns);
> -
> +	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> +		list_for_each_entry(ns, &ctrl->namespaces, list)
> +			nvme_set_queue_dying(ns);
> +	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
>   EXPORT_SYMBOL_GPL(nvme_kill_queues);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a29877217ee65..82989a3322130 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -237,6 +237,7 @@ enum nvme_ctrl_flags {
>   	NVME_CTRL_FAILFAST_EXPIRED	= 0,
>   	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
>   	NVME_CTRL_STARTED_ONCE		= 2,
> +	NVME_CTRL_NS_DEAD     		= 3,
>   };
>   
>   struct nvme_ctrl {
> @@ -483,7 +484,6 @@ struct nvme_ns {
>   	unsigned long features;
>   	unsigned long flags;
>   #define NVME_NS_REMOVING	0
> -#define NVME_NS_DEAD     	1
>   #define NVME_NS_ANA_PENDING	2
>   #define NVME_NS_FORCE_RO	3
>   #define NVME_NS_READY		4
> 


* Re: [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-20 10:56 ` [PATCH 7/8] nvme: remove nvme_set_queue_dying Christoph Hellwig
  2022-10-20 13:10   ` Sagi Grimberg
@ 2022-10-21  2:50   ` Chao Leng
  2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> This helper is pretty pointless now, and also in the way of per-tagset
> quiesce.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 18 ++++--------------
>   1 file changed, 4 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index fa7fdb744979c..0ab3a18fd9f85 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5104,17 +5104,6 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   		blk_mq_wait_quiesce_done(ns->queue->tag_set);
>   }
>   
> -/*
> - * Prepare a queue for teardown.
> - *
> - * This must forcibly unquiesce queues to avoid blocking dispatch.
> - */
> -static void nvme_set_queue_dying(struct nvme_ns *ns)
> -{
> -	blk_mark_disk_dead(ns->disk);
> -	nvme_start_ns_queue(ns);
> -}
> -
>   /**
>    * nvme_kill_queues(): Ends all namespace queues
>    * @ctrl: the dead controller that needs to end
> @@ -5130,10 +5119,11 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	/* Forcibly unquiesce queues to avoid blocking dispatch */
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
> -
>   	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> -		list_for_each_entry(ns, &ctrl->namespaces, list)
> -			nvme_set_queue_dying(ns);
> +		list_for_each_entry(ns, &ctrl->namespaces, list) {
> +			blk_mark_disk_dead(ns->disk);
> +			nvme_start_ns_queue(ns);
> +		}
>   	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
> 


* Re: [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset
  2022-10-20 10:56 ` [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
  2022-10-20 13:35   ` Sagi Grimberg
@ 2022-10-21  2:50   ` Chao Leng
  2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  2:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

Reviewed-by: Chao Leng <lengchao@huawei.com>

On 2022/10/20 18:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> All controller namespaces share the same tagset, so we can use this
> interface, which performs the optimal operation for parallel quiesce
> based on the tagset type (e.g. blocking vs. non-blocking tagsets).
> 
> The nvme connect_q should not be quiesced when quiescing the tagset, so
> set QUEUE_FLAG_SKIP_TAGSET_QUIESCE on it when initializing connect_q.
> 
> Currently we use NVME_NS_STOPPED to pair quiescing and unquiescing.
> With blk_mq_[un]quiesce_tagset, NVME_NS_STOPPED no longer works, so
> introduce NVME_CTRL_STOPPED to replace it.  In addition, we never
> really quiesce a single namespace, so it is a better choice to move
> the flag from the ns to the ctrl.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: rebased on top of prep patches]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 38 +++++++++-----------------------------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 10 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0ab3a18fd9f85..cc71f1001144f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4891,6 +4891,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   			ret = PTR_ERR(ctrl->connect_q);
>   			goto out_free_tag_set;
>   		}
> +		blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, ctrl->connect_q);
>   	}
>   
>   	ctrl->tagset = set;
> @@ -5090,20 +5091,6 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   }
>   EXPORT_SYMBOL_GPL(nvme_init_ctrl);
>   
> -static void nvme_start_ns_queue(struct nvme_ns *ns)
> -{
> -	if (test_and_clear_bit(NVME_NS_STOPPED, &ns->flags))
> -		blk_mq_unquiesce_queue(ns->queue);
> -}
> -
> -static void nvme_stop_ns_queue(struct nvme_ns *ns)
> -{
> -	if (!test_and_set_bit(NVME_NS_STOPPED, &ns->flags))
> -		blk_mq_quiesce_queue(ns->queue);
> -	else
> -		blk_mq_wait_quiesce_done(ns->queue->tag_set);
> -}
> -
>   /**
>    * nvme_kill_queues(): Ends all namespace queues
>    * @ctrl: the dead controller that needs to end
> @@ -5120,10 +5107,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>   	if (ctrl->admin_q && !blk_queue_dying(ctrl->admin_q))
>   		nvme_start_admin_queue(ctrl);
>   	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
> -		list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		list_for_each_entry(ns, &ctrl->namespaces, list)
>   			blk_mark_disk_dead(ns->disk);
> -			nvme_start_ns_queue(ns);
> -		}
> +		nvme_start_queues(ctrl);
>   	}
>   	up_read(&ctrl->namespaces_rwsem);
>   }
> @@ -5179,23 +5165,17 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
>   
>   void nvme_stop_queues(struct nvme_ctrl *ctrl)
>   {
> -	struct nvme_ns *ns;
> -
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		nvme_stop_ns_queue(ns);
> -	up_read(&ctrl->namespaces_rwsem);
> +	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
> +		blk_mq_quiesce_tagset(ctrl->tagset);
> +	else
> +		blk_mq_wait_quiesce_done(ctrl->tagset);
>   }
>   EXPORT_SYMBOL_GPL(nvme_stop_queues);
>   
>   void nvme_start_queues(struct nvme_ctrl *ctrl)
>   {
> -	struct nvme_ns *ns;
> -
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list)
> -		nvme_start_ns_queue(ns);
> -	up_read(&ctrl->namespaces_rwsem);
> +	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
> +		blk_mq_unquiesce_tagset(ctrl->tagset);
>   }
>   EXPORT_SYMBOL_GPL(nvme_start_queues);
>   
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 82989a3322130..54d8127fb6dc5 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -238,6 +238,7 @@ enum nvme_ctrl_flags {
>   	NVME_CTRL_ADMIN_Q_STOPPED	= 1,
>   	NVME_CTRL_STARTED_ONCE		= 2,
>   	NVME_CTRL_NS_DEAD     		= 3,
> +	NVME_CTRL_STOPPED		= 4,
>   };
>   
>   struct nvme_ctrl {
> @@ -487,7 +488,6 @@ struct nvme_ns {
>   #define NVME_NS_ANA_PENDING	2
>   #define NVME_NS_FORCE_RO	3
>   #define NVME_NS_READY		4
> -#define NVME_NS_STOPPED		5
>   
>   	struct cdev		cdev;
>   	struct device		cdev_device;
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-21  2:47   ` Chao Leng
@ 2022-10-21  3:16     ` Chao Leng
  0 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-21  3:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block



On 2022/10/21 10:47, Chao Leng wrote:
> 
> 
> On 2022/10/20 18:56, Christoph Hellwig wrote:
>> For submit_bio based queues there is no (S)RCU critical section during
>> I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
>> so skip doing any synchronization.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   block/blk-mq.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 33292c01875d5..df967c8af9fee 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
>>   void blk_mq_quiesce_queue(struct request_queue *q)
>>   {
>>       blk_mq_quiesce_queue_nowait(q);
>> -    blk_mq_wait_quiesce_done(q);
>> +    /* nothing to wait for non-mq queues */
>> +    if (queue_is_mq(q))
>> +        blk_mq_wait_quiesce_done(q);
> For the non-mq queues, maybe we should add a synchronize_rcu.
>      if (queue_is_mq(q))
>          blk_mq_wait_quiesce_done(q);
>      else
>          synchronize_rcu();
Please ignore this, synchronize_rcu is not needed.
>>   }
>>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>>
> 
> .

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
                     ` (2 preceding siblings ...)
  2022-10-21  1:53   ` Chao Leng
@ 2022-10-21  6:49   ` Hannes Reinecke
  2022-10-21 13:13     ` Christoph Hellwig
  2022-10-21 21:12   ` Bart Van Assche
  4 siblings, 1 reply; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> nvme and xen-blkfront are already doing this to stop buffered writes from
> creating dirty pages that can't be written out later.  Move it to the
> common code.  Note that this follows the xen-blkfront version that does
> not send an uevent, as the uevent is a bit confusing when the device is
> about to go away a little later, and the size change is just to stop
> buffered writes faster.
> 
> This also removes the comment about the ordering from nvme, as bd_mutex
> not only is gone entirely, but also hasn't been used for locking updates
> to the disk size long before that, and thus the ordering requirement
> documented there doesn't apply any more.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/genhd.c                | 3 +++
>   drivers/block/xen-blkfront.c | 1 -
>   drivers/nvme/host/core.c     | 7 +------
>   3 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 17b33c62423df..2877b5f905579 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -555,6 +555,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
>   {
>   	set_bit(GD_DEAD, &disk->state);
>   	blk_queue_start_drain(disk->queue);
> +
> +	/* stop buffered writers from dirtying pages that can't written out */
> +	set_capacity(disk, 0);
>   }
>   EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>   
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 35b9bcad9db90..b28489290323f 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2129,7 +2129,6 @@ static void blkfront_closing(struct blkfront_info *info)
>   	if (info->rq && info->gd) {
>   		blk_mq_stop_hw_queues(info->rq);
>   		blk_mark_disk_dead(info->gd);
> -		set_capacity(info->gd, 0);
>   	}
>   
>   	for_each_rinfo(info, rinfo, i) {
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 059737c1a2c19..44a5321743128 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5106,10 +5106,7 @@ static void nvme_stop_ns_queue(struct nvme_ns *ns)
>   /*
>    * Prepare a queue for teardown.
>    *
> - * This must forcibly unquiesce queues to avoid blocking dispatch, and only set
> - * the capacity to 0 after that to avoid blocking dispatchers that may be
> - * holding bd_butex.  This will end buffered writers dirtying pages that can't
> - * be synced.
> + * This must forcibly unquiesce queues to avoid blocking dispatch.
>    */
>   static void nvme_set_queue_dying(struct nvme_ns *ns)
>   {
> @@ -5118,8 +5115,6 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
>   
>   	blk_mark_disk_dead(ns->disk);
>   	nvme_start_ns_queue(ns);
> -
> -	set_capacity_and_notify(ns->disk, 0);
>   }
>   
>   /**
I'm ever so slightly concerned about not sending the uevent anymore; MD 
for one relies on that event to figure out if a device is down.
And I'm also fairly sure that testing of MD on Xen has been 
relatively rare.
What do we lose by using the 'notify' version instead?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-20 10:56 ` [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue Christoph Hellwig
                     ` (2 preceding siblings ...)
  2022-10-21  2:47   ` Chao Leng
@ 2022-10-21  6:49   ` Hannes Reinecke
  3 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:49 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> For submit_bio based queues there is no (S)RCU critical section during
> I/O submission and thus nothing to wait for in blk_mq_wait_quiesce_done,
> so skip doing any synchronization.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 33292c01875d5..df967c8af9fee 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -280,7 +280,9 @@ EXPORT_SYMBOL_GPL(blk_mq_wait_quiesce_done);
>   void blk_mq_quiesce_queue(struct request_queue *q)
>   {
>   	blk_mq_quiesce_queue_nowait(q);
> -	blk_mq_wait_quiesce_done(q);
> +	/* nothing to wait for non-mq queues */
> +	if (queue_is_mq(q))
> +		blk_mq_wait_quiesce_done(q);
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
>   
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
                     ` (3 preceding siblings ...)
  2022-10-21  2:49   ` Chao Leng
@ 2022-10-21  6:50   ` Hannes Reinecke
  2022-10-21  7:16   ` Chao Leng
  5 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.  Because there are far fewer
> tagsets, there is also no need for the variable size allocation trick.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c       | 27 +++++----------------------
>   block/blk-mq.c         | 25 +++++++++++++++++--------
>   block/blk-mq.h         | 14 +++++++-------
>   block/blk-sysfs.c      |  9 ++-------
>   block/blk.h            |  9 +--------
>   block/genhd.c          |  2 +-
>   include/linux/blk-mq.h |  4 ++++
>   include/linux/blkdev.h |  9 ---------
>   8 files changed, 37 insertions(+), 62 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
                     ` (2 preceding siblings ...)
  2022-10-21  2:49   ` Chao Leng
@ 2022-10-21  6:50   ` Hannes Reinecke
  2022-10-21 21:18   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just
> pass the tagset, and move the non-mq check into the only caller that
> needs it.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c           | 10 +++++-----
>   drivers/nvme/host/core.c |  4 ++--
>   drivers/scsi/scsi_lib.c  |  2 +-
>   include/linux/blk-mq.h   |  2 +-
>   4 files changed, 9 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
                     ` (2 preceding siblings ...)
  2022-10-21  2:49   ` Chao Leng
@ 2022-10-21  6:51   ` Hannes Reinecke
  2022-10-21 21:22   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:51 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> Drivers that have shared tagsets may need to quiesce potentially a lot
> of request queues that all share a single tagset (e.g. nvme). Add an
> interface to quiesce all the queues on a given tagset. This interface is
> useful because it can speedup the quiesce by doing it in parallel.
> 
> Because some queues (e.g. the nvme connect_q) do not need to be quiesced
> when quiescing the tagset, introduce QUEUE_FLAG_SKIP_TAGSET_QUIESCE so
> that the tagset quiesce interface can skip those queues.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: simplify for the per-tag_set srcu_struct]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c         | 25 +++++++++++++++++++++++++
>   include/linux/blk-mq.h |  2 ++
>   include/linux/blkdev.h |  3 +++
>   3 files changed, 30 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-20 10:56 ` [PATCH 6/8] nvme: move the NS_DEAD flag to the controller Christoph Hellwig
  2022-10-20 13:30   ` Sagi Grimberg
  2022-10-21  2:49   ` Chao Leng
@ 2022-10-21  6:51   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:51 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> The NVME_NS_DEAD flag is only set in nvme_set_queue_dying, which is
> called in a loop over all namespaces in nvme_kill_queues.  Switch it
> to a controller flag checked and set outside said loop.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 16 +++++++---------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 8 insertions(+), 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-20 10:56 ` [PATCH 7/8] nvme: remove nvme_set_queue_dying Christoph Hellwig
  2022-10-20 13:10   ` Sagi Grimberg
  2022-10-21  2:50   ` Chao Leng
@ 2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:52 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> This helper is pretty pointless now, and also in the way of per-tagset
> quiesce.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 18 ++++--------------
>   1 file changed, 4 insertions(+), 14 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset
  2022-10-20 10:56 ` [PATCH 8/8] nvme: use blk_mq_[un]quiesce_tagset Christoph Hellwig
  2022-10-20 13:35   ` Sagi Grimberg
  2022-10-21  2:50   ` Chao Leng
@ 2022-10-21  6:52   ` Hannes Reinecke
  2 siblings, 0 replies; 61+ messages in thread
From: Hannes Reinecke @ 2022-10-21  6:52 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 12:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> All controller namespaces share the same tagset, so we can use this
> interface, which does the optimal operation for parallel quiesce based on
> the tagset type (e.g. blocking tagsets and non-blocking tagsets).
> 
> nvme connect_q should not be quiesced when quiescing the tagset, so set
> QUEUE_FLAG_SKIP_TAGSET_QUIESCE to skip it when initializing connect_q.
> 
> Currently we use NVME_NS_STOPPED to ensure that quiescing and
> unquiescing are paired. If we use blk_mq_[un]quiesce_tagset,
> NVME_NS_STOPPED becomes invalid, so introduce NVME_CTRL_STOPPED to
> replace NVME_NS_STOPPED.
> In addition, we never really quiesce a single namespace. It is a better
> choice to move the flag from ns to ctrl.
> 
> Signed-off-by: Chao Leng <lengchao@huawei.com>
> [hch: rebased on top of prep patches]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 38 +++++++++-----------------------------
>   drivers/nvme/host/nvme.h |  2 +-
>   2 files changed, 10 insertions(+), 30 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 10:56 ` [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset Christoph Hellwig
                     ` (4 preceding siblings ...)
  2022-10-21  6:50   ` Hannes Reinecke
@ 2022-10-21  7:16   ` Chao Leng
  2022-10-21 13:22     ` Christoph Hellwig
  5 siblings, 1 reply; 61+ messages in thread
From: Chao Leng @ 2022-10-21  7:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block



On 2022/10/20 18:56, Christoph Hellwig wrote:
> All I/O submissions have fairly similar latencies, and a tagset-wide
> quiesce is a fairly common operation.  Because there are far fewer
> tagsets, there is also no need for the variable size allocation trick.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c       | 27 +++++----------------------
>   block/blk-mq.c         | 25 +++++++++++++++++--------
>   block/blk-mq.h         | 14 +++++++-------
>   block/blk-sysfs.c      |  9 ++-------
>   block/blk.h            |  9 +--------
>   block/genhd.c          |  2 +-
>   include/linux/blk-mq.h |  4 ++++
>   include/linux/blkdev.h |  9 ---------
>   8 files changed, 37 insertions(+), 62 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 17667159482e0..3a2ed8dadf738 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -65,7 +65,6 @@ DEFINE_IDA(blk_queue_ida);
>    * For queue allocation
>    */
>   struct kmem_cache *blk_requestq_cachep;
> -struct kmem_cache *blk_requestq_srcu_cachep;
>   
>   /*
>    * Controlling structure to kblockd
> @@ -373,26 +372,20 @@ static void blk_timeout_work(struct work_struct *work)
>   {
>   }
>   
> -struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
> +struct request_queue *blk_alloc_queue(int node_id)
>   {
>   	struct request_queue *q;
>   
> -	q = kmem_cache_alloc_node(blk_get_queue_kmem_cache(alloc_srcu),
> -			GFP_KERNEL | __GFP_ZERO, node_id);
> +	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
> +				  node_id);
>   	if (!q)
>   		return NULL;
>   
> -	if (alloc_srcu) {
> -		blk_queue_flag_set(QUEUE_FLAG_HAS_SRCU, q);
> -		if (init_srcu_struct(q->srcu) != 0)
> -			goto fail_q;
> -	}
> -
>   	q->last_merge = NULL;
>   
>   	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
>   	if (q->id < 0)
> -		goto fail_srcu;
> +		goto fail_q;
>   
>   	q->stats = blk_alloc_queue_stats();
>   	if (!q->stats)
> @@ -435,11 +428,8 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
>   	blk_free_queue_stats(q->stats);
>   fail_id:
>   	ida_free(&blk_queue_ida, q->id);
> -fail_srcu:
> -	if (alloc_srcu)
> -		cleanup_srcu_struct(q->srcu);
>   fail_q:
> -	kmem_cache_free(blk_get_queue_kmem_cache(alloc_srcu), q);
> +	kmem_cache_free(blk_requestq_cachep, q);
>   	return NULL;
>   }
>   
> @@ -1184,9 +1174,6 @@ int __init blk_dev_init(void)
>   			sizeof_field(struct request, cmd_flags));
>   	BUILD_BUG_ON(REQ_OP_BITS + REQ_FLAG_BITS > 8 *
>   			sizeof_field(struct bio, bi_opf));
> -	BUILD_BUG_ON(ALIGN(offsetof(struct request_queue, srcu),
> -			   __alignof__(struct request_queue)) !=
> -		     sizeof(struct request_queue));
>   
>   	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
>   	kblockd_workqueue = alloc_workqueue("kblockd",
> @@ -1197,10 +1184,6 @@ int __init blk_dev_init(void)
>   	blk_requestq_cachep = kmem_cache_create("request_queue",
>   			sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
>   
> -	blk_requestq_srcu_cachep = kmem_cache_create("request_queue_srcu",
> -			sizeof(struct request_queue) +
> -			sizeof(struct srcu_struct), 0, SLAB_PANIC, NULL);
> -
>   	blk_debugfs_root = debugfs_create_dir("block", NULL);
>   
>   	return 0;
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df967c8af9fee..4a81a2da43328 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -261,8 +261,8 @@ EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
>    */
>   void blk_mq_wait_quiesce_done(struct request_queue *q)
>   {
> -	if (blk_queue_has_srcu(q))
> -		synchronize_srcu(q->srcu);
> +	if (q->tag_set->flags & BLK_MQ_F_BLOCKING)
> +		synchronize_srcu(&q->tag_set->srcu);
>   	else
>   		synchronize_rcu();
>   }
> @@ -3971,7 +3971,7 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
>   	struct request_queue *q;
>   	int ret;
>   
> -	q = blk_alloc_queue(set->numa_node, set->flags & BLK_MQ_F_BLOCKING);
> +	q = blk_alloc_queue(set->numa_node);
>   	if (!q)
>   		return ERR_PTR(-ENOMEM);
>   	q->queuedata = queuedata;
> @@ -4138,9 +4138,6 @@ static void blk_mq_update_poll_flag(struct request_queue *q)
>   int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>   		struct request_queue *q)
>   {
> -	WARN_ON_ONCE(blk_queue_has_srcu(q) !=
> -			!!(set->flags & BLK_MQ_F_BLOCKING));
> -
>   	/* mark the queue as mq asap */
>   	q->mq_ops = set->ops;
>   
> @@ -4398,9 +4395,16 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	 */
>   	if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
>   		set->nr_hw_queues = nr_cpu_ids;
> +
> +	if (set->flags & BLK_MQ_F_BLOCKING) {
> +		ret = init_srcu_struct(&set->srcu);
> +		if (ret)
> +			return ret;
> +	}
>   
> -	if (blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues) < 0)
> -		return -ENOMEM;
> +	ret = blk_mq_alloc_tag_set_tags(set, set->nr_hw_queues);
> +	if (ret)
> +		goto out_free_srcu;
>   
>   	ret = -ENOMEM;
>   	for (i = 0; i < set->nr_maps; i++) {
> @@ -4430,6 +4434,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	}
>   	kfree(set->tags);
>   	set->tags = NULL;
> +out_free_srcu:
> +	if (set->flags & BLK_MQ_F_BLOCKING)
> +		cleanup_srcu_struct(&set->srcu);
>   	return ret;
>   }
>   EXPORT_SYMBOL(blk_mq_alloc_tag_set);
> @@ -4469,6 +4476,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>   
>   	kfree(set->tags);
>   	set->tags = NULL;
> +	if (set->flags & BLK_MQ_F_BLOCKING)
> +		cleanup_srcu_struct(&set->srcu);
>   }
>   EXPORT_SYMBOL(blk_mq_free_tag_set);
>   
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 0b2870839cdd6..06eb46d1d7a76 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -377,17 +377,17 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>   /* run the code block in @dispatch_ops with rcu/srcu read lock held */
>   #define __blk_mq_run_dispatch_ops(q, check_sleep, dispatch_ops)	\
>   do {								\
> -	if (!blk_queue_has_srcu(q)) {				\
> -		rcu_read_lock();				\
> -		(dispatch_ops);					\
> -		rcu_read_unlock();				\
> -	} else {						\
> +	if ((q)->tag_set->flags & BLK_MQ_F_BLOCKING) {		\
>   		int srcu_idx;					\
>   								\
>   		might_sleep_if(check_sleep);			\
> -		srcu_idx = srcu_read_lock((q)->srcu);		\
> +		srcu_idx = srcu_read_lock(&((q)->tag_set->srcu)); \
>   		(dispatch_ops);					\
> -		srcu_read_unlock((q)->srcu, srcu_idx);		\
> +		srcu_read_unlock(&((q)->tag_set->srcu), srcu_idx); \
> +	} else {						\
> +		rcu_read_lock();				\
> +		(dispatch_ops);					\
> +		rcu_read_unlock();				\
>   	}							\
>   } while (0)
>   
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index e71b3b43927c0..e7871665825a3 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -739,10 +739,8 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
>   
>   static void blk_free_queue_rcu(struct rcu_head *rcu_head)
>   {
> -	struct request_queue *q = container_of(rcu_head, struct request_queue,
> -					       rcu_head);
> -
> -	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
> +	kmem_cache_free(blk_requestq_cachep,
> +			container_of(rcu_head, struct request_queue, rcu_head));
>   }
>   
>   /**
> @@ -779,9 +777,6 @@ static void blk_release_queue(struct kobject *kobj)
>   	if (queue_is_mq(q))
>   		blk_mq_release(q);
>   
> -	if (blk_queue_has_srcu(q))
> -		cleanup_srcu_struct(q->srcu);
> -
>   	ida_free(&blk_queue_ida, q->id);
>   	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>   }
> diff --git a/block/blk.h b/block/blk.h
> index 5350bf363035e..b25e2d22f3725 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -27,7 +27,6 @@ struct blk_flush_queue {
>   };
>   
>   extern struct kmem_cache *blk_requestq_cachep;
> -extern struct kmem_cache *blk_requestq_srcu_cachep;
>   extern struct kobj_type blk_queue_ktype;
>   extern struct ida blk_queue_ida;
>   
> @@ -420,13 +419,7 @@ int bio_add_hw_page(struct request_queue *q, struct bio *bio,
>   		struct page *page, unsigned int len, unsigned int offset,
>   		unsigned int max_sectors, bool *same_page);
>   
> -static inline struct kmem_cache *blk_get_queue_kmem_cache(bool srcu)
> -{
> -	if (srcu)
> -		return blk_requestq_srcu_cachep;
> -	return blk_requestq_cachep;
> -}
> -struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
> +struct request_queue *blk_alloc_queue(int node_id);
>   
>   int disk_scan_partitions(struct gendisk *disk, fmode_t mode);
>   
> diff --git a/block/genhd.c b/block/genhd.c
> index 2877b5f905579..fd0b13d6175a3 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1410,7 +1410,7 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
>   	struct request_queue *q;
>   	struct gendisk *disk;
>   
> -	q = blk_alloc_queue(node, false);
> +	q = blk_alloc_queue(node);
>   	if (!q)
>   		return NULL;
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index ba18e9bdb799b..f040a7cab5dbf 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -7,6 +7,7 @@
>   #include <linux/lockdep.h>
>   #include <linux/scatterlist.h>
>   #include <linux/prefetch.h>
> +#include <linux/srcu.h>
>   
>   struct blk_mq_tags;
>   struct blk_flush_queue;
> @@ -501,6 +502,8 @@ enum hctx_type {
>    * @tag_list_lock: Serializes tag_list accesses.
>    * @tag_list:	   List of the request queues that use this tag set. See also
>    *		   request_queue.tag_set_list.
> + * @srcu:	   Use as lock when type of the request queue is blocking
> + *		   (BLK_MQ_F_BLOCKING). Must be the last member
>    */
>   struct blk_mq_tag_set {
>   	struct blk_mq_queue_map	map[HCTX_MAX_TYPES];
> @@ -521,6 +524,7 @@ struct blk_mq_tag_set {
>   
>   	struct mutex		tag_list_lock;
>   	struct list_head	tag_list;
> +	struct srcu_struct	srcu;
srcu_struct is more than 50 KB, which is wasteful for tagsets that do not
set BLK_MQ_F_BLOCKING, and most tagsets do not set it.
Maybe we can define "srcu" as a pointer, and allocate the memory in
blk_mq_alloc_tag_set only for tagsets that set BLK_MQ_F_BLOCKING.
>   };
>   
>   /**
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 50e358a19d986..b15b6a011c028 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -22,7 +22,6 @@
>   #include <linux/blkzoned.h>
>   #include <linux/sched.h>
>   #include <linux/sbitmap.h>
> -#include <linux/srcu.h>
>   #include <linux/uuid.h>
>   #include <linux/xarray.h>
>   
> @@ -543,18 +542,11 @@ struct request_queue {
>   	struct mutex		debugfs_mutex;
>   
>   	bool			mq_sysfs_init_done;
> -
> -	/**
> -	 * @srcu: Sleepable RCU. Use as lock when type of the request queue
> -	 * is blocking (BLK_MQ_F_BLOCKING). Must be the last member
> -	 */
> -	struct srcu_struct	srcu[];
>   };
>   
>   /* Keep blk_queue_flag_name[] in sync with the definitions below */
>   #define QUEUE_FLAG_STOPPED	0	/* queue is stopped */
>   #define QUEUE_FLAG_DYING	1	/* queue being torn down */
> -#define QUEUE_FLAG_HAS_SRCU	2	/* SRCU is allocated */
>   #define QUEUE_FLAG_NOMERGES     3	/* disable merge attempts */
>   #define QUEUE_FLAG_SAME_COMP	4	/* complete on same CPU-group */
>   #define QUEUE_FLAG_FAIL_IO	5	/* fake timeout */
> @@ -590,7 +582,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
>   
>   #define blk_queue_stopped(q)	test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
>   #define blk_queue_dying(q)	test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags)
> -#define blk_queue_has_srcu(q)	test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags)
>   #define blk_queue_init_done(q)	test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags)
>   #define blk_queue_nomerges(q)	test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
>   #define blk_queue_noxmerges(q)	\
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-21  1:09   ` Ming Lei
@ 2022-10-21 13:11     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Chao Leng, linux-nvme, linux-block

On Fri, Oct 21, 2022 at 09:09:07AM +0800, Ming Lei wrote:
> Just one small issue on mtip32xx, which may call blk_mark_disk_dead() in
> irq context, and ->bd_size_lock is actually not irq safe.
> 
> But mtip32xx is already broken since blk_queue_start_drain() need mutex,
> maybe mtip32xx isn't actively used at all.

A while ago active users were still reported.  But I bet no one
regularly tests hot removals with these cards.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-21  6:49   ` Hannes Reinecke
@ 2022-10-21 13:13     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:13 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Chao Leng, Ming Lei, linux-nvme, linux-block

On Fri, Oct 21, 2022 at 08:49:30AM +0200, Hannes Reinecke wrote:
> I'm ever so slightly concerned about not sending the uevent anymore; MD for 
> one relies on that event to figure out if a device is down.

Hmm, where?  I actually just had customer reports about md not noticing
nvme devices going down properly, and am thus looking into in-kernel
delivery of notifications...

> And I'm also fairly sure that testing of MD on Xen has been
> relatively rare.
> What do we lose by using the 'notify' version instead?

Mostly that we would send incorrect resize events to userspace, but maybe
because nvme is more common that might still be better overall.


* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-21  1:13   ` Ming Lei
@ 2022-10-21 13:19     ` Christoph Hellwig
  2022-10-21 15:08       ` Ming Lei
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Chao Leng, linux-nvme, linux-block

On Fri, Oct 21, 2022 at 09:13:29AM +0800, Ming Lei wrote:
> > -	blk_mq_wait_quiesce_done(q);
> > +	/* nothing to wait for non-mq queues */
> > +	if (queue_is_mq(q))
> > +		blk_mq_wait_quiesce_done(q);
> 
> > This interface can't work as expected for bio drivers; the only user
> > should be del_gendisk().  But anyway the patch is fine:

Another one is the wb_lat_usec sysfs attribute.  But maybe it is better
to just do this check in the callers and WARN?



* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-20 17:26   ` Keith Busch
@ 2022-10-21 13:20     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:20 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, Chao Leng,
	Ming Lei, linux-nvme, linux-block

On Thu, Oct 20, 2022 at 11:26:55AM -0600, Keith Busch wrote:
> Since you're not dealing with flexible arrays anymore, I don't think
> srcu strictly needs to be the last member.

True.  I thought I had removed this sentence when copying the comment
over, but it looks like I messed that up somehow.



* Re: [PATCH 3/8] blk-mq: move the srcu_struct used for quiescing to the tagset
  2022-10-21  7:16   ` Chao Leng
@ 2022-10-21 13:22     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:22 UTC (permalink / raw)
  To: Chao Leng
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Ming Lei, linux-nvme, linux-block

On Fri, Oct 21, 2022 at 03:16:50PM +0800, Chao Leng wrote:
>>   	struct list_head	tag_list;
>> +	struct srcu_struct	srcu;
> srcu_struct is more than 50 KB in size, which is wasted on tagsets that
> do not set BLK_MQ_F_BLOCKING, and most tagsets do not set it.
> Maybe we can define "srcu" as a pointer and allocate the memory in
> blk_mq_alloc_tag_set only for tagsets that set BLK_MQ_F_BLOCKING.

I guess we could do that.  Still better than the variable sized struct.

p.s. any chance you could properly trim your replies?  It took me a while
to find your comment in the sea of quotes.


* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-21  1:46   ` Ming Lei
@ 2022-10-21 13:23     ` Christoph Hellwig
  0 siblings, 0 replies; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	Chao Leng, linux-nvme, linux-block

On Fri, Oct 21, 2022 at 09:46:33AM +0800, Ming Lei wrote:
> The change is fine, but the interface could confuse people: it looks
> like it waits for the whole tagset to be quiesced, but the caller needs
> to mark all request queues as quiesced first; otherwise it just waits
> for one specific queue's quiesce.
> 
> So I suggest documenting this.

Yes, that's probably a good idea.  Still better would be to make
this API purely internal, as the pure wait callers in NVMe and SCSI
are a bit sketchy.


* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-20 13:30   ` Sagi Grimberg
@ 2022-10-21 13:28     ` Christoph Hellwig
  2022-10-24  8:43       ` Sagi Grimberg
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng, Ming Lei,
	linux-nvme, linux-block

On Thu, Oct 20, 2022 at 04:30:21PM +0300, Sagi Grimberg wrote:
>> -
>> +	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
>> +		list_for_each_entry(ns, &ctrl->namespaces, list)
>> +			nvme_set_queue_dying(ns);
>> +	}
>
> Looking at it now, I'm not sure I understand the need for this flag. It
> seems to make nvme_kill_queues reentrant safe, but the admin queue
> unquiesce can still end up unbalanced under reentrance?
>
> How is this not broken today (or ever since quiesce/unquiesce started
> accounting)? Maybe I lost some context on the exact subtlety of how
> nvme-pci uses this interface...

Yes, this also looks weird and I had a TODO list entry for myself
to look into what is going on here.  The whole interaction
with nvme_remove_namespaces is pretty weird to start with, and then
the code in PCIe is even more weird.  But to feel confident to
touch this I'd need real hot removal testing, for which I don't
have a good rig right now.


* Re: [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-20 13:10   ` Sagi Grimberg
@ 2022-10-21 13:29     ` Christoph Hellwig
  2022-10-24  8:48       ` Sagi Grimberg
  0 siblings, 1 reply; 61+ messages in thread
From: Christoph Hellwig @ 2022-10-21 13:29 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Chao Leng, Ming Lei,
	linux-nvme, linux-block

On Thu, Oct 20, 2022 at 04:10:47PM +0300, Sagi Grimberg wrote:
> I have to say that I always found nvme_kill_queues interface somewhat
> odd.

It is..

> it's a core function that unquiesces the admin/io queues
> assuming that they were stopped at some point by the driver.
>
> If now there is no dependency between unquiesce and blk_mark_disk_dead,
> maybe it would be a good idea to move the unquiescing to the drivers
> which can pair with the quiesce itself, and rename it to
> nvme_mark_namespaces_dead() or something?

Yes, something like that is probably a good idea.  This entire area
is a bit of a mess to say it mildly, so maybe we need to sort that
out first.


* Re: [PATCH 2/8] blk-mq: skip non-mq queues in blk_mq_quiesce_queue
  2022-10-21 13:19     ` Christoph Hellwig
@ 2022-10-21 15:08       ` Ming Lei
  0 siblings, 0 replies; 61+ messages in thread
From: Ming Lei @ 2022-10-21 15:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng, linux-nvme,
	linux-block

On Fri, Oct 21, 2022 at 03:19:32PM +0200, Christoph Hellwig wrote:
> On Fri, Oct 21, 2022 at 09:13:29AM +0800, Ming Lei wrote:
> > > -	blk_mq_wait_quiesce_done(q);
> > > +	/* nothing to wait for non-mq queues */
> > > +	if (queue_is_mq(q))
> > > +		blk_mq_wait_quiesce_done(q);
> > 
> > This interface can't work as expected for bio drivers; the only user
> > should be del_gendisk().  But anyway the patch is fine:
> 
> Another one is the wb_lat_usec sysfs attribute.  But maybe it is better
> do just do this in the callers and WARN?

wbt is only available for blk-mq.


Thanks,
Ming



* Re: per-tagset SRCU struct and quiesce
  2022-10-20 10:56 per-tagset SRCU struct and quiesce Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-10-20 13:16 ` per-tagset SRCU struct and quiesce Sagi Grimberg
@ 2022-10-21 18:06 ` Keith Busch
  9 siblings, 0 replies; 61+ messages in thread
From: Keith Busch @ 2022-10-21 18:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Chao Leng, Ming Lei, linux-nvme, linux-block

Looks good; for the series:

Reviewed-by: Keith Busch <kbusch@kernel.org>


* Re: [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead
  2022-10-20 10:56 ` [PATCH 1/8] block: set the disk capacity to 0 in blk_mark_disk_dead Christoph Hellwig
                     ` (3 preceding siblings ...)
  2022-10-21  6:49   ` Hannes Reinecke
@ 2022-10-21 21:12   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2022-10-21 21:12 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 03:56, Christoph Hellwig wrote:
> +	/* stop buffered writers from dirtying pages that can't written out */

Hi Christoph,

If this series is reposted, please insert the missing verb "be" in the 
above comment.

Thanks,

Bart.


* Re: [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done
  2022-10-20 10:56 ` [PATCH 4/8] blk-mq: pass a tagset to blk_mq_wait_quiesce_done Christoph Hellwig
                     ` (3 preceding siblings ...)
  2022-10-21  6:50   ` Hannes Reinecke
@ 2022-10-21 21:18   ` Bart Van Assche
  4 siblings, 0 replies; 61+ messages in thread
From: Bart Van Assche @ 2022-10-21 21:18 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 03:56, Christoph Hellwig wrote:
> Noting in blk_mq_wait_quiesce_done needs the request_queue now, so just

Noting -> Nothing

Thanks,

Bart.


* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-20 10:56 ` [PATCH 5/8] blk-mq: add tagset quiesce interface Christoph Hellwig
                     ` (3 preceding siblings ...)
  2022-10-21  6:51   ` Hannes Reinecke
@ 2022-10-21 21:22   ` Bart Van Assche
  2022-10-24  1:57     ` Chao Leng
  4 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2022-10-21 21:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg, Chao Leng
  Cc: Ming Lei, linux-nvme, linux-block

On 10/20/22 03:56, Christoph Hellwig wrote:
> From: Chao Leng <lengchao@huawei.com>
> 
> Drivers that have shared tagsets may need to quiesce potentially a lot
> of request queues that all share a single tagset (e.g. nvme). Add an
> interface to quiesce all the queues on a given tagset. This interface is
> useful because it can speedup the quiesce by doing it in parallel.
> 
> Because some queues should not need to be quiesced(e.g. nvme connect_q)

should not need -> do not have

> when quiesce the tagset. So introduce QUEUE_FLAG_SKIP_TAGSET_QUIESCE to

quiesce -> quiescing

. So -> ,

> tagset quiesce interface to skip the queue.

Hi Christoph,

Are there any drivers that may call blk_mq_quiesce_tagset() concurrently 
from two different threads? blk_mq_quiesce_queue() supports this but 
blk_mq_quiesce_tagset() cannot support this because of the mechanism 
used to mark request queues that should be skipped (queue flag).

Thanks,

Bart.


* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-21 21:22   ` Bart Van Assche
@ 2022-10-24  1:57     ` Chao Leng
  2022-10-24 13:35       ` Bart Van Assche
  0 siblings, 1 reply; 61+ messages in thread
From: Chao Leng @ 2022-10-24  1:57 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig, Jens Axboe, Keith Busch,
	Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block



On 2022/10/22 5:22, Bart Van Assche wrote:
> On 10/20/22 03:56, Christoph Hellwig wrote:
>> From: Chao Leng <lengchao@huawei.com>
>>
>> Drivers that have shared tagsets may need to quiesce potentially a lot
>> of request queues that all share a single tagset (e.g. nvme). Add an
>> interface to quiesce all the queues on a given tagset. This interface is
>> useful because it can speedup the quiesce by doing it in parallel.
>>
>> Because some queues should not need to be quiesced(e.g. nvme connect_q)
> 
> should not need -> do not have
> 
>> when quiesce the tagset. So introduce QUEUE_FLAG_SKIP_TAGSET_QUIESCE to
> 
> quiesce -> quiescing
> 
> . So -> ,
> 
>> tagset quiesce interface to skip the queue.
> 
> Hi Christoph,
> 
> Are there any drivers that may call blk_mq_quiesce_tagset() concurrently from two different threads? blk_mq_quiesce_queue() supports this but blk_mq_quiesce_tagset() cannot support this because of the mechanism used to mark request queues that should be skipped (queue flag).
> 
blk_mq_quiesce_tagset() supports concurrency.  It just checks the
QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag, which has no impact on concurrency.


* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-21 13:28     ` Christoph Hellwig
@ 2022-10-24  8:43       ` Sagi Grimberg
  2022-10-24  8:50         ` Sagi Grimberg
  0 siblings, 1 reply; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-24  8:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Chao Leng, Ming Lei, linux-nvme, linux-block


>>> -
>>> +	if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
>>> +		list_for_each_entry(ns, &ctrl->namespaces, list)
>>> +			nvme_set_queue_dying(ns);
>>> +	}
>>
>> Looking at it now, I'm not sure I understand the need for this flag. It
>> seems to make nvme_kill_queues reentrant safe, but the admin queue
>> unquiesce can still end up unbalanced under reentrance?
>>
>> How is this not broken today (or ever since quiesce/unquiesce started
>> accounting)? Maybe I lost some context on the exact subtlety of how
>> nvme-pci uses this interface...
> 
> Yes, this also looks weird and I had a TODO list entry for myself
> to look into what is going on here.  The whole interaction
> with nvme_remove_namespaces is pretty weird to start with, and then
> the code in PCIe is even more weird.  But to feel confident to
> touch this I'd need real hot removal testing, for which I don't
> have a good rig right now.

Let's start by moving the bit check up in the function and reversing
the polarity to return early if it is set, unless someone can make sense
of why the current code is OK.


* Re: [PATCH 7/8] nvme: remove nvme_set_queue_dying
  2022-10-21 13:29     ` Christoph Hellwig
@ 2022-10-24  8:48       ` Sagi Grimberg
  0 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-24  8:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Chao Leng, Ming Lei, linux-nvme, linux-block


>> I have to say that I always found nvme_kill_queues interface somewhat
>> odd.
> 
> It is..
> 
>> it's a core function that unquiesces the admin/io queues
>> assuming that they were stopped at some point by the driver.
>>
>> If now there is no dependency between unquiesce and blk_mark_disk_dead,
>> maybe it would be a good idea to move the unquiescing to the drivers
>> which can pair with the quiesce itself, and rename it to
>> nvme_mark_namespaces_dead() or something?
> 
> Yes, something like that is probably a good idea.  This entire area
> is a bit of a mess to say it mildly, so maybe we need to sort that
> out first.

I agree.


* Re: [PATCH 6/8] nvme: move the NS_DEAD flag to the controller
  2022-10-24  8:43       ` Sagi Grimberg
@ 2022-10-24  8:50         ` Sagi Grimberg
  0 siblings, 0 replies; 61+ messages in thread
From: Sagi Grimberg @ 2022-10-24  8:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Chao Leng, Ming Lei, linux-nvme, linux-block


>>>> -
>>>> +    if (!test_and_set_bit(NVME_CTRL_NS_DEAD, &ctrl->flags)) {
>>>> +        list_for_each_entry(ns, &ctrl->namespaces, list)
>>>> +            nvme_set_queue_dying(ns);
>>>> +    }
>>>
>>> Looking at it now, I'm not sure I understand the need for this flag. It
>>> seems to make nvme_kill_queues reentrant safe, but the admin queue
>>> unquiesce can still end up unbalanced under reentrance?
>>>
>>> How is this not broken today (or ever since quiesce/unquiesce started
>>> accounting)? Maybe I lost some context on the exact subtlety of how
>>> nvme-pci uses this interface...
>>
>> Yes, this also looks weird and I had a TODO list entry for myself
>> to look into what is going on here.  The whole interaction
>> with nvme_remove_namespaces is pretty weird to start with,

git blame points to:
0ff9d4e1a284 ("NVMe: Short-cut removal on surprise hot-unplug")

But it is unclear why it sits in nvme_remove_namespaces instead
of somewhere before that in nvme_remove()...

Keith?


* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-24  1:57     ` Chao Leng
@ 2022-10-24 13:35       ` Bart Van Assche
  2022-10-25  1:38         ` Chao Leng
  0 siblings, 1 reply; 61+ messages in thread
From: Bart Van Assche @ 2022-10-24 13:35 UTC (permalink / raw)
  To: Chao Leng, Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block

On 10/23/22 18:57, Chao Leng wrote:
> blk_mq_quiesce_tagset() support concurrency. blk_mq_quiesce_tagset() just
> check the flag(QUEUE_FLAG_SKIP_TAGSET_QUIESCE), it has no impact on 
> concurrency.

Hi Chao,

I think it depends on how the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag is 
set. I agree if that flag is set once and never modified that there is 
no race. What I'm wondering about is whether there could be a need for 
block drivers to set the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag just before 
blk_mq_quiesce_tagset() is called and clear it immediately after 
blk_mq_quiesce_tagset() returns? In that case I think there is a race 
condition.

Thanks,

Bart.


* Re: [PATCH 5/8] blk-mq: add tagset quiesce interface
  2022-10-24 13:35       ` Bart Van Assche
@ 2022-10-25  1:38         ` Chao Leng
  0 siblings, 0 replies; 61+ messages in thread
From: Chao Leng @ 2022-10-25  1:38 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig, Jens Axboe, Keith Busch,
	Sagi Grimberg
  Cc: Ming Lei, linux-nvme, linux-block



On 2022/10/24 21:35, Bart Van Assche wrote:
> On 10/23/22 18:57, Chao Leng wrote:
>> blk_mq_quiesce_tagset() support concurrency. blk_mq_quiesce_tagset() just
>> check the flag(QUEUE_FLAG_SKIP_TAGSET_QUIESCE), it has no impact on concurrency.
> 
> Hi Chao,
> 
> I think it depends on how the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag is set. I agree if that flag is set once and never modified that there is no race. What I'm wondering about is whether there could be a need for block drivers to set the QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag just before blk_mq_quiesce_tagset() is called and clear it immediately after blk_mq_quiesce_tagset() returns? In that case I think there is a race condition.
> 
Even if QUEUE_FLAG_SKIP_TAGSET_QUIESCE is modified dynamically, that still
has no impact on the concurrency of blk_mq_quiesce_tagset().  It may affect
the atomicity of blk_mq_quiesce_tagset(), but blk_mq_quiesce_tagset() does
not guarantee atomicity; the caller should ensure it if needed.

