* [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap
@ 2021-09-24  8:28 John Garry
  2021-09-24  8:28 ` [PATCH v4 01/13] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Currently a full set of static requests is allocated per hw queue per
tagset when shared sbitmap is used.

However, only tagset->queue_depth requests may be active at any given
time, so only that many static requests are required.

The same goes for using an IO scheduler, which allocates a full set of
static requests per hw queue per request queue.

This series changes shared sbitmap support to use a shared tags per
tagset and request queue. Ming suggested something along those lines in
the v1 review. But we'll keep the name "shared sbitmap" as it is
familiar. In using shared tags, the static rqs also become shared,
reducing the number of sets of static rqs and so reducing memory usage.
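
As a rough illustration of where the saving comes from (symbolic, since
the per-request footprint varies by driver):

  before: nr_hw_queues * queue_depth * rq_size    (driver tags)
        + nr_hw_queues * nr_requests * rq_size    (sched tags, per request queue)
  after:                queue_depth * rq_size
        +               nr_requests * rq_size

On a system where nr_hw_queues scales with CPU count, that is roughly
an nr_hw_queues-fold reduction in static rq memory.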

Patch "blk-mq: Use shared tags for shared sbitmap support" is a bit big,
and could potentially be broken down. But then maintaining ability to
bisect becomes harder and each sub-patch would get more convoluted.

For the megaraid_sas driver on my 128-CPU arm64 system with 1x SATA
disk, we save approx. 300MB(!) [370MB -> 60MB].

Baseline is v5.15-rc2

Changes since v3:
- Fix transient error handling issue in 05/13
- Add Hannes RB tags (thanks!)

Changes since v2:
- Make blk_mq_clear_rq_mapping() static again
- Various function renaming for conciseness and consistency
- Add/refactor alloc/free map and rqs function
- Drop the new blk_mq_ops init_request method in favour of passing an
  invalid HW queue index for shared sbitmap
- Add patch to not clear rq mapping for driver tags
- Remove blk_mq_init_bitmap_tags()
- Add some more RB tags (thanks!)

Changes since v1:
- Switch to use blk_mq_tags for shared sbitmap
- Add new blk_mq_ops init request callback
- Add some RB tags (thanks!)


John Garry (13):
  blk-mq: Change rqs check in blk_mq_free_rqs()
  block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  blk-mq: Invert check in blk_mq_update_nr_requests()
  blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
  blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
  blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
  blk-mq: Don't clear driver tags own mapping
  blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
  blk-mq: Add blk_mq_alloc_map_and_rqs()
  blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  blk-mq: Use shared tags for shared sbitmap support
  blk-mq: Stop using pointers for blk_mq_tags bitmap tags

 block/bfq-iosched.c    |   4 +-
 block/blk-core.c       |   4 +-
 block/blk-mq-debugfs.c |   8 +-
 block/blk-mq-sched.c   | 116 +++++++++++------------
 block/blk-mq-sched.h   |   4 +-
 block/blk-mq-tag.c     | 125 +++++++++----------------
 block/blk-mq-tag.h     |  14 +--
 block/blk-mq.c         | 208 ++++++++++++++++++++++++-----------------
 block/blk-mq.h         |  18 ++--
 block/blk.h            |   2 +-
 block/kyber-iosched.c  |   4 +-
 block/mq-deadline.c    |   2 +-
 drivers/block/rbd.c    |   2 +-
 include/linux/blk-mq.h |  15 ++-
 include/linux/blkdev.h |   5 +-
 15 files changed, 255 insertions(+), 276 deletions(-)

-- 
2.26.2



* [PATCH v4 01/13] blk-mq: Change rqs check in blk_mq_free_rqs()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24  8:28 ` [PATCH v4 02/13] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

The original code in commit 24d2f90309b23 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check
tags->rqs for non-NULL.

Check tags->static_rqs as non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 108a352051be..2316ff27c1f5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2340,7 +2340,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 {
 	struct page *page;
 
-	if (tags->rqs && set->ops->exit_request) {
+	if (tags->static_rqs && set->ops->exit_request) {
 		int i;
 
 		for (i = 0; i < tags->nr_tags; i++) {
-- 
2.26.2



* [PATCH v4 02/13] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
  2021-09-24  8:28 ` [PATCH v4 01/13] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24  8:28 ` [PATCH v4 03/13] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as
the name BLKDEV_MAX_RQ would imply that it is always the maximum number
of requests, which it is not.

Rename BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that
being the default number of requests assigned when allocating a request
queue.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c       | 2 +-
 block/blk-mq-sched.c   | 2 +-
 block/blk-mq-sched.h   | 2 +-
 drivers/block/rbd.c    | 2 +-
 include/linux/blkdev.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5454db2fa263..5d7137bec48e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -568,7 +568,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 
 	blk_queue_dma_alignment(q, 511);
 	blk_set_default_limits(&q->limits);
-	q->nr_requests = BLKDEV_MAX_RQ;
+	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
 	return q;
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..2231fb0d4c35 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -606,7 +606,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	 * Additionally, this is a per-hw queue depth.
 	 */
 	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+				   BLKDEV_DEFAULT_RQ);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 5246ae040704..1e46be6c5178 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -5,7 +5,7 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
-#define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
+#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
 
 void blk_mq_sched_assign_ioc(struct request *rq);
 
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e65c9d706f6f..bf60aebd0cfb 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -836,7 +836,7 @@ struct rbd_options {
 	u32 alloc_hint_flags;  /* CEPH_OSD_OP_ALLOC_HINT_FLAG_* */
 };
 
-#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_MAX_RQ
+#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_DEFAULT_RQ
 #define RBD_ALLOC_SIZE_DEFAULT	(64 * 1024)
 #define RBD_LOCK_TIMEOUT_DEFAULT 0  /* no timeout */
 #define RBD_READ_ONLY_DEFAULT	false
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 12b9dbcc980e..4baf9435232d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -40,7 +40,7 @@ struct blk_stat_callback;
 struct blk_keyslot_manager;
 
 #define BLKDEV_MIN_RQ	4
-#define BLKDEV_MAX_RQ	128	/* Default maximum */
+#define BLKDEV_DEFAULT_RQ	128
 
 /* Must be consistent with blk_mq_poll_stats_bkt() */
 #define BLK_MQ_POLL_STATS_BKTS 16
-- 
2.26.2



* [PATCH v4 03/13] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
  2021-09-24  8:28 ` [PATCH v4 01/13] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
  2021-09-24  8:28 ` [PATCH v4 02/13] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24  8:28 ` [PATCH v4 04/13] blk-mq: Invert check " John Garry
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

For shared sbitmap, if the call to blk_mq_tag_update_depth() is
successful for any hctx when hctx->sched_tags is not set, then it will
be successful for all (blk_mq_tag_update_depth() can only fail here when
asked to grow beyond the tag set depth, which is the same for every
hctx).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call to after the hctx iteration, under the
!q->elevator check, which is equivalent (to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2316ff27c1f5..1a4bb2db30e5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3616,8 +3616,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (!hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
-			if (!ret && blk_mq_is_sbitmap_shared(set->flags))
-				blk_mq_tag_resize_shared_sbitmap(set, nr);
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
@@ -3635,9 +3633,13 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	}
 	if (!ret) {
 		q->nr_requests = nr;
-		if (q->elevator && blk_mq_is_sbitmap_shared(set->flags))
-			sbitmap_queue_resize(&q->sched_bitmap_tags,
-					     nr - set->reserved_tags);
+		if (blk_mq_is_sbitmap_shared(set->flags)) {
+			if (q->elevator)
+				sbitmap_queue_resize(&q->sched_bitmap_tags,
+						     nr - set->reserved_tags);
+			else
+				blk_mq_tag_resize_shared_sbitmap(set, nr);
+		}
 	}
 
 	blk_mq_unquiesce_queue(q);
-- 
2.26.2



* [PATCH v4 04/13] blk-mq: Invert check in blk_mq_update_nr_requests()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (2 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 03/13] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24  8:28 ` [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}() John Garry
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

It's easier to read:

if (x)
	X;
else
	Y;

over:

if (!x)
	Y;
else
	X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1a4bb2db30e5..47d6ab725bcc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3613,18 +3613,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		 * If we're using an MQ scheduler, just update the scheduler
 		 * queue depth. This is similar to what the old code would do.
 		 */
-		if (!hctx->sched_tags) {
-			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
-							false);
-		} else {
+		if (hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
-							nr, true);
+						      nr, true);
 			if (blk_mq_is_sbitmap_shared(set->flags)) {
 				hctx->sched_tags->bitmap_tags =
 					&q->sched_bitmap_tags;
 				hctx->sched_tags->breserved_tags =
 					&q->sched_breserved_tags;
 			}
+		} else {
+			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
+						      false);
 		}
 		if (ret)
 			break;
-- 
2.26.2



* [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (3 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 04/13] blk-mq: Invert check " John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24 10:08   ` Hannes Reinecke
  2021-09-24  8:28 ` [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() John Garry
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Function blk_mq_sched_alloc_tags() does the same as
__blk_mq_alloc_map_and_request(), so give it a similar name to be
consistent.

Similarly, rename the label err_free_tags -> err_free_map_and_rqs.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-sched.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 2231fb0d4c35..644b6d554d72 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -515,9 +515,9 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
 	percpu_ref_put(&q->q_usage_counter);
 }
 
-static int blk_mq_sched_alloc_tags(struct request_queue *q,
-				   struct blk_mq_hw_ctx *hctx,
-				   unsigned int hctx_idx)
+static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
+					  struct blk_mq_hw_ctx *hctx,
+					  unsigned int hctx_idx)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	int ret;
@@ -609,15 +609,15 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 				   BLKDEV_DEFAULT_RQ);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		ret = blk_mq_sched_alloc_tags(q, hctx, i);
+		ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
 		if (ret)
-			goto err_free_tags;
+			goto err_free_map_and_rqs;
 	}
 
 	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
 		ret = blk_mq_init_sched_shared_sbitmap(q);
 		if (ret)
-			goto err_free_tags;
+			goto err_free_map_and_rqs;
 	}
 
 	ret = e->ops.init_sched(q, e);
@@ -645,7 +645,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 err_free_sbitmap:
 	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
 		blk_mq_exit_sched_shared_sbitmap(q);
-err_free_tags:
+err_free_map_and_rqs:
 	blk_mq_sched_free_requests(q);
 	blk_mq_sched_tags_teardown(q);
 	q->elevator = NULL;
-- 
2.26.2



* [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (4 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-26  1:15   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping() John Garry
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-core.c     | 2 +-
 block/blk-mq-sched.c | 6 +++---
 block/blk-mq-sched.h | 2 +-
 block/blk.h          | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5d7137bec48e..3480df0e958c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -406,7 +406,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	mutex_lock(&q->sysfs_lock);
 	if (q->elevator)
-		blk_mq_sched_free_requests(q);
+		blk_mq_sched_free_rqs(q);
 	mutex_unlock(&q->sysfs_lock);
 
 	percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 644b6d554d72..bdbb6c31b433 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -631,7 +631,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 			ret = e->ops.init_hctx(hctx, i);
 			if (ret) {
 				eq = q->elevator;
-				blk_mq_sched_free_requests(q);
+				blk_mq_sched_free_rqs(q);
 				blk_mq_exit_sched(q, eq);
 				kobject_put(&eq->kobj);
 				return ret;
@@ -646,7 +646,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
 		blk_mq_exit_sched_shared_sbitmap(q);
 err_free_map_and_rqs:
-	blk_mq_sched_free_requests(q);
+	blk_mq_sched_free_rqs(q);
 	blk_mq_sched_tags_teardown(q);
 	q->elevator = NULL;
 	return ret;
@@ -656,7 +656,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
  * called in either blk_queue_cleanup or elevator_switch, tagset
  * is required for freeing requests
  */
-void blk_mq_sched_free_requests(struct request_queue *q)
+void blk_mq_sched_free_rqs(struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	int i;
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 1e46be6c5178..e70748d18754 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -28,7 +28,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
 void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
-void blk_mq_sched_free_requests(struct request_queue *q);
+void blk_mq_sched_free_rqs(struct request_queue *q);
 
 static inline bool
 blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk.h b/block/blk.h
index 7d2a0ba7ed21..6f125c8cddb0 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -200,7 +200,7 @@ static inline void elevator_exit(struct request_queue *q,
 {
 	lockdep_assert_held(&q->sysfs_lock);
 
-	blk_mq_sched_free_requests(q);
+	blk_mq_sched_free_rqs(q);
 	__elevator_exit(q, e);
 }
 
-- 
2.26.2



* [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (5 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-26  1:42   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping John Garry
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 47d6ab725bcc..4bae8afdfbe1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2302,10 +2302,9 @@ static size_t order_to_size(unsigned int order)
 }
 
 /* called before freeing request pool in @tags */
-static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
-		struct blk_mq_tags *tags, unsigned int hctx_idx)
+static void blk_mq_clear_rq_mapping(struct blk_mq_tags *drv_tags,
+				    struct blk_mq_tags *tags)
 {
-	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
 	struct page *page;
 	unsigned long flags;
 
@@ -2314,7 +2313,7 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
 		unsigned long end = start + order_to_size(page->private);
 		int i;
 
-		for (i = 0; i < set->queue_depth; i++) {
+		for (i = 0; i < drv_tags->nr_tags; i++) {
 			struct request *rq = drv_tags->rqs[i];
 			unsigned long rq_addr = (unsigned long)rq;
 
@@ -2338,8 +2337,11 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
+	struct blk_mq_tags *drv_tags;
 	struct page *page;
 
+	drv_tags = set->tags[hctx_idx];
+
 	if (tags->static_rqs && set->ops->exit_request) {
 		int i;
 
@@ -2353,7 +2355,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		}
 	}
 
-	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+	blk_mq_clear_rq_mapping(drv_tags, tags);
 
 	while (!list_empty(&tags->page_list)) {
 		page = list_first_entry(&tags->page_list, struct page, lru);
-- 
2.26.2



* [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (6 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-26  1:43   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 09/13] blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap() John Garry
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in the driver tags rqs[].

But there is no need for the driver tags to clear their own mapping, so
skip clearing the mapping in this scenario.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4bae8afdfbe1..5229c5420b85 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2308,6 +2308,10 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tags *drv_tags,
 	struct page *page;
 	unsigned long flags;
 
+	/* There is no need to clear a driver tags' own mapping */
+	if (drv_tags == tags)
+		return;
+
 	list_for_each_entry(page, &tags->page_list, lru) {
 		unsigned long start = (unsigned long)page_address(page);
 		unsigned long end = start + order_to_size(page->private);
-- 
2.26.2



* [PATCH v4 09/13] blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (7 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24  8:28 ` [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs() John Garry
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Put the functionality to update the sched shared sbitmap size in a
common function.

Since the same formula is always used to resize, and its inputs can be
derived from the request queue argument, just pass the request queue
pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-sched.c | 3 +--
 block/blk-mq-tag.c   | 6 ++++++
 block/blk-mq-tag.h   | 1 +
 block/blk-mq.c       | 3 +--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index bdbb6c31b433..6c15f6e98e2e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -575,8 +575,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 					&queue->sched_breserved_tags;
 	}
 
-	sbitmap_queue_resize(&queue->sched_bitmap_tags,
-			     queue->nr_requests - set->reserved_tags);
+	blk_mq_tag_update_sched_shared_sbitmap(queue);
 
 	return 0;
 }
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index ff5caeb82542..55b5a226dcc0 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -634,6 +634,12 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
 	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
 }
 
+void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
+{
+	sbitmap_queue_resize(&q->sched_bitmap_tags,
+			     q->nr_requests - q->tag_set->reserved_tags);
+}
+
 /**
  * blk_mq_unique_tag() - return a tag that is unique queue-wide
  * @rq: request for which to compute a unique tag
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 8ed55af08427..88f3c6485543 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -48,6 +48,7 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					unsigned int depth, bool can_grow);
 extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
 					     unsigned int size);
+extern void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q);
 
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5229c5420b85..5fec444d6399 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3641,8 +3641,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		q->nr_requests = nr;
 		if (blk_mq_is_sbitmap_shared(set->flags)) {
 			if (q->elevator)
-				sbitmap_queue_resize(&q->sched_bitmap_tags,
-						     nr - set->reserved_tags);
+				blk_mq_tag_update_sched_shared_sbitmap(q);
 			else
 				blk_mq_tag_resize_shared_sbitmap(set, nr);
 		}
-- 
2.26.2



* [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (8 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 09/13] blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-26  2:00   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}() John Garry
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some functions only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-sched.c | 15 +++--------
 block/blk-mq-tag.c   |  9 +------
 block/blk-mq.c       | 62 +++++++++++++++++++++++++-------------------
 block/blk-mq.h       |  9 ++-----
 4 files changed, 42 insertions(+), 53 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 6c15f6e98e2e..d1b56bb9ac64 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -519,21 +519,12 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
 					  struct blk_mq_hw_ctx *hctx,
 					  unsigned int hctx_idx)
 {
-	struct blk_mq_tag_set *set = q->tag_set;
-	int ret;
+	hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
+						    q->nr_requests);
 
-	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
-					       set->reserved_tags, set->flags);
 	if (!hctx->sched_tags)
 		return -ENOMEM;
-
-	ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
-	if (ret) {
-		blk_mq_free_rq_map(hctx->sched_tags, set->flags);
-		hctx->sched_tags = NULL;
-	}
-
-	return ret;
+	return 0;
 }
 
 /* called in queue's release handler, tagset has gone away */
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 55b5a226dcc0..db99f1246795 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -592,7 +592,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	if (tdepth > tags->nr_tags) {
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
 		struct blk_mq_tags *new;
-		bool ret;
 
 		if (!can_grow)
 			return -EINVAL;
@@ -604,15 +603,9 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		if (tdepth > MAX_SCHED_RQ)
 			return -EINVAL;
 
-		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
-				tags->nr_reserved_tags, set->flags);
+		new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
 		if (!new)
 			return -ENOMEM;
-		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
-		if (ret) {
-			blk_mq_free_rq_map(new, set->flags);
-			return -ENOMEM;
-		}
 
 		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
 		blk_mq_free_rq_map(*tagsptr, set->flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5fec444d6399..46772773b9c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2383,11 +2383,11 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
 	blk_mq_free_tags(tags, flags);
 }
 
-struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
-					unsigned int hctx_idx,
-					unsigned int nr_tags,
-					unsigned int reserved_tags,
-					unsigned int flags)
+static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
+					       unsigned int hctx_idx,
+					       unsigned int nr_tags,
+					       unsigned int reserved_tags,
+					       unsigned int flags)
 {
 	struct blk_mq_tags *tags;
 	int node;
@@ -2435,8 +2435,9 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 	return 0;
 }
 
-int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx, unsigned int depth)
+static int blk_mq_alloc_rqs(struct blk_mq_tag_set *set,
+			    struct blk_mq_tags *tags,
+			    unsigned int hctx_idx, unsigned int depth)
 {
 	unsigned int i, j, entries_per_page, max_order = 4;
 	size_t rq_size, left;
@@ -2847,25 +2848,34 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 	}
 }
 
-static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
-					int hctx_idx)
+struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+					     unsigned int hctx_idx,
+					     unsigned int depth)
 {
-	unsigned int flags = set->flags;
-	int ret = 0;
+	struct blk_mq_tags *tags;
+	int ret;
 
-	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
-					set->queue_depth, set->reserved_tags, flags);
-	if (!set->tags[hctx_idx])
-		return false;
+	tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags,
+				   set->flags);
+	if (!tags)
+		return NULL;
 
-	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
-				set->queue_depth);
-	if (!ret)
-		return true;
+	ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
+	if (ret) {
+		blk_mq_free_rq_map(tags, set->flags);
+		return NULL;
+	}
 
-	blk_mq_free_rq_map(set->tags[hctx_idx], flags);
-	set->tags[hctx_idx] = NULL;
-	return false;
+	return tags;
+}
+
+static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+				       int hctx_idx)
+{
+	set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(set, hctx_idx,
+						       set->queue_depth);
+
+	return set->tags[hctx_idx];
 }
 
 static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
@@ -2910,7 +2920,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 			hctx_idx = set->map[j].mq_map[i];
 			/* unmapped hw queue can be remapped after CPU topo changed */
 			if (!set->tags[hctx_idx] &&
-			    !__blk_mq_alloc_map_and_request(set, hctx_idx)) {
+			    !__blk_mq_alloc_map_and_rqs(set, hctx_idx)) {
 				/*
 				 * If tags initialization fail for some hctx,
 				 * that hctx won't be brought online.  In this
@@ -3343,7 +3353,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	int i;
 
 	for (i = 0; i < set->nr_hw_queues; i++) {
-		if (!__blk_mq_alloc_map_and_request(set, i))
+		if (!__blk_mq_alloc_map_and_rqs(set, i))
 			goto out_unwind;
 		cond_resched();
 	}
@@ -3362,7 +3372,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
  * may reduce the depth asked for, if memory is tight. set->queue_depth
  * will be updated to reflect the allocated depth.
  */
-static int blk_mq_alloc_map_and_requests(struct blk_mq_tag_set *set)
+static int blk_mq_alloc_set_map_and_rqs(struct blk_mq_tag_set *set)
 {
 	unsigned int depth;
 	int err;
@@ -3528,7 +3538,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
-	ret = blk_mq_alloc_map_and_requests(set);
+	ret = blk_mq_alloc_set_map_and_rqs(set);
 	if (ret)
 		goto out_free_mq_map;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..83585a344568 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -55,13 +55,8 @@ void blk_mq_put_rq_ref(struct request *rq);
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
-struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
-					unsigned int hctx_idx,
-					unsigned int nr_tags,
-					unsigned int reserved_tags,
-					unsigned int flags);
-int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx, unsigned int depth);
+struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+				unsigned int hctx_idx, unsigned int depth);
 
 /*
  * Internal helpers for request insertion into sw queues
-- 
2.26.2



* [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (9 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-26  2:05   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support John Garry
  2021-09-24  8:28 ` [PATCH v4 13/13] blk-mq: Stop using pointers for blk_mq_tags bitmap tags John Garry
  12 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Refactor blk_mq_free_map_and_requests() so that it can be used at the
many sites where the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-tag.c |  3 +--
 block/blk-mq.c     | 40 ++++++++++++++++++++++++----------------
 block/blk-mq.h     |  4 +++-
 3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index db99f1246795..a0ecc6d88f84 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -607,8 +607,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		if (!new)
 			return -ENOMEM;
 
-		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
-		blk_mq_free_rq_map(*tagsptr, set->flags);
+		blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
 		*tagsptr = new;
 	} else {
 		/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 46772773b9c4..464ea20b9bcb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2878,15 +2878,15 @@ static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 	return set->tags[hctx_idx];
 }
 
-static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
-					 unsigned int hctx_idx)
+void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+			     struct blk_mq_tags *tags,
+			     unsigned int hctx_idx)
 {
 	unsigned int flags = set->flags;
 
-	if (set->tags && set->tags[hctx_idx]) {
-		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
-		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
-		set->tags[hctx_idx] = NULL;
+	if (tags) {
+		blk_mq_free_rqs(set, tags, hctx_idx);
+		blk_mq_free_rq_map(tags, flags);
 	}
 }
 
@@ -2967,8 +2967,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 			 * fallback in case of a new remap fails
 			 * allocation
 			 */
-			if (i && set->tags[i])
-				blk_mq_free_map_and_requests(set, i);
+			if (i && set->tags[i]) {
+				blk_mq_free_map_and_rqs(set, set->tags[i], i);
+				set->tags[i] = NULL;
+			}
 
 			hctx->tags = NULL;
 			continue;
@@ -3264,8 +3266,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx = hctxs[j];
 
 		if (hctx) {
-			if (hctx->tags)
-				blk_mq_free_map_and_requests(set, j);
+			blk_mq_free_map_and_rqs(set, set->tags[j], j);
+			set->tags[j] = NULL;
 			blk_mq_exit_hctx(q, set, hctx, j);
 			hctxs[j] = NULL;
 		}
@@ -3361,8 +3363,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	return 0;
 
 out_unwind:
-	while (--i >= 0)
-		blk_mq_free_map_and_requests(set, i);
+	while (--i >= 0) {
+		blk_mq_free_map_and_rqs(set, set->tags[i], i);
+		set->tags[i] = NULL;
+	}
 
 	return -ENOMEM;
 }
@@ -3557,8 +3561,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	return 0;
 
 out_free_mq_rq_maps:
-	for (i = 0; i < set->nr_hw_queues; i++)
-		blk_mq_free_map_and_requests(set, i);
+	for (i = 0; i < set->nr_hw_queues; i++) {
+		blk_mq_free_map_and_rqs(set, set->tags[i], i);
+		set->tags[i] = NULL;
+	}
 out_free_mq_map:
 	for (i = 0; i < set->nr_maps; i++) {
 		kfree(set->map[i].mq_map);
@@ -3590,8 +3596,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i, j;
 
-	for (i = 0; i < set->nr_hw_queues; i++)
-		blk_mq_free_map_and_requests(set, i);
+	for (i = 0; i < set->nr_hw_queues; i++) {
+		blk_mq_free_map_and_rqs(set, set->tags[i], i);
+		set->tags[i] = NULL;
+	}
 
 	if (blk_mq_is_sbitmap_shared(set->flags))
 		blk_mq_exit_shared_sbitmap(set);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 83585a344568..bcb0ca89d37a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -57,7 +57,9 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
 struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 				unsigned int hctx_idx, unsigned int depth);
-
+void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+			     struct blk_mq_tags *tags,
+			     unsigned int hctx_idx);
 /*
  * Internal helpers for request insertion into sw queues
  */
-- 
2.26.2



* [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (10 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}() John Garry
@ 2021-09-24  8:28 ` John Garry
  2021-09-24 10:23   ` Hannes Reinecke
  2021-09-26  3:25   ` Ming Lei
  2021-09-24  8:28 ` [PATCH v4 13/13] blk-mq: Stop using pointers for blk_mq_tags bitmap tags John Garry
  12 siblings, 2 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full set of static requests is used per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap
depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags struct per tagset and request queue, which will hold a set of
shared static rqs.

Since there is now no valid HW queue index to be passed to the
blk_mq_ops .init_request and .exit_request callbacks, pass an invalid
index token. This changes the semantics of the APIs, such that the
callback would need to validate the HW queue index before using it.
Currently no user of shared sbitmap actually uses the HW queue index
(as would be expected).
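
As a minimal sketch of what this means for a driver (a hypothetical
foo_init_request(), not code from this series), a callback would now
range check the index before touching any per-hctx state, since the
invalid token is out of range:

static int foo_init_request(struct blk_mq_tag_set *set, struct request *rq,
			    unsigned int hctx_idx, unsigned int numa_node)
{
	/*
	 * For a shared sbitmap tag set, hctx_idx is an invalid token
	 * ((unsigned)-1), so it fails this range check.
	 */
	if (hctx_idx < set->nr_hw_queues) {
		/* only here is it valid to reference per-hctx state */
	}
	return 0;
}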

Continue to use the term "shared sbitmap" for now, as the meaning is
known.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c   | 82 ++++++++++++++++++-------------------
 block/blk-mq-tag.c     | 61 ++++++++++------------------
 block/blk-mq-tag.h     |  6 +--
 block/blk-mq.c         | 91 +++++++++++++++++++++++-------------------
 block/blk-mq.h         |  5 ++-
 include/linux/blk-mq.h | 15 ++++---
 include/linux/blkdev.h |  3 +-
 7 files changed, 125 insertions(+), 138 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d1b56bb9ac64..428da4949d80 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -519,6 +519,11 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
 					  struct blk_mq_hw_ctx *hctx,
 					  unsigned int hctx_idx)
 {
+	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+		hctx->sched_tags = q->shared_sbitmap_tags;
+		return 0;
+	}
+
 	hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
 						    q->nr_requests);
 
@@ -527,61 +532,54 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
 	return 0;
 }
 
+static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
+{
+	blk_mq_free_rq_map(queue->shared_sbitmap_tags);
+	queue->shared_sbitmap_tags = NULL;
+}
+
 /* called in queue's release handler, tagset has gone away */
-static void blk_mq_sched_tags_teardown(struct request_queue *q)
+static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags)
 {
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->sched_tags) {
-			blk_mq_free_rq_map(hctx->sched_tags, hctx->flags);
+			if (!blk_mq_is_sbitmap_shared(q->tag_set->flags))
+				blk_mq_free_rq_map(hctx->sched_tags);
 			hctx->sched_tags = NULL;
 		}
 	}
+
+	if (blk_mq_is_sbitmap_shared(flags))
+		blk_mq_exit_sched_shared_sbitmap(q);
 }
 
 static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 {
 	struct blk_mq_tag_set *set = queue->tag_set;
-	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
-	struct blk_mq_hw_ctx *hctx;
-	int ret, i;
 
 	/*
 	 * Set initial depth at max so that we don't need to reallocate for
 	 * updating nr_requests.
 	 */
-	ret = blk_mq_init_bitmaps(&queue->sched_bitmap_tags,
-				  &queue->sched_breserved_tags,
-				  MAX_SCHED_RQ, set->reserved_tags,
-				  set->numa_node, alloc_policy);
-	if (ret)
-		return ret;
-
-	queue_for_each_hw_ctx(queue, hctx, i) {
-		hctx->sched_tags->bitmap_tags =
-					&queue->sched_bitmap_tags;
-		hctx->sched_tags->breserved_tags =
-					&queue->sched_breserved_tags;
-	}
+	queue->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+						BLK_MQ_NO_HCTX_IDX,
+						MAX_SCHED_RQ);
+	if (!queue->shared_sbitmap_tags)
+		return -ENOMEM;
 
 	blk_mq_tag_update_sched_shared_sbitmap(queue);
 
 	return 0;
 }
 
-static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
-{
-	sbitmap_queue_free(&queue->sched_bitmap_tags);
-	sbitmap_queue_free(&queue->sched_breserved_tags);
-}
-
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 {
+	unsigned int i, flags = q->tag_set->flags;
 	struct blk_mq_hw_ctx *hctx;
 	struct elevator_queue *eq;
-	unsigned int i;
 	int ret;
 
 	if (!e) {
@@ -598,21 +596,21 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
 				   BLKDEV_DEFAULT_RQ);
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
+	if (blk_mq_is_sbitmap_shared(flags)) {
+		ret = blk_mq_init_sched_shared_sbitmap(q);
 		if (ret)
-			goto err_free_map_and_rqs;
+			return ret;
 	}
 
-	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
-		ret = blk_mq_init_sched_shared_sbitmap(q);
+	queue_for_each_hw_ctx(q, hctx, i) {
+		ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
 		if (ret)
 			goto err_free_map_and_rqs;
 	}
 
 	ret = e->ops.init_sched(q, e);
 	if (ret)
-		goto err_free_sbitmap;
+		goto err_free_map_and_rqs;
 
 	blk_mq_debugfs_register_sched(q);
 
@@ -632,12 +630,10 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 
 	return 0;
 
-err_free_sbitmap:
-	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
-		blk_mq_exit_sched_shared_sbitmap(q);
 err_free_map_and_rqs:
 	blk_mq_sched_free_rqs(q);
-	blk_mq_sched_tags_teardown(q);
+	blk_mq_sched_tags_teardown(q, flags);
+
 	q->elevator = NULL;
 	return ret;
 }
@@ -651,9 +647,15 @@ void blk_mq_sched_free_rqs(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		if (hctx->sched_tags)
-			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+		blk_mq_free_rqs(q->tag_set, q->shared_sbitmap_tags,
+				BLK_MQ_NO_HCTX_IDX);
+	} else {
+		queue_for_each_hw_ctx(q, hctx, i) {
+			if (hctx->sched_tags)
+				blk_mq_free_rqs(q->tag_set,
+						hctx->sched_tags, i);
+		}
 	}
 }
 
@@ -674,8 +676,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 	blk_mq_debugfs_unregister_sched(q);
 	if (e->type->ops.exit_sched)
 		e->type->ops.exit_sched(e);
-	blk_mq_sched_tags_teardown(q);
-	if (blk_mq_is_sbitmap_shared(flags))
-		blk_mq_exit_sched_shared_sbitmap(q);
+	blk_mq_sched_tags_teardown(q, flags);
 	q->elevator = NULL;
 }
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a0ecc6d88f84..4e71ce6b37ea 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -27,10 +27,11 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 		struct blk_mq_tag_set *set = q->tag_set;
+		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
 
 		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
 		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
-			atomic_inc(&set->active_queues_shared_sbitmap);
+			atomic_inc(&tags->active_queues);
 	} else {
 		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
 		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
@@ -61,10 +62,12 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tag_set *set = q->tag_set;
 
 	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
+
 		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
 					&q->queue_flags))
 			return;
-		atomic_dec(&set->active_queues_shared_sbitmap);
+		atomic_dec(&tags->active_queues);
 	} else {
 		if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 			return;
@@ -510,38 +513,10 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return 0;
 }
 
-int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set)
-{
-	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
-	int i, ret;
-
-	ret = blk_mq_init_bitmaps(&set->__bitmap_tags, &set->__breserved_tags,
-				  set->queue_depth, set->reserved_tags,
-				  set->numa_node, alloc_policy);
-	if (ret)
-		return ret;
-
-	for (i = 0; i < set->nr_hw_queues; i++) {
-		struct blk_mq_tags *tags = set->tags[i];
-
-		tags->bitmap_tags = &set->__bitmap_tags;
-		tags->breserved_tags = &set->__breserved_tags;
-	}
-
-	return 0;
-}
-
-void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
-{
-	sbitmap_queue_free(&set->__bitmap_tags);
-	sbitmap_queue_free(&set->__breserved_tags);
-}
-
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, unsigned int flags)
+				     int node, int alloc_policy)
 {
-	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
 	struct blk_mq_tags *tags;
 
 	if (total_tags > BLK_MQ_TAG_MAX) {
@@ -557,9 +532,6 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_reserved_tags = reserved_tags;
 	spin_lock_init(&tags->lock);
 
-	if (blk_mq_is_sbitmap_shared(flags))
-		return tags;
-
 	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
 		kfree(tags);
 		return NULL;
@@ -567,12 +539,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	return tags;
 }
 
-void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
+void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	if (!blk_mq_is_sbitmap_shared(flags)) {
-		sbitmap_queue_free(tags->bitmap_tags);
-		sbitmap_queue_free(tags->breserved_tags);
-	}
+	sbitmap_queue_free(tags->bitmap_tags);
+	sbitmap_queue_free(tags->breserved_tags);
 	kfree(tags);
 }
 
@@ -603,6 +573,13 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		if (tdepth > MAX_SCHED_RQ)
 			return -EINVAL;
 
+		/*
+		 * Only the sbitmap needs resizing since we allocated the max
+		 * initially.
+		 */
+		if (blk_mq_is_sbitmap_shared(set->flags))
+			return 0;
+
 		new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
 		if (!new)
 			return -ENOMEM;
@@ -623,12 +600,14 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 
 void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
 {
-	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
+	struct blk_mq_tags *tags = set->shared_sbitmap_tags;
+
+	sbitmap_queue_resize(&tags->__bitmap_tags, size - set->reserved_tags);
 }
 
 void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
 {
-	sbitmap_queue_resize(&q->sched_bitmap_tags,
+	sbitmap_queue_resize(q->shared_sbitmap_tags->bitmap_tags,
 			     q->nr_requests - q->tag_set->reserved_tags);
 }
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 88f3c6485543..e433e39a9cfa 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -30,16 +30,14 @@ struct blk_mq_tags {
 
 extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 					unsigned int reserved_tags,
-					int node, unsigned int flags);
-extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
+					int node, int alloc_policy);
+extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 extern int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
 			       struct sbitmap_queue *breserved_tags,
 			       unsigned int queue_depth,
 			       unsigned int reserved,
 			       int node, int alloc_policy);
 
-extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set);
-extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
 extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 			   unsigned int tag);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 464ea20b9bcb..ece43855bcdf 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2344,7 +2344,10 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	struct blk_mq_tags *drv_tags;
 	struct page *page;
 
-	drv_tags = set->tags[hctx_idx];
+	if (blk_mq_is_sbitmap_shared(set->flags))
+		drv_tags = set->shared_sbitmap_tags;
+	else
+		drv_tags = set->tags[hctx_idx];
 
 	if (tags->static_rqs && set->ops->exit_request) {
 		int i;
@@ -2373,21 +2376,20 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	}
 }
 
-void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
+void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 {
 	kfree(tags->rqs);
 	tags->rqs = NULL;
 	kfree(tags->static_rqs);
 	tags->static_rqs = NULL;
 
-	blk_mq_free_tags(tags, flags);
+	blk_mq_free_tags(tags);
 }
 
 static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					       unsigned int hctx_idx,
 					       unsigned int nr_tags,
-					       unsigned int reserved_tags,
-					       unsigned int flags)
+					       unsigned int reserved_tags)
 {
 	struct blk_mq_tags *tags;
 	int node;
@@ -2396,7 +2398,8 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node, flags);
+	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
 
@@ -2404,7 +2407,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 				 node);
 	if (!tags->rqs) {
-		blk_mq_free_tags(tags, flags);
+		blk_mq_free_tags(tags);
 		return NULL;
 	}
 
@@ -2413,7 +2416,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					node);
 	if (!tags->static_rqs) {
 		kfree(tags->rqs);
-		blk_mq_free_tags(tags, flags);
+		blk_mq_free_tags(tags);
 		return NULL;
 	}
 
@@ -2855,14 +2858,13 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 	struct blk_mq_tags *tags;
 	int ret;
 
-	tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags,
-				   set->flags);
+	tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags);
 	if (!tags)
 		return NULL;
 
 	ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
 	if (ret) {
-		blk_mq_free_rq_map(tags, set->flags);
+		blk_mq_free_rq_map(tags);
 		return NULL;
 	}
 
@@ -2872,6 +2874,12 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 				       int hctx_idx)
 {
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		set->tags[hctx_idx] = set->shared_sbitmap_tags;
+
+		return true;
+	}
+
 	set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(set, hctx_idx,
 						       set->queue_depth);
 
@@ -2882,14 +2890,22 @@ void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
 			     struct blk_mq_tags *tags,
 			     unsigned int hctx_idx)
 {
-	unsigned int flags = set->flags;
-
 	if (tags) {
 		blk_mq_free_rqs(set, tags, hctx_idx);
-		blk_mq_free_rq_map(tags, flags);
+		blk_mq_free_rq_map(tags);
 	}
 }
 
+static void __blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+				      struct blk_mq_tags *tags,
+				      unsigned int hctx_idx)
+{
+	if (blk_mq_is_sbitmap_shared(set->flags))
+		return;
+
+	blk_mq_free_map_and_rqs(set, tags, hctx_idx);
+}
+
 static void blk_mq_map_swqueue(struct request_queue *q)
 {
 	unsigned int i, j, hctx_idx;
@@ -2968,7 +2984,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 			 * allocation
 			 */
 			if (i && set->tags[i]) {
-				blk_mq_free_map_and_rqs(set, set->tags[i], i);
+				__blk_mq_free_map_and_rqs(set, set->tags[i], i);
 				set->tags[i] = NULL;
 			}
 
@@ -3266,7 +3282,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx = hctxs[j];
 
 		if (hctx) {
-			blk_mq_free_map_and_rqs(set, set->tags[j], j);
+			__blk_mq_free_map_and_rqs(set, set->tags[j], j);
 			set->tags[j] = NULL;
 			blk_mq_exit_hctx(q, set, hctx, j);
 			hctxs[j] = NULL;
@@ -3354,6 +3370,14 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		set->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+						BLK_MQ_NO_HCTX_IDX,
+						set->queue_depth);
+		if (!set->shared_sbitmap_tags)
+			return -ENOMEM;
+	}
+
 	for (i = 0; i < set->nr_hw_queues; i++) {
 		if (!__blk_mq_alloc_map_and_rqs(set, i))
 			goto out_unwind;
@@ -3364,10 +3388,15 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 
 out_unwind:
 	while (--i >= 0) {
-		blk_mq_free_map_and_rqs(set, set->tags[i], i);
+		__blk_mq_free_map_and_rqs(set, set->tags[i], i);
 		set->tags[i] = NULL;
 	}
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+					BLK_MQ_NO_HCTX_IDX);
+	}
+
 	return -ENOMEM;
 }
 
@@ -3546,25 +3575,11 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
-	if (blk_mq_is_sbitmap_shared(set->flags)) {
-		atomic_set(&set->active_queues_shared_sbitmap, 0);
-
-		if (blk_mq_init_shared_sbitmap(set)) {
-			ret = -ENOMEM;
-			goto out_free_mq_rq_maps;
-		}
-	}
-
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
 	return 0;
 
-out_free_mq_rq_maps:
-	for (i = 0; i < set->nr_hw_queues; i++) {
-		blk_mq_free_map_and_rqs(set, set->tags[i], i);
-		set->tags[i] = NULL;
-	}
 out_free_mq_map:
 	for (i = 0; i < set->nr_maps; i++) {
 		kfree(set->map[i].mq_map);
@@ -3597,12 +3612,14 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 	int i, j;
 
 	for (i = 0; i < set->nr_hw_queues; i++) {
-		blk_mq_free_map_and_rqs(set, set->tags[i], i);
+		__blk_mq_free_map_and_rqs(set, set->tags[i], i);
 		set->tags[i] = NULL;
 	}
 
-	if (blk_mq_is_sbitmap_shared(set->flags))
-		blk_mq_exit_shared_sbitmap(set);
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+					BLK_MQ_NO_HCTX_IDX);
+	}
 
 	for (j = 0; j < set->nr_maps; j++) {
 		kfree(set->map[j].mq_map);
@@ -3640,12 +3657,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 						      nr, true);
-			if (blk_mq_is_sbitmap_shared(set->flags)) {
-				hctx->sched_tags->bitmap_tags =
-					&q->sched_bitmap_tags;
-				hctx->sched_tags->breserved_tags =
-					&q->sched_breserved_tags;
-			}
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 						      false);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index bcb0ca89d37a..b34385211e0a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,7 +54,7 @@ void blk_mq_put_rq_ref(struct request *rq);
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
-void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
+void blk_mq_free_rq_map(struct blk_mq_tags *tags);
 struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
 				unsigned int hctx_idx, unsigned int depth);
 void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
@@ -331,10 +331,11 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
 		struct blk_mq_tag_set *set = q->tag_set;
+		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
 
 		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
 			return true;
-		users = atomic_read(&set->active_queues_shared_sbitmap);
+		users = atomic_read(&tags->active_queues);
 	} else {
 		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 			return true;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 13ba1861e688..808854a8ebc4 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -232,13 +232,11 @@ enum hctx_type {
  * @flags:	   Zero or more BLK_MQ_F_* flags.
  * @driver_data:   Pointer to data owned by the block driver that created this
  *		   tag set.
- * @active_queues_shared_sbitmap:
- * 		   number of active request queues per tag set.
- * @__bitmap_tags: A shared tags sbitmap, used over all hctx's
- * @__breserved_tags:
- *		   A shared reserved tags sbitmap, used over all hctx's
  * @tags:	   Tag sets. One tag set per hardware queue. Has @nr_hw_queues
  *		   elements.
+ * @shared_sbitmap_tags:
+ *		   Shared sbitmap set of tags. Has @nr_hw_queues elements. If
+ *		   set, shared by all @tags.
  * @tag_list_lock: Serializes tag_list accesses.
  * @tag_list:	   List of the request queues that use this tag set. See also
  *		   request_queue.tag_set_list.
@@ -255,12 +253,11 @@ struct blk_mq_tag_set {
 	unsigned int		timeout;
 	unsigned int		flags;
 	void			*driver_data;
-	atomic_t		active_queues_shared_sbitmap;
 
-	struct sbitmap_queue	__bitmap_tags;
-	struct sbitmap_queue	__breserved_tags;
 	struct blk_mq_tags	**tags;
 
+	struct blk_mq_tags	*shared_sbitmap_tags;
+
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
 };
@@ -432,6 +429,8 @@ enum {
 	((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
 		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
 
+#define BLK_MQ_NO_HCTX_IDX	(-1U)
+
 struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
 		struct lock_class_key *lkclass);
 #define blk_mq_alloc_disk(set, queuedata)				\
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4baf9435232d..17e50e5ef47b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -459,8 +459,7 @@ struct request_queue {
 
 	atomic_t		nr_active_requests_shared_sbitmap;
 
-	struct sbitmap_queue	sched_bitmap_tags;
-	struct sbitmap_queue	sched_breserved_tags;
+	struct blk_mq_tags	*shared_sbitmap_tags;
 
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
-- 
2.26.2


* [PATCH v4 13/13] blk-mq: Stop using pointers for blk_mq_tags bitmap tags
  2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (11 preceding siblings ...)
  2021-09-24  8:28 ` [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support John Garry
@ 2021-09-24  8:28 ` John Garry
  12 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-24  8:28 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare, John Garry

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae03cdd ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is also removed, since it would only be
a wrapper for blk_mq_init_bitmaps().
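
In effect (abridged), the sbitmap queues become embedded members again:

	struct blk_mq_tags {
		...
		struct sbitmap_queue bitmap_tags;
		struct sbitmap_queue breserved_tags;
		...
	};

and users switch from dereferencing a pointer to taking the address,
e.g. sbitmap_queue_free(&tags->bitmap_tags).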

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/bfq-iosched.c    |  4 +--
 block/blk-mq-debugfs.c |  8 +++---
 block/blk-mq-tag.c     | 56 +++++++++++++++---------------------------
 block/blk-mq-tag.h     |  7 ++----
 block/blk-mq.c         |  8 +++---
 block/kyber-iosched.c  |  4 +--
 block/mq-deadline.c    |  2 +-
 7 files changed, 35 insertions(+), 54 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index dd13c2bbc29c..4674f85d7df0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6894,8 +6894,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4b66d2776eda..4000376330c9 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -452,11 +452,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 		   atomic_read(&tags->active_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
-	sbitmap_queue_show(tags->bitmap_tags, m);
+	sbitmap_queue_show(&tags->bitmap_tags, m);
 
 	if (tags->nr_reserved_tags) {
 		seq_puts(m, "\nbreserved_tags:\n");
-		sbitmap_queue_show(tags->breserved_tags, m);
+		sbitmap_queue_show(&tags->breserved_tags, m);
 	}
 }
 
@@ -487,7 +487,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
+		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
@@ -521,7 +521,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->sched_tags)
-		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
+		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 4e71ce6b37ea..4cf0f74f4fa9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -46,9 +46,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(tags->bitmap_tags);
+	sbitmap_queue_wake_all(&tags->bitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(tags->breserved_tags);
+		sbitmap_queue_wake_all(&tags->breserved_tags);
 }
 
 /*
@@ -104,10 +104,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_NO_TAG;
 		}
-		bt = tags->breserved_tags;
+		bt = &tags->breserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = tags->bitmap_tags;
+		bt = &tags->bitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -153,9 +153,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = tags->breserved_tags;
+			bt = &tags->breserved_tags;
 		else
-			bt = tags->bitmap_tags;
+			bt = &tags->bitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -189,10 +189,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -343,9 +343,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
 	WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
 
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
+		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
 				 flags | BT_TAG_ITER_RESERVED);
-	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
+	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
 }
 
 /**
@@ -462,8 +462,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -495,24 +495,6 @@ int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
 	return -ENOMEM;
 }
 
-static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-				   int node, int alloc_policy)
-{
-	int ret;
-
-	ret = blk_mq_init_bitmaps(&tags->__bitmap_tags,
-				  &tags->__breserved_tags,
-				  tags->nr_tags, tags->nr_reserved_tags,
-				  node, alloc_policy);
-	if (ret)
-		return ret;
-
-	tags->bitmap_tags = &tags->__bitmap_tags;
-	tags->breserved_tags = &tags->__breserved_tags;
-
-	return 0;
-}
-
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
@@ -532,7 +514,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_reserved_tags = reserved_tags;
 	spin_lock_init(&tags->lock);
 
-	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+	if (blk_mq_init_bitmaps(&tags->bitmap_tags, &tags->breserved_tags,
+				total_tags, reserved_tags, node,
+				alloc_policy) < 0) {
 		kfree(tags);
 		return NULL;
 	}
@@ -541,8 +525,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	sbitmap_queue_free(tags->bitmap_tags);
-	sbitmap_queue_free(tags->breserved_tags);
+	sbitmap_queue_free(&tags->bitmap_tags);
+	sbitmap_queue_free(&tags->breserved_tags);
 	kfree(tags);
 }
 
@@ -591,7 +575,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(tags->bitmap_tags,
+		sbitmap_queue_resize(&tags->bitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
@@ -602,12 +586,12 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
 {
 	struct blk_mq_tags *tags = set->shared_sbitmap_tags;
 
-	sbitmap_queue_resize(&tags->__bitmap_tags, size - set->reserved_tags);
+	sbitmap_queue_resize(&tags->bitmap_tags, size - set->reserved_tags);
 }
 
 void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
 {
-	sbitmap_queue_resize(q->shared_sbitmap_tags->bitmap_tags,
+	sbitmap_queue_resize(&q->shared_sbitmap_tags->bitmap_tags,
 			     q->nr_requests - q->tag_set->reserved_tags);
 }
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index e433e39a9cfa..23747ea2bb53 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -11,11 +11,8 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
-	struct sbitmap_queue *bitmap_tags;
-	struct sbitmap_queue *breserved_tags;
-
-	struct sbitmap_queue __bitmap_tags;
-	struct sbitmap_queue __breserved_tags;
+	struct sbitmap_queue bitmap_tags;
+	struct sbitmap_queue breserved_tags;
 
 	struct request **rqs;
 	struct request **static_rqs;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ece43855bcdf..5baa35cae8a0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1062,14 +1062,14 @@ static inline unsigned int queued_to_index(unsigned int queued)
 
 static bool __blk_mq_get_driver_tag(struct request *rq)
 {
-	struct sbitmap_queue *bt = rq->mq_hctx->tags->bitmap_tags;
+	struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
 	unsigned int tag_offset = rq->mq_hctx->tags->nr_reserved_tags;
 	int tag;
 
 	blk_mq_tag_busy(rq->mq_hctx);
 
 	if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag)) {
-		bt = rq->mq_hctx->tags->breserved_tags;
+		bt = &rq->mq_hctx->tags->breserved_tags;
 		tag_offset = 0;
 	} else {
 		if (!hctx_may_queue(rq->mq_hctx, bt))
@@ -1112,7 +1112,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = hctx->tags->bitmap_tags;
+		sbq = &hctx->tags->bitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1130,7 +1130,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 15a8be57203d..9fb735bf1134 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -451,11 +451,11 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
 {
 	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int shift = tags->bitmap_tags->sb.shift;
+	unsigned int shift = tags->bitmap_tags.sb.shift;
 
 	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
 
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
 }
 
 static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 7f3c3932b723..7fd07d00838e 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -519,7 +519,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)
 
 	dd->async_depth = max(1UL, 3 * q->nr_requests / 4);
 
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, dd->async_depth);
+	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
 }
 
 /* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
-- 
2.26.2


* Re: [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
  2021-09-24  8:28 ` [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}() John Garry
@ 2021-09-24 10:08   ` Hannes Reinecke
  0 siblings, 0 replies; 30+ messages in thread
From: Hannes Reinecke @ 2021-09-24 10:08 UTC (permalink / raw)
  To: John Garry, axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei

On 9/24/21 10:28 AM, John Garry wrote:
> Function blk_mq_sched_alloc_tags() does the same as
> __blk_mq_alloc_map_and_request(), so give a similar name to be consistent.
> 
> Similarly rename label err_free_tags -> err_free_map_and_rqs.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq-sched.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-24  8:28 ` [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support John Garry
@ 2021-09-24 10:23   ` Hannes Reinecke
  2021-09-24 10:39     ` John Garry
  2021-09-26  3:25   ` Ming Lei
  1 sibling, 1 reply; 30+ messages in thread
From: Hannes Reinecke @ 2021-09-24 10:23 UTC (permalink / raw)
  To: John Garry, axboe; +Cc: linux-block, linux-kernel, linux-scsi, ming.lei

On 9/24/21 10:28 AM, John Garry wrote:
> Currently we use separate sbitmap pairs and active_queues atomic_t for
> shared sbitmap support.
> 
> However a full set of static requests is used per HW queue, which is
> quite wasteful, considering that the total number of requests usable at
> any given time across all HW queues is limited by the shared sbitmap depth.
> 
> As such, it is considerably more memory efficient in the case of shared
> sbitmap to allocate a set of static rqs per tag set or request queue, and
> not per HW queue.
> 
> So replace the sbitmap pairs and active_queues atomic_t with a shared
> tags per tagset and request queue, which will hold a set of shared static
> rqs.
> 
> Since there is now no valid HW queue index to be passed to the blk_mq_ops
> .init_request and .exit_request callbacks, pass an invalid index token. This
> changes the semantics of the APIs, such that the callback would need to
> validate the HW queue index before using it. Currently no user of shared
> sbitmap actually uses the HW queue index (as would be expected).
> 
> Continue to use the term "shared sbitmap" for now, as the meaning is known.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>   block/blk-mq-sched.c   | 82 ++++++++++++++++++-------------------
>   block/blk-mq-tag.c     | 61 ++++++++++------------------
>   block/blk-mq-tag.h     |  6 +--
>   block/blk-mq.c         | 91 +++++++++++++++++++++++-------------------
>   block/blk-mq.h         |  5 ++-
>   include/linux/blk-mq.h | 15 ++++---
>   include/linux/blkdev.h |  3 +-
>   7 files changed, 125 insertions(+), 138 deletions(-)
> 
The overall idea to keep the full request allocation per queue was to 
ensure memory locality for the requests themselves.
When moving to a shared request structure we obviously lose that feature.
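
(For reference: with per-hctx maps the static rqs land on the hctx's local
node; roughly, from blk_mq_alloc_rq_map():

	node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
	if (node == NUMA_NO_NODE)
		node = set->numa_node;

whereas a single shared set necessarily lives on just one node.)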

But I'm not sure if that matters here; the performance impact might be 
too small to be measurable, seeing that we'll be most likely bound by 
hardware latencies anyway.

Nevertheless: have you tested for performance regressions with this 
patchset?
I'm especially thinking of Kashyap's high-IOPS megaraid setup; if there 
is a performance impact, that'll likely be the scenario where we can measure it.

But even if there is a performance impact, this patchset might be 
worthwhile, seeing that it'll reduce the memory footprint massively.
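
As a rough back-of-envelope (ignoring per-page allocation overhead; cmd_size
being the driver's per-request payload), the static rqs cost roughly:

	per-hctx maps:	nr_hw_queues * queue_depth * (sizeof(struct request) + cmd_size)
	shared:		queue_depth * (sizeof(struct request) + cmd_size)

i.e. the saving scales with nr_hw_queues.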

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-24 10:23   ` Hannes Reinecke
@ 2021-09-24 10:39     ` John Garry
  2021-09-29 13:36       ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-24 10:39 UTC (permalink / raw)
  To: Hannes Reinecke, axboe
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, Kashyap Desai

+ Kashyap

On 24/09/2021 11:23, Hannes Reinecke wrote:
> On 9/24/21 10:28 AM, John Garry wrote:
>> Currently we use separate sbitmap pairs and active_queues atomic_t for
>> shared sbitmap support.
>>
>> However a full set of static requests is used per HW queue, which is
>> quite wasteful, considering that the total number of requests usable at
>> any given time across all HW queues is limited by the shared sbitmap 
>> depth.
>>
>> As such, it is considerably more memory efficient in the case of shared
>> sbitmap to allocate a set of static rqs per tag set or request queue, and
>> not per HW queue.
>>
>> So replace the sbitmap pairs and active_queues atomic_t with a shared
>> tags per tagset and request queue, which will hold a set of shared static
>> rqs.
>>
>> Since there is now no valid HW queue index to be passed to the blk_mq_ops
>> .init_request and .exit_request callbacks, pass an invalid index token. This
>> changes the semantics of the APIs, such that the callback would need to
>> validate the HW queue index before using it. Currently no user of shared
>> sbitmap actually uses the HW queue index (as would be expected).
>>
>> Continue to use the term "shared sbitmap" for now, as the meaning is known.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   block/blk-mq-sched.c   | 82 ++++++++++++++++++-------------------
>>   block/blk-mq-tag.c     | 61 ++++++++++------------------
>>   block/blk-mq-tag.h     |  6 +--
>>   block/blk-mq.c         | 91 +++++++++++++++++++++++-------------------
>>   block/blk-mq.h         |  5 ++-
>>   include/linux/blk-mq.h | 15 ++++---
>>   include/linux/blkdev.h |  3 +-
>>   7 files changed, 125 insertions(+), 138 deletions(-)
>>
> The overall idea to keep the full request allocation per queue was to 
> ensure memory locality for the requests themselves.
> When moving to a shared request structure we obviously lose that feature.
> 
> But I'm not sure if that matters here; the performance impact might be 
> too small to be measurable, seeing that we'll be most likely bound by 
> hardware latencies anyway.
> 
> Nevertheless: have you tested for performance regressions with this 
> patchset?

I have tested at relatively lower rates, like ~450K IOPS, without any 
noticeable regression.

> I'm especially thinking of Kashyap's high-IOPS megaraid setup; if there 
> is a performance impact, that'll likely be the scenario where we can measure it.
> 

I can test higher rates, like 2M IOPS, when I get access to the HW.

@Kashyap, any chance you can help test performance here?

> But even if there is a performance impact, this patchset might be 
> worthwhile, seeing that it'll reduce the memory footprint massively.

Sure, I don't think that minor performance improvements can justify the 
excessive memory usage.

Thanks,
John

* Re: [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
  2021-09-24  8:28 ` [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() John Garry
@ 2021-09-26  1:15   ` Ming Lei
  0 siblings, 0 replies; 30+ messages in thread
From: Ming Lei @ 2021-09-26  1:15 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:23PM +0800, John Garry wrote:
> To be more concise and consistent in naming, rename
> blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


* Re: [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
  2021-09-24  8:28 ` [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping() John Garry
@ 2021-09-26  1:42   ` Ming Lei
  0 siblings, 0 replies; 30+ messages in thread
From: Ming Lei @ 2021-09-26  1:42 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:24PM +0800, John Garry wrote:
> Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
> in future, so pass a driver tags pointer instead of the tagset container
> and HW queue index.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


* Re: [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping
  2021-09-24  8:28 ` [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping John Garry
@ 2021-09-26  1:43   ` Ming Lei
  0 siblings, 0 replies; 30+ messages in thread
From: Ming Lei @ 2021-09-26  1:43 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:25PM +0800, John Garry wrote:
> Function blk_mq_clear_rq_mapping() is required to clear the sched tags
> mappings in driver tags rqs[].
> 
> But there is no need for driver tags to clear their own mapping, so skip
> clearing the mapping in this scenario.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


* Re: [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs()
  2021-09-24  8:28 ` [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs() John Garry
@ 2021-09-26  2:00   ` Ming Lei
  0 siblings, 0 replies; 30+ messages in thread
From: Ming Lei @ 2021-09-26  2:00 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:27PM +0800, John Garry wrote:
> Add a function to combine allocating tags and the associated requests,
> and factor out common patterns to use this new function.
> 
> Some functions only call blk_mq_alloc_map_and_rqs() now, but more
> functionality will be added later.
> 
> Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
> are only used in blk-mq.c, and finally rename some functions for
> conciseness and consistency with other function names:
> - __blk_mq_alloc_map_and_{request -> rqs}()
> - blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()
> 
> Suggested-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


* Re: [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  2021-09-24  8:28 ` [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}() John Garry
@ 2021-09-26  2:05   ` Ming Lei
  2021-09-27  9:02     ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Ming Lei @ 2021-09-26  2:05 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:28PM +0800, John Garry wrote:
> Refactor blk_mq_free_map_and_requests() such that it can be used at many
> sites at which the tag map and rqs are freed.
> 
> Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
> alloc equivalent.
> 
> Suggested-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: John Garry <john.garry@huawei.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/blk-mq-tag.c |  3 +--
>  block/blk-mq.c     | 40 ++++++++++++++++++++++++----------------
>  block/blk-mq.h     |  4 +++-
>  3 files changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index db99f1246795..a0ecc6d88f84 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -607,8 +607,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  		if (!new)
>  			return -ENOMEM;
>  
> -		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
> -		blk_mq_free_rq_map(*tagsptr, set->flags);
> +		blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
>  		*tagsptr = new;
>  	} else {
>  		/*
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 46772773b9c4..464ea20b9bcb 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2878,15 +2878,15 @@ static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>  	return set->tags[hctx_idx];
>  }
>  
> -static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
> -					 unsigned int hctx_idx)
> +void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
> +			     struct blk_mq_tags *tags,
> +			     unsigned int hctx_idx)
>  {
>  	unsigned int flags = set->flags;
>  
> -	if (set->tags && set->tags[hctx_idx]) {
> -		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
> -		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
> -		set->tags[hctx_idx] = NULL;
> +	if (tags) {
> +		blk_mq_free_rqs(set, tags, hctx_idx);
> +		blk_mq_free_rq_map(tags, flags);
>  	}
>  }
>  
> @@ -2967,8 +2967,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  			 * fallback in case of a new remap fails
>  			 * allocation
>  			 */
> -			if (i && set->tags[i])
> -				blk_mq_free_map_and_requests(set, i);
> +			if (i && set->tags[i]) {
> +				blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +				set->tags[i] = NULL;
> +			}
>  
>  			hctx->tags = NULL;
>  			continue;
> @@ -3264,8 +3266,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx = hctxs[j];
>  
>  		if (hctx) {
> -			if (hctx->tags)
> -				blk_mq_free_map_and_requests(set, j);
> +			blk_mq_free_map_and_rqs(set, set->tags[j], j);
> +			set->tags[j] = NULL;
>  			blk_mq_exit_hctx(q, set, hctx, j);
>  			hctxs[j] = NULL;
>  		}
> @@ -3361,8 +3363,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  	return 0;
>  
>  out_unwind:
> -	while (--i >= 0)
> -		blk_mq_free_map_and_requests(set, i);
> +	while (--i >= 0) {
> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +		set->tags[i] = NULL;
> +	}
>  
>  	return -ENOMEM;
>  }
> @@ -3557,8 +3561,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>  	return 0;
>  
>  out_free_mq_rq_maps:
> -	for (i = 0; i < set->nr_hw_queues; i++)
> -		blk_mq_free_map_and_requests(set, i);
> +	for (i = 0; i < set->nr_hw_queues; i++) {
> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +		set->tags[i] = NULL;
> +	}
>  out_free_mq_map:
>  	for (i = 0; i < set->nr_maps; i++) {
>  		kfree(set->map[i].mq_map);
> @@ -3590,8 +3596,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>  {
>  	int i, j;
>  
> -	for (i = 0; i < set->nr_hw_queues; i++)
> -		blk_mq_free_map_and_requests(set, i);
> +	for (i = 0; i < set->nr_hw_queues; i++) {
> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +		set->tags[i] = NULL;
> +	}

There are 5 callers in which 'set->tags[i]' is cleared, so I'm just
wondering why you don't clear set->tags[i] by default in
blk_mq_free_map_and_rqs(), and just call __blk_mq_free_map_and_rqs()
for the only other user?
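
E.g. something like this (untested, and the double-pointer signature is
only to illustrate the idea):

	/* untested sketch: clear the caller's slot by default */
	void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
				     struct blk_mq_tags **tagsptr,
				     unsigned int hctx_idx)
	{
		if (*tagsptr) {
			blk_mq_free_rqs(set, *tagsptr, hctx_idx);
			blk_mq_free_rq_map(*tagsptr, set->flags);
			*tagsptr = NULL;
		}
	}

Then callers would just do blk_mq_free_map_and_rqs(set, &set->tags[i], i)
without having to NULL the slot themselves.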


Thanks,
Ming


* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-24  8:28 ` [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support John Garry
  2021-09-24 10:23   ` Hannes Reinecke
@ 2021-09-26  3:25   ` Ming Lei
  2021-09-27  9:13     ` John Garry
  1 sibling, 1 reply; 30+ messages in thread
From: Ming Lei @ 2021-09-26  3:25 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Fri, Sep 24, 2021 at 04:28:29PM +0800, John Garry wrote:
> Currently we use separate sbitmap pairs and active_queues atomic_t for
> shared sbitmap support.
> 
> However a full set of static requests is used per HW queue, which is
> quite wasteful, considering that the total number of requests usable at
> any given time across all HW queues is limited by the shared sbitmap depth.
> 
> As such, it is considerably more memory efficient in the case of shared
> sbitmap to allocate a set of static rqs per tag set or request queue, and
> not per HW queue.
> 
> So replace the sbitmap pairs and active_queues atomic_t with a shared
> tags per tagset and request queue, which will hold a set of shared static
> rqs.
> 
> Since there is now no valid HW queue index to be passed to the blk_mq_ops
> .init_request and .exit_request callbacks, pass an invalid index token. This
> changes the semantics of the APIs, such that the callback would need to
> validate the HW queue index before using it. Currently no user of shared
> sbitmap actually uses the HW queue index (as would be expected).
> 
> Continue to use the term "shared sbitmap" for now, as the meaning is known.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/blk-mq-sched.c   | 82 ++++++++++++++++++-------------------
>  block/blk-mq-tag.c     | 61 ++++++++++------------------
>  block/blk-mq-tag.h     |  6 +--
>  block/blk-mq.c         | 91 +++++++++++++++++++++++-------------------
>  block/blk-mq.h         |  5 ++-
>  include/linux/blk-mq.h | 15 ++++---
>  include/linux/blkdev.h |  3 +-
>  7 files changed, 125 insertions(+), 138 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index d1b56bb9ac64..428da4949d80 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -519,6 +519,11 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
>  					  struct blk_mq_hw_ctx *hctx,
>  					  unsigned int hctx_idx)
>  {
> +	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
> +		hctx->sched_tags = q->shared_sbitmap_tags;
> +		return 0;
> +	}
> +
>  	hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
>  						    q->nr_requests);
>  
> @@ -527,61 +532,54 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
>  	return 0;
>  }
>  
> +static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
> +{
> +	blk_mq_free_rq_map(queue->shared_sbitmap_tags);
> +	queue->shared_sbitmap_tags = NULL;
> +}
> +
>  /* called in queue's release handler, tagset has gone away */
> -static void blk_mq_sched_tags_teardown(struct request_queue *q)
> +static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
>  
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (hctx->sched_tags) {
> -			blk_mq_free_rq_map(hctx->sched_tags, hctx->flags);
> +			if (!blk_mq_is_sbitmap_shared(q->tag_set->flags))
> +				blk_mq_free_rq_map(hctx->sched_tags);
>  			hctx->sched_tags = NULL;
>  		}
>  	}
> +
> +	if (blk_mq_is_sbitmap_shared(flags))
> +		blk_mq_exit_sched_shared_sbitmap(q);
>  }
>  
>  static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
>  {
>  	struct blk_mq_tag_set *set = queue->tag_set;
> -	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
> -	struct blk_mq_hw_ctx *hctx;
> -	int ret, i;
>  
>  	/*
>  	 * Set initial depth at max so that we don't need to reallocate for
>  	 * updating nr_requests.
>  	 */
> -	ret = blk_mq_init_bitmaps(&queue->sched_bitmap_tags,
> -				  &queue->sched_breserved_tags,
> -				  MAX_SCHED_RQ, set->reserved_tags,
> -				  set->numa_node, alloc_policy);
> -	if (ret)
> -		return ret;
> -
> -	queue_for_each_hw_ctx(queue, hctx, i) {
> -		hctx->sched_tags->bitmap_tags =
> -					&queue->sched_bitmap_tags;
> -		hctx->sched_tags->breserved_tags =
> -					&queue->sched_breserved_tags;
> -	}
> +	queue->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
> +						BLK_MQ_NO_HCTX_IDX,
> +						MAX_SCHED_RQ);
> +	if (!queue->shared_sbitmap_tags)
> +		return -ENOMEM;
>  
>  	blk_mq_tag_update_sched_shared_sbitmap(queue);
>  
>  	return 0;
>  }
>  
> -static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
> -{
> -	sbitmap_queue_free(&queue->sched_bitmap_tags);
> -	sbitmap_queue_free(&queue->sched_breserved_tags);
> -}
> -
>  int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  {
> +	unsigned int i, flags = q->tag_set->flags;
>  	struct blk_mq_hw_ctx *hctx;
>  	struct elevator_queue *eq;
> -	unsigned int i;
>  	int ret;
>  
>  	if (!e) {
> @@ -598,21 +596,21 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
>  				   BLKDEV_DEFAULT_RQ);
>  
> -	queue_for_each_hw_ctx(q, hctx, i) {
> -		ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
> +	if (blk_mq_is_sbitmap_shared(flags)) {
> +		ret = blk_mq_init_sched_shared_sbitmap(q);
>  		if (ret)
> -			goto err_free_map_and_rqs;
> +			return ret;
>  	}
>  
> -	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
> -		ret = blk_mq_init_sched_shared_sbitmap(q);
> +	queue_for_each_hw_ctx(q, hctx, i) {
> +		ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
>  		if (ret)
>  			goto err_free_map_and_rqs;
>  	}
>  
>  	ret = e->ops.init_sched(q, e);
>  	if (ret)
> -		goto err_free_sbitmap;
> +		goto err_free_map_and_rqs;
>  
>  	blk_mq_debugfs_register_sched(q);
>  
> @@ -632,12 +630,10 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  
>  	return 0;
>  
> -err_free_sbitmap:
> -	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
> -		blk_mq_exit_sched_shared_sbitmap(q);
>  err_free_map_and_rqs:
>  	blk_mq_sched_free_rqs(q);
> -	blk_mq_sched_tags_teardown(q);
> +	blk_mq_sched_tags_teardown(q, flags);
> +
>  	q->elevator = NULL;
>  	return ret;
>  }
> @@ -651,9 +647,15 @@ void blk_mq_sched_free_rqs(struct request_queue *q)
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
>  
> -	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (hctx->sched_tags)
> -			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
> +	if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
> +		blk_mq_free_rqs(q->tag_set, q->shared_sbitmap_tags,
> +				BLK_MQ_NO_HCTX_IDX);
> +	} else {
> +		queue_for_each_hw_ctx(q, hctx, i) {
> +			if (hctx->sched_tags)
> +				blk_mq_free_rqs(q->tag_set,
> +						hctx->sched_tags, i);
> +		}
>  	}
>  }
>  
> @@ -674,8 +676,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
>  	blk_mq_debugfs_unregister_sched(q);
>  	if (e->type->ops.exit_sched)
>  		e->type->ops.exit_sched(e);
> -	blk_mq_sched_tags_teardown(q);
> -	if (blk_mq_is_sbitmap_shared(flags))
> -		blk_mq_exit_sched_shared_sbitmap(q);
> +	blk_mq_sched_tags_teardown(q, flags);
>  	q->elevator = NULL;
>  }
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a0ecc6d88f84..4e71ce6b37ea 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -27,10 +27,11 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
>  		struct blk_mq_tag_set *set = q->tag_set;
> +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;

The local variable 'set' can be removed, and 'tags' can just be retrieved
from hctx->tags.
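
E.g. (untested; hctx->tags already points at the shared tags here):

	struct blk_mq_tags *tags = hctx->tags;	/* illustration only */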

>  
>  		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
>  		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> -			atomic_inc(&set->active_queues_shared_sbitmap);
> +			atomic_inc(&tags->active_queues);
>  	} else {
>  		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
>  		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> @@ -61,10 +62,12 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>  	struct blk_mq_tag_set *set = q->tag_set;
>  
>  	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
> +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
> +

Same as above.

>  		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
>  					&q->queue_flags))
>  			return;
> -		atomic_dec(&set->active_queues_shared_sbitmap);
> +		atomic_dec(&tags->active_queues);
>  	} else {
>  		if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  			return;
> @@ -510,38 +513,10 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  	return 0;
>  }
>  
> -int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set)
> -{
> -	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
> -	int i, ret;
> -
> -	ret = blk_mq_init_bitmaps(&set->__bitmap_tags, &set->__breserved_tags,
> -				  set->queue_depth, set->reserved_tags,
> -				  set->numa_node, alloc_policy);
> -	if (ret)
> -		return ret;
> -
> -	for (i = 0; i < set->nr_hw_queues; i++) {
> -		struct blk_mq_tags *tags = set->tags[i];
> -
> -		tags->bitmap_tags = &set->__bitmap_tags;
> -		tags->breserved_tags = &set->__breserved_tags;
> -	}
> -
> -	return 0;
> -}
> -
> -void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
> -{
> -	sbitmap_queue_free(&set->__bitmap_tags);
> -	sbitmap_queue_free(&set->__breserved_tags);
> -}
> -
>  struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  				     unsigned int reserved_tags,
> -				     int node, unsigned int flags)
> +				     int node, int alloc_policy)
>  {
> -	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
>  	struct blk_mq_tags *tags;
>  
>  	if (total_tags > BLK_MQ_TAG_MAX) {
> @@ -557,9 +532,6 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  	tags->nr_reserved_tags = reserved_tags;
>  	spin_lock_init(&tags->lock);
>  
> -	if (blk_mq_is_sbitmap_shared(flags))
> -		return tags;
> -
>  	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
>  		kfree(tags);
>  		return NULL;
> @@ -567,12 +539,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  	return tags;
>  }
>  
> -void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
> +void blk_mq_free_tags(struct blk_mq_tags *tags)
>  {
> -	if (!blk_mq_is_sbitmap_shared(flags)) {
> -		sbitmap_queue_free(tags->bitmap_tags);
> -		sbitmap_queue_free(tags->breserved_tags);
> -	}
> +	sbitmap_queue_free(tags->bitmap_tags);
> +	sbitmap_queue_free(tags->breserved_tags);
>  	kfree(tags);
>  }
>  
> @@ -603,6 +573,13 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  		if (tdepth > MAX_SCHED_RQ)
>  			return -EINVAL;
>  
> +		/*
> +		 * Only the sbitmap needs resizing since we allocated the max
> +		 * initially.
> +		 */
> +		if (blk_mq_is_sbitmap_shared(set->flags))
> +			return 0;
> +
>  		new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
>  		if (!new)
>  			return -ENOMEM;
> @@ -623,12 +600,14 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  
>  void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
>  {
> -	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
> +	struct blk_mq_tags *tags = set->shared_sbitmap_tags;
> +
> +	sbitmap_queue_resize(&tags->__bitmap_tags, size - set->reserved_tags);
>  }
>  
>  void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
>  {
> -	sbitmap_queue_resize(&q->sched_bitmap_tags,
> +	sbitmap_queue_resize(q->shared_sbitmap_tags->bitmap_tags,
>  			     q->nr_requests - q->tag_set->reserved_tags);
>  }
>  
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 88f3c6485543..e433e39a9cfa 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -30,16 +30,14 @@ struct blk_mq_tags {
>  
>  extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
>  					unsigned int reserved_tags,
> -					int node, unsigned int flags);
> -extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
> +					int node, int alloc_policy);
> +extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>  extern int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
>  			       struct sbitmap_queue *breserved_tags,
>  			       unsigned int queue_depth,
>  			       unsigned int reserved,
>  			       int node, int alloc_policy);
>  
> -extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set);
> -extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
>  extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
>  extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
>  			   unsigned int tag);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 464ea20b9bcb..ece43855bcdf 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2344,7 +2344,10 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	struct blk_mq_tags *drv_tags;
>  	struct page *page;
>  
> -	drv_tags = set->tags[hctx_idx];
> +	if (blk_mq_is_sbitmap_shared(set->flags))
> +		drv_tags = set->shared_sbitmap_tags;
> +	else
> +		drv_tags = set->tags[hctx_idx];
>  
>  	if (tags->static_rqs && set->ops->exit_request) {
>  		int i;
> @@ -2373,21 +2376,20 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	}
>  }
>  
> -void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
> +void blk_mq_free_rq_map(struct blk_mq_tags *tags)
>  {
>  	kfree(tags->rqs);
>  	tags->rqs = NULL;
>  	kfree(tags->static_rqs);
>  	tags->static_rqs = NULL;
>  
> -	blk_mq_free_tags(tags, flags);
> +	blk_mq_free_tags(tags);
>  }
>  
>  static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  					       unsigned int hctx_idx,
>  					       unsigned int nr_tags,
> -					       unsigned int reserved_tags,
> -					       unsigned int flags)
> +					       unsigned int reserved_tags)
>  {
>  	struct blk_mq_tags *tags;
>  	int node;
> @@ -2396,7 +2398,8 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  	if (node == NUMA_NO_NODE)
>  		node = set->numa_node;
>  
> -	tags = blk_mq_init_tags(nr_tags, reserved_tags, node, flags);
> +	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
> +				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>  	if (!tags)
>  		return NULL;
>  
> @@ -2404,7 +2407,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
>  				 node);
>  	if (!tags->rqs) {
> -		blk_mq_free_tags(tags, flags);
> +		blk_mq_free_tags(tags);
>  		return NULL;
>  	}
>  
> @@ -2413,7 +2416,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  					node);
>  	if (!tags->static_rqs) {
>  		kfree(tags->rqs);
> -		blk_mq_free_tags(tags, flags);
> +		blk_mq_free_tags(tags);
>  		return NULL;
>  	}
>  
> @@ -2855,14 +2858,13 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>  	struct blk_mq_tags *tags;
>  	int ret;
>  
> -	tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags,
> -				   set->flags);
> +	tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags);
>  	if (!tags)
>  		return NULL;
>  
>  	ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
>  	if (ret) {
> -		blk_mq_free_rq_map(tags, set->flags);
> +		blk_mq_free_rq_map(tags);
>  		return NULL;
>  	}
>  
> @@ -2872,6 +2874,12 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>  static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>  				       int hctx_idx)
>  {
> +	if (blk_mq_is_sbitmap_shared(set->flags)) {
> +		set->tags[hctx_idx] = set->shared_sbitmap_tags;
> +
> +		return true;
> +	}
> +
>  	set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(set, hctx_idx,
>  						       set->queue_depth);
>  
> @@ -2882,14 +2890,22 @@ void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
>  			     struct blk_mq_tags *tags,
>  			     unsigned int hctx_idx)
>  {
> -	unsigned int flags = set->flags;
> -
>  	if (tags) {
>  		blk_mq_free_rqs(set, tags, hctx_idx);
> -		blk_mq_free_rq_map(tags, flags);
> +		blk_mq_free_rq_map(tags);
>  	}
>  }
>  
> +static void __blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
> +				      struct blk_mq_tags *tags,
> +				      unsigned int hctx_idx)
> +{
> +	if (blk_mq_is_sbitmap_shared(set->flags))
> +		return;
> +
> +	blk_mq_free_map_and_rqs(set, tags, hctx_idx);
> +}
> +
>  static void blk_mq_map_swqueue(struct request_queue *q)
>  {
>  	unsigned int i, j, hctx_idx;
> @@ -2968,7 +2984,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  			 * allocation
>  			 */
>  			if (i && set->tags[i]) {
> -				blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +				__blk_mq_free_map_and_rqs(set, set->tags[i], i);
>  				set->tags[i] = NULL;
>  			}
>  
> @@ -3266,7 +3282,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx = hctxs[j];
>  
>  		if (hctx) {
> -			blk_mq_free_map_and_rqs(set, set->tags[j], j);
> +			__blk_mq_free_map_and_rqs(set, set->tags[j], j);
>  			set->tags[j] = NULL;
>  			blk_mq_exit_hctx(q, set, hctx, j);
>  			hctxs[j] = NULL;
> @@ -3354,6 +3370,14 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  {
>  	int i;
>  
> +	if (blk_mq_is_sbitmap_shared(set->flags)) {
> +		set->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
> +						BLK_MQ_NO_HCTX_IDX,
> +						set->queue_depth);
> +		if (!set->shared_sbitmap_tags)
> +			return -ENOMEM;
> +	}
> +
>  	for (i = 0; i < set->nr_hw_queues; i++) {
>  		if (!__blk_mq_alloc_map_and_rqs(set, i))
>  			goto out_unwind;
> @@ -3364,10 +3388,15 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  
>  out_unwind:
>  	while (--i >= 0) {
> -		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +		__blk_mq_free_map_and_rqs(set, set->tags[i], i);
>  		set->tags[i] = NULL;
>  	}
>  
> +	if (blk_mq_is_sbitmap_shared(set->flags)) {
> +		blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
> +					BLK_MQ_NO_HCTX_IDX);
> +	}
> +
>  	return -ENOMEM;
>  }
>  
> @@ -3546,25 +3575,11 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>  	if (ret)
>  		goto out_free_mq_map;
>  
> -	if (blk_mq_is_sbitmap_shared(set->flags)) {
> -		atomic_set(&set->active_queues_shared_sbitmap, 0);
> -
> -		if (blk_mq_init_shared_sbitmap(set)) {
> -			ret = -ENOMEM;
> -			goto out_free_mq_rq_maps;
> -		}
> -	}
> -
>  	mutex_init(&set->tag_list_lock);
>  	INIT_LIST_HEAD(&set->tag_list);
>  
>  	return 0;
>  
> -out_free_mq_rq_maps:
> -	for (i = 0; i < set->nr_hw_queues; i++) {
> -		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> -		set->tags[i] = NULL;
> -	}
>  out_free_mq_map:
>  	for (i = 0; i < set->nr_maps; i++) {
>  		kfree(set->map[i].mq_map);
> @@ -3597,12 +3612,14 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>  	int i, j;
>  
>  	for (i = 0; i < set->nr_hw_queues; i++) {
> -		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +		__blk_mq_free_map_and_rqs(set, set->tags[i], i);
>  		set->tags[i] = NULL;
>  	}
>  
> -	if (blk_mq_is_sbitmap_shared(set->flags))
> -		blk_mq_exit_shared_sbitmap(set);
> +	if (blk_mq_is_sbitmap_shared(set->flags)) {
> +		blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
> +					BLK_MQ_NO_HCTX_IDX);
> +	}
>  
>  	for (j = 0; j < set->nr_maps; j++) {
>  		kfree(set->map[j].mq_map);
> @@ -3640,12 +3657,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  		if (hctx->sched_tags) {
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>  						      nr, true);
> -			if (blk_mq_is_sbitmap_shared(set->flags)) {
> -				hctx->sched_tags->bitmap_tags =
> -					&q->sched_bitmap_tags;
> -				hctx->sched_tags->breserved_tags =
> -					&q->sched_breserved_tags;
> -			}
>  		} else {
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>  						      false);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index bcb0ca89d37a..b34385211e0a 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -54,7 +54,7 @@ void blk_mq_put_rq_ref(struct request *rq);
>   */
>  void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  		     unsigned int hctx_idx);
> -void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
> +void blk_mq_free_rq_map(struct blk_mq_tags *tags);
>  struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>  				unsigned int hctx_idx, unsigned int depth);
>  void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
> @@ -331,10 +331,11 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>  	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>  		struct request_queue *q = hctx->queue;
>  		struct blk_mq_tag_set *set = q->tag_set;
> +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;

Same as above: 'tags' can be retrieved from hctx->tags directly,
and both 'q' and 'set' can be killed.
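
E.g. (untested sketch):

	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
		/* hctx->tags is the shared tags in this case */
		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE,
			      &hctx->queue->queue_flags))
			return true;
		users = atomic_read(&hctx->tags->active_queues);
	}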

>  
>  		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>  			return true;
> -		users = atomic_read(&set->active_queues_shared_sbitmap);
> +		users = atomic_read(&tags->active_queues);
>  	} else {
>  		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  			return true;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 13ba1861e688..808854a8ebc4 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -232,13 +232,11 @@ enum hctx_type {
>   * @flags:	   Zero or more BLK_MQ_F_* flags.
>   * @driver_data:   Pointer to data owned by the block driver that created this
>   *		   tag set.
> - * @active_queues_shared_sbitmap:
> - * 		   number of active request queues per tag set.
> - * @__bitmap_tags: A shared tags sbitmap, used over all hctx's
> - * @__breserved_tags:
> - *		   A shared reserved tags sbitmap, used over all hctx's
>   * @tags:	   Tag sets. One tag set per hardware queue. Has @nr_hw_queues
>   *		   elements.
> + * @shared_sbitmap_tags:
> + *		   Shared sbitmap set of tags. Has @nr_hw_queues elements. If
> + *		   set, shared by all @tags.
>   * @tag_list_lock: Serializes tag_list accesses.
>   * @tag_list:	   List of the request queues that use this tag set. See also
>   *		   request_queue.tag_set_list.
> @@ -255,12 +253,11 @@ struct blk_mq_tag_set {
>  	unsigned int		timeout;
>  	unsigned int		flags;
>  	void			*driver_data;
> -	atomic_t		active_queues_shared_sbitmap;
>  
> -	struct sbitmap_queue	__bitmap_tags;
> -	struct sbitmap_queue	__breserved_tags;
>  	struct blk_mq_tags	**tags;
>  
> +	struct blk_mq_tags	*shared_sbitmap_tags;
> +
>  	struct mutex		tag_list_lock;
>  	struct list_head	tag_list;
>  };
> @@ -432,6 +429,8 @@ enum {
>  	((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
>  		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
>  
> +#define BLK_MQ_NO_HCTX_IDX	(-1U)
> +
>  struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
>  		struct lock_class_key *lkclass);
>  #define blk_mq_alloc_disk(set, queuedata)				\
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4baf9435232d..17e50e5ef47b 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -459,8 +459,7 @@ struct request_queue {
>  
>  	atomic_t		nr_active_requests_shared_sbitmap;
>  
> -	struct sbitmap_queue	sched_bitmap_tags;
> -	struct sbitmap_queue	sched_breserved_tags;
> +	struct blk_mq_tags	*shared_sbitmap_tags;

Maybe better with shared_sched_sbitmap_tags or sched_sbitmap_tags?

Nice cleanup; once the above comments are addressed, feel free to
add:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming



* Re: [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  2021-09-26  2:05   ` Ming Lei
@ 2021-09-27  9:02     ` John Garry
  2021-09-27  9:19       ` Ming Lei
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-27  9:02 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On 26/09/2021 03:05, Ming Lei wrote:
> On Fri, Sep 24, 2021 at 04:28:28PM +0800, John Garry wrote:
>> Refactor blk_mq_free_map_and_requests() such that it can be used at many
>> sites at which the tag map and rqs are freed.
>>
>> Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
>> alloc equivalent.
>>
>> Suggested-by: Ming Lei <ming.lei@redhat.com>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> Reviewed-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   block/blk-mq-tag.c |  3 +--
>>   block/blk-mq.c     | 40 ++++++++++++++++++++++++----------------
>>   block/blk-mq.h     |  4 +++-
>>   3 files changed, 28 insertions(+), 19 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index db99f1246795..a0ecc6d88f84 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -607,8 +607,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>>   		if (!new)
>>   			return -ENOMEM;
>>   
>> -		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
>> -		blk_mq_free_rq_map(*tagsptr, set->flags);
>> +		blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
>>   		*tagsptr = new;
>>   	} else {
>>   		/*
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 46772773b9c4..464ea20b9bcb 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2878,15 +2878,15 @@ static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
>>   	return set->tags[hctx_idx];
>>   }
>>   
>> -static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>> -					 unsigned int hctx_idx)
>> +void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
>> +			     struct blk_mq_tags *tags,
>> +			     unsigned int hctx_idx)
>>   {
>>   	unsigned int flags = set->flags;
>>   
>> -	if (set->tags && set->tags[hctx_idx]) {
>> -		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
>> -		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
>> -		set->tags[hctx_idx] = NULL;
>> +	if (tags) {
>> +		blk_mq_free_rqs(set, tags, hctx_idx);
>> +		blk_mq_free_rq_map(tags, flags);
>>   	}
>>   }
>>   
>> @@ -2967,8 +2967,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>>   			 * fallback in case of a new remap fails
>>   			 * allocation
>>   			 */
>> -			if (i && set->tags[i])
>> -				blk_mq_free_map_and_requests(set, i);
>> +			if (i && set->tags[i]) {
>> +				blk_mq_free_map_and_rqs(set, set->tags[i], i);
>> +				set->tags[i] = NULL;
>> +			}
>>   
>>   			hctx->tags = NULL;
>>   			continue;
>> @@ -3264,8 +3266,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>>   		struct blk_mq_hw_ctx *hctx = hctxs[j];
>>   
>>   		if (hctx) {
>> -			if (hctx->tags)
>> -				blk_mq_free_map_and_requests(set, j);
>> +			blk_mq_free_map_and_rqs(set, set->tags[j], j);
>> +			set->tags[j] = NULL;
>>   			blk_mq_exit_hctx(q, set, hctx, j);
>>   			hctxs[j] = NULL;
>>   		}
>> @@ -3361,8 +3363,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>>   	return 0;
>>   
>>   out_unwind:
>> -	while (--i >= 0)
>> -		blk_mq_free_map_and_requests(set, i);
>> +	while (--i >= 0) {
>> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
>> +		set->tags[i] = NULL;
>> +	}
>>   
>>   	return -ENOMEM;
>>   }
>> @@ -3557,8 +3561,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>>   	return 0;
>>   
>>   out_free_mq_rq_maps:
>> -	for (i = 0; i < set->nr_hw_queues; i++)
>> -		blk_mq_free_map_and_requests(set, i);
>> +	for (i = 0; i < set->nr_hw_queues; i++) {
>> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
>> +		set->tags[i] = NULL;
>> +	}
>>   out_free_mq_map:
>>   	for (i = 0; i < set->nr_maps; i++) {
>>   		kfree(set->map[i].mq_map);
>> @@ -3590,8 +3596,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>>   {
>>   	int i, j;
>>   
>> -	for (i = 0; i < set->nr_hw_queues; i++)
>> -		blk_mq_free_map_and_requests(set, i);
>> +	for (i = 0; i < set->nr_hw_queues; i++) {
>> +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
>> +		set->tags[i] = NULL;
>> +	}
> There are 5 callers in which 'set->tags[i]' is cleared, so just
> wondering why you don't clear set->tags[i] by default in
> blk_mq_free_map_and_rqs(). And just call __blk_mq_free_map_and_rqs()
> for the only other user?

blk_mq_free_map_and_rqs() is not always passed set->tags[i]:

- blk_mq_tag_update_depth() calls blk_mq_free_map_and_rqs() for sched tags

- __blk_mq_alloc_rq_maps() calls blk_mq_free_map_and_rqs() for 
shared_sbitmap_tags

Function __blk_mq_free_map_and_rqs() is not public and is really only
intended for set->tags[i].

So I did consider passing the address of the tags pointer to
blk_mq_free_map_and_rqs(), like:

void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
			struct blk_mq_tags **tags,
			unsigned int hctx_idx)
{
	...
	*tags = NULL;
}

But then the API becomes a bit asymmetric, as we normally deal with a
plain tags pointer:

struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
					     unsigned int hctx_idx,
					     unsigned int depth);


However, apart from this, I can change __blk_mq_free_map_and_rqs() to 
NULLify set->tags[i], as it is always passed set->tags[i].

Do you have a preference?

Thanks,
John




* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-26  3:25   ` Ming Lei
@ 2021-09-27  9:13     ` John Garry
  2021-09-27  9:26       ` Ming Lei
  0 siblings, 1 reply; 30+ messages in thread
From: John Garry @ 2021-09-27  9:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On 26/09/2021 04:25, Ming Lei wrote:
>> @@ -27,10 +27,11 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>   	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>>   		struct request_queue *q = hctx->queue;
>>   		struct blk_mq_tag_set *set = q->tag_set;
>> +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
> The local variable 'set' can be removed; just retrieve 'tags' from
> hctx->tags.
> 
>>   
>>   		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
>>   		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
>> -			atomic_inc(&set->active_queues_shared_sbitmap);
>> +			atomic_inc(&tags->active_queues);
>>   	} else {
>>   		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
>>   		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>> @@ -61,10 +62,12 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>>   	struct blk_mq_tag_set *set = q->tag_set;
>>   
>>   	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
>> +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
>> +
> Same as above.

ok
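
So the shared-sbitmap branch of __blk_mq_tag_busy() would then look
roughly like this (untested sketch; 'q' is kept only for the
queue_flags bits):

	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
		struct request_queue *q = hctx->queue;

		/* count the queue as active only on the 0 -> 1 transition */
		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE,
				      &q->queue_flags))
			atomic_inc(&hctx->tags->active_queues);
	} else {
		...
	}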

> 
>>   		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
>>   					&q->queue_flags))
>>   			return;
>> -		atomic_dec(&set->active_queues_shared_sbitmap);
>> +		atomic_dec(&tags->active_queues);
>>   	} else {
>>   		if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>>   			return;
>> @@ -510,38 +513,10 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>>   	return 0;
>>   }

...

>> -	struct sbitmap_queue	__bitmap_tags;
>> -	struct sbitmap_queue	__breserved_tags;
>>   	struct blk_mq_tags	**tags;
>>   
>> +	struct blk_mq_tags	*shared_sbitmap_tags;
>> +
>>   	struct mutex		tag_list_lock;
>>   	struct list_head	tag_list;
>>   };
>> @@ -432,6 +429,8 @@ enum {
>>   	((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
>>   		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
>>   
>> +#define BLK_MQ_NO_HCTX_IDX	(-1U)
>> +
>>   struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
>>   		struct lock_class_key *lkclass);
>>   #define blk_mq_alloc_disk(set, queuedata)				\
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 4baf9435232d..17e50e5ef47b 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -459,8 +459,7 @@ struct request_queue {
>>   
>>   	atomic_t		nr_active_requests_shared_sbitmap;
>>   
>> -	struct sbitmap_queue	sched_bitmap_tags;
>> -	struct sbitmap_queue	sched_breserved_tags;
>> +	struct blk_mq_tags	*shared_sbitmap_tags;
> Maybe better with shared_sched_sbitmap_tags or sched_sbitmap_tags?

Yeah, I suppose I should add "sched" to the name, as before.
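
So in struct request_queue that would become something like (sketch;
the final field name is still to be decided):

	struct blk_mq_tags	*sched_shared_sbitmap_tags;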

BTW, do you think that I should just change shared_sbitmap ->
shared_tags naming now globally? I'm thinking now that I should...

Thanks,
John


* Re: [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  2021-09-27  9:02     ` John Garry
@ 2021-09-27  9:19       ` Ming Lei
  2021-09-27  9:40         ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Ming Lei @ 2021-09-27  9:19 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Mon, Sep 27, 2021 at 10:02:40AM +0100, John Garry wrote:
> On 26/09/2021 03:05, Ming Lei wrote:
> > On Fri, Sep 24, 2021 at 04:28:28PM +0800, John Garry wrote:
> > > Refactor blk_mq_free_map_and_requests() such that it can be used at many
> > > sites at which the tag map and rqs are freed.
> > > 
> > > Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
> > > alloc equivalent.
> > > 
> > > Suggested-by: Ming Lei <ming.lei@redhat.com>
> > > Signed-off-by: John Garry <john.garry@huawei.com>
> > > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > > ---
> > >   block/blk-mq-tag.c |  3 +--
> > >   block/blk-mq.c     | 40 ++++++++++++++++++++++++----------------
> > >   block/blk-mq.h     |  4 +++-
> > >   3 files changed, 28 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index db99f1246795..a0ecc6d88f84 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -607,8 +607,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
> > >   		if (!new)
> > >   			return -ENOMEM;
> > > -		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
> > > -		blk_mq_free_rq_map(*tagsptr, set->flags);
> > > +		blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
> > >   		*tagsptr = new;
> > >   	} else {
> > >   		/*
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 46772773b9c4..464ea20b9bcb 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2878,15 +2878,15 @@ static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
> > >   	return set->tags[hctx_idx];
> > >   }
> > > -static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
> > > -					 unsigned int hctx_idx)
> > > +void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
> > > +			     struct blk_mq_tags *tags,
> > > +			     unsigned int hctx_idx)
> > >   {
> > >   	unsigned int flags = set->flags;
> > > -	if (set->tags && set->tags[hctx_idx]) {
> > > -		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
> > > -		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
> > > -		set->tags[hctx_idx] = NULL;
> > > +	if (tags) {
> > > +		blk_mq_free_rqs(set, tags, hctx_idx);
> > > +		blk_mq_free_rq_map(tags, flags);
> > >   	}
> > >   }
> > > @@ -2967,8 +2967,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> > >   			 * fallback in case of a new remap fails
> > >   			 * allocation
> > >   			 */
> > > -			if (i && set->tags[i])
> > > -				blk_mq_free_map_and_requests(set, i);
> > > +			if (i && set->tags[i]) {
> > > +				blk_mq_free_map_and_rqs(set, set->tags[i], i);
> > > +				set->tags[i] = NULL;
> > > +			}
> > >   			hctx->tags = NULL;
> > >   			continue;
> > > @@ -3264,8 +3266,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> > >   		struct blk_mq_hw_ctx *hctx = hctxs[j];
> > >   		if (hctx) {
> > > -			if (hctx->tags)
> > > -				blk_mq_free_map_and_requests(set, j);
> > > +			blk_mq_free_map_and_rqs(set, set->tags[j], j);
> > > +			set->tags[j] = NULL;
> > >   			blk_mq_exit_hctx(q, set, hctx, j);
> > >   			hctxs[j] = NULL;
> > >   		}
> > > @@ -3361,8 +3363,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> > >   	return 0;
> > >   out_unwind:
> > > -	while (--i >= 0)
> > > -		blk_mq_free_map_and_requests(set, i);
> > > +	while (--i >= 0) {
> > > +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> > > +		set->tags[i] = NULL;
> > > +	}
> > >   	return -ENOMEM;
> > >   }
> > > @@ -3557,8 +3561,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> > >   	return 0;
> > >   out_free_mq_rq_maps:
> > > -	for (i = 0; i < set->nr_hw_queues; i++)
> > > -		blk_mq_free_map_and_requests(set, i);
> > > +	for (i = 0; i < set->nr_hw_queues; i++) {
> > > +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> > > +		set->tags[i] = NULL;
> > > +	}
> > >   out_free_mq_map:
> > >   	for (i = 0; i < set->nr_maps; i++) {
> > >   		kfree(set->map[i].mq_map);
> > > @@ -3590,8 +3596,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
> > >   {
> > >   	int i, j;
> > > -	for (i = 0; i < set->nr_hw_queues; i++)
> > > -		blk_mq_free_map_and_requests(set, i);
> > > +	for (i = 0; i < set->nr_hw_queues; i++) {
> > > +		blk_mq_free_map_and_rqs(set, set->tags[i], i);
> > > +		set->tags[i] = NULL;
> > > +	}
> > There are 5 callers in which 'set->tags[i]' is cleared, so just
> > wondering why you don't clear set->tags[i] by default in
> > blk_mq_free_map_and_rqs(). And just call __blk_mq_free_map_and_rqs()
> > for the only other user?
> 
> blk_mq_free_map_and_rqs() is not always passed set->tags[i]:
> 
> - blk_mq_tag_update_depth() calls blk_mq_free_map_and_rqs() for sched tags
> 
> - __blk_mq_alloc_rq_maps() calls blk_mq_free_map_and_rqs() for
> shared_sbitmap_tags
> 
> Function __blk_mq_free_map_and_rqs() is not public and is really only
> intended for set->tags[i].
> 
> So I did consider passing the address of the tags pointer to
> blk_mq_free_map_and_rqs(), like:
> 
> void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
> 			struct blk_mq_tags **tags,
> 			unsigned int hctx_idx)
> {
> 	...
> 	*tags = NULL;
> }
> 
> But then the API becomes a bit asymmetric, as we normally deal with a
> plain tags pointer:
> 
> struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
> 					     unsigned int hctx_idx,
> 					     unsigned int depth);
> 
> 
> However, apart from this, I can change __blk_mq_free_map_and_rqs() to
> NULLify set->tags[i], as it is always passed set->tags[i].
> 
> Do you have a preference?

I meant there are 5 instances of the following pattern in your patch:

+                               blk_mq_free_map_and_rqs(set, set->tags[i], i);
+                               set->tags[i] = NULL;

and one new helper (blk_mq_free_set_map_and_rqs(set, i)?) could be added
to do just that.
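
Something like this (untested sketch; the helper name is just a
suggestion):

	/* free map and rqs for hctx 'hctx_idx', and clear the set's reference */
	static void blk_mq_free_set_map_and_rqs(struct blk_mq_tag_set *set,
						unsigned int hctx_idx)
	{
		blk_mq_free_map_and_rqs(set, set->tags[hctx_idx], hctx_idx);
		set->tags[hctx_idx] = NULL;
	}

Then each of those 5 sites collapses to a single
blk_mq_free_set_map_and_rqs(set, i) call.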

Thanks,
Ming



* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-27  9:13     ` John Garry
@ 2021-09-27  9:26       ` Ming Lei
  2021-09-27  9:41         ` John Garry
  0 siblings, 1 reply; 30+ messages in thread
From: Ming Lei @ 2021-09-27  9:26 UTC (permalink / raw)
  To: John Garry; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On Mon, Sep 27, 2021 at 10:13:29AM +0100, John Garry wrote:
> On 26/09/2021 04:25, Ming Lei wrote:
> > > @@ -27,10 +27,11 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> > >   	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
> > >   		struct request_queue *q = hctx->queue;
> > >   		struct blk_mq_tag_set *set = q->tag_set;
> > > +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
> > The local variable 'set' can be removed; just retrieve 'tags' from
> > hctx->tags.
> > 
> > >   		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
> > >   		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
> > > -			atomic_inc(&set->active_queues_shared_sbitmap);
> > > +			atomic_inc(&tags->active_queues);
> > >   	} else {
> > >   		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
> > >   		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> > > @@ -61,10 +62,12 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
> > >   	struct blk_mq_tag_set *set = q->tag_set;
> > >   	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
> > > +		struct blk_mq_tags *tags = set->shared_sbitmap_tags;
> > > +
> > Same as above.
> 
> ok
> 
> > 
> > >   		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
> > >   					&q->queue_flags))
> > >   			return;
> > > -		atomic_dec(&set->active_queues_shared_sbitmap);
> > > +		atomic_dec(&tags->active_queues);
> > >   	} else {
> > >   		if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> > >   			return;
> > > @@ -510,38 +513,10 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
> > >   	return 0;
> > >   }
> 
> ...
> 
> > > -	struct sbitmap_queue	__bitmap_tags;
> > > -	struct sbitmap_queue	__breserved_tags;
> > >   	struct blk_mq_tags	**tags;
> > > +	struct blk_mq_tags	*shared_sbitmap_tags;
> > > +
> > >   	struct mutex		tag_list_lock;
> > >   	struct list_head	tag_list;
> > >   };
> > > @@ -432,6 +429,8 @@ enum {
> > >   	((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
> > >   		<< BLK_MQ_F_ALLOC_POLICY_START_BIT)
> > > +#define BLK_MQ_NO_HCTX_IDX	(-1U)
> > > +
> > >   struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
> > >   		struct lock_class_key *lkclass);
> > >   #define blk_mq_alloc_disk(set, queuedata)				\
> > > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > > index 4baf9435232d..17e50e5ef47b 100644
> > > --- a/include/linux/blkdev.h
> > > +++ b/include/linux/blkdev.h
> > > @@ -459,8 +459,7 @@ struct request_queue {
> > >   	atomic_t		nr_active_requests_shared_sbitmap;
> > > -	struct sbitmap_queue	sched_bitmap_tags;
> > > -	struct sbitmap_queue	sched_breserved_tags;
> > > +	struct blk_mq_tags	*shared_sbitmap_tags;
> > Maybe better with shared_sched_sbitmap_tags or sched_sbitmap_tags?
> 
> Yeah, I suppose I should add "sched" to the name, as before.
> 
> BTW, do you think that I should just change shared_sbitmap -> shared_tags
> naming now globally? I'm thinking now that I should...

Yeah, I think so, but it seemed you preferred the "shared sbitmap" name, so I
didn't mention it. :-)


Thanks,
Ming



* Re: [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
  2021-09-27  9:19       ` Ming Lei
@ 2021-09-27  9:40         ` John Garry
  0 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-27  9:40 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On 27/09/2021 10:19, Ming Lei wrote:
>> However, apart from this, I can change __blk_mq_free_map_and_rqs() to
>> NULLify set->tags[i], as it is always passed set->tags[i].
>>
>> Do you have a preference?
> I meant there are 5 instances of the following pattern in your patch:
> 
> +                               blk_mq_free_map_and_rqs(set, set->tags[i], i);
> +                               set->tags[i] = NULL;
> 
> and one new helper (blk_mq_free_set_map_and_rqs(set, i)?) could be added
> to do just that.

Ah, ok. But in the next patch we replace these blk_mq_free_map_and_rqs()
calls with __blk_mq_free_map_and_rqs(), which is always passed
set->tags[i], so we do as you request there, i.e. NULLify set->tags[i]
in __blk_mq_free_map_and_rqs().

Thanks,
John


* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-27  9:26       ` Ming Lei
@ 2021-09-27  9:41         ` John Garry
  0 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-27  9:41 UTC (permalink / raw)
  To: Ming Lei; +Cc: axboe, linux-block, linux-kernel, linux-scsi, hare

On 27/09/2021 10:26, Ming Lei wrote:
>>>> -	struct sbitmap_queue	sched_bitmap_tags;
>>>> -	struct sbitmap_queue	sched_breserved_tags;
>>>> +	struct blk_mq_tags	*shared_sbitmap_tags;
>>> Maybe better with shared_sched_sbitmap_tags or sched_sbitmap_tags?
>> Yeah, I suppose I should add "sched" to the name, as before.
>>
>> BTW, do you think that I should just change shared_sbitmap -> shared_tags
>> naming now globally? I'm thinking now that I should...
> Yeah, I think so, but it seemed you preferred the "shared sbitmap" name, so I
> didn't mention it. :-)

OK, I can clear up any references in a follow-on patch to try to keep
this one as small as possible.

Thanks,
John


* Re: [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support
  2021-09-24 10:39     ` John Garry
@ 2021-09-29 13:36       ` John Garry
  0 siblings, 0 replies; 30+ messages in thread
From: John Garry @ 2021-09-29 13:36 UTC (permalink / raw)
  To: Hannes Reinecke, axboe
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, Kashyap Desai

On 24/09/2021 11:39, John Garry wrote:
> + Kashyap
> 
> On 24/09/2021 11:23, Hannes Reinecke wrote:
>> On 9/24/21 10:28 AM, John Garry wrote:
>>> Currently we use separate sbitmap pairs and active_queues atomic_t for
>>> shared sbitmap support.
>>>
>>> However a full set of static requests is used per HW queue, which is
>>> quite wasteful, considering that the total number of requests usable at
>>> any given time across all HW queues is limited by the shared sbitmap
>>> depth.
>>>
>>> As such, it is considerably more memory efficient in the case of shared
>>> sbitmap to allocate a set of static rqs per tag set or request queue,
>>> and not per HW queue.
>>>
>>> So replace the sbitmap pairs and active_queues atomic_t with a shared
>>> tags per tagset and request queue, which will hold a set of shared
>>> static rqs.
>>>
>>> Since there is now no valid HW queue index to be passed to the
>>> blk_mq_ops .init_request and .exit_request callbacks, pass an invalid
>>> index token. This
>>> changes the semantics of the APIs, such that the callback would need to
>>> validate the HW queue index before using it. Currently no user of shared
>>> sbitmap actually uses the HW queue index (as would be expected).
>>>
>>> Continue to use the term "shared sbitmap" for now, as the meaning is known.
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   block/blk-mq-sched.c   | 82 ++++++++++++++++++-------------------
>>>   block/blk-mq-tag.c     | 61 ++++++++++------------------
>>>   block/blk-mq-tag.h     |  6 +--
>>>   block/blk-mq.c         | 91 +++++++++++++++++++++++-------------------
>>>   block/blk-mq.h         |  5 ++-
>>>   include/linux/blk-mq.h | 15 ++++---
>>>   include/linux/blkdev.h |  3 +-
>>>   7 files changed, 125 insertions(+), 138 deletions(-)
>>>
>> The overall idea to keep the full request allocation per queue was to 
>> ensure memory locality for the requests themselves.
>> When moving to a shared request structure we obviously lose that
>> feature.
>>
>> But I'm not sure if that matters here; the performance impact might be 
>> too small to be measurable, seeing that we'll most likely be bound by
>> hardware latencies anyway.
>>
>> Nevertheless: have you tested for performance regressions with this 
>> patchset?
> 
> I have tested at relatively low rates, like ~450K IOPS, without any
> noticeable regression.
> 
>> I'm especially thinking of Kashyap's high-IOPS megaraid setup; if there
>> is a performance impact, that'll likely be the scenario where we can
>> measure it.
>>
> 
> I can test higher rates, like 2M IOPS, when I get access to the HW.
> 
> @Kashyap, Any chance you can help test performance here?
> 
>> But even if there is a performance impact, this patchset might be
>> worthwhile, seeing that it'll reduce the memory footprint massively.
> 
> Sure, I don't think that a minor performance benefit can justify the
> excessive memory usage.
> 

JFYI, with 6x SAS SSDs on my arm64 board, I see:

Before (5.15-rc2 baseline):
none: 445K IOPS, mq-deadline: 418K IOPS (fio read)

After:
none: 442K IOPS, mq-deadline: 407K IOPS (fio read)

So only a marginal drop there for mq-deadline.

I'll try my 12x SAS SSD setup when I get a chance. Kashyap is also
kindly testing.

Thanks


Thread overview: 30+ messages
2021-09-24  8:28 [PATCH v4 00/13] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
2021-09-24  8:28 ` [PATCH v4 01/13] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
2021-09-24  8:28 ` [PATCH v4 02/13] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
2021-09-24  8:28 ` [PATCH v4 03/13] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
2021-09-24  8:28 ` [PATCH v4 04/13] blk-mq: Invert check " John Garry
2021-09-24  8:28 ` [PATCH v4 05/13] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}() John Garry
2021-09-24 10:08   ` Hannes Reinecke
2021-09-24  8:28 ` [PATCH v4 06/13] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}() John Garry
2021-09-26  1:15   ` Ming Lei
2021-09-24  8:28 ` [PATCH v4 07/13] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping() John Garry
2021-09-26  1:42   ` Ming Lei
2021-09-24  8:28 ` [PATCH v4 08/13] blk-mq: Don't clear driver tags own mapping John Garry
2021-09-26  1:43   ` Ming Lei
2021-09-24  8:28 ` [PATCH v4 09/13] blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap() John Garry
2021-09-24  8:28 ` [PATCH v4 10/13] blk-mq: Add blk_mq_alloc_map_and_rqs() John Garry
2021-09-26  2:00   ` Ming Lei
2021-09-24  8:28 ` [PATCH v4 11/13] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}() John Garry
2021-09-26  2:05   ` Ming Lei
2021-09-27  9:02     ` John Garry
2021-09-27  9:19       ` Ming Lei
2021-09-27  9:40         ` John Garry
2021-09-24  8:28 ` [PATCH v4 12/13] blk-mq: Use shared tags for shared sbitmap support John Garry
2021-09-24 10:23   ` Hannes Reinecke
2021-09-24 10:39     ` John Garry
2021-09-29 13:36       ` John Garry
2021-09-26  3:25   ` Ming Lei
2021-09-27  9:13     ` John Garry
2021-09-27  9:26       ` Ming Lei
2021-09-27  9:41         ` John Garry
2021-09-24  8:28 ` [PATCH v4 13/13] blk-mq: Stop using pointers for blk_mq_tags bitmap tags John Garry
