linux-block.vger.kernel.org archive mirror
* [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap
@ 2021-07-14 15:06 John Garry
  2021-07-14 15:06 ` [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

Currently a full set of static requests is allocated per hw queue per
tagset when shared sbitmap is used.

However, only tagset->queue_depth number of requests may be active at
any given time. As such, only tagset->queue_depth number of static
requests are required.

The same goes for using an IO scheduler, which allocates a full set of
static requests per hw queue per request queue.

This series very significantly reduces memory usage in both scenarios by
allocating static rqs per tagset and per request queue, respectively,
rather than per hw queue per tagset and per request queue.

For megaraid sas driver on my 128-CPU arm64 system with 1x SATA disk, we
save approx. 300MB(!) [370MB -> 60MB]

A couple of patches are marked as RFC, as maybe there is a better
implementation approach.

Any further testing would also be appreciated.

John Garry (9):
  blk-mq: Change rqs check in blk_mq_free_rqs()
  block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap()
  blk-mq: Invert check in blk_mq_update_nr_requests()
  blk-mq: Refactor blk_mq_{alloc,free}_rqs
  blk-mq: Allocate per tag set static rqs for shared sbitmap
  blk-mq: Allocate per request queue static rqs for shared sbitmap
  blk-mq: Clear mappings for shared sbitmap sched static rqs

 block/blk-core.c       |   2 +-
 block/blk-mq-sched.c   |  57 ++++++++++++--
 block/blk-mq-sched.h   |   2 +-
 block/blk-mq-tag.c     |  22 ++++--
 block/blk-mq-tag.h     |   1 +
 block/blk-mq.c         | 165 +++++++++++++++++++++++++++++++----------
 block/blk-mq.h         |   9 +++
 drivers/block/rbd.c    |   2 +-
 include/linux/blk-mq.h |   4 +
 include/linux/blkdev.h |   6 +-
 10 files changed, 215 insertions(+), 55 deletions(-)

-- 
2.26.2



* [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs()
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-20  7:35   ` Ming Lei
  2021-07-14 15:06 ` [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

The original code in commit 24d2f90309b23 ("blk-mq: split out tag
initialization, support shared tags") would check tags->rqs is non-NULL and
then dereference tags->rqs[].

Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
started to dereference tags->static_rqs[], but continued to check non-NULL
tags->rqs.

Check tags->static_rqs as non-NULL instead, which is more logical.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c4ac51e54eb..ae28f470893c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2348,7 +2348,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 {
 	struct page *page;
 
-	if (tags->rqs && set->ops->exit_request) {
+	if (tags->static_rqs && set->ops->exit_request) {
 		int i;
 
 		for (i = 0; i < tags->nr_tags; i++) {
-- 
2.26.2



* [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
  2021-07-14 15:06 ` [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-20  7:44   ` Ming Lei
  2021-07-14 15:06 ` [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

It is a bit confusing that there is both BLKDEV_MAX_RQ and MAX_SCHED_RQ,
as the name BLKDEV_MAX_RQ implies that it is always the maximum number of
requests, which it is not.

Rename BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
the default number of requests assigned when allocating a request queue.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-core.c       | 2 +-
 block/blk-mq-sched.c   | 2 +-
 block/blk-mq-sched.h   | 2 +-
 drivers/block/rbd.c    | 2 +-
 include/linux/blkdev.h | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 04477697ee4b..5d71382b6131 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -579,7 +579,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 
 	blk_queue_dma_alignment(q, 511);
 	blk_set_default_limits(&q->limits);
-	q->nr_requests = BLKDEV_MAX_RQ;
+	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
 	return q;
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c838d81ac058..f5cb2931c20d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -615,7 +615,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	 * Additionally, this is a per-hw queue depth.
 	 */
 	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
-				   BLKDEV_MAX_RQ);
+				   BLKDEV_DEFAULT_RQ);
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 5246ae040704..1e46be6c5178 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -5,7 +5,7 @@
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
 
-#define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
+#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
 
 void blk_mq_sched_assign_ioc(struct request *rq);
 
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 531d390902dd..d3f329749173 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -836,7 +836,7 @@ struct rbd_options {
 	u32 alloc_hint_flags;  /* CEPH_OSD_OP_ALLOC_HINT_FLAG_* */
 };
 
-#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_MAX_RQ
+#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_DEFAULT_RQ
 #define RBD_ALLOC_SIZE_DEFAULT	(64 * 1024)
 #define RBD_LOCK_TIMEOUT_DEFAULT 0  /* no timeout */
 #define RBD_READ_ONLY_DEFAULT	false
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3177181c4326..6a64ea23f552 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -45,7 +45,7 @@ struct blk_stat_callback;
 struct blk_keyslot_manager;
 
 #define BLKDEV_MIN_RQ	4
-#define BLKDEV_MAX_RQ	128	/* Default maximum */
+#define BLKDEV_DEFAULT_RQ	128
 
 /* Must be consistent with blk_mq_poll_stats_bkt() */
 #define BLK_MQ_POLL_STATS_BKTS 16
-- 
2.26.2



* [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
  2021-07-14 15:06 ` [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
  2021-07-14 15:06 ` [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-20  7:50   ` Ming Lei
  2021-07-14 15:06 ` [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap() John Garry
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

For shared sbitmap, if the call to blk_mq_tag_update_depth() was
successful for any hctx when hctx->sched_tags is not set, then it would be
successful for all (due to the nature in which blk_mq_tag_update_depth()
fails).

As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
each hctx. So relocate the call to after the hctx iteration, under the
!q->elevator check (which is equivalent to !hctx->sched_tags).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ae28f470893c..56e3c6fdba60 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3624,8 +3624,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (!hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
-			if (!ret && blk_mq_is_sbitmap_shared(set->flags))
-				blk_mq_tag_resize_shared_sbitmap(set, nr);
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
@@ -3643,9 +3641,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	}
 	if (!ret) {
 		q->nr_requests = nr;
-		if (q->elevator && blk_mq_is_sbitmap_shared(set->flags))
-			sbitmap_queue_resize(&q->sched_bitmap_tags,
-					     nr - set->reserved_tags);
+		if (blk_mq_is_sbitmap_shared(set->flags)) {
+			if (q->elevator) {
+				sbitmap_queue_resize(&q->sched_bitmap_tags,
+						     nr - set->reserved_tags);
+			} else {
+				blk_mq_tag_resize_shared_sbitmap(set, nr);
+			}
+		}
 	}
 
 	blk_mq_unquiesce_queue(q);
-- 
2.26.2



* [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap()
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (2 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-20  7:57   ` Ming Lei
  2021-07-14 15:06 ` [PATCH 5/9] blk-mq: Invert check in blk_mq_update_nr_requests() John Garry
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

Put the functionality to resize the sched shared sbitmap in a common
function.

Since the same formula is always used to resize, and all of its inputs can
be derived from the request queue, just pass the request queue pointer.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c |  3 +--
 block/blk-mq-tag.c   | 10 ++++++++++
 block/blk-mq-tag.h   |  1 +
 block/blk-mq.c       |  3 +--
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index f5cb2931c20d..1e028183557d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -584,8 +584,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 					&queue->sched_breserved_tags;
 	}
 
-	sbitmap_queue_resize(&queue->sched_bitmap_tags,
-			     queue->nr_requests - set->reserved_tags);
+	blk_mq_tag_resize_sched_shared_sbitmap(queue);
 
 	return 0;
 }
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..55c7f1bf41c7 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -634,6 +634,16 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
 	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
 }
 
+/*
+ * We always resize with q->nr_requests - q->tag_set->reserved_tags, so
+ * don't bother passing a size.
+ */
+void blk_mq_tag_resize_sched_shared_sbitmap(struct request_queue *q)
+{
+	sbitmap_queue_resize(&q->sched_bitmap_tags,
+			     q->nr_requests - q->tag_set->reserved_tags);
+}
+
 /**
  * blk_mq_unique_tag() - return a tag that is unique queue-wide
  * @rq: request for which to compute a unique tag
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 8ed55af08427..3a7495e47fb4 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -48,6 +48,7 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					unsigned int depth, bool can_grow);
 extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
 					     unsigned int size);
+extern void blk_mq_tag_resize_sched_shared_sbitmap(struct request_queue *q);
 
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 56e3c6fdba60..b0d4197d36c7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3643,8 +3643,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		q->nr_requests = nr;
 		if (blk_mq_is_sbitmap_shared(set->flags)) {
 			if (q->elevator) {
-				sbitmap_queue_resize(&q->sched_bitmap_tags,
-						     nr - set->reserved_tags);
+				blk_mq_tag_resize_sched_shared_sbitmap(q);
 			} else {
 				blk_mq_tag_resize_shared_sbitmap(set, nr);
 			}
-- 
2.26.2



* [PATCH 5/9] blk-mq: Invert check in blk_mq_update_nr_requests()
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (3 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap() John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-14 15:06 ` [PATCH RFC 6/9] blk-mq: Refactor blk_mq_{alloc,free}_rqs John Garry
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

It's easier to read:

if (x)
	X;
else
	Y;

over:

if (!x)
	Y;
else
	X;

No functional change intended.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b0d4197d36c7..3ff5c15ce0eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3621,18 +3621,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		 * If we're using an MQ scheduler, just update the scheduler
 		 * queue depth. This is similar to what the old code would do.
 		 */
-		if (!hctx->sched_tags) {
-			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
-							false);
-		} else {
+		if (hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
-							nr, true);
+						      nr, true);
 			if (blk_mq_is_sbitmap_shared(set->flags)) {
 				hctx->sched_tags->bitmap_tags =
 					&q->sched_bitmap_tags;
 				hctx->sched_tags->breserved_tags =
 					&q->sched_breserved_tags;
 			}
+		} else {
+			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
+						      false);
 		}
 		if (ret)
 			break;
-- 
2.26.2



* [PATCH RFC 6/9] blk-mq: Refactor blk_mq_{alloc,free}_rqs
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (4 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH 5/9] blk-mq: Invert check in blk_mq_update_nr_requests() John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-14 15:06 ` [PATCH RFC 7/9] blk-mq: Allocate per tag set static rqs for shared sbitmap John Garry
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

We will need to be able to allocate and free static requests for a tag
set or a request queue (in addition to a struct blk_mq_tags), so break out
the common functionality of blk_mq_{alloc,free}_rqs, passing the required
static rqs ** and page list pointer to the core code.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c | 61 +++++++++++++++++++++++++++++++++-----------------
 block/blk-mq.h |  7 ++++++
 2 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ff5c15ce0eb..801af0993044 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2310,14 +2310,15 @@ static size_t order_to_size(unsigned int order)
 }
 
 /* called before freeing request pool in @tags */
-static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
-		struct blk_mq_tags *tags, unsigned int hctx_idx)
+void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
+			     unsigned int hctx_idx,
+			     struct list_head *page_list)
 {
 	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
 	struct page *page;
 	unsigned long flags;
 
-	list_for_each_entry(page, &tags->page_list, lru) {
+	list_for_each_entry(page, page_list, lru) {
 		unsigned long start = (unsigned long)page_address(page);
 		unsigned long end = start + order_to_size(page->private);
 		int i;
@@ -2343,28 +2344,29 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
 	spin_unlock_irqrestore(&drv_tags->lock, flags);
 }
 
-void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx)
+void __blk_mq_free_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
+		       unsigned int depth, struct list_head *page_list,
+		       struct request **static_rqs)
 {
 	struct page *page;
 
-	if (tags->static_rqs && set->ops->exit_request) {
+	if (static_rqs && set->ops->exit_request) {
 		int i;
 
-		for (i = 0; i < tags->nr_tags; i++) {
-			struct request *rq = tags->static_rqs[i];
+		for (i = 0; i < depth; i++) {
+			struct request *rq = static_rqs[i];
 
 			if (!rq)
 				continue;
 			set->ops->exit_request(set, rq, hctx_idx);
-			tags->static_rqs[i] = NULL;
+			static_rqs[i] = NULL;
 		}
 	}
 
-	blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+	blk_mq_clear_rq_mapping(set, hctx_idx, page_list);
 
-	while (!list_empty(&tags->page_list)) {
-		page = list_first_entry(&tags->page_list, struct page, lru);
+	while (!list_empty(page_list)) {
+		page = list_first_entry(page_list, struct page, lru);
 		list_del_init(&page->lru);
 		/*
 		 * Remove kmemleak object previously allocated in
@@ -2375,6 +2377,13 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	}
 }
 
+void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+		     unsigned int hctx_idx)
+{
+	__blk_mq_free_rqs(set, hctx_idx, tags->nr_tags, &tags->page_list,
+			  tags->static_rqs);
+}
+
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
 {
 	kfree(tags->rqs);
@@ -2437,8 +2446,10 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
 	return 0;
 }
 
-int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
-		     unsigned int hctx_idx, unsigned int depth)
+
+int __blk_mq_alloc_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
+		       unsigned int depth, struct list_head *page_list,
+		       struct request **static_rqs)
 {
 	unsigned int i, j, entries_per_page, max_order = 4;
 	size_t rq_size, left;
@@ -2448,7 +2459,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	INIT_LIST_HEAD(&tags->page_list);
+	INIT_LIST_HEAD(page_list);
 
 	/*
 	 * rq_size is the size of the request plus driver payload, rounded
@@ -2483,7 +2494,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 			goto fail;
 
 		page->private = this_order;
-		list_add_tail(&page->lru, &tags->page_list);
+		list_add_tail(&page->lru, page_list);
 
 		p = page_address(page);
 		/*
@@ -2497,9 +2508,9 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		for (j = 0; j < to_do; j++) {
 			struct request *rq = p;
 
-			tags->static_rqs[i] = rq;
+			static_rqs[i] = rq;
 			if (blk_mq_init_request(set, rq, hctx_idx, node)) {
-				tags->static_rqs[i] = NULL;
+				static_rqs[i] = NULL;
 				goto fail;
 			}
 
@@ -2510,10 +2521,17 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return 0;
 
 fail:
-	blk_mq_free_rqs(set, tags, hctx_idx);
+	__blk_mq_free_rqs(set, hctx_idx, depth, page_list, static_rqs);
 	return -ENOMEM;
 }
 
+int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+		     unsigned int hctx_idx, unsigned int depth)
+{
+	return __blk_mq_alloc_rqs(set, hctx_idx, depth, &tags->page_list,
+				  tags->static_rqs);
+}
+
 struct rq_iter_data {
 	struct blk_mq_hw_ctx *hctx;
 	bool has_rq;
@@ -2856,12 +2874,13 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 	int ret = 0;
 
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
-					set->queue_depth, set->reserved_tags, flags);
+						  set->queue_depth,
+						  set->reserved_tags, flags);
 	if (!set->tags[hctx_idx])
 		return false;
 
 	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
-				set->queue_depth);
+			       set->queue_depth);
 	if (!ret)
 		return true;
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..1e0fbb06412b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,6 +54,9 @@ void blk_mq_put_rq_ref(struct request *rq);
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
+void __blk_mq_free_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
+		       unsigned int depth, struct list_head *page_list,
+		       struct request **static_rqs);
 void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
@@ -63,6 +66,10 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth);
 
+int __blk_mq_alloc_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
+		       unsigned int depth, struct list_head *page_list,
+		       struct request **static_rqs);
+
 /*
  * Internal helpers for request insertion into sw queues
  */
-- 
2.26.2



* [PATCH RFC 7/9] blk-mq: Allocate per tag set static rqs for shared sbitmap
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (5 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH RFC 6/9] blk-mq: Refactor blk_mq_{alloc,free}_rqs John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-14 15:06 ` [PATCH 8/9] blk-mq: Allocate per request queue " John Garry
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

It is wasteful in memory terms that a full set of static rqs is allocated
per hw queue for shared sbitmap, considering that the total number of
requests usable at any given time across all hctx is limited by the shared
sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set, and not per hw queue.

Make this change, and ensure that the hctx_idx argument to
blk_mq_init_request() and __blk_mq_free_rqs() is == 0 for the case of
shared sbitmap. If a driver needs to init the static rq with a hctx idx,
then it cannot now use shared sbitmap - only SCSI drivers and null block
use shared sbitmap today, and neither uses hctx_idx for static request
init or free.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c         | 66 +++++++++++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h |  4 +++
 2 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 801af0993044..764c601376c6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2350,6 +2350,9 @@ void __blk_mq_free_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
 {
 	struct page *page;
 
+	if (WARN_ON((hctx_idx > 0) && blk_mq_is_sbitmap_shared(set->flags)))
+		return;
+
 	if (static_rqs && set->ops->exit_request) {
 		int i;
 
@@ -2455,6 +2458,9 @@ int __blk_mq_alloc_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
 	size_t rq_size, left;
 	int node;
 
+	if (WARN_ON((hctx_idx > 0) && blk_mq_is_sbitmap_shared(set->flags)))
+		return -EINVAL;
+
 	node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
@@ -2875,15 +2881,30 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
 						  set->queue_depth,
-						  set->reserved_tags, flags);
+						  set->reserved_tags,
+						  flags);
 	if (!set->tags[hctx_idx])
 		return false;
 
-	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
-			       set->queue_depth);
+	if (blk_mq_is_sbitmap_shared(flags)) {
+		int i;
+
+		if (!set->static_rqs)
+			goto err_out;
+
+		for (i = 0; i < set->queue_depth; i++)
+			set->tags[hctx_idx]->static_rqs[i] =
+						set->static_rqs[i];
+
+		return true;
+	} else {
+		ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
+				       set->queue_depth);
+	}
 	if (!ret)
 		return true;
 
+err_out:
 	blk_mq_free_rq_map(set->tags[hctx_idx], flags);
 	set->tags[hctx_idx] = NULL;
 	return false;
@@ -2895,7 +2916,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 	unsigned int flags = set->flags;
 
 	if (set->tags && set->tags[hctx_idx]) {
-		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
+		if (!blk_mq_is_sbitmap_shared(set->flags))
+			blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
 		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
 		set->tags[hctx_idx] = NULL;
 	}
@@ -3363,6 +3385,21 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		gfp_t flags = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
+		int ret;
+
+		set->static_rqs = kcalloc_node(set->queue_depth,
+					       sizeof(struct request *),
+					       flags, set->numa_node);
+		if (!set->static_rqs)
+			return -ENOMEM;
+		ret = __blk_mq_alloc_rqs(set, 0, set->queue_depth,
+					 &set->page_list, set->static_rqs);
+		if (ret < 0)
+			goto out_free_static_rqs;
+	}
+
 	for (i = 0; i < set->nr_hw_queues; i++) {
 		if (!__blk_mq_alloc_map_and_request(set, i))
 			goto out_unwind;
@@ -3374,7 +3411,15 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 out_unwind:
 	while (--i >= 0)
 		blk_mq_free_map_and_requests(set, i);
-
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		__blk_mq_free_rqs(set, 0, set->queue_depth, &set->page_list,
+				  set->static_rqs);
+	}
+out_free_static_rqs:
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		kfree(set->static_rqs);
+		set->static_rqs = NULL;
+	}
 	return -ENOMEM;
 }
 
@@ -3601,12 +3646,17 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i, j;
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		blk_mq_exit_shared_sbitmap(set);
+		__blk_mq_free_rqs(set, 0, set->queue_depth, &set->page_list,
+				  set->static_rqs);
+		kfree(set->static_rqs);
+		set->static_rqs = NULL;
+	}
+
 	for (i = 0; i < set->nr_hw_queues; i++)
 		blk_mq_free_map_and_requests(set, i);
 
-	if (blk_mq_is_sbitmap_shared(set->flags))
-		blk_mq_exit_shared_sbitmap(set);
-
 	for (j = 0; j < set->nr_maps; j++) {
 		kfree(set->map[j].mq_map);
 		set->map[j].mq_map = NULL;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1d18447ebebc..68b648ab5435 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -261,6 +261,10 @@ struct blk_mq_tag_set {
 	struct sbitmap_queue	__breserved_tags;
 	struct blk_mq_tags	**tags;
 
+	/* for shared sbitmap */
+	struct request **static_rqs;
+	struct list_head page_list;
+
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
 };
-- 
2.26.2



* [PATCH 8/9] blk-mq: Allocate per request queue static rqs for shared sbitmap
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (6 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH RFC 7/9] blk-mq: Allocate per tag set static rqs for shared sbitmap John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-14 15:06 ` [PATCH 9/9] blk-mq: Clear mappings for shared sbitmap sched static rqs John Garry
  2021-07-20 15:22 ` [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap Ming Lei
  9 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

Similar to allocating a full set of static rqs per hw queue per tag set
when using a shared sbitmap, it is also inefficient in memory terms to
allocate a full set of static rqs per hw queue per request queue.

Reduce memory usage by allocating a set of static rqs per request queue
when using a shared sbitmap, and make the per-hctx
sched_tags->static_rqs[] point at them.

Error handling for updating the number of requests in
blk_mq_update_nr_requests() -> blk_mq_tag_update_depth() can get quite
complicated, so allocate the full max depth of rqs at init time to try to
simplify things. This will be somewhat inefficient when the request queue
depth is not close to the max, but generally still more efficient than the
current situation.

For failures in blk_mq_update_nr_requests() -> blk_mq_tag_update_depth()
when shrinking the request queue depth, q->nr_requests still needs to be
updated. This is because some of the hctx->sched_tags may have been
successfully updated, and are now smaller than q->nr_requests, which will
lead to problems since a scheduler tag could be greater than the
hctx->sched_tags size. For example, when shrinking from 256 to 128, hctx0
sched_tags may be resized to 128 while hctx1 fails; leaving q->nr_requests
at 256 could then produce tags >= 128 against hctx0's 128-entry
sched_tags.

For failures in blk_mq_update_nr_requests() -> blk_mq_tag_update_depth()
when growing a request queue depth, q->nr_requests does not need to be
updated.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c   | 51 ++++++++++++++++++++++++++++++++++++++----
 block/blk-mq-tag.c     | 12 +++++-----
 block/blk-mq.c         | 20 ++++++++++++++++-
 include/linux/blkdev.h |  4 ++++
 4 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1e028183557d..7b5c46647820 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -538,6 +538,9 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
+	if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
+		return 0;
+
 	ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
 	if (ret)
 		blk_mq_sched_free_tags(set, hctx, hctx_idx);
@@ -563,8 +566,30 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 {
 	struct blk_mq_tag_set *set = queue->tag_set;
 	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
+	gfp_t flags = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
 	struct blk_mq_hw_ctx *hctx;
-	int ret, i;
+	int ret, i, j;
+
+	/*
+	 * In case we need to grow, allocate max we will ever need. This will
+	 * waste memory when the request queue depth is less than the max,
+	 * i.e. almost always. But helps keep our sanity, rather than dealing
+	 * with error handling in blk_mq_update_nr_requests().
+	 */
+	queue->static_rqs = kcalloc_node(MAX_SCHED_RQ, sizeof(struct request *),
+					 flags, queue->node);
+	if (!queue->static_rqs)
+		return -ENOMEM;
+
+	ret = __blk_mq_alloc_rqs(set, 0, MAX_SCHED_RQ, &queue->page_list,
+				 queue->static_rqs);
+	if (ret)
+		goto err_rqs;
+
+	queue_for_each_hw_ctx(queue, hctx, i) {
+		for (j = 0; j < queue->nr_requests; j++)
+			hctx->sched_tags->static_rqs[j] = queue->static_rqs[j];
+	}
 
 	/*
 	 * Set initial depth at max so that we don't need to reallocate for
@@ -575,7 +600,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 				  MAX_SCHED_RQ, set->reserved_tags,
 				  set->numa_node, alloc_policy);
 	if (ret)
-		return ret;
+		goto err_bitmaps;
 
 	queue_for_each_hw_ctx(queue, hctx, i) {
 		hctx->sched_tags->bitmap_tags =
@@ -587,10 +612,24 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
 	blk_mq_tag_resize_sched_shared_sbitmap(queue);
 
 	return 0;
+
+err_bitmaps:
+	__blk_mq_free_rqs(set, 0, MAX_SCHED_RQ, &queue->page_list,
+			  queue->static_rqs);
+err_rqs:
+	kfree(queue->static_rqs);
+	queue->static_rqs = NULL;
+	return ret;
 }
 
 static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
 {
+	__blk_mq_free_rqs(queue->tag_set, 0, MAX_SCHED_RQ, &queue->page_list,
+			  queue->static_rqs);
+
+	kfree(queue->static_rqs);
+	queue->static_rqs = NULL;
+
 	sbitmap_queue_free(&queue->sched_bitmap_tags);
 	sbitmap_queue_free(&queue->sched_breserved_tags);
 }
@@ -670,8 +709,12 @@ void blk_mq_sched_free_requests(struct request_queue *q)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		if (hctx->sched_tags)
-			blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+		if (hctx->sched_tags) {
+			if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+			} else {
+				blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+			}
+		}
 	}
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 55c7f1bf41c7..e5a21907306a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -592,7 +592,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	if (tdepth > tags->nr_tags) {
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
 		struct blk_mq_tags *new;
-		bool ret;
 
 		if (!can_grow)
 			return -EINVAL;
@@ -608,13 +607,14 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 				tags->nr_reserved_tags, set->flags);
 		if (!new)
 			return -ENOMEM;
-		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
-		if (ret) {
-			blk_mq_free_rq_map(new, set->flags);
-			return -ENOMEM;
+		if (!blk_mq_is_sbitmap_shared(hctx->flags)) {
+			if (blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth)) {
+				blk_mq_free_rq_map(new, set->flags);
+				return -ENOMEM;
+			}
+			blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
 		}
 
-		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
 		blk_mq_free_rq_map(*tagsptr, set->flags);
 		*tagsptr = new;
 	} else {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 764c601376c6..443fd64239ed 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3694,10 +3694,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 						      nr, true);
 			if (blk_mq_is_sbitmap_shared(set->flags)) {
+				int j;
+
 				hctx->sched_tags->bitmap_tags =
 					&q->sched_bitmap_tags;
 				hctx->sched_tags->breserved_tags =
 					&q->sched_breserved_tags;
+
+				for (j = 0; j < hctx->sched_tags->nr_tags; j++) {
+					hctx->sched_tags->static_rqs[j] =
+							q->static_rqs[j];
+				}
 			}
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
@@ -3708,7 +3715,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (q->elevator && q->elevator->type->ops.depth_updated)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
-	if (!ret) {
+	if (ret) {
+		if (blk_mq_is_sbitmap_shared(set->flags) && q->elevator) {
+			/*
+			 * If we error'ed, then we need to revert to the
+			 * lowest size, otherwise we may attempt to reference
+			 * unset hctx->sched_tags->static_rqs[]
+			 */
+			q->nr_requests = min((unsigned long)nr,
+					     q->nr_requests);
+			blk_mq_tag_resize_sched_shared_sbitmap(q);
+		}
+	} else {
 		q->nr_requests = nr;
 		if (blk_mq_is_sbitmap_shared(set->flags)) {
 			if (q->elevator) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6a64ea23f552..97e80a07adb2 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -470,6 +470,10 @@ struct request_queue {
 	struct sbitmap_queue	sched_bitmap_tags;
 	struct sbitmap_queue	sched_breserved_tags;
 
+	/* For shared sbitmap */
+	struct request **static_rqs;
+	struct list_head page_list;
+
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
 	DECLARE_BITMAP		(blkcg_pols, BLKCG_MAX_POLS);
-- 
2.26.2



* [PATCH 9/9] blk-mq: Clear mappings for shared sbitmap sched static rqs
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (7 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH 8/9] blk-mq: Allocate per request queue " John Garry
@ 2021-07-14 15:06 ` John Garry
  2021-07-20 15:22 ` [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap Ming Lei
  9 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-14 15:06 UTC (permalink / raw)
  To: axboe
  Cc: linux-block, linux-kernel, ming.lei, linux-scsi, kashyap.desai,
	hare, John Garry

To avoid use-after-free in accessing stale requests in the driver tags
rqs[], clear the mappings for the request queue static rqs.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c | 1 +
 block/blk-mq.h       | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 7b5c46647820..f1cea7f3bc68 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -711,6 +711,7 @@ void blk_mq_sched_free_requests(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->sched_tags) {
 			if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+				blk_mq_clear_rq_mapping(q->tag_set, i, &q->page_list);
 			} else {
 				blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
 			}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1e0fbb06412b..a5b7aa7a07b9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -69,6 +69,8 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 int __blk_mq_alloc_rqs(struct blk_mq_tag_set *set, unsigned int hctx_idx,
 		       unsigned int depth, struct list_head *page_list,
 		       struct request **static_rqs);
+void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set, unsigned int hctx_idx,
+			     struct list_head *page_list);
 
 /*
  * Internal helpers for request insertion into sw queues
-- 
2.26.2



* Re: [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs()
  2021-07-14 15:06 ` [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
@ 2021-07-20  7:35   ` Ming Lei
  0 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-07-20  7:35 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On Wed, Jul 14, 2021 at 11:06:27PM +0800, John Garry wrote:
> The original code in commit 24d2f90309b23 ("blk-mq: split out tag
> initialization, support shared tags") would check tags->rqs is non-NULL and
> then dereference tags->rqs[].
> 
> Then in commit 2af8cbe30531 ("blk-mq: split tag ->rqs[] into two"), we
> started to dereference tags->static_rqs[], but continued to check non-NULL
> tags->rqs.
> 
> Check tags->static_rqs as non-NULL instead, which is more logical.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/blk-mq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2c4ac51e54eb..ae28f470893c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2348,7 +2348,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  {
>  	struct page *page;
>  
> -	if (tags->rqs && set->ops->exit_request) {
> +	if (tags->static_rqs && set->ops->exit_request) {

Yeah, it is reasonable to check ->static_rqs since both ->init_request()
and ->exit_request() operate on requests from ->static_rqs[]:
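
For reference, the allocation side pairs them up in blk_mq_alloc_rqs()
(current code, trimmed to the relevant loop):

		for (j = 0; j < to_do; j++) {
			struct request *rq = p;

			tags->static_rqs[i] = rq;
			if (blk_mq_init_request(set, rq, hctx_idx, node)) {
				tags->static_rqs[i] = NULL;
				goto fail;
			}

			p += rq_size;
			i++;
		}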

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming



* Re: [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  2021-07-14 15:06 ` [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
@ 2021-07-20  7:44   ` Ming Lei
  0 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2021-07-20  7:44 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On Wed, Jul 14, 2021 at 11:06:28PM +0800, John Garry wrote:
> It is a bit confusing that there is both BLKDEV_MAX_RQ and MAX_SCHED_RQ,
> as the name BLKDEV_MAX_RQ implies that it is always the maximum number of
> requests, which it is not.
> 
> Rename BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being
> the default number of requests assigned when allocating a request queue.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/blk-core.c       | 2 +-
>  block/blk-mq-sched.c   | 2 +-
>  block/blk-mq-sched.h   | 2 +-
>  drivers/block/rbd.c    | 2 +-
>  include/linux/blkdev.h | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 04477697ee4b..5d71382b6131 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -579,7 +579,7 @@ struct request_queue *blk_alloc_queue(int node_id)
>  
>  	blk_queue_dma_alignment(q, 511);
>  	blk_set_default_limits(&q->limits);
> -	q->nr_requests = BLKDEV_MAX_RQ;
> +	q->nr_requests = BLKDEV_DEFAULT_RQ;

The above assignment isn't necessary since bio based queue doesn't use
->nr_requests. For request based queue, ->nr_requests will be re-set
in either blk_mq_init_sched() or blk_mq_init_allocated_queue(), but
that may not be related with this patch itself.

>  
>  	return q;
>  
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index c838d81ac058..f5cb2931c20d 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -615,7 +615,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  	 * Additionally, this is a per-hw queue depth.
>  	 */
>  	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
> -				   BLKDEV_MAX_RQ);
> +				   BLKDEV_DEFAULT_RQ);
>  
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		ret = blk_mq_sched_alloc_tags(q, hctx, i);
> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index 5246ae040704..1e46be6c5178 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -5,7 +5,7 @@
>  #include "blk-mq.h"
>  #include "blk-mq-tag.h"
>  
> -#define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
> +#define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
>  
>  void blk_mq_sched_assign_ioc(struct request *rq);
>  
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 531d390902dd..d3f329749173 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -836,7 +836,7 @@ struct rbd_options {
>  	u32 alloc_hint_flags;  /* CEPH_OSD_OP_ALLOC_HINT_FLAG_* */
>  };
>  
> -#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_MAX_RQ
> +#define RBD_QUEUE_DEPTH_DEFAULT	BLKDEV_DEFAULT_RQ
>  #define RBD_ALLOC_SIZE_DEFAULT	(64 * 1024)
>  #define RBD_LOCK_TIMEOUT_DEFAULT 0  /* no timeout */
>  #define RBD_READ_ONLY_DEFAULT	false
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3177181c4326..6a64ea23f552 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -45,7 +45,7 @@ struct blk_stat_callback;
>  struct blk_keyslot_manager;
>  
>  #define BLKDEV_MIN_RQ	4
> -#define BLKDEV_MAX_RQ	128	/* Default maximum */
> +#define BLKDEV_DEFAULT_RQ	128
>  
>  /* Must be consistent with blk_mq_poll_stats_bkt() */
>  #define BLK_MQ_POLL_STATS_BKTS 16
> -- 
> 2.26.2
> 

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming



* Re: [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  2021-07-14 15:06 ` [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
@ 2021-07-20  7:50   ` Ming Lei
  2021-07-20  8:06     ` John Garry
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2021-07-20  7:50 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On Wed, Jul 14, 2021 at 11:06:29PM +0800, John Garry wrote:
> For shared sbitmap, if the call to blk_mq_tag_update_depth() was
> successful for any hctx when hctx->sched_tags is not set, then it would be
> successful for all (due to the nature in which blk_mq_tag_update_depth()
> fails).
> 
> As such, there is no need to call blk_mq_tag_resize_shared_sbitmap() for
> each hctx. So relocate the call to after the hctx iteration, under the
> !q->elevator check (which is equivalent to !hctx->sched_tags).
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/blk-mq.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index ae28f470893c..56e3c6fdba60 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3624,8 +3624,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  		if (!hctx->sched_tags) {
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>  							false);
> -			if (!ret && blk_mq_is_sbitmap_shared(set->flags))
> -				blk_mq_tag_resize_shared_sbitmap(set, nr);
>  		} else {
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>  							nr, true);
> @@ -3643,9 +3641,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  	}
>  	if (!ret) {
>  		q->nr_requests = nr;
> -		if (q->elevator && blk_mq_is_sbitmap_shared(set->flags))
> -			sbitmap_queue_resize(&q->sched_bitmap_tags,
> -					     nr - set->reserved_tags);
> +		if (blk_mq_is_sbitmap_shared(set->flags)) {
> +			if (q->elevator) {
> +				sbitmap_queue_resize(&q->sched_bitmap_tags,
> +						     nr - set->reserved_tags);
> +			} else {
> +				blk_mq_tag_resize_shared_sbitmap(set, nr);
> +			}

The above two {} can be removed.

-- 
Ming



* Re: [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap()
  2021-07-14 15:06 ` [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap() John Garry
@ 2021-07-20  7:57   ` Ming Lei
  2021-07-20  8:08     ` John Garry
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2021-07-20  7:57 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On Wed, Jul 14, 2021 at 11:06:30PM +0800, John Garry wrote:
> Put the functionality to resize the sched shared sbitmap in a common
> function.
> 
> Since the same formula is always used to resize, and all of its inputs
> can be derived from the request queue, just pass the request queue
> pointer.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/blk-mq-sched.c |  3 +--
>  block/blk-mq-tag.c   | 10 ++++++++++
>  block/blk-mq-tag.h   |  1 +
>  block/blk-mq.c       |  3 +--
>  4 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index f5cb2931c20d..1e028183557d 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -584,8 +584,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
>  					&queue->sched_breserved_tags;
>  	}
>  
> -	sbitmap_queue_resize(&queue->sched_bitmap_tags,
> -			     queue->nr_requests - set->reserved_tags);
> +	blk_mq_tag_resize_sched_shared_sbitmap(queue);
>  
>  	return 0;
>  }
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..55c7f1bf41c7 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -634,6 +634,16 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
>  	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
>  }
>  
> +/*
> + * We always resize with q->nr_requests - q->tag_set->reserved_tags, so
> + * don't bother passing a size.
> + */
> +void blk_mq_tag_resize_sched_shared_sbitmap(struct request_queue *q)
> +{
> +	sbitmap_queue_resize(&q->sched_bitmap_tags,
> +			     q->nr_requests - q->tag_set->reserved_tags);
> +}

It is a bit hard to follow the resize part of the name, since no size
parameter is passed in. Maybe update is better?

-- 
Ming



* Re: [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  2021-07-20  7:50   ` Ming Lei
@ 2021-07-20  8:06     ` John Garry
  0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-20  8:06 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On 20/07/2021 08:50, Ming Lei wrote:
>> Signed-off-by: John Garry<john.garry@huawei.com>
>> ---
>>   block/blk-mq.c | 13 ++++++++-----
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index ae28f470893c..56e3c6fdba60 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3624,8 +3624,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   		if (!hctx->sched_tags) {
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>>   							false);
>> -			if (!ret && blk_mq_is_sbitmap_shared(set->flags))
>> -				blk_mq_tag_resize_shared_sbitmap(set, nr);
>>   		} else {
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>   							nr, true);
>> @@ -3643,9 +3641,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   	}
>>   	if (!ret) {
>>   		q->nr_requests = nr;
>> -		if (q->elevator && blk_mq_is_sbitmap_shared(set->flags))
>> -			sbitmap_queue_resize(&q->sched_bitmap_tags,
>> -					     nr - set->reserved_tags);
>> +		if (blk_mq_is_sbitmap_shared(set->flags)) {

Hi Ming,

>> +			if (q->elevator) {
>> +				sbitmap_queue_resize(&q->sched_bitmap_tags,
>> +						     nr - set->reserved_tags);

I have learned that some people prefer {} for multi-line single
statements, like this.

Anyway, more code is added here later in the series, so it's better to add
{} now rather than re-arrange the code later.

>> +			} else {
>> +				blk_mq_tag_resize_shared_sbitmap(set, nr);
>> +			}
> The above two {} can be removed.

Thanks,
John




* Re: [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap()
  2021-07-20  7:57   ` Ming Lei
@ 2021-07-20  8:08     ` John Garry
  0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-20  8:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On 20/07/2021 08:57, Ming Lei wrote:
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index f5cb2931c20d..1e028183557d 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -584,8 +584,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
>>   					&queue->sched_breserved_tags;
>>   	}
>>   
>> -	sbitmap_queue_resize(&queue->sched_bitmap_tags,
>> -			     queue->nr_requests - set->reserved_tags);
>> +	blk_mq_tag_resize_sched_shared_sbitmap(queue);
>>   
>>   	return 0;
>>   }
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 86f87346232a..55c7f1bf41c7 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -634,6 +634,16 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
>>   	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
>>   }
>>   
>> +/*
>> + * We always resize with q->nr_requests - q->tag_set->reserved_tags, so
>> + * don't bother passing a size.
>> + */
>> +void blk_mq_tag_resize_sched_shared_sbitmap(struct request_queue *q)
>> +{
>> +	sbitmap_queue_resize(&q->sched_bitmap_tags,
>> +			     q->nr_requests - q->tag_set->reserved_tags);
>> +}
> It is a bit hard to follow the resize part of the name, since no size
> parameter passed in. Maybe update is better?

Right, this function is a bit odd. Maybe I can just have a size argument 
for consistency with blk_mq_tag_resize_shared_sbitmap().

Thanks,
John


* Re: [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap
  2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
                   ` (8 preceding siblings ...)
  2021-07-14 15:06 ` [PATCH 9/9] blk-mq: Clear mappings for shared sbitmap sched static rqs John Garry
@ 2021-07-20 15:22 ` Ming Lei
  2021-07-20 17:05   ` John Garry
  9 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2021-07-20 15:22 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On Wed, Jul 14, 2021 at 11:06:26PM +0800, John Garry wrote:
> Currently a full set of static requests is allocated per hw queue per
> tagset when shared sbitmap is used.
> 
> However, only tagset->queue_depth number of requests may be active at
> any given time. As such, only tagset->queue_depth number of static
> requests are required.
> 
> The same goes for using an IO scheduler, which allocates a full set of
> static requests per hw queue per request queue.
> 
> This series very significantly reduces memory usage in both scenarios by
> allocating static rqs per tagset and per request queue, respectively,
> rather than per hw queue per tagset and per request queue.
> 
> For megaraid sas driver on my 128-CPU arm64 system with 1x SATA disk, we
> save approx. 300MB(!) [370MB -> 60MB]
> 
> A couple of patches are marked as RFC, as maybe there is a better
> implementation approach.

There is another candidate for addressing this issue, and it looks simpler:

 block/blk-mq-sched.c |  4 ++++
 block/blk-mq-tag.c   |  4 ++++
 block/blk-mq-tag.h   |  3 +++
 block/blk-mq.c       | 18 ++++++++++++++++++
 block/blk-mq.h       | 11 +++++++++++
 5 files changed, 40 insertions(+)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c838d81ac058..b9236ee0fe4e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -538,6 +538,10 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
+	blk_mq_set_master_tags(hctx->sched_tags,
+			q->queue_hw_ctx[0]->sched_tags, hctx->flags,
+			hctx_idx);
+
 	ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
 	if (ret)
 		blk_mq_sched_free_tags(set, hctx, hctx_idx);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 86f87346232a..c471a073234d 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -608,6 +608,10 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 				tags->nr_reserved_tags, set->flags);
 		if (!new)
 			return -ENOMEM;
+
+		blk_mq_set_master_tags(new,
+			hctx->queue->queue_hw_ctx[0]->sched_tags, set->flags,
+			hctx->queue_num);
 		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
 		if (ret) {
 			blk_mq_free_rq_map(new, set->flags);
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 8ed55af08427..0a3fbbc61e06 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -21,6 +21,9 @@ struct blk_mq_tags {
 	struct request **static_rqs;
 	struct list_head page_list;
 
+	/* only used for blk_mq_is_sbitmap_shared() */
+	struct blk_mq_tags	*master;
+
 	/*
 	 * used to clear request reference in rqs[] before freeing one
 	 * request pool
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c4ac51e54eb..ef8a6a7e5f7c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2348,6 +2348,15 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 {
 	struct page *page;
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		if (tags->master)
+			tags = tags->master;
+		if (hctx_idx < set->nr_hw_queues - 1) {
+			blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+			return;
+		}
+	}
+
 	if (tags->rqs && set->ops->exit_request) {
 		int i;
 
@@ -2444,6 +2453,12 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	size_t rq_size, left;
 	int node;
 
+	if (blk_mq_is_sbitmap_shared(set->flags) && tags->master) {
+		memcpy(tags->static_rqs, tags->master->static_rqs,
+		       sizeof(tags->static_rqs[0]) * tags->nr_tags);
+		return 0;
+	}
+
 	node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
@@ -2860,6 +2875,9 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 	if (!set->tags[hctx_idx])
 		return false;
 
+	blk_mq_set_master_tags(set->tags[hctx_idx], set->tags[0], flags,
+			       hctx_idx);
+
 	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
 				set->queue_depth);
 	if (!ret)
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..a08b89be6acc 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -354,5 +354,16 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	return __blk_mq_active_requests(hctx) < depth;
 }
 
+static inline void blk_mq_set_master_tags(struct blk_mq_tags *tags,
+		struct blk_mq_tags *master_tags, unsigned int flags,
+		unsigned int hctx_idx)
+{
+	if (blk_mq_is_sbitmap_shared(flags)) {
+		if (hctx_idx)
+			tags->master = master_tags;
+		else
+			tags->master = NULL;
+	}
+}
 
 #endif
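
As a self-contained illustration, here is a minimal userspace model of
the scheme above -- a sketch only, not the kernel code (the real free
path also clears the tags->rqs[] mapping, omitted here): hctx 0 owns
the one real static request pool, every other hctx's tags merely mirror
its pointers, and the pool is freed once, at the last hw queue.

#include <stdlib.h>
#include <string.h>

struct request { int tag; };

struct tags {
	struct request **static_rqs;
	unsigned int nr_tags;
	struct tags *master;	/* NULL for hctx 0, else hctx 0's tags */
};

static void alloc_rqs(struct tags *t)
{
	if (t->master) {
		/* shared sbitmap: only mirror the master's pointers */
		memcpy(t->static_rqs, t->master->static_rqs,
		       t->nr_tags * sizeof(t->static_rqs[0]));
		return;
	}
	/* hctx 0: really allocate the requests */
	for (unsigned int i = 0; i < t->nr_tags; i++) {
		t->static_rqs[i] = malloc(sizeof(struct request));
		t->static_rqs[i]->tag = i;
	}
}

static void free_rqs(struct tags *t, unsigned int hctx_idx,
		     unsigned int nr_hw_queues)
{
	if (t->master)
		t = t->master;
	/* only the last hw queue really frees the shared pool */
	if (hctx_idx < nr_hw_queues - 1)
		return;
	for (unsigned int i = 0; i < t->nr_tags; i++)
		free(t->static_rqs[i]);
}

int main(void)
{
	enum { NR_HW_QUEUES = 4, DEPTH = 8 };
	struct tags t[NR_HW_QUEUES];

	for (int i = 0; i < NR_HW_QUEUES; i++) {
		t[i].nr_tags = DEPTH;
		t[i].static_rqs = calloc(DEPTH, sizeof(struct request *));
		t[i].master = i ? &t[0] : NULL; /* blk_mq_set_master_tags() */
		alloc_rqs(&t[i]);
	}
	for (int i = 0; i < NR_HW_QUEUES; i++) {
		free_rqs(&t[i], i, NR_HW_QUEUES);
		free(t[i].static_rqs);
	}
	return 0;
}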


Thanks,
Ming


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap
  2021-07-20 15:22 ` [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap Ming Lei
@ 2021-07-20 17:05   ` John Garry
  0 siblings, 0 replies; 18+ messages in thread
From: John Garry @ 2021-07-20 17:05 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, linux-block, linux-kernel, linux-scsi, kashyap.desai, hare

On 20/07/2021 16:22, Ming Lei wrote:
> On Wed, Jul 14, 2021 at 11:06:26PM +0800, John Garry wrote:
>> Currently a full set of static requests are allocated per hw queue per
>> tagset when shared sbitmap is used.
>>
>> However, only tagset->queue_depth number of requests may be active at
>> any given time. As such, only tagset->queue_depth number of static
>> requests are required.
>>
>> The same goes for using an IO scheduler, which allocates a full set of
>> static requests per hw queue per request queue.
>>
>> This series very significantly reduces memory usage in both scenarios by
>> allocating static rqs per tagset and per request queue, respectively,
>> rather than per hw queue per tagset and per request queue.
>>
>> For megaraid sas driver on my 128-CPU arm64 system with 1x SATA disk, we
>> save approx. 300MB(!) [370MB -> 60MB]
>>
>> A couple of patches are marked as RFC, as maybe there is a better
>> implementation approach.

Hi Ming,

To be clear, my concerns with my implementation were:
a. The interface of __blk_mq_{alloc,free}_rqs was a bit strange in
passing both a struct list_head * and a struct request **. But maybe
it's ok.
b. We need to ensure that the driver does not expect a set of static
rqs per hw queue, i.e. does not use the blk_mq_ops.init_request
hctx_idx argument. The guard I have added is effectively useless
against that. I am thinking that we should have a variant of
blk_mq_ops.init_request without a hctx_idx argument, which would need
to be used for shared sbitmap.
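
As a concrete illustration of concern b., here is a hypothetical
init_request callback -- not taken from any real driver -- showing the
pattern that breaks once static rqs are shared across hw queues:

#include <linux/blk-mq.h>

/* hypothetical per-request driver data, for illustration only */
struct foo_cmd {
	unsigned int reply_queue;
};

static int foo_init_request(struct blk_mq_tag_set *set, struct request *rq,
			    unsigned int hctx_idx, unsigned int numa_node)
{
	struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);

	/*
	 * Broken with shared static rqs: each request is now initialized
	 * exactly once, with a single hctx_idx, yet may later be
	 * dispatched on any hw queue, so this stored value goes stale.
	 */
	cmd->reply_queue = hctx_idx;
	return 0;
}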

> 
> There is another candidate for addressing this issue, and it looks simpler:

Thanks, but I think that this solution does not address concern b.
above either.

> 
>   block/blk-mq-sched.c |  4 ++++
>   block/blk-mq-tag.c   |  4 ++++
>   block/blk-mq-tag.h   |  3 +++
>   block/blk-mq.c       | 18 ++++++++++++++++++
>   block/blk-mq.h       | 11 +++++++++++
>   5 files changed, 40 insertions(+)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index c838d81ac058..b9236ee0fe4e 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -538,6 +538,10 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
>   	if (!hctx->sched_tags)
>   		return -ENOMEM;
>   
> +	blk_mq_set_master_tags(hctx->sched_tags,
> +			q->queue_hw_ctx[0]->sched_tags, hctx->flags,
> +			hctx_idx);
> +
>   	ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
>   	if (ret)
>   		blk_mq_sched_free_tags(set, hctx, hctx_idx);
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 86f87346232a..c471a073234d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -608,6 +608,10 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>   				tags->nr_reserved_tags, set->flags);
>   		if (!new)
>   			return -ENOMEM;
> +
> +		blk_mq_set_master_tags(new,
> +			hctx->queue->queue_hw_ctx[0]->sched_tags, set->flags,
> +			hctx->queue_num);
>   		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
>   		if (ret) {
>   			blk_mq_free_rq_map(new, set->flags);
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 8ed55af08427..0a3fbbc61e06 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -21,6 +21,9 @@ struct blk_mq_tags {
>   	struct request **static_rqs;
>   	struct list_head page_list;
>   
> +	/* only used for blk_mq_is_sbitmap_shared() */
> +	struct blk_mq_tags	*master;

It is a bit odd to have a pointer to another struct blk_mq_tags inside
struct blk_mq_tags like this.

> +
>   	/*
>   	 * used to clear request reference in rqs[] before freeing one
>   	 * request pool
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2c4ac51e54eb..ef8a6a7e5f7c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2348,6 +2348,15 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>   {
>   	struct page *page;
>   
> +	if (blk_mq_is_sbitmap_shared(set->flags)) {
> +		if (tags->master)
> +			tags = tags->master;
> +		if (hctx_idx < set->nr_hw_queues - 1) {
> +			blk_mq_clear_rq_mapping(set, tags, hctx_idx);
> +			return;
> +		}
> +	}
> +
>   	if (tags->rqs && set->ops->exit_request) {
>   		int i;
>   
> @@ -2444,6 +2453,12 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>   	size_t rq_size, left;
>   	int node;
>   
> +	if (blk_mq_is_sbitmap_shared(set->flags) && tags->master) {
> +		memcpy(tags->static_rqs, tags->master->static_rqs,
> +		       sizeof(tags->static_rqs[0]) * tags->nr_tags);
> +		return 0;
> +	}
> +
>   	node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
>   	if (node == NUMA_NO_NODE)
>   		node = set->numa_node;
> @@ -2860,6 +2875,9 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
>   	if (!set->tags[hctx_idx])
>   		return false;
>   
> +	blk_mq_set_master_tags(set->tags[hctx_idx], set->tags[0], flags,
> +			       hctx_idx);
> +
>   	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
>   				set->queue_depth);
>   	if (!ret)
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index d08779f77a26..a08b89be6acc 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -354,5 +354,16 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>   	return __blk_mq_active_requests(hctx) < depth;
>   }
>   
> +static inline void blk_mq_set_master_tags(struct blk_mq_tags *tags,
> +		struct blk_mq_tags *master_tags, unsigned int flags,
> +		unsigned int hctx_idx)
> +{
> +	if (blk_mq_is_sbitmap_shared(flags)) {
> +		if (hctx_idx)
> +			tags->master = master_tags;
> +		else
> +			tags->master = NULL;
> +	}
> +}
>   
>   #endif
> 

I'll check this solution a bit further.

Thanks,
John

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-07-20 17:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-14 15:06 [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap John Garry
2021-07-14 15:06 ` [PATCH 1/9] blk-mq: Change rqs check in blk_mq_free_rqs() John Garry
2021-07-20  7:35   ` Ming Lei
2021-07-14 15:06 ` [PATCH 2/9] block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ John Garry
2021-07-20  7:44   ` Ming Lei
2021-07-14 15:06 ` [PATCH 3/9] blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests() John Garry
2021-07-20  7:50   ` Ming Lei
2021-07-20  8:06     ` John Garry
2021-07-14 15:06 ` [PATCH 4/9] blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap() John Garry
2021-07-20  7:57   ` Ming Lei
2021-07-20  8:08     ` John Garry
2021-07-14 15:06 ` [PATCH 5/9] blk-mq: Invert check in blk_mq_update_nr_requests() John Garry
2021-07-14 15:06 ` [PATCH RFC 6/9] blk-mq: Refactor blk_mq_{alloc,free}_rqs John Garry
2021-07-14 15:06 ` [PATCH RFC 7/9] blk-mq: Allocate per tag set static rqs for shared sbitmap John Garry
2021-07-14 15:06 ` [PATCH 8/9] blk-mq: Allocate per request queue " John Garry
2021-07-14 15:06 ` [PATCH 9/9] blk-mq: Clear mappings for shared sbitmap sched static rqs John Garry
2021-07-20 15:22 ` [PATCH 0/9] blk-mq: Reduce static requests memory footprint for shared sbitmap Ming Lei
2021-07-20 17:05   ` John Garry
