All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: <axboe@kernel.dk>, <jejb@linux.ibm.com>,
	<martin.petersen@oracle.com>, <don.brace@microsemi.com>,
	<kashyap.desai@broadcom.com>, <sumit.saxena@broadcom.com>,
	<ming.lei@redhat.com>, <bvanassche@acm.org>, <hare@suse.com>,
	<hch@lst.de>, <shivasharan.srikanteshwara@broadcom.com>
Cc: <linux-block@vger.kernel.org>, <linux-scsi@vger.kernel.org>,
	<esc.storagedev@microsemi.com>, <chenxiang66@hisilicon.com>,
	<megaraidlinux.pdl@broadcom.com>,
	John Garry <john.garry@huawei.com>
Subject: [PATCH RFC v7 04/12] blk-mq: Facilitate a shared sbitmap per tagset
Date: Thu, 11 Jun 2020 01:29:11 +0800	[thread overview]
Message-ID: <1591810159-240929-5-git-send-email-john.garry@huawei.com> (raw)
In-Reply-To: <1591810159-240929-1-git-send-email-john.garry@huawei.com>

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown. That problem is solved in commit bf0beec0607d
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queuess are
required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW queues
need to be exposed to the upper layer.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e092ef ("scsi:
 core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy
counter was removed, which would stop the LLDD being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
are still allocated per hctx; the reason for this is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may break
block drivers which expect a request be associated with a specific hctx,
i.e. not always hctx0. This will introduce extra memory usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c     | 39 +++++++++++++++++++++++++++++++++++++--
 block/blk-mq-tag.h     | 10 +++++++++-
 block/blk-mq.c         | 24 +++++++++++++++++++++++-
 block/blk-mq.h         |  5 +++++
 include/linux/blk-mq.h |  6 ++++++
 5 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index be39db3c88d7..92843e3e1a2a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -228,7 +228,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -466,6 +466,7 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 		     round_robin, node))
 		goto free_bitmap_tags;
 
+	/* We later overwrite these in case of per-set shared sbitmap */
 	tags->bitmap_tags = &tags->__bitmap_tags;
 	tags->breserved_tags = &tags->__breserved_tags;
 
@@ -475,7 +476,32 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->__bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->__breserved_tags, tag_set->reserved_tags,
+		     round_robin, node))
+		goto free_bitmap_tags;
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->__bitmap_tags);
+	return false;
+}
+
+void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	sbitmap_queue_free(&tag_set->__bitmap_tags);
+	sbitmap_queue_free(&tag_set->__breserved_tags);
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
 {
@@ -502,6 +528,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
+	/*
+	 * Do not free tags->{bitmap, breserved}_tags, as this may point to
+	 * shared sbitmap
+	 */
 	sbitmap_queue_free(&tags->__bitmap_tags);
 	sbitmap_queue_free(&tags->__breserved_tags);
 	kfree(tags);
@@ -560,6 +590,11 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	return 0;
 }
 
+void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
+{
+	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
+}
+
 /**
  * blk_mq_unique_tag() - return a tag that is unique queue-wide
  * @rq: request for which to compute a unique tag
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index cebf7a4b280a..cf39dd13a24d 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -25,7 +25,12 @@ struct blk_mq_tags {
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+					    unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
@@ -34,6 +39,9 @@ extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
+extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
+					     unsigned int size);
+
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90b645c3092c..77120dd4e4d5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2229,7 +2229,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
@@ -3349,11 +3349,28 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set)) {
+		if (!blk_mq_init_shared_sbitmap(set)) {
+			ret = -ENOMEM;
+			goto out_free_mq_rq_maps;
+		}
+
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			struct blk_mq_tags *tags = set->tags[i];
+
+			tags->bitmap_tags = &set->__bitmap_tags;
+			tags->breserved_tags = &set->__breserved_tags;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
 	return 0;
 
+out_free_mq_rq_maps:
+	for (i = 0; i < set->nr_hw_queues; i++)
+		blk_mq_free_rq_map(set->tags[i]);
 out_free_mq_map:
 	for (i = 0; i < set->nr_maps; i++) {
 		kfree(set->map[i].mq_map);
@@ -3372,6 +3389,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 	for (i = 0; i < set->nr_hw_queues; i++)
 		blk_mq_free_map_and_requests(set, i);
 
+	if (blk_mq_is_sbitmap_shared(set))
+		blk_mq_exit_shared_sbitmap(set);
+
 	for (j = 0; j < set->nr_maps; j++) {
 		kfree(set->map[j].mq_map);
 		set->map[j].mq_map = NULL;
@@ -3408,6 +3428,8 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (!hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
+			if (!ret && blk_mq_is_sbitmap_shared(set))
+				blk_mq_tag_resize_shared_sbitmap(set, nr);
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a139b0631817..1a283c707215 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -158,6 +158,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+	return tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED;
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 233209e8030d..7b31cdb92a71 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -231,6 +231,9 @@ enum hctx_type {
  * @flags:	   Zero or more BLK_MQ_F_* flags.
  * @driver_data:   Pointer to data owned by the block driver that created this
  *		   tag set.
+ * @__bitmap_tags: A shared tags sbitmap, used over all hctx's
+ * @__breserved_tags:
+ *		   A shared reserved tags sbitmap, used over all hctx's
  * @tags:	   Tag sets. One tag set per hardware queue. Has @nr_hw_queues
  *		   elements.
  * @tag_list_lock: Serializes tag_list accesses.
@@ -250,6 +253,8 @@ struct blk_mq_tag_set {
 	unsigned int		flags;
 	void			*driver_data;
 
+	struct sbitmap_queue	__bitmap_tags;
+	struct sbitmap_queue	__breserved_tags;
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -398,6 +403,7 @@ enum {
 	 * completing IO:
 	 */
 	BLK_MQ_F_STACKING	= 1 << 2,
+	BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.26.2


  parent reply	other threads:[~2020-06-10 17:33 UTC|newest]

Thread overview: 123+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-10 17:29 [PATCH RFC v7 00/12] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2020-06-10 17:29 ` [PATCH RFC v7 01/12] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
2020-06-10 17:29 ` [PATCH RFC v7 02/12] blk-mq: rename blk_mq_update_tag_set_depth() John Garry
2020-06-11  2:57   ` Ming Lei
2020-06-11  8:26     ` John Garry
2020-06-23 11:25       ` John Garry
2020-06-23 14:23         ` Hannes Reinecke
2020-06-24  8:13           ` Kashyap Desai
2020-06-29 16:18             ` John Garry
2020-08-10 16:51           ` Kashyap Desai
2020-08-11  8:01             ` John Garry
2020-08-11 16:34               ` Kashyap Desai
2020-06-10 17:29 ` [PATCH RFC v7 03/12] blk-mq: Use pointers for blk_mq_tags bitmap tags John Garry
2020-06-10 17:29 ` John Garry [this message]
2020-06-11  3:37   ` [PATCH RFC v7 04/12] blk-mq: Facilitate a shared sbitmap per tagset Ming Lei
2020-06-11 10:09     ` John Garry
2020-06-10 17:29 ` [PATCH RFC v7 05/12] blk-mq: Record nr_active_requests per queue for when using shared sbitmap John Garry
2020-06-11  4:04   ` Ming Lei
2020-06-11 10:22     ` John Garry
2020-06-10 17:29 ` [PATCH RFC v7 06/12] blk-mq: Record active_queues_shared_sbitmap per tag_set " John Garry
2020-06-11 13:16   ` Hannes Reinecke
2020-06-11 14:22     ` John Garry
2020-06-10 17:29 ` [PATCH RFC v7 07/12] blk-mq: Add support in hctx_tags_bitmap_show() for a " John Garry
2020-06-11 13:19   ` Hannes Reinecke
2020-06-11 14:33     ` John Garry
2020-06-12  6:06       ` Hannes Reinecke
2020-06-29 15:32         ` About sbitmap_bitmap_show() and cleared bits (was Re: [PATCH RFC v7 07/12] blk-mq: Add support in hctx_tags_bitmap_show() for a shared sbitmap) John Garry
2020-06-30  6:33           ` Hannes Reinecke
2020-06-30  7:30             ` John Garry
2020-06-30 11:36               ` John Garry
2020-06-30 14:55           ` Bart Van Assche
2020-07-13  9:41         ` [PATCH RFC v7 07/12] blk-mq: Add support in hctx_tags_bitmap_show() for a shared sbitmap John Garry
2020-07-13 12:20           ` Hannes Reinecke
2020-06-10 17:29 ` [PATCH RFC v7 08/12] scsi: Add template flag 'host_tagset' John Garry
2020-06-10 17:29 ` [PATCH RFC v7 09/12] scsi: hisi_sas: Switch v3 hw to MQ John Garry
2020-06-10 17:29 ` [PATCH RFC v7 10/12] megaraid_sas: switch fusion adapters " John Garry
2020-07-02 10:23   ` Kashyap Desai
2020-07-06  8:23     ` John Garry
2020-07-06  8:45       ` Hannes Reinecke
2020-07-06  9:26         ` John Garry
2020-07-06  9:40           ` Hannes Reinecke
2020-07-06 19:19       ` Kashyap Desai
2020-07-07  7:58         ` John Garry
2020-07-07 14:45           ` Kashyap Desai
2020-07-07 16:17             ` John Garry
2020-07-09 19:01               ` Kashyap Desai
2020-07-10  8:10                 ` John Garry
2020-07-13  7:55                   ` Kashyap Desai
2020-07-13  8:42                     ` John Garry
2020-07-19 19:07                       ` Kashyap Desai
2020-07-20  7:23                       ` Kashyap Desai
2020-07-20  9:18                         ` John Garry
2020-07-21  1:13                         ` Ming Lei
2020-07-21  6:53                           ` Kashyap Desai
2020-07-22  4:12                             ` Ming Lei
2020-07-22  5:30                               ` Kashyap Desai
2020-07-22  8:04                                 ` Ming Lei
2020-07-22  9:32                                   ` John Garry
2020-07-23 14:07                                     ` Ming Lei
2020-07-23 17:29                                       ` John Garry
2020-07-24  2:47                                         ` Ming Lei
2020-07-28  7:54                                           ` John Garry
2020-07-28  8:45                                             ` Ming Lei
2020-07-29  5:25                                               ` Kashyap Desai
2020-07-29 15:36                                                 ` Ming Lei
2020-07-29 18:31                                                   ` Kashyap Desai
2020-08-04  8:36                                                     ` Ming Lei
2020-08-04  9:27                                                       ` Kashyap Desai
2020-08-05  8:40                                                         ` Ming Lei
2020-08-06 10:25                                                           ` Kashyap Desai
2020-08-06 13:38                                                             ` Ming Lei
2020-08-06 14:37                                                               ` Kashyap Desai
2020-08-06 15:29                                                                 ` Ming Lei
2020-08-08 19:05                                                                   ` Kashyap Desai
2020-08-09  2:16                                                                     ` Ming Lei
2020-08-10 16:38                                                                       ` Kashyap Desai
2020-08-11  8:09                                                                         ` John Garry
2020-08-04 17:00                                               ` John Garry
2020-08-05  2:56                                                 ` Ming Lei
2020-07-28  8:01                                   ` Kashyap Desai
2020-07-08 11:31         ` John Garry
2020-06-10 17:29 ` [PATCH RFC v7 11/12] smartpqi: enable host tagset John Garry
2020-07-14 13:16   ` John Garry
2020-07-14 13:31     ` John Garry
2020-07-14 18:16       ` Don.Brace
2020-07-15  7:28         ` John Garry
2020-07-14 14:02     ` Hannes Reinecke
2020-08-18  8:33       ` John Garry
2020-06-10 17:29 ` [PATCH RFC v7 12/12] hpsa: enable host_tagset and switch to MQ John Garry
2020-07-14  7:37   ` John Garry
2020-07-14  7:41     ` Hannes Reinecke
2020-07-14  7:52       ` John Garry
2020-07-14  8:06         ` Ming Lei
2020-07-14  9:53           ` John Garry
2020-07-14 10:14             ` Ming Lei
2020-07-14 10:43               ` Hannes Reinecke
2020-07-14 10:19             ` Hannes Reinecke
2020-07-14 10:35               ` John Garry
2020-07-14 10:44               ` Ming Lei
2020-07-14 10:52                 ` John Garry
2020-07-14 12:04                   ` Ming Lei
2020-08-03 20:39         ` Don.Brace
2020-08-04  9:27           ` John Garry
2020-08-04 15:18             ` Don.Brace
2020-08-05 11:21               ` John Garry
2020-08-14 21:04                 ` Don.Brace
2020-08-17  8:00                   ` John Garry
2020-08-17 18:39                     ` Don.Brace
2020-08-18  7:14                       ` Hannes Reinecke
2020-07-16 16:14     ` Don.Brace
2020-07-16 19:45     ` Don.Brace
2020-07-17 10:11       ` John Garry
2020-06-11  3:07 ` [PATCH RFC v7 00/12] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Ming Lei
2020-06-11  9:35   ` John Garry
2020-06-12 18:47     ` Kashyap Desai
2020-06-15  2:13       ` Ming Lei
2020-06-15  6:57         ` Kashyap Desai
2020-06-16  1:00           ` Ming Lei
2020-06-17 11:26             ` Kashyap Desai
2020-06-22  6:24               ` Hannes Reinecke
2020-06-23  0:55                 ` Ming Lei
2020-06-23 11:50                   ` Kashyap Desai
2020-06-23 12:11                   ` Kashyap Desai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1591810159-240929-5-git-send-email-john.garry@huawei.com \
    --to=john.garry@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=chenxiang66@hisilicon.com \
    --cc=don.brace@microsemi.com \
    --cc=esc.storagedev@microsemi.com \
    --cc=hare@suse.com \
    --cc=hch@lst.de \
    --cc=jejb@linux.ibm.com \
    --cc=kashyap.desai@broadcom.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=megaraidlinux.pdl@broadcom.com \
    --cc=ming.lei@redhat.com \
    --cc=shivasharan.srikanteshwara@broadcom.com \
    --cc=sumit.saxena@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.