linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] blk-mq: Support sharing tags across hardware queues
@ 2019-11-26 17:56 Bart Van Assche
  2019-11-26 17:56 ` [PATCH 1/3] blk-mq: Remove some unused function arguments Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bart Van Assche @ 2019-11-26 17:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

Although the block layer already supports sharing hardware queues across
request queues, it does not yet support sharing tags across hardware queues.
Some SCSI hardware needs this functionality because this is a good match for
how some SCSI HBA's work. This patch does not incur a performance overhead
for block drivers that do not share tags across hardware queues.

Note: my original plan was to post this patch series after the merge window
has closed. I'm posting this now to allow comparison with alternative
approaches.

Thanks,

Bart.

Bart Van Assche (2):
  blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into
    blk_mq_tags
  block: Add support for sharing tags across hardware queues

John Garry (1):
  blk-mq: Remove some unused function arguments

 block/blk-mq-debugfs.c | 42 ++++++++++++++++++++++++++++++++++++++----
 block/blk-mq-sched.c   |  8 ++++----
 block/blk-mq-sched.h   |  2 +-
 block/blk-mq-tag.c     | 19 +++++++++++--------
 block/blk-mq-tag.h     | 13 +++++++++++--
 block/blk-mq.c         | 38 +++++++++++++++++++++++++-------------
 block/blk-mq.h         |  2 +-
 include/linux/blk-mq.h | 10 ++++++----
 8 files changed, 97 insertions(+), 37 deletions(-)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] blk-mq: Remove some unused function arguments
  2019-11-26 17:56 [PATCH 0/3] blk-mq: Support sharing tags across hardware queues Bart Van Assche
@ 2019-11-26 17:56 ` Bart Van Assche
  2019-11-26 17:56 ` [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags Bart Van Assche
  2019-11-26 17:56 ` [PATCH 3/3] block: Add support for sharing tags across hardware queues Bart Van Assche
  2 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2019-11-26 17:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, John Garry,
	Christoph Hellwig, Ming Lei, Hannes Reinecke

From: John Garry <john.garry@huawei.com>

The hardware queue argument in blk_mq_put_tag(), blk_mq_poll_nsecs(), and
blk_mq_poll_hybrid_sleep() is unused, so remove it.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
[ bvanassche: edited patch description ]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c |  4 ++--
 block/blk-mq-tag.h |  3 +--
 block/blk-mq.c     | 10 ++++------
 block/blk-mq.h     |  2 +-
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index fbacde454718..586c9d6e904a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -183,8 +183,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	return tag + tag_offset;
 }
 
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-		    struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+		    unsigned int tag)
 {
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 15bc74acb57e..d0c10d043891 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,7 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-			   struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 323c9cb28066..fec4b82ff91c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -477,9 +477,9 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
 	if (rq->tag != -1)
-		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
@@ -3340,7 +3340,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 }
 
 static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
-				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
 	unsigned long ret = 0;
@@ -3373,7 +3372,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
-				     struct blk_mq_hw_ctx *hctx,
 				     struct request *rq)
 {
 	struct hrtimer_sleeper hs;
@@ -3393,7 +3391,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
-		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+		nsecs = blk_mq_poll_nsecs(q, rq);
 
 	if (!nsecs)
 		return false;
@@ -3448,7 +3446,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 			return false;
 	}
 
-	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
 /**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index eaaca8fc1c28..ba37cd64894c 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -199,7 +199,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags
  2019-11-26 17:56 [PATCH 0/3] blk-mq: Support sharing tags across hardware queues Bart Van Assche
  2019-11-26 17:56 ` [PATCH 1/3] blk-mq: Remove some unused function arguments Bart Van Assche
@ 2019-11-26 17:56 ` Bart Van Assche
  2019-11-27  0:43   ` Ming Lei
  2019-11-26 17:56 ` [PATCH 3/3] block: Add support for sharing tags across hardware queues Bart Van Assche
  2 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2019-11-26 17:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, John Garry

If each hardware queue has its own tag set it's fine to manage these
flags per hardware queue. Since the next patch will share tag sets across
hardware queues, move these flags into blk_mq_tags. This patch does not
change any functionality.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-debugfs.c |  2 --
 block/blk-mq-sched.c   |  8 ++++----
 block/blk-mq-sched.h   |  2 +-
 block/blk-mq-tag.c     |  8 ++++----
 block/blk-mq-tag.h     | 10 ++++++++++
 include/linux/blk-mq.h |  2 --
 6 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..3678e95ec947 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -211,8 +211,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 #define HCTX_STATE_NAME(name) [BLK_MQ_S_##name] = #name
 static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(STOPPED),
-	HCTX_STATE_NAME(TAG_ACTIVE),
-	HCTX_STATE_NAME(SCHED_RESTART),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..7d98b6513148 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -64,18 +64,18 @@ void blk_mq_sched_assign_ioc(struct request *rq)
  */
 void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
 {
-	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
+	if (test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state))
 		return;
 
-	set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	set_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
 
 void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
 {
-	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
+	if (!test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state))
 		return;
-	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	clear_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
 
 	blk_mq_run_hw_queue(hctx, true);
 }
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..15174a646468 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -82,7 +82,7 @@ static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
 
 static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
 {
-	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
+	return test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
 }
 
 #endif
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 586c9d6e904a..a60e1b4a8158 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -23,8 +23,8 @@
  */
 bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
-	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+	if (!test_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state) &&
+	    !test_and_set_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
 		atomic_inc(&hctx->tags->active_queues);
 
 	return true;
@@ -48,7 +48,7 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
 	struct blk_mq_tags *tags = hctx->tags;
 
-	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+	if (!test_and_clear_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
 		return;
 
 	atomic_dec(&tags->active_queues);
@@ -67,7 +67,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 
 	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
 		return true;
-	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+	if (!test_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
 		return true;
 
 	/*
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index d0c10d043891..f75fa936b090 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -4,6 +4,11 @@
 
 #include "blk-mq.h"
 
+enum {
+	BLK_MQ_T_ACTIVE		= 1,
+	BLK_MQ_T_SCHED_RESTART	= 2,
+};
+
 /*
  * Tag address space map.
  */
@@ -11,6 +16,11 @@ struct blk_mq_tags {
 	unsigned int nr_tags;
 	unsigned int nr_reserved_tags;
 
+	/**
+	 * @state: BLK_MQ_T_* flags. Defines the state of the hw
+	 * queue (active, scheduled to restart).
+	 */
+	unsigned long	state;
 	atomic_t active_queues;
 
 	struct sbitmap_queue bitmap_tags;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 11cfd6470b1a..522631d108af 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -394,8 +394,6 @@ enum {
 	BLK_MQ_F_ALLOC_POLICY_BITS = 1,
 
 	BLK_MQ_S_STOPPED	= 0,
-	BLK_MQ_S_TAG_ACTIVE	= 1,
-	BLK_MQ_S_SCHED_RESTART	= 2,
 
 	BLK_MQ_MAX_DEPTH	= 10240,
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] block: Add support for sharing tags across hardware queues
  2019-11-26 17:56 [PATCH 0/3] blk-mq: Support sharing tags across hardware queues Bart Van Assche
  2019-11-26 17:56 ` [PATCH 1/3] blk-mq: Remove some unused function arguments Bart Van Assche
  2019-11-26 17:56 ` [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags Bart Van Assche
@ 2019-11-26 17:56 ` Bart Van Assche
  2019-11-27  9:51   ` John Garry
  2 siblings, 1 reply; 6+ messages in thread
From: Bart Van Assche @ 2019-11-26 17:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, John Garry

Add a boolean member 'share_tags' in struct blk_mq_tag_set. If that member
variable is set, make all hctx->tags[] pointers identical. Implement the
necessary changes in the functions that allocate, free and resize tag sets.
Modify blk_mq_tagset_busy_iter() such that it continues to call the
callback function once per request. Modify blk_mq_queue_tag_busy_iter()
such that the callback function is only called with the correct hctx
as first argument. Modify the debugfs code such that it keeps showing only
matching tags per hctx.

This patch has been tested by running blktests on top of a kernel that
includes the following change to enable shared tags for all block drivers
except the NVMe drivers:

diff --git a/block/blk-mq.c b/block/blk-mq.c
@@ -3037,6 +3037,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)

        BUILD_BUG_ON(BLK_MQ_MAX_DEPTH > 1 << BLK_MQ_UNIQUE_TAG_BITS);

+       /* Test code: enable tag sharing for all block drivers except NVMe */
+       if (!set->ops->poll)
+               set->share_tags = true;
+
        if (!set->nr_hw_queues)
                return -EINVAL;
        if (!set->queue_depth)

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-debugfs.c | 40 ++++++++++++++++++++++++++++++++++++++--
 block/blk-mq-tag.c     |  7 +++++--
 block/blk-mq.c         | 28 +++++++++++++++++++++-------
 include/linux/blk-mq.h |  8 ++++++--
 4 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3678e95ec947..653e80ede3bd 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -472,20 +472,56 @@ static int hctx_tags_show(void *data, struct seq_file *m)
 	return res;
 }
 
+struct hctx_sb_data {
+	struct sbitmap		*sb;	/* output bitmap */
+	struct blk_mq_hw_ctx	*hctx;	/* input hctx */
+};
+
+static bool hctx_filter_fn(struct blk_mq_hw_ctx *hctx, struct request *req,
+			   void *priv, bool reserved)
+{
+	struct hctx_sb_data *hctx_sb_data = priv;
+
+	if (hctx == hctx_sb_data->hctx)
+		sbitmap_set_bit(hctx_sb_data->sb, req->tag);
+	return true;
+}
+
+static void hctx_filter_sb(struct sbitmap *sb, struct blk_mq_hw_ctx *hctx)
+{
+	struct hctx_sb_data hctx_sb_data = { .sb = sb, .hctx = hctx };
+
+	blk_mq_queue_tag_busy_iter(hctx->queue, hctx_filter_fn, &hctx_sb_data);
+}
+
 static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
 	struct request_queue *q = hctx->queue;
+	struct sbitmap sb, *hctx_sb;
 	int res;
 
+	if (!hctx->tags)
+		return 0;
+	hctx_sb = &hctx->tags->bitmap_tags.sb;
+	res = sbitmap_init_node(&sb, hctx_sb->depth, hctx_sb->shift, GFP_KERNEL,
+				NUMA_NO_NODE);
+	if (res)
+		return res;
+
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
+	/*
+	 * If tags are shared across hardware queues, hctx_sb contains tags
+	 * for multiple hardware queues. Filter the tags for 'hctx' into 'sb'.
+	 */
+	hctx_filter_sb(&sb, hctx);
 	mutex_unlock(&q->sysfs_lock);
+	sbitmap_bitmap_show(&sb, m);
 
 out:
+	sbitmap_free(&sb);
 	return res;
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a60e1b4a8158..770fe2324230 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -341,8 +341,11 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 	int i;
 
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
-		if (tagset->tags && tagset->tags[i])
+		if (tagset->tags && tagset->tags[i]) {
 			blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);
+			if (tagset->share_tags)
+				break;
+		}
 	}
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fec4b82ff91c..fa4cfc4b7e7c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2404,6 +2404,12 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 {
 	int ret = 0;
 
+	if (hctx_idx > 0 && set->share_tags) {
+		WARN_ON_ONCE(!set->tags[0]);
+		set->tags[hctx_idx] = set->tags[0];
+		return 0;
+	}
+
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
 					set->queue_depth, set->reserved_tags);
 	if (!set->tags[hctx_idx])
@@ -2423,8 +2429,10 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 					 unsigned int hctx_idx)
 {
 	if (set->tags && set->tags[hctx_idx]) {
-		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
-		blk_mq_free_rq_map(set->tags[hctx_idx]);
+		if (hctx_idx == 0 || !set->share_tags) {
+			blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
+			blk_mq_free_rq_map(set->tags[hctx_idx]);
+		}
 		set->tags[hctx_idx] = NULL;
 	}
 }
@@ -2568,7 +2576,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 
 	mutex_lock(&set->tag_list_lock);
 	list_del_rcu(&q->tag_set_list);
-	if (list_is_singular(&set->tag_list)) {
+	if (list_is_singular(&set->tag_list) && !set->share_tags) {
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_SHARED;
 		/* update existing queue */
@@ -2586,7 +2594,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	/*
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
-	if (!list_empty(&set->tag_list) &&
+	if ((!list_empty(&set->tag_list) || set->share_tags) &&
 	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
 		set->flags |= BLK_MQ_F_TAG_SHARED;
 		/* update existing queue */
@@ -2911,15 +2919,21 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 {
 	int i;
 
-	for (i = 0; i < set->nr_hw_queues; i++)
-		if (!__blk_mq_alloc_rq_map(set, i))
+	for (i = 0; i < set->nr_hw_queues; i++) {
+		if (i > 0 && set->share_tags) {
+			set->tags[i] = set->tags[0];
+		} else if (!__blk_mq_alloc_rq_map(set, i))
 			goto out_unwind;
+	}
 
 	return 0;
 
 out_unwind:
-	while (--i >= 0)
+	while (--i >= 0) {
+		if (i > 0 && set->share_tags)
+			continue;
 		blk_mq_free_rq_map(set->tags[i]);
+	}
 
 	return -ENOMEM;
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 522631d108af..dd5517476314 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -224,10 +224,13 @@ enum hctx_type {
  * @numa_node:	   NUMA node the storage adapter has been connected to.
  * @timeout:	   Request processing timeout in jiffies.
  * @flags:	   Zero or more BLK_MQ_F_* flags.
+ * @share_tags:	   Whether or not to share one tag set across hardware queues.
  * @driver_data:   Pointer to data owned by the block driver that created this
  *		   tag set.
- * @tags:	   Tag sets. One tag set per hardware queue. Has @nr_hw_queues
- *		   elements.
+ * @tags:	   Array of tag set pointers. Has @nr_hw_queues elements. If
+ *		   share_tags has not been set, all tag set pointers are
+ *		   different. If share_tags has been set, all tag_set pointers
+ *		   are identical.
  * @tag_list_lock: Serializes tag_list accesses.
  * @tag_list:	   List of the request queues that use this tag set. See also
  *		   request_queue.tag_set_list.
@@ -243,6 +246,7 @@ struct blk_mq_tag_set {
 	int			numa_node;
 	unsigned int		timeout;
 	unsigned int		flags;
+	bool			share_tags;
 	void			*driver_data;
 
 	struct blk_mq_tags	**tags;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags
  2019-11-26 17:56 ` [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags Bart Van Assche
@ 2019-11-27  0:43   ` Ming Lei
  0 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-11-27  0:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, John Garry

On Tue, Nov 26, 2019 at 09:56:55AM -0800, Bart Van Assche wrote:
> If each hardware queue has its own tag set it's fine to manage these
> flags per hardware queue. Since the next patch will share tag sets across
> hardware queues, move these flags into blk_mq_tags. This patch does not
> change any functionality.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: John Garry <john.garry@huawei.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-debugfs.c |  2 --
>  block/blk-mq-sched.c   |  8 ++++----
>  block/blk-mq-sched.h   |  2 +-
>  block/blk-mq-tag.c     |  8 ++++----
>  block/blk-mq-tag.h     | 10 ++++++++++
>  include/linux/blk-mq.h |  2 --
>  6 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index b3f2ba483992..3678e95ec947 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -211,8 +211,6 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
>  #define HCTX_STATE_NAME(name) [BLK_MQ_S_##name] = #name
>  static const char *const hctx_state_name[] = {
>  	HCTX_STATE_NAME(STOPPED),
> -	HCTX_STATE_NAME(TAG_ACTIVE),
> -	HCTX_STATE_NAME(SCHED_RESTART),
>  };
>  #undef HCTX_STATE_NAME
>  
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..7d98b6513148 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -64,18 +64,18 @@ void blk_mq_sched_assign_ioc(struct request *rq)
>   */
>  void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  {
> -	if (test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
> +	if (test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state))
>  		return;
>  
> -	set_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
> +	set_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_sched_mark_restart_hctx);
>  
>  void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
>  {
> -	if (!test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state))
> +	if (!test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state))
>  		return;
> -	clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
> +	clear_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
>  
>  	blk_mq_run_hw_queue(hctx, true);
>  }

RESTART is supposed for restarting the hctx of this request queue,
instead of the tags of host-wide, which is covered by blk_mq_mark_tag_wait().

> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
> index 126021fc3a11..15174a646468 100644
> --- a/block/blk-mq-sched.h
> +++ b/block/blk-mq-sched.h
> @@ -82,7 +82,7 @@ static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
>  
>  static inline bool blk_mq_sched_needs_restart(struct blk_mq_hw_ctx *hctx)
>  {
> -	return test_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
> +	return test_bit(BLK_MQ_T_SCHED_RESTART, &hctx->tags->state);
>  }
>  
>  #endif
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 586c9d6e904a..a60e1b4a8158 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -23,8 +23,8 @@
>   */
>  bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
> -	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
> -	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +	if (!test_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state) &&
> +	    !test_and_set_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
>  		atomic_inc(&hctx->tags->active_queues);

The above is wrong.

With this change, tags->active_queues may become just 1, and the
variable is supposed to represent number of active LUNs using this
shared tags.

That is said the flag of BLK_MQ_T_ACTIVE is really per-hctx instead of
per-tags.

>  
>  	return true;
> @@ -48,7 +48,7 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct blk_mq_tags *tags = hctx->tags;
>  
> -	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +	if (!test_and_clear_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
>  		return;
>  
>  	atomic_dec(&tags->active_queues);
> @@ -67,7 +67,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>  
>  	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
>  		return true;
> -	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> +	if (!test_bit(BLK_MQ_T_ACTIVE, &hctx->tags->state))
>  		return true;
>  
>  	/*
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index d0c10d043891..f75fa936b090 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -4,6 +4,11 @@
>  
>  #include "blk-mq.h"
>  
> +enum {
> +	BLK_MQ_T_ACTIVE		= 1,
> +	BLK_MQ_T_SCHED_RESTART	= 2,
> +};
> +
>  /*
>   * Tag address space map.
>   */
> @@ -11,6 +16,11 @@ struct blk_mq_tags {
>  	unsigned int nr_tags;
>  	unsigned int nr_reserved_tags;
>  
> +	/**
> +	 * @state: BLK_MQ_T_* flags. Defines the state of the hw
> +	 * queue (active, scheduled to restart).
> +	 */
> +	unsigned long	state;

It isn't unusual for SCSI HBA to see hundreds of LUNs, and this patch
will make .state shared for these LUNs, and read/write concurrently from
IO path on all these queues, and performance should hurt much in this way
given all related helpers are used in hot path.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 3/3] block: Add support for sharing tags across hardware queues
  2019-11-26 17:56 ` [PATCH 3/3] block: Add support for sharing tags across hardware queues Bart Van Assche
@ 2019-11-27  9:51   ` John Garry
  0 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2019-11-27  9:51 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Christoph Hellwig, Ming Lei,
	Hannes Reinecke

On 26/11/2019 17:56, Bart Van Assche wrote:
> Add a boolean member 'share_tags' in struct blk_mq_tag_set. If that member
> variable is set, make all hctx->tags[] pointers identical. Implement the
> necessary changes in the functions that allocate, free and resize tag sets.
> Modify blk_mq_tagset_busy_iter() such that it continues to call the
> callback function once per request. Modify blk_mq_queue_tag_busy_iter()
> such that the callback function is only called with the correct hctx
> as first argument. Modify the debugfs code such that it keeps showing only
> matching tags per hctx.
> 

Hi Bart,

 > This patch has been tested by running blktests on top of a kernel that
 > includes the following change to enable shared tags for all block drivers
 > except the NVMe drivers:

Could something be broken here with this approach, see ***:

static int
nvme_init_request(struct blk_mq_tag_set *set, struct request *req,
unsigned int hctx_idx, unsigned int numa_node)
{
	struct nvme_dev *dev = set->driver_data;
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	int queue_idx = (set == &dev->tagset) ? hctx_idx + 1 : 0;
	struct nvme_queue *nvmeq = &dev->queues[queue_idx];

	BUG_ON(!nvmeq);
	iod->nvmeq = nvmeq; ***

	nvme_req(req)->ctrl = &dev->ctrl;
	return 0;
}

All iods are from hctx0, but could use different hctx's and nvme queues.

Obviously NVMe would not want shared tags, but I am just trying to 
illustrate a potential problem in how requests are associated with 
queues. I haven't audited all users.

Thanks,
John

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-11-27  9:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-26 17:56 [PATCH 0/3] blk-mq: Support sharing tags across hardware queues Bart Van Assche
2019-11-26 17:56 ` [PATCH 1/3] blk-mq: Remove some unused function arguments Bart Van Assche
2019-11-26 17:56 ` [PATCH 2/3] blk-mq: Move the TAG_ACTIVE and SCHED_RESTART flags from hctx into blk_mq_tags Bart Van Assche
2019-11-27  0:43   ` Ming Lei
2019-11-26 17:56 ` [PATCH 3/3] block: Add support for sharing tags across hardware queues Bart Van Assche
2019-11-27  9:51   ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).