* [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2019-11-26  9:14 Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 1/8] blk-mq: Remove some unused function arguments Hannes Reinecke
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

Hi all,

here now is an updated version of the v2 patchset from John Garry,
including the suggestions and reviews from the mailing list.
John, apologies for hijacking your work :-)

The main difference is that I've changed the bitmaps to be allocated
separately in all cases, and just set the pointer to the shared bitmap
for the hostwide tags case.
I've also modified smartpqi and hpsa to take advantage of host_tags.
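
To illustrate the bitmap change: in the hostwide case, every hctx's
tags simply end up pointing at the one per-tagset bitmap (simplified
sketch of what patch 4 does, not the literal code):

	for (i = 0; i < set->nr_hw_queues; i++) {
		set->tags[i]->bitmap_tags = &set->shared_bitmap_tags;
		set->tags[i]->breserved_tags = &set->shared_breserved_tags;
	}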

I did audit the iterators, and I _think_ they do the correct thing even
in the shared bitmap case. But then I might have overlooked things,
so feedback and reviews are welcome.

The one thing I'm not happy with is the debugfs interface; with shared
bitmaps all hctx entries will display essentially the same information,
which could be moved to a top-level directory. But that would change the
layout and I'm not sure that buys us anything.

Differences to v2:
- Drop embedded tag bitmaps
- Do not share scheduling tags
- Add patches for hpsa and smartpqi

Differences to v1:
- Use a shared sbitmap, and not separate shared tags (a big change!)
	- Drop request.shared_tag
- Add RB tags

Hannes Reinecke (4):
  blk-mq: Use a pointer for sbitmap
  scsi: Add template flag 'host_tagset'
  smartpqi: enable host tagset
  hpsa: switch to using blk-mq

John Garry (3):
  blk-mq: Remove some unused function arguments
  blk-mq: Facilitate a shared sbitmap per tagset
  scsi: hisi_sas: Switch v3 hw to MQ

Ming Lei (1):
  blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED

 block/bfq-iosched.c                    |   4 +-
 block/blk-mq-debugfs.c                 |  10 ++--
 block/blk-mq-sched.c                   |   8 ++-
 block/blk-mq-tag.c                     | 104 ++++++++++++++++++++++-----------
 block/blk-mq-tag.h                     |  18 +++---
 block/blk-mq.c                         |  80 ++++++++++++++++---------
 block/blk-mq.h                         |   9 ++-
 block/kyber-iosched.c                  |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h       |   3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  |  36 ++++++------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  86 +++++++++++----------------
 drivers/scsi/hpsa.c                    |  44 +++-----------
 drivers/scsi/hpsa.h                    |   1 -
 drivers/scsi/scsi_lib.c                |   2 +
 drivers/scsi/smartpqi/smartpqi_init.c  |  38 ++++++++----
 include/linux/blk-mq.h                 |   9 ++-
 include/scsi/scsi_host.h               |   3 +
 17 files changed, 260 insertions(+), 199 deletions(-)

-- 
2.16.4



* [PATCH 1/8] blk-mq: Remove some unused function arguments
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 2/8] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block

From: John Garry <john.garry@huawei.com>

The struct blk_mq_hw_ctx * argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c |  4 ++--
 block/blk-mq-tag.h |  4 ++--
 block/blk-mq.c     | 10 ++++------
 block/blk-mq.h     |  2 +-
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 008388e82b5c..53b4a9414fbd 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -191,8 +191,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	return tag + tag_offset;
 }
 
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-		    struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+		    unsigned int tag)
 {
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..66d04dea0bdb 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,8 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-			   struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+				unsigned int tag);
 extern bool blk_mq_has_free_tags(struct blk_mq_tags *tags);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6e3b15f70cd7..16aa20d23b67 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -499,9 +499,9 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
 	if (rq->tag != -1)
-		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
@@ -3354,7 +3354,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 }
 
 static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
-				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
 	unsigned long ret = 0;
@@ -3387,7 +3386,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
-				     struct blk_mq_hw_ctx *hctx,
 				     struct request *rq)
 {
 	struct hrtimer_sleeper hs;
@@ -3407,7 +3405,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
-		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+		nsecs = blk_mq_poll_nsecs(q, rq);
 
 	if (!nsecs)
 		return false;
@@ -3462,7 +3460,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 			return false;
 	}
 
-	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
 /**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..78d38b5f2793 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -208,7 +208,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-- 
2.16.4



* [PATCH 2/8] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 1/8] blk-mq: Remove some unused function arguments Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 3/8] blk-mq: Use a pointer for sbitmap Hannes Reinecke
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block

From: Ming Lei <ming.lei@redhat.com>

BLK_MQ_F_TAG_SHARED actually means that the tag set is shared among
request queues, all of which should belong to LUNs attached to the same
HBA.

So rename it to make this explicit.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-debugfs.c |  2 +-
 block/blk-mq-tag.c     |  2 +-
 block/blk-mq-tag.h     |  4 ++--
 block/blk-mq.c         | 20 ++++++++++----------
 include/linux/blk-mq.h |  2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..33a40ae1d60f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -236,7 +236,7 @@ static const char *const alloc_policy_name[] = {
 #define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
-	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 53b4a9414fbd..d7aa23c82dbf 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -73,7 +73,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 66d04dea0bdb..6c0f7c9ce9f6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -55,7 +55,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
 	return __blk_mq_tag_busy(hctx);
@@ -63,7 +63,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return;
 
 	__blk_mq_tag_idle(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16aa20d23b67..6b39cf0efdcd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -302,7 +302,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		rq->tag = -1;
 		rq->internal_tag = tag;
 	} else {
-		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
+		if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
 			rq_flags = RQF_MQ_INFLIGHT;
 			atomic_inc(&data->hctx->nr_active);
 		}
@@ -1118,7 +1118,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -1249,7 +1249,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				 * For non-shared tags, the RESTART check
 				 * will suffice.
 				 */
-				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+				if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 					no_tag = true;
 				break;
 			}
@@ -2358,7 +2358,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
 	hctx->queue = q;
-	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
 
 	INIT_LIST_HEAD(&hctx->hctx_list);
 
@@ -2575,9 +2575,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared)
-			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		else
-			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 	}
 }
 
@@ -2603,7 +2603,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
-		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, false);
 	}
@@ -2620,12 +2620,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
 	if (!list_empty(&set->tag_list) &&
-	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
-		set->flags |= BLK_MQ_F_TAG_SHARED;
+	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, true);
 	}
-	if (set->flags & BLK_MQ_F_TAG_SHARED)
+	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
 	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0bf056de5cc3..147185394a25 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -225,7 +225,7 @@ struct blk_mq_ops {
 
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
-	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.16.4



* [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 1/8] blk-mq: Remove some unused function arguments Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 2/8] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26 15:14   ` Jens Axboe
  2019-11-26  9:14 ` [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

Instead of allocating the tag bitmaps inside struct blk_mq_tags, use
pointers. This is in preparation for shared host-wide bitmaps.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/bfq-iosched.c    |  4 +--
 block/blk-mq-debugfs.c |  8 +++---
 block/blk-mq-tag.c     | 68 +++++++++++++++++++++++++++++++-------------------
 block/blk-mq-tag.h     |  4 +--
 block/blk-mq.c         |  5 ++--
 block/kyber-iosched.c  |  4 +--
 6 files changed, 54 insertions(+), 39 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0319d6339822..ca89d0c34994 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 33a40ae1d60f..fca87470e267 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -449,11 +449,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 		   atomic_read(&tags->active_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
-	sbitmap_queue_show(&tags->bitmap_tags, m);
+	sbitmap_queue_show(tags->bitmap_tags, m);
 
 	if (tags->nr_reserved_tags) {
 		seq_puts(m, "\nbreserved_tags:\n");
-		sbitmap_queue_show(&tags->breserved_tags, m);
+		sbitmap_queue_show(tags->breserved_tags, m);
 	}
 }
 
@@ -484,7 +484,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
@@ -518,7 +518,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->sched_tags)
-		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d7aa23c82dbf..332d71cd3976 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -20,7 +20,7 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
 	if (!tags)
 		return true;
 
-	return sbitmap_any_bit_clear(&tags->bitmap_tags.sb);
+	return sbitmap_any_bit_clear(&tags->bitmap_tags->sb);
 }
 
 /*
@@ -43,9 +43,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->bitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->breserved_tags);
 }
 
 /*
@@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_TAG_FAIL;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -170,9 +170,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->breserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->bitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -198,10 +198,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -329,8 +329,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, tags->breserved_tags, fn, priv, true);
+	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, false);
 }
 
 /**
@@ -427,8 +427,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -440,24 +440,34 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
 				       node);
 }
 
-static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-						   int node, int alloc_policy)
+static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
+				   int node, int alloc_policy)
 {
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
-	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
+	tags->bitmap_tags = kzalloc(sizeof(struct sbitmap_queue), GFP_KERNEL);
+	if (!tags->bitmap_tags)
+		return -ENOMEM;
+	if (bt_alloc(tags->bitmap_tags, depth, round_robin, node))
 		goto free_tags;
-	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
-		     node))
+
+	tags->breserved_tags = kzalloc(sizeof(struct sbitmap_queue),
+				       GFP_KERNEL);
+	if (!tags->breserved_tags)
 		goto free_bitmap_tags;
+	if (bt_alloc(tags->breserved_tags, tags->nr_reserved_tags,
+		     round_robin, node))
+		goto free_reserved_tags;
 
-	return tags;
+	return 0;
+free_reserved_tags:
+	kfree(tags->breserved_tags);
 free_bitmap_tags:
-	sbitmap_queue_free(&tags->bitmap_tags);
+	sbitmap_queue_free(tags->bitmap_tags);
 free_tags:
-	kfree(tags);
-	return NULL;
+	kfree(tags->bitmap_tags);
+	return -ENOMEM;
 }
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
@@ -478,13 +488,19 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
-	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
+	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+		kfree(tags);
+		tags = NULL;
+	}
+	return tags;
 }
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	sbitmap_queue_free(&tags->bitmap_tags);
-	sbitmap_queue_free(&tags->breserved_tags);
+	sbitmap_queue_free(tags->bitmap_tags);
+	kfree(tags->bitmap_tags);
+	sbitmap_queue_free(tags->breserved_tags);
+	kfree(tags->breserved_tags);
 	kfree(tags);
 }
 
@@ -534,7 +550,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
+		sbitmap_queue_resize(tags->bitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 6c0f7c9ce9f6..10b66fd4664a 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,8 +13,8 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
-	struct sbitmap_queue bitmap_tags;
-	struct sbitmap_queue breserved_tags;
+	struct sbitmap_queue *bitmap_tags;
+	struct sbitmap_queue *breserved_tags;
 
 	struct request **rqs;
 	struct request **static_rqs;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b39cf0efdcd..f0c40fcbd8ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1095,7 +1095,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = &hctx->tags->bitmap_tags;
+		sbq = hctx->tags->bitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1113,7 +1113,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -2081,7 +2081,6 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 	tags->rqs = NULL;
 	kfree(tags->static_rqs);
 	tags->static_rqs = NULL;
-
 	blk_mq_free_tags(tags);
 }
 
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0ef637..a7a537501d70 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
-	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+	sbitmap_queue_min_shallow_depth(hctx->sched_tags->bitmap_tags,
 					kqd->async_depth);
 
 	return 0;
-- 
2.16.4



* [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (2 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 3/8] blk-mq: Use a pointer for sbitmap Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26 11:05   ` Ming Lei
  2019-11-26  9:14 ` [PATCH 5/8] scsi: Add template flag 'host_tagset' Hannes Reinecke
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

From: John Garry <john.garry@huawei.com>

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ...) support
multiple reply queues with a single hostwide tag space.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completions to not be serviced when an
interrupt is shut down.

To solve that problem, Ming's patchset to drain hctx's should ensure no
IOs are missed in-flight [1].

However, to take advantage of that patchset, we need to map the HBA HW
queues to blk-mq hctx's; to do that, we need to expose the HBA HW queues.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However, another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
host busy counter, which would prevent the LLDD from being sent more than
.can_queue commands; however, we should still ensure that the block layer
does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
tags and requests per hctx; the reason for this is that if we only allocate
tags and requests for a single hctx - like hctx0 - we may break block
drivers which expect a request be associated with a specific hctx, i.e.
not hctx0.

This is based on work originally from Ming Lei in [3] and from Bart's
suggestion in [4].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
[3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
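
A driver opts in by setting the new flag before allocating its tagset;
roughly (sketch only; patch 5 adds exactly this for SCSI hosts):

	set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
	ret = blk_mq_alloc_tag_set(set);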

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-sched.c   |  8 ++++++--
 block/blk-mq-tag.c     | 36 +++++++++++++++++++++++++++++-------
 block/blk-mq-tag.h     |  6 +++++-
 block/blk-mq.c         | 45 ++++++++++++++++++++++++++++++++++++---------
 block/blk-mq.h         |  7 ++++++-
 include/linux/blk-mq.h |  7 +++++++
 6 files changed, 89 insertions(+), 20 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..f3589f42b96d 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 {
 	if (hctx->sched_tags) {
 		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
-		blk_mq_free_rq_map(hctx->sched_tags);
+		blk_mq_free_rq_map(hctx->sched_tags, false);
 		hctx->sched_tags = NULL;
 	}
 }
@@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 				   unsigned int hctx_idx)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
+	int flags = set->flags;
 	int ret;
 
+	/* Scheduler tags are never shared */
+	set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
 					       set->reserved_tags);
+	set->flags = flags;
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
@@ -484,7 +488,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (hctx->sched_tags) {
-			blk_mq_free_rq_map(hctx->sched_tags);
+			blk_mq_free_rq_map(hctx->sched_tags, false);
 			hctx->sched_tags = NULL;
 		}
 	}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 332d71cd3976..53b4c4c53c6a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -228,7 +228,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -470,7 +470,27 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->shared_bitmap_tags);
+	return false;
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
 {
@@ -488,9 +508,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
-	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
-		kfree(tags);
-		tags = NULL;
+	if (!blk_mq_is_sbitmap_shared(set)) {
+		if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+			kfree(tags);
+			tags = NULL;
+		}
 	}
 	return tags;
 }
@@ -538,12 +560,12 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 			return -ENOMEM;
 		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
 		if (ret) {
-			blk_mq_free_rq_map(new);
+			blk_mq_free_rq_map(new, blk_mq_is_sbitmap_shared(set));
 			return -ENOMEM;
 		}
 
 		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
-		blk_mq_free_rq_map(*tagsptr);
+		blk_mq_free_rq_map(*tagsptr, blk_mq_is_sbitmap_shared(set));
 		*tagsptr = new;
 	} else {
 		/*
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 10b66fd4664a..279a861c7e58 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -22,7 +22,11 @@ struct blk_mq_tags {
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+					    unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f0c40fcbd8ae..db87b0f57dbe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2075,13 +2075,14 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	}
 }
 
-void blk_mq_free_rq_map(struct blk_mq_tags *tags)
+void blk_mq_free_rq_map(struct blk_mq_tags *tags, bool shared)
 {
 	kfree(tags->rqs);
 	tags->rqs = NULL;
 	kfree(tags->static_rqs);
 	tags->static_rqs = NULL;
-	blk_mq_free_tags(tags);
+	if (!shared)
+		blk_mq_free_tags(tags);
 }
 
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
@@ -2096,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
@@ -2105,7 +2106,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 				 node);
 	if (!tags->rqs) {
-		blk_mq_free_tags(tags);
+		if (!blk_mq_is_sbitmap_shared(set))
+			blk_mq_free_tags(tags);
 		return NULL;
 	}
 
@@ -2114,7 +2116,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					node);
 	if (!tags->static_rqs) {
 		kfree(tags->rqs);
-		blk_mq_free_tags(tags);
+		if (!blk_mq_is_sbitmap_shared(set))
+			blk_mq_free_tags(tags);
 		return NULL;
 	}
 
@@ -2446,7 +2449,7 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 	if (!ret)
 		return true;
 
-	blk_mq_free_rq_map(set->tags[hctx_idx]);
+	blk_mq_free_rq_map(set->tags[hctx_idx], blk_mq_is_sbitmap_shared(set));
 	set->tags[hctx_idx] = NULL;
 	return false;
 }
@@ -2456,7 +2459,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 {
 	if (set->tags && set->tags[hctx_idx]) {
 		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
-		blk_mq_free_rq_map(set->tags[hctx_idx]);
+		blk_mq_free_rq_map(set->tags[hctx_idx],
+				   blk_mq_is_sbitmap_shared(set));
 		set->tags[hctx_idx] = NULL;
 	}
 }
@@ -2954,7 +2958,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 
 out_unwind:
 	while (--i >= 0)
-		blk_mq_free_rq_map(set->tags[i]);
+		blk_mq_free_rq_map(set->tags[i], blk_mq_is_sbitmap_shared(set));
 
 	return -ENOMEM;
 }
@@ -3099,6 +3103,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set)) {
+		if (!blk_mq_init_shared_sbitmap(set)) {
+			ret = -ENOMEM;
+			goto out_free_mq_map;
+		}
+
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			struct blk_mq_tags *tags = set->tags[i];
+
+			tags->bitmap_tags = &set->shared_bitmap_tags;
+			tags->breserved_tags = &set->shared_breserved_tags;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
@@ -3168,8 +3186,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
 
-	if (!ret)
+	if (!ret) {
+		if (blk_mq_is_sbitmap_shared(set)) {
+			sbitmap_queue_resize(&set->shared_bitmap_tags, nr);
+			sbitmap_queue_resize(&set->shared_breserved_tags, nr);
+		}
 		q->nr_requests = nr;
+	}
+	/*
+	 * if ret != 0, q->nr_requests would not be updated, yet the depth
+	 * for some hctx may have changed - is that right?
+	 */
 
 	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 78d38b5f2793..c4b8213dfdfc 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -53,7 +53,7 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
-void blk_mq_free_rq_map(struct blk_mq_tags *tags);
+void blk_mq_free_rq_map(struct blk_mq_tags *tags, bool shared);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
@@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 147185394a25..670e9a949d32 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
 	unsigned int		flags;		/* BLK_MQ_F_* */
 	void			*driver_data;
 
+	struct sbitmap_queue shared_bitmap_tags;
+	struct sbitmap_queue shared_breserved_tags;
+
+	struct sbitmap_queue sched_shared_bitmap_tags;
+	struct sbitmap_queue sched_shared_breserved_tags;
+
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -226,6 +232,7 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.16.4



* [PATCH 5/8] scsi: Add template flag 'host_tagset'
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (3 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 6/8] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' so that a hostwide tagset can be
shared across multiple reply queues once the SCSI host's reply queues
are converted to blk-mq hw queues.
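
An LLDD then opts in by setting the flag in its template, as the later
patches do for hisi_sas, smartpqi and hpsa; e.g. ('my_sht' is just a
placeholder name):

	static struct scsi_host_template my_sht = {
		/* ... */
		.host_tagset	= 1,	/* use the hostwide tag space */
	};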

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2563b061f56b..d7ad7b99bc05 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1899,6 +1899,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f577647bf5f2..4fd0af0883dd 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -429,6 +429,9 @@ struct scsi_host_template {
 	/* True if the low-level driver supports blk-mq only */
 	unsigned force_blk_mq:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.16.4



* [PATCH 6/8] scsi: hisi_sas: Switch v3 hw to MQ
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (4 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 5/8] scsi: Add template flag 'host_tagset' Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 7/8] smartpqi: enable host tagset Hannes Reinecke
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block

From: John Garry <john.garry@huawei.com>

Now that the block layer provides shared tags, we can switch the driver
to expose all HW queues.
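
The key change is that the delivery queue is now derived from the
request's unique tag instead of the per-CPU reply map; in essence (see
the hisi_sas_task_prep() hunk below):

	u32 blk_tag = blk_mq_unique_tag(scmd->request);
	unsigned int dq_index = blk_mq_unique_tag_to_hwq(blk_tag);

	dq = &hisi_hba->dq[dq_index];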

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 +++++++-------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 ++++++++++++++--------------------
 3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 233c73e01246..0405602df2a4 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
 #define _HISI_SAS_H_
 
 #include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
@@ -431,7 +433,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* bist */
 	enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 03588ec3c394..e185935c3399 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -421,6 +421,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
 	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+	struct scsi_cmnd *scmd = NULL;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -436,10 +437,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	if (task->uldd_task) {
+		struct ata_queued_cmd *qc;
 
+		if (dev_is_sata(device)) {
+			qc = task->uldd_task;
+			scmd = qc->scsicmd;
+		} else {
+			scmd = task->uldd_task;
+		}
+	}
+
+	if (scmd) {
+		unsigned int dq_index;
+		u32 blk_tag;
+
+		blk_tag = blk_mq_unique_tag(scmd->request);
+		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
 		*dq_pointer = dq = sas_dev->dq;
@@ -468,21 +482,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
+	else
+		rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
 
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
-		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index bf5d5f138437..e7b015d88968 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2353,66 +2353,35 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 {
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+	int vectors;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	struct irq_affinity desc = {
+		.pre_vectors = BASE_VECTORS_V3_HW,
+	};
+
+	min_msi = MIN_AFFINE_VECTORS_V3_HW;
+	vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+						 min_msi, max_msi,
+						 PCI_IRQ_MSI |
+						 PCI_IRQ_AFFINITY,
+						 &desc);
+	if (vectors < 0)
+		return -ENOENT;
 
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
 
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
+	return 0;
 }
 
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc, i;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
-	if (auto_affine_msi_experimental) {
-		struct irq_affinity desc = {
-			.pre_vectors = BASE_VECTORS_V3_HW,
-		};
-
-		min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
-		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
-							 min_msi, max_msi,
-							 PCI_IRQ_MSI |
-							 PCI_IRQ_AFFINITY,
-							 &desc);
-		if (vectors < 0)
-			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
-	} else {
-		min_msi = max_msi;
-		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
-						max_msi, PCI_IRQ_MSI);
-		if (vectors < 0)
-			return vectors;
-	}
-
-	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	int rc, i;
 
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
@@ -3057,6 +3026,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
 	return 0;
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+					     BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -3065,6 +3043,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
+	.map_queues		= hisi_sas_map_queues,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
@@ -3078,6 +3057,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3249,6 +3229,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	rc = interrupt_preinit_v3_hw(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+	dev_err(dev, "%d hw queues\n", shost->nr_hw_queues);
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.16.4



* [PATCH 7/8] smartpqi: enable host tagset
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (5 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 6/8] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26  9:14 ` [PATCH 8/8] hpsa: switch to using blk-mq Hannes Reinecke
  2019-11-26 10:09 ` [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

Enable host tagset for smartpqi; with this we can use the request tag
to look up the command from the pool, avoiding the list iteration in
the hot path.
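
In essence, the hot-path allocation becomes a direct index into the
pool (see the pqi_alloc_io_request() hunk below), with the first
PQI_RESERVED_IO_SLOTS entries kept for internal commands:

	u32 blk_tag = blk_mq_unique_tag(scmd->request);
	u16 i = blk_mq_unique_tag_to_tag(blk_tag) + PQI_RESERVED_IO_SLOTS;

	io_request = &ctrl_info->io_request_pool[i];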

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/smartpqi/smartpqi_init.c | 38 ++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 7b7ef3acb504..c17b533c84ad 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -575,17 +575,29 @@ static inline void pqi_reinit_io_request(struct pqi_io_request *io_request)
 }
 
 static struct pqi_io_request *pqi_alloc_io_request(
-	struct pqi_ctrl_info *ctrl_info)
+	struct pqi_ctrl_info *ctrl_info, struct scsi_cmnd *scmd)
 {
 	struct pqi_io_request *io_request;
+	unsigned int limit = PQI_RESERVED_IO_SLOTS;
 	u16 i = ctrl_info->next_io_request_slot;	/* benignly racy */
 
-	while (1) {
+	if (scmd) {
+		u32 blk_tag = blk_mq_unique_tag(scmd->request);
+
+		i = blk_mq_unique_tag_to_tag(blk_tag) + limit;
 		io_request = &ctrl_info->io_request_pool[i];
-		if (atomic_inc_return(&io_request->refcount) == 1)
-			break;
-		atomic_dec(&io_request->refcount);
-		i = (i + 1) % ctrl_info->max_io_slots;
+		if (WARN_ON(atomic_inc_return(&io_request->refcount) > 1)) {
+			atomic_dec(&io_request->refcount);
+			return NULL;
+		}
+	} else {
+		while (1) {
+			io_request = &ctrl_info->io_request_pool[i];
+			if (atomic_inc_return(&io_request->refcount) == 1)
+				break;
+			atomic_dec(&io_request->refcount);
+			i = (i + 1) % limit;
+		}
 	}
 
 	/* benignly racy */
@@ -4075,7 +4087,7 @@ static int pqi_submit_raid_request_synchronous(struct pqi_ctrl_info *ctrl_info,
 
 	atomic_inc(&ctrl_info->sync_cmds_outstanding);
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 
 	put_unaligned_le16(io_request->index,
 		&(((struct pqi_raid_path_request *)request)->request_id));
@@ -5032,7 +5044,9 @@ static inline int pqi_raid_submit_scsi_cmd(struct pqi_ctrl_info *ctrl_info,
 {
 	struct pqi_io_request *io_request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
 
 	return pqi_raid_submit_scsi_cmd_with_io_request(ctrl_info, io_request,
 		device, scmd, queue_group);
@@ -5230,7 +5244,10 @@ static int pqi_aio_submit_io(struct pqi_ctrl_info *ctrl_info,
 	struct pqi_io_request *io_request;
 	struct pqi_aio_path_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
+
 	io_request->io_complete_callback = pqi_aio_io_complete;
 	io_request->scmd = scmd;
 	io_request->raid_bypass = raid_bypass;
@@ -5657,7 +5674,7 @@ static int pqi_lun_reset(struct pqi_ctrl_info *ctrl_info,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct pqi_task_management_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 	io_request->io_complete_callback = pqi_lun_reset_complete;
 	io_request->context = &wait;
 
@@ -6504,6 +6521,7 @@ static struct scsi_host_template pqi_driver_template = {
 	.map_queues = pqi_map_queues,
 	.sdev_attrs = pqi_sdev_attrs,
 	.shost_attrs = pqi_shost_attrs,
+	.host_tagset = 1,
 };
 
 static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)
-- 
2.16.4



* [PATCH 8/8] hpsa: switch to using blk-mq
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (6 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 7/8] smartpqi: enable host tagset Hannes Reinecke
@ 2019-11-26  9:14 ` Hannes Reinecke
  2019-11-26 10:09 ` [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  8 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26  9:14 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

Enable host_tagset and switch to using blk-mq.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/hpsa.c | 44 +++++++-------------------------------------
 drivers/scsi/hpsa.h |  1 -
 2 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index ac39ed79ccaa..2b811c981b43 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -974,6 +974,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 2048,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -1138,12 +1139,14 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
 static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	struct CommandList *c, int reply_queue)
 {
+	u32 blk_tag = blk_mq_unique_tag(c->scsi_cmd->request);
+
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 	if (c->device)
 		atomic_inc(&c->device->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = blk_mq_unique_tag_to_hwq(blk_tag);
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -5628,8 +5631,6 @@ static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
 	/* Get the ptr to our adapter structure out of cmd->host. */
 	h = sdev_to_hba(cmd->device);
 
-	BUG_ON(cmd->request->tag < 0);
-
 	dev = cmd->device->hostdata;
 	if (!dev) {
 		cmd->result = DID_NO_CONNECT << 16;
@@ -5805,7 +5806,7 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 	sh->hostdata[0] = (unsigned long) h;
 	sh->irq = pci_irq_vector(h->pdev, 0);
 	sh->unique_id = sh->irq;
-
+	sh->nr_hw_queues = h->msix_vectors > 0 ? h->msix_vectors : 1;
 	h->scsi_host = sh;
 	return 0;
 }
@@ -5831,7 +5832,8 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
  */
 static int hpsa_get_cmd_index(struct scsi_cmnd *scmd)
 {
-	int idx = scmd->request->tag;
+	u32 blk_tag = blk_mq_unique_tag(scmd->request);
+	int idx = blk_mq_unique_tag_to_tag(blk_tag);
 
 	if (idx < 0)
 		return idx;
@@ -7431,26 +7433,6 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
-static void hpsa_setup_reply_map(struct ctlr_info *h)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu;
-
-	for (queue = 0; queue < h->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(h->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			h->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		h->reply_map[cpu] = 0;
-}
-
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7847,9 +7829,6 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	if (err)
 		goto clean1;
 
-	/* setup mapping between CPU and reply queue */
-	hpsa_setup_reply_map(h);
-
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8575,7 +8554,6 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 
 static void hpda_free_ctlr_info(struct ctlr_info *h)
 {
-	kfree(h->reply_map);
 	kfree(h);
 }
 
@@ -8584,14 +8562,6 @@ static struct ctlr_info *hpda_alloc_ctlr_info(void)
 	struct ctlr_info *h;
 
 	h = kzalloc(sizeof(*h), GFP_KERNEL);
-	if (!h)
-		return NULL;
-
-	h->reply_map = kcalloc(nr_cpu_ids, sizeof(*h->reply_map), GFP_KERNEL);
-	if (!h->reply_map) {
-		kfree(h);
-		return NULL;
-	}
 	return h;
 }
 
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index f8c88fc7b80a..ea4a609e3eb7 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -161,7 +161,6 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
-	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;
-- 
2.16.4



* Re: [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (7 preceding siblings ...)
  2019-11-26  9:14 ` [PATCH 8/8] hpsa: switch to using blk-mq Hannes Reinecke
@ 2019-11-26 10:09 ` John Garry
  8 siblings, 0 replies; 32+ messages in thread
From: John Garry @ 2019-11-26 10:09 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 26/11/2019 09:14, Hannes Reinecke wrote:
> Hi all,
> 
> here now is an updated version of the v2 patchset from John Garry,
> including the suggestions and reviews from the mailing list.
> John, apologies for hijacking your work :-)

No worries as long as we can keep moving this forward.

> 
> The main difference is that I've changed the bitmaps to be allocated
> separately in all cases, and just set the pointer to the shared bitmap
> for the hostwide tags case.

Yeah, I was considering this also.

> I've also modified smartpqi and hpsa to take advantage of host_tags.
> 
> I did audit the iterators, and I _think_ they do the correct thing even
> in the shared bitmap case. But then I might have overlooked things,
> so feedback and reviews are welcome.
> 
> The one thing I'm not happy with is the debugfs interface; for shared
> bitmaps all will be displaying essentially the same information, which
> could be moved to a top-level directory. But that would change the
> layout and I'm not sure if that buys us anything.

I was having a look at this. I'd say we still need to show which bits
are set per hctx.

Maybe we can do something like this:

a. In hctx_tags_bitmap_show() or other relevant functions, copy the
shared sbitmap into a temporary sbitmap (maybe even into the per-hctx
sbitmap)
b. iterate over the temporary sbitmap, unsetting any bits not relevant
to this hctx
c. then do sbitmap_show

Locking may be a bit tricky.
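
Maybe something like this rough sketch (purely illustrative: it ignores
the reserved tags, glosses over the locking, and uses the pointer form
of bitmap_tags from patch 3):

static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
{
	struct blk_mq_hw_ctx *hctx = data;
	struct blk_mq_tags *tags = hctx->tags;
	struct sbitmap copy;
	unsigned int i;

	/* a. temporary sbitmap with the same depth as the shared one */
	if (sbitmap_init_node(&copy, tags->bitmap_tags->sb.depth, -1,
			      GFP_KERNEL, NUMA_NO_NODE))
		return -ENOMEM;

	/* b. keep only the bits whose request belongs to this hctx */
	for (i = 0; i < tags->bitmap_tags->sb.depth; i++) {
		struct request *rq = tags->rqs[i];

		if (sbitmap_test_bit(&tags->bitmap_tags->sb, i) &&
		    rq && rq->mq_hctx == hctx)
			sbitmap_set_bit(&copy, i);
	}

	/* c. */
	sbitmap_bitmap_show(&copy, m);
	sbitmap_free(&copy);
	return 0;
}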

Thanks,
John

> 
> Differences to v2:
> - Drop embedded tag bitmaps
> - Do not share scheduling tags
> - Add patches for hpsa and smartpqi
> 
> Differences to v1:
> - Use a shared sbitmap, and not a separate shared tags (a big change!)
> 	- Drop request.shared_tag
> - Add RB tags
> 
> Hannes Reinecke (4):
>    blk-mq: Use a pointer for sbitmap
>    scsi: Add template flag 'host_tagset'
>    smartpqi: enable host tagset
>    hpsa: switch to using blk-mq
> 
> John Garry (3):
>    blk-mq: Remove some unused function arguments
>    blk-mq: Facilitate a shared sbitmap per tagset
>    scsi: hisi_sas: Switch v3 hw to MQ
> 
> Ming Lei (1):
>    blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
>   block/bfq-iosched.c                    |   4 +-
>   block/blk-mq-debugfs.c                 |  10 ++--
>   block/blk-mq-sched.c                   |   8 ++-
>   block/blk-mq-tag.c                     | 104 ++++++++++++++++++++++-----------
>   block/blk-mq-tag.h                     |  18 +++---
>   block/blk-mq.c                         |  80 ++++++++++++++++---------
>   block/blk-mq.h                         |   9 ++-
>   block/kyber-iosched.c                  |   4 +-
>   drivers/scsi/hisi_sas/hisi_sas.h       |   3 +-
>   drivers/scsi/hisi_sas/hisi_sas_main.c  |  36 ++++++------
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  86 +++++++++++----------------
>   drivers/scsi/hpsa.c                    |  44 +++-----------
>   drivers/scsi/hpsa.h                    |   1 -
>   drivers/scsi/scsi_lib.c                |   2 +
>   drivers/scsi/smartpqi/smartpqi_init.c  |  38 ++++++++----
>   include/linux/blk-mq.h                 |   9 ++-
>   include/scsi/scsi_host.h               |   3 +
>   17 files changed, 260 insertions(+), 199 deletions(-)
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26  9:14 ` [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
@ 2019-11-26 11:05   ` Ming Lei
  2019-11-26 11:27     ` Hannes Reinecke
  0 siblings, 1 reply; 32+ messages in thread
From: Ming Lei @ 2019-11-26 11:05 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, John Garry, linux-scsi, linux-block

On Tue, Nov 26, 2019 at 10:14:12AM +0100, Hannes Reinecke wrote:
> From: John Garry <john.garry@huawei.com>
> 
> Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
> multiple reply queues with single hostwide tags.
> 
> In addition, these drivers want to use interrupt assignment in
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
> CPU hotplug may cause in-flight IO completion to not be serviced when an
> interrupt is shutdown.
> 
> To solve that problem, Ming's patchset to drain hctx's should ensure no
> IOs are missed in-flight [1].
> 
> However, to take advantage of that patchset, we need to map the HBA HW
> queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.
> 
> In making that transition, the per-SCSI command request tags are no
> longer unique per Scsi host - they are just unique per hctx. As such, the
> HBA LLDD would have to generate this tag internally, which has a certain
> performance overhead.
> 
> However another problem is that blk mq assumes the host may accept
> (Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
> host busy counter, which would stop the LLDD being sent more than
> .can_queue commands; however, we should still ensure that the block layer
> does not issue more than .can_queue commands to the Scsi host.
> 
> To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
> which may be requested at init time.
> 
> New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
> tagset to indicate whether the shared sbitmap should be used.
> 
> Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
> tags and requests per hctx; the reason for this is that if we only allocate
> tags and requests for a single hctx - like hctx0 - we may break block
> drivers which expect a request be associated with a specific hctx, i.e.
> not hctx0.
> 
> This is based on work originally from Ming Lei in [3] and from Bart's
> suggestion in [4].
> 
> [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
> [1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
> [2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
> [3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
> [4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  block/blk-mq-sched.c   |  8 ++++++--
>  block/blk-mq-tag.c     | 36 +++++++++++++++++++++++++++++-------
>  block/blk-mq-tag.h     |  6 +++++-
>  block/blk-mq.c         | 45 ++++++++++++++++++++++++++++++++++++---------
>  block/blk-mq.h         |  7 ++++++-
>  include/linux/blk-mq.h |  7 +++++++
>  6 files changed, 89 insertions(+), 20 deletions(-)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..f3589f42b96d 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
>  {
>  	if (hctx->sched_tags) {
>  		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
> -		blk_mq_free_rq_map(hctx->sched_tags);
> +		blk_mq_free_rq_map(hctx->sched_tags, false);
>  		hctx->sched_tags = NULL;
>  	}
>  }
> @@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
>  				   unsigned int hctx_idx)
>  {
>  	struct blk_mq_tag_set *set = q->tag_set;
> +	int flags = set->flags;
>  	int ret;
>  
> +	/* Scheduler tags are never shared */
> +	set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
>  	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
>  					       set->reserved_tags);
> +	set->flags = flags;

This way is very fragile; it races against other uses of
blk_mq_is_sbitmap_shared().

From a performance viewpoint, all hctxs belonging to this request queue
should share one scheduler tagset in the BLK_MQ_F_TAG_HCTX_SHARED case,
because the driver tag queue depth isn't changed.

>  	if (!hctx->sched_tags)
>  		return -ENOMEM;
>  
> @@ -484,7 +488,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
>  
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (hctx->sched_tags) {
> -			blk_mq_free_rq_map(hctx->sched_tags);
> +			blk_mq_free_rq_map(hctx->sched_tags, false);
>  			hctx->sched_tags = NULL;
>  		}
>  	}
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 332d71cd3976..53b4c4c53c6a 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -228,7 +228,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> -	if (rq && rq->q == hctx->queue)
> +	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
>  		return iter_data->fn(hctx, rq, iter_data->data, reserved);
>  	return true;
>  }
> @@ -470,7 +470,27 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  	return -ENOMEM;
>  }
>  
> -struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> +bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
> +{
> +	unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
> +	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
> +	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
> +	int node = tag_set->numa_node;
> +
> +	if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
> +		return false;
> +	if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
> +			 node))
> +		goto free_bitmap_tags;
> +
> +	return true;
> +free_bitmap_tags:
> +	sbitmap_queue_free(&tag_set->shared_bitmap_tags);
> +	return false;
> +}
> +
> +struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
> +				     unsigned int total_tags,
>  				     unsigned int reserved_tags,
>  				     int node, int alloc_policy)
>  {
> @@ -488,9 +508,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  	tags->nr_tags = total_tags;
>  	tags->nr_reserved_tags = reserved_tags;
>  
> -	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
> -		kfree(tags);
> -		tags = NULL;
> +	if (!blk_mq_is_sbitmap_shared(set)) {
> +		if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
> +			kfree(tags);
> +			tags = NULL;
> +		}
>  	}
>  	return tags;
>  }
> @@ -538,12 +560,12 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  			return -ENOMEM;
>  		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
>  		if (ret) {
> -			blk_mq_free_rq_map(new);
> +			blk_mq_free_rq_map(new, blk_mq_is_sbitmap_shared(set));
>  			return -ENOMEM;
>  		}
>  
>  		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
> -		blk_mq_free_rq_map(*tagsptr);
> +		blk_mq_free_rq_map(*tagsptr, blk_mq_is_sbitmap_shared(set));
>  		*tagsptr = new;
>  	} else {
>  		/*
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 10b66fd4664a..279a861c7e58 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -22,7 +22,11 @@ struct blk_mq_tags {
>  };
>  
>  
> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
> +extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
> +extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
> +					    unsigned int nr_tags,
> +					    unsigned int reserved_tags,
> +					    int node, int alloc_policy);
>  extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>  
>  extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f0c40fcbd8ae..db87b0f57dbe 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2075,13 +2075,14 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	}
>  }
>  
> -void blk_mq_free_rq_map(struct blk_mq_tags *tags)
> +void blk_mq_free_rq_map(struct blk_mq_tags *tags, bool shared)
>  {
>  	kfree(tags->rqs);
>  	tags->rqs = NULL;
>  	kfree(tags->static_rqs);
>  	tags->static_rqs = NULL;
> -	blk_mq_free_tags(tags);
> +	if (!shared)
> +		blk_mq_free_tags(tags);
>  }
>  
>  struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
> @@ -2096,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  	if (node == NUMA_NO_NODE)
>  		node = set->numa_node;
>  
> -	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
> +	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
>  				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>  	if (!tags)
>  		return NULL;
> @@ -2105,7 +2106,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
>  				 node);
>  	if (!tags->rqs) {
> -		blk_mq_free_tags(tags);
> +		if (!blk_mq_is_sbitmap_shared(set))
> +			blk_mq_free_tags(tags);
>  		return NULL;
>  	}
>  
> @@ -2114,7 +2116,8 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  					node);
>  	if (!tags->static_rqs) {
>  		kfree(tags->rqs);
> -		blk_mq_free_tags(tags);
> +		if (!blk_mq_is_sbitmap_shared(set))
> +			blk_mq_free_tags(tags);
>  		return NULL;
>  	}
>  
> @@ -2446,7 +2449,7 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
>  	if (!ret)
>  		return true;
>  
> -	blk_mq_free_rq_map(set->tags[hctx_idx]);
> +	blk_mq_free_rq_map(set->tags[hctx_idx], blk_mq_is_sbitmap_shared(set));
>  	set->tags[hctx_idx] = NULL;
>  	return false;
>  }
> @@ -2456,7 +2459,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>  {
>  	if (set->tags && set->tags[hctx_idx]) {
>  		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
> -		blk_mq_free_rq_map(set->tags[hctx_idx]);
> +		blk_mq_free_rq_map(set->tags[hctx_idx],
> +				   blk_mq_is_sbitmap_shared(set));
>  		set->tags[hctx_idx] = NULL;
>  	}

Who will free the shared tags finally in case of blk_mq_is_sbitmap_shared()?

>  }
> @@ -2954,7 +2958,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>  
>  out_unwind:
>  	while (--i >= 0)
> -		blk_mq_free_rq_map(set->tags[i]);
> +		blk_mq_free_rq_map(set->tags[i], blk_mq_is_sbitmap_shared(set));
>  
>  	return -ENOMEM;
>  }
> @@ -3099,6 +3103,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>  	if (ret)
>  		goto out_free_mq_map;
>  
> +	if (blk_mq_is_sbitmap_shared(set)) {
> +		if (!blk_mq_init_shared_sbitmap(set)) {
> +			ret = -ENOMEM;
> +			goto out_free_mq_map;
> +		}
> +
> +		for (i = 0; i < set->nr_hw_queues; i++) {
> +			struct blk_mq_tags *tags = set->tags[i];
> +
> +			tags->bitmap_tags = &set->shared_bitmap_tags;
> +			tags->breserved_tags = &set->shared_breserved_tags;
> +		}
> +	}
> +
>  	mutex_init(&set->tag_list_lock);
>  	INIT_LIST_HEAD(&set->tag_list);
>  
> @@ -3168,8 +3186,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  			q->elevator->type->ops.depth_updated(hctx);
>  	}
>  
> -	if (!ret)
> +	if (!ret) {
> +		if (blk_mq_is_sbitmap_shared(set)) {
> +			sbitmap_queue_resize(&set->shared_bitmap_tags, nr);
> +			sbitmap_queue_resize(&set->shared_breserved_tags, nr);
> +		}

The above change is wrong in case of hctx->sched_tags.

>  		q->nr_requests = nr;
> +	}
> +	/*
> +	 * if ret != 0, q->nr_requests would not be updated, yet the depth
> +	 * for some hctx may have changed - is that right?
> +	 */
>  
>  	blk_mq_unquiesce_queue(q);
>  	blk_mq_unfreeze_queue(q);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 78d38b5f2793..c4b8213dfdfc 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -53,7 +53,7 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
>   */
>  void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  		     unsigned int hctx_idx);
> -void blk_mq_free_rq_map(struct blk_mq_tags *tags);
> +void blk_mq_free_rq_map(struct blk_mq_tags *tags, bool shared);
>  struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  					unsigned int hctx_idx,
>  					unsigned int nr_tags,
> @@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
>  	struct blk_mq_hw_ctx *hctx;
>  };
>  
> +static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
> +{
> +	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
> +}
> +
>  static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
>  {
>  	if (data->flags & BLK_MQ_REQ_INTERNAL)
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 147185394a25..670e9a949d32 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -109,6 +109,12 @@ struct blk_mq_tag_set {
>  	unsigned int		flags;		/* BLK_MQ_F_* */
>  	void			*driver_data;
>  
> +	struct sbitmap_queue shared_bitmap_tags;
> +	struct sbitmap_queue shared_breserved_tags;
> +
> +	struct sbitmap_queue sched_shared_bitmap_tags;
> +	struct sbitmap_queue sched_shared_breserved_tags;
> +

The above two fields aren't used in this patch.

>  	struct blk_mq_tags	**tags;
>  
>  	struct mutex		tag_list_lock;
> @@ -226,6 +232,7 @@ struct blk_mq_ops {
>  enum {
>  	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
>  	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
> +	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
>  	BLK_MQ_F_BLOCKING	= 1 << 5,
>  	BLK_MQ_F_NO_SCHED	= 1 << 6,
>  	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
> -- 
> 2.16.4
> 

-- 
Ming


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26 11:05   ` Ming Lei
@ 2019-11-26 11:27     ` Hannes Reinecke
  2019-11-26 11:50       ` John Garry
  2019-11-26 15:54       ` Ming Lei
  0 siblings, 2 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26 11:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, John Garry, linux-scsi, linux-block

On 11/26/19 12:05 PM, Ming Lei wrote:
> On Tue, Nov 26, 2019 at 10:14:12AM +0100, Hannes Reinecke wrote:
[ .. ]
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index ca22afd47b3d..f3589f42b96d 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
>>  {
>>  	if (hctx->sched_tags) {
>>  		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
>> -		blk_mq_free_rq_map(hctx->sched_tags);
>> +		blk_mq_free_rq_map(hctx->sched_tags, false);
>>  		hctx->sched_tags = NULL;
>>  	}
>>  }
>> @@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
>>  				   unsigned int hctx_idx)
>>  {
>>  	struct blk_mq_tag_set *set = q->tag_set;
>> +	int flags = set->flags;
>>  	int ret;
>>  
>> +	/* Scheduler tags are never shared */
>> +	set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
>>  	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
>>  					       set->reserved_tags);
>> +	set->flags = flags;
> 
> This way is very fragile, race is made against other uses of
> blk_mq_is_sbitmap_shared().
> 
We are allocating tags, I don't think we're even able to modify it at
this point.

> From performance viewpoint, all hctx belonging to this request queue should
> share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
> driver tag queue depth isn't changed.
> 
Hmm. Now you get me confused.
In an earlier mail you said:

> This kind of sharing is wrong, sched tags should be request
> queue wide instead of tagset wide, and each request queue has
> its own & independent scheduler queue.

as in v2 we _had_ shared scheduler tags, too.
Did I misread your comment above?

>>  	if (!hctx->sched_tags)
>>  		return -ENOMEM;
>>  
[ .. ]
>> @@ -2456,7 +2459,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>>  {
>>  	if (set->tags && set->tags[hctx_idx]) {
>>  		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
>> -		blk_mq_free_rq_map(set->tags[hctx_idx]);
>> +		blk_mq_free_rq_map(set->tags[hctx_idx],
>> +				   blk_mq_is_sbitmap_shared(set));
>>  		set->tags[hctx_idx] = NULL;
>>  	}
> 
> Who will free the shared tags finally in case of blk_mq_is_sbitmap_shared()?
> 
Hmm. Indeed, that bit is missing; will be adding it.

>>  }
[ .. ]
>> @@ -3168,8 +3186,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>  			q->elevator->type->ops.depth_updated(hctx);
>>  	}
>>  
>> -	if (!ret)
>> +	if (!ret) {
>> +		if (blk_mq_is_sbitmap_shared(set)) {
>> +			sbitmap_queue_resize(&set->shared_bitmap_tags, nr);
>> +			sbitmap_queue_resize(&set->shared_breserved_tags, nr);
>> +		}
> 
> The above change is wrong in case of hctx->sched_tags.
> 
Yes, we need to add a marker here indicating whether these are 'normal'
or 'scheduler' tags. This will also help when allocating, as then we
won't need the flag twiddling you've complained about.
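
Something like this, perhaps (just a sketch - the extra argument and
its plumbing are made up):

	/* pass the role explicitly instead of twiddling set->flags */
	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
					       set->reserved_tags,
					       false /* never shared */);

blk_mq_alloc_rq_map() and blk_mq_init_tags() would grow that argument,
and blk_mq_update_nr_requests() could then check the marker on the tags
themselves instead of looking at set->flags.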

>>  		q->nr_requests = nr;
>> +	}
>> +	/*
>> +	 * if ret != 0, q->nr_requests would not be updated, yet the depth
>> +	 * for some hctx may have changed - is that right?
>> +	 */
>>  
>>  	blk_mq_unquiesce_queue(q);
>>  	blk_mq_unfreeze_queue(q);
[ .. ]
>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>> index 147185394a25..670e9a949d32 100644
>> --- a/include/linux/blk-mq.h
>> +++ b/include/linux/blk-mq.h
>> @@ -109,6 +109,12 @@ struct blk_mq_tag_set {
>>  	unsigned int		flags;		/* BLK_MQ_F_* */
>>  	void			*driver_data;
>>  
>> +	struct sbitmap_queue shared_bitmap_tags;
>> +	struct sbitmap_queue shared_breserved_tags;
>> +
>> +	struct sbitmap_queue sched_shared_bitmap_tags;
>> +	struct sbitmap_queue sched_shared_breserved_tags;
>> +
> 
> The above two fields aren't used in this patch.
> 
Left-overs from merging. Will be removed.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26 11:27     ` Hannes Reinecke
@ 2019-11-26 11:50       ` John Garry
  2019-11-26 15:54       ` Ming Lei
  1 sibling, 0 replies; 32+ messages in thread
From: John Garry @ 2019-11-26 11:50 UTC (permalink / raw)
  To: Hannes Reinecke, Ming Lei
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, linux-block

On 26/11/2019 11:27, Hannes Reinecke wrote:
> On 11/26/19 12:05 PM, Ming Lei wrote:
>> On Tue, Nov 26, 2019 at 10:14:12AM +0100, Hannes Reinecke wrote:
> [ .. ]
>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>>> index ca22afd47b3d..f3589f42b96d 100644
>>> --- a/block/blk-mq-sched.c
>>> +++ b/block/blk-mq-sched.c
>>> @@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
>>>   {
>>>   	if (hctx->sched_tags) {
>>>   		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
>>> -		blk_mq_free_rq_map(hctx->sched_tags);
>>> +		blk_mq_free_rq_map(hctx->sched_tags, false);
>>>   		hctx->sched_tags = NULL;
>>>   	}
>>>   }
>>> @@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
>>>   				   unsigned int hctx_idx)
>>>   {
>>>   	struct blk_mq_tag_set *set = q->tag_set;
>>> +	int flags = set->flags;
>>>   	int ret;
>>>   
>>> +	/* Scheduler tags are never shared */
>>> +	set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
>>>   	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
>>>   					       set->reserved_tags);
>>> +	set->flags = flags;
>>
>> This way is very fragile, race is made against other uses of
>> blk_mq_is_sbitmap_shared().
>>
> We are allocating tags, I don't think we're even able to modify it at
> this point.
> 
>>  From performance viewpoint, all hctx belonging to this request queue should
>> share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
>> driver tag queue depth isn't changed.
>>
> Hmm. Now you get me confused.
> In an earlier mail you said:
> 
>> This kind of sharing is wrong, sched tags should be request
>> queue wide instead of tagset wide, and each request queue has
>> its own & independent scheduler queue.
> 
> as in v2 we _had_ shared scheduler tags, too.
> Did I misread your comment above?

As I understand, we just don't require shared scheduler tags. It's only
blk_mq_tag_set.tags which needs to use the shared sbitmap.

> 
>>>   	if (!hctx->sched_tags)
>>>   		return -ENOMEM;
>>>   
> [ .. ]
>>> @@ -2456,7 +2459,8 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>>>   {
>>>   	if (set->tags && set->tags[hctx_idx]) {
>>>   		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
>>> -		blk_mq_free_rq_map(set->tags[hctx_idx]);
>>> +		blk_mq_free_rq_map(set->tags[hctx_idx],
>>> +				   blk_mq_is_sbitmap_shared(set));
>>>   		set->tags[hctx_idx] = NULL;
>>>   	}
>>
>> Who will free the shared tags finally in case of blk_mq_is_sbitmap_shared()?
>>
> Hmm. Indeed, that bit is missing; will be adding it.
> 
>>>   }
> [ .. ]
>>> @@ -3168,8 +3186,17 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>>   			q->elevator->type->ops.depth_updated(hctx);
>>>   	}
>>>   
>>> -	if (!ret)
>>> +	if (!ret) {
>>> +		if (blk_mq_is_sbitmap_shared(set)) {
>>> +			sbitmap_queue_resize(&set->shared_bitmap_tags, nr);
>>> +			sbitmap_queue_resize(&set->shared_breserved_tags, nr);
>>> +		}
>>
>> The above change is wrong in case of hctx->sched_tags.
>>
> Yes, we need to add a marker here if these are 'normal' or 'scheduler'
> tags. This will also help when allocating as then we won't need this
> flag twiddling you've complained about.

nit: To be proper, since this is tag management, we should move it to
blk-mq-tag.c

> 
>>>   		q->nr_requests = nr;
>>> +	}
>>> +	/*
>>> +	 * if ret != 0, q->nr_requests would not be updated, yet the depth
>>> +	 * for some hctx may have changed - is that right?
>>> +	 */
>>>   
>>>   	blk_mq_unquiesce_queue(q);
>>>   	blk_mq_unfreeze_queue(q);
> [ .. ]
>>> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
>>> index 147185394a25..670e9a949d32 100644
>>> --- a/include/linux/blk-mq.h
>>> +++ b/include/linux/blk-mq.h
>>> @@ -109,6 +109,12 @@ struct blk_mq_tag_set {
>>>   	unsigned int		flags;		/* BLK_MQ_F_* */
>>>   	void			*driver_data;
>>>   
>>> +	struct sbitmap_queue shared_bitmap_tags;
>>> +	struct sbitmap_queue shared_breserved_tags;
>>> +
>>> +	struct sbitmap_queue sched_shared_bitmap_tags;
>>> +	struct sbitmap_queue sched_shared_breserved_tags;
>>> +
>>
>> The above two fields aren't used in this patch.
>>
> Left-overs from merging. Will be removed.
> 
> Cheers,
> 
> Hannes
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26  9:14 ` [PATCH 3/8] blk-mq: Use a pointer for sbitmap Hannes Reinecke
@ 2019-11-26 15:14   ` Jens Axboe
  2019-11-26 16:54     ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2019-11-26 15:14 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block

On 11/26/19 2:14 AM, Hannes Reinecke wrote:
> Instead of allocating the tag bitmap in place we should be using a
> pointer. This is in preparation for shared host-wide bitmaps.

Not a huge fan of this, it's an extra indirection in the hot path
of both submission and completion.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26 11:27     ` Hannes Reinecke
  2019-11-26 11:50       ` John Garry
@ 2019-11-26 15:54       ` Ming Lei
  2019-11-27 17:02         ` Hannes Reinecke
  1 sibling, 1 reply; 32+ messages in thread
From: Ming Lei @ 2019-11-26 15:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, John Garry, linux-scsi, linux-block

On Tue, Nov 26, 2019 at 12:27:50PM +0100, Hannes Reinecke wrote:
> On 11/26/19 12:05 PM, Ming Lei wrote:
> > On Tue, Nov 26, 2019 at 10:14:12AM +0100, Hannes Reinecke wrote:
> [ .. ]
> >> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> >> index ca22afd47b3d..f3589f42b96d 100644
> >> --- a/block/blk-mq-sched.c
> >> +++ b/block/blk-mq-sched.c
> >> @@ -452,7 +452,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
> >>  {
> >>  	if (hctx->sched_tags) {
> >>  		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
> >> -		blk_mq_free_rq_map(hctx->sched_tags);
> >> +		blk_mq_free_rq_map(hctx->sched_tags, false);
> >>  		hctx->sched_tags = NULL;
> >>  	}
> >>  }
> >> @@ -462,10 +462,14 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
> >>  				   unsigned int hctx_idx)
> >>  {
> >>  	struct blk_mq_tag_set *set = q->tag_set;
> >> +	int flags = set->flags;
> >>  	int ret;
> >>  
> >> +	/* Scheduler tags are never shared */
> >> +	set->flags &= ~BLK_MQ_F_TAG_HCTX_SHARED;
> >>  	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
> >>  					       set->reserved_tags);
> >> +	set->flags = flags;
> > 
> > This way is very fragile, race is made against other uses of
> > blk_mq_is_sbitmap_shared().
> > 
> We are allocating tags, I don't think we're even able to modify it at
> this point.

Sched tags are allocated when setting up the scheduler, which can happen
at any time via a write to queue/scheduler.

> 
> > From performance viewpoint, all hctx belonging to this request queue should
> > share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
> > driver tag queue depth isn't changed.
> > 
> Hmm. Now you get me confused.
> In an earlier mail you said:
> 
> > This kind of sharing is wrong, sched tags should be request
> > queue wide instead of tagset wide, and each request queue has
> > its own & independent scheduler queue.
> 
> as in v2 we _had_ shared scheduler tags, too.
> Did I misread your comment above?

Yes, what I meant is that we can't share sched tags tagset-wide.

Now I mean we should share sched tags among all hctxs in the same
request queue, and I believe I have described it clearly.
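
Something like the following (illustrative only - the field names are
made up):

	/* one sched sbitmap per request queue, shared by all its hctxs */
	struct request_queue {
		...
		struct sbitmap_queue	sched_bitmap_tags;
		struct sbitmap_queue	sched_breserved_tags;
	};

with each hctx->sched_tags pointing its bitmap at the queue's sbitmap,
so q->nr_requests still bounds the whole queue.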


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 15:14   ` Jens Axboe
@ 2019-11-26 16:54     ` John Garry
  2019-11-26 17:11       ` Jens Axboe
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2019-11-26 16:54 UTC (permalink / raw)
  To: Jens Axboe, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 26/11/2019 15:14, Jens Axboe wrote:
> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>> Instead of allocating the tag bitmap in place we should be using a
>> pointer. This is in preparation for shared host-wide bitmaps.
> 
> Not a huge fan of this, it's an extra indirection in the hot path
> of both submission and completion.

Hi Jens,

Thanks for having a look.

I checked the disassembly for blk_mq_get_tag() as a sample - which I
assume is one hot-path function you care about - and the cost of
the indirection is a load instruction instead of an add, denoted by ***
below:

Before:
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct 
blk_mq_alloc_data *data)
{
	if (data->flags & BLK_MQ_REQ_INTERNAL)
		return data->hctx->sched_tags;
      6ac:	a9554c64 	ldp	x4, x19, [x3, #336]

	return data->hctx->tags;
      6b0:	f27e003f 	tst	x1, #0x4
      6b4:	f9003ba0 	str	x0, [x29, #112]
      6b8:	a9078ba2 	stp	x2, x2, [x29, #120]
      6bc:	9a841273 	csel	x19, x19, x4, ne  // ne = any
	if (data->flags & BLK_MQ_REQ_RESERVED) {
      6c0:	36081021 	tbz	w1, #1, 8c4 <blk_mq_get_tag+0x264>
		if (unlikely(!tags->nr_reserved_tags)) {
      6c4:	b9400660 	ldr	w0, [x19, #4]

      6d4:	f90027ba 	str	x26, [x29, #72]
		tag_offset = 0;
      6c8:	52800018 	mov	w24, #0x0                   	// #0
		bt = &tags->breserved_tags;
      6cc:	91014273 	add	x19, x19, #0x50 ***
		if (unlikely(!tags->nr_reserved_tags)) {
      6d0:	340012e0 	cbz	w0, 92c <blk_mq_get_tag+0x2cc>
	tag = __blk_mq_get_tag(data, bt);
      6d8:	aa1303e1 	mov	x1, x19
      6dc:	aa1403e0 	mov	x0, x20
      6e0:	97fffe92 	bl	128 <__blk_mq_get_tag>

After:
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct 
blk_mq_alloc_data *data)
{
	if (data->flags & BLK_MQ_REQ_INTERNAL)
		return data->hctx->sched_tags;

      6b4:	a9550004 	ldp	x4, x0, [x0, #336]

	return data->hctx->tags;
      6ac:	f27e005f 	tst	x2, #0x4
      6b0:	f9003ba1 	str	x1, [x29, #112]
		return data->hctx->sched_tags;
      6b8:	a9078fa3 	stp	x3, x3, [x29, #120]
	return data->hctx->tags;
      6bc:	9a841000 	csel	x0, x0, x4, ne  // ne = any
	if (data->flags & BLK_MQ_REQ_RESERVED) {
      6c0:	36080fa2 	tbz	w2, #1, 8b4 <blk_mq_get_tag+0x254>
		if (unlikely(!tags->nr_reserved_tags)) {
      6c4:	b9400401 	ldr	w1, [x0, #4]
      6cc:	f90027ba 	str	x26, [x29, #72]
		tag_offset = 0;
      6d0:	52800017 	mov	w23, #0x0                   	// #0
		bt = tags->breserved_tags;

      6c8:	340012a1 	cbz	w1, 91c <blk_mq_get_tag+0x2bc>
      6d4:	f9400c14 	ldr	x20, [x0, #24] ***
	tag = __blk_mq_get_tag(data, bt);
      6d8:	aa1303e0 	mov	x0, x19
      6dc:	aa1403e1 	mov	x1, x20
      6e0:	97fffe92 	bl	128 <__blk_mq_get_tag>

This is arm64 dis.

I'm just saying this to provide some illustration of the potential 
performance impact of this change.

Thanks,
John


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 16:54     ` John Garry
@ 2019-11-26 17:11       ` Jens Axboe
  2019-11-26 17:23         ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2019-11-26 17:11 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/26/19 9:54 AM, John Garry wrote:
> On 26/11/2019 15:14, Jens Axboe wrote:
>> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>>> Instead of allocating the tag bitmap in place we should be using a
>>> pointer. This is in preparation for shared host-wide bitmaps.
>>
>> Not a huge fan of this, it's an extra indirection in the hot path
>> of both submission and completion.
> 
> Hi Jens,
> 
> Thanks for having a look.
> 
> I checked the disassembly for blk_mq_get_tag() as a sample - which I
> assume is one hot path function which you care about - and the cost of
> the indirection is a load instruction instead of an add, denoted by ***,
> below:

I'm not that worried about an extra instruction, my worry is the extra
load is from different memory. When it's embedded in the struct, we're
on the same cache line or adjacent.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 17:11       ` Jens Axboe
@ 2019-11-26 17:23         ` John Garry
  2019-11-26 17:25           ` Jens Axboe
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2019-11-26 17:23 UTC (permalink / raw)
  To: Jens Axboe, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 26/11/2019 17:11, Jens Axboe wrote:
> On 11/26/19 9:54 AM, John Garry wrote:
>> On 26/11/2019 15:14, Jens Axboe wrote:
>>> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>>>> Instead of allocating the tag bitmap in place we should be using a
>>>> pointer. This is in preparation for shared host-wide bitmaps.
>>>
>>> Not a huge fan of this, it's an extra indirection in the hot path
>>> of both submission and completion.
>>
>> Hi Jens,
>>
>> Thanks for having a look.
>>
>> I checked the disassembly for blk_mq_get_tag() as a sample - which I
>> assume is one hot path function which you care about - and the cost of
>> the indirection is a load instruction instead of an add, denoted by ***,
>> below:
> 

Hi Jens,

> I'm not that worried about an extra instruction, my worry is the extra
> load is from different memory. When it's embedded in the struct, we're
> on the same cache line or adjacent.

Right, so the earlier iteration of this series kept the embedded struct 
and we simply pointed at that, so I wouldn't expect a caching issue of 
different memory in that case.

Cheers,
John


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 17:23         ` John Garry
@ 2019-11-26 17:25           ` Jens Axboe
  2019-11-26 18:08             ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2019-11-26 17:25 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/26/19 10:23 AM, John Garry wrote:
> On 26/11/2019 17:11, Jens Axboe wrote:
>> On 11/26/19 9:54 AM, John Garry wrote:
>>> On 26/11/2019 15:14, Jens Axboe wrote:
>>>> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>>>>> Instead of allocating the tag bitmap in place we should be using a
>>>>> pointer. This is in preparation for shared host-wide bitmaps.
>>>>
>>>> Not a huge fan of this, it's an extra indirection in the hot path
>>>> of both submission and completion.
>>>
>>> Hi Jens,
>>>
>>> Thanks for having a look.
>>>
>>> I checked the disassembly for blk_mq_get_tag() as a sample - which I
>>> assume is one hot path function which you care about - and the cost of
>>> the indirection is a load instruction instead of an add, denoted by ***,
>>> below:
>>
> 
> Hi Jens,
> 
>> I'm not that worried about an extra instruction, my worry is the extra
>> load is from different memory. When it's embedded in the struct, we're
>> on the same cache line or adjacent.
> 
> Right, so the earlier iteration of this series kept the embedded struct
> and we simply pointed at that, so I wouldn't expect a caching issue of
> different memory in that case.

That would be a much better solution for the common case; my concern
here is slowing down the fast path for devices that don't need shared
tags.

Would be interesting to check the generated code for that, ideally we'd
get rid of the extra load for that case, even if it is in the same
cacheline.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 17:25           ` Jens Axboe
@ 2019-11-26 18:08             ` John Garry
  2019-11-27  1:46               ` Jens Axboe
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2019-11-26 18:08 UTC (permalink / raw)
  To: Jens Axboe, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 26/11/2019 17:25, Jens Axboe wrote:
> On 11/26/19 10:23 AM, John Garry wrote:
>> On 26/11/2019 17:11, Jens Axboe wrote:
>>> On 11/26/19 9:54 AM, John Garry wrote:
>>>> On 26/11/2019 15:14, Jens Axboe wrote:
>>>>> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>>>>>> Instead of allocating the tag bitmap in place we should be using a
>>>>>> pointer. This is in preparation for shared host-wide bitmaps.
>>>>>
>>>>> Not a huge fan of this, it's an extra indirection in the hot path
>>>>> of both submission and completion.
>>>>
>>>> Hi Jens,
>>>>
>>>> Thanks for having a look.
>>>>
>>>> I checked the disassembly for blk_mq_get_tag() as a sample - which I
>>>> assume is one hot path function which you care about - and the cost of
>>>> the indirection is a load instruction instead of an add, denoted by ***,
>>>> below:
>>>
>>
>> Hi Jens,
>>
>>> I'm not that worried about an extra instruction, my worry is the extra
>>> load is from different memory. When it's embedded in the struct, we're
>>> on the same cache line or adjacent.
>>
>> Right, so the earlier iteration of this series kept the embedded struct
>> and we simply pointed at that, so I wouldn't expect a caching issue of
>> different memory in that case.
> 

Hi Jens,

> That would be a much better solution for the common case, my concern
> here is slowing down the fast path for device that don't need shared
> tags.
> 
> Would be interesting to check the generated code for that, ideally we'd
> get rid of the extra load for that case, even if it is in the same
> cacheline.
> 

I checked the disassembly and we still have the load instead of the add.

This is not surprising, as the compiler would not know for certain that
we point to a field within the same struct. But at least we should still
be pointing at nearby memory.

Note that the pointer could be dropped, which would remove the load, but 
then we have many if-elses which could be slower, not to mention that 
the blk-mq-tag code deals in bitmap pointers anyway.

Thanks,
John




^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-26 18:08             ` John Garry
@ 2019-11-27  1:46               ` Jens Axboe
  2019-11-27 13:05                 ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2019-11-27  1:46 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/26/19 11:08 AM, John Garry wrote:
> On 26/11/2019 17:25, Jens Axboe wrote:
>> On 11/26/19 10:23 AM, John Garry wrote:
>>> On 26/11/2019 17:11, Jens Axboe wrote:
>>>> On 11/26/19 9:54 AM, John Garry wrote:
>>>>> On 26/11/2019 15:14, Jens Axboe wrote:
>>>>>> On 11/26/19 2:14 AM, Hannes Reinecke wrote:
>>>>>>> Instead of allocating the tag bitmap in place we should be using a
>>>>>>> pointer. This is in preparation for shared host-wide bitmaps.
>>>>>>
>>>>>> Not a huge fan of this, it's an extra indirection in the hot path
>>>>>> of both submission and completion.
>>>>>
>>>>> Hi Jens,
>>>>>
>>>>> Thanks for having a look.
>>>>>
>>>>> I checked the disassembly for blk_mq_get_tag() as a sample - which I
>>>>> assume is one hot path function which you care about - and the cost of
>>>>> the indirection is a load instruction instead of an add, denoted by ***,
>>>>> below:
>>>>
>>>
>>> Hi Jens,
>>>
>>>> I'm not that worried about an extra instruction, my worry is the extra
>>>> load is from different memory. When it's embedded in the struct, we're
>>>> on the same cache line or adjacent.
>>>
>>> Right, so the earlier iteration of this series kept the embedded struct
>>> and we simply pointed at that, so I wouldn't expect a caching issue of
>>> different memory in that case.
>>
> 
> Hi Jens,
> 
>> That would be a much better solution for the common case, my concern
>> here is slowing down the fast path for device that don't need shared
>> tags.
>>
>> Would be interesting to check the generated code for that, ideally we'd
>> get rid of the extra load for that case, even if it is in the same
>> cacheline.
>>
> 
> I checked the disassembly and we still have the load instead of the add.
> 
> This is not surprising, as the compiler would not know for certain that
> we point to a field within the same struct. But at least we still should
> point to a close memory.
> 
> Note that the pointer could be dropped, which would remove the load, but
> then we have many if-elses which could be slower, not to mention that
> the blk-mq-tag code deals in bitmap pointers anyway.

It might still be worthwhile to do:

if (tags->ptr == &tags->__default)
	foo(&tags->__default);

to make it clear, as that branch will predict easily. If it can be done
in a nice enough fashion and not sprinkled everywhere.

Should be testable, though.
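
Something along these lines, perhaps - a rough sketch, where the
embedded __bitmap_tags field name is made up:

	static inline struct sbitmap_queue *
	blk_mq_tags_bitmap(struct blk_mq_tags *tags)
	{
		/* predicts easily; the embedded bitmap sits on the same
		 * or an adjacent cache line as the tags struct itself
		 */
		if (tags->bitmap_tags == &tags->__bitmap_tags)
			return &tags->__bitmap_tags;
		/* host-wide shared bitmap, allocated separately */
		return tags->bitmap_tags;
	}

That keeps the check in one spot instead of open-coding it at every
bitmap access.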

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27  1:46               ` Jens Axboe
@ 2019-11-27 13:05                 ` John Garry
  2019-11-27 13:12                   ` Hannes Reinecke
  2019-11-27 14:21                   ` Jens Axboe
  0 siblings, 2 replies; 32+ messages in thread
From: John Garry @ 2019-11-27 13:05 UTC (permalink / raw)
  To: Jens Axboe, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 27/11/2019 01:46, Jens Axboe wrote:
>>> Would be interesting to check the generated code for that, ideally we'd
>>> get rid of the extra load for that case, even if it is in the same
>>> cacheline.
>>>
>> I checked the disassembly and we still have the load instead of the add.
>>
>> This is not surprising, as the compiler would not know for certain that
>> we point to a field within the same struct. But at least we still should
>> point to a close memory.
>>
>> Note that the pointer could be dropped, which would remove the load, but
>> then we have many if-elses which could be slower, not to mention that
>> the blk-mq-tag code deals in bitmap pointers anyway.

Hi Jens,

> It might still be worthwhile to do:
> 
> if (tags->ptr == &tags->__default)
> 	foo(&tags->__default);
> 
> to make it clear, as that branch will predict easily. 

Not sure. So this code does produce the same assembly, as we still need 
to do the tags->ptr load for the comparison.

And then if you consider blk_mq_get_tag() as an example, there is no
other hot value available to indicate whether the shared tags are in
use, to decide whether to use &tags->__default.

I'll consider it more.

> If it can be done
> in a nice enough fashion and not sprinkled everywhere.
> 
> Should be testable, though.
> 
> -- Jens Axboe

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 13:05                 ` John Garry
@ 2019-11-27 13:12                   ` Hannes Reinecke
  2019-11-27 14:20                     ` Jens Axboe
  2019-11-27 14:21                   ` Jens Axboe
  1 sibling, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-27 13:12 UTC (permalink / raw)
  To: John Garry, Jens Axboe, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/27/19 2:05 PM, John Garry wrote:
> On 27/11/2019 01:46, Jens Axboe wrote:
>>>> Would be interesting to check the generated code for that, ideally we'd
>>>> get rid of the extra load for that case, even if it is in the same
>>>> cacheline.
>>>>
>>> I checked the disassembly and we still have the load instead of the add.
>>>
>>> This is not surprising, as the compiler would not know for certain that
>>> we point to a field within the same struct. But at least we still should
>>> point to a close memory.
>>>
>>> Note that the pointer could be dropped, which would remove the load, but
>>> then we have many if-elses which could be slower, not to mention that
>>> the blk-mq-tag code deals in bitmap pointers anyway.
> 
> Hi Jens,
> 
>> It might still be worthwhile to do:
>>
>> if (tags->ptr == &tags->__default)
>>     foo(&tags->__default);
>>
>> to make it clear, as that branch will predict easily. 
> 
> Not sure. So this code does produce the same assembly, as we still need
> to do the tags->ptr load for the comparison.
> 
> And then if you consider blk_mq_get_tags() as an example, there is no
> other hot value available to indicate whether the shared tags are used
> to decide whether to use  &tags->__default.
> 
> I'll consider it more.
> 
After talking to our compiler folks I guess it should be possible to
resurrect your original patch (with both the embedded bitmap and the
pointer to the bitmap), linking the pointer to the embedded bitmap in the
non-shared case.

Then we can access the pointer in all cases, but in the non-shared case
that will then point to the embedded one, thus avoiding the possible
cache miss.

I'll see how that goes.
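
I.e. roughly this layout (a sketch only; field names illustrative):

	struct blk_mq_tags {
		...
		struct sbitmap_queue *bitmap_tags;	/* hot path uses this */
		struct sbitmap_queue *breserved_tags;
		struct sbitmap_queue __bitmap_tags;	/* embedded */
		struct sbitmap_queue __breserved_tags;
	};

with bitmap_tags = &__bitmap_tags in the non-shared case, and pointing
into the tag_set for the host-wide shared case.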

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 13:12                   ` Hannes Reinecke
@ 2019-11-27 14:20                     ` Jens Axboe
  0 siblings, 0 replies; 32+ messages in thread
From: Jens Axboe @ 2019-11-27 14:20 UTC (permalink / raw)
  To: Hannes Reinecke, John Garry, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/27/19 6:12 AM, Hannes Reinecke wrote:
> On 11/27/19 2:05 PM, John Garry wrote:
>> On 27/11/2019 01:46, Jens Axboe wrote:
>>>>> Would be interesting to check the generated code for that, ideally we'd
>>>>> get rid of the extra load for that case, even if it is in the same
>>>>> cacheline.
>>>>>
>>>> I checked the disassembly and we still have the load instead of the add.
>>>>
>>>> This is not surprising, as the compiler would not know for certain that
>>>> we point to a field within the same struct. But at least we still should
>>>> point to a close memory.
>>>>
>>>> Note that the pointer could be dropped, which would remove the load, but
>>>> then we have many if-elses which could be slower, not to mention that
>>>> the blk-mq-tag code deals in bitmap pointers anyway.
>>
>> Hi Jens,
>>
>>> It might still be worthwhile to do:
>>>
>>> if (tags->ptr == &tags->__default)
>>>      foo(&tags->__default);
>>>
>>> to make it clear, as that branch will predict easily.
>>
>> Not sure. So this code does produce the same assembly, as we still need
>> to do the tags->ptr load for the comparison.
>>
>> And then if you consider blk_mq_get_tags() as an example, there is no
>> other hot value available to indicate whether the shared tags are used
>> to decide whether to use  &tags->__default.
>>
>> I'll consider it more.
>>
> After talking to our compiler folks I guess it should be possible to
> resurrect your original patch (with both the embedded bitmap and the
> point to the bitmap), linking the pointer to the embedded bitmap in the
> non-shared case.
> 
> Then we can access the pointer in all cases, but in the non-shared case
> that will then point to the embedded one, thus avoiding the possible
> cache miss.
> 
> I'll see how that goes.

That's exactly what I suggested yesterday; the discussion now is about
how to make that work in the cleanest way.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 13:05                 ` John Garry
  2019-11-27 13:12                   ` Hannes Reinecke
@ 2019-11-27 14:21                   ` Jens Axboe
  2019-11-27 14:44                     ` John Garry
  1 sibling, 1 reply; 32+ messages in thread
From: Jens Axboe @ 2019-11-27 14:21 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/27/19 6:05 AM, John Garry wrote:
> On 27/11/2019 01:46, Jens Axboe wrote:
>>>> Would be interesting to check the generated code for that, ideally we'd
>>>> get rid of the extra load for that case, even if it is in the same
>>>> cacheline.
>>>>
>>> I checked the disassembly and we still have the load instead of the add.
>>>
>>> This is not surprising, as the compiler would not know for certain that
>>> we point to a field within the same struct. But at least we still should
>>> point to a close memory.
>>>
>>> Note that the pointer could be dropped, which would remove the load, but
>>> then we have many if-elses which could be slower, not to mention that
>>> the blk-mq-tag code deals in bitmap pointers anyway.
> 
> Hi Jens,
> 
>> It might still be worthwhile to do:
>>
>> if (tags->ptr == &tags->__default)
>> 	foo(&tags->__default);
>>
>> to make it clear, as that branch will predict easily.
> 
> Not sure. So this code does produce the same assembly, as we still need
> to do the tags->ptr load for the comparison.

How can it be the same? The approach in the patchset needs to load
*tags->ptr, this one needs tags->ptr. That's the big difference.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 14:21                   ` Jens Axboe
@ 2019-11-27 14:44                     ` John Garry
  2019-11-27 16:52                       ` Hannes Reinecke
  0 siblings, 1 reply; 32+ messages in thread
From: John Garry @ 2019-11-27 14:44 UTC (permalink / raw)
  To: Jens Axboe, Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 27/11/2019 14:21, Jens Axboe wrote:
> On 11/27/19 6:05 AM, John Garry wrote:
>> On 27/11/2019 01:46, Jens Axboe wrote:
>>>>> Would be interesting to check the generated code for that, ideally 
>>>>> we'd
>>>>> get rid of the extra load for that case, even if it is in the same
>>>>> cacheline.
>>>>>
>>>> I checked the disassembly and we still have the load instead of the 
>>>> add.
>>>>
>>>> This is not surprising, as the compiler would not know for certain that
>>>> we point to a field within the same struct. But at least we still 
>>>> should
>>>> point to a close memory.
>>>>
>>>> Note that the pointer could be dropped, which would remove the load, 
>>>> but
>>>> then we have many if-elses which could be slower, not to mention that
>>>> the blk-mq-tag code deals in bitmap pointers anyway.
>>
>> Hi Jens,
>>
>>> It might still be worthwhile to do:
>>>
>>> if (tags->ptr == &tags->__default)
>>>     foo(&tags->__default);
>>>
>>> to make it clear, as that branch will predict easily.
>>
>> Not sure. So this code does produce the same assembly, as we still need
>> to do the tags->ptr load for the comparison.
> 

Hi Jens,

> How can it be the same? The approach in the patchset needs to load
> *tags->ptr, this one needs tags->ptr. That's the big difference.
> 

In the patch for this thread, we have:

@@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct 
blk_mq_alloc_data *data)
  			WARN_ON_ONCE(1);
  			return BLK_MQ_TAG_FAIL;
  		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
  		tag_offset = 0;
  	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
  		tag_offset = tags->nr_reserved_tags;
  	}


So the current code gets the bt pointer by simply offsetting a certain
distance from the tags pointer - that is the add I mentioned.

With the change in this patch, we need to load the memory at address
&tags->bitmap_tags to get bt - this is the load I mentioned.
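
In other words, side by side:

	bt = &tags->bitmap_tags;  /* embedded: tags + constant offset, an add */
	bt = tags->bitmap_tags;   /* pointer: value fetched from memory, a load */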

So for this:

if (tags->ptr == &tags->__default)

We load from &tags->ptr to get the pointer value for comparison against
&tags->__default.

There must be something I'm missing...

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 14:44                     ` John Garry
@ 2019-11-27 16:52                       ` Hannes Reinecke
  2019-11-27 18:02                         ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-27 16:52 UTC (permalink / raw)
  To: John Garry, Jens Axboe, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 11/27/19 3:44 PM, John Garry wrote:
> On 27/11/2019 14:21, Jens Axboe wrote:
>> On 11/27/19 6:05 AM, John Garry wrote:
>>> On 27/11/2019 01:46, Jens Axboe wrote:
>>>>>> Would be interesting to check the generated code for that, ideally 
>>>>>> we'd
>>>>>> get rid of the extra load for that case, even if it is in the same
>>>>>> cacheline.
>>>>>>
>>>>> I checked the disassembly and we still have the load instead of the 
>>>>> add.
>>>>>
>>>>> This is not surprising, as the compiler would not know for certain 
>>>>> that
>>>>> we point to a field within the same struct. But at least we still 
>>>>> should
>>>>> point to a close memory.
>>>>>
>>>>> Note that the pointer could be dropped, which would remove the 
>>>>> load, but
>>>>> then we have many if-elses which could be slower, not to mention that
>>>>> the blk-mq-tag code deals in bitmap pointers anyway.
>>>
>>> Hi Jens,
>>>
>>>> It might still be worthwhile to do:
>>>>
>>>> if (tags->ptr == &tags->__default)
>>>>     foo(&tags->__default);
>>>>
>>>> to make it clear, as that branch will predict easily.
>>>
>>> Not sure. So this code does produce the same assembly, as we still need
>>> to do the tags->ptr load for the comparison.
>>
> 
> Hi Jens,
> 
>> How can it be the same? The approach in the patchset needs to load
>> *tags->ptr, this one needs tags->ptr. That's the big difference.
>>
> 
> In the patch for this thread, we have:
> 
> @@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct 
> blk_mq_alloc_data *data)
>               WARN_ON_ONCE(1);
>               return BLK_MQ_TAG_FAIL;
>           }
> -        bt = &tags->breserved_tags;
> +        bt = tags->breserved_tags;
>           tag_offset = 0;
>       } else {
> -        bt = &tags->bitmap_tags;
> +        bt = tags->bitmap_tags;
>           tag_offset = tags->nr_reserved_tags;
>       }
> 
> 
> So current code gets bt pointer by simply offsetting a certain distance 
> from tags pointer - that is the add I mention.
> 
> With the change in this patch, we need to load memory at address 
> &tags->bitmap_tags to get bt - this is the load I mention.
> 
> So for this:
> 
> if (tags->ptr == &tags->__default)
> 
> We load &tags->ptr to get the pointer value for comparison vs 
> &tags->__default.
> 
> There must be something I'm missing...
> 
The point here was that the load might refer to _other_ memory locations 
(as it's being allocated separately), thus incurring a cache miss.
With embedded tag bitmaps we'll load from the same cache line 
(hopefully), and won't get a performance hit.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                              +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-26 15:54       ` Ming Lei
@ 2019-11-27 17:02         ` Hannes Reinecke
  2019-11-29  0:25           ` Ming Lei
  0 siblings, 1 reply; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-27 17:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, John Garry, linux-scsi, linux-block

On 11/26/19 4:54 PM, Ming Lei wrote:
> On Tue, Nov 26, 2019 at 12:27:50PM +0100, Hannes Reinecke wrote:
>> On 11/26/19 12:05 PM, Ming Lei wrote:
[ .. ]
>>>  From performance viewpoint, all hctx belonging to this request queue should
>>> share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
>>> driver tag queue depth isn't changed.
>>>
>> Hmm. Now you get me confused.
>> In an earlier mail you said:
>>
>>> This kind of sharing is wrong, sched tags should be request
>>> queue wide instead of tagset wide, and each request queue has
>>> its own & independent scheduler queue.
>>
>> as in v2 we _had_ shared scheduler tags, too.
>> Did I misread your comment above?
> 
> Yes, what I meant is that we can't share sched tags in tagset wide.
> 
> Now I mean we should share sched tags among all hctxs in same request
> queue, and I believe I have described it clearly.
> 
I wonder if this makes a big difference; in the end, scheduler tags are 
primarily there to allow the scheduler to queue more requests, and 
potentially merge them. These tags are later converted into 'real' ones 
via blk_mq_get_driver_tag(), and only then does the resource limitation 
take hold.
Wouldn't it be sufficient to look at the number of outstanding commands 
per queue when getting a scheduler tag, rather than having to implement 
yet another bitmap?
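
Something like this, as a purely hypothetical sketch - none of these 
names exist in blk-mq today, it is just the shape of the idea:

#include <linux/atomic.h>

/* Hypothetical: a per-request-queue count of in-flight requests. */
struct sched_throttle {
	atomic_t nr_outstanding;
};

/* Gate scheduler-tag allocation on a simple counter instead of
 * tracking the tags in a second (shared) bitmap. */
static bool may_get_sched_tag(struct sched_throttle *t,
			      unsigned int sched_depth)
{
	return atomic_read(&t->nr_outstanding) < sched_depth;
}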

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                              +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] blk-mq: Use a pointer for sbitmap
  2019-11-27 16:52                       ` Hannes Reinecke
@ 2019-11-27 18:02                         ` John Garry
  0 siblings, 0 replies; 32+ messages in thread
From: John Garry @ 2019-11-27 18:02 UTC (permalink / raw)
  To: Hannes Reinecke, Jens Axboe, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	linux-scsi, linux-block

On 27/11/2019 16:52, Hannes Reinecke wrote:
> On 11/27/19 3:44 PM, John Garry wrote:
>> On 27/11/2019 14:21, Jens Axboe wrote:
>>> On 11/27/19 6:05 AM, John Garry wrote:
>>>> On 27/11/2019 01:46, Jens Axboe wrote:
>>>>>>> Would be interesting to check the generated code for that, 
>>>>>>> ideally we'd
>>>>>>> get rid of the extra load for that case, even if it is in the same
>>>>>>> cacheline.
>>>>>>>
>>>>>> I checked the disassembly and we still have the load instead of 
>>>>>> the add.
>>>>>>
>>>>>> This is not surprising, as the compiler would not know for certain 
>>>>>> that
>>>>>> we point to a field within the same struct. But at least we still 
>>>>>> should
>>>>>> point to a close memory.
>>>>>>
>>>>>> Note that the pointer could be dropped, which would remove the 
>>>>>> load, but
>>>>>> then we have many if-elses which could be slower, not to mention that
>>>>>> the blk-mq-tag code deals in bitmap pointers anyway.
>>>>
>>>> Hi Jens,
>>>>
>>>>> It might still be worthwhile to do:
>>>>>
>>>>> if (tags->ptr == &tags->__default)
>>>>>     foo(&tags->__default);
>>>>>
>>>>> to make it clear, as that branch will predict easily.
>>>>
>>>> Not sure. So this code does produce the same assembly, as we still need
>>>> to do the tags->ptr load for the comparison.
>>>
>>
>> Hi Jens,
>>
>>> How can it be the same? The approach in the patchset needs to load
>>> *tags->ptr, this one needs tags->ptr. That's the big difference.
>>>
>>
>> In the patch for this thread, we have:
>>
>> @@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct 
>> blk_mq_alloc_data *data)
>>               WARN_ON_ONCE(1);
>>               return BLK_MQ_TAG_FAIL;
>>           }
>> -        bt = &tags->breserved_tags;
>> +        bt = tags->breserved_tags;
>>           tag_offset = 0;
>>       } else {
>> -        bt = &tags->bitmap_tags;
>> +        bt = tags->bitmap_tags;
>>           tag_offset = tags->nr_reserved_tags;
>>       }
>>
>>
>> So current code gets bt pointer by simply offsetting a certain 
>> distance from tags pointer - that is the add I mention.
>>
>> With the change in this patch, we need to load memory at address 
>> &tags->bitmap_tags to get bt - this is the load I mention.
>>
>> So for this:
>>
>> if (tags->ptr == &tags->__default)
>>
>> We load &tags->ptr to get the pointer value for comparison vs 
>> &tags->__default.
>>
>> There must be something I'm missing...
>>
> The point here was that the load might refer to _other_ memory locations 
> (as it's being allocated separately),

I think that we're talking about something different.

> thus incurring a cache miss.
> With embedded tag bitmaps we'll load from the same cache line 
> (hopefully), and won't get a performance hit.

But I'll just wait to see what you come up with.

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-27 17:02         ` Hannes Reinecke
@ 2019-11-29  0:25           ` Ming Lei
  2019-11-29  9:21             ` John Garry
  0 siblings, 1 reply; 32+ messages in thread
From: Ming Lei @ 2019-11-29  0:25 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, John Garry, linux-scsi, linux-block,
	Kashyap Desai

On Wed, Nov 27, 2019 at 06:02:54PM +0100, Hannes Reinecke wrote:
> On 11/26/19 4:54 PM, Ming Lei wrote:
> > On Tue, Nov 26, 2019 at 12:27:50PM +0100, Hannes Reinecke wrote:
> > > On 11/26/19 12:05 PM, Ming Lei wrote:
> [ .. ]
> > > >  From performance viewpoint, all hctx belonging to this request queue should
> > > > share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
> > > > driver tag queue depth isn't changed.
> > > > 
> > > Hmm. Now you get me confused.
> > > In an earlier mail you said:
> > > 
> > > > This kind of sharing is wrong, sched tags should be request
> > > > queue wide instead of tagset wide, and each request queue has
> > > > its own & independent scheduler queue.
> > > 
> > > as in v2 we _had_ shared scheduler tags, too.
> > > Did I misread your comment above?
> > 
> > Yes, what I meant is that we can't share sched tags in tagset wide.
> > 
> > Now I mean we should share sched tags among all hctxs in same request
> > queue, and I believe I have described it clearly.
> > 
> I wonder if this makes a big difference; in the end, scheduler tags are
> primarily there to allow the scheduler to queue more requests, and
> potentially merge them. These tags are later converted into 'real' ones
> via blk_mq_get_driver_tag(), and only then does the resource limitation
> take hold.
> Wouldn't it be sufficient to look at the number of outstanding commands
> per queue when getting a scheduler tag, rather than having to implement
> yet another bitmap?

Firstly, too much memory is wasted ((nr_hw_queues - 1) times as much).
Secondly, IO latency could be increased by an overly deep scheduler
queue depth. Finally, CPU time could be wasted repeatedly retrying to
run a busy hw queue.

Wrt. driver tags, this patch may be worse, given that the average limit
for each LUN is reduced by a factor of nr_hw_queues; see
hctx_may_queue().
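
For reference, a simplified paraphrase of that fair-share logic (not
the verbatim code; see the real hctx_may_queue() for the details):

static bool hctx_may_queue_sketch(struct blk_mq_hw_ctx *hctx,
				  struct sbitmap_queue *bt)
{
	unsigned int users = atomic_read(&hctx->tags->active_queues);
	unsigned int depth;

	if (!users)
		return true;
	/* Each active queue gets an equal share of the tag depth,
	 * but never fewer than 4 tags. */
	depth = max((bt->sb.depth + users - 1) / users, 4U);
	return atomic_read(&hctx->nr_active) < depth;
}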

Another change is bt_wait_ptr(). Before your patches there was a single
.wait_index; now the number of .wait_index instances grows to
nr_hw_queues.
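
(bt_wait_ptr() is roughly the following - again paraphrased and
simplified:)

static struct sbq_wait_state *bt_wait_ptr_sketch(struct sbitmap_queue *bt,
						 struct blk_mq_hw_ctx *hctx)
{
	/* No hctx: use the first wait queue.  Otherwise round-robin
	 * over the wait queues via the per-hctx .wait_index. */
	if (!hctx)
		return &bt->ws[0];
	return sbq_wait_ptr(bt, &hctx->wait_index);
}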

Also, the number of queue runs in SCSI's IO completion is increased a
lot; see scsi_end_request().

Kashyap Desai has a performance benchmark on fast megaraid SSDs, and
you can ask him to provide performance data for these patches.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-29  0:25           ` Ming Lei
@ 2019-11-29  9:21             ` John Garry
  0 siblings, 0 replies; 32+ messages in thread
From: John Garry @ 2019-11-29  9:21 UTC (permalink / raw)
  To: Ming Lei, Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, linux-block, Kashyap Desai

On 29/11/2019 00:25, Ming Lei wrote:
> On Wed, Nov 27, 2019 at 06:02:54PM +0100, Hannes Reinecke wrote:
>> On 11/26/19 4:54 PM, Ming Lei wrote:
>>> On Tue, Nov 26, 2019 at 12:27:50PM +0100, Hannes Reinecke wrote:
>>>> On 11/26/19 12:05 PM, Ming Lei wrote:
>> [ .. ]
>>>>>   From performance viewpoint, all hctx belonging to this request queue should
>>>>> share one scheduler tagset in case of BLK_MQ_F_TAG_HCTX_SHARED, cause
>>>>> driver tag queue depth isn't changed.
>>>>>
>>>> Hmm. Now you get me confused.
>>>> In an earlier mail you said:
>>>>
>>>>> This kind of sharing is wrong, sched tags should be request
>>>>> queue wide instead of tagset wide, and each request queue has
>>>>> its own & independent scheduler queue.
>>>>
>>>> as in v2 we _had_ shared scheduler tags, too.
>>>> Did I misread your comment above?
>>>
>>> Yes, what I meant is that we can't share sched tags in tagset wide.
>>>
>>> Now I mean we should share sched tags among all hctxs in same request
>>> queue, and I believe I have described it clearly.
>>>
>> I wonder if this makes a big difference; in the end, scheduler tags are
>> primarily there to allow the scheduler to queue more requests, and
>> potentially merge them. These tags are later converted into 'real' ones
>> via blk_mq_get_driver_tag(), and only then does the resource limitation
>> take hold.
>> Wouldn't it be sufficient to look at the number of outstanding commands
>> per queue when getting a scheduler tag, rather than having to implement
>> yet another bitmap?
> 
> Firstly, too much memory is wasted ((nr_hw_queues - 1) times as much).
> Secondly, IO latency could be increased by an overly deep scheduler
> queue depth. Finally, CPU time could be wasted repeatedly retrying to
> run a busy hw queue.
> 
> Wrt. driver tags, this patch may be worse, given that the average limit
> for each LUN is reduced by a factor of nr_hw_queues; see
> hctx_may_queue().
> 
> Another change is bt_wait_ptr(). Before your patches there was a single
> .wait_index; now the number of .wait_index instances grows to
> nr_hw_queues.
> 
> Also, the number of queue runs in SCSI's IO completion is increased a
> lot; see scsi_end_request().
> 
> Kashyap Desai has a performance benchmark on fast megaraid SSDs, and
> you can ask him to provide performance data for these patches.

On the v2 series (which is effectively the same as this one [it would be 
nice if we had per-patch versioning]), for hisi_sas_v3_hw we get about 
the same performance as when we use the reply_map: about 3.0M IOPS vs 
3.1M IOPS, respectively.

Without this, we get 700-800K IOPS. I don't know why the performance is 
so poor without it. Only CPU0 serves the completion interrupts, which 
could explain it, but v2 hw can get > 800K IOPS with only 6x SSDs.

Thanks,
John

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 7/8] smartpqi: enable host tagset
  2019-11-26 13:10 [PATCH RFC v4 " Hannes Reinecke
@ 2019-11-26 13:10 ` Hannes Reinecke
  0 siblings, 0 replies; 32+ messages in thread
From: Hannes Reinecke @ 2019-11-26 13:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Ming Lei, Bart van Assche,
	John Garry, linux-scsi, linux-block, Hannes Reinecke

Enable the host tagset for smartpqi; with this we can use the request
tag to look up the command from the pool, avoiding the list iteration
in the hot path.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/smartpqi/smartpqi_init.c | 38 ++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 7b7ef3acb504..c17b533c84ad 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -575,17 +575,29 @@ static inline void pqi_reinit_io_request(struct pqi_io_request *io_request)
 }
 
 static struct pqi_io_request *pqi_alloc_io_request(
-	struct pqi_ctrl_info *ctrl_info)
+	struct pqi_ctrl_info *ctrl_info, struct scsi_cmnd *scmd)
 {
 	struct pqi_io_request *io_request;
+	unsigned int limit = PQI_RESERVED_IO_SLOTS;
 	u16 i = ctrl_info->next_io_request_slot;	/* benignly racy */
 
-	while (1) {
+	if (scmd) {
+		u32 blk_tag = blk_mq_unique_tag(scmd->request);
+
+		i = blk_mq_unique_tag_to_tag(blk_tag) + limit;
 		io_request = &ctrl_info->io_request_pool[i];
-		if (atomic_inc_return(&io_request->refcount) == 1)
-			break;
-		atomic_dec(&io_request->refcount);
-		i = (i + 1) % ctrl_info->max_io_slots;
+		if (WARN_ON(atomic_inc_return(&io_request->refcount) > 1)) {
+			atomic_dec(&io_request->refcount);
+			return NULL;
+		}
+	} else {
+		while (1) {
+			io_request = &ctrl_info->io_request_pool[i];
+			if (atomic_inc_return(&io_request->refcount) == 1)
+				break;
+			atomic_dec(&io_request->refcount);
+			i = (i + 1) % limit;
+		}
 	}
 
 	/* benignly racy */
@@ -4075,7 +4087,7 @@ static int pqi_submit_raid_request_synchronous(struct pqi_ctrl_info *ctrl_info,
 
 	atomic_inc(&ctrl_info->sync_cmds_outstanding);
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 
 	put_unaligned_le16(io_request->index,
 		&(((struct pqi_raid_path_request *)request)->request_id));
@@ -5032,7 +5044,9 @@ static inline int pqi_raid_submit_scsi_cmd(struct pqi_ctrl_info *ctrl_info,
 {
 	struct pqi_io_request *io_request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
 
 	return pqi_raid_submit_scsi_cmd_with_io_request(ctrl_info, io_request,
 		device, scmd, queue_group);
@@ -5230,7 +5244,10 @@ static int pqi_aio_submit_io(struct pqi_ctrl_info *ctrl_info,
 	struct pqi_io_request *io_request;
 	struct pqi_aio_path_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
+
 	io_request->io_complete_callback = pqi_aio_io_complete;
 	io_request->scmd = scmd;
 	io_request->raid_bypass = raid_bypass;
@@ -5657,7 +5674,7 @@ static int pqi_lun_reset(struct pqi_ctrl_info *ctrl_info,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct pqi_task_management_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 	io_request->io_complete_callback = pqi_lun_reset_complete;
 	io_request->context = &wait;
 
@@ -6504,6 +6521,7 @@ static struct scsi_host_template pqi_driver_template = {
 	.map_queues = pqi_map_queues,
 	.sdev_attrs = pqi_sdev_attrs,
 	.shost_attrs = pqi_shost_attrs,
+	.host_tagset = 1,
 };
 
 static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

Thread overview: 32+ messages
2019-11-26  9:14 [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
2019-11-26  9:14 ` [PATCH 1/8] blk-mq: Remove some unused function arguments Hannes Reinecke
2019-11-26  9:14 ` [PATCH 2/8] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
2019-11-26  9:14 ` [PATCH 3/8] blk-mq: Use a pointer for sbitmap Hannes Reinecke
2019-11-26 15:14   ` Jens Axboe
2019-11-26 16:54     ` John Garry
2019-11-26 17:11       ` Jens Axboe
2019-11-26 17:23         ` John Garry
2019-11-26 17:25           ` Jens Axboe
2019-11-26 18:08             ` John Garry
2019-11-27  1:46               ` Jens Axboe
2019-11-27 13:05                 ` John Garry
2019-11-27 13:12                   ` Hannes Reinecke
2019-11-27 14:20                     ` Jens Axboe
2019-11-27 14:21                   ` Jens Axboe
2019-11-27 14:44                     ` John Garry
2019-11-27 16:52                       ` Hannes Reinecke
2019-11-27 18:02                         ` John Garry
2019-11-26  9:14 ` [PATCH 4/8] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
2019-11-26 11:05   ` Ming Lei
2019-11-26 11:27     ` Hannes Reinecke
2019-11-26 11:50       ` John Garry
2019-11-26 15:54       ` Ming Lei
2019-11-27 17:02         ` Hannes Reinecke
2019-11-29  0:25           ` Ming Lei
2019-11-29  9:21             ` John Garry
2019-11-26  9:14 ` [PATCH 5/8] scsi: Add template flag 'host_tagset' Hannes Reinecke
2019-11-26  9:14 ` [PATCH 6/8] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
2019-11-26  9:14 ` [PATCH 7/8] smartpqi: enable host tagset Hannes Reinecke
2019-11-26  9:14 ` [PATCH 8/8] hpsa: switch to using blk-mq Hannes Reinecke
2019-11-26 10:09 ` [PATCH RFC v3 0/8] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2019-11-26 13:10 [PATCH RFC v4 " Hannes Reinecke
2019-11-26 13:10 ` [PATCH 7/8] smartpqi: enable host tagset Hannes Reinecke
