* [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2019-12-02 15:39 Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 01/11] blk-mq: Remove some unused function arguments Hannes Reinecke
                   ` (11 more replies)
  0 siblings, 12 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

Hi all,

here is an updated version of John Garry's v2 patchset, incorporating
the suggestions and reviews from the mailing list.
John, apologies for hijacking your work :-)

For this version I've only made some slight modifications to John's
original patches (renaming variables etc.); the contentious separate sbitmap
allocation has been dropped in favour of John's original version with pointers
to the embedded sbitmap.

But more importantly I've reworked the scheduler tag allocation after
discussions with Ming Lei.

The point is that hostwide shared tags can't really be resized: they
certainly cannot be increased (that's a hardware limitation), and even
decreasing them is questionable, as any change here would affect all
devices served by the HBA.

Scheduler tags, OTOH, can be considered per-queue, as the I/O scheduler
might want to look at all requests on all queues. As such, the scheduler
queue depth is distinct from the actual queue depth of the tagset.
Being distinct, it can be changed independently of the underlying tagset,
and there is never a need to modify the tagset itself.
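The split argued for above can be sketched with a toy model: the hostwide
tag depth is a hard constant fixed by the HBA, while the scheduler depth is
just a per-queue soft limit that can be resized at any time. This is a
simplified illustration, not kernel code; all names here (host_tags,
sched_queue, submit) are made up for the sketch.

```c
#include <stdbool.h>

#define HOST_DEPTH 4                 /* hardware limit, cannot grow */

struct host_tags {
	bool used[HOST_DEPTH];       /* stands in for the shared sbitmap */
};

static int host_tag_get(struct host_tags *ht)
{
	for (int i = 0; i < HOST_DEPTH; i++) {
		if (!ht->used[i]) {
			ht->used[i] = true;
			return i;
		}
	}
	return -1;                   /* host saturated across all queues */
}

struct sched_queue {
	int depth;                   /* per-queue scheduler depth, resizable */
	int in_flight;
};

/* A request must pass the per-queue scheduler limit before it may
 * consume one of the fixed hostwide tags. */
static int submit(struct sched_queue *q, struct host_tags *ht)
{
	if (q->in_flight >= q->depth)
		return -1;           /* throttled by the scheduler depth */
	int tag = host_tag_get(ht);
	if (tag >= 0)
		q->in_flight++;
	return tag;
}
```

Resizing q->depth never touches HOST_DEPTH, which is the property the
patchset relies on.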

I've also modified megaraid_sas, smartpqi and hpsa to take advantage of
host_tags.

Performance for megaraid_sas is on par with the original implementation,
with the added benefit that we should now be able to handle CPU
hotplug properly.

Differences to v4:
- Rework scheduler tag allocations
- Revert back to the original implementation from John

Differences to v3:
- Include reviews from Ming Lei

Differences to v2:
- Drop embedded tag bitmaps
- Do not share scheduling tags
- Add patches for hpsa and smartpqi

Differences to v1:
- Use a shared sbitmap, and not separate shared tags (a big change!)
	- Drop request.shared_tag
- Add RB tags

Hannes Reinecke (7):
  blk-mq: rename blk_mq_update_tag_set_depth()
  blk-mq: add WARN_ON in blk_mq_free_rqs()
  blk-mq: move shared sbitmap into elevator queue
  scsi: Add template flag 'host_tagset'
  megaraid_sas: switch fusion adapters to MQ
  smartpqi: enable host tagset
  hpsa: enable host_tagset and switch to MQ

John Garry (3):
  blk-mq: Remove some unused function arguments
  blk-mq: Facilitate a shared sbitmap per tagset
  scsi: hisi_sas: Switch v3 hw to MQ

Ming Lei (1):
  blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED

 block/bfq-iosched.c                         |   4 +-
 block/blk-mq-debugfs.c                      |  12 +--
 block/blk-mq-sched.c                        |  22 +++++
 block/blk-mq-tag.c                          | 140 +++++++++++++++++++++-------
 block/blk-mq-tag.h                          |  27 ++++--
 block/blk-mq.c                              | 104 +++++++++++++++------
 block/blk-mq.h                              |   7 +-
 block/blk-sysfs.c                           |   7 ++
 block/kyber-iosched.c                       |   4 +-
 drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 +++----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  86 +++++++----------
 drivers/scsi/hpsa.c                         |  44 ++-------
 drivers/scsi/hpsa.h                         |   1 -
 drivers/scsi/megaraid/megaraid_sas.h        |   1 -
 drivers/scsi/megaraid/megaraid_sas_base.c   |  65 ++++---------
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  14 ++-
 drivers/scsi/scsi_lib.c                     |   2 +
 drivers/scsi/smartpqi/smartpqi_init.c       |  38 ++++++--
 include/linux/blk-mq.h                      |   7 +-
 include/linux/elevator.h                    |   3 +
 include/scsi/scsi_host.h                    |   3 +
 22 files changed, 380 insertions(+), 250 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/11] blk-mq: Remove some unused function arguments
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 02/11] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block

From: John Garry <john.garry@huawei.com>

The struct blk_mq_hw_ctx * argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c |  4 ++--
 block/blk-mq-tag.h |  4 ++--
 block/blk-mq.c     | 10 ++++------
 block/blk-mq.h     |  2 +-
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 008388e82b5c..53b4a9414fbd 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -191,8 +191,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	return tag + tag_offset;
 }
 
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-		    struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+		    unsigned int tag)
 {
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..66d04dea0bdb 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,8 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-			   struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+				unsigned int tag);
 extern bool blk_mq_has_free_tags(struct blk_mq_tags *tags);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6e3b15f70cd7..16aa20d23b67 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -499,9 +499,9 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
 	if (rq->tag != -1)
-		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
@@ -3354,7 +3354,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 }
 
 static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
-				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
 	unsigned long ret = 0;
@@ -3387,7 +3386,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
-				     struct blk_mq_hw_ctx *hctx,
 				     struct request *rq)
 {
 	struct hrtimer_sleeper hs;
@@ -3407,7 +3405,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
-		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+		nsecs = blk_mq_poll_nsecs(q, rq);
 
 	if (!nsecs)
 		return false;
@@ -3462,7 +3460,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 			return false;
 	}
 
-	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
 /**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..78d38b5f2793 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -208,7 +208,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-- 
2.16.4



* [PATCH 02/11] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 01/11] blk-mq: Remove some unused function arguments Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth() Hannes Reinecke
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block

From: Ming Lei <ming.lei@redhat.com>

BLK_MQ_F_TAG_SHARED actually means that the tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA.

So rename it to make the point explicit.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-debugfs.c |  2 +-
 block/blk-mq-tag.c     |  2 +-
 block/blk-mq-tag.h     |  4 ++--
 block/blk-mq.c         | 20 ++++++++++----------
 include/linux/blk-mq.h |  2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..33a40ae1d60f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -236,7 +236,7 @@ static const char *const alloc_policy_name[] = {
 #define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
-	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 53b4a9414fbd..d7aa23c82dbf 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -73,7 +73,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 66d04dea0bdb..6c0f7c9ce9f6 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -55,7 +55,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
 	return __blk_mq_tag_busy(hctx);
@@ -63,7 +63,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return;
 
 	__blk_mq_tag_idle(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 16aa20d23b67..6b39cf0efdcd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -302,7 +302,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		rq->tag = -1;
 		rq->internal_tag = tag;
 	} else {
-		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
+		if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
 			rq_flags = RQF_MQ_INFLIGHT;
 			atomic_inc(&data->hctx->nr_active);
 		}
@@ -1118,7 +1118,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -1249,7 +1249,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				 * For non-shared tags, the RESTART check
 				 * will suffice.
 				 */
-				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+				if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 					no_tag = true;
 				break;
 			}
@@ -2358,7 +2358,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
 	hctx->queue = q;
-	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
 
 	INIT_LIST_HEAD(&hctx->hctx_list);
 
@@ -2575,9 +2575,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared)
-			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		else
-			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 	}
 }
 
@@ -2603,7 +2603,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
-		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, false);
 	}
@@ -2620,12 +2620,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
 	if (!list_empty(&set->tag_list) &&
-	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
-		set->flags |= BLK_MQ_F_TAG_SHARED;
+	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, true);
 	}
-	if (set->flags & BLK_MQ_F_TAG_SHARED)
+	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
 	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 0bf056de5cc3..147185394a25 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -225,7 +225,7 @@ struct blk_mq_ops {
 
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
-	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.16.4



* [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth()
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 01/11] blk-mq: Remove some unused function arguments Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 02/11] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-03 14:30   ` John Garry
  2019-12-02 15:39 ` [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

The function does not set the depth, but rather transitions from
shared to non-shared queues and vice versa.
So rename it to blk_mq_update_tag_set_shared() to better reflect
its purpose.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-tag.c | 18 ++++++++++--------
 block/blk-mq.c     |  8 ++++----
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d7aa23c82dbf..f5009587e1b5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -440,24 +440,22 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
 				       node);
 }
 
-static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-						   int node, int alloc_policy)
+static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
+				   int node, int alloc_policy)
 {
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
 	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
-		goto free_tags;
+		return -ENOMEM;
 	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
 		     node))
 		goto free_bitmap_tags;
 
-	return tags;
+	return 0;
 free_bitmap_tags:
 	sbitmap_queue_free(&tags->bitmap_tags);
-free_tags:
-	kfree(tags);
-	return NULL;
+	return -ENOMEM;
 }
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
@@ -478,7 +476,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
-	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
+	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+		kfree(tags);
+		tags = NULL;
+	}
+	return tags;
 }
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6b39cf0efdcd..91950d3e436a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2581,8 +2581,8 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 	}
 }
 
-static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
-					bool shared)
+static void blk_mq_update_tag_set_shared(struct blk_mq_tag_set *set,
+					 bool shared)
 {
 	struct request_queue *q;
 
@@ -2605,7 +2605,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, false);
+		blk_mq_update_tag_set_shared(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
 	INIT_LIST_HEAD(&q->tag_set_list);
@@ -2623,7 +2623,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, true);
+		blk_mq_update_tag_set_shared(set, true);
 	}
 	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
-- 
2.16.4



* [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (2 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth() Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-03 14:54   ` John Garry
  2019-12-02 15:39 ` [PATCH 05/11] blk-mq: add WARN_ON in blk_mq_free_rqs() Hannes Reinecke
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

From: John Garry <john.garry@huawei.com>

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown.

To solve that problem, Ming's patchset to drain hctxs should ensure that
no in-flight IOs are missed [1].

However, to take advantage of that patchset, we need to map the HBA HW
queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However another problem is that blk mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
host busy counter, which would stop the LLDD being sent more than
.can_queue commands; however, we should still ensure that the block layer
does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
tags and requests per hctx; the reason for this is that if we only allocate
tags and requests for a single hctx - like hctx0 - we may break block
drivers which expect a request be associated with a specific hctx, i.e.
not hctx0.

This is based on work originally from Ming Lei in [3] and from Bart's
suggestion in [4].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
[3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
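The core of this patch is a pointer indirection: blk_mq_tags now uses
bitmap_tags as a pointer, which refers either to the embedded per-hctx
__bitmap_tags (regular case) or to the tag_set-wide shared bitmap
(BLK_MQ_F_TAG_HCTX_SHARED case). A toy model of that indirection, with
struct names mirroring the patch but simplified stand-in types rather
than the real kernel definitions:

```c
#include <stdbool.h>

struct bitmap { int free_slots; };   /* stands in for sbitmap_queue */

struct tags {
	struct bitmap *bitmap_tags;  /* what the hot paths dereference */
	struct bitmap __bitmap_tags; /* embedded, per-hctx bitmap */
};

struct tag_set {
	bool shared;                 /* BLK_MQ_F_TAG_HCTX_SHARED */
	struct bitmap __bitmap_tags; /* hostwide shared bitmap */
};

static void tags_init(struct tags *t, struct tag_set *set, int depth)
{
	if (set->shared) {
		/* every hctx's tags point at the one shared bitmap */
		t->bitmap_tags = &set->__bitmap_tags;
	} else {
		t->__bitmap_tags.free_slots = depth;
		t->bitmap_tags = &t->__bitmap_tags;
	}
}

static int tag_get(struct tags *t)
{
	if (t->bitmap_tags->free_slots == 0)
		return -1;           /* .can_queue respected hostwide */
	return --t->bitmap_tags->free_slots;
}
```

In the shared case, allocations through any hctx drain the same bitmap,
so the block layer can never issue more than the hostwide depth in total,
while each hctx still keeps its own full set of requests.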

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/bfq-iosched.c    |   4 +-
 block/blk-mq-debugfs.c |  10 ++---
 block/blk-mq-sched.c   |  14 ++++++
 block/blk-mq-tag.c     | 114 +++++++++++++++++++++++++++++++++++++++----------
 block/blk-mq-tag.h     |  17 ++++++--
 block/blk-mq.c         |  67 ++++++++++++++++++++++++++---
 block/blk-mq.h         |   5 +++
 block/kyber-iosched.c  |   4 +-
 include/linux/blk-mq.h |   9 ++++
 9 files changed, 204 insertions(+), 40 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0319d6339822..ca89d0c34994 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 33a40ae1d60f..46f57dbed890 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -449,11 +449,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 		   atomic_read(&tags->active_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
-	sbitmap_queue_show(&tags->bitmap_tags, m);
+	sbitmap_queue_show(tags->bitmap_tags, m);
 
 	if (tags->nr_reserved_tags) {
 		seq_puts(m, "\nbreserved_tags:\n");
-		sbitmap_queue_show(&tags->breserved_tags, m);
+		sbitmap_queue_show(tags->breserved_tags, m);
 	}
 }
 
@@ -483,8 +483,8 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
+	if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
+		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
@@ -518,7 +518,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->sched_tags)
-		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..1855f8f5edd4 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 {
+	struct blk_mq_tag_set *tag_set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
 	struct elevator_queue *eq;
 	unsigned int i;
@@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		blk_mq_debugfs_register_sched_hctx(q, hctx);
 	}
 
+	if (blk_mq_is_sbitmap_shared(tag_set)) {
+		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		queue_for_each_hw_ctx(q, hctx, i) {
+			struct blk_mq_tags *tags = hctx->sched_tags;
+
+			tags->bitmap_tags = &tag_set->__sched_bitmap_tags;
+			tags->breserved_tags = &tag_set->__sched_breserved_tags;
+		}
+	}
+
 	return 0;
 
 err:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index f5009587e1b5..2e714123e846 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -20,7 +20,7 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
 	if (!tags)
 		return true;
 
-	return sbitmap_any_bit_clear(&tags->bitmap_tags.sb);
+	return sbitmap_any_bit_clear(&tags->bitmap_tags->sb);
 }
 
 /*
@@ -43,9 +43,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->bitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->breserved_tags);
 }
 
 /*
@@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_TAG_FAIL;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -170,9 +170,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->breserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->bitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -198,10 +198,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -228,7 +228,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -329,8 +329,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, tags->breserved_tags, fn, priv, true);
+	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, false);
 }
 
 /**
@@ -427,8 +427,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -446,19 +446,85 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
-	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
+	if (bt_alloc(&tags->__bitmap_tags, depth, round_robin, node))
 		return -ENOMEM;
-	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
-		     node))
+	if (bt_alloc(&tags->__breserved_tags, tags->nr_reserved_tags,
+		     round_robin, node))
 		goto free_bitmap_tags;
 
+	tags->bitmap_tags = &tags->__bitmap_tags;
+	tags->breserved_tags = &tags->__breserved_tags;
+
 	return 0;
 free_bitmap_tags:
-	sbitmap_queue_free(&tags->bitmap_tags);
+	sbitmap_queue_free(&tags->__bitmap_tags);
 	return -ENOMEM;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (tag_set->flags & BLK_MQ_F_TAG_BITMAP_ALLOCATED)
+		return false;
+	if (bt_alloc(&tag_set->__bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->__breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+	tag_set->flags |= BLK_MQ_F_TAG_BITMAP_ALLOCATED;
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->__bitmap_tags);
+	return false;
+}
+
+void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	if (tag_set->flags & BLK_MQ_F_TAG_BITMAP_ALLOCATED) {
+		sbitmap_queue_free(&tag_set->__bitmap_tags);
+		sbitmap_queue_free(&tag_set->__breserved_tags);
+		tag_set->flags &= ~BLK_MQ_F_TAG_BITMAP_ALLOCATED;
+	}
+}
+
+bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
+				      unsigned long nr_requests)
+{
+	unsigned int depth = nr_requests - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (tag_set->flags & BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED)
+		return false;
+	if (bt_alloc(&tag_set->__sched_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->__sched_breserved_tags, tag_set->reserved_tags,
+		     round_robin, node))
+		goto free_bitmap_tags;
+
+	tag_set->flags |= BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED;
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->__sched_bitmap_tags);
+	return false;
+}
+
+void blk_mq_exit_shared_sched_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	if (tag_set->flags & BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED) {
+		sbitmap_queue_free(&tag_set->__sched_bitmap_tags);
+		sbitmap_queue_free(&tag_set->__sched_breserved_tags);
+		tag_set->flags &= ~BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED;
+	}
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
 {
@@ -476,6 +542,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	if (blk_mq_is_sbitmap_shared(set))
+		return tags;
 	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
 		kfree(tags);
 		tags = NULL;
@@ -485,8 +553,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	sbitmap_queue_free(&tags->bitmap_tags);
-	sbitmap_queue_free(&tags->breserved_tags);
+	if (tags->bitmap_tags == &tags->__bitmap_tags)
+		sbitmap_queue_free(&tags->__bitmap_tags);
+	if (tags->breserved_tags == &tags->__breserved_tags)
+		sbitmap_queue_free(&tags->__breserved_tags);
 	kfree(tags);
 }
 
@@ -536,7 +606,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
+		sbitmap_queue_resize(tags->bitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 6c0f7c9ce9f6..9463b878462f 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,8 +13,11 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
-	struct sbitmap_queue bitmap_tags;
-	struct sbitmap_queue breserved_tags;
+	struct sbitmap_queue *bitmap_tags;
+	struct sbitmap_queue *breserved_tags;
+
+	struct sbitmap_queue __bitmap_tags;
+	struct sbitmap_queue __breserved_tags;
 
 	struct request **rqs;
 	struct request **static_rqs;
@@ -22,7 +25,15 @@ struct blk_mq_tags {
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
+					     unsigned long nr_requests);
+extern void blk_mq_exit_shared_sched_sbitmap(struct blk_mq_tag_set *tag_set);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+					    unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 91950d3e436a..016f8401cfb9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1095,7 +1095,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = &hctx->tags->bitmap_tags;
+		sbq = hctx->tags->bitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1113,7 +1113,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -2081,7 +2081,6 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 	tags->rqs = NULL;
 	kfree(tags->static_rqs);
 	tags->static_rqs = NULL;
-
 	blk_mq_free_tags(tags);
 }
 
@@ -2097,7 +2096,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
@@ -2954,8 +2953,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	return 0;
 
 out_unwind:
-	while (--i >= 0)
+	while (--i >= 0) {
 		blk_mq_free_rq_map(set->tags[i]);
+		set->tags[i] = NULL;
+	}
 
 	return -ENOMEM;
 }
@@ -3100,6 +3101,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set)) {
+		if (!blk_mq_init_shared_sbitmap(set)) {
+			ret = -ENOMEM;
+			goto out_free_mq_map;
+		}
+
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			struct blk_mq_tags *tags = set->tags[i];
+
+			tags->bitmap_tags = &set->__bitmap_tags;
+			tags->breserved_tags = &set->__breserved_tags;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
@@ -3123,6 +3138,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 	for (i = 0; i < nr_hw_queues(set); i++)
 		blk_mq_free_map_and_requests(set, i);
 
+	blk_mq_exit_shared_sched_sbitmap(set);
+	blk_mq_exit_shared_sbitmap(set);
+
 	for (j = 0; j < set->nr_maps; j++) {
 		kfree(set->map[j].mq_map);
 		set->map[j].mq_map = NULL;
@@ -3137,6 +3155,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
+	bool sched_tags = false;
 	int i, ret;
 
 	if (!set)
@@ -3160,6 +3179,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
 		} else {
+			sched_tags = true;
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
 		}
@@ -3169,8 +3189,43 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
 
-	if (!ret)
+	/*
+	 * if ret is 0, all queues should have been updated to the same depth
+	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
+	 * if some are updated, we should probably roll back the change altogether. FIXME
+	 */
+	if (!ret) {
+		if (blk_mq_is_sbitmap_shared(set)) {
+			if (sched_tags) {
+				blk_mq_exit_shared_sched_sbitmap(set);
+				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->sched_tags->bitmap_tags =
+						&set->__sched_bitmap_tags;
+					hctx->sched_tags->breserved_tags =
+						&set->__sched_breserved_tags;
+				}
+			} else {
+				blk_mq_exit_shared_sbitmap(set);
+				if (!blk_mq_init_shared_sbitmap(set))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->tags->bitmap_tags =
+						&set->__bitmap_tags;
+					hctx->tags->breserved_tags =
+						&set->__breserved_tags;
+				}
+			}
+		}
 		q->nr_requests = nr;
+	}
+	/*
+	 * if ret != 0, q->nr_requests would not be updated, yet the depth
+	 * for some hctx may have changed - is that right?
+	 */
 
 	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 78d38b5f2793..4c1ea206d3f4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0ef637..a7a537501d70 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
-	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+	sbitmap_queue_min_shallow_depth(hctx->sched_tags->bitmap_tags,
 					kqd->async_depth);
 
 	return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 147185394a25..10c9ed3dbe80 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
 	unsigned int		flags;		/* BLK_MQ_F_* */
 	void			*driver_data;
 
+	struct sbitmap_queue	__bitmap_tags;
+	struct sbitmap_queue	__breserved_tags;
+
+	struct sbitmap_queue	__sched_bitmap_tags;
+	struct sbitmap_queue	__sched_breserved_tags;
+
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -226,6 +232,9 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
+	BLK_MQ_F_TAG_BITMAP_ALLOCATED	= 1 << 3,
+	BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED = 1 << 4,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/11] blk-mq: add WARN_ON in blk_mq_free_rqs()
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (3 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 06/11] blk-mq: move shared sbitmap into elevator queue Hannes Reinecke
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

Before freeing the requests we should ensure that none of them
is still referenced via the ->rqs array; a stale entry there could
lead to a use-after-free.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 016f8401cfb9..054c0597c052 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2049,10 +2049,14 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx)
 {
 	struct page *page;
+	int i;
 
-	if (tags->rqs && set->ops->exit_request) {
-		int i;
-
+	if (tags->rqs) {
+		for (i = 0; i < tags->nr_tags; i++)
+			if (WARN_ON(tags->rqs[i]))
+				tags->rqs[i] = NULL;
+	}
+	if (tags->static_rqs && set->ops->exit_request) {
 		for (i = 0; i < tags->nr_tags; i++) {
 			struct request *rq = tags->static_rqs[i];
 
-- 
2.16.4



* [PATCH 06/11] blk-mq: move shared sbitmap into elevator queue
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (4 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 05/11] blk-mq: add WARN_ON in blk_mq_free_rqs() Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 07/11] scsi: Add template flag 'host_tagset' Hannes Reinecke
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

When an elevator is present the scheduler sbitmap is really a
per-request-queue resource, so modifying the tagset queue depth in
the shared sbitmap case would do the wrong thing: it would change
the queue depth for all request queues.
So move the shared scheduler sbitmap into struct elevator_queue,
confining any modification to the request queue it belongs to.
With that we can increase the number of requests in the shared
sbitmap case to the queue depth times the number of hw queues, as
all tags are now allocated from the same bitmap.
This also solves the problem of sbitmap resizing in the shared
sbitmap case: we simply require an elevator to be present whenever
the queue depth is to be modified, so that only the depth of this
particular request queue changes.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 block/blk-mq-sched.c     | 14 +++++++++++---
 block/blk-mq-tag.c       | 30 ++++++++++++++++--------------
 block/blk-mq-tag.h       |  8 +++++---
 block/blk-mq.c           | 41 +++++++++++++++--------------------------
 block/blk-sysfs.c        |  7 +++++++
 include/linux/blk-mq.h   |  4 ----
 include/linux/elevator.h |  3 +++
 7 files changed, 57 insertions(+), 50 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 1855f8f5edd4..f2184199a1b7 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -511,6 +511,12 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	 */
 	q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
 				   BLKDEV_MAX_RQ);
+	/*
+	 * For the shared sbitmap case it's per request queue, so multiply
+	 * with the number of hw queues
+	 */
+	if (blk_mq_is_sbitmap_shared(tag_set))
+		q->nr_requests *= q->nr_hw_queues;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		ret = blk_mq_sched_alloc_tags(q, hctx, i);
@@ -539,15 +545,16 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	}
 
 	if (blk_mq_is_sbitmap_shared(tag_set)) {
-		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
+		eq = q->elevator;
+		if (!blk_mq_init_elevator_sbitmap(q, eq, q->nr_requests)) {
 			ret = -ENOMEM;
 			goto err;
 		}
 		queue_for_each_hw_ctx(q, hctx, i) {
 			struct blk_mq_tags *tags = hctx->sched_tags;
 
-			tags->bitmap_tags = &tag_set->__sched_bitmap_tags;
-			tags->breserved_tags = &tag_set->__sched_breserved_tags;
+			tags->bitmap_tags = &eq->__bitmap_tags;
+			tags->breserved_tags = &eq->__breserved_tags;
 		}
 	}
 
@@ -591,5 +598,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
 	if (e->type->ops.exit_sched)
 		e->type->ops.exit_sched(e);
 	blk_mq_sched_tags_teardown(q);
+	blk_mq_exit_elevator_sbitmap(q, e);
 	q->elevator = NULL;
 }
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 2e714123e846..39c7beffdd04 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -491,35 +491,37 @@ void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set)
 	}
 }
 
-bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
-				      unsigned long nr_requests)
+bool blk_mq_init_elevator_sbitmap(struct request_queue *q,
+				  struct elevator_queue *eq,
+				  unsigned int nr_requests)
 {
-	unsigned int depth = nr_requests -tag_set->reserved_tags;
+	struct blk_mq_tag_set *tag_set = q->tag_set;
+	unsigned int depth = nr_requests - tag_set->reserved_tags;
 	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 	int node = tag_set->numa_node;
 
-	if (tag_set->flags & BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED)
-		return false;
-	if (bt_alloc(&tag_set->__sched_bitmap_tags, depth, round_robin, node))
+	if (!blk_mq_is_sbitmap_shared(q->tag_set))
+		return true;
+
+	if (bt_alloc(&eq->__bitmap_tags, depth, round_robin, node))
 		return false;
-	if (bt_alloc(&tag_set->__sched_breserved_tags, tag_set->reserved_tags,
+	if (bt_alloc(&eq->__breserved_tags, tag_set->reserved_tags,
 		     round_robin, node))
 		goto free_bitmap_tags;
 
-	tag_set->flags |= BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED;
 	return true;
 free_bitmap_tags:
-	sbitmap_queue_free(&tag_set->__sched_bitmap_tags);
+	sbitmap_queue_free(&eq->__bitmap_tags);
 	return false;
 }
 
-void blk_mq_exit_shared_sched_sbitmap(struct blk_mq_tag_set *tag_set)
+void blk_mq_exit_elevator_sbitmap(struct request_queue *q,
+				  struct elevator_queue *eq)
 {
-	if (tag_set->flags & BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED) {
-		sbitmap_queue_free(&tag_set->__sched_bitmap_tags);
-		sbitmap_queue_free(&tag_set->__sched_breserved_tags);
-		tag_set->flags &= ~BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED;
+	if (blk_mq_is_sbitmap_shared(q->tag_set)) {
+		sbitmap_queue_free(&eq->__bitmap_tags);
+		sbitmap_queue_free(&eq->__breserved_tags);
 	}
 }
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 9463b878462f..31a42d50a8f1 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -27,9 +27,11 @@ struct blk_mq_tags {
 
 extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
 extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *tag_set);
-extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
-					     unsigned long nr_requests);
-extern void blk_mq_exit_shared_sched_sbitmap(struct blk_mq_tag_set *tag_set);
+extern bool blk_mq_init_elevator_sbitmap(struct request_queue *q,
+					 struct elevator_queue *eq,
+					 unsigned int nr_tags);
+extern void blk_mq_exit_elevator_sbitmap(struct request_queue *q,
+					 struct elevator_queue *eq);
 extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
 					    unsigned int nr_tags,
 					    unsigned int reserved_tags,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 054c0597c052..c5cff1de56b3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3142,7 +3142,6 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 	for (i = 0; i < nr_hw_queues(set); i++)
 		blk_mq_free_map_and_requests(set, i);
 
-	blk_mq_exit_shared_sched_sbitmap(set);
 	blk_mq_exit_shared_sbitmap(set);
 
 	for (j = 0; j < set->nr_maps; j++) {
@@ -3159,12 +3158,14 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
-	bool sched_tags = false;
 	int i, ret;
 
 	if (!set)
 		return -EINVAL;
 
+	if (blk_mq_is_sbitmap_shared(set) && !q->elevator)
+		return -EINVAL;
+
 	if (q->nr_requests == nr)
 		return 0;
 
@@ -3183,7 +3184,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
 		} else {
-			sched_tags = true;
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
 		}
@@ -3199,29 +3199,18 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 	 * if some are updated, we should probably roll back the change altogether. FIXME
 	 */
 	if (!ret) {
-		if (blk_mq_is_sbitmap_shared(set)) {
-			if (sched_tags) {
-				blk_mq_exit_shared_sched_sbitmap(set);
-				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
-					return -ENOMEM; /* fixup error handling */
-
-				queue_for_each_hw_ctx(q, hctx, i) {
-					hctx->sched_tags->bitmap_tags =
-						&set->__sched_bitmap_tags;
-					hctx->sched_tags->breserved_tags =
-						&set->__sched_breserved_tags;
-				}
-			} else {
-				blk_mq_exit_shared_sbitmap(set);
-				if (!blk_mq_init_shared_sbitmap(set))
-					return -ENOMEM; /* fixup error handling */
-
-				queue_for_each_hw_ctx(q, hctx, i) {
-					hctx->tags->bitmap_tags =
-						&set->__bitmap_tags;
-					hctx->tags->breserved_tags =
-						&set->__breserved_tags;
-				}
+		if (blk_mq_is_sbitmap_shared(set) && q->elevator) {
+			struct elevator_queue *eq = q->elevator;
+
+			blk_mq_exit_elevator_sbitmap(q, eq);
+			if (!blk_mq_init_elevator_sbitmap(q, eq, nr))
+				return -ENOMEM; /* fixup error handling */
+
+			queue_for_each_hw_ctx(q, hctx, i) {
+				hctx->sched_tags->bitmap_tags =
+					&eq->__bitmap_tags;
+				hctx->sched_tags->breserved_tags =
+					&eq->__breserved_tags;
 			}
 		}
 		q->nr_requests = nr;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 46f5198be017..01e644b1cba5 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -78,6 +78,13 @@ queue_requests_store(struct request_queue *q, const char *page, size_t count)
 	if (nr < BLKDEV_MIN_RQ)
 		nr = BLKDEV_MIN_RQ;
 
+	/*
+	 * We can only modify the queue depth for shared sbitmaps
+	 * if an I/O scheduler is set.
+	 */
+	if (blk_mq_is_sbitmap_shared(q->tag_set) && !q->elevator)
+		return -EINVAL;
+
 	err = blk_mq_update_nr_requests(q, nr);
 	if (err)
 		return err;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 10c9ed3dbe80..b4515bb862d4 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -112,9 +112,6 @@ struct blk_mq_tag_set {
 	struct sbitmap_queue	__bitmap_tags;
 	struct sbitmap_queue	__breserved_tags;
 
-	struct sbitmap_queue	__sched_bitmap_tags;
-	struct sbitmap_queue	__sched_breserved_tags;
-
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -234,7 +231,6 @@ enum {
 	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
 	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
 	BLK_MQ_F_TAG_BITMAP_ALLOCATED	= 1 << 3,
-	BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED = 1 << 4,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 901bda352dcb..11d492f56089 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -4,6 +4,7 @@
 
 #include <linux/percpu.h>
 #include <linux/hashtable.h>
+#include <linux/sbitmap.h>
 
 #ifdef CONFIG_BLOCK
 
@@ -104,6 +105,8 @@ struct elevator_queue
 	void *elevator_data;
 	struct kobject kobj;
 	struct mutex sysfs_lock;
+	struct sbitmap_queue __bitmap_tags;
+	struct sbitmap_queue __breserved_tags;
 	unsigned int registered:1;
 	DECLARE_HASHTABLE(hash, ELV_HASH_BITS);
 };
-- 
2.16.4



* [PATCH 07/11] scsi: Add template flag 'host_tagset'
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (5 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 06/11] blk-mq: move shared sbitmap into elevator queue Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 08/11] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' so that a hostwide tagset
can be shared across multiple reply queues once the SCSI host's
reply queues are exposed as blk-mq hw queues.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2563b061f56b..d7ad7b99bc05 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1899,6 +1899,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f577647bf5f2..4fd0af0883dd 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -429,6 +429,9 @@ struct scsi_host_template {
 	/* True if the low-level driver supports blk-mq only */
 	unsigned force_blk_mq:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.16.4



* [PATCH 08/11] scsi: hisi_sas: Switch v3 hw to MQ
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (6 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 07/11] scsi: Add template flag 'host_tagset' Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 09/11] megaraid_sas: switch fusion adapters " Hannes Reinecke
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block

From: John Garry <john.garry@huawei.com>

Now that the block layer provides a hostwide shared tagset, we can
switch the driver to expose all hw queues.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 +++++++-------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 ++++++++++++++--------------------
 3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 233c73e01246..0405602df2a4 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
 #define _HISI_SAS_H_
 
 #include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
@@ -431,7 +433,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* bist */
 	enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 03588ec3c394..e185935c3399 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -421,6 +421,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
 	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+	struct scsi_cmnd *scmd = NULL;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -436,10 +437,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	if (task->uldd_task) {
+		struct ata_queued_cmd *qc;
 
+		if (dev_is_sata(device)) {
+			qc = task->uldd_task;
+			scmd = qc->scsicmd;
+		} else {
+			scmd = task->uldd_task;
+		}
+	}
+
+	if (scmd) {
+		unsigned int dq_index;
+		u32 blk_tag;
+
+		blk_tag = blk_mq_unique_tag(scmd->request);
+		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
 		*dq_pointer = dq = sas_dev->dq;
@@ -468,21 +482,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
+	else
+		rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
 
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
-		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index bf5d5f138437..e7b015d88968 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2353,66 +2353,35 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 {
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+	int vectors;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	struct irq_affinity desc = {
+		.pre_vectors = BASE_VECTORS_V3_HW,
+	};
+
+	min_msi = MIN_AFFINE_VECTORS_V3_HW;
+	vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+						 min_msi, max_msi,
+						 PCI_IRQ_MSI |
+						 PCI_IRQ_AFFINITY,
+						 &desc);
+	if (vectors < 0)
+		return -ENOENT;
 
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
 
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
+	return 0;
 }
 
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc, i;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
-	if (auto_affine_msi_experimental) {
-		struct irq_affinity desc = {
-			.pre_vectors = BASE_VECTORS_V3_HW,
-		};
-
-		min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
-		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
-							 min_msi, max_msi,
-							 PCI_IRQ_MSI |
-							 PCI_IRQ_AFFINITY,
-							 &desc);
-		if (vectors < 0)
-			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
-	} else {
-		min_msi = max_msi;
-		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
-						max_msi, PCI_IRQ_MSI);
-		if (vectors < 0)
-			return vectors;
-	}
-
-	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	int rc, i;
 
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
@@ -3057,6 +3026,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
 	return 0;
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+					     BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -3065,6 +3043,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
+	.map_queues		= hisi_sas_map_queues,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
@@ -3078,6 +3057,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3249,6 +3229,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	rc = interrupt_preinit_v3_hw(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+	dev_err(dev, "%d hw queues\n", shost->nr_hw_queues);
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.16.4



* [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (7 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 08/11] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-09 10:10   ` Sumit Saxena
  2019-12-02 15:39 ` [PATCH 10/11] smartpqi: enable host tagset Hannes Reinecke
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke,
	Hannes Reinecke

Fusion adapters can steer completions to individual queues, and
we now have support for shared host-wide tags.
So we can enable multiqueue support for fusion adapters and
drop the hand-crafted interrupt affinity settings.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/megaraid/megaraid_sas.h        |  1 -
 drivers/scsi/megaraid/megaraid_sas_base.c   | 65 +++++++++--------------------
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 ++++---
 3 files changed, 28 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index bd8184072bed..844ea2d6dbb8 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -2261,7 +2261,6 @@ enum MR_PERF_MODE {
 
 struct megasas_instance {
 
-	unsigned int *reply_map;
 	__le32 *producer;
 	dma_addr_t producer_h;
 	__le32 *consumer;
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index a4bc81479284..9d0d74e3d491 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -37,6 +37,7 @@
 #include <linux/poll.h>
 #include <linux/vmalloc.h>
 #include <linux/irq_poll.h>
+#include <linux/blk-mq-pci.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -3106,6 +3107,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
 	return 0;
 }
 
+static int megasas_map_queues(struct Scsi_Host *shost)
+{
+	struct megasas_instance *instance;
+
+	instance = (struct megasas_instance *)shost->hostdata;
+
+	if (!instance->smp_affinity_enable)
+		return 0;
+
+	return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+			instance->pdev, instance->low_latency_index_start);
+}
+
 static void megasas_aen_polling(struct work_struct *work);
 
 /**
@@ -3414,9 +3428,11 @@ static struct scsi_host_template megasas_template = {
 	.eh_timed_out = megasas_reset_timer,
 	.shost_attrs = megaraid_host_attrs,
 	.bios_param = megasas_bios_param,
+	.map_queues = megasas_map_queues,
 	.change_queue_depth = scsi_change_queue_depth,
 	.max_segment_size = 0xffffffff,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 /**
@@ -5695,34 +5711,6 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
 		instance->use_seqnum_jbod_fp = false;
 }
 
-static void megasas_setup_reply_map(struct megasas_instance *instance)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu, low_latency_index_start;
-
-	low_latency_index_start = instance->low_latency_index_start;
-
-	for (queue = low_latency_index_start; queue < instance->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(instance->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			instance->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	queue = low_latency_index_start;
-	for_each_possible_cpu(cpu) {
-		instance->reply_map[cpu] = queue;
-		if (queue == (instance->msix_vectors - 1))
-			queue = low_latency_index_start;
-		else
-			queue++;
-	}
-}
-
 /**
  * megasas_get_device_list -	Get the PD and LD device list from FW.
  * @instance:			Adapter soft state
@@ -6021,12 +6009,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
 					instance->is_rdpq = (scratch_pad_1 & MR_RDPQ_MODE_OFFSET) ?
 								1 : 0;
 
-				if (instance->adapter_type >= INVADER_SERIES &&
-				    !instance->msix_combined) {
-					instance->msix_load_balance = true;
-					instance->smp_affinity_enable = false;
-				}
-
 				/* Save 1-15 reply post index address to local memory
 				 * Index 0 is already saved from reg offset
 				 * MPI2_REPLY_POST_HOST_INDEX_OFFSET
@@ -6145,8 +6127,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
 			goto fail_init_adapter;
 	}
 
-	megasas_setup_reply_map(instance);
-
 	dev_info(&instance->pdev->dev,
 		"current msix/online cpus\t: (%d/%d)\n",
 		instance->msix_vectors, (unsigned int)num_online_cpus());
@@ -6780,6 +6760,9 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_id = MEGASAS_MAX_DEV_PER_CHANNEL;
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
+	if (instance->adapter_type != MFI_SERIES && instance->msix_vectors > 0)
+		host->nr_hw_queues = instance->msix_vectors -
+			instance->low_latency_index_start;
 
 	/*
 	 * Notify the mid-layer about the new controller
@@ -6947,11 +6930,6 @@ static inline int megasas_alloc_mfi_ctrl_mem(struct megasas_instance *instance)
  */
 static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
 {
-	instance->reply_map = kcalloc(nr_cpu_ids, sizeof(unsigned int),
-				      GFP_KERNEL);
-	if (!instance->reply_map)
-		return -ENOMEM;
-
 	switch (instance->adapter_type) {
 	case MFI_SERIES:
 		if (megasas_alloc_mfi_ctrl_mem(instance))
@@ -6968,8 +6946,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
 
 	return 0;
  fail:
-	kfree(instance->reply_map);
-	instance->reply_map = NULL;
 	return -ENOMEM;
 }
 
@@ -6982,7 +6958,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
  */
 static inline void megasas_free_ctrl_mem(struct megasas_instance *instance)
 {
-	kfree(instance->reply_map);
 	if (instance->adapter_type == MFI_SERIES) {
 		if (instance->producer)
 			dma_free_coherent(&instance->pdev->dev, sizeof(u32),
@@ -7645,8 +7620,6 @@ megasas_resume(struct pci_dev *pdev)
 	if (rval < 0)
 		goto fail_reenable_msix;
 
-	megasas_setup_reply_map(instance);
-
 	if (instance->adapter_type != MFI_SERIES) {
 		megasas_reset_reply_desc(instance);
 		if (megasas_ioc_init_fusion(instance)) {
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index e301458bcbae..bae96b82bb10 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2731,6 +2731,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 	struct MR_PRIV_DEVICE *mrdev_priv;
 	struct RAID_CONTEXT *rctx;
 	struct RAID_CONTEXT_G35 *rctx_g35;
+	u32 tag = blk_mq_unique_tag(scp->request);
 
 	device_id = MEGASAS_DEV_INDEX(scp);
 
@@ -2837,7 +2838,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 				    instance->msix_vectors));
 	else
 		cmd->request_desc->SCSIIO.MSIxIndex =
-			instance->reply_map[raw_smp_processor_id()];
+			blk_mq_unique_tag_to_hwq(tag);
 
 	if (instance->adapter_type >= VENTURA_SERIES) {
 		/* FP for Optimal raid level 1.
@@ -3080,6 +3081,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	u16 pd_index = 0;
 	u16 os_timeout_value;
 	u16 timeout_limit;
+	u32 tag = blk_mq_unique_tag(scmd->request);
 	struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
 	struct RAID_CONTEXT	*pRAID_Context;
 	struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
@@ -3169,7 +3171,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 				    instance->msix_vectors));
 	else
 		cmd->request_desc->SCSIIO.MSIxIndex =
-			instance->reply_map[raw_smp_processor_id()];
+			blk_mq_unique_tag_to_hwq(tag);
 
 	if (!fp_possible) {
 		/* system pd firmware path */
@@ -3373,7 +3375,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
 {
 	struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
 	union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
-	u32 index;
+	u32 index, blk_tag, unique_tag;
 
 	if ((megasas_cmd_type(scmd) == READ_WRITE_LDIO) &&
 		instance->ldio_threshold &&
@@ -3389,7 +3391,9 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
 
-	cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);
+	unique_tag = blk_mq_unique_tag(scmd->request);
+	blk_tag = blk_mq_unique_tag_to_tag(unique_tag);
+	cmd = megasas_get_cmd_fusion(instance, blk_tag);
 
 	if (!cmd) {
 		atomic_dec(&instance->fw_outstanding);
@@ -3430,7 +3434,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
 	 */
 	if (cmd->r1_alt_dev_handle != MR_DEVHANDLE_INVALID) {
 		r1_cmd = megasas_get_cmd_fusion(instance,
-				(scmd->request->tag + instance->max_fw_cmds));
+				(blk_tag + instance->max_fw_cmds));
 		megasas_prepare_secondRaid1_IO(instance, cmd, r1_cmd);
 	}
 
-- 
2.16.4



* [PATCH 10/11] smartpqi: enable host tagset
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (8 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 09/11] megaraid_sas: switch fusion adapters " Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2019-12-02 15:39 ` [PATCH 11/11] hpsa: enable host_tagset and switch to MQ Hannes Reinecke
  2020-02-26 11:09 ` [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

Enable host tagset for smartpqi; with this we can use the request
tag to look up commands in the pool, avoiding the list iteration in
the hot path.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/smartpqi/smartpqi_init.c | 38 ++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index 7b7ef3acb504..c17b533c84ad 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -575,17 +575,29 @@ static inline void pqi_reinit_io_request(struct pqi_io_request *io_request)
 }
 
 static struct pqi_io_request *pqi_alloc_io_request(
-	struct pqi_ctrl_info *ctrl_info)
+	struct pqi_ctrl_info *ctrl_info, struct scsi_cmnd *scmd)
 {
 	struct pqi_io_request *io_request;
+	unsigned int limit = PQI_RESERVED_IO_SLOTS;
 	u16 i = ctrl_info->next_io_request_slot;	/* benignly racy */
 
-	while (1) {
+	if (scmd) {
+		u32 blk_tag = blk_mq_unique_tag(scmd->request);
+
+		i = blk_mq_unique_tag_to_tag(blk_tag) + limit;
 		io_request = &ctrl_info->io_request_pool[i];
-		if (atomic_inc_return(&io_request->refcount) == 1)
-			break;
-		atomic_dec(&io_request->refcount);
-		i = (i + 1) % ctrl_info->max_io_slots;
+		if (WARN_ON(atomic_inc_return(&io_request->refcount) > 1)) {
+			atomic_dec(&io_request->refcount);
+			return NULL;
+		}
+	} else {
+		while (1) {
+			io_request = &ctrl_info->io_request_pool[i];
+			if (atomic_inc_return(&io_request->refcount) == 1)
+				break;
+			atomic_dec(&io_request->refcount);
+			i = (i + 1) % limit;
+		}
 	}
 
 	/* benignly racy */
@@ -4075,7 +4087,7 @@ static int pqi_submit_raid_request_synchronous(struct pqi_ctrl_info *ctrl_info,
 
 	atomic_inc(&ctrl_info->sync_cmds_outstanding);
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 
 	put_unaligned_le16(io_request->index,
 		&(((struct pqi_raid_path_request *)request)->request_id));
@@ -5032,7 +5044,9 @@ static inline int pqi_raid_submit_scsi_cmd(struct pqi_ctrl_info *ctrl_info,
 {
 	struct pqi_io_request *io_request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
 
 	return pqi_raid_submit_scsi_cmd_with_io_request(ctrl_info, io_request,
 		device, scmd, queue_group);
@@ -5230,7 +5244,10 @@ static int pqi_aio_submit_io(struct pqi_ctrl_info *ctrl_info,
 	struct pqi_io_request *io_request;
 	struct pqi_aio_path_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
+
 	io_request->io_complete_callback = pqi_aio_io_complete;
 	io_request->scmd = scmd;
 	io_request->raid_bypass = raid_bypass;
@@ -5657,7 +5674,7 @@ static int pqi_lun_reset(struct pqi_ctrl_info *ctrl_info,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct pqi_task_management_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 	io_request->io_complete_callback = pqi_lun_reset_complete;
 	io_request->context = &wait;
 
@@ -6504,6 +6521,7 @@ static struct scsi_host_template pqi_driver_template = {
 	.map_queues = pqi_map_queues,
 	.sdev_attrs = pqi_sdev_attrs,
 	.shost_attrs = pqi_shost_attrs,
+	.host_tagset = 1,
 };
 
 static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)
-- 
2.16.4



* [PATCH 11/11] hpsa: enable host_tagset and switch to MQ
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (9 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 10/11] smartpqi: enable host tagset Hannes Reinecke
@ 2019-12-02 15:39 ` Hannes Reinecke
  2020-02-26 11:09 ` [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  11 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-02 15:39 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, John Garry,
	Ming Lei, linux-scsi, linux-block, Hannes Reinecke

Smart Array HBAs can steer interrupt completions, so this patch
switches the implementation to use multiqueue and enables
'host_tagset', as the HBA has a shared host-wide tagset.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/hpsa.c | 44 +++++++-------------------------------------
 drivers/scsi/hpsa.h |  1 -
 2 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index ac39ed79ccaa..2b811c981b43 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -974,6 +974,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 2048,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -1138,12 +1139,14 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
 static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	struct CommandList *c, int reply_queue)
 {
+	u32 blk_tag = blk_mq_unique_tag(c->scsi_cmd->request);
+
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 	if (c->device)
 		atomic_inc(&c->device->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = blk_mq_unique_tag_to_hwq(blk_tag);
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -5628,8 +5631,6 @@ static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
 	/* Get the ptr to our adapter structure out of cmd->host. */
 	h = sdev_to_hba(cmd->device);
 
-	BUG_ON(cmd->request->tag < 0);
-
 	dev = cmd->device->hostdata;
 	if (!dev) {
 		cmd->result = DID_NO_CONNECT << 16;
@@ -5805,7 +5806,7 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 	sh->hostdata[0] = (unsigned long) h;
 	sh->irq = pci_irq_vector(h->pdev, 0);
 	sh->unique_id = sh->irq;
-
+	sh->nr_hw_queues = h->msix_vectors > 0 ? h->msix_vectors : 1;
 	h->scsi_host = sh;
 	return 0;
 }
@@ -5831,7 +5832,8 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
  */
 static int hpsa_get_cmd_index(struct scsi_cmnd *scmd)
 {
-	int idx = scmd->request->tag;
+	u32 blk_tag = blk_mq_unique_tag(scmd->request);
+	int idx = blk_mq_unique_tag_to_tag(blk_tag);
 
 	if (idx < 0)
 		return idx;
@@ -7431,26 +7433,6 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
-static void hpsa_setup_reply_map(struct ctlr_info *h)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu;
-
-	for (queue = 0; queue < h->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(h->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			h->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		h->reply_map[cpu] = 0;
-}
-
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7847,9 +7829,6 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	if (err)
 		goto clean1;
 
-	/* setup mapping between CPU and reply queue */
-	hpsa_setup_reply_map(h);
-
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8575,7 +8554,6 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 
 static void hpda_free_ctlr_info(struct ctlr_info *h)
 {
-	kfree(h->reply_map);
 	kfree(h);
 }
 
@@ -8584,14 +8562,6 @@ static struct ctlr_info *hpda_alloc_ctlr_info(void)
 	struct ctlr_info *h;
 
 	h = kzalloc(sizeof(*h), GFP_KERNEL);
-	if (!h)
-		return NULL;
-
-	h->reply_map = kcalloc(nr_cpu_ids, sizeof(*h->reply_map), GFP_KERNEL);
-	if (!h->reply_map) {
-		kfree(h);
-		return NULL;
-	}
 	return h;
 }
 
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index f8c88fc7b80a..ea4a609e3eb7 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -161,7 +161,6 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
-	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;
-- 
2.16.4



* Re: [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth()
  2019-12-02 15:39 ` [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth() Hannes Reinecke
@ 2019-12-03 14:30   ` John Garry
  2019-12-03 14:53     ` Hannes Reinecke
  0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2019-12-03 14:30 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

On 02/12/2019 15:39, Hannes Reinecke wrote:
> The function does not set the depth, but rather transitions from
> shared to non-shared queues and vice versa.
> So rename it to blk_mq_update_tag_set_shared() to better reflect
> its purpose.

We will probably need to split this patch into two, as we're doing more 
than just renaming.

The prep patches could be picked up earlier if we're lucky.

Thanks,
John

> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>   block/blk-mq-tag.c | 18 ++++++++++--------
>   block/blk-mq.c     |  8 ++++----
>   2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index d7aa23c82dbf..f5009587e1b5 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -440,24 +440,22 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
>   				       node);
>   }
>   
> -static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
> -						   int node, int alloc_policy)
> +static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
> +				   int node, int alloc_policy)
>   {
>   	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
>   	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
>   
>   	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
> -		goto free_tags;
> +		return -ENOMEM;
>   	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
>   		     node))
>   		goto free_bitmap_tags;
>   
> -	return tags;
> +	return 0;
>   free_bitmap_tags:
>   	sbitmap_queue_free(&tags->bitmap_tags);
> -free_tags:
> -	kfree(tags);
> -	return NULL;
> +	return -ENOMEM;
>   }
>   
>   struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> @@ -478,7 +476,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>   	tags->nr_tags = total_tags;
>   	tags->nr_reserved_tags = reserved_tags;
>   
> -	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
> +	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
> +		kfree(tags);
> +		tags = NULL;
> +	}
> +	return tags;
>   }
>   
>   void blk_mq_free_tags(struct blk_mq_tags *tags)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6b39cf0efdcd..91950d3e436a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2581,8 +2581,8 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
>   	}
>   }
>   
> -static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
> -					bool shared)
> +static void blk_mq_update_tag_set_shared(struct blk_mq_tag_set *set,
> +					 bool shared)
>   {
>   	struct request_queue *q;
>   
> @@ -2605,7 +2605,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
>   		/* just transitioned to unshared */
>   		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
>   		/* update existing queue */
> -		blk_mq_update_tag_set_depth(set, false);
> +		blk_mq_update_tag_set_shared(set, false);
>   	}
>   	mutex_unlock(&set->tag_list_lock);
>   	INIT_LIST_HEAD(&q->tag_set_list);
> @@ -2623,7 +2623,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
>   	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
>   		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
>   		/* update existing queue */
> -		blk_mq_update_tag_set_depth(set, true);
> +		blk_mq_update_tag_set_shared(set, true);
>   	}
>   	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
>   		queue_set_hctx_shared(q, true);
> 



* Re: [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth()
  2019-12-03 14:30   ` John Garry
@ 2019-12-03 14:53     ` Hannes Reinecke
  0 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-03 14:53 UTC (permalink / raw)
  To: John Garry, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

On 12/3/19 3:30 PM, John Garry wrote:
> On 02/12/2019 15:39, Hannes Reinecke wrote:
>> The function does not set the depth, but rather transitions from
>> shared to non-shared queues and vice versa.
>> So rename it to blk_mq_update_tag_set_shared() to better reflect
>> its purpose.
> 
> We will probably need to split this patch into two, as we're doing more
> than just renaming.
> 
> The prep patches could be picked up earlier if we're lucky.
> 
> Thanks,
> John
> 
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   block/blk-mq-tag.c | 18 ++++++++++--------
>>   block/blk-mq.c     |  8 ++++----
>>   2 files changed, 14 insertions(+), 12 deletions(-)
>>
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index d7aa23c82dbf..f5009587e1b5 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -440,24 +440,22 @@ static int bt_alloc(struct sbitmap_queue *bt,
>> unsigned int depth,
>>                          node);
>>   }
>>   -static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct
>> blk_mq_tags *tags,
>> -                           int node, int alloc_policy)
>> +static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>> +                   int node, int alloc_policy)
>>   {
>>       unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
>>       bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
>>         if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
>> -        goto free_tags;
>> +        return -ENOMEM;
>>       if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags,
>> round_robin,
>>                node))
>>           goto free_bitmap_tags;
>>   -    return tags;
>> +    return 0;
>>   free_bitmap_tags:
>>       sbitmap_queue_free(&tags->bitmap_tags);
>> -free_tags:
>> -    kfree(tags);
>> -    return NULL;
>> +    return -ENOMEM;
>>   }
>>     struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>> @@ -478,7 +476,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int
>> total_tags,
>>       tags->nr_tags = total_tags;
>>       tags->nr_reserved_tags = reserved_tags;
>>   -    return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
>> +    if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
>> +        kfree(tags);
>> +        tags = NULL;
>> +    }
>> +    return tags;
>>   }
>>     void blk_mq_free_tags(struct blk_mq_tags *tags)
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 6b39cf0efdcd..91950d3e436a 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2581,8 +2581,8 @@ static void queue_set_hctx_shared(struct
>> request_queue *q, bool shared)
>>       }
>>   }
>>   -static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
>> -                    bool shared)
>> +static void blk_mq_update_tag_set_shared(struct blk_mq_tag_set *set,
>> +                     bool shared)
>>   {
>>       struct request_queue *q;
>>   @@ -2605,7 +2605,7 @@ static void blk_mq_del_queue_tag_set(struct
>> request_queue *q)
>>           /* just transitioned to unshared */
>>           set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
>>           /* update existing queue */
>> -        blk_mq_update_tag_set_depth(set, false);
>> +        blk_mq_update_tag_set_shared(set, false);
>>       }
>>       mutex_unlock(&set->tag_list_lock);
>>       INIT_LIST_HEAD(&q->tag_set_list);
>> @@ -2623,7 +2623,7 @@ static void blk_mq_add_queue_tag_set(struct
>> blk_mq_tag_set *set,
>>           !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
>>           set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
>>           /* update existing queue */
>> -        blk_mq_update_tag_set_depth(set, true);
>> +        blk_mq_update_tag_set_shared(set, true);
>>       }
>>       if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
>>           queue_set_hctx_shared(q, true);
>>
> 
Hmm. That shouldn't have happened; I'm sure I've had it in two patches
originally. Oh well.

But I'd rather wait for feedback on my shared scheduler tag patch; guess
that'll meet with some raised eyebrows ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset
  2019-12-02 15:39 ` [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
@ 2019-12-03 14:54   ` John Garry
  2019-12-03 15:02     ` Hannes Reinecke
  2019-12-03 16:38     ` Bart Van Assche
  0 siblings, 2 replies; 37+ messages in thread
From: John Garry @ 2019-12-03 14:54 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

>   
> @@ -483,8 +483,8 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
>   	res = mutex_lock_interruptible(&q->sysfs_lock);
>   	if (res)
>   		goto out;
> -	if (hctx->tags)
> -		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
> +	if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */

Bart's solution to this problem seemed ok, if he doesn't mind us 
borrowing his idea:

https://lore.kernel.org/linux-block/5183ab13-0c81-95f0-95ba-40318569c6c6@huawei.com/T/#m24394fe70b1ea79a154dfd9620f5e553c3e7e7da

See hctx_tags_bitmap_show().

It might also be reasonable to put that in another follow-on patch, as 
there would be no enablers of the "shared" bitmap until later patches.

> +		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
>   	mutex_unlock(&q->sysfs_lock);
>   
>   out:
> @@ -518,7 +518,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
>   	if (res)
>   		goto out;
>   	if (hctx->sched_tags)
> -		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
> +		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
>   	mutex_unlock(&q->sysfs_lock);
>   
>   out:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..1855f8f5edd4 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
>   
>   int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>   {
> +	struct blk_mq_tag_set *tag_set = q->tag_set;
>   	struct blk_mq_hw_ctx *hctx;
>   	struct elevator_queue *eq;
>   	unsigned int i;
> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>   		blk_mq_debugfs_register_sched_hctx(q, hctx);
>   	}
>   
> +	if (blk_mq_is_sbitmap_shared(tag_set)) {
> +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +		queue_for_each_hw_ctx(q, hctx, i) {
> +			struct blk_mq_tags *tags = hctx->sched_tags;
> +
> +			tags->bitmap_tags = &tag_set->__sched_bitmap_tags;
> +			tags->breserved_tags = &tag_set->__sched_breserved_tags;
> +		}
> +	}
> +
>   	return 0;
>   
>   err:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index f5009587e1b5..2e714123e846 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -20,7 +20,7 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
>   	if (!tags)
>   		return true;
>   
> -	return sbitmap_any_bit_clear(&tags->bitmap_tags.sb);
> +	return sbitmap_any_bit_clear(&tags->bitmap_tags->sb);
>   }
>   
>   /*
> @@ -43,9 +43,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>    */
>   void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>   {
> -	sbitmap_queue_wake_all(&tags->bitmap_tags);
> +	sbitmap_queue_wake_all(tags->bitmap_tags);
>   	if (include_reserve)
> -		sbitmap_queue_wake_all(&tags->breserved_tags);
> +		sbitmap_queue_wake_all(tags->breserved_tags);
>   }
>   
>   /*
> @@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   			WARN_ON_ONCE(1);
>   			return BLK_MQ_TAG_FAIL;
>   		}
> -		bt = &tags->breserved_tags;
> +		bt = tags->breserved_tags;

We could put all of this in an earlier patch (as you had in v4, modulo 
the dynamic memory part), which would be easier to review and get accepted.

>   		tag_offset = 0;
>   	} else {
> -		bt = &tags->bitmap_tags;
> +		bt = tags->bitmap_tags;
>   		tag_offset = tags->nr_reserved_tags;
>   	}


[...]

>   	if (!set)
> @@ -3160,6 +3179,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>   							false);
>   		} else {
> +			sched_tags = true;
>   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>   							nr, true);
>   		}
> @@ -3169,8 +3189,43 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>   			q->elevator->type->ops.depth_updated(hctx);
>   	}
>   
> -	if (!ret)
> +	/*
> +	 * if ret is 0, all queues should have been updated to the same depth
> +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> +	 * if some are updated, we should probably roll back the change altogether. FIXME
> +	 */

If you don't have a shared sched bitmap - which I didn't think we needed 
- then all we need is a simple sbitmap_queue_resize(&tagset->__bitmap_tags)

Otherwise it's horrible to resize shared sched bitmaps...

> +	if (!ret) {
> +		if (blk_mq_is_sbitmap_shared(set)) {
> +			if (sched_tags) {
> +				blk_mq_exit_shared_sched_sbitmap(set);
> +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> +					return -ENOMEM; /* fixup error handling */
> +
> +				queue_for_each_hw_ctx(q, hctx, i) {
> +					hctx->sched_tags->bitmap_tags =
> +						&set->__sched_bitmap_tags;
> +					hctx->sched_tags->breserved_tags =
> +						&set->__sched_breserved_tags;
> +				}
> +			} else {
> +				blk_mq_exit_shared_sbitmap(set);
> +				if (!blk_mq_init_shared_sbitmap(set))
> +					return -ENOMEM; /* fixup error handling */
> +
> +				queue_for_each_hw_ctx(q, hctx, i) {
> +					hctx->tags->bitmap_tags =
> +						&set->__bitmap_tags;
> +					hctx->tags->breserved_tags =
> +						&set->__breserved_tags;
> +				}
> +			}
> +		}
>   		q->nr_requests = nr;
> +	}
> +	/*
> +	 * if ret != 0, q->nr_requests would not be updated, yet the depth
> +	 * for some hctx may have changed - is that right?
> +	 */
>   
>   	blk_mq_unquiesce_queue(q);
>   	blk_mq_unfreeze_queue(q);
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 78d38b5f2793..4c1ea206d3f4 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
>   	struct blk_mq_hw_ctx *hctx;
>   };
>   
> +static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
> +{
> +	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);

Bart already gave some comments on this

> +}
> +
>   static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
>   {
>   	if (data->flags & BLK_MQ_REQ_INTERNAL)
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index 34dcea0ef637..a7a537501d70 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
>   	 * All of the hardware queues have the same depth, so we can just grab
>   	 * the shift of the first one.
>   	 */
> -	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
> +	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
>   }
>   
>   static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
> @@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
>   	khd->batching = 0;
>   
>   	hctx->sched_data = khd;
> -	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
> +	sbitmap_queue_min_shallow_depth(hctx->sched_tags->bitmap_tags,
>   					kqd->async_depth);
>   
>   	return 0;
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 147185394a25..10c9ed3dbe80 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -109,6 +109,12 @@ struct blk_mq_tag_set {
>   	unsigned int		flags;		/* BLK_MQ_F_* */
>   	void			*driver_data;
>   
> +	struct sbitmap_queue	__bitmap_tags;
> +	struct sbitmap_queue	__breserved_tags;
> +
> +	struct sbitmap_queue	__sched_bitmap_tags;
> +	struct sbitmap_queue	__sched_breserved_tags;
> +
>   	struct blk_mq_tags	**tags;
>   
>   	struct mutex		tag_list_lock;
> @@ -226,6 +232,9 @@ struct blk_mq_ops {
>   enum {
>   	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
>   	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
> +	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
> +	BLK_MQ_F_TAG_BITMAP_ALLOCATED	= 1 << 3,
> +	BLK_MQ_F_TAG_SCHED_BITMAP_ALLOCATED = 1 << 4,
>   	BLK_MQ_F_BLOCKING	= 1 << 5,
>   	BLK_MQ_F_NO_SCHED	= 1 << 6,
>   	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset
  2019-12-03 14:54   ` John Garry
@ 2019-12-03 15:02     ` Hannes Reinecke
  2019-12-04 10:24       ` John Garry
  2019-12-03 16:38     ` Bart Van Assche
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-03 15:02 UTC (permalink / raw)
  To: John Garry, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

On 12/3/19 3:54 PM, John Garry wrote:
>>   @@ -483,8 +483,8 @@ static int hctx_tags_bitmap_show(void *data,
>> struct seq_file *m)
>>       res = mutex_lock_interruptible(&q->sysfs_lock);
>>       if (res)
>>           goto out;
>> -    if (hctx->tags)
>> -        sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
>> +    if (hctx->tags) /* We should just iterate the relevant bits for
>> this hctx FIXME */
> 
> Bart's solution to this problem seemed ok, if he doesn't mind us
> borrowing his idea:
> 
> https://lore.kernel.org/linux-block/5183ab13-0c81-95f0-95ba-40318569c6c6@huawei.com/T/#m24394fe70b1ea79a154dfd9620f5e553c3e7e7da
> 
> 
> See hctx_tags_bitmap_show().
> 
> It might be also reasonable to put that in another follow on patch, as
> there would be no enablers of the "shared" bitmap until later patches.
> 
Yeah, that was my plan, too.
But then I'd rather wait for feedback on the general approach here;
no point in wasting perfectly good bits if no-one wants them ...

[ .. ]
>> @@ -121,10 +121,10 @@ unsigned int blk_mq_get_tag(struct
>> blk_mq_alloc_data *data)
>>               WARN_ON_ONCE(1);
>>               return BLK_MQ_TAG_FAIL;
>>           }
>> -        bt = &tags->breserved_tags;
>> +        bt = tags->breserved_tags;
> 
> We could put all of this in an earlier patch (as you had in v4, modulo
> dynamic memory part), which would be easier to review and get accepted.
> 
Yeah, but it felt a bit odd to just have pointers to an existing
structure element.
But yes, I'll do that for the next round.

>>           tag_offset = 0;
>>       } else {
>> -        bt = &tags->bitmap_tags;
>> +        bt = tags->bitmap_tags;
>>           tag_offset = tags->nr_reserved_tags;
>>       }
> 
> 
> [...]
> 
>>       if (!set)
>> @@ -3160,6 +3179,7 @@ int blk_mq_update_nr_requests(struct
>> request_queue *q, unsigned int nr)
>>               ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>>                               false);
>>           } else {
>> +            sched_tags = true;
>>               ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>                               nr, true);
>>           }
>> @@ -3169,8 +3189,43 @@ int blk_mq_update_nr_requests(struct
>> request_queue *q, unsigned int nr)
>>               q->elevator->type->ops.depth_updated(hctx);
>>       }
>>   -    if (!ret)
>> +    /*
>> +     * if ret is 0, all queues should have been updated to the same
>> depth
>> +     * if not, then maybe some have been updated - yuk, need to
>> handle this for shared sbitmap...
>> +     * if some are updated, we should probably roll back the change
>> altogether. FIXME
>> +     */
> 
> If you don't have a shared sched bitmap - which I didn't think we needed
> - then all we need is a simple sbitmap_queue_resize(&tagset->__bitmap_tags)
> 
> Otherwise it's horrible to resize shared sched bitmaps...
> 
Resizing shared sched bitmaps is done in patch 6/11.
General idea is to move the scheduler bitmap into the request queue
(well, actually the elevator), as this gives us a per-request_queue
bitmap. That is what we want here, as the scheduler needs to look at
all requests, and hence needs access to the same bitmap.
It also gives us an easy way of resizing the sched tag bitmap, as we
can then resize the bitmap on a per-queue basis and leave the
underlying tagset bitmap untouched.

[ .. ]
>> diff --git a/block/blk-mq.h b/block/blk-mq.h
>> index 78d38b5f2793..4c1ea206d3f4 100644
>> --- a/block/blk-mq.h
>> +++ b/block/blk-mq.h
>> @@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
>>       struct blk_mq_hw_ctx *hctx;
>>   };
>>   +static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set
>> *tag_set)
>> +{
>> +    return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
> 
> Bart already gave some comments on this
> 
Ah. Missed that one. Will be including in the next round.

Thanks for the review!

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset
  2019-12-03 14:54   ` John Garry
  2019-12-03 15:02     ` Hannes Reinecke
@ 2019-12-03 16:38     ` Bart Van Assche
  1 sibling, 0 replies; 37+ messages in thread
From: Bart Van Assche @ 2019-12-03 16:38 UTC (permalink / raw)
  To: John Garry, Hannes Reinecke, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

On 12/3/19 6:54 AM, John Garry wrote:
>> @@ -483,8 +483,8 @@ static int hctx_tags_bitmap_show(void *data, 
>> struct seq_file *m)
>>       res = mutex_lock_interruptible(&q->sysfs_lock);
>>       if (res)
>>           goto out;
>> -    if (hctx->tags)
>> -        sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
>> +    if (hctx->tags) /* We should just iterate the relevant bits for 
>> this hctx FIXME */
> 
> Bart's solution to this problem seemed ok, if he doesn't mind us 
> borrowing his idea:
> 
> https://lore.kernel.org/linux-block/5183ab13-0c81-95f0-95ba-40318569c6c6@huawei.com/T/#m24394fe70b1ea79a154dfd9620f5e553c3e7e7da 
>  
> See hctx_tags_bitmap_show().

Hi John,

Sure, borrowing that code is fine with me.

Bart.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset
  2019-12-03 15:02     ` Hannes Reinecke
@ 2019-12-04 10:24       ` John Garry
  0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2019-12-04 10:24 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley, Ming Lei,
	linux-scsi, linux-block

On 03/12/2019 15:02, Hannes Reinecke wrote:
>>> +     */
>> If you don't have a shared sched bitmap - which I didn't think we needed
>> - then all we need is a simple sbitmap_queue_resize(&tagset->__bitmap_tags)
>>
>> Otherwise it's horrible to resize shared sched bitmaps...
>>
> Resizing shared sched bitmaps is done in patch 6/11.
> General idea is to move the scheduler bitmap into the request queue
> (well, actually the elevator), as this gives us a per-request_queue
> bitmap. That is what we want here, as the scheduler needs to look at
> all requests, and hence needs access to the same bitmap.
> It also gives us an easy way of resizing the sched tag bitmap, as we
> can then resize the bitmap on a per-queue basis and leave the
> underlying tagset bitmap untouched.

OK, but I am just concerned if that is really required in this series 
and whether it is just another obstacle to getting it accepted.

Thanks,
John

> 
> [ .. ]
>>> diff --git a/block/blk-mq.h b/block/blk-mq.h
>>> index 78d38b5f2793..4c1ea206d3f4 100644
>>> --- a/block/blk-mq.h
>>> +++ b/block/blk-mq.h


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2019-12-02 15:39 ` [PATCH 09/11] megaraid_sas: switch fusion adapters " Hannes Reinecke
@ 2019-12-09 10:10   ` Sumit Saxena
  2019-12-09 11:02     ` Hannes Reinecke
  2020-01-09 11:55     ` John Garry
  0 siblings, 2 replies; 37+ messages in thread
From: Sumit Saxena @ 2019-12-09 10:10 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, John Garry, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>
> Fusion adapters can steer completions to individual queues, and
> we now have support for shared host-wide tags.
> So we can enable multiqueue support for fusion adapters and
> drop the hand-crafted interrupt affinity settings.

Hi Hannes,

Ming Lei also proposed similar changes in megaraid_sas driver some
time back and it had resulted in performance drop-
https://patchwork.kernel.org/patch/10969511/

So, we will do some performance tests with this patch and update you.

Thanks,
Sumit
>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/megaraid/megaraid_sas.h        |  1 -
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 65 +++++++++--------------------
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 ++++---
>  3 files changed, 28 insertions(+), 52 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
> index bd8184072bed..844ea2d6dbb8 100644
> --- a/drivers/scsi/megaraid/megaraid_sas.h
> +++ b/drivers/scsi/megaraid/megaraid_sas.h
> @@ -2261,7 +2261,6 @@ enum MR_PERF_MODE {
>
>  struct megasas_instance {
>
> -       unsigned int *reply_map;
>         __le32 *producer;
>         dma_addr_t producer_h;
>         __le32 *consumer;
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index a4bc81479284..9d0d74e3d491 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -37,6 +37,7 @@
>  #include <linux/poll.h>
>  #include <linux/vmalloc.h>
>  #include <linux/irq_poll.h>
> +#include <linux/blk-mq-pci.h>
>
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -3106,6 +3107,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
>         return 0;
>  }
>
> +static int megasas_map_queues(struct Scsi_Host *shost)
> +{
> +       struct megasas_instance *instance;
> +
> +       instance = (struct megasas_instance *)shost->hostdata;
> +
> +       if (!instance->smp_affinity_enable)
> +               return 0;
> +
> +       return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
> +                       instance->pdev, instance->low_latency_index_start);
> +}
> +
>  static void megasas_aen_polling(struct work_struct *work);
>
>  /**
> @@ -3414,9 +3428,11 @@ static struct scsi_host_template megasas_template = {
>         .eh_timed_out = megasas_reset_timer,
>         .shost_attrs = megaraid_host_attrs,
>         .bios_param = megasas_bios_param,
> +       .map_queues = megasas_map_queues,
>         .change_queue_depth = scsi_change_queue_depth,
>         .max_segment_size = 0xffffffff,
>         .no_write_same = 1,
> +       .host_tagset = 1,
>  };
>
>  /**
> @@ -5695,34 +5711,6 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
>                 instance->use_seqnum_jbod_fp = false;
>  }
>
> -static void megasas_setup_reply_map(struct megasas_instance *instance)
> -{
> -       const struct cpumask *mask;
> -       unsigned int queue, cpu, low_latency_index_start;
> -
> -       low_latency_index_start = instance->low_latency_index_start;
> -
> -       for (queue = low_latency_index_start; queue < instance->msix_vectors; queue++) {
> -               mask = pci_irq_get_affinity(instance->pdev, queue);
> -               if (!mask)
> -                       goto fallback;
> -
> -               for_each_cpu(cpu, mask)
> -                       instance->reply_map[cpu] = queue;
> -       }
> -       return;
> -
> -fallback:
> -       queue = low_latency_index_start;
> -       for_each_possible_cpu(cpu) {
> -               instance->reply_map[cpu] = queue;
> -               if (queue == (instance->msix_vectors - 1))
> -                       queue = low_latency_index_start;
> -               else
> -                       queue++;
> -       }
> -}
> -
>  /**
>   * megasas_get_device_list -   Get the PD and LD device list from FW.
>   * @instance:                  Adapter soft state
> @@ -6021,12 +6009,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
>                                         instance->is_rdpq = (scratch_pad_1 & MR_RDPQ_MODE_OFFSET) ?
>                                                                 1 : 0;
>
> -                               if (instance->adapter_type >= INVADER_SERIES &&
> -                                   !instance->msix_combined) {
> -                                       instance->msix_load_balance = true;
> -                                       instance->smp_affinity_enable = false;
> -                               }
> -
>                                 /* Save 1-15 reply post index address to local memory
>                                  * Index 0 is already saved from reg offset
>                                  * MPI2_REPLY_POST_HOST_INDEX_OFFSET
> @@ -6145,8 +6127,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
>                         goto fail_init_adapter;
>         }
>
> -       megasas_setup_reply_map(instance);
> -
>         dev_info(&instance->pdev->dev,
>                 "current msix/online cpus\t: (%d/%d)\n",
>                 instance->msix_vectors, (unsigned int)num_online_cpus());
> @@ -6780,6 +6760,9 @@ static int megasas_io_attach(struct megasas_instance *instance)
>         host->max_id = MEGASAS_MAX_DEV_PER_CHANNEL;
>         host->max_lun = MEGASAS_MAX_LUN;
>         host->max_cmd_len = 16;
> +       if (instance->adapter_type != MFI_SERIES && instance->msix_vectors > 0)
> +               host->nr_hw_queues = instance->msix_vectors -
> +                       instance->low_latency_index_start;
>
>         /*
>          * Notify the mid-layer about the new controller
> @@ -6947,11 +6930,6 @@ static inline int megasas_alloc_mfi_ctrl_mem(struct megasas_instance *instance)
>   */
>  static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>  {
> -       instance->reply_map = kcalloc(nr_cpu_ids, sizeof(unsigned int),
> -                                     GFP_KERNEL);
> -       if (!instance->reply_map)
> -               return -ENOMEM;
> -
>         switch (instance->adapter_type) {
>         case MFI_SERIES:
>                 if (megasas_alloc_mfi_ctrl_mem(instance))
> @@ -6968,8 +6946,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>
>         return 0;
>   fail:
> -       kfree(instance->reply_map);
> -       instance->reply_map = NULL;
>         return -ENOMEM;
>  }
>
> @@ -6982,7 +6958,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>   */
>  static inline void megasas_free_ctrl_mem(struct megasas_instance *instance)
>  {
> -       kfree(instance->reply_map);
>         if (instance->adapter_type == MFI_SERIES) {
>                 if (instance->producer)
>                         dma_free_coherent(&instance->pdev->dev, sizeof(u32),
> @@ -7645,8 +7620,6 @@ megasas_resume(struct pci_dev *pdev)
>         if (rval < 0)
>                 goto fail_reenable_msix;
>
> -       megasas_setup_reply_map(instance);
> -
>         if (instance->adapter_type != MFI_SERIES) {
>                 megasas_reset_reply_desc(instance);
>                 if (megasas_ioc_init_fusion(instance)) {
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index e301458bcbae..bae96b82bb10 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -2731,6 +2731,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
>         struct MR_PRIV_DEVICE *mrdev_priv;
>         struct RAID_CONTEXT *rctx;
>         struct RAID_CONTEXT_G35 *rctx_g35;
> +       u32 tag = blk_mq_unique_tag(scp->request);
>
>         device_id = MEGASAS_DEV_INDEX(scp);
>
> @@ -2837,7 +2838,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
>                                     instance->msix_vectors));
>         else
>                 cmd->request_desc->SCSIIO.MSIxIndex =
> -                       instance->reply_map[raw_smp_processor_id()];
> +                       blk_mq_unique_tag_to_hwq(tag);
>
>         if (instance->adapter_type >= VENTURA_SERIES) {
>                 /* FP for Optimal raid level 1.
> @@ -3080,6 +3081,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
>         u16 pd_index = 0;
>         u16 os_timeout_value;
>         u16 timeout_limit;
> +       u32 tag = blk_mq_unique_tag(scmd->request);
>         struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
>         struct RAID_CONTEXT     *pRAID_Context;
>         struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
> @@ -3169,7 +3171,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
>                                     instance->msix_vectors));
>         else
>                 cmd->request_desc->SCSIIO.MSIxIndex =
> -                       instance->reply_map[raw_smp_processor_id()];
> +                       blk_mq_unique_tag_to_hwq(tag);
>
>         if (!fp_possible) {
>                 /* system pd firmware path */
> @@ -3373,7 +3375,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>  {
>         struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
>         union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
> -       u32 index;
> +       u32 index, blk_tag, unique_tag;
>
>         if ((megasas_cmd_type(scmd) == READ_WRITE_LDIO) &&
>                 instance->ldio_threshold &&
> @@ -3389,7 +3391,9 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>                 return SCSI_MLQUEUE_HOST_BUSY;
>         }
>
> -       cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);
> +       unique_tag = blk_mq_unique_tag(scmd->request);
> +       blk_tag = blk_mq_unique_tag_to_tag(unique_tag);
> +       cmd = megasas_get_cmd_fusion(instance, blk_tag);
>
>         if (!cmd) {
>                 atomic_dec(&instance->fw_outstanding);
> @@ -3430,7 +3434,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>          */
>         if (cmd->r1_alt_dev_handle != MR_DEVHANDLE_INVALID) {
>                 r1_cmd = megasas_get_cmd_fusion(instance,
> -                               (scmd->request->tag + instance->max_fw_cmds));
> +                               (blk_tag + instance->max_fw_cmds));
>                 megasas_prepare_secondRaid1_IO(instance, cmd, r1_cmd);
>         }
>
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2019-12-09 10:10   ` Sumit Saxena
@ 2019-12-09 11:02     ` Hannes Reinecke
  2020-01-10  4:00       ` Sumit Saxena
  2020-01-09 11:55     ` John Garry
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2019-12-09 11:02 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, John Garry, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On 12/9/19 11:10 AM, Sumit Saxena wrote:
> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> Fusion adapters can steer completions to individual queues, and
>> we now have support for shared host-wide tags.
>> So we can enable multiqueue support for fusion adapters and
>> drop the hand-crafted interrupt affinity settings.
> 
> Hi Hannes,
> 
> Ming Lei also proposed similar changes in megaraid_sas driver some
> time back and it had resulted in performance drop-
> https://patchwork.kernel.org/patch/10969511/
> 
> So, we will do some performance tests with this patch and update you.
> Thank you.

I'm aware of the results of Ming Lei's work, but I do hope this patchset 
performs better.

And when you do performance measurements, can you please run with both 
the 'none' and 'mq-deadline' I/O schedulers?
I've measured quite a performance improvement when using mq-deadline, 
up to the point where I've gotten on-par performance with the original, 
non-mq, implementation.
(As a data point, on my setup I've measured about 270k IOPS and 1092 
MB/s throughput, running on just 2 SSDs).

But thanks for doing a performance test here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2019-12-09 10:10   ` Sumit Saxena
  2019-12-09 11:02     ` Hannes Reinecke
@ 2020-01-09 11:55     ` John Garry
  2020-01-09 15:19       ` Hannes Reinecke
  2020-01-10  2:00       ` Ming Lei
  1 sibling, 2 replies; 37+ messages in thread
From: John Garry @ 2020-01-09 11:55 UTC (permalink / raw)
  To: Sumit Saxena, Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 09/12/2019 10:10, Sumit Saxena wrote:
> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> Fusion adapters can steer completions to individual queues, and
>> we now have support for shared host-wide tags.
>> So we can enable multiqueue support for fusion adapters and
>> drop the hand-crafted interrupt affinity settings.
> 
> Hi Hannes,
> 
> Ming Lei also proposed similar changes in megaraid_sas driver some
> time back and it had resulted in performance drop-
> https://patchwork.kernel.org/patch/10969511/
> 
> So, we will do some performance tests with this patch and update you.
> 

Hi Sumit,

I was wondering if you had a chance to do this test yet?

It would be good to know, so we can try to progress this work.

@Hannes, This shared sbitmap work now seems to conflict with Jens' work 
on tag caching 
https://lore.kernel.org/linux-block/20200107163037.31745-1-axboe@kernel.dk/T/#t, 
but it should be resolvable AFAICS (v1, anyway, which is what I 
checked). Anyway, we seem to have stalled, which I feared...

Thanks,
John

> Thanks,
> Sumit
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>   drivers/scsi/megaraid/megaraid_sas.h        |  1 -
>>   drivers/scsi/megaraid/megaraid_sas_base.c   | 65 +++++++++--------------------
>>   drivers/scsi/megaraid/megaraid_sas_fusion.c | 14 ++++---
>>   3 files changed, 28 insertions(+), 52 deletions(-)
>>
>> diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
>> index bd8184072bed..844ea2d6dbb8 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas.h
>> +++ b/drivers/scsi/megaraid/megaraid_sas.h
>> @@ -2261,7 +2261,6 @@ enum MR_PERF_MODE {
>>
>>   struct megasas_instance {
>>
>> -       unsigned int *reply_map;
>>          __le32 *producer;
>>          dma_addr_t producer_h;
>>          __le32 *consumer;
>> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
>> index a4bc81479284..9d0d74e3d491 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
>> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>> @@ -37,6 +37,7 @@
>>   #include <linux/poll.h>
>>   #include <linux/vmalloc.h>
>>   #include <linux/irq_poll.h>
>> +#include <linux/blk-mq-pci.h>
>>
>>   #include <scsi/scsi.h>
>>   #include <scsi/scsi_cmnd.h>
>> @@ -3106,6 +3107,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
>>          return 0;
>>   }
>>
>> +static int megasas_map_queues(struct Scsi_Host *shost)
>> +{
>> +       struct megasas_instance *instance;
>> +
>> +       instance = (struct megasas_instance *)shost->hostdata;
>> +
>> +       if (!instance->smp_affinity_enable)
>> +               return 0;
>> +
>> +       return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
>> +                       instance->pdev, instance->low_latency_index_start);
>> +}
>> +
>>   static void megasas_aen_polling(struct work_struct *work);
>>
>>   /**
>> @@ -3414,9 +3428,11 @@ static struct scsi_host_template megasas_template = {
>>          .eh_timed_out = megasas_reset_timer,
>>          .shost_attrs = megaraid_host_attrs,
>>          .bios_param = megasas_bios_param,
>> +       .map_queues = megasas_map_queues,
>>          .change_queue_depth = scsi_change_queue_depth,
>>          .max_segment_size = 0xffffffff,
>>          .no_write_same = 1,
>> +       .host_tagset = 1,
>>   };
>>
>>   /**
>> @@ -5695,34 +5711,6 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
>>                  instance->use_seqnum_jbod_fp = false;
>>   }
>>
>> -static void megasas_setup_reply_map(struct megasas_instance *instance)
>> -{
>> -       const struct cpumask *mask;
>> -       unsigned int queue, cpu, low_latency_index_start;
>> -
>> -       low_latency_index_start = instance->low_latency_index_start;
>> -
>> -       for (queue = low_latency_index_start; queue < instance->msix_vectors; queue++) {
>> -               mask = pci_irq_get_affinity(instance->pdev, queue);
>> -               if (!mask)
>> -                       goto fallback;
>> -
>> -               for_each_cpu(cpu, mask)
>> -                       instance->reply_map[cpu] = queue;
>> -       }
>> -       return;
>> -
>> -fallback:
>> -       queue = low_latency_index_start;
>> -       for_each_possible_cpu(cpu) {
>> -               instance->reply_map[cpu] = queue;
>> -               if (queue == (instance->msix_vectors - 1))
>> -                       queue = low_latency_index_start;
>> -               else
>> -                       queue++;
>> -       }
>> -}
>> -
>>   /**
>>    * megasas_get_device_list -   Get the PD and LD device list from FW.
>>    * @instance:                  Adapter soft state
>> @@ -6021,12 +6009,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
>>                                          instance->is_rdpq = (scratch_pad_1 & MR_RDPQ_MODE_OFFSET) ?
>>                                                                  1 : 0;
>>
>> -                               if (instance->adapter_type >= INVADER_SERIES &&
>> -                                   !instance->msix_combined) {
>> -                                       instance->msix_load_balance = true;
>> -                                       instance->smp_affinity_enable = false;
>> -                               }
>> -
>>                                  /* Save 1-15 reply post index address to local memory
>>                                   * Index 0 is already saved from reg offset
>>                                   * MPI2_REPLY_POST_HOST_INDEX_OFFSET
>> @@ -6145,8 +6127,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
>>                          goto fail_init_adapter;
>>          }
>>
>> -       megasas_setup_reply_map(instance);
>> -
>>          dev_info(&instance->pdev->dev,
>>                  "current msix/online cpus\t: (%d/%d)\n",
>>                  instance->msix_vectors, (unsigned int)num_online_cpus());
>> @@ -6780,6 +6760,9 @@ static int megasas_io_attach(struct megasas_instance *instance)
>>          host->max_id = MEGASAS_MAX_DEV_PER_CHANNEL;
>>          host->max_lun = MEGASAS_MAX_LUN;
>>          host->max_cmd_len = 16;
>> +       if (instance->adapter_type != MFI_SERIES && instance->msix_vectors > 0)
>> +               host->nr_hw_queues = instance->msix_vectors -
>> +                       instance->low_latency_index_start;
>>
>>          /*
>>           * Notify the mid-layer about the new controller
>> @@ -6947,11 +6930,6 @@ static inline int megasas_alloc_mfi_ctrl_mem(struct megasas_instance *instance)
>>    */
>>   static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>>   {
>> -       instance->reply_map = kcalloc(nr_cpu_ids, sizeof(unsigned int),
>> -                                     GFP_KERNEL);
>> -       if (!instance->reply_map)
>> -               return -ENOMEM;
>> -
>>          switch (instance->adapter_type) {
>>          case MFI_SERIES:
>>                  if (megasas_alloc_mfi_ctrl_mem(instance))
>> @@ -6968,8 +6946,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>>
>>          return 0;
>>    fail:
>> -       kfree(instance->reply_map);
>> -       instance->reply_map = NULL;
>>          return -ENOMEM;
>>   }
>>
>> @@ -6982,7 +6958,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>>    */
>>   static inline void megasas_free_ctrl_mem(struct megasas_instance *instance)
>>   {
>> -       kfree(instance->reply_map);
>>          if (instance->adapter_type == MFI_SERIES) {
>>                  if (instance->producer)
>>                          dma_free_coherent(&instance->pdev->dev, sizeof(u32),
>> @@ -7645,8 +7620,6 @@ megasas_resume(struct pci_dev *pdev)
>>          if (rval < 0)
>>                  goto fail_reenable_msix;
>>
>> -       megasas_setup_reply_map(instance);
>> -
>>          if (instance->adapter_type != MFI_SERIES) {
>>                  megasas_reset_reply_desc(instance);
>>                  if (megasas_ioc_init_fusion(instance)) {
>> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> index e301458bcbae..bae96b82bb10 100644
>> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
>> @@ -2731,6 +2731,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
>>          struct MR_PRIV_DEVICE *mrdev_priv;
>>          struct RAID_CONTEXT *rctx;
>>          struct RAID_CONTEXT_G35 *rctx_g35;
>> +       u32 tag = blk_mq_unique_tag(scp->request);
>>
>>          device_id = MEGASAS_DEV_INDEX(scp);
>>
>> @@ -2837,7 +2838,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
>>                                      instance->msix_vectors));
>>          else
>>                  cmd->request_desc->SCSIIO.MSIxIndex =
>> -                       instance->reply_map[raw_smp_processor_id()];
>> +                       blk_mq_unique_tag_to_hwq(tag);
>>
>>          if (instance->adapter_type >= VENTURA_SERIES) {
>>                  /* FP for Optimal raid level 1.
>> @@ -3080,6 +3081,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
>>          u16 pd_index = 0;
>>          u16 os_timeout_value;
>>          u16 timeout_limit;
>> +       u32 tag = blk_mq_unique_tag(scmd->request);
>>          struct MR_DRV_RAID_MAP_ALL *local_map_ptr;
>>          struct RAID_CONTEXT     *pRAID_Context;
>>          struct MR_PD_CFG_SEQ_NUM_SYNC *pd_sync;
>> @@ -3169,7 +3171,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
>>                                      instance->msix_vectors));
>>          else
>>                  cmd->request_desc->SCSIIO.MSIxIndex =
>> -                       instance->reply_map[raw_smp_processor_id()];
>> +                       blk_mq_unique_tag_to_hwq(tag);
>>
>>          if (!fp_possible) {
>>                  /* system pd firmware path */
>> @@ -3373,7 +3375,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>>   {
>>          struct megasas_cmd_fusion *cmd, *r1_cmd = NULL;
>>          union MEGASAS_REQUEST_DESCRIPTOR_UNION *req_desc;
>> -       u32 index;
>> +       u32 index, blk_tag, unique_tag;
>>
>>          if ((megasas_cmd_type(scmd) == READ_WRITE_LDIO) &&
>>                  instance->ldio_threshold &&
>> @@ -3389,7 +3391,9 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>>                  return SCSI_MLQUEUE_HOST_BUSY;
>>          }
>>
>> -       cmd = megasas_get_cmd_fusion(instance, scmd->request->tag);
>> +       unique_tag = blk_mq_unique_tag(scmd->request);
>> +       blk_tag = blk_mq_unique_tag_to_tag(unique_tag);
>> +       cmd = megasas_get_cmd_fusion(instance, blk_tag);
>>
>>          if (!cmd) {
>>                  atomic_dec(&instance->fw_outstanding);
>> @@ -3430,7 +3434,7 @@ megasas_build_and_issue_cmd_fusion(struct megasas_instance *instance,
>>           */
>>          if (cmd->r1_alt_dev_handle != MR_DEVHANDLE_INVALID) {
>>                  r1_cmd = megasas_get_cmd_fusion(instance,
>> -                               (scmd->request->tag + instance->max_fw_cmds));
>> +                               (blk_tag + instance->max_fw_cmds));
>>                  megasas_prepare_secondRaid1_IO(instance, cmd, r1_cmd);
>>          }
>>
>> --
>> 2.16.4
>>
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread
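As background for the blk_mq_unique_tag() calls in the patch above: the "unique tag" packs the hardware-queue index into the upper 16 bits and the per-queue tag into the lower 16 bits (see include/linux/blk-mq.h). A minimal userspace model of that encoding, with made-up values:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the blk-mq unique-tag encoding: the hardware-queue
 * index occupies the upper 16 bits, the per-queue tag the lower 16 bits.
 * Mirrors blk_mq_unique_tag(), blk_mq_unique_tag_to_hwq() and
 * blk_mq_unique_tag_to_tag(). */
#define UNIQUE_TAG_BITS 16
#define UNIQUE_TAG_MASK ((1U << UNIQUE_TAG_BITS) - 1)

static uint32_t unique_tag(uint16_t hwq, uint16_t tag)
{
	return ((uint32_t)hwq << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

static uint16_t unique_tag_to_hwq(uint32_t ut)
{
	return ut >> UNIQUE_TAG_BITS;
}

static uint16_t unique_tag_to_tag(uint32_t ut)
{
	return ut & UNIQUE_TAG_MASK;
}
```

This is why megasas_build_and_issue_cmd_fusion() in the patch first recovers the plain tag with blk_mq_unique_tag_to_tag() to index its command array, while the hwq half selects the MSI-x reply queue.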

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-09 11:55     ` John Garry
@ 2020-01-09 15:19       ` Hannes Reinecke
  2020-01-09 18:17         ` John Garry
  2020-01-10  2:00       ` Ming Lei
  1 sibling, 1 reply; 37+ messages in thread
From: Hannes Reinecke @ 2020-01-09 15:19 UTC (permalink / raw)
  To: John Garry, Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 1/9/20 12:55 PM, John Garry wrote:
> On 09/12/2019 10:10, Sumit Saxena wrote:
>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>
>>> Fusion adapters can steer completions to individual queues, and
>>> we now have support for shared host-wide tags.
>>> So we can enable multiqueue support for fusion adapters and
>>> drop the hand-crafted interrupt affinity settings.
>>
>> Hi Hannes,
>>
>> Ming Lei also proposed similar changes in megaraid_sas driver some
>> time back and it had resulted in performance drop-
>> https://patchwork.kernel.org/patch/10969511/
>>
>> So, we will do some performance tests with this patch and update you.
>>
> 
> Hi Sumit,
> 
> I was wondering if you had a chance to do this test yet?
> 
> It would be good to know, so we can try to progress this work.
> 
> @Hannes, This shared sbitmap work now seems to conflict with Jens' work
> on tag caching
> https://lore.kernel.org/linux-block/20200107163037.31745-1-axboe@kernel.dk/T/#t,
> but should be resolvable AFAICS (v1, anyway, which I checked). Anyway, we
> seem to have stalled, which I feared...
> 
Thanks for the reminder.
That was a topic I was wanting to discuss at LSF; I'll be sending in a
topic proposal.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-09 15:19       ` Hannes Reinecke
@ 2020-01-09 18:17         ` John Garry
  0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2020-01-09 18:17 UTC (permalink / raw)
  To: Hannes Reinecke, Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 09/01/2020 15:19, Hannes Reinecke wrote:
> On 1/9/20 12:55 PM, John Garry wrote:
>> On 09/12/2019 10:10, Sumit Saxena wrote:
>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Fusion adapters can steer completions to individual queues, and
>>>> we now have support for shared host-wide tags.
>>>> So we can enable multiqueue support for fusion adapters and
>>>> drop the hand-crafted interrupt affinity settings.
>>>
>>> Hi Hannes,
>>>
>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>> time back and it had resulted in performance drop-
>>> https://patchwork.kernel.org/patch/10969511/
>>>
>>> So, we will do some performance tests with this patch and update you.
>>>
>>
>> Hi Sumit,
>>
>> I was wondering if you had a chance to do this test yet?
>>
>> It would be good to know, so we can try to progress this work.
>>
>> @Hannes, This shared sbitmap work now seems to conflict with Jens' work
>> on tag caching
>> https://lore.kernel.org/linux-block/20200107163037.31745-1-axboe@kernel.dk/T/#t,
>> but should be resolvable AFAICS (v1, anyway, which I checked). Anyway, we
>> seem to have stalled, which I feared...
>>
> Thanks for the reminder.
> That was a topic I was wanting to discuss at LSF; will be sending a
> topic then.

Alright, but I am not really sure why we need to wait months to discuss 
this, unless the shared sbitmap approach is rejected and/or testing on 
other HBAs shows unsatisfactory performance.

To summarize, as I see, we have 3 topics to tackle:

- shared tags

- block hotplug improvement
	- Ming Lei had said that he will post another version of 
https://lore.kernel.org/linux-block/20191128020205.GB3277@ming.t460p/

- https://patchwork.kernel.org/cover/10967071/
	- I'm not sure what's happening with that, but I thought it was 
somewhat straightforward.

If there's something else I've missed, then let me know.

Cheers,
John

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-09 11:55     ` John Garry
  2020-01-09 15:19       ` Hannes Reinecke
@ 2020-01-10  2:00       ` Ming Lei
  2020-01-10 12:09         ` John Garry
  1 sibling, 1 reply; 37+ messages in thread
From: Ming Lei @ 2020-01-10  2:00 UTC (permalink / raw)
  To: John Garry
  Cc: Sumit Saxena, Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Linux SCSI List, linux-block,
	Hannes Reinecke

On Thu, Jan 09, 2020 at 11:55:12AM +0000, John Garry wrote:
> On 09/12/2019 10:10, Sumit Saxena wrote:
> > On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
> > > 
> > > Fusion adapters can steer completions to individual queues, and
> > > we now have support for shared host-wide tags.
> > > So we can enable multiqueue support for fusion adapters and
> > > drop the hand-crafted interrupt affinity settings.
> > 
> > Hi Hannes,
> > 
> > Ming Lei also proposed similar changes in megaraid_sas driver some
> > time back and it had resulted in performance drop-
> > https://patchwork.kernel.org/patch/10969511/
> > 
> > So, we will do some performance tests with this patch and update you.
> > 
> 
> Hi Sumit,
> 
> I was wondering if you had a chance to do this test yet?
> 
> It would be good to know, so we can try to progress this work.

It looks like most of the comments in the following link aren't addressed:

https://lore.kernel.org/linux-block/20191129002540.GA1829@ming.t460p/

> Firstly too much((nr_hw_queues - 1) times) memory is wasted. Secondly IO
> latency could be increased by too deep scheduler queue depth. Finally CPU
> could be wasted in the retrying of running busy hw queue.
> 
> Wrt. driver tags, this patch may be worse, given the average limit for
> each LUN is reduced by (nr_hw_queues) times, see hctx_may_queue().
> 
> Another change is bt_wait_ptr(). Before your patches, there is single
> .wait_index, now the number of .wait_index is changed to nr_hw_queues.
> 
> Also the run queue number is increased a lot in SCSI's IO completion, see
> scsi_end_request().

I guess memory waste won't be a blocker.

But it may not be acceptable behavior to reduce the average active queue
depth for each LUN by nr_hw_queues times while the scheduler queue depth
is increased by nr_hw_queues times, compared with a single queue.

thanks,
Ming


^ permalink raw reply	[flat|nested] 37+ messages in thread
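Ming's point about per-LUN driver-tag fairness refers to hctx_may_queue() in block/blk-mq-tag.c, which caps each active queue at roughly depth/users tags of one hctx's tag space. A rough userspace sketch of that limit (the rounding and the floor of 4 follow the 5.x code, but treat the details as an approximation):

```c
#include <assert.h>

/* Rough model of the fair-share limit in hctx_may_queue(): with "users"
 * active queues, each may take about depth/users tags from one hctx's
 * tag space, floored at 4.  With hostwide tags exposed through
 * nr_hw_queues hctxs, a LUN's share on any single hctx shrinks by
 * roughly that factor compared with a single queue. */
static unsigned int fair_share(unsigned int depth, unsigned int users)
{
	unsigned int d;

	if (users == 0)
		return depth;
	d = (depth + users - 1) / users;	/* round up */
	return d > 4 ? d : 4;
}
```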

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2019-12-09 11:02     ` Hannes Reinecke
@ 2020-01-10  4:00       ` Sumit Saxena
  2020-01-10 12:18         ` John Garry
  2020-01-13 17:42         ` John Garry
  0 siblings, 2 replies; 37+ messages in thread
From: Sumit Saxena @ 2020-01-10  4:00 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, John Garry, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 12/9/19 11:10 AM, Sumit Saxena wrote:
> > On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> Fusion adapters can steer completions to individual queues, and
> >> we now have support for shared host-wide tags.
> >> So we can enable multiqueue support for fusion adapters and
> >> drop the hand-crafted interrupt affinity settings.
> >
> > Hi Hannes,
> >
> > Ming Lei also proposed similar changes in megaraid_sas driver some
> > time back and it had resulted in performance drop-
> > https://patchwork.kernel.org/patch/10969511/
> >
> > So, we will do some performance tests with this patch and update you.
> > Thank you.
>
> I'm aware of the results of Ming Lei's work, but I do hope this patchset
> performs better.
>
> And when you do performance measurements, can you please run with both,
> 'none' I/O scheduler and 'mq-deadline' I/O scheduler?
> I've measured quite a performance improvement when using mq-deadline,
> up to the point where I've gotten on-par performance with the original,
> non-mq, implementation.
> (As a data point, on my setup I've measured about 270k IOPS and 1092
> MB/s throughput, running on just 2 SSDs).
>
> But thanks for doing a performance test here.

Hi Hannes,

Sorry for the delay in replying. I observed a few issues with this patchset:

1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
which the IO submitter CPU is affined. Because of this, the IO submission
and completion CPUs differ, which causes a performance drop for low
latency workloads.

lscpu:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
Stepping:              4
CPU MHz:               3204.246
CPU max MHz:           3700.0000
CPU min MHz:           1200.0000
BogoMIPS:              5400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx
f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3
invpcid_single intel_ppin mba tpr_shadow vnmi flexpriority ept vpid
ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm
mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt
avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc
cqm_occup_llc cqm_mbm_total cqm_mbm_lo


fio test:

#numactl -N 0 fio jbod.fio

# cat jbod.fio
[global]
allow_mounted_write=0
ioengine=libaio
buffered=0
direct=1
group_reporting
iodepth=1
bs=4096
readwrite=randread
..

In this test, IOs are submitted on Node 0 but observed IO completions on Node 1.

IRQs / 1 second(s)
IRQ#  TOTAL  NODE0   NODE1  NAME
176  48387  48387       0  IR-PCI-MSI 34603073-edge megasas14-msix65
178  47966  47966       0  IR-PCI-MSI 34603075-edge megasas14-msix67
170  47706  47706       0  IR-PCI-MSI 34603067-edge megasas14-msix59
180  47291  47291       0  IR-PCI-MSI 34603077-edge megasas14-msix69
181  47155  47155       0  IR-PCI-MSI 34603078-edge megasas14-msix70
173  46806  46806       0  IR-PCI-MSI 34603070-edge megasas14-msix62
179  46773  46773       0  IR-PCI-MSI 34603076-edge megasas14-msix68
169  46600  46600       0  IR-PCI-MSI 34603066-edge megasas14-msix58
175  46447  46447       0  IR-PCI-MSI 34603072-edge megasas14-msix64
172  46184  46184       0  IR-PCI-MSI 34603069-edge megasas14-msix61
182  46117  46117       0  IR-PCI-MSI 34603079-edge megasas14-msix71
165  46070  46070       0  IR-PCI-MSI 34603062-edge megasas14-msix54
164  45892  45892       0  IR-PCI-MSI 34603061-edge megasas14-msix53
174  45864  45864       0  IR-PCI-MSI 34603071-edge megasas14-msix63
156  45348  45348       0  IR-PCI-MSI 34603053-edge megasas14-msix45
147  45302      0   45302  IR-PCI-MSI 34603044-edge megasas14-msix36
151  44922      0   44922  IR-PCI-MSI 34603048-edge megasas14-msix40
171  44876  44876       0  IR-PCI-MSI 34603068-edge megasas14-msix60
159  44755  44755       0  IR-PCI-MSI 34603056-edge megasas14-msix48
148  44695      0   44695  IR-PCI-MSI 34603045-edge megasas14-msix37
157  44304  44304       0  IR-PCI-MSI 34603054-edge megasas14-msix46
167  42552  42552       0  IR-PCI-MSI 34603064-edge megasas14-msix56
154  35937      0   35937  IR-PCI-MSI 34603051-edge megasas14-msix43
166  16004  16004       0  IR-PCI-MSI 34603063-edge megasas14-msix55


IRQ-CPU affinity:

Ran below script to get IRQ-CPU affinity:
--
#!/bin/bash
PCID=`lspci | grep "SAS39xx" | cut -c1-7`
PCIP=`find /sys/devices -name "*$PCID" | grep pci`
IRQS=`ls $PCIP/msi_irqs`

echo "kernel version: "
uname -a

for IRQ in $IRQS; do
    CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
    echo "irq $IRQ, cpu list $CPUS"
done

--

irq 103, cpu list 0-17,36-53
irq 112, cpu list 0-17,36-53
irq 113, cpu list 0-17,36-53
irq 114, cpu list 0-17,36-53
irq 115, cpu list 0-17,36-53
irq 116, cpu list 0-17,36-53
irq 117, cpu list 0-17,36-53
irq 118, cpu list 0-17,36-53
irq 119, cpu list 18
irq 120, cpu list 19
irq 121, cpu list 20
irq 122, cpu list 21
irq 123, cpu list 22
irq 124, cpu list 23
irq 125, cpu list 24
irq 126, cpu list 25
irq 127, cpu list 26
irq 128, cpu list 27
irq 129, cpu list 28
irq 130, cpu list 29
irq 131, cpu list 30
irq 132, cpu list 31
irq 133, cpu list 32
irq 134, cpu list 33
irq 135, cpu list 34
irq 136, cpu list 35
irq 137, cpu list 54
irq 138, cpu list 55
irq 139, cpu list 56
irq 140, cpu list 57
irq 141, cpu list 58
irq 142, cpu list 59
irq 143, cpu list 60
irq 144, cpu list 61
irq 145, cpu list 62
irq 146, cpu list 63
irq 147, cpu list 64
irq 148, cpu list 65
irq 149, cpu list 66
irq 150, cpu list 67
irq 151, cpu list 68
irq 152, cpu list 69
irq 153, cpu list 70
irq 154, cpu list 71
irq 155, cpu list 0
irq 156, cpu list 1
irq 157, cpu list 2
irq 158, cpu list 3
irq 159, cpu list 4
irq 160, cpu list 5
irq 161, cpu list 6
irq 162, cpu list 7
irq 163, cpu list 8
irq 164, cpu list 9
irq 165, cpu list 10
irq 166, cpu list 11
irq 167, cpu list 12
irq 168, cpu list 13
irq 169, cpu list 14
irq 170, cpu list 15
irq 171, cpu list 16
irq 172, cpu list 17
irq 173, cpu list 36
irq 174, cpu list 37
irq 175, cpu list 38
irq 176, cpu list 39
irq 177, cpu list 40
irq 178, cpu list 41
irq 179, cpu list 42
irq 180, cpu list 43
irq 181, cpu list 44
irq 182, cpu list 45
irq 183, cpu list 46
irq 184, cpu list 47
irq 185, cpu list 48
irq 186, cpu list 49
irq 187, cpu list 50
irq 188, cpu list 51
irq 189, cpu list 52
irq 190, cpu list 53


I added prints in the megaraid_sas driver's IO path to catch when the
MSI-x vector affined to the IO submitter CPU does not match what is
returned by "blk_mq_unique_tag_to_hwq(tag)".
I have copied a few prints from the dmesg logs below. In these prints,
CPU is the submitter CPU, calculated MSI-x is what is returned by
"blk_mq_unique_tag_to_hwq(tag)", and affined MSI-x is the actual MSI-x
vector to which the submitting CPU is affined:

[2536843.629877] BRCM DBG: CPU:6 calculated MSI-x: 153 affined MSIx: 161
[2536843.641674] BRCM DBG: CPU:39 calculated MSI-x: 168 affined MSIx: 176
[2536843.641674] BRCM DBG: CPU:13 calculated MSI-x: 160 affined MSIx: 168


2. Seeing below stack traces/messages in dmesg during driver unload –

[2565601.054366] Call Trace:
[2565601.054368]  blk_mq_free_map_and_requests+0x28/0x50
[2565601.054369]  blk_mq_free_tag_set+0x1d/0x90
[2565601.054370]  scsi_host_dev_release+0x8a/0xf0
[2565601.054370]  device_release+0x27/0x80
[2565601.054371]  kobject_cleanup+0x61/0x190
[2565601.054373]  megasas_detach_one+0x4c1/0x650 [megaraid_sas]
[2565601.054374]  pci_device_remove+0x3b/0xc0
[2565601.054375]  device_release_driver_internal+0xec/0x1b0
[2565601.054376]  driver_detach+0x46/0x90
[2565601.054377]  bus_remove_driver+0x58/0xd0
[2565601.054378]  pci_unregister_driver+0x26/0xa0
[2565601.054379]  megasas_exit+0x91/0x882 [megaraid_sas]
[2565601.054381]  __x64_sys_delete_module+0x16c/0x250
[2565601.054382]  do_syscall_64+0x5b/0x1b0
[2565601.054383]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2565601.054383] RIP: 0033:0x7f7212a82837
[2565601.054384] RSP: 002b:00007ffdfa2dcea8 EFLAGS: 00000202 ORIG_RAX:
00000000000000b0
[2565601.054385] RAX: ffffffffffffffda RBX: 0000000000b6e2e0 RCX:
00007f7212a82837
[2565601.054385] RDX: 00007f7212af3ac0 RSI: 0000000000000800 RDI:
0000000000b6e348
[2565601.054386] RBP: 0000000000000000 R08: 00007f7212d47060 R09:
00007f7212af3ac0
[2565601.054386] R10: 00007ffdfa2dcbc0 R11: 0000000000000202 R12:
00007ffdfa2dd71c
[2565601.054387] R13: 0000000000000000 R14: 0000000000b6e2e0 R15:
0000000000b6e010
[2565601.054387] ---[ end trace 38899303bd85e838 ]---
[2565601.054390] ------------[ cut here ]------------
[2565601.054391] WARNING: CPU: 31 PID: 50996 at block/blk-mq.c:2056
blk_mq_free_rqs+0x10b/0x120
[2565601.054391] Modules linked in: megaraid_sas(-) ses enclosure
scsi_transport_sas xt_CHECKSUM xt_MASQUERADE tun bridge stp llc
ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle
iptable_security iptable_raw ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter intel_rapl_msr intel_rapl_common skx_edac
nfit libnvdimm ftdi_sio x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic
ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core
irqbypass snd_hwdep crct10dif_pclmul snd_seq crc32_pclmul
ghash_clmulni_intel snd_seq_device snd_pcm iTCO_wdt mei_me
iTCO_vendor_support cryptd snd_timer joydev snd mei soundcore ioatdma
sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler dca i2c_i801 lpc_ich
acpi_power_meter wmi
[2565601.054400]  acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
e1000e ahci libahci crc32c_intel nvme libata nvme_core i2c_core
dm_mirror dm_region_hash dm_log dm_mod [last unloaded: megaraid_sas]
[2565601.054404] CPU: 31 PID: 50996 Comm: rmmod Kdump: loaded Tainted:
G        W  OE     5.4.0-rc1+ #2
[2565601.054405] Hardware name: Supermicro Super Server/X11DPG-QT,
BIOS 1.0 06/22/2017
[2565601.054406] RIP: 0010:blk_mq_free_rqs+0x10b/0x120
[2565601.054407] Code: 89 10 48 8b 73 20 48 89 1b 4c 89 e7 48 89 5b 08
e8 2a 54 e7 ff 48 8b 85 b0 00 00 00 49 39 c5 75 bc 5b 5d 41 5c 41 5d
41 5e c3 <0f> 0b 48 c7 02 00 00 00 00 e9 2b ff ff ff 0f 1f 80 00 00 00
00 0f
[2565601.054407] RSP: 0018:ffffb37446a6bd58 EFLAGS: 00010286
[2565601.054408] RAX: 0000000000000747 RBX: ffff92219cb280a8 RCX:
0000000000000747
[2565601.054408] RDX: ffff92219b153a38 RSI: ffff9221692bb5c0 RDI:
ffff92219cb280a8
[2565601.054408] RBP: ffff9221692bb5c0 R08: 0000000000000001 R09:
0000000000000000
[2565601.054409] R10: ffff9221692bb680 R11: ffff9221bffd5f60 R12:
ffff9221970dd678
[2565601.054409] R13: 0000000000000030 R14: ffff92219cb280a8 R15:
ffff922199ada010
[2565601.054410] FS:  00007f72135a1740(0000) GS:ffff92299f540000(0000)
knlGS:0000000000000000
[2565601.054410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2565601.054410] CR2: 0000000000b77c78 CR3: 0000000f27db0006 CR4:
00000000007606e0
[2565601.054412] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[2565601.054412] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[2565601.054413] PKRU: 55555554


3. For high-IO workloads (outstanding IOs per physical disk > 8),
performance numbers are good (no performance drop), as in that case the
driver uses the non-managed-affinity high-IOPS reply queues and this
patchset does not touch the driver's high-IOPS IO path.

4. This patch removes the below code from the driver, so the behavior
this piece of code provided is now broken:


-                               if (instance->adapter_type >= INVADER_SERIES &&
-                                   !instance->msix_combined) {
-                                       instance->msix_load_balance = true;
-                                       instance->smp_affinity_enable = false;
-                               }

Thanks,
Sumit
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke            Teamlead Storage & Networking
> hare@suse.de                               +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-10  2:00       ` Ming Lei
@ 2020-01-10 12:09         ` John Garry
  2020-01-14 13:57           ` John Garry
  0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2020-01-10 12:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sumit Saxena, Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Linux SCSI List, linux-block,
	Hannes Reinecke

On 10/01/2020 02:00, Ming Lei wrote:
> On Thu, Jan 09, 2020 at 11:55:12AM +0000, John Garry wrote:
>> On 09/12/2019 10:10, Sumit Saxena wrote:
>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Fusion adapters can steer completions to individual queues, and
>>>> we now have support for shared host-wide tags.
>>>> So we can enable multiqueue support for fusion adapters and
>>>> drop the hand-crafted interrupt affinity settings.
>>>
>>> Hi Hannes,
>>>
>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>> time back and it had resulted in performance drop-
>>> https://patchwork.kernel.org/patch/10969511/
>>>
>>> So, we will do some performance tests with this patch and update you.
>>>
>>
>> Hi Sumit,
>>
>> I was wondering if you had a chance to do this test yet?
>>
>> It would be good to know, so we can try to progress this work.
> 
> Looks most of the comment in the following link isn't addressed:
> 
> https://lore.kernel.org/linux-block/20191129002540.GA1829@ming.t460p/

OK, but I was waiting for results first, which I hoped would not take 
too long. They weren't forgotten, for sure. Let me check them now.

> 
>> Firstly too much((nr_hw_queues - 1) times) memory is wasted. Secondly IO
>> latency could be increased by too deep scheduler queue depth. Finally CPU
>> could be wasted in the retrying of running busy hw queue.
>>
>> Wrt. driver tags, this patch may be worse, given the average limit for
>> each LUN is reduced by (nr_hw_queues) times, see hctx_may_queue().
>>
>> Another change is bt_wait_ptr(). Before your patches, there is single
>> .wait_index, now the number of .wait_index is changed to nr_hw_queues.
>>
>> Also the run queue number is increased a lot in SCSI's IO completion, see
>> scsi_end_request().
> 
> I guess memory waste won't be a blocker.

Yeah, that's a trade-off. And, as I remember, memory waste does not seem 
extreme.

> 
> But it may not be acceptable behavior to reduce the average active queue
> depth for each LUN by nr_hw_queues times while the scheduler queue depth
> is increased by nr_hw_queues times, compared with a single queue.
> 

Thanks,
John


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-10  4:00       ` Sumit Saxena
@ 2020-01-10 12:18         ` John Garry
  2020-01-13 17:42         ` John Garry
  1 sibling, 0 replies; 37+ messages in thread
From: John Garry @ 2020-01-10 12:18 UTC (permalink / raw)
  To: Sumit Saxena, Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 10/01/2020 04:00, Sumit Saxena wrote:
> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Fusion adapters can steer completions to individual queues, and
>>>> we now have support for shared host-wide tags.
>>>> So we can enable multiqueue support for fusion adapters and
>>>> drop the hand-crafted interrupt affinity settings.
>>>
>>> Hi Hannes,
>>>
>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>> time back and it had resulted in performance drop-
>>> https://patchwork.kernel.org/patch/10969511/
>>>
>>> So, we will do some performance tests with this patch and update you.
>>> Thank you.
>>
>> I'm aware of the results of Ming Lei's work, but I do hope this patchset
>> performs better.
>>
>> And when you do performance measurements, can you please run with both,
>> 'none' I/O scheduler and 'mq-deadline' I/O scheduler?
>> I've measured quite a performance improvement when using mq-deadline,
>> up to the point where I've gotten on-par performance with the original,
>> non-mq, implementation.
>> (As a data point, on my setup I've measured about 270k IOPS and 1092
>> MB/s throughput, running on just 2 SSDs).
>>
>> But thanks for doing a performance test here.
> 

Hi Sumit,

Thanks for this. We'll have a look ...

Cheers,
John

EOM

> Hi Hannes,
> 
> Sorry for the delay in replying, I observed a few issues with this patchset:
> 
> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
> which the IO submitter CPU is affined. Because of this, the IO submission
> and completion CPUs differ, which causes a performance drop for low
> latency workloads.
> 
> [lscpu output trimmed; quoted in full earlier in the thread]
> 
> fio test:
> 
> #numactl -N 0 fio jbod.fio
> 
> # cat jbod.fio
> [global]
> allow_mounted_write=0
> ioengine=libaio
> buffered=0
> direct=1
> group_reporting
> iodepth=1
> bs=4096
> readwrite=randread
> ..
> 
> In this test, IOs are submitted on Node 0 but observed IO completions on Node 1.
> 
> [per-IRQ completion counts trimmed; quoted in full earlier in the thread]
> 
> 
> IRQ-CPU affinity:
> 
> Ran below script to get IRQ-CPU affinity:
> --
> #!/bin/bash
> PCID=`lspci | grep "SAS39xx" | cut -c1-7`
> PCIP=`find /sys/devices -name *$PCID | grep pci`
> IRQS=`ls $PCIP/msi_irqs`
> 
> echo "kernel version: "
> uname -a
> 
> for IRQ in $IRQS; do
>      CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
>      echo "irq $IRQ, cpu list $CPUS"
> done
> 
> --
> 
> irq 103, cpu list 0-17,36-53
> irq 112, cpu list 0-17,36-53
> irq 113, cpu list 0-17,36-53
> irq 114, cpu list 0-17,36-53
> irq 115, cpu list 0-17,36-53
> irq 116, cpu list 0-17,36-53
> irq 117, cpu list 0-17,36-53
> irq 118, cpu list 0-17,36-53
> irq 119, cpu list 18
> irq 120, cpu list 19
> irq 121, cpu list 20
> irq 122, cpu list 21
> irq 123, cpu list 22
> irq 124, cpu list 23
> irq 125, cpu list 24
> irq 126, cpu list 25
> irq 127, cpu list 26
> irq 128, cpu list 27
> irq 129, cpu list 28
> irq 130, cpu list 29
> irq 131, cpu list 30
> irq 132, cpu list 31
> irq 133, cpu list 32
> irq 134, cpu list 33
> irq 135, cpu list 34
> irq 136, cpu list 35
> irq 137, cpu list 54
> irq 138, cpu list 55
> irq 139, cpu list 56
> irq 140, cpu list 57
> irq 141, cpu list 58
> irq 142, cpu list 59
> irq 143, cpu list 60
> irq 144, cpu list 61
> irq 145, cpu list 62
> irq 146, cpu list 63
> irq 147, cpu list 64
> irq 148, cpu list 65
> irq 149, cpu list 66
> irq 150, cpu list 67
> irq 151, cpu list 68
> irq 152, cpu list 69
> irq 153, cpu list 70
> irq 154, cpu list 71
> irq 155, cpu list 0
> irq 156, cpu list 1
> irq 157, cpu list 2
> irq 158, cpu list 3
> irq 159, cpu list 4
> irq 160, cpu list 5
> irq 161, cpu list 6
> irq 162, cpu list 7
> irq 163, cpu list 8
> irq 164, cpu list 9
> irq 165, cpu list 10
> irq 166, cpu list 11
> irq 167, cpu list 12
> irq 168, cpu list 13
> irq 169, cpu list 14
> irq 170, cpu list 15
> irq 171, cpu list 16
> irq 172, cpu list 17
> irq 173, cpu list 36
> irq 174, cpu list 37
> irq 175, cpu list 38
> irq 176, cpu list 39
> irq 177, cpu list 40
> irq 178, cpu list 41
> irq 179, cpu list 42
> irq 180, cpu list 43
> irq 181, cpu list 44
> irq 182, cpu list 45
> irq 183, cpu list 46
> irq 184, cpu list 47
> irq 185, cpu list 48
> irq 186, cpu list 49
> irq 187, cpu list 50
> irq 188, cpu list 51
> irq 189, cpu list 52
> irq 190, cpu list 53
> 
> 
> I added prints in the megaraid_sas driver's IO path to catch when the
> MSI-x vector affined to the IO submitter CPU does not match what is
> returned by the API "blk_mq_unique_tag_to_hwq(tag)".
> I have copied a few prints from dmesg logs below. In these prints, CPU
> is the submitter CPU, calculated MSI-x is what is returned by
> "blk_mq_unique_tag_to_hwq(tag)", and affined MSI-x is the actual MSI-x
> to which the submitting CPU is affined:
> 
> [2536843.629877] BRCM DBG: CPU:6 calculated MSI-x: 153 affined MSIx: 161
> [2536843.641674] BRCM DBG: CPU:39 calculated MSI-x: 168 affined MSIx: 176
> [2536843.641674] BRCM DBG: CPU:13 calculated MSI-x: 160 affined MSIx: 168
> 
> 
> 2. Seeing below stack traces/messages in dmesg during driver unload –
> 
> [2565601.054366] Call Trace:
> [2565601.054368]  blk_mq_free_map_and_requests+0x28/0x50
> [2565601.054369]  blk_mq_free_tag_set+0x1d/0x90
> [2565601.054370]  scsi_host_dev_release+0x8a/0xf0
> [2565601.054370]  device_release+0x27/0x80
> [2565601.054371]  kobject_cleanup+0x61/0x190
> [2565601.054373]  megasas_detach_one+0x4c1/0x650 [megaraid_sas]
> [2565601.054374]  pci_device_remove+0x3b/0xc0
> [2565601.054375]  device_release_driver_internal+0xec/0x1b0
> [2565601.054376]  driver_detach+0x46/0x90
> [2565601.054377]  bus_remove_driver+0x58/0xd0
> [2565601.054378]  pci_unregister_driver+0x26/0xa0
> [2565601.054379]  megasas_exit+0x91/0x882 [megaraid_sas]
> [2565601.054381]  __x64_sys_delete_module+0x16c/0x250
> [2565601.054382]  do_syscall_64+0x5b/0x1b0
> [2565601.054383]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [2565601.054383] RIP: 0033:0x7f7212a82837
> [2565601.054384] RSP: 002b:00007ffdfa2dcea8 EFLAGS: 00000202 ORIG_RAX:
> 00000000000000b0
> [2565601.054385] RAX: ffffffffffffffda RBX: 0000000000b6e2e0 RCX:
> 00007f7212a82837
> [2565601.054385] RDX: 00007f7212af3ac0 RSI: 0000000000000800 RDI:
> 0000000000b6e348
> [2565601.054386] RBP: 0000000000000000 R08: 00007f7212d47060 R09:
> 00007f7212af3ac0
> [2565601.054386] R10: 00007ffdfa2dcbc0 R11: 0000000000000202 R12:
> 00007ffdfa2dd71c
> [2565601.054387] R13: 0000000000000000 R14: 0000000000b6e2e0 R15:
> 0000000000b6e010
> [2565601.054387] ---[ end trace 38899303bd85e838 ]---
> [2565601.054390] ------------[ cut here ]------------
> [2565601.054391] WARNING: CPU: 31 PID: 50996 at block/blk-mq.c:2056
> blk_mq_free_rqs+0x10b/0x120
> [2565601.054391] Modules linked in: megaraid_sas(-) ses enclosure
> scsi_transport_sas xt_CHECKSUM xt_MASQUERADE tun bridge stp llc
> ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6
> xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat
> ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle
> iptable_security iptable_raw ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter intel_rapl_msr intel_rapl_common skx_edac
> nfit libnvdimm ftdi_sio x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic
> ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core
> irqbypass snd_hwdep crct10dif_pclmul snd_seq crc32_pclmul
> ghash_clmulni_intel snd_seq_device snd_pcm iTCO_wdt mei_me
> iTCO_vendor_support cryptd snd_timer joydev snd mei soundcore ioatdma
> sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler dca i2c_i801 lpc_ich
> acpi_power_meter wmi
> [2565601.054400]  acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> ip_tables xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper ttm
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> e1000e ahci libahci crc32c_intel nvme libata nvme_core i2c_core
> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: megaraid_sas]
> [2565601.054404] CPU: 31 PID: 50996 Comm: rmmod Kdump: loaded Tainted:
> G        W  OE     5.4.0-rc1+ #2
> [2565601.054405] Hardware name: Supermicro Super Server/X11DPG-QT,
> BIOS 1.0 06/22/2017
> [2565601.054406] RIP: 0010:blk_mq_free_rqs+0x10b/0x120
> [2565601.054407] Code: 89 10 48 8b 73 20 48 89 1b 4c 89 e7 48 89 5b 08
> e8 2a 54 e7 ff 48 8b 85 b0 00 00 00 49 39 c5 75 bc 5b 5d 41 5c 41 5d
> 41 5e c3 <0f> 0b 48 c7 02 00 00 00 00 e9 2b ff ff ff 0f 1f 80 00 00 00
> 00 0f
> [2565601.054407] RSP: 0018:ffffb37446a6bd58 EFLAGS: 00010286
> [2565601.054408] RAX: 0000000000000747 RBX: ffff92219cb280a8 RCX:
> 0000000000000747
> [2565601.054408] RDX: ffff92219b153a38 RSI: ffff9221692bb5c0 RDI:
> ffff92219cb280a8
> [2565601.054408] RBP: ffff9221692bb5c0 R08: 0000000000000001 R09:
> 0000000000000000
> [2565601.054409] R10: ffff9221692bb680 R11: ffff9221bffd5f60 R12:
> ffff9221970dd678
> [2565601.054409] R13: 0000000000000030 R14: ffff92219cb280a8 R15:
> ffff922199ada010
> [2565601.054410] FS:  00007f72135a1740(0000) GS:ffff92299f540000(0000)
> knlGS:0000000000000000
> [2565601.054410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2565601.054410] CR2: 0000000000b77c78 CR3: 0000000f27db0006 CR4:
> 00000000007606e0
> [2565601.054412] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [2565601.054412] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [2565601.054413] PKRU: 55555554
> 
> 
> 3. For high-IO (outstanding IOs per physical disk > 8) oriented
> workloads, performance numbers are good (no performance drop), as in
> that case the driver uses non-managed-affinity high-IOPS reply queues,
> and this patchset does not touch the driver's high-IOPS IO path.
> 
> 4. This patch removes the below code from the driver, so whatever this
> piece of code did is now broken:
> 
> 
> -                               if (instance->adapter_type >= INVADER_SERIES &&
> -                                   !instance->msix_combined) {
> -                                       instance->msix_load_balance = true;
> -                                       instance->smp_affinity_enable = false;
> -                               }
> 
> Thanks,
> Sumit
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke            Teamlead Storage & Networking
>> hare@suse.de                               +49 911 74053 688
>> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-10  4:00       ` Sumit Saxena
  2020-01-10 12:18         ` John Garry
@ 2020-01-13 17:42         ` John Garry
  2020-01-14  7:05           ` Hannes Reinecke
  2020-01-17 11:18           ` Sumit Saxena
  1 sibling, 2 replies; 37+ messages in thread
From: John Garry @ 2020-01-13 17:42 UTC (permalink / raw)
  To: Sumit Saxena, Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 10/01/2020 04:00, Sumit Saxena wrote:
> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Fusion adapters can steer completions to individual queues, and
>>>> we now have support for shared host-wide tags.
>>>> So we can enable multiqueue support for fusion adapters and
>>>> drop the hand-crafted interrupt affinity settings.
>>>
>>> Hi Hannes,
>>>
>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>> time back and it had resulted in performance drop-
>>> https://patchwork.kernel.org/patch/10969511/
>>>
>>> So, we will do some performance tests with this patch and update you.
>>> Thank you.
>>
>> I'm aware of the results of Ming Leis work, but I do hope this patchset
>> performs better.
>>
>> And when you do performance measurements, can you please run with both,
>> 'none' I/O scheduler and 'mq-deadline' I/O scheduler?
>> I've measured quite a performance improvement when using mq-deadline,
>> up to the point where I've gotten on-par performance with the original,
>> non-mq, implementation.
>> (As a data point, on my setup I've measured about 270k IOPS and 1092
>> MB/s throughput, running on just 2 SSDs).
>>
>> But thanks for doing a performance test here.
> 
> Hi Hannes,
> 
> Sorry for the delay in replying, I observed a few issues with this patchset:
> 
> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
> which the IO submitter CPU is affined. Due to this, IO submission and
> completion CPUs are different, which causes a performance drop for low
> latency workloads.

Hi Sumit,

So the new code has:

megasas_build_ldio_fusion()
{

cmd->request_desc->SCSIIO.MSIxIndex =
blk_mq_unique_tag_to_hwq(tag);

}

So the value here is hw queue index from blk-mq point of view, and not 
megaraid_sas msix index, as you alluded to.

So we get 80 msix, 8 are reserved for low_latency_index_start (that's 
how it seems to me), and we report other 72 as #hw queues = 72 to SCSI 
midlayer.

So I think that this should be:

cmd->request_desc->SCSIIO.MSIxIndex =
blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;


> 
> lscpu:
> 
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                72
> On-line CPU(s) list:   0-71
> Thread(s) per core:    2
> Core(s) per socket:    18
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 85
> Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping:              4
> CPU MHz:               3204.246
> CPU max MHz:           3700.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              5400.00
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              1024K
> L3 cache:              25344K
> NUMA node0 CPU(s):     0-17,36-53
> NUMA node1 CPU(s):     18-35,54-71
> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> tm pbe s
> yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
> lahf_lm abm
> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
> mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> bmi1 hle
> avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
> adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
> xsavec
> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
> 
> 

[snip]

> 4. This patch removes the below code from the driver, so whatever this
> piece of code did is now broken:
> 
> 
> -                               if (instance->adapter_type >= INVADER_SERIES &&
> -                                   !instance->msix_combined) {
> -                                       instance->msix_load_balance = true;
> -                                       instance->smp_affinity_enable = false;
> -                               }

Does this code need to be re-added? Would this have affected your test?

Thanks,
John

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-13 17:42         ` John Garry
@ 2020-01-14  7:05           ` Hannes Reinecke
  2020-01-16 15:47             ` John Garry
  2020-01-17 11:40             ` Sumit Saxena
  2020-01-17 11:18           ` Sumit Saxena
  1 sibling, 2 replies; 37+ messages in thread
From: Hannes Reinecke @ 2020-01-14  7:05 UTC (permalink / raw)
  To: John Garry, Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 1/13/20 6:42 PM, John Garry wrote:
> On 10/01/2020 04:00, Sumit Saxena wrote:
>> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
>>>
>>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
>>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>>
>>>>> Fusion adapters can steer completions to individual queues, and
>>>>> we now have support for shared host-wide tags.
>>>>> So we can enable multiqueue support for fusion adapters and
>>>>> drop the hand-crafted interrupt affinity settings.
>>>>
>>>> Hi Hannes,
>>>>
>>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>>> time back and it had resulted in performance drop-
>>>> https://patchwork.kernel.org/patch/10969511/
>>>>
>>>> So, we will do some performance tests with this patch and update you.
>>>> Thank you.
>>>
>>> I'm aware of the results of Ming Leis work, but I do hope this patchset
>>> performs better.
>>>
>>> And when you do performance measurements, can you please run with both,
>>> 'none' I/O scheduler and 'mq-deadline' I/O scheduler?
>>> I've measured quite a performance improvement when using mq-deadline,
>>> up to the point where I've gotten on-par performance with the original,
>>> non-mq, implementation.
>>> (As a data point, on my setup I've measured about 270k IOPS and 1092
>>> MB/s throughput, running on just 2 SSDs).
>>>
>>> But thanks for doing a performance test here.
>>
>> Hi Hannes,
>>
>> Sorry for the delay in replying, I observed a few issues with this
>> patchset:
>>
>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
>> which the IO submitter CPU is affined. Due to this, IO submission and
>> completion CPUs are different, which causes a performance drop for low
>> latency workloads.
> 
> Hi Sumit,
> 
> So the new code has:
> 
> megasas_build_ldio_fusion()
> {
> 
> cmd->request_desc->SCSIIO.MSIxIndex =
> blk_mq_unique_tag_to_hwq(tag);
> 
> }
> 
> So the value here is hw queue index from blk-mq point of view, and not
> megaraid_sas msix index, as you alluded to.
> 
> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
> midlayer.
> 
> So I think that this should be:
> 
> cmd->request_desc->SCSIIO.MSIxIndex =
> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
> 
> 
Indeed, that sounds reasonable.
(The whole queue mapping stuff isn't exactly well documented :-( )

I'll be updating the patch.

>>
>> lscpu:
>>
>> # lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                72
>> On-line CPU(s) list:   0-71
>> Thread(s) per core:    2
>> Core(s) per socket:    18
>> Socket(s):             2
>> NUMA node(s):          2
>> Vendor ID:             GenuineIntel
>> CPU family:            6
>> Model:                 85
>> Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
>> Stepping:              4
>> CPU MHz:               3204.246
>> CPU max MHz:           3700.0000
>> CPU min MHz:           1200.0000
>> BogoMIPS:              5400.00
>> Virtualization:        VT-x
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              1024K
>> L3 cache:              25344K
>> NUMA node0 CPU(s):     0-17,36-53
>> NUMA node1 CPU(s):     18-35,54-71
>> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
>> tm pbe s
>> yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
>> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
>> dtes64 monitor
>> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
>> sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
>> lahf_lm abm
>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
>> mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
>> bmi1 hle
>> avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
>> adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
>> xsavec
>> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
>>
>>
> 
> [snip]
> 
>> 4. This patch removes the below code from the driver, so whatever this
>> piece of code did is now broken:
>>
>>
>> -                               if (instance->adapter_type >=
>> INVADER_SERIES &&
>> -                                   !instance->msix_combined) {
>> -                                       instance->msix_load_balance =
>> true;
>> -                                       instance->smp_affinity_enable
>> = false;
>> -                               }
> 
> Does this code need to be re-added? Would this have affected your test?
Primarily this patch was required to enable interrupt affinity on my
machine (Lenovo RAID 930-8i).
Can you give me some information why the code is present in the first
place? Some hardware limitation, maybe?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-10 12:09         ` John Garry
@ 2020-01-14 13:57           ` John Garry
  0 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2020-01-14 13:57 UTC (permalink / raw)
  To: Ming Lei
  Cc: Sumit Saxena, Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Linux SCSI List, linux-block,
	Hannes Reinecke

On 10/01/2020 12:09, John Garry wrote:
>>>>>
>>>>> Fusion adapters can steer completions to individual queues, and
>>>>> we now have support for shared host-wide tags.
>>>>> So we can enable multiqueue support for fusion adapters and
>>>>> drop the hand-crafted interrupt affinity settings.
>>>>
>>>> Hi Hannes,
>>>>
>>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>>> time back and it had resulted in performance drop-
>>>> https://patchwork.kernel.org/patch/10969511/
>>>>
>>>> So, we will do some performance tests with this patch and update you.
>>>>
>>>
>>> Hi Sumit,
>>>
>>> I was wondering if you had a chance to do this test yet?
>>>
>>> It would be good to know, so we can try to progress this work.
>>
>> Looks most of the comment in the following link isn't addressed:
>>
>> https://lore.kernel.org/linux-block/20191129002540.GA1829@ming.t460p/
> 
> OK, but I was waiting for results first, which I hoped would not take 
> too long. They weren't forgotten, for sure. Let me check them now.

Hi Ming,

I think that your questions here were related to the shared scheduler 
tags, which was Hannes' proposal (I initially had it in v2 series, but 
dropped it for v3).

I was just content to maintain the concept of shared driver tags.

Thanks,
John

> 
>>
>>> Firstly, too much ((nr_hw_queues - 1) times) memory is wasted. Secondly, IO
>>> latency could be increased by too deep scheduler queue depth. Finally 
>>> CPU
>>> could be wasted in the retrying of running busy hw queue.
>>>
>>> Wrt. driver tags, this patch may be worse, given the average limit for
>>> each LUN is reduced by (nr_hw_queues) times, see hctx_may_queue().
>>>
>>> Another change is bt_wait_ptr(). Before your patches, there is single
>>> .wait_index, now the number of .wait_index is changed to nr_hw_queues.
>>>
>>> Also the run queue number is increased a lot in SCSI's IO completion, 
>>> see
>>> scsi_end_request().
>>
>> I guess memory waste won't be a blocker.
> 
> Yeah, that's a trade-off. And, as I remember, memory waste does not seem 
> extreme.
> 
>>
>> But it may not be an accepted behavior to reduce the average active queue
>> depth for each LUN by nr_hw_queues times while the scheduler queue depth
>> is increased by nr_hw_queues times, compared with a single queue.
>>
> 
> Thanks,
> John


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-14  7:05           ` Hannes Reinecke
@ 2020-01-16 15:47             ` John Garry
  2020-01-16 17:45               ` Hannes Reinecke
  2020-01-17 11:40             ` Sumit Saxena
  1 sibling, 1 reply; 37+ messages in thread
From: John Garry @ 2020-01-16 15:47 UTC (permalink / raw)
  To: Hannes Reinecke, Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke


>>>
>>> Hi Hannes,
>>>
>>> Sorry for the delay in replying, I observed a few issues with this
>>> patchset:
>>>
>>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
>>> which the IO submitter CPU is affined. Due to this, IO submission and
>>> completion CPUs are different, which causes a performance drop for low
>>> latency workloads.
>>
>> Hi Sumit,
>>
>> So the new code has:
>>
>> megasas_build_ldio_fusion()
>> {
>>
>> cmd->request_desc->SCSIIO.MSIxIndex =
>> blk_mq_unique_tag_to_hwq(tag);
>>
>> }
>>
>> So the value here is hw queue index from blk-mq point of view, and not
>> megaraid_sas msix index, as you alluded to.
>>
>> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
>> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
>> midlayer.
>>
>> So I think that this should be:
>>
>> cmd->request_desc->SCSIIO.MSIxIndex =
>> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
>>
>>
> Indeed, that sounds reasonable.
> (The whole queue mapping stuff isn't exactly well documented :-( )
> 

Yeah, there are certainly lots of knobs and levers in this driver.

> I'll be updating the patch.

About this one:

 > 2. Seeing below stack traces/messages in dmesg during driver unload –
 >
 > [2565601.054366] Call Trace:
 > [2565601.054368]  blk_mq_free_map_and_requests+0x28/0x50
 > [2565601.054369]  blk_mq_free_tag_set+0x1d/0x90
 > [2565601.054370]  scsi_host_dev_release+0x8a/0xf0
 > [2565601.054370]  device_release+0x27/0x80
 > [2565601.054371]  kobject_cleanup+0x61/0x190
 > [2565601.054373]  megasas_detach_one+0x4c1/0x650 [megaraid_sas]
 > [2565601.054374]  pci_device_remove+0x3b/0xc0
 > [2565601.054375]  device_release_driver_internal+0xec/0x1b0
 > [2565601.054376]  driver_detach+0x46/0x90
 > [2565601.054377]  bus_remove_driver+0x58/0xd0
 > [2565601.054378]  pci_unregister_driver+0x26/0xa0
 > [2565601.054379]  megasas_exit+0x91/0x882 [megaraid_sas]
 > [2565601.054381]  __x64_sys_delete_module+0x16c/0x250
 > [2565601.054382]  do_syscall_64+0x5b/0x1b0
 > [2565601.054383]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 > [2565601.054383] RIP: 0033:0x7f7212a82837
 > [2565601.054384] RSP: 002b:00007ffdfa2dcea8 EFLAGS: 00000202 ORIG_RAX:
 > 00000000000000b0
 > [2565601.054385] RAX: ffffffffffffffda RBX: 0000000000b6e2e0 RCX:
 > 00007f7212a82837
 > [2565601.054385] RDX: 00007f7212af3ac0 RSI: 0000000000000800 RDI:
 > 0000000000b6e348
 > [2565601.054386] RBP: 0000000000000000 R08: 00007f7212d47060 R09:
 > 00007f7212af3ac0
 > [2565601.054386] R10: 00007ffdfa2dcbc0 R11: 0000000000000202 R12:
 > 00007ffdfa2dd71c
 > [2565601.054387] R13: 0000000000000000 R14: 0000000000b6e2e0 R15:
 > 0000000000b6e010
 > [2565601.054387] ---[ end trace 38899303bd85e838 ]---


I see it also for hisi_sas_v3_hw.

And so I don't understand the code change here, specifically where the 
WARN is generated:

void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
		     unsigned int hctx_idx)
{
	struct page *page;
	int i;

	if (tags->rqs) {
		for (i = 0; i < tags->nr_tags; i++)
			if (WARN_ON(tags->rqs[i]))
				tags->rqs[i] = NULL; <--- here
	}


I thought that tags->rqs[i] was just a holder for a pointer to a static 
request, like assigned here:

static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
unsigned int tag, unsigned int op, u64 alloc_time_ns)
{
	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
	struct request *rq = tags->static_rqs[tag];

	...

	rq->tag = tag;
	rq->internal_tag = -1;
	data->hctx->tags->rqs[rq->tag] = rq;

	...
}

So I don't know why we need to WARN if it's still set, and then also clear it. 
The memory is freed pretty soon after this anyway.

Thanks,
John

> 
>>>
>>> lscpu:
>>>
>>> # lscpu
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> Byte Order:            Little Endian
>>> CPU(s):                72
>>> On-line CPU(s) list:   0-71
>>> Thread(s) per core:    2
>>> Core(s) per socket:    18
>>> Socket(s):             2
>>> NUMA node(s):          2
>>> Vendor ID:             GenuineIntel
>>> CPU family:            6
>>> Model:                 85
>>> Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
>>> Stepping:              4
>>> CPU MHz:               3204.246
>>> CPU max MHz:           3700.0000
>>> CPU min MHz:           1200.0000
>>> BogoMIPS:              5400.00
>>> Virtualization:        VT-x
>>> L1d cache:             32K
>>> L1i cache:             32K
>>> L2 cache:              1024K
>>> L3 cache:              25344K
>>> NUMA node0 CPU(s):     0-17,36-53
>>> NUMA node1 CPU(s):     18-35,54-71
>>> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
>>> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
>>> tm pbe s
>>> yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
>>> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
>>> dtes64 monitor
>>> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
>>> sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
>>> lahf_lm abm
>>> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
>>> mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
>>> bmi1 hle
>>> avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
>>> adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
>>> xsavec
>>> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
>>>
>>>
>>
>> [snip]
>>
>>> 4. This patch removes the below code from the driver, so whatever this
>>> piece of code did is now broken:
>>>
>>>
>>> -                               if (instance->adapter_type >=
>>> INVADER_SERIES &&
>>> -                                   !instance->msix_combined) {
>>> -                                       instance->msix_load_balance =
>>> true;
>>> -                                       instance->smp_affinity_enable
>>> = false;
>>> -                               }
>>
>> Does this code need to be re-added? Would this have affected your test?
> Primarily this patch was required to enable interrupt affinity on my
> machine (Lenovo RAID 930-8i).
> Can you give me some information why the code is present in the first
> place? Some hardware limitation, maybe?
> 
> Cheers,
> 
> Hannes
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-16 15:47             ` John Garry
@ 2020-01-16 17:45               ` Hannes Reinecke
  0 siblings, 0 replies; 37+ messages in thread
From: Hannes Reinecke @ 2020-01-16 17:45 UTC (permalink / raw)
  To: John Garry, Sumit Saxena
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On 1/16/20 4:47 PM, John Garry wrote:
> 
>>>>
>>>> Hi Hannes,
>>>>
>>>> Sorry for the delay in replying, I observed a few issues with this
>>>> patchset:
>>>>
>>>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return the MSI-x vector to
>>>> which the IO submitter CPU is affined. Due to this, IO submission and
>>>> completion CPUs are different, which causes a performance drop for low
>>>> latency workloads.
>>>
>>> Hi Sumit,
>>>
>>> So the new code has:
>>>
>>> megasas_build_ldio_fusion()
>>> {
>>>
>>> cmd->request_desc->SCSIIO.MSIxIndex =
>>> blk_mq_unique_tag_to_hwq(tag);
>>>
>>> }
>>>
>>> So the value here is hw queue index from blk-mq point of view, and not
>>> megaraid_sas msix index, as you alluded to.
>>>
>>> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
>>> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
>>> midlayer.
>>>
>>> So I think that this should be:
>>>
>>> cmd->request_desc->SCSIIO.MSIxIndex =
>>> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
>>>
>>>
>> Indeed, that sounds reasonable.
>> (The whole queue mapping stuff isn't exactly well documented :-( )
>>
> 
> Yeah, there's certainly lots of knobs and levers in this driver.
> 
>> I'll be updating the patch.
> 
> About this one:
> 
>  > 2. Seeing below stack traces/messages in dmesg during driver unload –
>  >
>  > [2565601.054366] Call Trace:
>  > [2565601.054368]  blk_mq_free_map_and_requests+0x28/0x50
>  > [2565601.054369]  blk_mq_free_tag_set+0x1d/0x90
>  > [2565601.054370]  scsi_host_dev_release+0x8a/0xf0
>  > [2565601.054370]  device_release+0x27/0x80
>  > [2565601.054371]  kobject_cleanup+0x61/0x190
>  > [2565601.054373]  megasas_detach_one+0x4c1/0x650 [megaraid_sas]
>  > [2565601.054374]  pci_device_remove+0x3b/0xc0
>  > [2565601.054375]  device_release_driver_internal+0xec/0x1b0
>  > [2565601.054376]  driver_detach+0x46/0x90
>  > [2565601.054377]  bus_remove_driver+0x58/0xd0
>  > [2565601.054378]  pci_unregister_driver+0x26/0xa0
>  > [2565601.054379]  megasas_exit+0x91/0x882 [megaraid_sas]
>  > [2565601.054381]  __x64_sys_delete_module+0x16c/0x250
>  > [2565601.054382]  do_syscall_64+0x5b/0x1b0
>  > [2565601.054383]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  > [2565601.054383] RIP: 0033:0x7f7212a82837
>  > [2565601.054384] RSP: 002b:00007ffdfa2dcea8 EFLAGS: 00000202 ORIG_RAX:
>  > 00000000000000b0
>  > [2565601.054385] RAX: ffffffffffffffda RBX: 0000000000b6e2e0 RCX:
>  > 00007f7212a82837
>  > [2565601.054385] RDX: 00007f7212af3ac0 RSI: 0000000000000800 RDI:
>  > 0000000000b6e348
>  > [2565601.054386] RBP: 0000000000000000 R08: 00007f7212d47060 R09:
>  > 00007f7212af3ac0
>  > [2565601.054386] R10: 00007ffdfa2dcbc0 R11: 0000000000000202 R12:
>  > 00007ffdfa2dd71c
>  > [2565601.054387] R13: 0000000000000000 R14: 0000000000b6e2e0 R15:
>  > 0000000000b6e010
>  > [2565601.054387] ---[ end trace 38899303bd85e838 ]---
> 
> 
> I see it also for hisi_sas_v3_hw.
> 
> And so I don't understand the code change here, specifically where the 
> WARN is generated:
> 
> void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>               unsigned int hctx_idx)
> {
>      struct page *page;
>      int i;
> 
>      if (tags->rqs) {
>          for (i = 0; i < tags->nr_tags; i++)
>              if (WARN_ON(tags->rqs[i]))
>                  tags->rqs[i] = NULL; <--- here
>      }
> 
> 
> I thought that tags->rqs[i] was just a holder for a pointer to a static 
> request, like the one assigned here:
> 
> static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
> unsigned int tag, unsigned int op, u64 alloc_time_ns)
> {
>      struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
>      struct request *rq = tags->static_rqs[tag];
> 
>      ...
> 
>      rq->tag = tag;
>      rq->internal_tag = -1;
>      data->hctx->tags->rqs[rq->tag] = rq;
> 
>      ...
> }
> 
> So I don't know why we need to WARN if unset, and then also clear it. 
> The memory is freed pretty soon after this anyway.
> 
Indeed, ->rqs is just a holder, referencing an entry in ->static_rqs.
The point here is that ->rqs is set when a request is allocated, and 
should be zeroed when the request is freed.
The above patch would then warn us if there's an imbalance, i.e. an 
allocated request that didn't get freed.
But apparently the latter part didn't happen, leaving us with stale 
entries in ->rqs.
Either we fix that, or we drop the WARN_ON.
Personally I like clearing the ->rqs pointer (as it makes it easier to 
track use-after-free issues), but that might have performance 
implications, and Jens might have some views about it.
So I'm fine with dropping it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-13 17:42         ` John Garry
  2020-01-14  7:05           ` Hannes Reinecke
@ 2020-01-17 11:18           ` Sumit Saxena
  2020-02-13 10:07             ` John Garry
  1 sibling, 1 reply; 37+ messages in thread
From: Sumit Saxena @ 2020-01-17 11:18 UTC (permalink / raw)
  To: John Garry
  Cc: Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On Mon, Jan 13, 2020 at 11:12 PM John Garry <john.garry@huawei.com> wrote:
>
> On 10/01/2020 04:00, Sumit Saxena wrote:
> > On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> On 12/9/19 11:10 AM, Sumit Saxena wrote:
> >>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
> >>>>
> >>>> Fusion adapters can steer completions to individual queues, and
> >>>> we now have support for shared host-wide tags.
> >>>> So we can enable multiqueue support for fusion adapters and
> >>>> drop the hand-crafted interrupt affinity settings.
> >>>
> >>> Hi Hannes,
> >>>
> >>> Ming Lei also proposed similar changes in megaraid_sas driver some
> >>> time back and it had resulted in performance drop-
> >>> https://patchwork.kernel.org/patch/10969511/
> >>>
> >>> So, we will do some performance tests with this patch and update you.
> >>> Thank you.
> >>
> >> I'm aware of the results of Ming Lei's work, but I do hope this patchset
> >> performs better.
> >>
> >> And when you do performance measurements, can you please run with both
> >> the 'none' I/O scheduler and the 'mq-deadline' I/O scheduler?
> >> I've measured quite a performance improvement when using mq-deadline,
> >> up to the point where I've gotten on-par performance with the original,
> >> non-mq, implementation.
> >> (As a data point, on my setup I've measured about 270k IOPS and 1092
> >> MB/s throughput, running on just 2 SSDs.)
> >>
> >> But thanks for doing a performance test here.
> >
> > Hi Hannes,
> >
> > Sorry for the delay in replying, I observed a few issues with this patchset:
> >
> > 1. "blk_mq_unique_tag_to_hwq(tag)" does not return MSI-x vector to
> > which IO submitter CPU is affined with. Due to this IO submission and
> > completion CPUs are different which causes performance drop for low
> > latency workloads.
>
> Hi Sumit,
>
> So the new code has:
>
> megasas_build_ldio_fusion()
> {
>
> cmd->request_desc->SCSIIO.MSIxIndex =
> blk_mq_unique_tag_to_hwq(tag);
>
> }
>
> So the value here is hw queue index from blk-mq point of view, and not
> megaraid_sas msix index, as you alluded to.
Yes John, value filled in "cmd->request_desc->SCSIIO.MSIxIndex" is HW
queue index.

>
> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
> midlayer.
>
> So I think that this should be:
>
> cmd->request_desc->SCSIIO.MSIxIndex =
> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
Agreed, this should return correct HW queue index.
>
>
> >
> > lscpu:
> >
> > # lscpu
> > Architecture:          x86_64
> > CPU op-mode(s):        32-bit, 64-bit
> > Byte Order:            Little Endian
> > CPU(s):                72
> > On-line CPU(s) list:   0-71
> > Thread(s) per core:    2
> > Core(s) per socket:    18
> > Socket(s):             2
> > NUMA node(s):          2
> > Vendor ID:             GenuineIntel
> > CPU family:            6
> > Model:                 85
> > Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> > Stepping:              4
> > CPU MHz:               3204.246
> > CPU max MHz:           3700.0000
> > CPU min MHz:           1200.0000
> > BogoMIPS:              5400.00
> > Virtualization:        VT-x
> > L1d cache:             32K
> > L1i cache:             32K
> > L2 cache:              1024K
> > L3 cache:              25344K
> > NUMA node0 CPU(s):     0-17,36-53
> > NUMA node1 CPU(s):     18-35,54-71
> > Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> > mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> > tm pbe s
> > yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> > rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> > dtes64 monitor
> > ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> > sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
> > lahf_lm abm
> > 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
> > mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> > bmi1 hle
> > avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
> > adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
> > xsavec
> > xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
> >
> >
>
> [snip]
>
> > 4. This patch removes below code from driver so what this piece of
> > code does is broken-
> >
> >
> > -                               if (instance->adapter_type >= INVADER_SERIES &&
> > -                                   !instance->msix_combined) {
> > -                                       instance->msix_load_balance = true;
> > -                                       instance->smp_affinity_enable = false;
> > -                               }
>
> Does this code need to be re-added? Would this have affected your test?
This code did not affect my test, but it has to be re-added for affected hardware.
There are a few megaraid_sas adapters for which "instance->msix_combined"
is "0", and we need this code for those adapters.

Thanks,
Sumit
>
> Thanks,
> John

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-14  7:05           ` Hannes Reinecke
  2020-01-16 15:47             ` John Garry
@ 2020-01-17 11:40             ` Sumit Saxena
  1 sibling, 0 replies; 37+ messages in thread
From: Sumit Saxena @ 2020-01-17 11:40 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: John Garry, Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, Linux SCSI List, linux-block,
	Hannes Reinecke

On Tue, Jan 14, 2020 at 12:35 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 1/13/20 6:42 PM, John Garry wrote:
> > On 10/01/2020 04:00, Sumit Saxena wrote:
> >> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
> >>>
> >>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
> >>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
> >>>>>
> >>>>> Fusion adapters can steer completions to individual queues, and
> >>>>> we now have support for shared host-wide tags.
> >>>>> So we can enable multiqueue support for fusion adapters and
> >>>>> drop the hand-crafted interrupt affinity settings.
> >>>>
> >>>> Hi Hannes,
> >>>>
> >>>> Ming Lei also proposed similar changes in megaraid_sas driver some
> >>>> time back and it had resulted in performance drop-
> >>>> https://patchwork.kernel.org/patch/10969511/
> >>>>
> >>>> So, we will do some performance tests with this patch and update you.
> >>>> Thank you.
> >>>
> >>> I'm aware of the results of Ming Lei's work, but I do hope this patchset
> >>> performs better.
> >>>
> >>> And when you do performance measurements, can you please run with both
> >>> the 'none' I/O scheduler and the 'mq-deadline' I/O scheduler?
> >>> I've measured quite a performance improvement when using mq-deadline,
> >>> up to the point where I've gotten on-par performance with the original,
> >>> non-mq, implementation.
> >>> (As a data point, on my setup I've measured about 270k IOPS and 1092
> >>> MB/s throughput, running on just 2 SSDs.)
> >>>
> >>> But thanks for doing a performance test here.
> >>
> >> Hi Hannes,
> >>
> >> Sorry for the delay in replying, I observed a few issues with this
> >> patchset:
> >>
> >> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return MSI-x vector to
> >> which IO submitter CPU is affined with. Due to this IO submission and
> >> completion CPUs are different which causes performance drop for low
> >> latency workloads.
> >
> > Hi Sumit,
> >
> > So the new code has:
> >
> > megasas_build_ldio_fusion()
> > {
> >
> > cmd->request_desc->SCSIIO.MSIxIndex =
> > blk_mq_unique_tag_to_hwq(tag);
> >
> > }
> >
> > So the value here is hw queue index from blk-mq point of view, and not
> > megaraid_sas msix index, as you alluded to.
> >
> > So we get 80 msix, 8 are reserved for low_latency_index_start (that's
> > how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
> > midlayer.
> >
> > So I think that this should be:
> >
> > cmd->request_desc->SCSIIO.MSIxIndex =
> > blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
> >
> >
> Indeed, that sounds reasonable.
> (The whole queue mapping stuff isn't exactly well documented :-( )
>
> I'll be updating the patch.
>
> >>
> >> lscpu:
> >>
> >> # lscpu
> >> Architecture:          x86_64
> >> CPU op-mode(s):        32-bit, 64-bit
> >> Byte Order:            Little Endian
> >> CPU(s):                72
> >> On-line CPU(s) list:   0-71
> >> Thread(s) per core:    2
> >> Core(s) per socket:    18
> >> Socket(s):             2
> >> NUMA node(s):          2
> >> Vendor ID:             GenuineIntel
> >> CPU family:            6
> >> Model:                 85
> >> Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> >> Stepping:              4
> >> CPU MHz:               3204.246
> >> CPU max MHz:           3700.0000
> >> CPU min MHz:           1200.0000
> >> BogoMIPS:              5400.00
> >> Virtualization:        VT-x
> >> L1d cache:             32K
> >> L1i cache:             32K
> >> L2 cache:              1024K
> >> L3 cache:              25344K
> >> NUMA node0 CPU(s):     0-17,36-53
> >> NUMA node1 CPU(s):     18-35,54-71
> >> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> >> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> >> tm pbe s
> >> yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> >> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> >> dtes64 monitor
> >> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> >> sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
> >> lahf_lm abm
> >> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
> >> mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> >> bmi1 hle
> >> avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
> >> adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
> >> xsavec
> >> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
> >>
> >>
> >
> > [snip]
> >
> >> 4. This patch removes below code from driver so what this piece of
> >> code does is broken-
> >>
> >>
> >> -                               if (instance->adapter_type >=
> >> INVADER_SERIES &&
> >> -                                   !instance->msix_combined) {
> >> -                                       instance->msix_load_balance =
> >> true;
> >> -                                       instance->smp_affinity_enable
> >> = false;
> >> -                               }
> >
> > Does this code need to be re-added? Would this have affected your test?
> Primarily this patch was required to enable interrupt affinity on my
> machine (Lenovo RAID 930-8i).
> Can you give me some information why the code is present in the first
> place? Some hardware limitation, maybe?

Hi Hannes,
This code is for a specific family of megaraid_sas adapters, and Lenovo
RAID 930-8i belongs to it. For those adapters, we want to use the
available HW queues in round-robin fashion instead of using interrupt
affinity. It helps to improve performance and fixes soft lockups.
For interrupt-affinity test purposes on the Lenovo RAID 930-8i, you can
disable this code to use affinity. But in the final patch, this code
has to be present. This code was added through the commit below:

commit 1d15d9098ad12b0021ac5a6b851f26d1ab021e5a
Author: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Date:   Tue May 7 10:05:36 2019 -0700

    scsi: megaraid_sas: Load balance completions across all MSI-X

    Driver will use "reply descriptor post queues" in round robin fashion when
    the combined MSI-X mode is not enabled. With this IO completions are
    distributed and load balanced across all the available reply descriptor
    post queues equally.

    This is enabled only if combined MSI-X mode is not enabled in firmware.
    This improves performance and also fixes soft lockups.

    When load balancing is enabled, IRQ affinity from driver needs to be
    disabled.

    Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
    Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


Thanks,
Sumit

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                   Teamlead Storage & Networking
> hare@suse.de                                      +49 911 74053 688
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-01-17 11:18           ` Sumit Saxena
@ 2020-02-13 10:07             ` John Garry
  2020-02-17 10:09               ` Sumit Saxena
  0 siblings, 1 reply; 37+ messages in thread
From: John Garry @ 2020-02-13 10:07 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On 17/01/2020 11:18, Sumit Saxena wrote:
>>>> or doing a performance test here.
>>> Hi Hannes,
>>>

Hi Sumit,

>>> Sorry for the delay in replying, I observed a few issues with this patchset:
>>>
>>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return MSI-x vector to
>>> which IO submitter CPU is affined with. Due to this IO submission and
>>> completion CPUs are different which causes performance drop for low
>>> latency workloads.
>> Hi Sumit,
>>
>> So the new code has:
>>
>> megasas_build_ldio_fusion()
>> {
>>
>> cmd->request_desc->SCSIIO.MSIxIndex =
>> blk_mq_unique_tag_to_hwq(tag);
>>
>> }
>>
>> So the value here is hw queue index from blk-mq point of view, and not
>> megaraid_sas msix index, as you alluded to.
> Yes John, value filled in "cmd->request_desc->SCSIIO.MSIxIndex" is HW
> queue index.
> 
>> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
>> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
>> midlayer.
>>
>> So I think that this should be:
>>
>> cmd->request_desc->SCSIIO.MSIxIndex =
>> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;

Can you possibly test performance again with this local change, or would 
you rather an updated patchset be sent?

> Agreed, this should return correct HW queue index.
>>


Thanks,
John

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
  2020-02-13 10:07             ` John Garry
@ 2020-02-17 10:09               ` Sumit Saxena
  0 siblings, 0 replies; 37+ messages in thread
From: Sumit Saxena @ 2020-02-17 10:09 UTC (permalink / raw)
  To: John Garry
  Cc: Hannes Reinecke, Martin K. Petersen, Jens Axboe,
	Christoph Hellwig, James Bottomley, Ming Lei, Linux SCSI List,
	linux-block, Hannes Reinecke

On Thu, Feb 13, 2020 at 3:37 PM John Garry <john.garry@huawei.com> wrote:
>
> On 17/01/2020 11:18, Sumit Saxena wrote:
> >>>> or doing a performance test here.
> >>> Hi Hannes,
> >>>
>
> Hi Sumit,
>
> >>> Sorry for the delay in replying, I observed a few issues with this patchset:
> >>>
> >>> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return MSI-x vector to
> >>> which IO submitter CPU is affined with. Due to this IO submission and
> >>> completion CPUs are different which causes performance drop for low
> >>> latency workloads.
> >> Hi Sumit,
> >>
> >> So the new code has:
> >>
> >> megasas_build_ldio_fusion()
> >> {
> >>
> >> cmd->request_desc->SCSIIO.MSIxIndex =
> >> blk_mq_unique_tag_to_hwq(tag);
> >>
> >> }
> >>
> >> So the value here is hw queue index from blk-mq point of view, and not
> >> megaraid_sas msix index, as you alluded to.
> > Yes John, value filled in "cmd->request_desc->SCSIIO.MSIxIndex" is HW
> > queue index.
> >
> >> So we get 80 msix, 8 are reserved for low_latency_index_start (that's
> >> how it seems to me), and we report other 72 as #hw queues = 72 to SCSI
> >> midlayer.
> >>
> >> So I think that this should be:
> >>
> >> cmd->request_desc->SCSIIO.MSIxIndex =
> >> blk_mq_unique_tag_to_hwq(tag) + low_latency_index_start;
>
> Can you possibly test performance again with this local change, or would
> you rather an updated patchset be sent?
An updated patchset is not required. I will do a performance run with this
change and update.

Thanks,
Sumit
>
> > Agreed, this should return correct HW queue index.
> >>
>
>
> Thanks,
> John

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
                   ` (10 preceding siblings ...)
  2019-12-02 15:39 ` [PATCH 11/11] hpsa: enable host_tagset and switch to MQ Hannes Reinecke
@ 2020-02-26 11:09 ` John Garry
  11 siblings, 0 replies; 37+ messages in thread
From: John Garry @ 2020-02-26 11:09 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Jens Axboe, Christoph Hellwig,
	James Bottomley, Ming Lei, linux-scsi, linux-block, Sumit Saxena,
	chenxiang

On 02/12/2019 15:39, Hannes Reinecke wrote:
> Hi all,
> 

JFYI, Sumit requested that we rebase this patchset for testing against 
the latest kernel - it no longer applies for 5.6-rc. I'm going to do 
that now and repost.

Thanks,
John

> here now is an updated version of the v2 patchset from John Garry,
> including the suggestions and reviews from the mailing list.
> John, apologies for hijacking your work :-)
> 
> For this version I've only added some slight modifications to Johns
> original patch (renaming variables etc); the contentious separate sbitmap
> allocation has been dropped in favour of Johns original version with pointers
> to the embedded sbitmap.
> 
> But more importantly I've reworked the scheduler tag allocation after
> discussions with Ming Lei.
> 
> Point is, hostwide shared tags can't really be resized; they surely
> cannot be increased (as it's a hardware limitation), and even decreasing
> is questionable as any modification here would affect all devices
> served by this HBA.
> 
> Scheduler tags, OTOH, can be considered as per-queue, as the I/O scheduler
> might want to look at all requests on all queues. As such the queue depth
> is distinct from the actual queue depth of the tagset.
> Seeing that it is distinct the depth can now be changed independently of
> the underlying tagset, and there's no need ever to change the tagset itself.
> 
> I've also modified megaraid_sas, smartpqi and hpsa to take advantage of
> host_tags.
> 
> Performance for megaraid_sas is on par with the original implementation,
> with the added benefit that with this we should be able to handle cpu
> hotplug properly.
> 
> Differences to v4:
> - Rework scheduler tag allocations
> - Revert back to the original implementation from John
> 
> Differences to v3:
> - Include reviews from Ming Lei
> 
> Differences to v2:
> - Drop embedded tag bitmaps
> - Do not share scheduling tags
> - Add patches for hpsa and smartpqi
> 
> Differences to v1:
> - Use a shared sbitmap, and not a separate shared tags (a big change!)
> 	- Drop request.shared_tag
> - Add RB tags
> 
> Hannes Reinecke (7):
>    blk-mq: rename blk_mq_update_tag_set_depth()
>    blk-mq: add WARN_ON in blk_mq_free_rqs()
>    blk-mq: move shared sbitmap into elevator queue
>    scsi: Add template flag 'host_tagset'
>    megaraid_sas: switch fusion adapters to MQ
>    smartpqi: enable host tagset
>    hpsa: enable host_tagset and switch to MQ
> 
> John Garry (3):
>    blk-mq: Remove some unused function arguments
>    blk-mq: Facilitate a shared sbitmap per tagset
>    scsi: hisi_sas: Switch v3 hw to MQ
> 
> Ming Lei (1):
>    blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
>   block/bfq-iosched.c                         |   4 +-
>   block/blk-mq-debugfs.c                      |  12 +--
>   block/blk-mq-sched.c                        |  22 +++++
>   block/blk-mq-tag.c                          | 140 +++++++++++++++++++++-------
>   block/blk-mq-tag.h                          |  27 ++++--
>   block/blk-mq.c                              | 104 +++++++++++++++------
>   block/blk-mq.h                              |   7 +-
>   block/blk-sysfs.c                           |   7 ++
>   block/kyber-iosched.c                       |   4 +-
>   drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
>   drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 +++----
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  86 +++++++----------
>   drivers/scsi/hpsa.c                         |  44 ++-------
>   drivers/scsi/hpsa.h                         |   1 -
>   drivers/scsi/megaraid/megaraid_sas.h        |   1 -
>   drivers/scsi/megaraid/megaraid_sas_base.c   |  65 ++++---------
>   drivers/scsi/megaraid/megaraid_sas_fusion.c |  14 ++-
>   drivers/scsi/scsi_lib.c                     |   2 +
>   drivers/scsi/smartpqi/smartpqi_init.c       |  38 ++++++--
>   include/linux/blk-mq.h                      |   7 +-
>   include/linux/elevator.h                    |   3 +
>   include/scsi/scsi_host.h                    |   3 +
>   22 files changed, 380 insertions(+), 250 deletions(-)
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-02-26 11:09 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
2019-12-02 15:39 ` [PATCH 01/11] blk-mq: Remove some unused function arguments Hannes Reinecke
2019-12-02 15:39 ` [PATCH 02/11] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
2019-12-02 15:39 ` [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth() Hannes Reinecke
2019-12-03 14:30   ` John Garry
2019-12-03 14:53     ` Hannes Reinecke
2019-12-02 15:39 ` [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
2019-12-03 14:54   ` John Garry
2019-12-03 15:02     ` Hannes Reinecke
2019-12-04 10:24       ` John Garry
2019-12-03 16:38     ` Bart Van Assche
2019-12-02 15:39 ` [PATCH 05/11] blk-mq: add WARN_ON in blk_mq_free_rqs() Hannes Reinecke
2019-12-02 15:39 ` [PATCH 06/11] blk-mq: move shared sbitmap into elevator queue Hannes Reinecke
2019-12-02 15:39 ` [PATCH 07/11] scsi: Add template flag 'host_tagset' Hannes Reinecke
2019-12-02 15:39 ` [PATCH 08/11] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
2019-12-02 15:39 ` [PATCH 09/11] megaraid_sas: switch fusion adapters " Hannes Reinecke
2019-12-09 10:10   ` Sumit Saxena
2019-12-09 11:02     ` Hannes Reinecke
2020-01-10  4:00       ` Sumit Saxena
2020-01-10 12:18         ` John Garry
2020-01-13 17:42         ` John Garry
2020-01-14  7:05           ` Hannes Reinecke
2020-01-16 15:47             ` John Garry
2020-01-16 17:45               ` Hannes Reinecke
2020-01-17 11:40             ` Sumit Saxena
2020-01-17 11:18           ` Sumit Saxena
2020-02-13 10:07             ` John Garry
2020-02-17 10:09               ` Sumit Saxena
2020-01-09 11:55     ` John Garry
2020-01-09 15:19       ` Hannes Reinecke
2020-01-09 18:17         ` John Garry
2020-01-10  2:00       ` Ming Lei
2020-01-10 12:09         ` John Garry
2020-01-14 13:57           ` John Garry
2019-12-02 15:39 ` [PATCH 10/11] smartpqi: enable host tagset Hannes Reinecke
2019-12-02 15:39 ` [PATCH 11/11] hpsa: enable host_tagset and switch to MQ Hannes Reinecke
2020-02-26 11:09 ` [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
