* [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2020-08-19 15:20 John Garry
  2020-08-19 15:20 ` [PATCH v8 01/18] blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
                   ` (21 more replies)
  0 siblings, 22 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Hi all,

Here is v8 of the patchset.

In this version of the series, we keep the shared sbitmap for driver tags,
and introduce changes to fix up the tag budgeting across request queues.
We also have a change to count requests per-hctx for when an elevator is
enabled, as an optimisation. I also dropped the debugfs changes - more on
that below.

Some performance figures:

Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
but it is not always an appropriate scheduler to use.

Tag depth 		4000 (default)			260**

Baseline (v5.9-rc1):
none sched:		2094K IOPS			513K
mq-deadline sched:	2145K IOPS			1336K

Final, host_tagset=0 in LLDD *, ***:
none sched:		2120K IOPS			550K
mq-deadline sched:	2121K IOPS			1309K

Final ***:
none sched:		2132K IOPS			1185
mq-deadline sched:	2145K IOPS			2097

* this is relevant as it shows the performance when supporting, but not
  enabling, the feature
** depth=260 is relevant as a point where we are regularly waiting for
   tags to become available. Figures were a bit unstable here.
*** Included "[PATCH V4] scsi: core: only re-run queue in
    scsi_end_request() if device queue is busy"

A copy of the patches can be found here:
https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8

The hpsa patch depends on:
https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/

And the smartpqi patch is not to be accepted.

Comments (and testing) welcome, thanks!

Differences to v7:
- Add null_blk and scsi_debug support
- Drop debugfs tags patch - it's too difficult to keep the output the same
between hostwide and non-hostwide, as discussed:
https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
sbitmap_bitmap_show()"), I guess not many used this anyway...
- Add elevator per-hctx request count for optimisation
- Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
- Pass flags to avoid per-hw-queue tags init/free for hostwide tags
- Add Don's Reviewed-by and Tested-by tags to appropriate patches
	- (@Don, please let me know if issue with how I did this)
- Add "scsi: core: Show nr_hw_queues in sysfs"
- Rework megaraid SAS patch to have module param (Kashyap)
- rebase

V7 is here for more info:
https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t

Hannes Reinecke (5):
  blk-mq: Rename blk_mq_update_tag_set_depth()
  blk-mq: Free tags in blk_mq_init_tags() upon error
  scsi: Add host and host template flag 'host_tagset'
  hpsa: enable host_tagset and switch to MQ
  smartpqi: enable host tagset

John Garry (10):
  blk-mq: Pass flags for tag init/free
  blk-mq: Use pointers for blk_mq_tags bitmap tags
  blk-mq: Facilitate a shared sbitmap per tagset
  blk-mq: Relocate hctx_may_queue()
  blk-mq: Record nr_active_requests per queue for when using shared
    sbitmap
  blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
    shared sbitmap
  null_blk: Support shared tag bitmap
  scsi: core: Show nr_hw_queues in sysfs
  scsi: hisi_sas: Switch v3 hw to MQ
  scsi: scsi_debug: Support host tagset

Kashyap Desai (2):
  blk-mq, elevator: Count requests per hctx to improve performance
  scsi: megaraid_sas: Added support for shared host tagset for
    cpuhotplug

Ming Lei (1):
  blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED

 block/bfq-iosched.c                         |   9 +-
 block/blk-core.c                            |   2 +
 block/blk-mq-debugfs.c                      |  10 +-
 block/blk-mq-sched.c                        |  13 +-
 block/blk-mq-tag.c                          | 149 ++++++++++++++------
 block/blk-mq-tag.h                          |  56 +++-----
 block/blk-mq.c                              |  81 +++++++----
 block/blk-mq.h                              |  76 +++++++++-
 block/kyber-iosched.c                       |   4 +-
 block/mq-deadline.c                         |   6 +
 drivers/block/null_blk_main.c               |   6 +
 drivers/block/rnbd/rnbd-clt.c               |   2 +-
 drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 ++---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  87 +++++-------
 drivers/scsi/hosts.c                        |   1 +
 drivers/scsi/hpsa.c                         |  44 +-----
 drivers/scsi/hpsa.h                         |   1 -
 drivers/scsi/megaraid/megaraid_sas_base.c   |  39 +++++
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  29 ++--
 drivers/scsi/scsi_debug.c                   |  28 ++--
 drivers/scsi/scsi_lib.c                     |   2 +
 drivers/scsi/scsi_sysfs.c                   |  11 ++
 drivers/scsi/smartpqi/smartpqi_init.c       |  45 ++++--
 include/linux/blk-mq.h                      |  13 +-
 include/linux/blkdev.h                      |   3 +
 include/scsi/scsi_host.h                    |   9 +-
 27 files changed, 484 insertions(+), 281 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 01/18] blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 02/18] blk-mq: Rename blk_mq_update_tag_set_depth() John Garry
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Ming Lei <ming.lei@redhat.com>

BLK_MQ_F_TAG_SHARED actually means that tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA.

So rename it to make the point explicit.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[jpg: rebase a few times, add rnbd-clt.c change]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-debugfs.c        |  2 +-
 block/blk-mq-tag.h            |  6 +++---
 block/blk-mq.c                | 20 ++++++++++----------
 drivers/block/rnbd/rnbd-clt.c |  2 +-
 include/linux/blk-mq.h        |  2 +-
 5 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3f09bcb8a6fd..a4597febff69 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -240,7 +240,7 @@ static const char *const alloc_policy_name[] = {
 #define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
-	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 	HCTX_FLAG_NAME(STACKING),
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index b1acac518c4e..918e2cee4f43 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -56,7 +56,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
 	return __blk_mq_tag_busy(hctx);
@@ -64,7 +64,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return;
 
 	__blk_mq_tag_idle(hctx);
@@ -79,7 +79,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0015a1892153..8f95cc443527 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1124,7 +1124,7 @@ static bool blk_mq_get_driver_tag(struct request *rq)
 	if (rq->tag == BLK_MQ_NO_TAG && !__blk_mq_get_driver_tag(rq))
 		return false;
 
-	if ((hctx->flags & BLK_MQ_F_TAG_SHARED) &&
+	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
 			!(rq->rq_flags & RQF_MQ_INFLIGHT)) {
 		rq->rq_flags |= RQF_MQ_INFLIGHT;
 		atomic_inc(&hctx->nr_active);
@@ -1168,7 +1168,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -1420,7 +1420,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
 		bool needs_restart;
 		/* For non-shared tags, the RESTART check will suffice */
 		bool no_tag = prep == PREP_DISPATCH_NO_TAG &&
-                        (hctx->flags & BLK_MQ_F_TAG_SHARED);
+			(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED);
 		bool no_budget_avail = prep == PREP_DISPATCH_NO_BUDGET;
 
 		blk_mq_release_budgets(q, nr_budgets);
@@ -2657,7 +2657,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
 	hctx->queue = q;
-	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
 
 	INIT_LIST_HEAD(&hctx->hctx_list);
 
@@ -2874,9 +2874,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared)
-			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		else
-			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 	}
 }
 
@@ -2902,7 +2902,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	list_del(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
-		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, false);
 	}
@@ -2919,12 +2919,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
 	if (!list_empty(&set->tag_list) &&
-	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
-		set->flags |= BLK_MQ_F_TAG_SHARED;
+	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, true);
 	}
-	if (set->flags & BLK_MQ_F_TAG_SHARED)
+	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
 	list_add_tail(&q->tag_set_list, &set->tag_list);
 
diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index cc6a4e2587ae..77b0e0c62c36 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1180,7 +1180,7 @@ static int setup_mq_tags(struct rnbd_clt_session *sess)
 	tag_set->queue_depth	= sess->queue_depth;
 	tag_set->numa_node		= NUMA_NO_NODE;
 	tag_set->flags		= BLK_MQ_F_SHOULD_MERGE |
-				  BLK_MQ_F_TAG_SHARED;
+				  BLK_MQ_F_TAG_QUEUE_SHARED;
 	tag_set->cmd_size		= sizeof(struct rnbd_iu);
 	tag_set->nr_hw_queues	= num_online_cpus();
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9d2d5ad367a4..95aca7aa8957 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -378,7 +378,7 @@ struct blk_mq_ops {
 
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
-	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_QUEUE_SHARED = 1 << 1,
 	/*
 	 * Set when this device requires underlying blk-mq device for
 	 * completing IO:
-- 
2.26.2



* [PATCH v8 02/18] blk-mq: Rename blk_mq_update_tag_set_depth()
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2020-08-19 15:20 ` [PATCH v8 01/18] blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 03/18] blk-mq: Free tags in blk_mq_init_tags() upon error John Garry
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Hannes Reinecke <hare@suse.de>

The function does not set the depth, but rather transitions from
shared to non-shared queues and vice versa.

So rename it to blk_mq_update_tag_set_shared() to better reflect
its purpose.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[jpg: take out some unrelated changes in blk_mq_init_bitmap_tags()]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8f95cc443527..50de9eb60466 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2880,8 +2880,8 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 	}
 }
 
-static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
-					bool shared)
+static void blk_mq_update_tag_set_shared(struct blk_mq_tag_set *set,
+					 bool shared)
 {
 	struct request_queue *q;
 
@@ -2904,7 +2904,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 		/* just transitioned to unshared */
 		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, false);
+		blk_mq_update_tag_set_shared(set, false);
 	}
 	mutex_unlock(&set->tag_list_lock);
 	INIT_LIST_HEAD(&q->tag_set_list);
@@ -2922,7 +2922,7 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
-		blk_mq_update_tag_set_depth(set, true);
+		blk_mq_update_tag_set_shared(set, true);
 	}
 	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
-- 
2.26.2



* [PATCH v8 03/18] blk-mq: Free tags in blk_mq_init_tags() upon error
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2020-08-19 15:20 ` [PATCH v8 01/18] blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
  2020-08-19 15:20 ` [PATCH v8 02/18] blk-mq: Rename blk_mq_update_tag_set_depth() John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 04/18] blk-mq: Pass flags for tag init/free John Garry
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Hannes Reinecke <hare@suse.de>

Since the tags are allocated in blk_mq_init_tags(), it's better practice
to free them in that same function upon error, rather than in the callee
which initializes the bitmap tags (blk_mq_init_bitmap_tags()).

Signed-off-by: Hannes Reinecke <hare@suse.de>
[jpg: Split from an earlier patch with a new commit message]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 32d82e23b095..c2c9f369d777 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -429,24 +429,22 @@ static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
 				       node);
 }
 
-static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
-						   int node, int alloc_policy)
+static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
+				   int node, int alloc_policy)
 {
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
 	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
-		goto free_tags;
+		return -ENOMEM;
 	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
 		     node))
 		goto free_bitmap_tags;
 
-	return tags;
+	return 0;
 free_bitmap_tags:
 	sbitmap_queue_free(&tags->bitmap_tags);
-free_tags:
-	kfree(tags);
-	return NULL;
+	return -ENOMEM;
 }
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
@@ -467,7 +465,11 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
-	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
+	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+		kfree(tags);
+		return NULL;
+	}
+	return tags;
 }
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
-- 
2.26.2



* [PATCH v8 04/18] blk-mq: Pass flags for tag init/free
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (2 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 03/18] blk-mq: Free tags in blk_mq_init_tags() upon error John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 05/18] blk-mq: Use pointers for blk_mq_tags bitmap tags John Garry
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Pass hctx/tagset flags argument down to blk_mq_init_tags() and
blk_mq_free_tags() for selective init/free.

For now, make it include the alloc policy flag, which can be evaluated
when needed (in blk_mq_init_tags()).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c | 11 ++++++++---
 block/blk-mq-tag.c   | 12 +++++++-----
 block/blk-mq-tag.h   |  7 ++++---
 block/blk-mq.c       | 23 +++++++++++++----------
 block/blk-mq.h       |  5 +++--
 5 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a19cdf159b75..ee970d848e62 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -607,9 +607,11 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 				   struct blk_mq_hw_ctx *hctx,
 				   unsigned int hctx_idx)
 {
+	unsigned int flags = set->flags;
+
 	if (hctx->sched_tags) {
 		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
-		blk_mq_free_rq_map(hctx->sched_tags);
+		blk_mq_free_rq_map(hctx->sched_tags, flags);
 		hctx->sched_tags = NULL;
 	}
 }
@@ -619,10 +621,11 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 				   unsigned int hctx_idx)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
+	unsigned int flags = set->flags;
 	int ret;
 
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
-					       set->reserved_tags);
+					       set->reserved_tags, flags);
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
@@ -640,8 +643,10 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
+		unsigned int flags = hctx->flags;
+
 		if (hctx->sched_tags) {
-			blk_mq_free_rq_map(hctx->sched_tags);
+			blk_mq_free_rq_map(hctx->sched_tags, flags);
 			hctx->sched_tags = NULL;
 		}
 	}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index c2c9f369d777..0426f01e06a6 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -449,8 +449,9 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, int alloc_policy)
+				     int node, unsigned int flags)
 {
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
 	struct blk_mq_tags *tags;
 
 	if (total_tags > BLK_MQ_TAG_MAX) {
@@ -472,7 +473,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	return tags;
 }
 
-void blk_mq_free_tags(struct blk_mq_tags *tags)
+void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
 {
 	sbitmap_queue_free(&tags->bitmap_tags);
 	sbitmap_queue_free(&tags->breserved_tags);
@@ -494,6 +495,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	 */
 	if (tdepth > tags->nr_tags) {
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
+		unsigned int flags = set->flags;
 		struct blk_mq_tags *new;
 		bool ret;
 
@@ -508,17 +510,17 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 			return -EINVAL;
 
 		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
-				tags->nr_reserved_tags);
+				tags->nr_reserved_tags, flags);
 		if (!new)
 			return -ENOMEM;
 		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
 		if (ret) {
-			blk_mq_free_rq_map(new);
+			blk_mq_free_rq_map(new, flags);
 			return -ENOMEM;
 		}
 
 		blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
-		blk_mq_free_rq_map(*tagsptr);
+		blk_mq_free_rq_map(*tagsptr, flags);
 		*tagsptr = new;
 	} else {
 		/*
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 918e2cee4f43..391f68d32fa1 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -21,9 +21,10 @@ struct blk_mq_tags {
 	struct list_head page_list;
 };
 
-
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
-extern void blk_mq_free_tags(struct blk_mq_tags *tags);
+extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
+					unsigned int reserved_tags,
+					int node, unsigned int flags);
+extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
 extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 50de9eb60466..376930061452 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2285,20 +2285,21 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	}
 }
 
-void blk_mq_free_rq_map(struct blk_mq_tags *tags)
+void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
 {
 	kfree(tags->rqs);
 	tags->rqs = NULL;
 	kfree(tags->static_rqs);
 	tags->static_rqs = NULL;
 
-	blk_mq_free_tags(tags);
+	blk_mq_free_tags(tags, flags);
 }
 
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags)
+					unsigned int reserved_tags,
+					unsigned int flags)
 {
 	struct blk_mq_tags *tags;
 	int node;
@@ -2307,8 +2308,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
-				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
+	tags = blk_mq_init_tags(nr_tags, reserved_tags, node, flags);
 	if (!tags)
 		return NULL;
 
@@ -2316,7 +2316,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 				 node);
 	if (!tags->rqs) {
-		blk_mq_free_tags(tags);
+		blk_mq_free_tags(tags, flags);
 		return NULL;
 	}
 
@@ -2325,7 +2325,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					node);
 	if (!tags->static_rqs) {
 		kfree(tags->rqs);
-		blk_mq_free_tags(tags);
+		blk_mq_free_tags(tags, flags);
 		return NULL;
 	}
 
@@ -2734,10 +2734,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 					int hctx_idx)
 {
+	unsigned int flags = set->flags;
 	int ret = 0;
 
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
-					set->queue_depth, set->reserved_tags);
+					set->queue_depth, set->reserved_tags, flags);
 	if (!set->tags[hctx_idx])
 		return false;
 
@@ -2746,7 +2747,7 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 	if (!ret)
 		return true;
 
-	blk_mq_free_rq_map(set->tags[hctx_idx]);
+	blk_mq_free_rq_map(set->tags[hctx_idx], flags);
 	set->tags[hctx_idx] = NULL;
 	return false;
 }
@@ -2754,9 +2755,11 @@ static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
 static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 					 unsigned int hctx_idx)
 {
+	unsigned int flags = set->flags;
+
 	if (set->tags && set->tags[hctx_idx]) {
 		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
-		blk_mq_free_rq_map(set->tags[hctx_idx]);
+		blk_mq_free_rq_map(set->tags[hctx_idx], flags);
 		set->tags[hctx_idx] = NULL;
 	}
 }
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 863a2f3346d4..b86e7fe47789 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -53,11 +53,12 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx,
  */
 void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx);
-void blk_mq_free_rq_map(struct blk_mq_tags *tags);
+void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags);
+					unsigned int reserved_tags,
+					unsigned int flags);
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth);
 
-- 
2.26.2



* [PATCH v8 05/18] blk-mq: Use pointers for blk_mq_tags bitmap tags
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (3 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 04/18] blk-mq: Pass flags for tag init/free John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 06/18] blk-mq: Facilitate a shared sbitmap per tagset John Garry
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Introduce pointers for the blk_mq_tags regular and reserved bitmap tags,
with the goal of later being able to use a common shared tag bitmap across
all HW contexts in a set.

Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/bfq-iosched.c    |  4 ++--
 block/blk-mq-debugfs.c |  8 ++++----
 block/blk-mq-tag.c     | 41 ++++++++++++++++++++++-------------------
 block/blk-mq-tag.h     |  7 +++++--
 block/blk-mq.c         |  8 ++++----
 block/kyber-iosched.c  |  4 ++--
 6 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a4c0bec920cb..88f0dfa545d7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6372,8 +6372,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index a4597febff69..645b7f800cb8 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -452,11 +452,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 		   atomic_read(&tags->active_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
-	sbitmap_queue_show(&tags->bitmap_tags, m);
+	sbitmap_queue_show(tags->bitmap_tags, m);
 
 	if (tags->nr_reserved_tags) {
 		seq_puts(m, "\nbreserved_tags:\n");
-		sbitmap_queue_show(&tags->breserved_tags, m);
+		sbitmap_queue_show(tags->breserved_tags, m);
 	}
 }
 
@@ -487,7 +487,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
@@ -521,7 +521,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->sched_tags)
-		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 0426f01e06a6..77fbeae05745 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->bitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->breserved_tags);
 }
 
 /*
@@ -82,10 +82,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_NO_TAG;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -131,9 +131,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->breserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->bitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -167,10 +167,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -298,9 +298,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
 	WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);
 
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
+		bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
 				 flags | BT_TAG_ITER_RESERVED);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
+	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
 }
 
 /**
@@ -416,8 +416,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -435,15 +435,18 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
-	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
+	if (bt_alloc(&tags->__bitmap_tags, depth, round_robin, node))
 		return -ENOMEM;
-	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
-		     node))
+	if (bt_alloc(&tags->__breserved_tags, tags->nr_reserved_tags,
+		     round_robin, node))
 		goto free_bitmap_tags;
 
+	tags->bitmap_tags = &tags->__bitmap_tags;
+	tags->breserved_tags = &tags->__breserved_tags;
+
 	return 0;
 free_bitmap_tags:
-	sbitmap_queue_free(&tags->bitmap_tags);
+	sbitmap_queue_free(&tags->__bitmap_tags);
 	return -ENOMEM;
 }
 
@@ -475,8 +478,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
 {
-	sbitmap_queue_free(&tags->bitmap_tags);
-	sbitmap_queue_free(&tags->breserved_tags);
+	sbitmap_queue_free(&tags->__bitmap_tags);
+	sbitmap_queue_free(&tags->__breserved_tags);
 	kfree(tags);
 }
 
@@ -527,7 +530,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
+		sbitmap_queue_resize(tags->bitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 391f68d32fa1..5b6f25bd1966 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,8 +13,11 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
-	struct sbitmap_queue bitmap_tags;
-	struct sbitmap_queue breserved_tags;
+	struct sbitmap_queue *bitmap_tags;
+	struct sbitmap_queue *breserved_tags;
+
+	struct sbitmap_queue __bitmap_tags;
+	struct sbitmap_queue __breserved_tags;
 
 	struct request **rqs;
 	struct request **static_rqs;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 376930061452..ce712ec818e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1096,14 +1096,14 @@ static inline unsigned int queued_to_index(unsigned int queued)
 
 static bool __blk_mq_get_driver_tag(struct request *rq)
 {
-	struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
+	struct sbitmap_queue *bt = rq->mq_hctx->tags->bitmap_tags;
 	unsigned int tag_offset = rq->mq_hctx->tags->nr_reserved_tags;
 	int tag;
 
 	blk_mq_tag_busy(rq->mq_hctx);
 
 	if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag)) {
-		bt = &rq->mq_hctx->tags->breserved_tags;
+		bt = rq->mq_hctx->tags->breserved_tags;
 		tag_offset = 0;
 	}
 
@@ -1145,7 +1145,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = &hctx->tags->bitmap_tags;
+		sbq = hctx->tags->bitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1163,7 +1163,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = hctx->tags->bitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a38c5ab103d1..075e99c207ef 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
-	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+	sbitmap_queue_min_shallow_depth(hctx->sched_tags->bitmap_tags,
 					kqd->async_depth);
 
 	return 0;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 06/18] blk-mq: Facilitate a shared sbitmap per tagset
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (4 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 05/18] blk-mq: Use pointers for blk_mq_tags bitmap tags John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 07/18] blk-mq: Relocate hctx_may_queue() John Garry
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment from
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completions to not be serviced when an
interrupt is shut down. That problem is solved in commit bf0beec0607d
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queues are
required to be mapped to the blk-mq hctx's; to do that, the HBA HW
queues need to be exposed to the upper layer.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

Another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. Commit 6eb045e092ef ("scsi:
core: avoid host-wide host_busy counter for scsi_mq") removed the Scsi host
busy counter, which would otherwise stop the LLDD from being sent more than
.can_queue commands; however, it should still be ensured that the block
layer does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
is still allocated per hctx; the reason is that if tags and requests were
only allocated for a single hctx - like hctx0 - it may break block drivers
which expect a request to be associated with a specific hctx, i.e. not
always hctx0. This will introduce extra memory usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-sched.c   |  8 ++++---
 block/blk-mq-tag.c     | 51 ++++++++++++++++++++++++++++++++++++++----
 block/blk-mq-tag.h     |  7 ++++++
 block/blk-mq.c         | 15 +++++++++++++
 block/blk-mq.h         |  5 +++++
 include/linux/blk-mq.h |  6 +++++
 6 files changed, 85 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ee970d848e62..777be967c4f2 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -607,7 +607,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 				   struct blk_mq_hw_ctx *hctx,
 				   unsigned int hctx_idx)
 {
-	unsigned int flags = set->flags;
+	unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
 
 	if (hctx->sched_tags) {
 		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
@@ -621,7 +621,8 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 				   unsigned int hctx_idx)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
-	unsigned int flags = set->flags;
+	/* Clear HCTX_SHARED so tags are init'ed */
+	unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
 	int ret;
 
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
@@ -643,7 +644,8 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 	int i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		unsigned int flags = hctx->flags;
+		/* Clear HCTX_SHARED so tags are freed */
+		unsigned int flags = hctx->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
 
 		if (hctx->sched_tags) {
 			blk_mq_free_rq_map(hctx->sched_tags, flags);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 77fbeae05745..c6d7ebc62bdb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -197,7 +197,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -450,6 +450,38 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int flags)
+{
+	unsigned int depth = set->queue_depth - set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int i, node = set->numa_node;
+
+	if (bt_alloc(&set->__bitmap_tags, depth, round_robin, node))
+		return -ENOMEM;
+	if (bt_alloc(&set->__breserved_tags, set->reserved_tags,
+		     round_robin, node))
+		goto free_bitmap_tags;
+
+	for (i = 0; i < set->nr_hw_queues; i++) {
+		struct blk_mq_tags *tags = set->tags[i];
+
+		tags->bitmap_tags = &set->__bitmap_tags;
+		tags->breserved_tags = &set->__breserved_tags;
+	}
+
+	return 0;
+free_bitmap_tags:
+	sbitmap_queue_free(&set->__bitmap_tags);
+	return -ENOMEM;
+}
+
+void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
+{
+	sbitmap_queue_free(&set->__bitmap_tags);
+	sbitmap_queue_free(&set->__breserved_tags);
+}
+
 struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, unsigned int flags)
@@ -469,6 +501,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	if (flags & BLK_MQ_F_TAG_HCTX_SHARED)
+		return tags;
+
 	if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
 		kfree(tags);
 		return NULL;
@@ -478,8 +513,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
 {
-	sbitmap_queue_free(&tags->__bitmap_tags);
-	sbitmap_queue_free(&tags->__breserved_tags);
+	if (!(flags & BLK_MQ_F_TAG_HCTX_SHARED)) {
+		sbitmap_queue_free(tags->bitmap_tags);
+		sbitmap_queue_free(tags->breserved_tags);
+	}
 	kfree(tags);
 }
 
@@ -498,7 +535,8 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	 */
 	if (tdepth > tags->nr_tags) {
 		struct blk_mq_tag_set *set = hctx->queue->tag_set;
-		unsigned int flags = set->flags;
+		/* Only sched tags can grow, so clear HCTX_SHARED flag  */
+		unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED;
 		struct blk_mq_tags *new;
 		bool ret;
 
@@ -537,6 +575,11 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 	return 0;
 }
 
+void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
+{
+	sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
+}
+
 /**
  * blk_mq_unique_tag() - return a tag that is unique queue-wide
  * @rq: request for which to compute a unique tag
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 5b6f25bd1966..8f4cce52ba2d 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -29,12 +29,19 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
 					int node, unsigned int flags);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
 
+extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set,
+				      unsigned int flags);
+extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
+
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
 extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 			   unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
+extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
+					     unsigned int size);
+
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ce712ec818e4..a500d1dfa1bd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3430,11 +3430,21 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		if (blk_mq_init_shared_sbitmap(set, set->flags)) {
+			ret = -ENOMEM;
+			goto out_free_mq_rq_maps;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
 	return 0;
 
+out_free_mq_rq_maps:
+	for (i = 0; i < set->nr_hw_queues; i++)
+		blk_mq_free_map_and_requests(set, i);
 out_free_mq_map:
 	for (i = 0; i < set->nr_maps; i++) {
 		kfree(set->map[i].mq_map);
@@ -3453,6 +3463,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 	for (i = 0; i < set->nr_hw_queues; i++)
 		blk_mq_free_map_and_requests(set, i);
 
+	if (blk_mq_is_sbitmap_shared(set->flags))
+		blk_mq_exit_shared_sbitmap(set);
+
 	for (j = 0; j < set->nr_maps; j++) {
 		kfree(set->map[j].mq_map);
 		set->map[j].mq_map = NULL;
@@ -3489,6 +3502,8 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		if (!hctx->sched_tags) {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
+			if (!ret && blk_mq_is_sbitmap_shared(set->flags))
+				blk_mq_tag_resize_shared_sbitmap(set, nr);
 		} else {
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index b86e7fe47789..3acdb56065e1 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -159,6 +159,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(unsigned int flags)
+{
+	return flags & BLK_MQ_F_TAG_HCTX_SHARED;
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->q->elevator)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 95aca7aa8957..03b1f39d5a72 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -231,6 +231,9 @@ enum hctx_type {
  * @flags:	   Zero or more BLK_MQ_F_* flags.
  * @driver_data:   Pointer to data owned by the block driver that created this
  *		   tag set.
+ * @__bitmap_tags: A shared tags sbitmap, used over all hctx's
+ * @__breserved_tags:
+ *		   A shared reserved tags sbitmap, used over all hctx's
  * @tags:	   Tag sets. One tag set per hardware queue. Has @nr_hw_queues
  *		   elements.
  * @tag_list_lock: Serializes tag_list accesses.
@@ -250,6 +253,8 @@ struct blk_mq_tag_set {
 	unsigned int		flags;
 	void			*driver_data;
 
+	struct sbitmap_queue	__bitmap_tags;
+	struct sbitmap_queue	__breserved_tags;
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -384,6 +389,7 @@ enum {
 	 * completing IO:
 	 */
 	BLK_MQ_F_STACKING	= 1 << 2,
+	BLK_MQ_F_TAG_HCTX_SHARED = 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 07/18] blk-mq: Relocate hctx_may_queue()
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (5 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 06/18] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 08/18] blk-mq: Record nr_active_requests per queue for when using shared sbitmap John Garry
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

blk-mq.h and blk-mq-tag.h include each other, which is less than ideal.

Relocate hctx_may_queue() to blk-mq.h, as it is not really tag-specific
code.

In this way, we can drop the blk-mq-tag.h include of blk-mq.h.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.h | 33 ---------------------------------
 block/blk-mq.h     | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 8f4cce52ba2d..7d3e6b333a4a 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -2,8 +2,6 @@
 #ifndef INT_BLK_MQ_TAG_H
 #define INT_BLK_MQ_TAG_H
 
-#include "blk-mq.h"
-
 /*
  * Tag address space map.
  */
@@ -81,37 +79,6 @@ static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 	__blk_mq_tag_idle(hctx);
 }
 
-/*
- * For shared tag users, we track the number of currently active users
- * and attempt to provide a fair share of the tag depth for each of them.
- */
-static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
-				  struct sbitmap_queue *bt)
-{
-	unsigned int depth, users;
-
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
-		return true;
-	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-		return true;
-
-	/*
-	 * Don't try dividing an ant
-	 */
-	if (bt->sb.depth == 1)
-		return true;
-
-	users = atomic_read(&hctx->tags->active_queues);
-	if (!users)
-		return true;
-
-	/*
-	 * Allow at least some tags
-	 */
-	depth = max((bt->sb.depth + users - 1) / users, 4U);
-	return atomic_read(&hctx->nr_active) < depth;
-}
-
 static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags *tags,
 					  unsigned int tag)
 {
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 3acdb56065e1..56dc37c21908 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -259,4 +259,36 @@ static inline struct blk_plug *blk_mq_plug(struct request_queue *q,
 	return NULL;
 }
 
+/*
+ * For shared tag users, we track the number of currently active users
+ * and attempt to provide a fair share of the tag depth for each of them.
+ */
+static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
+				  struct sbitmap_queue *bt)
+{
+	unsigned int depth, users;
+
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
+		return true;
+	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+		return true;
+
+	/*
+	 * Don't try dividing an ant
+	 */
+	if (bt->sb.depth == 1)
+		return true;
+
+	users = atomic_read(&hctx->tags->active_queues);
+	if (!users)
+		return true;
+
+	/*
+	 * Allow at least some tags
+	 */
+	depth = max((bt->sb.depth + users - 1) / users, 4U);
+	return atomic_read(&hctx->nr_active) < depth;
+}
+
+
 #endif
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 08/18] blk-mq: Record nr_active_requests per queue for when using shared sbitmap
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (6 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 07/18] blk-mq: Relocate hctx_may_queue() John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 09/18] blk-mq: Record active_queues_shared_sbitmap per tag_set " John Garry
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

The per-hctx nr_active value can no longer be used to fairly assign a share
of tag depth per request queue when using a shared sbitmap, as it does
not account for the tags being shared over all hctx's.

For this case, record the nr_active_requests per request_queue, and make
the judgement based on that value.

Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Co-developed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-core.c       |  2 ++
 block/blk-mq.c         |  4 ++--
 block/blk-mq.h         | 26 ++++++++++++++++++++++++--
 include/linux/blkdev.h |  2 ++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..360975255a2a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -542,6 +542,8 @@ struct request_queue *blk_alloc_queue(int node_id)
 	q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
 	q->node = node_id;
 
+	atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
+
 	timer_setup(&q->backing_dev_info->laptop_mode_wb_timer,
 		    laptop_mode_timer_fn, 0);
 	timer_setup(&q->timeout, blk_rq_timed_out_timer, 0);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a500d1dfa1bd..ebb72a59b433 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -519,7 +519,7 @@ void blk_mq_free_request(struct request *rq)
 
 	ctx->rq_completed[rq_is_sync(rq)]++;
 	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		atomic_dec(&hctx->nr_active);
+		__blk_mq_dec_active_requests(hctx);
 
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->backing_dev_info);
@@ -1127,7 +1127,7 @@ static bool blk_mq_get_driver_tag(struct request *rq)
 	if ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
 			!(rq->rq_flags & RQF_MQ_INFLIGHT)) {
 		rq->rq_flags |= RQF_MQ_INFLIGHT;
-		atomic_inc(&hctx->nr_active);
+		__blk_mq_inc_active_requests(hctx);
 	}
 	hctx->tags->rqs[rq->tag] = rq;
 	return true;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 56dc37c21908..25ec73078e95 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -199,6 +199,28 @@ static inline bool blk_mq_get_dispatch_budget(struct request_queue *q)
 	return true;
 }
 
+static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx)
+{
+	if (blk_mq_is_sbitmap_shared(hctx->flags))
+		atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap);
+	else
+		atomic_inc(&hctx->nr_active);
+}
+
+static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx)
+{
+	if (blk_mq_is_sbitmap_shared(hctx->flags))
+		atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap);
+	else
+		atomic_dec(&hctx->nr_active);
+}
+
+static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx)
+{
+	if (blk_mq_is_sbitmap_shared(hctx->flags))
+		return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap);
+	return atomic_read(&hctx->nr_active);
+}
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
@@ -207,7 +229,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
 		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
+		__blk_mq_dec_active_requests(hctx);
 	}
 }
 
@@ -287,7 +309,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	 * Allow at least some tags
 	 */
 	depth = max((bt->sb.depth + users - 1) / users, 4U);
-	return atomic_read(&hctx->nr_active) < depth;
+	return __blk_mq_active_requests(hctx) < depth;
 }
 
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bb5636cc17b9..1d85235611e1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -484,6 +484,8 @@ struct request_queue {
 	struct timer_list	timeout;
 	struct work_struct	timeout_work;
 
+	atomic_t		nr_active_requests_shared_sbitmap;
+
 	struct list_head	icq_list;
 #ifdef CONFIG_BLK_CGROUP
 	DECLARE_BITMAP		(blkcg_pols, BLKCG_MAX_POLS);
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 09/18] blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (7 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 08/18] blk-mq: Record nr_active_requests per queue for when using shared sbitmap John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 10/18] blk-mq, elevator: Count requests per hctx to improve performance John Garry
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

When using a shared sbitmap, the number of active request queues per hctx
can no longer be relied on when judging how to share the tag bitmap.

Instead maintain the number of active request queues per tag_set, and make
the judgement based on that.

Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c     | 33 +++++++++++++++++++++++++--------
 block/blk-mq.c         |  2 ++
 block/blk-mq.h         | 16 +++++++++++++---
 include/linux/blk-mq.h |  1 +
 include/linux/blkdev.h |  1 +
 5 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index c6d7ebc62bdb..c31c4a0478a5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -23,9 +23,18 @@
  */
 bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
-	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-		atomic_inc(&hctx->tags->active_queues);
+	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+		struct request_queue *q = hctx->queue;
+		struct blk_mq_tag_set *set = q->tag_set;
+
+		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
+		    !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+			atomic_inc(&set->active_queues_shared_sbitmap);
+	} else {
+		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
+		    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+			atomic_inc(&hctx->tags->active_queues);
+	}
 
 	return true;
 }
@@ -47,11 +56,19 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
 	struct blk_mq_tags *tags = hctx->tags;
-
-	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-		return;
-
-	atomic_dec(&tags->active_queues);
+	struct request_queue *q = hctx->queue;
+	struct blk_mq_tag_set *set = q->tag_set;
+
+	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
+					&q->queue_flags))
+			return;
+		atomic_dec(&set->active_queues_shared_sbitmap);
+	} else {
+		if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+			return;
+		atomic_dec(&tags->active_queues);
+	}
 
 	blk_mq_tag_wakeup_all(tags, false);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ebb72a59b433..457b43829a4f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3431,6 +3431,8 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 		goto out_free_mq_map;
 
 	if (blk_mq_is_sbitmap_shared(set->flags)) {
+		atomic_set(&set->active_queues_shared_sbitmap, 0);
+
 		if (blk_mq_init_shared_sbitmap(set, set->flags)) {
 			ret = -ENOMEM;
 			goto out_free_mq_rq_maps;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 25ec73078e95..a52703c98b77 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -292,8 +292,6 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 
 	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
-	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-		return true;
 
 	/*
 	 * Don't try dividing an ant
@@ -301,7 +299,19 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	if (bt->sb.depth == 1)
 		return true;
 
-	users = atomic_read(&hctx->tags->active_queues);
+	if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+		struct request_queue *q = hctx->queue;
+		struct blk_mq_tag_set *set = q->tag_set;
+
+		if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
+			return true;
+		users = atomic_read(&set->active_queues_shared_sbitmap);
+	} else {
+		if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
+			return true;
+		users = atomic_read(&hctx->tags->active_queues);
+	}
+
 	if (!users)
 		return true;
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 03b1f39d5a72..a4b35ec60faf 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -252,6 +252,7 @@ struct blk_mq_tag_set {
 	unsigned int		timeout;
 	unsigned int		flags;
 	void			*driver_data;
+	atomic_t		active_queues_shared_sbitmap;
 
 	struct sbitmap_queue	__bitmap_tags;
 	struct sbitmap_queue	__breserved_tags;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1d85235611e1..67b39596436c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -617,6 +617,7 @@ struct request_queue {
 #define QUEUE_FLAG_PCI_P2PDMA	25	/* device supports PCI p2p requests */
 #define QUEUE_FLAG_ZONE_RESETALL 26	/* supports Zone Reset All */
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
+#define QUEUE_FLAG_HCTX_ACTIVE 28	/* at least one blk-mq hctx is active */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP))
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 10/18] blk-mq, elevator: Count requests per hctx to improve performance
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (8 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 09/18] blk-mq: Record active_queues_shared_sbitmap per tag_set " John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 11/18] null_blk: Support shared tag bitmap John Garry
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Kashyap Desai <kashyap.desai@broadcom.com>

High CPU utilization in native_queued_spin_lock_slowpath() due to lock
contention is possible for the mq-deadline and bfq I/O schedulers when
nr_hw_queues is more than one.

This is because the kblockd workqueue can submit I/O from all online
CPUs (through blk_mq_run_hw_queues()) even though only one hctx has
pending commands.

The .has_work elevator callback for the mq-deadline and bfq schedulers
reports pending work if there are any I/Os on the request queue, but it
does not account for the hctx context.

Add a per-hctx 'elevator_queued' count to avoid invoking the elevator
when no requests are queued on that hctx.
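
As a hedged userspace sketch of the counting scheme described above (the
struct and function names here are illustrative models, not the kernel's
types):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative model of a hardware queue context carrying a
 * queued-request counter, mirroring the per-hctx 'elevator_queued'
 * idea: insertion increments, dispatch decrements. */
struct hctx_model {
	atomic_int elevator_queued; /* requests inserted but not dispatched */
	int fifo_pending;           /* stand-in for the scheduler's lists */
};

static void insert_request(struct hctx_model *h)
{
	h->fifo_pending++;
	atomic_fetch_add(&h->elevator_queued, 1);
}

static bool dispatch_request(struct hctx_model *h)
{
	if (h->fifo_pending == 0)
		return false;
	h->fifo_pending--;
	atomic_fetch_sub(&h->elevator_queued, 1);
	return true;
}

/* .has_work analogue: bail out cheaply, before touching (and locking)
 * any scheduler state, when this hctx has nothing queued. */
static bool has_work(struct hctx_model *h)
{
	if (atomic_load(&h->elevator_queued) == 0)
		return false;
	return h->fifo_pending > 0;
}
```

The early return is the whole optimisation: hctxs with no queued
requests never take the scheduler lock, which is what removes the
native_queued_spin_lock_slowpath() contention.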

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
[jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/bfq-iosched.c    | 5 +++++
 block/blk-mq.c         | 1 +
 block/mq-deadline.c    | 6 ++++++
 include/linux/blk-mq.h | 4 ++++
 4 files changed, 16 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 88f0dfa545d7..4650012f1e55 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4640,6 +4640,9 @@ static bool bfq_has_work(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 
+	if (!atomic_read(&hctx->elevator_queued))
+		return false;
+
 	/*
 	 * Avoiding lock: a race on bfqd->busy_queues should cause at
 	 * most a call to dispatch for nothing
@@ -5554,6 +5557,7 @@ static void bfq_insert_requests(struct blk_mq_hw_ctx *hctx,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		bfq_insert_request(hctx, rq, at_head);
+		atomic_inc(&hctx->elevator_queued);
 	}
 }
 
@@ -5933,6 +5937,7 @@ static void bfq_finish_requeue_request(struct request *rq)
 
 		bfq_completed_request(bfqq, bfqd);
 		bfq_finish_requeue_request_body(bfqq);
+		atomic_dec(&rq->mq_hctx->elevator_queued);
 
 		spin_unlock_irqrestore(&bfqd->lock, flags);
 	} else {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 457b43829a4f..361fb9fe1dc5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2649,6 +2649,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 		goto free_hctx;
 
 	atomic_set(&hctx->nr_active, 0);
+	atomic_set(&hctx->elevator_queued, 0);
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 	hctx->numa_node = node;
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b57470e154c8..800ac902809b 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -386,6 +386,8 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	spin_lock(&dd->lock);
 	rq = __dd_dispatch_request(dd);
 	spin_unlock(&dd->lock);
+	if (rq)
+		atomic_dec(&rq->mq_hctx->elevator_queued);
 
 	return rq;
 }
@@ -533,6 +535,7 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
 		dd_insert_request(hctx, rq, at_head);
+		atomic_inc(&hctx->elevator_queued);
 	}
 	spin_unlock(&dd->lock);
 }
@@ -579,6 +582,9 @@ static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
 {
 	struct deadline_data *dd = hctx->queue->elevator->elevator_data;
 
+	if (!atomic_read(&hctx->elevator_queued))
+		return false;
+
 	return !list_empty_careful(&dd->dispatch) ||
 		!list_empty_careful(&dd->fifo_list[0]) ||
 		!list_empty_careful(&dd->fifo_list[1]);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a4b35ec60faf..2f3ba31a1658 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -139,6 +139,10 @@ struct blk_mq_hw_ctx {
 	 * shared across request queues.
 	 */
 	atomic_t		nr_active;
+	/**
+	 * @elevator_queued: Number of queued requests on hctx.
+	 */
+	atomic_t                elevator_queued;
 
 	/** @cpuhp_online: List to store request if CPU is going to die */
 	struct hlist_node	cpuhp_online;
-- 
2.26.2



* [PATCH v8 11/18] null_blk: Support shared tag bitmap
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (9 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 10/18] blk-mq, elevator: Count requests per hctx to improve performance John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 12/18] scsi: Add host and host template flag 'host_tagset' John Garry
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Support a shared tag bitmap, whereby request tags are unique across all
submission queues, rather than just per submission queue.

As such, the total queue depth per device is normally hw_queue_depth *
submit_queues, but becomes just hw_queue_depth when the shared bitmap
is set. Similarly, when shared_tags is set, that value is the queue
depth over all null_blk devices.
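
The depth arithmetic above can be expressed as a tiny helper (a sketch
of the rule described in this commit message, not code from the
driver; the function name is illustrative):

```c
/* Sketch: effective total queue depth of one null_blk device.
 * With a shared tag bitmap, tags span all submission queues, so the
 * device-wide depth collapses from hw_queue_depth * submit_queues
 * down to just hw_queue_depth. */
static unsigned int null_blk_total_depth(unsigned int hw_queue_depth,
					 unsigned int submit_queues,
					 int shared_tag_bitmap)
{
	if (shared_tag_bitmap)
		return hw_queue_depth;
	return hw_queue_depth * submit_queues;
}
```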

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/block/null_blk_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 47a9dad880af..19c04e074880 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -164,6 +164,10 @@ static bool shared_tags;
 module_param(shared_tags, bool, 0444);
 MODULE_PARM_DESC(shared_tags, "Share tag set between devices for blk-mq");
 
+static bool g_shared_tag_bitmap;
+module_param_named(shared_tag_bitmap, g_shared_tag_bitmap, bool, 0444);
+MODULE_PARM_DESC(shared_tag_bitmap, "Use shared tag bitmap for all submission queues for blk-mq");
+
 static int g_irqmode = NULL_IRQ_SOFTIRQ;
 
 static int null_set_irqmode(const char *str, const struct kernel_param *kp)
@@ -1692,6 +1696,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (g_no_sched)
 		set->flags |= BLK_MQ_F_NO_SCHED;
+	if (g_shared_tag_bitmap)
+		set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
-- 
2.26.2



* [PATCH v8 12/18] scsi: Add host and host template flag 'host_tagset'
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (10 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 11/18] null_blk: Support shared tag bitmap John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs John Garry
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke, John Garry

From: Hannes Reinecke <hare@suse.com>

Add a Scsi_Host and host template flag 'host_tagset' so that a hostwide
tagset can be shared across multiple reply queues once the SCSI device's
reply queues are converted to blk-mq hw queues.

Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Signed-off-by: Hannes Reinecke <hare@suse.com>
[jpg: Update comment on .can_queue and add Scsi_Host.host_tagset]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hosts.c     | 1 +
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 9 ++++++++-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 37d1c5565d90..2f162603876f 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -421,6 +421,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	shost->cmd_per_lun = sht->cmd_per_lun;
 	shost->unchecked_isa_dma = sht->unchecked_isa_dma;
 	shost->no_write_same = sht->no_write_same;
+	shost->host_tagset = sht->host_tagset;
 
 	if (shost_eh_deadline == -1 || !sht->eh_host_reset_handler)
 		shost->eh_deadline = -1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7c6dd6f75190..cdef5d1777cb 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1891,6 +1891,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	tag_set->flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
 	tag_set->driver_data = shost;
+	if (shost->host_tagset)
+		tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 
 	return blk_mq_alloc_tag_set(tag_set);
 }
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 46ef8cccc982..701f178b20ae 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -436,6 +436,9 @@ struct scsi_host_template {
 	/* True if the controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
@@ -603,7 +606,8 @@ struct Scsi_Host {
 	 *
 	 * Note: it is assumed that each hardware queue has a queue depth of
 	 * can_queue. In other words, the total queue depth per host
-	 * is nr_hw_queues * can_queue.
+	 * is nr_hw_queues * can_queue. However, for when host_tagset is set,
+	 * the total queue depth is can_queue.
 	 */
 	unsigned nr_hw_queues;
 	unsigned active_mode:2;
@@ -634,6 +638,9 @@ struct Scsi_Host {
 	/* The controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/* Host responded with short (<36 bytes) INQUIRY result */
 	unsigned short_inquiry:1;
 
-- 
2.26.2



* [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (11 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 12/18] scsi: Add host and host template flag 'host_tagset' John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-09-10  8:33   ` John Garry
  2020-08-19 15:20 ` [PATCH v8 14/18] scsi: hisi_sas: Switch v3 hw to MQ John Garry
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

So that we don't show a value of 0 when Scsi_Host.nr_hw_queues is
unset, use tag_set->nr_hw_queues (which holds 1 in that case).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 163dbcb741c1..d6e344fa33ad 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -393,6 +393,16 @@ show_use_blk_mq(struct device *dev, struct device_attribute *attr, char *buf)
 }
 static DEVICE_ATTR(use_blk_mq, S_IRUGO, show_use_blk_mq, NULL);
 
+static ssize_t
+show_nr_hw_queues(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct Scsi_Host *shost = class_to_shost(dev);
+	struct blk_mq_tag_set *tag_set = &shost->tag_set;
+
+	return snprintf(buf, 20, "%d\n", tag_set->nr_hw_queues);
+}
+static DEVICE_ATTR(nr_hw_queues, S_IRUGO, show_nr_hw_queues, NULL);
+
 static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_use_blk_mq.attr,
 	&dev_attr_unique_id.attr,
@@ -411,6 +421,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
 	&dev_attr_prot_guard_type.attr,
 	&dev_attr_host_reset.attr,
 	&dev_attr_eh_deadline.attr,
+	&dev_attr_nr_hw_queues.attr,
 	NULL
 };
 
-- 
2.26.2



* [PATCH v8 14/18] scsi: hisi_sas: Switch v3 hw to MQ
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (12 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 15/18] scsi: scsi_debug: Support host tagset John Garry
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

Now that the block layer provides shared tags, we can switch the driver
to expose all HW queues.
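
The driver now derives the delivery queue from the block layer tag via
blk_mq_unique_tag()/blk_mq_unique_tag_to_hwq(). As a sketch of that
encoding, assuming the usual blk-mq layout of the hwq index in the
upper 16 bits and the per-queue tag in the lower 16 bits (the macro
and function names below are illustrative stand-ins):

```c
#include <stdint.h>

/* Model of the blk-mq unique-tag encoding: hwq index in the upper
 * 16 bits, per-queue tag in the lower 16 bits. */
#define UNIQUE_TAG_BITS 16
#define UNIQUE_TAG_MASK ((1U << UNIQUE_TAG_BITS) - 1)

static uint32_t make_unique_tag(uint32_t hwq, uint32_t tag)
{
	return (hwq << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

static uint32_t unique_tag_to_hwq(uint32_t unique)
{
	return unique >> UNIQUE_TAG_BITS;
}

static uint32_t unique_tag_to_tag(uint32_t unique)
{
	return unique & UNIQUE_TAG_MASK;
}
```

This is why hisi_sas_task_prep() can recover the delivery queue index
directly from the SCSI command's request, with no per-CPU reply_map.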

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 87 +++++++++++---------------
 3 files changed, 56 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 2bdd64648ef0..e6acbf940712 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
 #define _HISI_SAS_H_
 
 #include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
@@ -431,7 +433,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* bist */
 	enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 11caa4b0d797..7ed4eaedb7ca 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -417,6 +417,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
 	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+	struct scsi_cmnd *scmd = NULL;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -432,10 +433,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	if (task->uldd_task) {
+		struct ata_queued_cmd *qc;
 
+		if (dev_is_sata(device)) {
+			qc = task->uldd_task;
+			scmd = qc->scsicmd;
+		} else {
+			scmd = task->uldd_task;
+		}
+	}
+
+	if (scmd) {
+		unsigned int dq_index;
+		u32 blk_tag;
+
+		blk_tag = blk_mq_unique_tag(scmd->request);
+		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
 		*dq_pointer = dq = sas_dev->dq;
@@ -464,21 +478,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
+	else
+		rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
 
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
-		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 60adf5c32143..b6f75a1764df 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2362,68 +2362,36 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_WAKE_THREAD;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 {
-	const struct cpumask *mask;
-	int queue, cpu;
+	int vectors;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	struct irq_affinity desc = {
+		.pre_vectors = BASE_VECTORS_V3_HW,
+	};
 
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+	min_msi = MIN_AFFINE_VECTORS_V3_HW;
+	vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+						 min_msi, max_msi,
+						 PCI_IRQ_MSI |
+						 PCI_IRQ_AFFINITY,
+						 &desc);
+	if (vectors < 0)
+		return -ENOENT;
 
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
 
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
+
+	return 0;
 }
 
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc, i;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
-	if (auto_affine_msi_experimental) {
-		struct irq_affinity desc = {
-			.pre_vectors = BASE_VECTORS_V3_HW,
-		};
-
-		dev_info(dev, "Enable MSI auto-affinity\n");
-
-		min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
-		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
-							 min_msi, max_msi,
-							 PCI_IRQ_MSI |
-							 PCI_IRQ_AFFINITY,
-							 &desc);
-		if (vectors < 0)
-			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
-	} else {
-		min_msi = max_msi;
-		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
-						max_msi, PCI_IRQ_MSI);
-		if (vectors < 0)
-			return vectors;
-	}
-
-	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	int rc, i;
 
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
@@ -3072,6 +3040,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
 	return 0;
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+				     BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.proc_name		= DRV_NAME,
@@ -3082,6 +3059,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
+	.map_queues		= hisi_sas_map_queues,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
@@ -3098,6 +3076,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3269,6 +3248,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	rc = interrupt_preinit_v3_hw(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+	dev_err(dev, "%d hw queues\n", shost->nr_hw_queues);
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.26.2



* [PATCH v8 15/18] scsi: scsi_debug: Support host tagset
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (13 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 14/18] scsi: hisi_sas: Switch v3 hw to MQ John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 16/18] hpsa: enable host_tagset and switch to MQ John Garry
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

When host_max_queue is set (> 0), set the Scsi_Host.host_tagset such that
blk-mq will use a hostwide tagset over all SCSI host submission queues.

This means that we may expose all submission queues and always use the hwq
chosen by blk-mq.

And since sdebug_max_queue is fixed to the same value as
sdebug_host_max_queue when the latter is set, we can simplify how
sdebug_driver_template.can_queue is set.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_debug.c | 28 ++++++++++------------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 064ed680c053..4238baa2b34b 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -4698,19 +4698,14 @@ static int resp_rwp_zone(struct scsi_cmnd *scp, struct sdebug_dev_info *devip)
 static struct sdebug_queue *get_queue(struct scsi_cmnd *cmnd)
 {
 	u16 hwq;
+	u32 tag = blk_mq_unique_tag(cmnd->request);
 
-	if (sdebug_host_max_queue) {
-		/* Provide a simple method to choose the hwq */
-		hwq = smp_processor_id() % submit_queues;
-	} else {
-		u32 tag = blk_mq_unique_tag(cmnd->request);
+	hwq = blk_mq_unique_tag_to_hwq(tag);
 
-		hwq = blk_mq_unique_tag_to_hwq(tag);
+	pr_debug("tag=%#x, hwq=%d\n", tag, hwq);
+	if (WARN_ON_ONCE(hwq >= submit_queues))
+		hwq = 0;
 
-		pr_debug("tag=%#x, hwq=%d\n", tag, hwq);
-		if (WARN_ON_ONCE(hwq >= submit_queues))
-			hwq = 0;
-	}
 	return sdebug_q_arr + hwq;
 }
 
@@ -7347,10 +7342,7 @@ static int sdebug_driver_probe(struct device *dev)
 
 	sdbg_host = to_sdebug_host(dev);
 
-	if (sdebug_host_max_queue)
-		sdebug_driver_template.can_queue = sdebug_host_max_queue;
-	else
-		sdebug_driver_template.can_queue = sdebug_max_queue;
+	sdebug_driver_template.can_queue = sdebug_max_queue;
 	if (!sdebug_clustering)
 		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
 
@@ -7367,11 +7359,11 @@ static int sdebug_driver_probe(struct device *dev)
 	}
 	/*
 	 * Decide whether to tell scsi subsystem that we want mq. The
-	 * following should give the same answer for each host. If the host
-	 * has a limit of hostwide max commands, then do not set.
+	 * following should give the same answer for each host.
 	 */
-	if (!sdebug_host_max_queue)
-		hpnt->nr_hw_queues = submit_queues;
+	hpnt->nr_hw_queues = submit_queues;
+	if (sdebug_host_max_queue)
+		hpnt->host_tagset = 1;
 
 	sdbg_host->shost = hpnt;
 	*((struct sdebug_host_info **)hpnt->hostdata) = sdbg_host;
-- 
2.26.2



* [PATCH v8 16/18] hpsa: enable host_tagset and switch to MQ
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (14 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 15/18] scsi: scsi_debug: Support host tagset John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-19 15:20 ` [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug John Garry
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Hannes Reinecke <hare@suse.de>

The Smart Array HBAs can steer interrupt completions, so this patch
switches the implementation to use multiqueue and enables
'host_tagset', as the HBA has a shared host-wide tagset.
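
For contrast with the removed code: hpsa_setup_reply_map() built a
CPU-to-reply-queue table from the irq affinity masks, falling back to
queue 0 for every CPU when a mask was unavailable; blk-mq's hwq mapping
now does this job. A sketch of the deleted scheme, as an illustrative
userspace model (a "mask" here is a flat array of per-CPU flags, not a
real cpumask):

```c
#include <stddef.h>

/* Illustrative model of the deleted reply-map setup: map each CPU to
 * the reply queue whose affinity mask covers it; with no affinity
 * info, steer every CPU to reply queue 0. */
static void setup_reply_map(unsigned int *reply_map, size_t nr_cpus,
			    const unsigned char *queue_masks, /* [nr_queues * nr_cpus] or NULL */
			    size_t nr_queues)
{
	if (!queue_masks) {
		/* fallback path: everything lands on queue 0 */
		for (size_t cpu = 0; cpu < nr_cpus; cpu++)
			reply_map[cpu] = 0;
		return;
	}
	for (size_t q = 0; q < nr_queues; q++)
		for (size_t cpu = 0; cpu < nr_cpus; cpu++)
			if (queue_masks[q * nr_cpus + cpu])
				reply_map[cpu] = (unsigned int)q;
}
```

With host_tagset and multiqueue, blk_mq_unique_tag_to_hwq() yields the
reply queue per request, so this hand-rolled table (and its allocation
failure handling) can be dropped entirely.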

Reviewed-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hpsa.c | 44 +++++++-------------------------------------
 drivers/scsi/hpsa.h |  1 -
 2 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 91794a50b31f..eaddcd3e9caf 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -984,6 +984,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 2048,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -1148,12 +1149,14 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
 static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	struct CommandList *c, int reply_queue)
 {
+	u32 blk_tag = blk_mq_unique_tag(c->scsi_cmd->request);
+
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 	if (c->device)
 		atomic_inc(&c->device->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = blk_mq_unique_tag_to_hwq(blk_tag);
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -5676,8 +5679,6 @@ static int hpsa_scsi_queue_command(struct Scsi_Host *sh, struct scsi_cmnd *cmd)
 	/* Get the ptr to our adapter structure out of cmd->host. */
 	h = sdev_to_hba(cmd->device);
 
-	BUG_ON(cmd->request->tag < 0);
-
 	dev = cmd->device->hostdata;
 	if (!dev) {
 		cmd->result = DID_NO_CONNECT << 16;
@@ -5853,7 +5854,7 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 	sh->hostdata[0] = (unsigned long) h;
 	sh->irq = pci_irq_vector(h->pdev, 0);
 	sh->unique_id = sh->irq;
-
+	sh->nr_hw_queues = h->msix_vectors > 0 ? h->msix_vectors : 1;
 	h->scsi_host = sh;
 	return 0;
 }
@@ -5879,7 +5880,8 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
  */
 static int hpsa_get_cmd_index(struct scsi_cmnd *scmd)
 {
-	int idx = scmd->request->tag;
+	u32 blk_tag = blk_mq_unique_tag(scmd->request);
+	int idx = blk_mq_unique_tag_to_tag(blk_tag);
 
 	if (idx < 0)
 		return idx;
@@ -7456,26 +7458,6 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
-static void hpsa_setup_reply_map(struct ctlr_info *h)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu;
-
-	for (queue = 0; queue < h->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(h->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			h->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		h->reply_map[cpu] = 0;
-}
-
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7872,9 +7854,6 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	if (err)
 		goto clean1;
 
-	/* setup mapping between CPU and reply queue */
-	hpsa_setup_reply_map(h);
-
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8613,7 +8592,6 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 
 static void hpda_free_ctlr_info(struct ctlr_info *h)
 {
-	kfree(h->reply_map);
 	kfree(h);
 }
 
@@ -8622,14 +8600,6 @@ static struct ctlr_info *hpda_alloc_ctlr_info(void)
 	struct ctlr_info *h;
 
 	h = kzalloc(sizeof(*h), GFP_KERNEL);
-	if (!h)
-		return NULL;
-
-	h->reply_map = kcalloc(nr_cpu_ids, sizeof(*h->reply_map), GFP_KERNEL);
-	if (!h->reply_map) {
-		kfree(h);
-		return NULL;
-	}
 	return h;
 }
 
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 6b87d9815b35..9c6a161f2367 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -161,7 +161,6 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
-	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;
-- 
2.26.2



* [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (15 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 16/18] hpsa: enable host_tagset and switch to MQ John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-11-02 14:17   ` Qian Cai
  2020-08-19 15:20 ` [PATCH v8 18/18] smartpqi: enable host tagset John Garry
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke, John Garry

From: Kashyap Desai <kashyap.desai@broadcom.com>

Fusion adapters can steer completions to individual queues, and
we now have support for shared host-wide tags.
So we can enable multiqueue support for fusion adapters.

Once the driver enables shared host-wide tags, CPU hotplug is also
supported, as enabled by commit bf0beec0607d ("blk-mq: drain I/O when
all CPUs in a hctx are offline").

Currently the driver has a provision to disable host-wide tags using
the "host_tagset_enable" module parameter.

Once we see no major performance regression with host-wide tags, we
will drop the hand-crafted interrupt affinity settings.

Performance also meets expectations (tested with both the none and
mq-deadline schedulers):
24 SAS SSD drives on Aero with/without this patch can reach 3.1M IOPS.
3 VDs consisting of 8 SAS SSDs on Aero with/without this patch can
reach 3.1M IOPS.
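
The gating logic this patch adds to megasas_io_attach() can be
summarised as a small predicate (a sketch of the conditions in the
diff below; the function name is illustrative, the parameters mirror
the driver fields):

```c
/* Sketch of the decision in this patch: expose multiple hw queues
 * with a shared host tagset only for fusion (non-MFI) adapters that
 * have more MSI-X vectors than the low-latency (non-managed) ones,
 * and only when the module parameter and SMP affinity are both
 * enabled. Otherwise keep a single hw queue (e.g. kdump). */
static int megasas_nr_hw_queues(int is_mfi_series, int msix_vectors,
				int low_latency_index_start,
				int host_tagset_enable,
				int smp_affinity_enable)
{
	if (!is_mfi_series &&
	    msix_vectors > low_latency_index_start &&
	    host_tagset_enable && smp_affinity_enable)
		return msix_vectors - low_latency_index_start;
	return 1; /* host_tagset stays off; single hw queue */
}
```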

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 39 +++++++++++++++++++++
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 29 ++++++++-------
 2 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 861f7140f52e..6960922d0d7f 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -37,6 +37,7 @@
 #include <linux/poll.h>
 #include <linux/vmalloc.h>
 #include <linux/irq_poll.h>
+#include <linux/blk-mq-pci.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -113,6 +114,10 @@ unsigned int enable_sdev_max_qd;
 module_param(enable_sdev_max_qd, int, 0444);
 MODULE_PARM_DESC(enable_sdev_max_qd, "Enable sdev max qd as can_queue. Default: 0");
 
+int host_tagset_enable = 1;
+module_param(host_tagset_enable, int, 0444);
+MODULE_PARM_DESC(host_tagset_enable, "Shared host tagset enable/disable Default: enable(1)");
+
 MODULE_LICENSE("GPL");
 MODULE_VERSION(MEGASAS_VERSION);
 MODULE_AUTHOR("megaraidlinux.pdl@broadcom.com");
@@ -3119,6 +3124,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
 	return 0;
 }
 
+static int megasas_map_queues(struct Scsi_Host *shost)
+{
+	struct megasas_instance *instance;
+
+	instance = (struct megasas_instance *)shost->hostdata;
+
+	if (shost->nr_hw_queues == 1)
+		return 0;
+
+	return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+			instance->pdev, instance->low_latency_index_start);
+}
+
 static void megasas_aen_polling(struct work_struct *work);
 
 /**
@@ -3427,6 +3445,7 @@ static struct scsi_host_template megasas_template = {
 	.eh_timed_out = megasas_reset_timer,
 	.shost_attrs = megaraid_host_attrs,
 	.bios_param = megasas_bios_param,
+	.map_queues = megasas_map_queues,
 	.change_queue_depth = scsi_change_queue_depth,
 	.max_segment_size = 0xffffffff,
 };
@@ -6808,6 +6827,26 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	/* Use shared host tagset only for fusion adaptors
+	 * if there are managed interrupts (smp affinity enabled case).
+	 * Single msix_vectors in kdump, so shared host tag is also disabled.
+	 */
+
+	host->host_tagset = 0;
+	host->nr_hw_queues = 1;
+
+	if ((instance->adapter_type != MFI_SERIES) &&
+		(instance->msix_vectors > instance->low_latency_index_start) &&
+		host_tagset_enable &&
+		instance->smp_affinity_enable) {
+		host->host_tagset = 1;
+		host->nr_hw_queues = instance->msix_vectors -
+			instance->low_latency_index_start;
+	}
+
+	dev_info(&instance->pdev->dev,
+		"Max firmware commands: %d shared with nr_hw_queues = %d\n",
+		instance->max_fw_cmds, host->nr_hw_queues);
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 0824410f78f8..a4251121f173 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -359,24 +359,29 @@ megasas_get_msix_index(struct megasas_instance *instance,
 {
 	int sdev_busy;
 
-	/* nr_hw_queue = 1 for MegaRAID */
-	struct blk_mq_hw_ctx *hctx =
-		scmd->device->request_queue->queue_hw_ctx[0];
-
-	sdev_busy = atomic_read(&hctx->nr_active);
+	/* TBD - if sml remove device_busy in future, driver
+	 * should track counter in internal structure.
+	 */
+	sdev_busy = atomic_read(&scmd->device->device_busy);
 
 	if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
-	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
+	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)) {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			mega_mod64((atomic64_add_return(1, &instance->high_iops_outstanding) /
 					MR_HIGH_IOPS_BATCH_COUNT), instance->low_latency_index_start);
-	else if (instance->msix_load_balance)
+	} else if (instance->msix_load_balance) {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			(mega_mod64(atomic64_add_return(1, &instance->total_io_count),
 				instance->msix_vectors));
-	else
+	} else if (instance->host->nr_hw_queues > 1) {
+		u32 tag = blk_mq_unique_tag(scmd->request);
+
+		cmd->request_desc->SCSIIO.MSIxIndex = blk_mq_unique_tag_to_hwq(tag) +
+			instance->low_latency_index_start;
+	} else {
 		cmd->request_desc->SCSIIO.MSIxIndex =
 			instance->reply_map[raw_smp_processor_id()];
+	}
 }
 
 /**
@@ -956,9 +961,6 @@ megasas_alloc_cmds_fusion(struct megasas_instance *instance)
 	if (megasas_alloc_cmdlist_fusion(instance))
 		goto fail_exit;
 
-	dev_info(&instance->pdev->dev, "Configured max firmware commands: %d\n",
-		 instance->max_fw_cmds);
-
 	/* The first 256 bytes (SMID 0) is not used. Don't add to the cmd list */
 	io_req_base = fusion->io_request_frames + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
 	io_req_base_phys = fusion->io_request_frames_phys + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
@@ -1102,8 +1104,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance)
 		MR_HIGH_IOPS_QUEUE_COUNT) && cur_intr_coalescing)
 		instance->perf_mode = MR_BALANCED_PERF_MODE;
 
-	dev_info(&instance->pdev->dev, "Performance mode :%s\n",
-		MEGASAS_PERF_MODE_2STR(instance->perf_mode));
+	dev_info(&instance->pdev->dev, "Performance mode :%s (latency index = %d)\n",
+		MEGASAS_PERF_MODE_2STR(instance->perf_mode),
+		instance->low_latency_index_start);
 
 	instance->fw_sync_cache_support = (scratch_pad_1 &
 		MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0;
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v8 18/18] smartpqi: enable host tagset
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (16 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug John Garry
@ 2020-08-19 15:20 ` John Garry
  2020-08-27  8:53 ` [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-19 15:20 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	John Garry

From: Hannes Reinecke <hare@suse.de>

Enable host tagset for smartpqi; with this we can use the request
tag to look up the command in the pool, avoiding the list iteration
in the hot path.

Signed-off-by: Hannes Reinecke <hare@suse.de>
[jpg: Mod ctrl_info->next_io_request_slot calc]
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/smartpqi/smartpqi_init.c | 45 ++++++++++++++++++---------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c
index bd38c8cea56e..870ed1400a9e 100644
--- a/drivers/scsi/smartpqi/smartpqi_init.c
+++ b/drivers/scsi/smartpqi/smartpqi_init.c
@@ -575,22 +575,33 @@ static inline void pqi_reinit_io_request(struct pqi_io_request *io_request)
 }
 
 static struct pqi_io_request *pqi_alloc_io_request(
-	struct pqi_ctrl_info *ctrl_info)
+	struct pqi_ctrl_info *ctrl_info, struct scsi_cmnd *scmd)
 {
 	struct pqi_io_request *io_request;
-	u16 i = ctrl_info->next_io_request_slot;	/* benignly racy */
+	unsigned int limit = PQI_RESERVED_IO_SLOTS;
+	u16 i;
 
-	while (1) {
+	if (scmd) {
+		u32 blk_tag = blk_mq_unique_tag(scmd->request);
+
+		i = blk_mq_unique_tag_to_tag(blk_tag) + limit;
 		io_request = &ctrl_info->io_request_pool[i];
-		if (atomic_inc_return(&io_request->refcount) == 1)
-			break;
-		atomic_dec(&io_request->refcount);
-		i = (i + 1) % ctrl_info->max_io_slots;
+		if (WARN_ON(atomic_inc_return(&io_request->refcount) > 1)) {
+			atomic_dec(&io_request->refcount);
+			return NULL;
+		}
+	} else {
+		i = ctrl_info->next_io_request_slot;	/* benignly racy */
+		while (1) {
+			io_request = &ctrl_info->io_request_pool[i];
+			if (atomic_inc_return(&io_request->refcount) == 1)
+				break;
+			atomic_dec(&io_request->refcount);
+			i = (i + 1) % limit;
+		}
+		ctrl_info->next_io_request_slot = (i + 1) % limit;
 	}
 
-	/* benignly racy */
-	ctrl_info->next_io_request_slot = (i + 1) % ctrl_info->max_io_slots;
-
 	pqi_reinit_io_request(io_request);
 
 	return io_request;
@@ -4075,7 +4086,7 @@ static int pqi_submit_raid_request_synchronous(struct pqi_ctrl_info *ctrl_info,
 
 	atomic_inc(&ctrl_info->sync_cmds_outstanding);
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 
 	put_unaligned_le16(io_request->index,
 		&(((struct pqi_raid_path_request *)request)->request_id));
@@ -5032,7 +5043,9 @@ static inline int pqi_raid_submit_scsi_cmd(struct pqi_ctrl_info *ctrl_info,
 {
 	struct pqi_io_request *io_request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
 
 	return pqi_raid_submit_scsi_cmd_with_io_request(ctrl_info, io_request,
 		device, scmd, queue_group);
@@ -5230,7 +5243,10 @@ static int pqi_aio_submit_io(struct pqi_ctrl_info *ctrl_info,
 	struct pqi_io_request *io_request;
 	struct pqi_aio_path_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, scmd);
+	if (!io_request)
+		return SCSI_MLQUEUE_HOST_BUSY;
+
 	io_request->io_complete_callback = pqi_aio_io_complete;
 	io_request->scmd = scmd;
 	io_request->raid_bypass = raid_bypass;
@@ -5657,7 +5673,7 @@ static int pqi_lun_reset(struct pqi_ctrl_info *ctrl_info,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct pqi_task_management_request *request;
 
-	io_request = pqi_alloc_io_request(ctrl_info);
+	io_request = pqi_alloc_io_request(ctrl_info, NULL);
 	io_request->io_complete_callback = pqi_lun_reset_complete;
 	io_request->context = &wait;
 
@@ -6504,6 +6520,7 @@ static struct scsi_host_template pqi_driver_template = {
 	.map_queues = pqi_map_queues,
 	.sdev_attrs = pqi_sdev_attrs,
 	.shost_attrs = pqi_shost_attrs,
+	.host_tagset = 1,
 };
 
 static int pqi_register_scsi(struct pqi_ctrl_info *ctrl_info)
-- 
2.26.2


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (17 preceding siblings ...)
  2020-08-19 15:20 ` [PATCH v8 18/18] smartpqi: enable host tagset John Garry
@ 2020-08-27  8:53 ` John Garry
  2020-09-03 19:28 ` Douglas Gilbert
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-08-27  8:53 UTC (permalink / raw)
  To: axboe
  Cc: jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch, sumit.saxena,
	linux-block, linux-kernel, linux-scsi, esc.storagedev,
	megaraidlinux.pdl, chenxiang66, luojiaxing

Hi Jens,

I was wondering if you could kindly consider the block changes in this 
series, since I have now dropped the RFC flag?

I guess patch 5/18 (using pointers to bitmaps) would be of first 
concern. We did discuss this previously, and I think what we're doing 
now could be considered satisfactory.

Thanks,
john

> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 		4000 (default)			260**
> 
> Baseline (v5.9-rc1):
> none sched:		2094K IOPS			513K
> mq-deadline sched:	2145K IOPS			1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:		2120K IOPS			550K
> mq-deadline sched:	2121K IOPS			1309K
> 
> Final ***:
> none sched:		2132K IOPS			1185			
> mq-deadline sched:	2145K IOPS			2097	
> 
> * this is relevant as this is the performance in supporting but not
>    enabling the feature
> ** depth=260 is relevant as the point where we are regularly waiting for
>     tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
>      scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!
> 
> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to be the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags to avoid per-hw queue tags init/free for hostwide tags
> - Add Don's reviewed-by and tested-by tags to the appropriate patches
> 	- (@Don, please let me know if there is an issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
> 
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t
> 
> Hannes Reinecke (5):
>    blk-mq: Rename blk_mq_update_tag_set_depth()
>    blk-mq: Free tags in blk_mq_init_tags() upon error
>    scsi: Add host and host template flag 'host_tagset'
>    hpsa: enable host_tagset and switch to MQ
>    smartpqi: enable host tagset
> 
> John Garry (10):
>    blk-mq: Pass flags for tag init/free
>    blk-mq: Use pointers for blk_mq_tags bitmap tags
>    blk-mq: Facilitate a shared sbitmap per tagset
>    blk-mq: Relocate hctx_may_queue()
>    blk-mq: Record nr_active_requests per queue for when using shared
>      sbitmap
>    blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
>      shared sbitmap
>    null_blk: Support shared tag bitmap
>    scsi: core: Show nr_hw_queues in sysfs
>    scsi: hisi_sas: Switch v3 hw to MQ
>    scsi: scsi_debug: Support host tagset
> 
> Kashyap Desai (2):
>    blk-mq, elevator: Count requests per hctx to improve performance
>    scsi: megaraid_sas: Added support for shared host tagset for
>      cpuhotplug
> 
> Ming Lei (1):
>    blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
>   block/bfq-iosched.c                         |   9 +-
>   block/blk-core.c                            |   2 +
>   block/blk-mq-debugfs.c                      |  10 +-
>   block/blk-mq-sched.c                        |  13 +-
>   block/blk-mq-tag.c                          | 149 ++++++++++++++------
>   block/blk-mq-tag.h                          |  56 +++-----
>   block/blk-mq.c                              |  81 +++++++----
>   block/blk-mq.h                              |  76 +++++++++-
>   block/kyber-iosched.c                       |   4 +-
>   block/mq-deadline.c                         |   6 +
>   drivers/block/null_blk_main.c               |   6 +
>   drivers/block/rnbd/rnbd-clt.c               |   2 +-
>   drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
>   drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 ++---
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  87 +++++-------
>   drivers/scsi/hosts.c                        |   1 +
>   drivers/scsi/hpsa.c                         |  44 +-----
>   drivers/scsi/hpsa.h                         |   1 -
>   drivers/scsi/megaraid/megaraid_sas_base.c   |  39 +++++
>   drivers/scsi/megaraid/megaraid_sas_fusion.c |  29 ++--
>   drivers/scsi/scsi_debug.c                   |  28 ++--
>   drivers/scsi/scsi_lib.c                     |   2 +
>   drivers/scsi/scsi_sysfs.c                   |  11 ++
>   drivers/scsi/smartpqi/smartpqi_init.c       |  45 ++++--
>   include/linux/blk-mq.h                      |  13 +-
>   include/linux/blkdev.h                      |   3 +
>   include/scsi/scsi_host.h                    |   9 +-
>   27 files changed, 484 insertions(+), 281 deletions(-)
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (18 preceding siblings ...)
  2020-08-27  8:53 ` [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
@ 2020-09-03 19:28 ` Douglas Gilbert
  2020-09-03 21:23 ` Jens Axboe
  2020-09-08 12:46 ` Hannes Reinecke
  21 siblings, 0 replies; 58+ messages in thread
From: Douglas Gilbert @ 2020-09-03 19:28 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 2020-08-19 11:20 a.m., John Garry wrote:
> Hi all,
> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 		4000 (default)			260**
> 
> Baseline (v5.9-rc1):
> none sched:		2094K IOPS			513K
> mq-deadline sched:	2145K IOPS			1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:		2120K IOPS			550K
> mq-deadline sched:	2121K IOPS			1309K
> 
> Final ***:
> none sched:		2132K IOPS			1185			
> mq-deadline sched:	2145K IOPS			2097	
> 
> * this is relevant as this is the performance in supporting but not
>    enabling the feature
> ** depth=260 is relevant as the point where we are regularly waiting for
>     tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
>      scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!

I tested this v8 patchset on MKP's 5.10/scsi-queue branch together with
my rewritten sg driver on my laptop and a Ryzen 5 3600 machine. Since I
don't have same hardware, I use the scsi_debug driver as the target:

    modprobe scsi_debug dev_size_mb=1024 sector_size=512 add_host=7 \
        per_host_store=1 ndelay=1000 random=1 submit_queues=12

My test is a script which runs these three commands many times with
differing parameters:

sg_mrq_dd iflag=random bs=512 of=/dev/sg8 thr=64 time=2
time to transfer data was 0.312705 secs, 3433.72 MB/sec
2097152+0 records in
2097152+0 records out

sg_mrq_dd bpt=256 thr=64 mrq=36 time=2 if=/dev/sg8 bs=512 of=/dev/sg9
time to transfer data was 0.212090 secs, 5062.67 MB/sec
2097152+0 records in
2097152+0 records out

sg_mrq_dd --verify if=/dev/sg8 of=/dev/sg9 bs=512 bpt=256 thr=64 mrq=36 time=2
Doing verify/cmp rather than copy
time to transfer data was 0.184563 secs, 5817.75 MB/sec
2097152+0 records in
2097152+0 records verified

The above is the output from last section of the my script run on the Ryzen 5.

So the three steps are:
    1) produce random data on /dev/sg8
    2) copy /dev/sg8 to /dev/sg9
    3) verify /dev/sg8 and /dev/sg9 are the same.

The latter step is done with a sequence of READ(/dev/sg8) and
VERIFY(BYTCHK=1 on /dev/sg9). The "mrq" stands for multiple requests (in
one invocation); the bsg driver did that before its write(2) command was
removed.
The SCSI devices on the Ryzen 5 machine are:

# lsscsi -gs
[2:0:0:0]  disk    IBM-207x HUSMM8020ASS20   J4B6  /dev/sda   /dev/sg0   200GB
[2:0:1:0]  disk    SEAGATE  ST200FM0073      0007  /dev/sdb   /dev/sg1   200GB
[2:0:2:0]  enclosu Areca Te ARC-802801.37.69 0137  -          /dev/sg2       -
[3:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdc   /dev/sg3  1.07GB
[4:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdd   /dev/sg4  1.07GB
[5:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sde   /dev/sg5  1.07GB
[6:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdf   /dev/sg6  1.07GB
[7:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdg   /dev/sg7  1.07GB
[8:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdh   /dev/sg8  1.07GB
[9:0:0:0]  disk    Linux    scsi_debug       0190  /dev/sdi   /dev/sg9  1.07GB
[N:0:1:1]  disk    WDC WDS250G2B0C-00PXH0__1       /dev/nvme0n1  -      250GB

My script took 17m12s and the highest throughput (on a copy) was 7.5 GB/sec.
Then I reloaded the scsi_debug module, this time with an additional
'host_max_queue=128' parameter. The script run time was 5 seconds shorter
and the maximum throughput was around 7.6 GB/sec. [Average throughput is
around 4 GB/sec.]

For comparison:

# time liburing/examples/io_uring-cp /dev/sdh /dev/sdi
real	0m1.542s
user	0m0.004s
sys	0m1.027s

Umm, that's less than 1 GB/sec. In its defence io_uring-cp is an
extremely simple, single threaded, proof-of-concept copy program,
at least compared to sg_mrq_dd . As used by the sg_mrq_dd the
rewritten sg driver bypasses moving 1 GB to and from the _user_
space while doing the above copy and verify steps.

So:

Tested-by: Douglas Gilbert <dgilbert@interlog.com>

> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to be the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags to avoid per-hw queue tags init/free for hostwide tags
> - Add Don's reviewed-by and tested-by tags to the appropriate patches
> 	- (@Don, please let me know if there is an issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
> 
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t
> 
> Hannes Reinecke (5):
>    blk-mq: Rename blk_mq_update_tag_set_depth()
>    blk-mq: Free tags in blk_mq_init_tags() upon error
>    scsi: Add host and host template flag 'host_tagset'
>    hpsa: enable host_tagset and switch to MQ
>    smartpqi: enable host tagset
> 
> John Garry (10):
>    blk-mq: Pass flags for tag init/free
>    blk-mq: Use pointers for blk_mq_tags bitmap tags
>    blk-mq: Facilitate a shared sbitmap per tagset
>    blk-mq: Relocate hctx_may_queue()
>    blk-mq: Record nr_active_requests per queue for when using shared
>      sbitmap
>    blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
>      shared sbitmap
>    null_blk: Support shared tag bitmap
>    scsi: core: Show nr_hw_queues in sysfs
>    scsi: hisi_sas: Switch v3 hw to MQ
>    scsi: scsi_debug: Support host tagset
> 
> Kashyap Desai (2):
>    blk-mq, elevator: Count requests per hctx to improve performance
>    scsi: megaraid_sas: Added support for shared host tagset for
>      cpuhotplug
> 
> Ming Lei (1):
>    blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
>   block/bfq-iosched.c                         |   9 +-
>   block/blk-core.c                            |   2 +
>   block/blk-mq-debugfs.c                      |  10 +-
>   block/blk-mq-sched.c                        |  13 +-
>   block/blk-mq-tag.c                          | 149 ++++++++++++++------
>   block/blk-mq-tag.h                          |  56 +++-----
>   block/blk-mq.c                              |  81 +++++++----
>   block/blk-mq.h                              |  76 +++++++++-
>   block/kyber-iosched.c                       |   4 +-
>   block/mq-deadline.c                         |   6 +
>   drivers/block/null_blk_main.c               |   6 +
>   drivers/block/rnbd/rnbd-clt.c               |   2 +-
>   drivers/scsi/hisi_sas/hisi_sas.h            |   3 +-
>   drivers/scsi/hisi_sas/hisi_sas_main.c       |  36 ++---
>   drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      |  87 +++++-------
>   drivers/scsi/hosts.c                        |   1 +
>   drivers/scsi/hpsa.c                         |  44 +-----
>   drivers/scsi/hpsa.h                         |   1 -
>   drivers/scsi/megaraid/megaraid_sas_base.c   |  39 +++++
>   drivers/scsi/megaraid/megaraid_sas_fusion.c |  29 ++--
>   drivers/scsi/scsi_debug.c                   |  28 ++--
>   drivers/scsi/scsi_lib.c                     |   2 +
>   drivers/scsi/scsi_sysfs.c                   |  11 ++
>   drivers/scsi/smartpqi/smartpqi_init.c       |  45 ++++--
>   include/linux/blk-mq.h                      |  13 +-
>   include/linux/blkdev.h                      |   3 +
>   include/scsi/scsi_host.h                    |   9 +-
>   27 files changed, 484 insertions(+), 281 deletions(-)
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (19 preceding siblings ...)
  2020-09-03 19:28 ` Douglas Gilbert
@ 2020-09-03 21:23 ` Jens Axboe
  2020-09-04  9:09   ` John Garry
  2020-09-08 12:46 ` Hannes Reinecke
  21 siblings, 1 reply; 58+ messages in thread
From: Jens Axboe @ 2020-09-03 21:23 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 8/19/20 9:20 AM, John Garry wrote:
> Hi all,
> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 		4000 (default)			260**
> 
> Baseline (v5.9-rc1):
> none sched:		2094K IOPS			513K
> mq-deadline sched:	2145K IOPS			1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:		2120K IOPS			550K
> mq-deadline sched:	2121K IOPS			1309K
> 
> Final ***:
> none sched:		2132K IOPS			1185			
> mq-deadline sched:	2145K IOPS			2097	
> 
> * this is relevant as this is the performance in supporting but not
>   enabling the feature
> ** depth=260 is relevant as the point where we are regularly waiting for
>    tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
>     scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!

I applied 1-11, leaving the SCSI core bits and drivers to Martin. I can
also carry them, just let me know.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-03 21:23 ` Jens Axboe
@ 2020-09-04  9:09   ` John Garry
  2020-09-04 12:44     ` Martin K. Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-09-04  9:09 UTC (permalink / raw)
  To: Jens Axboe, jejb, martin.petersen, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 03/09/2020 22:23, Jens Axboe wrote:
> On 8/19/20 9:20 AM, John Garry wrote:
>> Hi all,
>>
>> Here is v8 of the patchset.
>>
>> In this version of the series, we keep the shared sbitmap for driver tags,
>> and introduce changes to fix up the tag budgeting across request queues.
>> We also have a change to count requests per-hctx for when an elevator is
>> enabled, as an optimisation. I also dropped the debugfs changes - more on
>> that below.
>>
>> Some performance figures:
>>
>> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
>> but it is not always an appropriate scheduler to use.
>>
>> Tag depth 		4000 (default)			260**
>>
>> Baseline (v5.9-rc1):
>> none sched:		2094K IOPS			513K
>> mq-deadline sched:	2145K IOPS			1336K
>>
>> Final, host_tagset=0 in LLDD *, ***:
>> none sched:		2120K IOPS			550K
>> mq-deadline sched:	2121K IOPS			1309K
>>
>> Final ***:
>> none sched:		2132K IOPS			1185			
>> mq-deadline sched:	2145K IOPS			2097	
>>
>> * this is relevant as this is the performance in supporting but not
>>    enabling the feature
>> ** depth=260 is relevant as the point where we are regularly waiting for
>>     tags to be available. Figures were a bit unstable here.
>> *** Included "[PATCH V4] scsi: core: only re-run queue in
>>      scsi_end_request() if device queue is busy"
>>
>> A copy of the patches can be found here:
>> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
>>
>> The hpsa patch depends on:
>> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
>>
>> And the smartpqi patch is not to be accepted.
>>
>> Comments (and testing) welcome, thanks!
> 
> I applied 1-11, leaving the SCSI core bits and drivers to Martin. I can
> also carry them, just let me know.
> 

Great, thanks!

The SCSI parts depend on the block parts for building, so I guess it 
makes sense if you could carry them also.

hpsa and smartpqi patches are pending for now, but the rest could be 
picked up. Martin/James may want more review of the SCSI core bits, though.

Thanks again,
John



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-04  9:09   ` John Garry
@ 2020-09-04 12:44     ` Martin K. Petersen
  2020-09-16  7:21       ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Martin K. Petersen @ 2020-09-04 12:44 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, jejb, martin.petersen, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch,
	sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing


John,

> Martin/James may want more review of the SCSI core bits, though.

I'll take a look later today.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (20 preceding siblings ...)
  2020-09-03 21:23 ` Jens Axboe
@ 2020-09-08 12:46 ` Hannes Reinecke
  2020-09-08 13:38   ` John Garry
  21 siblings, 1 reply; 58+ messages in thread
From: Hannes Reinecke @ 2020-09-08 12:46 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 8/19/20 5:20 PM, John Garry wrote:
> Hi all,
> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 		4000 (default)			260**
> 
> Baseline (v5.9-rc1):
> none sched:		2094K IOPS			513K
> mq-deadline sched:	2145K IOPS			1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:		2120K IOPS			550K
> mq-deadline sched:	2121K IOPS			1309K
> 
> Final ***:
> none sched:		2132K IOPS			1185			
> mq-deadline sched:	2145K IOPS			2097	
> 
> * this is relevant as this is the performance in supporting but not
>   enabling the feature
> ** depth=260 is relevant as a point where we are regularly waiting for
>    tags to be available. Figures were a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
>     scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-hare@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!
> 
> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to keep it the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags to avoid per-hw-queue tags init/free for hostwide tags
> - Add Don's Reviewed-by and Tested-by tags to appropriate patches
> 	- (@Don, please let me know if there's an issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
> 
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.garry@huawei.com/T/#t
> 
> Hannes Reinecke (5):
>   blk-mq: Rename blk_mq_update_tag_set_depth()
>   blk-mq: Free tags in blk_mq_init_tags() upon error
>   scsi: Add host and host template flag 'host_tagset'
>   hpsa: enable host_tagset and switch to MQ
>   smartpqi: enable host tagset
> 
> John Garry (10):
>   blk-mq: Pass flags for tag init/free
>   blk-mq: Use pointers for blk_mq_tags bitmap tags
>   blk-mq: Facilitate a shared sbitmap per tagset
>   blk-mq: Relocate hctx_may_queue()
>   blk-mq: Record nr_active_requests per queue for when using shared
>     sbitmap
>   blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
>     shared sbitmap
>   null_blk: Support shared tag bitmap
>   scsi: core: Show nr_hw_queues in sysfs
>   scsi: hisi_sas: Switch v3 hw to MQ
>   scsi: scsi_debug: Support host tagset
> 
> Kashyap Desai (2):
>   blk-mq, elevator: Count requests per hctx to improve performance
>   scsi: megaraid_sas: Added support for shared host tagset for
>     cpuhotplug
> 
> Ming Lei (1):
>   blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
Now that Jens merged the block bits in his tree, wouldn't it be better
to re-send the SCSI bits only, thereby avoiding a potential merge error
later on?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-08 12:46 ` Hannes Reinecke
@ 2020-09-08 13:38   ` John Garry
  0 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-09-08 13:38 UTC (permalink / raw)
  To: Hannes Reinecke, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 08/09/2020 13:46, Hannes Reinecke wrote:
> Now that Jens merged the block bits in his tree, wouldn't it be better
> to re-send the SCSI bits only, thereby avoiding a potential merge error
> later on?
> 

Anything which I resend would need to be against Jens' tree (and not 
Martin's), assuming Jens will carry them also. So I am not sure how that 
will help.

JFYI, I just tested against today's linux-next, and the SCSI parts (hpsa 
and smartpqi omitted) still apply there without conflict.

Thanks,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs
  2020-08-19 15:20 ` [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs John Garry
@ 2020-09-10  8:33   ` John Garry
  0 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-09-10  8:33 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen, don.brace, kashyap.desai, ming.lei,
	bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing

On 19/08/2020 16:20, John Garry wrote:
> So that we don't show a value of 0 when Scsi_Host.nr_hw_queues is unset,
> use tag_set->nr_hw_queues (which holds 1 in this case).
> 
> Signed-off-by: John Garry <john.garry@huawei.com>

Note that there has been no review on this patch yet.

It's not strictly necessary, but I see it as useful. The same info can 
be derived indirectly from block debugfs, but that's not always 
available (debugfs, that is).

Thanks,
john

> ---
>   drivers/scsi/scsi_sysfs.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 163dbcb741c1..d6e344fa33ad 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -393,6 +393,16 @@ show_use_blk_mq(struct device *dev, struct device_attribute *attr, char *buf)
>   }
>   static DEVICE_ATTR(use_blk_mq, S_IRUGO, show_use_blk_mq, NULL);
>   
> +static ssize_t
> +show_nr_hw_queues(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	struct Scsi_Host *shost = class_to_shost(dev);
> +	struct blk_mq_tag_set *tag_set = &shost->tag_set;
> +
> +	return snprintf(buf, 20, "%d\n", tag_set->nr_hw_queues);
> +}
> +static DEVICE_ATTR(nr_hw_queues, S_IRUGO, show_nr_hw_queues, NULL);
> +
>   static struct attribute *scsi_sysfs_shost_attrs[] = {
>   	&dev_attr_use_blk_mq.attr,
>   	&dev_attr_unique_id.attr,
> @@ -411,6 +421,7 @@ static struct attribute *scsi_sysfs_shost_attrs[] = {
>   	&dev_attr_prot_guard_type.attr,
>   	&dev_attr_host_reset.attr,
>   	&dev_attr_eh_deadline.attr,
> +	&dev_attr_nr_hw_queues.attr,
>   	NULL
>   };
>   
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-04 12:44     ` Martin K. Petersen
@ 2020-09-16  7:21       ` John Garry
  2020-09-17  1:10         ` Martin K. Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-09-16  7:21 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, jejb, don.brace, kashyap.desai, ming.lei, bvanassche,
	dgilbert, paolo.valente, hare, hch, sumit.saxena, linux-block,
	linux-kernel, linux-scsi, esc.storagedev, megaraidlinux.pdl,
	chenxiang66, luojiaxing

On 04/09/2020 13:44, Martin K. Petersen wrote:
>> Martin/James may want more review of the SCSI core bits, though.
> I'll take a look later today.


Hi Martin,

Have you had a chance to check these outstanding SCSI patches?

scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
scsi: scsi_debug: Support host tagset
scsi: hisi_sas: Switch v3 hw to MQ
scsi: core: Show nr_hw_queues in sysfs
scsi: Add host and host template flag 'host_tagset'

Cheers,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-16  7:21       ` John Garry
@ 2020-09-17  1:10         ` Martin K. Petersen
  2020-09-17  6:48           ` John Garry
  2020-09-28 16:11           ` Kashyap Desai
  0 siblings, 2 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-09-17  1:10 UTC (permalink / raw)
  To: John Garry
  Cc: Martin K. Petersen, Jens Axboe, jejb, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch,
	sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing


John,

> Have you had a chance to check these outstanding SCSI patches?
>
> scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
> scsi: scsi_debug: Support host tagset
> scsi: hisi_sas: Switch v3 hw to MQ
> scsi: core: Show nr_hw_queues in sysfs
> scsi: Add host and host template flag 'host_tagset'

These look good to me.

Jens, feel free to merge.

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-17  1:10         ` Martin K. Petersen
@ 2020-09-17  6:48           ` John Garry
  2020-09-21 21:35             ` Don.Brace
  2020-09-28 16:11           ` Kashyap Desai
  1 sibling, 1 reply; 58+ messages in thread
From: John Garry @ 2020-09-17  6:48 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jens Axboe, jejb, don.brace, kashyap.desai, ming.lei, bvanassche,
	dgilbert, paolo.valente, hare, hch, sumit.saxena, linux-block,
	linux-kernel, linux-scsi, esc.storagedev, megaraidlinux.pdl,
	chenxiang66, luojiaxing

> 
>> Have you had a chance to check these outstanding SCSI patches?
>>
>> scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
>> scsi: scsi_debug: Support host tagset
>> scsi: hisi_sas: Switch v3 hw to MQ
>> scsi: core: Show nr_hw_queues in sysfs
>> scsi: Add host and host template flag 'host_tagset'
> 
> These look good to me.
> 
> Jens, feel free to merge.

Hi Jens,

I'm waiting on the hpsa and smartpqi patches update, so please kindly 
merge only those patches, above.

Thanks!

> 
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
> 

Cheers Martin


^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-17  6:48           ` John Garry
@ 2020-09-21 21:35             ` Don.Brace
  2020-09-21 22:15               ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Don.Brace @ 2020-09-21 21:35 UTC (permalink / raw)
  To: john.garry, martin.petersen
  Cc: axboe, jejb, don.brace, kashyap.desai, ming.lei, bvanassche,
	dgilbert, paolo.valente, hare, hch, sumit.saxena, linux-block,
	linux-kernel, linux-scsi, esc.storagedev, megaraidlinux.pdl,
	chenxiang66, luojiaxing

Subject: Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs

>>Hi Jens,

>>I'm waiting on the hpsa and smartpqi patches update, so please kindly
>>merge only those patches, above.

>>Thanks!

John, the hpsa driver crashes; one or more patches to allow internal commands from Hannes seem to be missing.

I'll let you know exactly which ones soon.

Thanks,
Don

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-21 21:35             ` Don.Brace
@ 2020-09-21 22:15               ` John Garry
  2020-09-22  9:03                 ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-09-21 22:15 UTC (permalink / raw)
  To: Don.Brace, martin.petersen
  Cc: axboe, jejb, don.brace, kashyap.desai, ming.lei, bvanassche,
	dgilbert, paolo.valente, hare, hch, sumit.saxena, linux-block,
	linux-kernel, linux-scsi, esc.storagedev, megaraidlinux.pdl,
	chenxiang66, luojiaxing

On 21/09/2020 22:35, Don.Brace@microchip.com wrote:
>>> I'm waiting on the hpsa and smartpqi patches update, so please kindly merge only those patches, above.
>>> Thanks!
> John, the hpsa driver crashes; one or more patches to allow internal commands from Hannes seem to be missing.
> 
> I'll let you know exactly which ones soon.
> 

Hi Don,

Right, that branch did not include Hannes patches as they did not apply 
cleanly, but I think that you need the same patches as before. I can 
create a branch for you to test which does include those tomorrow - let 
me know.

Alternatively I think that we could create a hpsa patch which does not 
rely on that series, like I mentioned here [0], but it would not be as 
clean.

Cheers,
John

[0] https://lore.kernel.org/linux-scsi/dc0e72d8-7076-060c-3cd3-3d51ac7e6de8@huawei.com/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-21 22:15               ` John Garry
@ 2020-09-22  9:03                 ` John Garry
  0 siblings, 0 replies; 58+ messages in thread
From: John Garry @ 2020-09-22  9:03 UTC (permalink / raw)
  To: Don.Brace, martin.petersen
  Cc: axboe, jejb, don.brace, kashyap.desai, ming.lei, bvanassche,
	dgilbert, paolo.valente, hare, hch, sumit.saxena, linux-block,
	linux-kernel, linux-scsi, esc.storagedev, megaraidlinux.pdl,
	chenxiang66, luojiaxing

On 21/09/2020 23:15, John Garry wrote:
> On 21/09/2020 22:35, Don.Brace@microchip.com wrote:
>>>> I'm waiting on the hpsa and smartpqi patches update, so please
>>>> kindly merge only those patches, above.
>>>> Thanks!
>> John, the hpsa driver crashes; one or more patches to allow internal
>> commands from Hannes seem to be missing.
>>
>> I'll let you know exactly which ones soon.
>>
> 
> Hi Don,
> 
> Right, that branch did not include Hannes patches as they did not apply 
> cleanly, but I think that you need the same patches as before. I can 
> create a branch for you to test which does include those tomorrow - let 
> me know.
> 
> Alternatively I think that we could create a hpsa patch which does not 
> rely on that series, like I mentioned here [0], but it would not be as 
> clean.
> 
> Cheers,
> John
> 
> https://lore.kernel.org/linux-scsi/dc0e72d8-7076-060c-3cd3-3d51ac7e6de8@huawei.com/ 
> 
> .

JFYI, I put the reserved command v6 patchset and $subject patchset on 
this branch:

https://github.com/hisilicon/kernel-dev/commits/private-topic-blk-mq-shared-tags-rfc-v8-resv-v6-baseline

I don't have HW to test hpsa or smartpqi.

Thanks,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-17  1:10         ` Martin K. Petersen
  2020-09-17  6:48           ` John Garry
@ 2020-09-28 16:11           ` Kashyap Desai
  2020-10-06 14:24             ` John Garry
  1 sibling, 1 reply; 58+ messages in thread
From: Kashyap Desai @ 2020-09-28 16:11 UTC (permalink / raw)
  To: Martin K. Petersen, John Garry, Jens Axboe
  Cc: jejb, don.brace, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch, Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing

>
> John,
>
> > Have you had a chance to check these outstanding SCSI patches?
> >
> > scsi: megaraid_sas: Added support for shared host tagset for
> > cpuhotplug
> > scsi: scsi_debug: Support host tagset
> > scsi: hisi_sas: Switch v3 hw to MQ
> > scsi: core: Show nr_hw_queues in sysfs
> > scsi: Add host and host template flag 'host_tagset'
>
> These look good to me.
>
> Jens, feel free to merge.

Hi Jens, gentle ping. I am not able to find commits for the above listed scsi
patches. I want to use your repo, which has the above mentioned patches, for
the <scsi:io_uring iopoll> patch submission. Martin has Acked the scsi
patches.

>
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> --
> Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-09-28 16:11           ` Kashyap Desai
@ 2020-10-06 14:24             ` John Garry
  2020-10-06 14:42               ` Jens Axboe
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-10-06 14:24 UTC (permalink / raw)
  To: Kashyap Desai, Martin K. Petersen, Jens Axboe
  Cc: jejb, don.brace, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch, Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing

On 28/09/2020 17:11, Kashyap Desai wrote:
>>
>> John,
>>
>>> Have you had a chance to check these outstanding SCSI patches?
>>>
>>> scsi: megaraid_sas: Added support for shared host tagset for
>>> cpuhotplug
>>> scsi: scsi_debug: Support host tagset
>>> scsi: hisi_sas: Switch v3 hw to MQ
>>> scsi: core: Show nr_hw_queues in sysfs
>>> scsi: Add host and host template flag 'host_tagset'
>>
>> These look good to me.
>>
>> Jens, feel free to merge.
> 
> Hi Jens, gentle ping. I am not able to find commits for the above listed scsi
> patches. I want to use your repo, which has the above mentioned patches, for
> the <scsi:io_uring iopoll> patch submission. Martin has Acked the scsi
> patches.
> 
>>
>> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>>
>> --
>> Martin K. Petersen	Oracle Linux Engineering


Hi Jens,

Could you kindly pick up the following patches, to go along with the 
blk-mq changes:

scsi: megaraid_sas: Added support for shared host tagset for
cpuhotplug
scsi: scsi_debug: Support host tagset
scsi: hisi_sas: Switch v3 hw to MQ
scsi: core: Show nr_hw_queues in sysfs
scsi: Add host and host template flag 'host_tagset'

Thanks,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
  2020-10-06 14:24             ` John Garry
@ 2020-10-06 14:42               ` Jens Axboe
  0 siblings, 0 replies; 58+ messages in thread
From: Jens Axboe @ 2020-10-06 14:42 UTC (permalink / raw)
  To: John Garry, Kashyap Desai, Martin K. Petersen
  Cc: jejb, don.brace, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch, Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing

On 10/6/20 8:24 AM, John Garry wrote:
> On 28/09/2020 17:11, Kashyap Desai wrote:
>>>
>>> John,
>>>
>>>> Have you had a chance to check these outstanding SCSI patches?
>>>>
>>>> scsi: megaraid_sas: Added support for shared host tagset for
>>>> cpuhotplug
>>>> scsi: scsi_debug: Support host tagset
>>>> scsi: hisi_sas: Switch v3 hw to MQ
>>>> scsi: core: Show nr_hw_queues in sysfs
>>>> scsi: Add host and host template flag 'host_tagset'
>>>
>>> These look good to me.
>>>
>>> Jens, feel free to merge.
>>
>> Hi Jens, gentle ping. I am not able to find commits for the above listed scsi
>> patches. I want to use your repo, which has the above mentioned patches, for
>> the <scsi:io_uring iopoll> patch submission. Martin has Acked the scsi
>> patches.
>>
>>>
>>> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>>>
>>> --
>>> Martin K. Petersen	Oracle Linux Engineering
> 
> 
> Hi Jens,
> 
> Could you kindly pick up the following patches, to go along with the 
> blk-mq changes:
> 
> scsi: megaraid_sas: Added support for shared host tagset for
> cpuhotplug
> scsi: scsi_debug: Support host tagset
> scsi: hisi_sas: Switch v3 hw to MQ
> scsi: core: Show nr_hw_queues in sysfs
> scsi: Add host and host template flag 'host_tagset'

Sorry about the delay, picked up these 5 patches.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-08-19 15:20 ` [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug John Garry
@ 2020-11-02 14:17   ` Qian Cai
  2020-11-02 14:31     ` Kashyap Desai
  2020-11-02 14:51     ` John Garry
  0 siblings, 2 replies; 58+ messages in thread
From: Qian Cai @ 2020-11-02 14:17 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> From: Kashyap Desai <kashyap.desai@broadcom.com>
> 
> Fusion adapters can steer completions to individual queues, and
> we now have support for shared host-wide tags.
> So we can enable multiqueue support for fusion adapters.
> 
> Once the driver enables shared host-wide tags, the CPU hotplug feature is
> also supported, as enabled by the patchset below -
> commit bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> offline")
> 
> Currently the driver has a provision to disable host-wide tags using the
> "host_tagset_enable" module parameter.
> 
> Once we do not have any major performance regression using host-wide
> tags, we will drop the hand-crafted interrupt affinity settings.
> 
> Performance is also meeting the expectation (used both the none and
> mq-deadline schedulers) -
> 24 Drive SSD on Aero with/without this patch can get 3.1M IOPs
> 3 VDs consist of 8 SAS SSD on Aero with/without this patch can get 3.1M
> IOPs.
> 
> Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> Signed-off-by: John Garry <john.garry@huawei.com>

Reverting this commit fixed an issue where a Dell PowerEdge R6415 server with
megaraid_sas is unable to boot.

c1:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
	DeviceName: Integrated RAID
	Subsystem: Dell PERC H730P Mini
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 48
	NUMA node: 3
	Region 0: I/O ports at c000 [size=256]
	Region 1: Memory at a5500000 (64-bit, non-prefetchable) [size=64K]
	Region 3: Memory at a5400000 (64-bit, non-prefetchable) [size=1M]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [68] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 512 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x8 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range BC, TimeoutDis+, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [c0] MSI-X: Enable+ Count=97 Masked-
		Vector table: BAR=1 offset=0000e000
		PBA: BAR=1 offset=0000f000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 04000001 c000000f c1080000 4ba9007a
	Capabilities: [1e0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
		LaneErrStat: 0
	Capabilities: [1c0 v1] Power Budgeting <?>
	Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Kernel driver in use: megaraid_sas
	Kernel modules: megaraid_sas

[   26.330282][  T567] megasas: 07.714.04.00-rc1
[   26.355663][  T611] ahci 0000:87:00.2: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[   26.364585][  T611] ahci 0000:87:00.2: flags: 64bit ncq sntf ilck pm led clo only pmp fbs pio slum part 
[   26.376125][  T289] megaraid_sas 0000:c1:00.0: FW now in Ready state
[   26.382534][  T289] megaraid_sas 0000:c1:00.0: 63 bit DMA mask and 32 bit consistent mask
[   26.391537][  T289] megaraid_sas 0000:c1:00.0: firmware supports msix	: (96)
[   26.431767][  T611] scsi host1: ahci
[   26.492580][  T611] ata1: SATA max UDMA/133 abar m4096@0xc0a02000 port 0xc0a02100 irq 60
[   26.701197][  T283] bnxt_en 0000:84:00.0 eth0: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem ad210000, node addr 4c:d9:8f:4a:20:e6
[   26.714352][  T283] bnxt_en 0000:84:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[   26.743738][   T24] tg3 0000:81:00.0 eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:65:3f:32
[   26.754974][   T24] tg3 0000:81:00.0 eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   26.765523][   T24] tg3 0000:81:00.0 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   26.774074][   T24] tg3 0000:81:00.0 eth1: dma_rwctrl[00000001] dma_mask[64-bit]
[   26.842518][  T620] ata1: SATA link down (SStatus 0 SControl 300)
[   26.945741][  T289] megaraid_sas 0000:c1:00.0: requested/available msix 49/49
[   26.952912][  T289] megaraid_sas 0000:c1:00.0: current msix/online cpus	: (49/48)
[   26.960401][  T289] megaraid_sas 0000:c1:00.0: RDPQ mode	: (disabled)
[   26.966876][  T289] megaraid_sas 0000:c1:00.0: Current firmware supports maximum commands: 928	 LDIO threshold: 0
[   27.079361][  T289] megaraid_sas 0000:c1:00.0: Performance mode :Latency (latency index = 1)
[   27.085381][  T283] bnxt_en 0000:84:00.1 eth2: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet found at mem ad200000, node addr 4c:d9:8f:4a:20:e7
[   27.087824][  T289] megaraid_sas 0000:c1:00.0: FW supports sync cache	: No
[   27.100959][  T283] bnxt_en 0000:84:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[   27.107835][  T289] megaraid_sas 0000:c1:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[   27.130978][   T24] tg3 0000:81:00.1 eth3: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 4c:d9:8f:65:3f:33
[   27.142919][   T24] tg3 0000:81:00.1 eth3: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[   27.146042][  T571] bnxt_en 0000:84:00.1 enp132s0f1np1: renamed from eth2
[   27.153456][   T24] tg3 0000:81:00.1 eth3: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[   27.153467][   T24] tg3 0000:81:00.1 eth3: dma_rwctrl[00000001] dma_mask[64-bit]
[   27.200900][  T289] megaraid_sas 0000:c1:00.0: FW provided supportMaxExtLDs: 1	max_lds: 64
[   27.209174][  T289] megaraid_sas 0000:c1:00.0: controller type	: MR(2048MB)
[   27.216260][  T289] megaraid_sas 0000:c1:00.0: Online Controller Reset(OCR)	: Enabled
[   27.224105][  T289] megaraid_sas 0000:c1:00.0: Secure JBOD support	: No
[   27.230720][  T289] megaraid_sas 0000:c1:00.0: NVMe passthru support	: No
[   27.237527][  T289] megaraid_sas 0000:c1:00.0: FW provided TM TaskAbort/Reset timeout	: 0 secs/0 secs
[   27.246754][  T289] megaraid_sas 0000:c1:00.0: JBOD sequence map support	: No
[   27.253906][  T289] megaraid_sas 0000:c1:00.0: PCI Lane Margining support	: No
[   27.341447][  T289] megaraid_sas 0000:c1:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[   27.351729][  T289] megaraid_sas 0000:c1:00.0: INIT adapter done
[   27.357742][  T289] megaraid_sas 0000:c1:00.0: JBOD sequence map is disabled megasas_setup_jbod_map 5709
[   27.367832][  T289] megaraid_sas 0000:c1:00.0: pci id		: (0x1000)/(0x005d)/(0x1028)/(0x1f47)
[   27.376287][  T289] megaraid_sas 0000:c1:00.0: unevenspan support	: yes
[   27.382925][  T289] megaraid_sas 0000:c1:00.0: firmware crash dump	: no
[   27.389547][  T289] megaraid_sas 0000:c1:00.0: JBOD sequence map	: disabled
[   27.397816][  T289] megaraid_sas 0000:c1:00.0: Max firmware commands: 927 shared with nr_hw_queues = 48
[   27.407232][  T289] scsi host0: Avago SAS based MegaRAID driver
[   27.430212][  T586] bnxt_en 0000:84:00.0 enp132s0f0np0: renamed from eth0
[   27.781038][  T603] tg3 0000:81:00.0 eno1: renamed from eth1
[   28.194046][  T552] tg3 0000:81:00.1 eno2: renamed from eth3

[  251.961152][  T330] INFO: task systemd-udevd:567 blocked for more than 122 seconds.
[  251.968876][  T330]       Not tainted 5.10.0-rc1-next-20201102 #1
[  251.975003][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  251.983546][  T330] task:systemd-udevd   state:D stack:27224 pid:  567 ppid:   506 flags:0x00004324
[  251.992620][  T330] Call Trace:
[  251.995784][  T330]  __schedule+0x71d/0x1b60
[  252.000067][  T330]  ? __sched_text_start+0x8/0x8
[  252.004798][  T330]  schedule+0xbf/0x270
[  252.008735][  T330]  schedule_timeout+0x3fc/0x590
[  252.013464][  T330]  ? usleep_range+0x120/0x120
[  252.018008][  T330]  ? wait_for_completion+0x156/0x250
[  252.023176][  T330]  ? lock_downgrade+0x700/0x700
[  252.027886][  T330]  ? rcu_read_unlock+0x40/0x40
[  252.032530][  T330]  ? do_raw_spin_lock+0x121/0x290
[  252.037412][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[  252.043268][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
[  252.048331][  T330]  wait_for_completion+0x15e/0x250
[  252.053323][  T330]  ? wait_for_completion_interruptible+0x320/0x320
[  252.059687][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[  252.065543][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
[  252.070606][  T330]  __flush_work+0x42a/0x900
[  252.074989][  T330]  ? queue_delayed_work_on+0x90/0x90
[  252.080139][  T330]  ? __queue_work+0x463/0xf40
[  252.084700][  T330]  ? init_pwq+0x320/0x320
[  252.088891][  T330]  ? queue_work_on+0x5e/0x80
[  252.093364][  T330]  ? trace_hardirqs_on+0x1c/0x150
[  252.098255][  T330]  work_on_cpu+0xe7/0x130
[  252.102461][  T330]  ? flush_delayed_work+0xc0/0xc0
[  252.107342][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
[  252.112764][  T330]  ? work_debug_hint+0x30/0x30
[  252.117391][  T330]  ? pci_device_shutdown+0x80/0x80
[  252.122378][  T330]  ? cpumask_next_and+0x57/0x80
[  252.127094][  T330]  pci_device_probe+0x500/0x5c0
[  252.131824][  T330]  ? pci_device_remove+0x1f0/0x1f0
[  252.136805][  T330]  really_probe+0x207/0xad0
[  252.141191][  T330]  ? device_driver_attach+0x120/0x120
[  252.146428][  T330]  driver_probe_device+0x1f1/0x370
[  252.151424][  T330]  device_driver_attach+0xe5/0x120
[  252.156399][  T330]  __driver_attach+0xf0/0x260
[  252.160953][  T330]  bus_for_each_dev+0x117/0x1a0
[  252.165669][  T330]  ? subsys_dev_iter_exit+0x10/0x10
[  252.170731][  T330]  bus_add_driver+0x399/0x560
[  252.175289][  T330]  driver_register+0x189/0x310
[  252.179919][  T330]  ? 0xffffffffc05c1000
[  252.183960][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
[  252.189713][  T330]  do_one_initcall+0xf6/0x510
[  252.194267][  T330]  ? perf_trace_initcall_level+0x490/0x490
[  252.199940][  T330]  ? kasan_unpoison_shadow+0x30/0x40
[  252.205104][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
[  252.210859][  T330]  ? do_init_module+0x49/0x6c0
[  252.215500][  T330]  ? kmem_cache_alloc_trace+0x11f/0x1e0
[  252.220925][  T330]  ? kasan_unpoison_shadow+0x30/0x40
[  252.226068][  T330]  do_init_module+0x1ed/0x6c0
[  252.230608][  T330]  load_module+0x4a59/0x5d20
[  252.235081][  T330]  ? layout_and_allocate+0x2770/0x2770
[  252.240404][  T330]  ? __vmalloc_node+0x8d/0x100
[  252.245046][  T330]  ? kernel_read_file+0x485/0x5a0
[  252.249934][  T330]  ? kernel_read_file+0x305/0x5a0
[  252.254839][  T330]  ? __x64_sys_fsconfig+0x970/0x970
[  252.259903][  T330]  ? __do_sys_finit_module+0xff/0x180
[  252.265153][  T330]  __do_sys_finit_module+0xff/0x180
[  252.270216][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
[  252.275465][  T330]  ? __fget_files+0x1c3/0x2e0
[  252.280010][  T330]  do_syscall_64+0x33/0x40
[  252.284304][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  252.290054][  T330] RIP: 0033:0x7fbb3e2fa78d
[  252.294348][  T330] Code: Unable to access opcode bytes at RIP 0x7fbb3e2fa763.
[  252.301584][  T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  252.309855][  T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX: 00007fbb3e2fa78d
[  252.317703][  T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI: 0000000000000006
[  252.325553][  T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09: 00007ffe572e8e40
[  252.333402][  T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[  252.341257][  T330] R13: 000055c7795930e0 R14: 0000000000020000 R15: 0000000000000000
[  252.349117][  T330] 
[  252.349117][  T330] Showing all locks held in the system:
[  252.356770][  T330] 3 locks held by kworker/3:1/289:
[  252.361759][  T330]  #0: ffff8881001eb938 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ec/0x1610
[  252.371976][  T330]  #1: ffffc90004ee7e00 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610
[  252.382803][  T330]  #2: ffff8881430380e0 (&shost->scan_mutex){+.+.}-{3:3}, at: scsi_scan_host_selected+0xde/0x260
[  252.393199][  T330] 1 lock held by khungtaskd/330:
[  252.397993][  T330]  #0: ffffffff9d4d3760 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.52+0x0/0x30
[  252.408296][  T330] 1 lock held by systemd-journal/420:
[  252.413562][  T330] 1 lock held by systemd-udevd/567:
[  252.418619][  T330]  #0: ffff8881207ac218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x37/0x120
[  252.428159][  T330] 
[  252.430355][  T330] =============================================
[  252.430355][  T330] 

> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 39 +++++++++++++++++++++
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 29 ++++++++-------
>  2 files changed, 55 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 861f7140f52e..6960922d0d7f 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -37,6 +37,7 @@
>  #include <linux/poll.h>
>  #include <linux/vmalloc.h>
>  #include <linux/irq_poll.h>
> +#include <linux/blk-mq-pci.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -113,6 +114,10 @@ unsigned int enable_sdev_max_qd;
>  module_param(enable_sdev_max_qd, int, 0444);
>  MODULE_PARM_DESC(enable_sdev_max_qd, "Enable sdev max qd as can_queue. Default: 0");
>  
> +int host_tagset_enable = 1;
> +module_param(host_tagset_enable, int, 0444);
> +MODULE_PARM_DESC(host_tagset_enable, "Shared host tagset enable/disable Default: enable(1)");
> +
>  MODULE_LICENSE("GPL");
>  MODULE_VERSION(MEGASAS_VERSION);
>  MODULE_AUTHOR("megaraidlinux.pdl@broadcom.com");
> @@ -3119,6 +3124,19 @@ megasas_bios_param(struct scsi_device *sdev, struct block_device *bdev,
>  	return 0;
>  }
>  
> +static int megasas_map_queues(struct Scsi_Host *shost)
> +{
> +	struct megasas_instance *instance;
> +
> +	instance = (struct megasas_instance *)shost->hostdata;
> +
> +	if (shost->nr_hw_queues == 1)
> +		return 0;
> +
> +	return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
> +			instance->pdev, instance->low_latency_index_start);
> +}
> +
>  static void megasas_aen_polling(struct work_struct *work);
>  
>  /**
> @@ -3427,6 +3445,7 @@ static struct scsi_host_template megasas_template = {
>  	.eh_timed_out = megasas_reset_timer,
>  	.shost_attrs = megaraid_host_attrs,
>  	.bios_param = megasas_bios_param,
> +	.map_queues = megasas_map_queues,
>  	.change_queue_depth = scsi_change_queue_depth,
>  	.max_segment_size = 0xffffffff,
>  };
> @@ -6808,6 +6827,26 @@ static int megasas_io_attach(struct megasas_instance *instance)
>  	host->max_lun = MEGASAS_MAX_LUN;
>  	host->max_cmd_len = 16;
>  
> +	/* Use shared host tagset only for fusion adaptors
> +	 * if there are managed interrupts (smp affinity enabled case).
> +	 * Single msix_vectors in kdump, so shared host tag is also disabled.
> +	 */
> +
> +	host->host_tagset = 0;
> +	host->nr_hw_queues = 1;
> +
> +	if ((instance->adapter_type != MFI_SERIES) &&
> +		(instance->msix_vectors > instance->low_latency_index_start) &&
> +		host_tagset_enable &&
> +		instance->smp_affinity_enable) {
> +		host->host_tagset = 1;
> +		host->nr_hw_queues = instance->msix_vectors -
> +			instance->low_latency_index_start;
> +	}
> +
> +	dev_info(&instance->pdev->dev,
> +		"Max firmware commands: %d shared with nr_hw_queues = %d\n",
> +		instance->max_fw_cmds, host->nr_hw_queues);
>  	/*
>  	 * Notify the mid-layer about the new controller
>  	 */
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 0824410f78f8..a4251121f173 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -359,24 +359,29 @@ megasas_get_msix_index(struct megasas_instance *instance,
>  {
>  	int sdev_busy;
>  
> -	/* nr_hw_queue = 1 for MegaRAID */
> -	struct blk_mq_hw_ctx *hctx =
> -		scmd->device->request_queue->queue_hw_ctx[0];
> -
> -	sdev_busy = atomic_read(&hctx->nr_active);
> +	/* TBD - if sml remove device_busy in future, driver
> +	 * should track counter in internal structure.
> +	 */
> +	sdev_busy = atomic_read(&scmd->device->device_busy);
>  
>  	if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
> -	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
> +	    sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH)) {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			mega_mod64((atomic64_add_return(1, &instance->high_iops_outstanding) /
>  					MR_HIGH_IOPS_BATCH_COUNT), instance->low_latency_index_start);
> -	else if (instance->msix_load_balance)
> +	} else if (instance->msix_load_balance) {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			(mega_mod64(atomic64_add_return(1, &instance->total_io_count),
>  				instance->msix_vectors));
> -	else
> +	} else if (instance->host->nr_hw_queues > 1) {
> +		u32 tag = blk_mq_unique_tag(scmd->request);
> +
> +		cmd->request_desc->SCSIIO.MSIxIndex = blk_mq_unique_tag_to_hwq(tag) +
> +			instance->low_latency_index_start;
> +	} else {
>  		cmd->request_desc->SCSIIO.MSIxIndex =
>  			instance->reply_map[raw_smp_processor_id()];
> +	}
>  }
>  
>  /**
> @@ -956,9 +961,6 @@ megasas_alloc_cmds_fusion(struct megasas_instance *instance)
>  	if (megasas_alloc_cmdlist_fusion(instance))
>  		goto fail_exit;
>  
> -	dev_info(&instance->pdev->dev, "Configured max firmware commands: %d\n",
> -		 instance->max_fw_cmds);
> -
>  	/* The first 256 bytes (SMID 0) is not used. Don't add to the cmd list */
>  	io_req_base = fusion->io_request_frames + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
>  	io_req_base_phys = fusion->io_request_frames_phys + MEGA_MPI2_RAID_DEFAULT_IO_FRAME_SIZE;
> @@ -1102,8 +1104,9 @@ megasas_ioc_init_fusion(struct megasas_instance *instance)
>  		MR_HIGH_IOPS_QUEUE_COUNT) && cur_intr_coalescing)
>  		instance->perf_mode = MR_BALANCED_PERF_MODE;
>  
> -	dev_info(&instance->pdev->dev, "Performance mode :%s\n",
> -		MEGASAS_PERF_MODE_2STR(instance->perf_mode));
> +	dev_info(&instance->pdev->dev, "Performance mode :%s (latency index = %d)\n",
> +		MEGASAS_PERF_MODE_2STR(instance->perf_mode),
> +		instance->low_latency_index_start);
>  
>  	instance->fw_sync_cache_support = (scratch_pad_1 &
>  		MR_CAN_HANDLE_SYNC_CACHE_OFFSET) ? 1 : 0;


^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-02 14:17   ` Qian Cai
@ 2020-11-02 14:31     ` Kashyap Desai
  2020-11-02 15:24       ` Qian Cai
  2020-11-02 14:51     ` John Garry
  1 sibling, 1 reply; 58+ messages in thread
From: Kashyap Desai @ 2020-11-02 14:31 UTC (permalink / raw)
  To: Qian Cai, John Garry, axboe, jejb, martin.petersen, don.brace,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke


> On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> > From: Kashyap Desai <kashyap.desai@broadcom.com>
> >
> > Fusion adapters can steer completions to individual queues, and we now
> > have support for shared host-wide tags.
> > So we can enable multiqueue support for fusion adapters.
> >
> > Once the driver enables shared host-wide tags, the cpu hotplug feature
> > is also supported, as it was enabled by the patchset below - commit
> > bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> > offline").
> >
> > Currently the driver has a provision to disable host-wide tags using
> > the "host_tagset_enable" module parameter.
> >
> > Once we do not have any major performance regression using host-wide
> > tags, we will drop the hand-crafted interrupt affinity settings.
> >
> > Performance is also meeting the expectation (measured with both the
> > none and mq-deadline schedulers):
> > 24-drive SSD on Aero with/without this patch can get 3.1M IOPS;
> > 3 VDs consisting of 8 SAS SSDs on Aero with/without this patch can get
> > 3.1M IOPS.
> >
> > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > Signed-off-by: John Garry <john.garry@huawei.com>
>
> Reverting this commit fixed an issue where a Dell PowerEdge R6415 server
> with megaraid_sas was unable to boot.

I will take a look at this. BTW, can you try keeping the same patch but
using the module parameter "host_tagset_enable=0"?
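As an illustration of how that parameter can be set at module load time (a sketch under the assumption that the driver is built as a module; the modprobe.d path is the usual convention, not something verified on the reporter's system):

```shell
# Disable the shared host tagset when loading the driver:
modprobe megaraid_sas host_tagset_enable=0

# Or persistently, via a modprobe.d config fragment:
echo "options megaraid_sas host_tagset_enable=0" > /etc/modprobe.d/megaraid_sas.conf
```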

Kashyap



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-02 14:17   ` Qian Cai
  2020-11-02 14:31     ` Kashyap Desai
@ 2020-11-02 14:51     ` John Garry
  2020-11-02 15:18       ` Qian Cai
  1 sibling, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-02 14:51 UTC (permalink / raw)
  To: Qian Cai, axboe, jejb, martin.petersen, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On 02/11/2020 14:17, Qian Cai wrote:
> [  251.961152][  T330] INFO: task systemd-udevd:567 blocked for more than 122 seconds.
> [  251.968876][  T330]       Not tainted 5.10.0-rc1-next-20201102 #1
> [  251.975003][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  251.983546][  T330] task:systemd-udevd   state:D stack:27224 pid:  567 ppid:   506 flags:0x00004324
> [  251.992620][  T330] Call Trace:
> [  251.995784][  T330]  __schedule+0x71d/0x1b60
> [  252.000067][  T330]  ? __sched_text_start+0x8/0x8
> [  252.004798][  T330]  schedule+0xbf/0x270
> [  252.008735][  T330]  schedule_timeout+0x3fc/0x590
> [  252.013464][  T330]  ? usleep_range+0x120/0x120
> [  252.018008][  T330]  ? wait_for_completion+0x156/0x250
> [  252.023176][  T330]  ? lock_downgrade+0x700/0x700
> [  252.027886][  T330]  ? rcu_read_unlock+0x40/0x40
> [  252.032530][  T330]  ? do_raw_spin_lock+0x121/0x290
> [  252.037412][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [  252.043268][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> [  252.048331][  T330]  wait_for_completion+0x15e/0x250
> [  252.053323][  T330]  ? wait_for_completion_interruptible+0x320/0x320
> [  252.059687][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [  252.065543][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> [  252.070606][  T330]  __flush_work+0x42a/0x900
> [  252.074989][  T330]  ? queue_delayed_work_on+0x90/0x90
> [  252.080139][  T330]  ? __queue_work+0x463/0xf40
> [  252.084700][  T330]  ? init_pwq+0x320/0x320
> [  252.088891][  T330]  ? queue_work_on+0x5e/0x80
> [  252.093364][  T330]  ? trace_hardirqs_on+0x1c/0x150
> [  252.098255][  T330]  work_on_cpu+0xe7/0x130
> [  252.102461][  T330]  ? flush_delayed_work+0xc0/0xc0
> [  252.107342][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
> [  252.112764][  T330]  ? work_debug_hint+0x30/0x30
> [  252.117391][  T330]  ? pci_device_shutdown+0x80/0x80
> [  252.122378][  T330]  ? cpumask_next_and+0x57/0x80
> [  252.127094][  T330]  pci_device_probe+0x500/0x5c0
> [  252.131824][  T330]  ? pci_device_remove+0x1f0/0x1f0

Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call.

Or please share the .config

Cheers,
John

> [  252.136805][  T330]  really_probe+0x207/0xad0
> [  252.141191][  T330]  ? device_driver_attach+0x120/0x120
> [  252.146428][  T330]  driver_probe_device+0x1f1/0x370
> [  252.151424][  T330]  device_driver_attach+0xe5/0x120
> [  252.156399][  T330]  __driver_attach+0xf0/0x260
> [  252.160953][  T330]  bus_for_each_dev+0x117/0x1a0
> [  252.165669][  T330]  ? subsys_dev_iter_exit+0x10/0x10
> [  252.170731][  T330]  bus_add_driver+0x399/0x560
> [  252.175289][  T330]  driver_register+0x189/0x310
> [  252.179919][  T330]  ? 0xffffffffc05c1000
> [  252.183960][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
> [  252.189713][  T330]  do_one_initcall+0xf6/0x510
> [  252.194267][  T330]  ? perf_trace_initcall_level+0x490/0x490
> [  252.199940][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> [  252.205104][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
> [  252.210859][  T330]  ? do_init_module+0x49/0x6c0
> [  252.215500][  T330]  ? kmem_cache_alloc_trace+0x11f/0x1e0
> [  252.220925][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> [  252.226068][  T330]  do_init_module+0x1ed/0x6c0
> [  252.230608][  T330]  load_module+0x4a59/0x5d20
> [  252.235081][  T330]  ? layout_and_allocate+0x2770/0x2770
> [  252.240404][  T330]  ? __vmalloc_node+0x8d/0x100
> [  252.245046][  T330]  ? kernel_read_file+0x485/0x5a0
> [  252.249934][  T330]  ? kernel_read_file+0x305/0x5a0
> [  252.254839][  T330]  ? __x64_sys_fsconfig+0x970/0x970
> [  252.259903][  T330]  ? __do_sys_finit_module+0xff/0x180
> [  252.265153][  T330]  __do_sys_finit_module+0xff/0x180
> [  252.270216][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
> [  252.275465][  T330]  ? __fget_files+0x1c3/0x2e0
> [  252.280010][  T330]  do_syscall_64+0x33/0x40
> [  252.284304][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  252.290054][  T330] RIP: 0033:0x7fbb3e2fa78d
> [  252.294348][  T330] Code: Unable to access opcode bytes at RIP 0x7fbb3e2fa763.
> [  252.301584][  T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [  252.309855][  T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX: 00007fbb3e2fa78d
> [  252.317703][  T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI: 0000000000000006
> [  252.325553][  T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09: 00007ffe572e8e40
> [  252.333402][  T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
> [  252.341257][  T330] R13: 000055c7795930e0 R14: 0000000000020000 R15: 0000000000000000
> [  252.349117][  T330]



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-02 14:51     ` John Garry
@ 2020-11-02 15:18       ` Qian Cai
  2020-11-03 10:54         ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-02 15:18 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On Mon, 2020-11-02 at 14:51 +0000, John Garry wrote:
> On 02/11/2020 14:17, Qian Cai wrote:
> > [  251.961152][  T330] INFO: task systemd-udevd:567 blocked for more than
> > 122 seconds.
> > [  251.968876][  T330]       Not tainted 5.10.0-rc1-next-20201102 #1
> > [  251.975003][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  251.983546][  T330] task:systemd-udevd   state:D stack:27224 pid:  567
> > ppid:   506 flags:0x00004324
> > [  251.992620][  T330] Call Trace:
> > [  251.995784][  T330]  __schedule+0x71d/0x1b60
> > [  252.000067][  T330]  ? __sched_text_start+0x8/0x8
> > [  252.004798][  T330]  schedule+0xbf/0x270
> > [  252.008735][  T330]  schedule_timeout+0x3fc/0x590
> > [  252.013464][  T330]  ? usleep_range+0x120/0x120
> > [  252.018008][  T330]  ? wait_for_completion+0x156/0x250
> > [  252.023176][  T330]  ? lock_downgrade+0x700/0x700
> > [  252.027886][  T330]  ? rcu_read_unlock+0x40/0x40
> > [  252.032530][  T330]  ? do_raw_spin_lock+0x121/0x290
> > [  252.037412][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> > [  252.043268][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> > [  252.048331][  T330]  wait_for_completion+0x15e/0x250
> > [  252.053323][  T330]  ? wait_for_completion_interruptible+0x320/0x320
> > [  252.059687][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> > [  252.065543][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> > [  252.070606][  T330]  __flush_work+0x42a/0x900
> > [  252.074989][  T330]  ? queue_delayed_work_on+0x90/0x90
> > [  252.080139][  T330]  ? __queue_work+0x463/0xf40
> > [  252.084700][  T330]  ? init_pwq+0x320/0x320
> > [  252.088891][  T330]  ? queue_work_on+0x5e/0x80
> > [  252.093364][  T330]  ? trace_hardirqs_on+0x1c/0x150
> > [  252.098255][  T330]  work_on_cpu+0xe7/0x130
> > [  252.102461][  T330]  ? flush_delayed_work+0xc0/0xc0
> > [  252.107342][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
> > [  252.112764][  T330]  ? work_debug_hint+0x30/0x30
> > [  252.117391][  T330]  ? pci_device_shutdown+0x80/0x80
> > [  252.122378][  T330]  ? cpumask_next_and+0x57/0x80
> > [  252.127094][  T330]  pci_device_probe+0x500/0x5c0
> > [  252.131824][  T330]  ? pci_device_remove+0x1f0/0x1f0
> 
> Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call.
> 
> Or please share the .config

No. https://cailca.coding.net/public/linux/mm/git/files/master/x86.config

> 
> Cheers,
> John
> 
> > [  252.136805][  T330]  really_probe+0x207/0xad0
> > [  252.141191][  T330]  ? device_driver_attach+0x120/0x120
> > [  252.146428][  T330]  driver_probe_device+0x1f1/0x370
> > [  252.151424][  T330]  device_driver_attach+0xe5/0x120
> > [  252.156399][  T330]  __driver_attach+0xf0/0x260
> > [  252.160953][  T330]  bus_for_each_dev+0x117/0x1a0
> > [  252.165669][  T330]  ? subsys_dev_iter_exit+0x10/0x10
> > [  252.170731][  T330]  bus_add_driver+0x399/0x560
> > [  252.175289][  T330]  driver_register+0x189/0x310
> > [  252.179919][  T330]  ? 0xffffffffc05c1000
> > [  252.183960][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
> > [  252.189713][  T330]  do_one_initcall+0xf6/0x510
> > [  252.194267][  T330]  ? perf_trace_initcall_level+0x490/0x490
> > [  252.199940][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> > [  252.205104][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
> > [  252.210859][  T330]  ? do_init_module+0x49/0x6c0
> > [  252.215500][  T330]  ? kmem_cache_alloc_trace+0x11f/0x1e0
> > [  252.220925][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> > [  252.226068][  T330]  do_init_module+0x1ed/0x6c0
> > [  252.230608][  T330]  load_module+0x4a59/0x5d20
> > [  252.235081][  T330]  ? layout_and_allocate+0x2770/0x2770
> > [  252.240404][  T330]  ? __vmalloc_node+0x8d/0x100
> > [  252.245046][  T330]  ? kernel_read_file+0x485/0x5a0
> > [  252.249934][  T330]  ? kernel_read_file+0x305/0x5a0
> > [  252.254839][  T330]  ? __x64_sys_fsconfig+0x970/0x970
> > [  252.259903][  T330]  ? __do_sys_finit_module+0xff/0x180
> > [  252.265153][  T330]  __do_sys_finit_module+0xff/0x180
> > [  252.270216][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
> > [  252.275465][  T330]  ? __fget_files+0x1c3/0x2e0
> > [  252.280010][  T330]  do_syscall_64+0x33/0x40
> > [  252.284304][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  252.290054][  T330] RIP: 0033:0x7fbb3e2fa78d
> > [  252.294348][  T330] Code: Unable to access opcode bytes at RIP
> > 0x7fbb3e2fa763.
> > [  252.301584][  T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000139
> > [  252.309855][  T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX:
> > 00007fbb3e2fa78d
> > [  252.317703][  T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI:
> > 0000000000000006
> > [  252.325553][  T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09:
> > 00007ffe572e8e40
> > [  252.333402][  T330] R10: 0000000000000006 R11: 0000000000000246 R12:
> > 0000000000000000
> > [  252.341257][  T330] R13: 000055c7795930e0 R14: 0000000000020000 R15:
> > 0000000000000000
> > [  252.349117][  T330]



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-02 14:31     ` Kashyap Desai
@ 2020-11-02 15:24       ` Qian Cai
  0 siblings, 0 replies; 58+ messages in thread
From: Qian Cai @ 2020-11-02 15:24 UTC (permalink / raw)
  To: Kashyap Desai, John Garry, axboe, jejb, martin.petersen,
	don.brace, ming.lei, bvanassche, dgilbert, paolo.valente, hare,
	hch
  Cc: Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

On Mon, 2020-11-02 at 20:01 +0530, Kashyap Desai wrote:
> > On Wed, 2020-08-19 at 23:20 +0800, John Garry wrote:
> > > From: Kashyap Desai <kashyap.desai@broadcom.com>
> > > 
> > > Fusion adapters can steer completions to individual queues, and we now
> > > have support for shared host-wide tags.
> > > So we can enable multiqueue support for fusion adapters.
> > > 
> > > Once the driver enables shared host-wide tags, the cpu hotplug feature
> > > is also supported, as it was enabled by the patchset below - commit
> > > bf0beec0607d ("blk-mq: drain I/O when all CPUs in a hctx are
> > > offline").
> > > 
> > > Currently the driver has a provision to disable host-wide tags using
> > > the "host_tagset_enable" module parameter.
> > > 
> > > Once we do not have any major performance regression using host-wide
> > > tags, we will drop the hand-crafted interrupt affinity settings.
> > > 
> > > Performance is also meeting the expectation (measured with both the
> > > none and mq-deadline schedulers):
> > > 24-drive SSD on Aero with/without this patch can get 3.1M IOPS;
> > > 3 VDs consisting of 8 SAS SSDs on Aero with/without this patch can get
> > > 3.1M IOPS.
> > > 
> > > Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
> > > Signed-off-by: Hannes Reinecke <hare@suse.com>
> > > Signed-off-by: John Garry <john.garry@huawei.com>
> > 
> > Reverting this commit fixed an issue where a Dell PowerEdge R6415 server
> > with megaraid_sas was unable to boot.
> 
> I will take a look at this. BTW, can you try keeping the same patch but
> using the module parameter "host_tagset_enable=0"?

Yes, that also works.



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-02 15:18       ` Qian Cai
@ 2020-11-03 10:54         ` John Garry
  2020-11-03 13:04           ` Qian Cai
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-03 10:54 UTC (permalink / raw)
  To: Qian Cai, axboe, jejb, martin.petersen, don.brace, kashyap.desai,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On 02/11/2020 15:18, Qian Cai wrote:
> On Mon, 2020-11-02 at 14:51 +0000, John Garry wrote:
>> On 02/11/2020 14:17, Qian Cai wrote:
>>> [  251.961152][  T330] INFO: task systemd-udevd:567 blocked for more than
>>> 122 seconds.
>>> [  251.968876][  T330]       Not tainted 5.10.0-rc1-next-20201102 #1
>>> [  251.975003][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [  251.983546][  T330] task:systemd-udevd   state:D stack:27224 pid:  567
>>> ppid:   506 flags:0x00004324
>>> [  251.992620][  T330] Call Trace:
>>> [  251.995784][  T330]  __schedule+0x71d/0x1b60
>>> [  252.000067][  T330]  ? __sched_text_start+0x8/0x8
>>> [  252.004798][  T330]  schedule+0xbf/0x270
>>> [  252.008735][  T330]  schedule_timeout+0x3fc/0x590
>>> [  252.013464][  T330]  ? usleep_range+0x120/0x120
>>> [  252.018008][  T330]  ? wait_for_completion+0x156/0x250
>>> [  252.023176][  T330]  ? lock_downgrade+0x700/0x700
>>> [  252.027886][  T330]  ? rcu_read_unlock+0x40/0x40
>>> [  252.032530][  T330]  ? do_raw_spin_lock+0x121/0x290
>>> [  252.037412][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
>>> [  252.043268][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
>>> [  252.048331][  T330]  wait_for_completion+0x15e/0x250
>>> [  252.053323][  T330]  ? wait_for_completion_interruptible+0x320/0x320
>>> [  252.059687][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
>>> [  252.065543][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
>>> [  252.070606][  T330]  __flush_work+0x42a/0x900
>>> [  252.074989][  T330]  ? queue_delayed_work_on+0x90/0x90
>>> [  252.080139][  T330]  ? __queue_work+0x463/0xf40
>>> [  252.084700][  T330]  ? init_pwq+0x320/0x320
>>> [  252.088891][  T330]  ? queue_work_on+0x5e/0x80
>>> [  252.093364][  T330]  ? trace_hardirqs_on+0x1c/0x150
>>> [  252.098255][  T330]  work_on_cpu+0xe7/0x130
>>> [  252.102461][  T330]  ? flush_delayed_work+0xc0/0xc0
>>> [  252.107342][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
>>> [  252.112764][  T330]  ? work_debug_hint+0x30/0x30
>>> [  252.117391][  T330]  ? pci_device_shutdown+0x80/0x80
>>> [  252.122378][  T330]  ? cpumask_next_and+0x57/0x80
>>> [  252.127094][  T330]  pci_device_probe+0x500/0x5c0
>>> [  252.131824][  T330]  ? pci_device_remove+0x1f0/0x1f0
>>
>> Is CONFIG_DEBUG_TEST_DRIVER_REMOVE enabled? I figure it is, with this call.
>>
>> Or please share the .config
> 
> No. https://cailca.coding.net/public/linux/mm/git/files/master/x86.config
> 

Thanks. FWIW, I just tested another megaraid_sas card on linux-next (02
Nov) with a vanilla arm64 defconfig and no special command-line
parameters, and found no issue:

dmesg | grep mega
[30.031739] megasas: 07.714.04.00-rc1
[30.039749] megaraid_sas 0000:08:00.0: Adding to iommu group 0
[30.053247] megaraid_sas 0000:08:00.0: BAR:0x0  BAR's base_addr(phys):0x0000080010000000  mapped virt_addr:0x(____ptrval____)
[30.053251] megaraid_sas 0000:08:00.0: FW now in Ready state
[30.065162] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 bit consistent mask
[30.081197] megaraid_sas 0000:08:00.0: firmware supports msix  : (128)
[30.096349] megaraid_sas 0000:08:00.0: requested/available msix 128/128
[30.110277] megaraid_sas 0000:08:00.0: current msix/online cpus: (128/128)
[30.124917] megaraid_sas 0000:08:00.0: RDPQ mode  : (enabled)
[30.136821] megaraid_sas 0000:08:00.0: Current firmware supports maximum commands: 4077 LDIO threshold: 0
[30.208538] megaraid_sas 0000:08:00.0: Performance mode :Latency (latency index = 1)
[30.224838] megaraid_sas 0000:08:00.0: FW supports sync cache  : Yes
[30.238021] megaraid_sas 0000:08:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[30.311960] megaraid_sas 0000:08:00.0: FW provided supportMaxExtLDs: 1 max_lds: 64
[30.327885] megaraid_sas 0000:08:00.0: controller type : MR(2048MB)
[30.341066] megaraid_sas 0000:08:00.0: Online Controller Reset(OCR)  : Enabled
[30.356076] megaraid_sas 0000:08:00.0: Secure JBOD support: Yes
[30.368710] megaraid_sas 0000:08:00.0: NVMe passthru support : Yes
[30.381708] megaraid_sas 0000:08:00.0: FW provided TM TaskAbort/Reset timeout  : 6 secs/60 secs
[30.399825] megaraid_sas 0000:08:00.0: JBOD sequence map support  : Yes
[30.413552] megaraid_sas 0000:08:00.0: PCI Lane Margining support : No
[30.452059] megaraid_sas 0000:08:00.0: NVME page size  : (4096)
[30.465079] megaraid_sas 0000:08:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[30.485208] megaraid_sas 0000:08:00.0: INIT adapter done
[30.496609] megaraid_sas 0000:08:00.0: pci id : (0x1000)/(0x0016)/(0x19e5)/(0xd215)
[30.512931] megaraid_sas 0000:08:00.0: unevenspan support : no
[30.525199] megaraid_sas 0000:08:00.0: firmware crash dump: no
[30.537649] megaraid_sas 0000:08:00.0: JBOD sequence map  : enabled
[30.550743] megaraid_sas 0000:08:00.0: Max firmware commands: 4076 shared with nr_hw_queues = 127

john@ubuntu:~$ lspci -s 08:00.0 -v
08:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID Tri-Mode SAS3508 (rev 01)
   Subsystem: Huawei Technologies Co., Ltd. MegaRAID Tri-Mode SAS3508
   Flags: bus master, fast devsel, latency 0, IRQ 41, NUMA node 0
   Memory at 80010000000 (64-bit, prefetchable) [size=1M]
   Memory at 80010100000 (64-bit, prefetchable) [size=1M]
   Memory at e9400000 (32-bit, non-prefetchable) [size=1M]
   I/O ports at 1000 [size=256]
   Expansion ROM at e9500000 [disabled] [size=1M]
   Capabilities: <access denied>
   Kernel driver in use: megaraid_sas

I have no x86 system to test that x86 config with, though. Does
v5.10-rc2 also show this issue?

Thanks,
John


>>
>> Cheers,
>> John
>>
>>> [  252.136805][  T330]  really_probe+0x207/0xad0
>>> [  252.141191][  T330]  ? device_driver_attach+0x120/0x120
>>> [  252.146428][  T330]  driver_probe_device+0x1f1/0x370
>>> [  252.151424][  T330]  device_driver_attach+0xe5/0x120
>>> [  252.156399][  T330]  __driver_attach+0xf0/0x260
>>> [  252.160953][  T330]  bus_for_each_dev+0x117/0x1a0
>>> [  252.165669][  T330]  ? subsys_dev_iter_exit+0x10/0x10
>>> [  252.170731][  T330]  bus_add_driver+0x399/0x560
>>> [  252.175289][  T330]  driver_register+0x189/0x310
>>> [  252.179919][  T330]  ? 0xffffffffc05c1000
>>> [  252.183960][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
>>> [  252.189713][  T330]  do_one_initcall+0xf6/0x510
>>> [  252.194267][  T330]  ? perf_trace_initcall_level+0x490/0x490
>>> [  252.199940][  T330]  ? kasan_unpoison_shadow+0x30/0x40
>>> [  252.205104][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
>>> [  252.210859][  T330]  ? do_init_module+0x49/0x6c0
>>> [  252.215500][  T330]  ? kmem_cache_alloc_trace+0x11f/0x1e0
>>> [  252.220925][  T330]  ? kasan_unpoison_shadow+0x30/0x40
>>> [  252.226068][  T330]  do_init_module+0x1ed/0x6c0
>>> [  252.230608][  T330]  load_module+0x4a59/0x5d20
>>> [  252.235081][  T330]  ? layout_and_allocate+0x2770/0x2770
>>> [  252.240404][  T330]  ? __vmalloc_node+0x8d/0x100
>>> [  252.245046][  T330]  ? kernel_read_file+0x485/0x5a0
>>> [  252.249934][  T330]  ? kernel_read_file+0x305/0x5a0
>>> [  252.254839][  T330]  ? __x64_sys_fsconfig+0x970/0x970
>>> [  252.259903][  T330]  ? __do_sys_finit_module+0xff/0x180
>>> [  252.265153][  T330]  __do_sys_finit_module+0xff/0x180
>>> [  252.270216][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
>>> [  252.275465][  T330]  ? __fget_files+0x1c3/0x2e0
>>> [  252.280010][  T330]  do_syscall_64+0x33/0x40
>>> [  252.284304][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [  252.290054][  T330] RIP: 0033:0x7fbb3e2fa78d
>>> [  252.294348][  T330] Code: Unable to access opcode bytes at RIP
>>> 0x7fbb3e2fa763.
>>> [  252.301584][  T330] RSP: 002b:00007ffe572e8d18 EFLAGS: 00000246 ORIG_RAX:
>>> 0000000000000139
>>> [  252.309855][  T330] RAX: ffffffffffffffda RBX: 000055c7795d90f0 RCX:
>>> 00007fbb3e2fa78d
>>> [  252.317703][  T330] RDX: 0000000000000000 RSI: 00007fbb3ee6c82d RDI:
>>> 0000000000000006
>>> [  252.325553][  T330] RBP: 00007fbb3ee6c82d R08: 0000000000000000 R09:
>>> 00007ffe572e8e40
>>> [  252.333402][  T330] R10: 0000000000000006 R11: 0000000000000246 R12:
>>> 0000000000000000
>>> [  252.341257][  T330] R13: 000055c7795930e0 R14: 0000000000020000 R15:
>>> 0000000000000000
>>> [  252.349117][  T330]
> 
> .
> 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-03 10:54         ` John Garry
@ 2020-11-03 13:04           ` Qian Cai
  2020-11-04 15:21             ` Qian Cai
  0 siblings, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-03 13:04 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On Tue, 2020-11-03 at 10:54 +0000, John Garry wrote:
> I have no x86 system to test that x86 config, though. How about 
> v5.10-rc2 for this issue?

v5.10-rc2 is also broken here.

[  251.941451][  T330] INFO: task systemd-udevd:551 blocked for more than 122 seconds.
[  251.949176][  T330]       Not tainted 5.10.0-rc2 #3
[  251.954094][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  251.962633][  T330] task:systemd-udevd   state:D stack:27160 pid:  551 ppid:   506 flags:0x00000324
[  251.971707][  T330] Call Trace:
[  251.974871][  T330]  __schedule+0x71d/0x1b50
[  251.979155][  T330]  ? kcore_callback+0x1d/0x1d
[  251.983709][  T330]  schedule+0xbf/0x270
[  251.987640][  T330]  schedule_timeout+0x3fc/0x590
[  251.992370][  T330]  ? usleep_range+0x120/0x120
[  251.996910][  T330]  ? wait_for_completion+0x156/0x250
[  252.002080][  T330]  ? lock_downgrade+0x700/0x700
[  252.006792][  T330]  ? rcu_read_unlock+0x40/0x40
[  252.011435][  T330]  ? do_raw_spin_lock+0x121/0x290
[  252.016324][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[  252.022178][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
[  252.027235][  T330]  wait_for_completion+0x15e/0x250
[  252.032226][  T330]  ? wait_for_completion_interruptible+0x2f0/0x2f0
[  252.038590][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
[  252.044443][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
[  252.049502][  T330]  __flush_work+0x42a/0x900
[  252.053882][  T330]  ? queue_delayed_work_on+0x90/0x90
[  252.059025][  T330]  ? __queue_work+0x463/0xf40
[  252.063583][  T330]  ? init_pwq+0x320/0x320
[  252.067777][  T330]  ? queue_work_on+0x5e/0x80
[  252.072249][  T330]  ? trace_hardirqs_on+0x1c/0x150
[  252.077138][  T330]  work_on_cpu+0xe7/0x130
[  252.081347][  T330]  ? flush_delayed_work+0xc0/0xc0
[  252.086231][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
[  252.091655][  T330]  ? work_debug_hint+0x30/0x30
[  252.096284][  T330]  ? pci_device_shutdown+0x80/0x80
[  252.101274][  T330]  ? cpumask_next_and+0x57/0x80
[  252.105990][  T330]  pci_device_probe+0x500/0x5c0
[  252.110703][  T330]  ? pci_device_remove+0x1f0/0x1f0
[  252.115697][  T330]  really_probe+0x207/0xad0
[  252.120065][  T330]  ? device_driver_attach+0x120/0x120
[  252.125317][  T330]  driver_probe_device+0x1f1/0x370
[  252.130291][  T330]  device_driver_attach+0xe5/0x120
[  252.135281][  T330]  __driver_attach+0xf0/0x260
[  252.139827][  T330]  bus_for_each_dev+0x117/0x1a0
[  252.144552][  T330]  ? subsys_dev_iter_exit+0x10/0x10
[  252.149609][  T330]  bus_add_driver+0x399/0x560
[  252.154166][  T330]  driver_register+0x189/0x310
[  252.158795][  T330]  ? 0xffffffffc05c5000
[  252.162838][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
[  252.168593][  T330]  do_one_initcall+0xf6/0x510
[  252.173143][  T330]  ? perf_trace_initcall_level+0x490/0x490
[  252.178809][  T330]  ? kasan_unpoison_shadow+0x30/0x40
[  252.183973][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
[  252.189728][  T330]  ? do_init_module+0x49/0x6c0
[  252.194370][  T330]  ? kmem_cache_alloc_trace+0x12e/0x2a0
[  252.199780][  T330]  ? kasan_unpoison_shadow+0x30/0x40
[  252.204942][  T330]  do_init_module+0x1ed/0x6c0
[  252.209479][  T330]  load_module+0x4a25/0x5cf0
[  252.213950][  T330]  ? layout_and_allocate+0x2770/0x2770
[  252.219271][  T330]  ? __vmalloc_node+0x8d/0x100
[  252.223913][  T330]  ? kernel_read_file+0x485/0x5a0
[  252.228796][  T330]  ? kernel_read_file+0x305/0x5a0
[  252.233696][  T330]  ? __ia32_sys_fsconfig+0x6a0/0x6a0
[  252.238841][  T330]  ? __do_sys_finit_module+0xff/0x180
[  252.244093][  T330]  __do_sys_finit_module+0xff/0x180
[  252.249155][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
[  252.254403][  T330]  ? __fget_files+0x1c3/0x2e0
[  252.258940][  T330]  do_syscall_64+0x33/0x40
[  252.263234][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  252.268984][  T330] RIP: 0033:0x7f7cf6a4878d
[  252.273276][  T330] Code: Unable to access opcode bytes at RIP 0x7f7cf6a48763.
[  252.280499][  T330] RSP: 002b:00007ffcfa94b978 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  252.288781][  T330] RAX: ffffffffffffffda RBX: 000055e01f48b730 RCX: 00007f7cf6a4878d
[  252.296628][  T330] RDX: 0000000000000000 RSI: 00007f7cf75ba82d RDI: 0000000000000006
[  252.304482][  T330] RBP: 00007f7cf75ba82d R08: 0000000000000000 R09: 00007ffcfa94baa0
[  252.312331][  T330] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[  252.320167][  T330] R13: 000055e01f433530 R14: 0000000000020000 R15: 0000000000000000
[  252.328052][  T330] 
[  252.328052][  T330] Showing all locks held in the system:
[  252.335722][  T330] 3 locks held by kworker/3:1/289:
[  252.340697][  T330]  #0: ffff8881001eb338 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7ec/0x1610
[  252.350906][  T330]  #1: ffffc90004ef7e00 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610
[  252.361725][  T330]  #2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, at: scsi_scan_host_selected+0xde/0x260
[  252.372132][  T330] 1 lock held by khungtaskd/330:
[  252.376933][  T330]  #0: ffffffffb42d2de0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.52+0x0/0x30
[  252.387234][  T330] 1 lock held by systemd-journal/398:
[  252.392489][  T330] 1 lock held by systemd-udevd/551:
[  252.397550][  T330]  #0: ffff888109a49218 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x37/0x120
[  252.407085][  T330] 
[  252.409285][  T330] =============================================
[  252.409285][  T330] 




* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-03 13:04           ` Qian Cai
@ 2020-11-04 15:21             ` Qian Cai
  2020-11-04 16:07               ` Kashyap Desai
  0 siblings, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-04 15:21 UTC (permalink / raw)
  To: John Garry, axboe, jejb, martin.petersen, don.brace,
	kashyap.desai, ming.lei, bvanassche, dgilbert, paolo.valente,
	hare, hch
  Cc: sumit.saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, megaraidlinux.pdl, chenxiang66, luojiaxing,
	Hannes Reinecke

On Tue, 2020-11-03 at 08:04 -0500, Qian Cai wrote:
> On Tue, 2020-11-03 at 10:54 +0000, John Garry wrote:
> > I have no x86 system to test that x86 config, though. How about 
> > v5.10-rc2 for this issue?
> 
> v5.10-rc2 is also broken here.

John, Kashyap, any update on this? If this is going to take a while to fix
properly, should I send a patch to revert this, or at least disable the feature
by default for megaraid_sas in the meantime, so it no longer breaks the
existing systems out there?

> 
> [  251.941451][  T330] INFO: task systemd-udevd:551 blocked for more than 122
> seconds.
> [  251.949176][  T330]       Not tainted 5.10.0-rc2 #3
> [  251.954094][  T330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  251.962633][  T330] task:systemd-udevd   state:D stack:27160 pid:  551
> ppid:   506 flags:0x00000324
> [  251.971707][  T330] Call Trace:
> [  251.974871][  T330]  __schedule+0x71d/0x1b50
> [  251.979155][  T330]  ? kcore_callback+0x1d/0x1d
> [  251.983709][  T330]  schedule+0xbf/0x270
> [  251.987640][  T330]  schedule_timeout+0x3fc/0x590
> [  251.992370][  T330]  ? usleep_range+0x120/0x120
> [  251.996910][  T330]  ? wait_for_completion+0x156/0x250
> [  252.002080][  T330]  ? lock_downgrade+0x700/0x700
> [  252.006792][  T330]  ? rcu_read_unlock+0x40/0x40
> [  252.011435][  T330]  ? do_raw_spin_lock+0x121/0x290
> [  252.016324][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [  252.022178][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> [  252.027235][  T330]  wait_for_completion+0x15e/0x250
> [  252.032226][  T330]  ? wait_for_completion_interruptible+0x2f0/0x2f0
> [  252.038590][  T330]  ? lockdep_hardirqs_on_prepare+0x27c/0x3d0
> [  252.044443][  T330]  ? _raw_spin_unlock_irq+0x1f/0x30
> [  252.049502][  T330]  __flush_work+0x42a/0x900
> [  252.053882][  T330]  ? queue_delayed_work_on+0x90/0x90
> [  252.059025][  T330]  ? __queue_work+0x463/0xf40
> [  252.063583][  T330]  ? init_pwq+0x320/0x320
> [  252.067777][  T330]  ? queue_work_on+0x5e/0x80
> [  252.072249][  T330]  ? trace_hardirqs_on+0x1c/0x150
> [  252.077138][  T330]  work_on_cpu+0xe7/0x130
> [  252.081347][  T330]  ? flush_delayed_work+0xc0/0xc0
> [  252.086231][  T330]  ? __mutex_unlock_slowpath+0xd4/0x670
> [  252.091655][  T330]  ? work_debug_hint+0x30/0x30
> [  252.096284][  T330]  ? pci_device_shutdown+0x80/0x80
> [  252.101274][  T330]  ? cpumask_next_and+0x57/0x80
> [  252.105990][  T330]  pci_device_probe+0x500/0x5c0
> [  252.110703][  T330]  ? pci_device_remove+0x1f0/0x1f0
> [  252.115697][  T330]  really_probe+0x207/0xad0
> [  252.120065][  T330]  ? device_driver_attach+0x120/0x120
> [  252.125317][  T330]  driver_probe_device+0x1f1/0x370
> [  252.130291][  T330]  device_driver_attach+0xe5/0x120
> [  252.135281][  T330]  __driver_attach+0xf0/0x260
> [  252.139827][  T330]  bus_for_each_dev+0x117/0x1a0
> [  252.144552][  T330]  ? subsys_dev_iter_exit+0x10/0x10
> [  252.149609][  T330]  bus_add_driver+0x399/0x560
> [  252.154166][  T330]  driver_register+0x189/0x310
> [  252.158795][  T330]  ? 0xffffffffc05c5000
> [  252.162838][  T330]  megasas_init+0x117/0x1000 [megaraid_sas]
> [  252.168593][  T330]  do_one_initcall+0xf6/0x510
> [  252.173143][  T330]  ? perf_trace_initcall_level+0x490/0x490
> [  252.178809][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> [  252.183973][  T330]  ? __kasan_kmalloc.constprop.11+0xc1/0xd0
> [  252.189728][  T330]  ? do_init_module+0x49/0x6c0
> [  252.194370][  T330]  ? kmem_cache_alloc_trace+0x12e/0x2a0
> [  252.199780][  T330]  ? kasan_unpoison_shadow+0x30/0x40
> [  252.204942][  T330]  do_init_module+0x1ed/0x6c0
> [  252.209479][  T330]  load_module+0x4a25/0x5cf0
> [  252.213950][  T330]  ? layout_and_allocate+0x2770/0x2770
> [  252.219271][  T330]  ? __vmalloc_node+0x8d/0x100
> [  252.223913][  T330]  ? kernel_read_file+0x485/0x5a0
> [  252.228796][  T330]  ? kernel_read_file+0x305/0x5a0
> [  252.233696][  T330]  ? __ia32_sys_fsconfig+0x6a0/0x6a0
> [  252.238841][  T330]  ? __do_sys_finit_module+0xff/0x180
> [  252.244093][  T330]  __do_sys_finit_module+0xff/0x180
> [  252.249155][  T330]  ? __do_sys_init_module+0x1d0/0x1d0
> [  252.254403][  T330]  ? __fget_files+0x1c3/0x2e0
> [  252.258940][  T330]  do_syscall_64+0x33/0x40
> [  252.263234][  T330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  252.268984][  T330] RIP: 0033:0x7f7cf6a4878d
> [  252.273276][  T330] Code: Unable to access opcode bytes at RIP
> 0x7f7cf6a48763.
> [  252.280499][  T330] RSP: 002b:00007ffcfa94b978 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [  252.288781][  T330] RAX: ffffffffffffffda RBX: 000055e01f48b730 RCX:
> 00007f7cf6a4878d
> [  252.296628][  T330] RDX: 0000000000000000 RSI: 00007f7cf75ba82d RDI:
> 0000000000000006
> [  252.304482][  T330] RBP: 00007f7cf75ba82d R08: 0000000000000000 R09:
> 00007ffcfa94baa0
> [  252.312331][  T330] R10: 0000000000000006 R11: 0000000000000246 R12:
> 0000000000000000
> [  252.320167][  T330] R13: 000055e01f433530 R14: 0000000000020000 R15:
> 0000000000000000
> [  252.328052][  T330] 
> [  252.328052][  T330] Showing all locks held in the system:
> [  252.335722][  T330] 3 locks held by kworker/3:1/289:
> [  252.340697][  T330]  #0: ffff8881001eb338 ((wq_completion)events){+.+.}-
> {0:0}, at: process_one_work+0x7ec/0x1610
> [  252.350906][  T330]  #1: ffffc90004ef7e00
> ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x820/0x1610
> [  252.361725][  T330]  #2: ffff88810dc600e0 (&shost->scan_mutex){+.+.}-{3:3}, 
> at: scsi_scan_host_selected+0xde/0x260
> [  252.372132][  T330] 1 lock held by khungtaskd/330:
> [  252.376933][  T330]  #0: ffffffffb42d2de0 (rcu_read_lock){....}-{1:2}, at:
> rcu_lock_acquire.constprop.52+0x0/0x30
> [  252.387234][  T330] 1 lock held by systemd-journal/398:
> [  252.392489][  T330] 1 lock held by systemd-udevd/551:
> [  252.397550][  T330]  #0: ffff888109a49218 (&dev->mutex){....}-{3:3}, at:
> device_driver_attach+0x37/0x120
> [  252.407085][  T330] 
> [  252.409285][  T330] =============================================
> [  252.409285][  T330] 
> 



* RE: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-04 15:21             ` Qian Cai
@ 2020-11-04 16:07               ` Kashyap Desai
  2020-11-04 18:08                 ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Kashyap Desai @ 2020-11-04 16:07 UTC (permalink / raw)
  To: Qian Cai, John Garry, axboe, jejb, martin.petersen, don.brace,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

> >
> > v5.10-rc2 is also broken here.
>
> John, Kashyap, any update on this? If this is going to take a while to fix
> it
> proper, should I send a patch to revert this or at least disable the
> feature by
> default for megaraid_sas in the meantime, so it no longer breaks the
> existing
> systems out there?

I am trying to get similar h/w to try out. All my current h/w works fine.
Give me a couple of days.
If this is not an obvious common issue and needs more time, we will go with
the module-parameter disable method.
I will let you know.

Kashyap
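[For reference, the module-parameter disable method Kashyap mentions could look
like the sketch below. The parameter name host_tagset_enable is taken from
later in this thread; verify it against `modinfo megaraid_sas` on your kernel
before relying on it.]

```shell
# Workaround sketch: turn off hostwide shared tags for megaraid_sas only.
# Parameter name taken from this thread; confirm with: modinfo megaraid_sas

# One-off, when loading the module by hand:
modprobe megaraid_sas host_tagset_enable=0

# Persistent across boots, via modprobe configuration:
echo 'options megaraid_sas host_tagset_enable=0' > /etc/modprobe.d/megaraid_sas.conf

# Equivalent kernel command-line form if the driver is built in:
#   megaraid_sas.host_tagset_enable=0
```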



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-04 16:07               ` Kashyap Desai
@ 2020-11-04 18:08                 ` John Garry
  2020-11-06 19:25                   ` Sumit Saxena
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-04 18:08 UTC (permalink / raw)
  To: Kashyap Desai, Qian Cai, axboe, jejb, martin.petersen, don.brace,
	ming.lei, bvanassche, dgilbert, paolo.valente, hare, hch
  Cc: Sumit Saxena, linux-block, linux-kernel, linux-scsi,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

On 04/11/2020 16:07, Kashyap Desai wrote:
>>>
>>> v5.10-rc2 is also broken here.
>>
>> John, Kashyap, any update on this? If this is going to take a while to fix
>> it
>> proper, should I send a patch to revert this or at least disable the
>> feature by
>> default for megaraid_sas in the meantime, so it no longer breaks the
>> existing
>> systems out there?
> 
> I am trying to get similar h/w to try out. All my current h/w works fine.
> Give me couple of days' time.
> If this is not obviously common issue and need time, we will go with module
> parameter disable method.
> I will let you know.

Hi Kashyap,

Please also consider just disabling the feature for this card, so any other
possible issues are unearthed on other cards. Unfortunately, I don't have this
card or any x86 machine to test with and assist.

BTW, just to be clear, did you try the same .config as Qian Cai?

Thanks,
John


* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-04 18:08                 ` John Garry
@ 2020-11-06 19:25                   ` Sumit Saxena
  2020-11-07  0:17                     ` Qian Cai
  0 siblings, 1 reply; 58+ messages in thread
From: Sumit Saxena @ 2020-11-06 19:25 UTC (permalink / raw)
  To: John Garry
  Cc: Kashyap Desai, Qian Cai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke


On Wed, Nov 4, 2020 at 11:38 PM John Garry <john.garry@huawei.com> wrote:
>
> On 04/11/2020 16:07, Kashyap Desai wrote:
> >>>
> >>> v5.10-rc2 is also broken here.
> >>
> >> John, Kashyap, any update on this? If this is going to take a while to fix
> >> it
> >> proper, should I send a patch to revert this or at least disable the
> >> feature by
> >> default for megaraid_sas in the meantime, so it no longer breaks the
> >> existing
> >> systems out there?
> >
> > I am trying to get similar h/w to try out. All my current h/w works fine.
> > Give me couple of days' time.
> > If this is not obviously common issue and need time, we will go with module
> > parameter disable method.
> > I will let you know.
>
> Hi Kashyap,
>
> Please also consider just disabling for this card, so any other possible
> issues are unearthed on other cards. I don't have this card or any x86
> machine to test it unfortunately to assist.
>
> BTW, just to be clear, did you try the same .config as Qian Cai?
>
> Thanks,
> John
I am able to hit the boot hang and similar stack traces to those
reported by Qian with the shared .config on an x86 machine.
In my case the system boots after a hang of 40-45 mins. Qian, is that
true for you as well?
With the module parameter "host_tagset_enable=0", the issue is not seen.
Below is a snippet of the dmesg logs/traces observed during system
bootup; after a wait of 40-45 mins, the drives attached to the
megaraid_sas adapter are discovered:

========================================
[ 1969.502913] INFO: task systemd-udevd:906 can't die for more than
1720 seconds.
[ 1969.597725] task:systemd-udevd   state:D stack:13456 pid:  906
ppid:   858 flags:0x00000324
[ 1969.597730] Call Trace:
[ 1969.597734]  __schedule+0x263/0x7f0
[ 1969.597737]  ? __lock_acquire+0x576/0xaf0
[ 1969.597739]  ? wait_for_completion+0x7b/0x110
[ 1969.597741]  schedule+0x4c/0xc0
[ 1969.597743]  schedule_timeout+0x244/0x2e0
[ 1969.597745]  ? find_held_lock+0x2d/0x90
[ 1969.597748]  ? wait_for_completion+0xa6/0x110
[ 1969.597750]  ? wait_for_completion+0x7b/0x110
[ 1969.597752]  ? lockdep_hardirqs_on_prepare+0xd4/0x170
[ 1969.597753]  ? wait_for_completion+0x7b/0x110
[ 1969.597755]  wait_for_completion+0xae/0x110
[ 1969.597757]  __flush_work+0x269/0x4b0
[ 1969.597760]  ? init_pwq+0xf0/0xf0
[ 1969.597763]  work_on_cpu+0x9c/0xd0
[ 1969.597765]  ? work_is_static_object+0x10/0x10
[ 1969.597768]  ? pci_device_shutdown+0x30/0x30
[ 1969.597770]  pci_device_probe+0x197/0x1b0
[ 1969.597773]  really_probe+0xda/0x410
[ 1969.597776]  driver_probe_device+0xd9/0x140
[ 1969.597778]  device_driver_attach+0x4a/0x50
[ 1969.597780]  __driver_attach+0x83/0x140
[ 1969.597782]  ? device_driver_attach+0x50/0x50
[ 1969.597784]  ? device_driver_attach+0x50/0x50
[ 1969.597787]  bus_for_each_dev+0x74/0xc0
[ 1969.597789]  bus_add_driver+0x14b/0x1f0
[ 1969.597791]  ? 0xffffffffc04fb000
[ 1969.597793]  driver_register+0x66/0xb0
[ 1969.597795]  ? 0xffffffffc04fb000
[ 1969.597801]  megasas_init+0xe7/0x1000 [megaraid_sas]
[ 1969.597803]  do_one_initcall+0x62/0x300
[ 1969.597806]  ? do_init_module+0x1d/0x200
[ 1969.597808]  ? kmem_cache_alloc_trace+0x296/0x2d0
[ 1969.597811]  do_init_module+0x55/0x200
[ 1969.597813]  load_module+0x15f2/0x17b0
[ 1969.597816]  ? __do_sys_finit_module+0xad/0x110
[ 1969.597818]  __do_sys_finit_module+0xad/0x110
[ 1969.597820]  do_syscall_64+0x33/0x40
[ 1969.597823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1969.597825] RIP: 0033:0x7f66340262bd
[ 1969.597827] Code: Unable to access opcode bytes at RIP 0x7f6634026293.
[ 1969.597828] RSP: 002b:00007ffca1011f48 EFLAGS: 00000246 ORIG_RAX:
0000000000000139
[ 1969.597831] RAX: ffffffffffffffda RBX: 000055f6720cf370 RCX: 00007f66340262bd
[ 1969.597833] RDX: 0000000000000000 RSI: 00007f6634b9880d RDI: 0000000000000006
[ 1969.597835] RBP: 00007f6634b9880d R08: 0000000000000000 R09: 00007ffca1012070
[ 1969.597836] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[ 1969.597838] R13: 000055f6720cce70 R14: 0000000000020000 R15: 0000000000000000
[ 1969.597859]
               Showing all locks held in the system:
[ 1969.597862] 2 locks held by kworker/0:0/5:
[ 1969.597863]  #0: ffff9af800194b38
((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x1e6/0x5e0
[ 1969.597872]  #1: ffffbf3bc01f3e70
((kfence_timer).work){+.+.}-{0:0}, at: process_one_work+0x1e6/0x5e0
[ 1969.597890] 3 locks held by kworker/0:1/7:
[ 1969.597960] 1 lock held by khungtaskd/643:
[ 1969.597962]  #0: ffffffffa624cb60 (rcu_read_lock){....}-{1:2}, at:
rcu_lock_acquire.constprop.54+0x0/0x30
[ 1969.597982] 1 lock held by systemd-udevd/906:
[ 1969.597983]  #0: ffff9af984a1c218 (&dev->mutex){....}-{3:3}, at:
device_driver_attach+0x18/0x50

[ 1969.598010] =============================================

[ 1983.242512] random: fast init done
[ 2071.928411] sd 0:2:0:0: [sda] 1951399936 512-byte logical blocks:
(999 GB/931 GiB)
[ 2071.928480] sd 0:2:2:0: [sdc] 1756889088 512-byte logical blocks:
(900 GB/838 GiB)
[ 2071.928537] sd 0:2:1:0: [sdb] 285474816 512-byte logical blocks:
(146 GB/136 GiB)
[ 2071.928580] sd 0:2:0:0: [sda] Write Protect is off
[ 2071.928625] sd 0:2:0:0: [sda] Mode Sense: 1f 00 00 08
[ 2071.928629] sd 0:2:2:0: [sdc] Write Protect is off
[ 2071.928669] sd 0:2:1:0: [sdb] Write Protect is off
[ 2071.928706] sd 0:2:1:0: [sdb] Mode Sense: 1f 00 00 08
[ 2071.928844] sd 0:2:2:0: [sdc] Mode Sense: 1f 00 00 08
[ 2071.928848] sd 0:2:0:0: [sda] Write cache: disabled, read cache:
enabled, doesn't support DPO or FUA


================================

I am working on it and need some time for debugging. BTW, did anyone
try the "shared host tagset" patchset, with the .config shared by Qian,
on some other adapter which is not really multiqueue at the HW level
but whose driver exposes multiple hardware queues (similar to
megaraid_sas)?

Thanks,
Sumit
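[The behaviour under discussion can be modelled in miniature: with host_tagset
enabled, every hw queue draws driver tags from one host-wide pool, so the
total in-flight commands are bounded by the HBA's can_queue rather than
can_queue per queue. The sketch below is a toy illustration only; SharedTagSet
and alloc_tag are made-up names, not kernel APIs.]

```python
# Toy model of hostwide shared tags: with host_tagset enabled, all hw
# queues (hctxs) allocate driver tags from one shared pool sized by the
# host's can_queue, instead of an independent can_queue-sized pool each.

class SharedTagSet:
    def __init__(self, can_queue, nr_hw_queues, host_tagset):
        if host_tagset:
            # one pool deliberately aliased across every hctx
            pool = set(range(can_queue))
            self.pools = [pool] * nr_hw_queues
        else:
            # independent pool per hctx (over-commits the HBA)
            self.pools = [set(range(can_queue)) for _ in range(nr_hw_queues)]

    def alloc_tag(self, hctx):
        pool = self.pools[hctx]
        if not pool:
            return None  # caller must wait for a completion to free a tag
        return pool.pop()

shared = SharedTagSet(can_queue=4, nr_hw_queues=2, host_tagset=True)
# four allocations exhaust the host-wide pool regardless of which hctx asks
tags = [shared.alloc_tag(i % 2) for i in range(4)]
print(shared.alloc_tag(0))  # -> None: the host is saturated
```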



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-06 19:25                   ` Sumit Saxena
@ 2020-11-07  0:17                     ` Qian Cai
  2020-11-09  8:49                       ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-07  0:17 UTC (permalink / raw)
  To: Sumit Saxena, John Garry
  Cc: Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
> I am able to hit the boot hang and similar kind of stack traces as
> reported by Qian with shared .config on x86 machine.
> In my case the system boots after a hang of 40-45 mins. Qian, is it
> true for you as well ?
I don't know. I had never waited for that long.



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-07  0:17                     ` Qian Cai
@ 2020-11-09  8:49                       ` John Garry
  2020-11-09 13:39                         ` Qian Cai
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-09  8:49 UTC (permalink / raw)
  To: Qian Cai, Sumit Saxena
  Cc: Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On 07/11/2020 00:17, Qian Cai wrote:
> On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
>> I am able to hit the boot hang and similar kind of stack traces as
>> reported by Qian with shared .config on x86 machine.
>> In my case the system boots after a hang of 40-45 mins. Qian, is it
>> true for you as well ?
> I don't know. I had never waited for that long.
> 
> .
> 

Hi Qian,

By chance do you have an equivalent arm64 .config, enabling the same RH
config options?

I suppose I could try to do this myself also, but an authentic version
would be nicer.

Thanks,
John


* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-09  8:49                       ` John Garry
@ 2020-11-09 13:39                         ` Qian Cai
  2020-11-09 14:05                           ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-09 13:39 UTC (permalink / raw)
  To: John Garry, Sumit Saxena
  Cc: Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On Mon, 2020-11-09 at 08:49 +0000, John Garry wrote:
> On 07/11/2020 00:17, Qian Cai wrote:
> > On Sat, 2020-11-07 at 00:55 +0530, Sumit Saxena wrote:
> > > I am able to hit the boot hang and similar kind of stack traces as
> > > reported by Qian with shared .config on x86 machine.
> > > In my case the system boots after a hang of 40-45 mins. Qian, is it
> > > true for you as well ?
> > I don't know. I had never waited for that long.
> > 
> > .
> > 
> 
> Hi Qian,
> 
> By chance do have an equivalent arm64 .config, enabling the same RH 
> config options?
> 
> I suppose I could try do this myself also, but an authentic version 
> would be nicer.
The closest one I have here is:
https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config

but it only selects the Thunder X2 platform and needs CONFIG_MEGARAID_SAS=m
manually selected to start with; also, none of the arm64 systems here have
megaraid_sas.
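[Manually selecting CONFIG_MEGARAID_SAS=m can be scripted; inside a kernel
tree the bundled scripts/config helper does this. The standalone sed below is
only an illustration that emulates the edit on a throwaway file so the
transformation is visible.]

```shell
# In a kernel tree, the usual way (scripts/config ships with the kernel):
#   scripts/config --file arm64.config --module MEGARAID_SAS
#   make ARCH=arm64 olddefconfig
# Emulated here on a standalone file for illustration:
cfg=arm64.config
printf '%s\n' 'CONFIG_SCSI=y' '# CONFIG_MEGARAID_SAS is not set' > "$cfg"
sed -i 's/^# CONFIG_MEGARAID_SAS is not set$/CONFIG_MEGARAID_SAS=m/' "$cfg"
grep '^CONFIG_MEGARAID_SAS' "$cfg"   # -> CONFIG_MEGARAID_SAS=m
```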



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-09 13:39                         ` Qian Cai
@ 2020-11-09 14:05                           ` John Garry
  2020-11-10 17:42                             ` John Garry
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-09 14:05 UTC (permalink / raw)
  To: Qian Cai, Sumit Saxena
  Cc: Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On 09/11/2020 13:39, Qian Cai wrote:
>> I suppose I could try do this myself also, but an authentic version
>> would be nicer.
> The closest one I have here is:
> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
> 
> but it only selects the Thunder X2 platform and needs to manually select
> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> megaraid_sas.

Thanks, I'm confident I can fix it up to get it going on my Huawei arm64 
D06CS.

So that board has a megaraid sas card. In addition, it also has hisi_sas
HW, another storage controller for which we enabled this same feature
that is causing the problem.

I'll report back when I can.

Thanks,
John



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-09 14:05                           ` John Garry
@ 2020-11-10 17:42                             ` John Garry
  2020-11-11  7:27                               ` Sumit Saxena
  0 siblings, 1 reply; 58+ messages in thread
From: John Garry @ 2020-11-10 17:42 UTC (permalink / raw)
  To: Qian Cai, Sumit Saxena
  Cc: Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On 09/11/2020 14:05, John Garry wrote:
> On 09/11/2020 13:39, Qian Cai wrote:
>>> I suppose I could try do this myself also, but an authentic version
>>> would be nicer.
>> The closest one I have here is:
>> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
>>
>> but it only selects the Thunder X2 platform and needs to manually select
>> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
>> megaraid_sas.
> 
> Thanks, I'm confident I can fix it up to get it going on my Huawei arm64 
> D06CS.
> 
> So that board has a megaraid sas card. In addition, it also has hisi_sas 
> HW, which is another storage controller which we enabled this same 
> feature which is causing the problem.
> 
> I'll report back when I can.

So I had to hack that arm64 config a bit to get it booting:
https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang

Boot is ok on my board without the megaraid sas card, but includes 
hisi_sas HW (which enables the equivalent option which is exposing the 
problem).

But the board with the megaraid sas boots very slowly, specifically 
around the megaraid sas probe:

: ttyS0 at MMIO 0x3f00002f8 (irq = 17, base_baud = 115200) is a 16550A
[   50.023726][    T1] printk: console [ttyS0] enabled
[   50.412597][    T1] megasas: 07.714.04.00-rc1
[   50.436614][    T5] megaraid_sas 0000:08:00.0: FW now in Ready state
[   50.450079][    T5] megaraid_sas 0000:08:00.0: 63 bit DMA mask and 63 
bit consistent mask
[   50.467811][    T5] megaraid_sas 0000:08:00.0: firmware supports msix 
        : (128)
[   50.845995][    T5] megaraid_sas 0000:08:00.0: requested/available 
msix 128/128
[   50.861476][    T5] megaraid_sas 0000:08:00.0: current msix/online 
cpus      : (128/128)
[   50.877616][    T5] megaraid_sas 0000:08:00.0: RDPQ mode     : (enabled)
[   50.891018][    T5] megaraid_sas 0000:08:00.0: Current firmware 
supports maximum commands: 4077       LDIO threshold: 0
[   51.262942][    T5] megaraid_sas 0000:08:00.0: Performance mode 
:Latency (latency index = 1)
[   51.280749][    T5] megaraid_sas 0000:08:00.0: FW supports sync cache 
        : Yes
[   51.295451][    T5] megaraid_sas 0000:08:00.0: 
megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[   51.387474][    T5] megaraid_sas 0000:08:00.0: FW provided 
supportMaxExtLDs: 1       max_lds: 64
[   51.404931][    T5] megaraid_sas 0000:08:00.0: controller type 
: MR(2048MB)
[   51.419616][    T5] megaraid_sas 0000:08:00.0: Online Controller 
Reset(OCR)  : Enabled
[   51.436132][    T5] megaraid_sas 0000:08:00.0: Secure JBOD support 
: Yes
[   51.450265][    T5] megaraid_sas 0000:08:00.0: NVMe passthru support 
: Yes
[   51.464757][    T5] megaraid_sas 0000:08:00.0: FW provided TM 
TaskAbort/Reset timeout        : 6 secs/60 secs
[   51.484379][    T5] megaraid_sas 0000:08:00.0: JBOD sequence map 
support     : Yes
[   51.499607][    T5] megaraid_sas 0000:08:00.0: PCI Lane Margining 
support    : No
[   51.547610][    T5] megaraid_sas 0000:08:00.0: NVME page size 
: (4096)
[   51.608635][    T5] megaraid_sas 0000:08:00.0: 
megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[   51.630285][    T5] megaraid_sas 0000:08:00.0: INIT adapter done
[   51.649854][    T5] megaraid_sas 0000:08:00.0: pci id 
: (0x1000)/(0x0016)/(0x19e5)/(0xd215)
[   51.667873][    T5] megaraid_sas 0000:08:00.0: unevenspan support    : no
[   51.681646][    T5] megaraid_sas 0000:08:00.0: firmware crash dump   : no
[   51.695596][    T5] megaraid_sas 0000:08:00.0: JBOD sequence map 
: enabled
[   51.711521][    T5] megaraid_sas 0000:08:00.0: Max firmware commands: 
4076 shared with nr_hw_queues = 127
[   51.733056][    T5] scsi host0: Avago SAS based MegaRAID driver
[   65.304363][    T5] scsi 0:0:0:0: Direct-Access     ATA      SAMSUNG 
MZ7KH1T9 404Q PQ: 0 ANSI: 6
[   65.392401][    T5] scsi 0:0:1:0: Direct-Access     ATA      SAMSUNG 
MZ7KH1T9 404Q PQ: 0 ANSI: 6
[   79.508307][    T5] scsi 0:0:65:0: Enclosure         HUAWEI 
Expander 12Gx16  131  PQ: 0 ANSI: 6
[  183.965109][   C14] random: fast init done

Notice the 14 and 104 second delays.

But it does boot fully to the console. I'll watch for the further 
issues, which you guys seem to hit after a while.

Thanks,
John

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-10 17:42                             ` John Garry
@ 2020-11-11  7:27                               ` Sumit Saxena
  2020-11-11  9:27                                 ` Ming Lei
  2020-11-11 11:51                                 ` John Garry
  0 siblings, 2 replies; 58+ messages in thread
From: Sumit Saxena @ 2020-11-11  7:27 UTC (permalink / raw)
  To: John Garry
  Cc: Qian Cai, Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang66, luojiaxing, Hannes Reinecke

On Tue, Nov 10, 2020 at 11:12 PM John Garry <john.garry@huawei.com> wrote:
>
> On 09/11/2020 14:05, John Garry wrote:
> > On 09/11/2020 13:39, Qian Cai wrote:
> >>> I suppose I could try do this myself also, but an authentic version
> >>> would be nicer.
> >> The closest one I have here is:
> >> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
> >>
> >> but it only selects the Thunder X2 platform and needs to manually select
> >> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> >> megaraid_sas.
> >
> > Thanks, I'm confident I can fix it up to get it going on my Huawei arm64
> > D06CS.
> >
> > So that board has a megaraid sas card. In addition, it also has hisi_sas
> > HW, which is another storage controller which we enabled this same
> > feature which is causing the problem.
> >
> > I'll report back when I can.
>
> So I had to hack that arm64 config a bit to get it booting:
> https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang
>
> Boot is ok on my board without the megaraid sas card, but includes
> hisi_sas HW (which enables the equivalent option which is exposing the
> problem).
>
> But the board with the megaraid sas boots very slowly, specifically
> around the megaraid sas probe:
>
> [ ... full megaraid_sas probe log snipped; quoted in full above ... ]
>
> Notice the 14 and 104 second delays.
>
> But does boot fully to get to the console. I'll wait for further issues,
> which you guys seem to experience after a while.
>
> Thanks,
> John
"megaraid_sas" driver calls “scsi_scan_host()” to discover SCSI
devices. In this failure case, scsi_scan_host() is taking a long time
to complete, hence causing delay in system boot.
With "host_tagset" enabled, scsi_scan_host() takes around 20 mins.
With "host_tagset" disabled, scsi_scan_host() takes upto 5-8 mins.

The scan time depends on the number of SCSI channels and the number of
devices per channel exposed by the LLD.
The megaraid_sas driver exposes 4 channels and 128 drives per channel.

Each target scan takes 2 seconds (in the failure case, with
host_tagset enabled), so driver load completes only after ~20 minutes.
See below:

[  299.725271] kobject: 'target18:0:96': free name
[  301.681267] kobject: 'target18:0:97' (00000000987c7f11):
kobject_cleanup, parent 0000000000000000
[  301.681269] kobject: 'target18:0:97' (00000000987c7f11): calling
ktype release
[  301.681273] kobject: 'target18:0:97': free name
[  303.575268] kobject: 'target18:0:98' (00000000a8c34149):
kobject_cleanup, parent 0000000000000000

In Qian's kernel .config, async SCSI scan is disabled, so in the
failure case the SCSI scan is synchronous.
Below is the stack trace when scsi_scan_host() hangs:

[<0>] __wait_rcu_gp+0x134/0x170
[<0>] synchronize_rcu.part.80+0x53/0x60
[<0>] blk_free_flush_queue+0x12/0x30
[<0>] blk_mq_hw_sysfs_release+0x21/0x70
[<0>] kobject_release+0x46/0x150
[<0>] blk_mq_release+0xb4/0xf0
[<0>] blk_release_queue+0xc4/0x130
[<0>] kobject_release+0x46/0x150
[<0>] scsi_device_dev_release_usercontext+0x194/0x3f0
[<0>] execute_in_process_context+0x22/0xa0
[<0>] device_release+0x2e/0x80
[<0>] kobject_release+0x46/0x150
[<0>] scsi_alloc_sdev+0x2e7/0x310
[<0>] scsi_probe_and_add_lun+0x410/0xbd0
[<0>] __scsi_scan_target+0xf2/0x530
[<0>] scsi_scan_channel.part.7+0x51/0x70
[<0>] scsi_scan_host_selected+0xd4/0x140
[<0>] scsi_scan_host+0x198/0x1c0

This issue hits when lock-related debugging is enabled in the kernel
config. The following .config parameters (possibly only a subset of
this list) are required to hit the issue:

CONFIG_PREEMPT_COUNT=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_TRACE_IRQFLAGS_NMI=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_PROVE_RCU=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y

When scsi_scan_host() hangs, there are no outstanding IOs in the
megaraid_sas driver-firmware stack: both the SCSI "host_busy" counter
and the megaraid_sas driver's internal counter are "0".
Key takeaways:
1. The issue is observed only when lock-related debugging is enabled,
so it is seen in debug environments.
2. The issue seems to be related to the generic shared "host_tagset"
code whenever some kind of kernel debugging is enabled. We do not see
an immediate reason to hide this issue by disabling the "host_tagset"
feature.

John,
The issue may hit on the ARM platform too with Qian's .config file and
other adapters (e.g. hisi_sas), so I feel disabling "host_tagset" in
the megaraid_sas driver will not help. It needs debugging from the
perspective of the entire shared host-tag feature, since the
scsi_scan_host() wait time gets worse when "host_tagset" is enabled.
I am also debugging in parallel and will share anything useful I find.

Qian,
I need full dmesg logs from your setup with
megaraid_sas.host_tagset_enable=1 and
megaraid_sas.host_tagset_enable=0. Please let it run for a long time;
I just want to make sure that whatever you observe matches what I see.

Thanks,
Sumit


* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-11  7:27                               ` Sumit Saxena
@ 2020-11-11  9:27                                 ` Ming Lei
  2020-11-11 11:36                                   ` Sumit Saxena
  2020-11-11 14:42                                   ` Qian Cai
  2020-11-11 11:51                                 ` John Garry
  1 sibling, 2 replies; 58+ messages in thread
From: Ming Lei @ 2020-11-11  9:27 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: John Garry, Qian Cai, Kashyap Desai, Jens Axboe,
	James E.J. Bottomley, Martin K. Petersen, don.brace,
	Bart Van Assche, dgilbert, paolo.valente, Hannes Reinecke,
	Christoph Hellwig, linux-block, LKML, Linux SCSI List,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

On Wed, Nov 11, 2020 at 12:57:59PM +0530, Sumit Saxena wrote:
> On Tue, Nov 10, 2020 at 11:12 PM John Garry <john.garry@huawei.com> wrote:
> >
> > On 09/11/2020 14:05, John Garry wrote:
> > > On 09/11/2020 13:39, Qian Cai wrote:
> > >>> I suppose I could try do this myself also, but an authentic version
> > >>> would be nicer.
> > >> The closest one I have here is:
> > >> https://cailca.coding.net/public/linux/mm/git/files/master/arm64.config
> > >>
> > >> but it only selects the Thunder X2 platform and needs to manually select
> > >> CONFIG_MEGARAID_SAS=m to start with, but none of arm64 systems here have
> > >> megaraid_sas.
> > >
> > > Thanks, I'm confident I can fix it up to get it going on my Huawei arm64
> > > D06CS.
> > >
> > > So that board has a megaraid sas card. In addition, it also has hisi_sas
> > > HW, which is another storage controller which we enabled this same
> > > feature which is causing the problem.
> > >
> > > I'll report back when I can.
> >
> > So I had to hack that arm64 config a bit to get it booting:
> > https://github.com/hisilicon/kernel-dev/commits/private-topic-sas-5.10-megaraid-hang
> >
> > Boot is ok on my board without the megaraid sas card, but includes
> > hisi_sas HW (which enables the equivalent option which is exposing the
> > problem).
> >
> > But the board with the megaraid sas boots very slowly, specifically
> > around the megaraid sas probe:
> >
> > [ ... full megaraid_sas probe log snipped; quoted in full above ... ]
> >
> > Notice the 14 and 104 second delays.
> >
> > But does boot fully to get to the console. I'll wait for further issues,
> > which you guys seem to experience after a while.
> >
> > Thanks,
> > John
> "megaraid_sas" driver calls “scsi_scan_host()” to discover SCSI
> devices. In this failure case, scsi_scan_host() is taking a long time
> to complete, hence causing delay in system boot.
> With "host_tagset" enabled, scsi_scan_host() takes around 20 mins.
> With "host_tagset" disabled, scsi_scan_host() takes upto 5-8 mins.
> 
> The scan time depends upon the number of scsi channels and devices per
> scsi channel is exposed by LLD.
> megaraid_sas driver exposes 4 channels and 128 drives per channel.
> 
> Each target scan takes 2 seconds (in case of failure with host_tagset
> enabled).  That's why driver load completes after ~20 minutes. See
> below:
> 
> [  299.725271] kobject: 'target18:0:96': free name
> [  301.681267] kobject: 'target18:0:97' (00000000987c7f11):
> kobject_cleanup, parent 0000000000000000
> [  301.681269] kobject: 'target18:0:97' (00000000987c7f11): calling
> ktype release
> [  301.681273] kobject: 'target18:0:97': free name
> [  303.575268] kobject: 'target18:0:98' (00000000a8c34149):
> kobject_cleanup, parent 0000000000000000
> 
> In Qian's kernel .config, async scsi scan is disabled so in failure
> case SCSI scan type is synchronous.
> Below is the stack trace when scsi_scan_host() hangs:
> 
> [<0>] __wait_rcu_gp+0x134/0x170
> [<0>] synchronize_rcu.part.80+0x53/0x60
> [<0>] blk_free_flush_queue+0x12/0x30

Does this issue disappear if you apply the following change?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index e32958f0b687..b1fe6176d77f 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 	INIT_LIST_HEAD(&fq->flush_queue[1]);
 	INIT_LIST_HEAD(&fq->flush_data_in_flight);
 
-	lockdep_register_key(&fq->key);
-	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
-
 	return fq;
 
  fail_rq:
@@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
 	if (!fq)
 		return;
 
-	lockdep_unregister_key(&fq->key);
 	kfree(fq->flush_rq);
 	kfree(fq);
 }


Thanks, 
Ming



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-11  9:27                                 ` Ming Lei
@ 2020-11-11 11:36                                   ` Sumit Saxena
  2020-11-11 14:42                                   ` Qian Cai
  1 sibling, 0 replies; 58+ messages in thread
From: Sumit Saxena @ 2020-11-11 11:36 UTC (permalink / raw)
  To: Ming Lei
  Cc: John Garry, Qian Cai, Kashyap Desai, Jens Axboe,
	James E.J. Bottomley, Martin K. Petersen, don.brace,
	Bart Van Assche, dgilbert, paolo.valente, Hannes Reinecke,
	Christoph Hellwig, linux-block, LKML, Linux SCSI List,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

>
> Does this issue disappear if you apply the following change?
This change fixes the issue for me.

Qian,
Please try after applying changes suggested by Ming.

Thanks,
Sumit
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e32958f0b687..b1fe6176d77f 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>         INIT_LIST_HEAD(&fq->flush_queue[1]);
>         INIT_LIST_HEAD(&fq->flush_data_in_flight);
>
> -       lockdep_register_key(&fq->key);
> -       lockdep_set_class(&fq->mq_flush_lock, &fq->key);
> -
>         return fq;
>
>   fail_rq:
> @@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
>         if (!fq)
>                 return;
>
> -       lockdep_unregister_key(&fq->key);
>         kfree(fq->flush_rq);
>         kfree(fq);
>  }
>
>
> Thanks,
> Ming
>


* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-11  7:27                               ` Sumit Saxena
  2020-11-11  9:27                                 ` Ming Lei
@ 2020-11-11 11:51                                 ` John Garry
  1 sibling, 0 replies; 58+ messages in thread
From: John Garry @ 2020-11-11 11:51 UTC (permalink / raw)
  To: Sumit Saxena
  Cc: Qian Cai, Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Ming Lei, Bart Van Assche,
	dgilbert, paolo.valente, Hannes Reinecke, Christoph Hellwig,
	linux-block, LKML, Linux SCSI List, esc.storagedev,
	PDL,MEGARAIDLINUX, chenxiang (M),
	luojiaxing, Hannes Reinecke

> 
> In Qian's kernel .config, async scsi scan is disabled so in failure
> case SCSI scan type is synchronous.
> Below is the stack trace when scsi_scan_host() hangs:
> 
> [<0>] __wait_rcu_gp+0x134/0x170
> [<0>] synchronize_rcu.part.80+0x53/0x60
> [<0>] blk_free_flush_queue+0x12/0x30
> [<0>] blk_mq_hw_sysfs_release+0x21/0x70

this is per blk_mq_hw_ctx

> [<0>] kobject_release+0x46/0x150
> [<0>] blk_mq_release+0xb4/0xf0
> [<0>] blk_release_queue+0xc4/0x130
> [<0>] kobject_release+0x46/0x150
> [<0>] scsi_device_dev_release_usercontext+0x194/0x3f0
> [<0>] execute_in_process_context+0x22/0xa0
> [<0>] device_release+0x2e/0x80
> [<0>] kobject_release+0x46/0x150
> [<0>] scsi_alloc_sdev+0x2e7/0x310
> [<0>] scsi_probe_and_add_lun+0x410/0xbd0
> [<0>] __scsi_scan_target+0xf2/0x530
> [<0>] scsi_scan_channel.part.7+0x51/0x70
> [<0>] scsi_scan_host_selected+0xd4/0x140
> [<0>] scsi_scan_host+0x198/0x1c0
> 
> This issue hits when lock related debugging is enabled in kernel config.
> kernel .config parameters(may be subset of this list) are required to
> hit the issue:
> 
> CONFIG_PREEMPT_COUNT=y *
> CONFIG_UNINLINE_SPIN_UNLOCK=y *
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_RT_MUTEXES=y *
> CONFIG_DEBUG_SPINLOCK=y *
> CONFIG_DEBUG_MUTEXES=y *
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y *
> CONFIG_DEBUG_RWSEMS=y *
> CONFIG_DEBUG_LOCK_ALLOC=y *
> CONFIG_LOCKDEP=y *
> CONFIG_DEBUG_LOCKDEP=y
> CONFIG_TRACE_IRQFLAGS=y *
> CONFIG_TRACE_IRQFLAGS_NMI=y
> CONFIG_DEBUG_KOBJECT=y 
> CONFIG_PROVE_RCU=y *
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y *

(* means that I enabled)

> 
> When scsi_scan_host() hangs, there are no outstanding IOs with
> megaraid_sas driver-firmware stack as SCSI "host_busy" counter and
> megaraid_sas driver's internal counter are "0".
> Key takeaways:
> 1. Issue is observed when lock related debugging is enabled so issue
> is seen in debug environment.
> 2. Issue seems to be related to generic shared "host_tagset" code
> whenever some kind of kernel debugging is enabled. We do not see an
> immediate reason to hide this issue through disabling the
> "host_tagset" feature.
> 
> John,
> Issue may hit on ARM platform too using Qian's .config file with other
> adapters (e.g. hisi_sas) as well. So I feel disabling “host_tagset” in
> megaraid_sas driver will not help.  It requires debugging from the
> “Entire Shared host tag feature” perspective as scsi_scan_host()
> waittime aggravates when "host_tagset" is enabled. Also, I am doing
> parallel debugging and if I find anything useful, I will share.

So isn't this then really related to how many HW queues we expose, 
which just scales up the time? For megaraid sas, it's 1->128 on my 
arm64 platform when host_tagset_enable=1.

As a hack, I tried this (while keeping host_tagset_enable=1):

@@ -6162,11 +6168,15 @@ static int megasas_init_fw(struct megasas_instance *instance)
                else
                        instance->low_latency_index_start = 1;

-               num_msix_req = num_online_cpus() + instance->low_latency_index_start;
+               num_msix_req = 6 + instance->low_latency_index_start;

(6 is an arbitrary small number)

And boot time is nearly same as with host_tagset_enable=0.

For hisi_sas, the max HW queue count is only ever 16. In addition, we 
don't scan each channel/id/lun for hisi_sas, as it has its own scan 
handler.

> 
> Qian,
> I need full dmesg logs from your setup with
> megaraid_sas.host_tagset_enable=1 and
> megaraid_sas.host_tagset_enable=0. Please wait for a long time. I just
> want to make sure that whatever you observe is the same as mine.
> 

Thanks,
John



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-11  9:27                                 ` Ming Lei
  2020-11-11 11:36                                   ` Sumit Saxena
@ 2020-11-11 14:42                                   ` Qian Cai
  2020-11-11 15:04                                     ` Ming Lei
  1 sibling, 1 reply; 58+ messages in thread
From: Qian Cai @ 2020-11-11 14:42 UTC (permalink / raw)
  To: Ming Lei, Sumit Saxena
  Cc: John Garry, Kashyap Desai, Jens Axboe, James E.J. Bottomley,
	Martin K. Petersen, don.brace, Bart Van Assche, dgilbert,
	paolo.valente, Hannes Reinecke, Christoph Hellwig, linux-block,
	LKML, Linux SCSI List, esc.storagedev, PDL,MEGARAIDLINUX,
	chenxiang66, luojiaxing, Hannes Reinecke

On Wed, 2020-11-11 at 17:27 +0800, Ming Lei wrote:
> Does this issue disappear if you apply the following change?

This makes the system boot again as well.

> 
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index e32958f0b687..b1fe6176d77f 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -469,9 +469,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node,
> int cmd_size,
>  	INIT_LIST_HEAD(&fq->flush_queue[1]);
>  	INIT_LIST_HEAD(&fq->flush_data_in_flight);
>  
> -	lockdep_register_key(&fq->key);
> -	lockdep_set_class(&fq->mq_flush_lock, &fq->key);
> -
>  	return fq;
>  
>   fail_rq:
> @@ -486,7 +483,6 @@ void blk_free_flush_queue(struct blk_flush_queue *fq)
>  	if (!fq)
>  		return;
>  
> -	lockdep_unregister_key(&fq->key);
>  	kfree(fq->flush_rq);
>  	kfree(fq);
>  }
> 
> 
> Thanks, 
> Ming



* Re: [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug
  2020-11-11 14:42                                   ` Qian Cai
@ 2020-11-11 15:04                                     ` Ming Lei
  0 siblings, 0 replies; 58+ messages in thread
From: Ming Lei @ 2020-11-11 15:04 UTC (permalink / raw)
  To: Qian Cai
  Cc: Sumit Saxena, John Garry, Kashyap Desai, Jens Axboe,
	James E.J. Bottomley, Martin K. Petersen, don.brace,
	Bart Van Assche, dgilbert, paolo.valente, Hannes Reinecke,
	Christoph Hellwig, linux-block, LKML, Linux SCSI List,
	esc.storagedev, PDL,MEGARAIDLINUX, chenxiang66, luojiaxing,
	Hannes Reinecke

On Wed, Nov 11, 2020 at 09:42:17AM -0500, Qian Cai wrote:
> On Wed, 2020-11-11 at 17:27 +0800, Ming Lei wrote:
> > Does this issue disappear if you apply the following change?
> 
> This makes the system boot again as well.

OK, it actually isn't necessary to register one new lock key for each
hctx (blk_flush_queue) instance; the current way is really overkill
because there can be lots of hw queues in one system.

The original lockdep warning can be avoided simply by setting one
nvme_loop-specific lock class. If one nvme_loop device is backed by
another nvme_loop device, we can still avoid the warning by killing
the direct end-io chain, or by assigning another lock class.

Will prepare one formal patch tomorrow.
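
A sketch of how that could look (purely illustrative, not the actual
patch; the key name and its placement are assumptions):

```c
/* Sketch only: instead of registering a fresh lockdep key per
 * blk_flush_queue, share one static class across all flush queues of
 * the driver that actually needs a distinct class (hypothetically,
 * nvme-loop).  A static key is never unregistered, so teardown no
 * longer pays one synchronize_rcu() per hctx inside
 * lockdep_unregister_key(). */
static struct lock_class_key nvme_loop_flush_lock_key;

/* ... at flush-queue init time for an nvme-loop queue: */
lockdep_set_class(&fq->mq_flush_lock, &nvme_loop_flush_lock_key);
```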

Thanks,
Ming



end of thread, other threads:[~2020-11-11 15:04 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-19 15:20 [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2020-08-19 15:20 ` [PATCH v8 01/18] blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
2020-08-19 15:20 ` [PATCH v8 02/18] blk-mq: Rename blk_mq_update_tag_set_depth() John Garry
2020-08-19 15:20 ` [PATCH v8 03/18] blk-mq: Free tags in blk_mq_init_tags() upon error John Garry
2020-08-19 15:20 ` [PATCH v8 04/18] blk-mq: Pass flags for tag init/free John Garry
2020-08-19 15:20 ` [PATCH v8 05/18] blk-mq: Use pointers for blk_mq_tags bitmap tags John Garry
2020-08-19 15:20 ` [PATCH v8 06/18] blk-mq: Facilitate a shared sbitmap per tagset John Garry
2020-08-19 15:20 ` [PATCH v8 07/18] blk-mq: Relocate hctx_may_queue() John Garry
2020-08-19 15:20 ` [PATCH v8 08/18] blk-mq: Record nr_active_requests per queue for when using shared sbitmap John Garry
2020-08-19 15:20 ` [PATCH v8 09/18] blk-mq: Record active_queues_shared_sbitmap per tag_set " John Garry
2020-08-19 15:20 ` [PATCH v8 10/18] blk-mq, elevator: Count requests per hctx to improve performance John Garry
2020-08-19 15:20 ` [PATCH v8 11/18] null_blk: Support shared tag bitmap John Garry
2020-08-19 15:20 ` [PATCH v8 12/18] scsi: Add host and host template flag 'host_tagset' John Garry
2020-08-19 15:20 ` [PATCH v8 13/18] scsi: core: Show nr_hw_queues in sysfs John Garry
2020-09-10  8:33   ` John Garry
2020-08-19 15:20 ` [PATCH v8 14/18] scsi: hisi_sas: Switch v3 hw to MQ John Garry
2020-08-19 15:20 ` [PATCH v8 15/18] scsi: scsi_debug: Support host tagset John Garry
2020-08-19 15:20 ` [PATCH v8 16/18] hpsa: enable host_tagset and switch to MQ John Garry
2020-08-19 15:20 ` [PATCH v8 17/18] scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug John Garry
2020-11-02 14:17   ` Qian Cai
2020-11-02 14:31     ` Kashyap Desai
2020-11-02 15:24       ` Qian Cai
2020-11-02 14:51     ` John Garry
2020-11-02 15:18       ` Qian Cai
2020-11-03 10:54         ` John Garry
2020-11-03 13:04           ` Qian Cai
2020-11-04 15:21             ` Qian Cai
2020-11-04 16:07               ` Kashyap Desai
2020-11-04 18:08                 ` John Garry
2020-11-06 19:25                   ` Sumit Saxena
2020-11-07  0:17                     ` Qian Cai
2020-11-09  8:49                       ` John Garry
2020-11-09 13:39                         ` Qian Cai
2020-11-09 14:05                           ` John Garry
2020-11-10 17:42                             ` John Garry
2020-11-11  7:27                               ` Sumit Saxena
2020-11-11  9:27                                 ` Ming Lei
2020-11-11 11:36                                   ` Sumit Saxena
2020-11-11 14:42                                   ` Qian Cai
2020-11-11 15:04                                     ` Ming Lei
2020-11-11 11:51                                 ` John Garry
2020-08-19 15:20 ` [PATCH v8 18/18] smartpqi: enable host tagset John Garry
2020-08-27  8:53 ` [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2020-09-03 19:28 ` Douglas Gilbert
2020-09-03 21:23 ` Jens Axboe
2020-09-04  9:09   ` John Garry
2020-09-04 12:44     ` Martin K. Petersen
2020-09-16  7:21       ` John Garry
2020-09-17  1:10         ` Martin K. Petersen
2020-09-17  6:48           ` John Garry
2020-09-21 21:35             ` Don.Brace
2020-09-21 22:15               ` John Garry
2020-09-22  9:03                 ` John Garry
2020-09-28 16:11           ` Kashyap Desai
2020-10-06 14:24             ` John Garry
2020-10-06 14:42               ` Jens Axboe
2020-09-08 12:46 ` Hannes Reinecke
2020-09-08 13:38   ` John Garry
