linux-block.vger.kernel.org archive mirror
* [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue
@ 2019-05-31  2:27 Ming Lei
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
                   ` (9 more replies)
  0 siblings, 10 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Hi,

The 1st patch introduces support for hostwide tags across multiple hw queues
via the simplest approach: sharing a single 'struct blk_mq_tags' instance
among all hw queues. In theory, this shouldn't cause any performance drop;
a small IOPS improvement can even be observed for random IO on
null_blk/scsi_debug.
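
For reference, a quick way to exercise the shared-tags path on both test
drivers is roughly the following, using the module parameters added in
patches 2 and 4 (the exact values are illustrative only):

  modprobe null_blk queue_mode=2 submit_queues=8 host_tags=1
  modprobe scsi_debug host_tagset=1 submit_queues=8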

By applying hostwide tags for MQ, we can convert some SCSI drivers'
private reply queues into generic blk-mq hw queues, which brings at least
two improvements:

1) the private reply queue mapping can be removed from drivers, since the
mapping is now implemented as a generic API in the blk-mq core (see the
driver-side sketch below)

2) it helps to solve the managed IRQ race[1] during CPU hotplug in a
generic way; otherwise we would have to invent a new way to address the
same issue for each driver that uses a private reply queue.


[1] https://lore.kernel.org/linux-block/20190527150207.11372-1-ming.lei@redhat.com/T/#m6d95e2218bdd712ffda8f6451a0bb73eb2a651af
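
For reference, the per-driver conversion done in patches 6-9 boils down to
the rough pattern below; the 'example_*' names and 'nr_reply_queues' are
placeholders, the real hooks live in the individual patches:

	/* sketch: let blk-mq derive the queue mapping from managed-IRQ affinity */
	static int example_map_queues(struct Scsi_Host *shost)
	{
		struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];

		return blk_mq_pci_map_queues(qmap, example_to_pci_dev(shost), 0);
	}

	static struct scsi_host_template example_template = {
		.map_queues	= example_map_queues,
		.host_tagset	= 1,	/* share one hostwide tagset across hw queues */
	};

	/* at probe time: expose one blk-mq hw queue per hardware reply queue */
	shost->nr_hw_queues = nr_reply_queues;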

Any comments and test feedback are appreciated.

Thanks,
Ming

Hannes Reinecke (1):
  scsi: Add template flag 'host_tagset'

Ming Lei (8):
  blk-mq: allow hw queues to share hostwide tags
  block: null_blk: introduce module parameter of 'g_host_tags'
  scsi_debug: support host tagset
  scsi: introduce scsi_cmnd_hctx_index()
  scsi: hpsa: convert private reply queue to blk-mq hw queue
  scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  scsi: megaraid: convert private reply queue to blk-mq hw queue
  scsi: mpt3sas: convert private reply queue to blk-mq hw queue

 block/blk-mq-debugfs.c                      |  1 +
 block/blk-mq-sched.c                        |  8 +++
 block/blk-mq-tag.c                          |  6 ++
 block/blk-mq.c                              | 14 ++++
 block/elevator.c                            |  5 +-
 drivers/block/null_blk_main.c               |  6 ++
 drivers/scsi/hisi_sas/hisi_sas.h            |  2 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c       | 36 +++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      | 46 +++++--------
 drivers/scsi/hpsa.c                         | 49 ++++++--------
 drivers/scsi/megaraid/megaraid_sas_base.c   | 50 +++++---------
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c         | 74 ++++-----------------
 drivers/scsi/mpt3sas/mpt3sas_base.h         |  3 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c        | 17 +++++
 drivers/scsi/scsi_debug.c                   |  3 +
 drivers/scsi/scsi_lib.c                     |  2 +
 include/linux/blk-mq.h                      |  1 +
 include/scsi/scsi_cmnd.h                    | 15 +++++
 include/scsi/scsi_host.h                    |  3 +
 20 files changed, 168 insertions(+), 177 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:07   ` Hannes Reinecke
                     ` (2 more replies)
  2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3, ...) support
multiple reply queues with a single set of hostwide tags; the reply queues
are used for delivering and completing requests, and one MSI-X vector is
assigned to each reply queue.

These drivers have now switched to pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
for automatic affinity assignment. Given there is only a single blk-mq hw
queue, each driver has to set up a private reply queue mapping to figure
out which reply queue a request should be delivered to. That mapping is
based on managed IRQ affinity; it is generic and should have been done
inside blk-mq.
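
Roughly, each such driver currently carries its own copy of a mapping loop
like the one below ('pdev', 'reply_map' and 'nr_reply_queues' are
placeholders here; the per-driver variants are removed in the later patches):

	/* private per-driver mapping: CPU -> reply queue, from managed IRQ affinity */
	for (queue = 0; queue < nr_reply_queues; queue++) {
		const struct cpumask *mask = pci_irq_get_affinity(pdev, queue);

		if (!mask)
			continue;
		for_each_cpu(cpu, mask)
			reply_map[cpu] = queue;
	}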

Based on the following patch from Hannes, introduce BLK_MQ_F_HOST_TAGS for
converting reply queues into blk-mq hw queues.

	https://marc.info/?l=linux-block&m=149132580511346&w=2

Once a driver sets BLK_MQ_F_HOST_TAGS, the hostwide tags & request pool are
shared among all blk-mq hw queues.

The following patches will map each driver's reply queues to blk-mq hw
queues by applying BLK_MQ_F_HOST_TAGS.

Compared with the current single hw queue implementation, performance
shouldn't be affected by this patch in theory.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   |  8 ++++++++
 block/blk-mq-tag.c     |  6 ++++++
 block/blk-mq.c         | 14 ++++++++++++++
 block/elevator.c       |  5 +++--
 include/linux/blk-mq.h |  1 +
 6 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6aea0ebc3a73..3d6780504dcb 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -237,6 +237,7 @@ static const char *const alloc_policy_name[] = {
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
 	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(HOST_TAGS),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 74c6bb871f7e..3a4d9ad63e7b 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -449,6 +449,9 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set,
 				   struct blk_mq_hw_ctx *hctx,
 				   unsigned int hctx_idx)
 {
+	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx)
+		return;
+
 	if (hctx->sched_tags) {
 		blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx);
 		blk_mq_free_rq_map(hctx->sched_tags);
@@ -463,6 +466,11 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	struct blk_mq_tag_set *set = q->tag_set;
 	int ret;
 
+	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx) {
+		hctx->sched_tags = q->queue_hw_ctx[0]->sched_tags;
+		return 0;
+	}
+
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
 					       set->reserved_tags);
 	if (!hctx->sched_tags)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7513c8eaabee..309ec5079f3f 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -358,6 +358,9 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 	for (i = 0; i < tagset->nr_hw_queues; i++) {
 		if (tagset->tags && tagset->tags[i])
 			blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);
+
+		if (tagset->flags & BLK_MQ_F_HOST_TAGS)
+			break;
 	}
 }
 EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
@@ -405,6 +408,9 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		if (tags->nr_reserved_tags)
 			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
 		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+
+		if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+			break;
 	}
 	blk_queue_exit(q);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b8ad3d341b..49d73d979cb3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 {
 	int ret = 0;
 
+	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx) {
+		set->tags[hctx_idx] = set->tags[0];
+		return true;
+	}
+
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
 					set->queue_depth, set->reserved_tags);
 	if (!set->tags[hctx_idx])
@@ -2451,6 +2456,9 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 					 unsigned int hctx_idx)
 {
+	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx)
+		return;
+
 	if (set->tags && set->tags[hctx_idx]) {
 		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
 		blk_mq_free_rq_map(set->tags[hctx_idx]);
@@ -3166,6 +3174,12 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 		}
 		if (ret)
 			break;
+
+		if (set->flags & BLK_MQ_F_HOST_TAGS)
+			break;
+	}
+
+	queue_for_each_hw_ctx(q, hctx, i) {
 		if (q->elevator && q->elevator->type->ops.depth_updated)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
diff --git a/block/elevator.c b/block/elevator.c
index ec55d5fc0b3e..ed553d9bc53e 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -596,7 +596,8 @@ int elevator_switch_mq(struct request_queue *q,
 
 /*
  * For blk-mq devices, we default to using mq-deadline, if available, for single
- * queue devices.  If deadline isn't available OR we have multiple queues,
+ * queue devices or multiple queue device with hostwide tags.  If deadline isn't
+ * available OR we have multiple queues,
  * default to "none".
  */
 int elevator_init_mq(struct request_queue *q)
@@ -604,7 +605,7 @@ int elevator_init_mq(struct request_queue *q)
 	struct elevator_type *e;
 	int err = 0;
 
-	if (q->nr_hw_queues != 1)
+	if (q->nr_hw_queues != 1 && !(q->tag_set->flags & BLK_MQ_F_HOST_TAGS))
 		return 0;
 
 	/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 15d1aa53d96c..b4e33b509229 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -219,6 +219,7 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_HOST_TAGS	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:08   ` Hannes Reinecke
                     ` (2 more replies)
  2019-05-31  2:27 ` [PATCH 3/9] scsi: Add template flag 'host_tagset' Ming Lei
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Introduce a 'host_tags' module parameter (backed by the 'g_host_tags'
variable) for testing hostwide tags.

No performance drop was observed in the following tests:

1) no 'host_tags', hw queue depth is 16, and 1 hw queue
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=0 submit_queues=1 hw_queue_depth=16

2) 'host_tags', global hw queue depth is 16, and 8 hw queues
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=1 submit_queues=8 hw_queue_depth=16

3) fio test command:

fio --bs=4k --ioengine=libaio --iodepth=16 --filename=/dev/nullb0:/dev/nullb1:/dev/nullb2:/dev/nullb3 --direct=1 --runtime=30 --numjobs=16 --rw=randread --name=test --group_reporting --gtod_reduce=1

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/null_blk_main.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 447d635c79a2..3c04446f3649 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -91,6 +91,10 @@ static int g_submit_queues = 1;
 module_param_named(submit_queues, g_submit_queues, int, 0444);
 MODULE_PARM_DESC(submit_queues, "Number of submission queues");
 
+static int g_host_tags = 0;
+module_param_named(host_tags, g_host_tags, int, S_IRUGO);
+MODULE_PARM_DESC(host_tags, "All submission queues share one tags");
+
 static int g_home_node = NUMA_NO_NODE;
 module_param_named(home_node, g_home_node, int, 0444);
 MODULE_PARM_DESC(home_node, "Home node for the device");
@@ -1554,6 +1558,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (g_no_sched)
 		set->flags |= BLK_MQ_F_NO_SCHED;
+	if (g_host_tags)
+		set->flags |= BLK_MQ_F_HOST_TAGS;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/9] scsi: Add template flag 'host_tagset'
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
  2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:08   ` Hannes Reinecke
  2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' so that a hostwide tagset can be
shared across multiple reply queues once the SCSI device's reply queues
are converted to blk-mq hw queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65d0a10c76ad..2397947ed9fc 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1834,6 +1834,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_HOST_TAGS;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a5fcdad4a03e..48bb7a666be6 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -428,6 +428,9 @@ struct scsi_host_template {
 	/* True if the low-level driver supports blk-mq only */
 	unsigned force_blk_mq:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/9] scsi_debug: support host tagset
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (2 preceding siblings ...)
  2019-05-31  2:27 ` [PATCH 3/9] scsi: Add template flag 'host_tagset' Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:09   ` Hannes Reinecke
                     ` (2 more replies)
  2019-05-31  2:27 ` [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index() Ming Lei
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

The 'host_tagset' flag can be set on the scsi_debug device for testing
shared hostwide tags across multiple blk-mq hw queues.
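
For example (submit_queues and max_queue are existing scsi_debug parameters;
the values here are only illustrative):

modprobe scsi_debug host_tagset=1 submit_queues=8 max_queue=128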

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_debug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index d323523f5f9d..8cf3f6c3f4f9 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -665,6 +665,7 @@ static bool have_dif_prot;
 static bool write_since_sync;
 static bool sdebug_statistics = DEF_STATISTICS;
 static bool sdebug_wp;
+static bool sdebug_host_tagset = false;
 
 static unsigned int sdebug_store_sectors;
 static sector_t sdebug_capacity;	/* in sectors */
@@ -4468,6 +4469,7 @@ module_param_named(vpd_use_hostno, sdebug_vpd_use_hostno, int,
 module_param_named(wp, sdebug_wp, bool, S_IRUGO | S_IWUSR);
 module_param_named(write_same_length, sdebug_write_same_length, int,
 		   S_IRUGO | S_IWUSR);
+module_param_named(host_tagset, sdebug_host_tagset, bool, S_IRUGO | S_IWUSR);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -5779,6 +5781,7 @@ static int sdebug_driver_probe(struct device *dev)
 	sdbg_host = to_sdebug_host(dev);
 
 	sdebug_driver_template.can_queue = sdebug_max_queue;
+	sdebug_driver_template.host_tagset = sdebug_host_tagset;
 	if (!sdebug_clustering)
 		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index()
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (3 preceding siblings ...)
  2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:10   ` Hannes Reinecke
  2019-05-31  2:27 ` [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue Ming Lei
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

For drivers which enable .host_tagset, introduce scsi_cmnd_hctx_index()
to retrieve the current reply queue index. If a valid SCSI command is
provided, the index of its blk-mq hw queue is returned; otherwise the
queue mapped from the current CPU is returned.

Prepare for converting device's privete reply queue to blk-mq hw queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/scsi/scsi_cmnd.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 76ed5e4acd38..23f611a6a9f2 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -9,6 +9,7 @@
 #include <linux/types.h>
 #include <linux/timer.h>
 #include <linux/scatterlist.h>
+#include <scsi/scsi_host.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_request.h>
 
@@ -332,4 +333,18 @@ static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
 	return xfer_len;
 }
 
+/* only for drivers which enable .host_tagset */
+static inline unsigned scsi_cmnd_hctx_index(struct Scsi_Host *sh,
+		struct scsi_cmnd *scmd)
+{
+	if (unlikely(!scmd || !scmd->request || !scmd->request->mq_hctx)) {
+		struct blk_mq_queue_map *qmap =
+			&sh->tag_set.map[HCTX_TYPE_DEFAULT];
+
+		return qmap->mq_map[raw_smp_processor_id()];
+	}
+
+	return scmd->request->mq_hctx->queue_num;
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (4 preceding siblings ...)
  2019-05-31  2:27 ` [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index() Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:15   ` Hannes Reinecke
  2019-05-31  2:27 ` [PATCH 7/9] scsi: hisi_sas_v3: " Ming Lei
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

SCSI's reply queue is very similar to blk-mq's hw queue: both are
assigned per IRQ vector, so map the private reply queue into blk-mq's hw
queue via .host_tagset.

Then the private reply queue mapping can be removed from the driver.

Another benefit is that the request/IRQ loss issue can be solved in a
generic way, since managed IRQs may be shut down during CPU hotplug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 49 ++++++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 1bef1da273c2..c7136f9f0ce1 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -51,6 +51,7 @@
 #include <linux/jiffies.h>
 #include <linux/percpu-defs.h>
 #include <linux/percpu.h>
+#include <linux/blk-mq-pci.h>
 #include <asm/unaligned.h>
 #include <asm/div64.h>
 #include "hpsa_cmd.h"
@@ -902,6 +903,18 @@ static ssize_t host_show_legacy_board(struct device *dev,
 	return snprintf(buf, 20, "%d\n", h->legacy_board ? 1 : 0);
 }
 
+static int hpsa_map_queues(struct Scsi_Host *shost)
+{
+	struct ctlr_info *h = shost_to_hba(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	/* Switch to cpu mapping in case that managed IRQ isn't used */
+	if (shost->nr_hw_queues > 1)
+		return blk_mq_pci_map_queues(qmap, h->pdev, 0);
+	else
+		return blk_mq_map_queues(qmap);
+}
+
 static DEVICE_ATTR_RO(raid_level);
 static DEVICE_ATTR_RO(lunid);
 static DEVICE_ATTR_RO(unique_id);
@@ -971,6 +984,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.slave_alloc		= hpsa_slave_alloc,
 	.slave_configure	= hpsa_slave_configure,
 	.slave_destroy		= hpsa_slave_destroy,
+	.map_queues		= hpsa_map_queues,
 #ifdef CONFIG_COMPAT
 	.compat_ioctl		= hpsa_compat_ioctl,
 #endif
@@ -978,6 +992,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 2048,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -1145,7 +1160,7 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = scsi_cmnd_hctx_index(h->scsi_host, c->scsi_cmd);
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -5785,6 +5800,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
 {
 	int rv;
 
+	/* map reply queue to blk_mq hw queue */
+	h->scsi_host->nr_hw_queues = h->nreply_queues;
+
 	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
 	if (rv) {
 		dev_err(&h->pdev->dev, "scsi_add_host failed\n");
@@ -7386,26 +7404,6 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
-static void hpsa_setup_reply_map(struct ctlr_info *h)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu;
-
-	for (queue = 0; queue < h->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(h->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			h->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		h->reply_map[cpu] = 0;
-}
-
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7802,9 +7800,6 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	if (err)
 		goto clean1;
 
-	/* setup mapping between CPU and reply queue */
-	hpsa_setup_reply_map(h);
-
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8516,7 +8511,6 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 
 static void hpda_free_ctlr_info(struct ctlr_info *h)
 {
-	kfree(h->reply_map);
 	kfree(h);
 }
 
@@ -8528,11 +8522,6 @@ static struct ctlr_info *hpda_alloc_ctlr_info(void)
 	if (!h)
 		return NULL;
 
-	h->reply_map = kcalloc(nr_cpu_ids, sizeof(*h->reply_map), GFP_KERNEL);
-	if (!h->reply_map) {
-		kfree(h);
-		return NULL;
-	}
 	return h;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (5 preceding siblings ...)
  2019-05-31  2:27 ` [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue Ming Lei
@ 2019-05-31  2:27 ` Ming Lei
  2019-05-31  6:20   ` Hannes Reinecke
  2019-05-31  2:28 ` [PATCH 8/9] scsi: megaraid: " Ming Lei
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:27 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

SCSI's reply queue is very similar to blk-mq's hw queue: both are
assigned per IRQ vector, so map the private reply queue into blk-mq's hw
queue via .host_tagset.

Then the private reply queue mapping can be removed from the driver.

Another benefit is that the request/IRQ loss issue can be solved in a
generic way, since managed IRQs may be shut down during CPU hotplug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  2 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++++++----------
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 46 +++++++++-----------------
 3 files changed, 36 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index fc87994b5d73..3d48848dbde7 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -26,6 +26,7 @@
 #include <linux/platform_device.h>
 #include <linux/property.h>
 #include <linux/regmap.h>
+#include <linux/blk-mq-pci.h>
 #include <scsi/sas_ata.h>
 #include <scsi/libsas.h>
 
@@ -378,7 +379,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* debugfs memories */
 	u32 *debugfs_global_reg;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 8a7feb8ed8d6..a1c1f30b9fdb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
 	return rc;
 }
 
+static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
+{
+	if (!task->uldd_task)
+		return NULL;
+
+	if (dev_is_sata(task->dev)) {
+		struct ata_queued_cmd *qc = task->uldd_task;
+		return qc->scsicmd;
+	} else {
+		return task->uldd_task;
+	}
+}
+
 static int hisi_sas_task_prep(struct sas_task *task,
 			      struct hisi_sas_dq **dq_pointer,
 			      bool is_tmf, struct hisi_sas_tmf_task *tmf,
@@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
+	struct scsi_cmnd *scsi_cmnd;
 
 	if (DEV_IS_GONE(sas_dev)) {
 		if (sas_dev)
@@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	scsi_cmnd = sas_task_to_scsi_cmd(task);
+	if (hisi_hba->shost->hostt->host_tagset) {
+		unsigned int dq_index = scsi_cmnd_hctx_index(
+				hisi_hba->shost, scsi_cmnd);
 
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
@@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
-
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
+	else
 		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 49620c2411df..063e50e5b30c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2344,30 +2344,6 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
-{
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
-
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
-}
-
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
@@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 
 		min_msi = MIN_AFFINE_VECTORS_V3_HW;
 
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
 		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
 							 min_msi, max_msi,
 							 PCI_IRQ_MSI |
@@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 							 &desc);
 		if (vectors < 0)
 			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
 	} else {
 		min_msi = max_msi;
 		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
@@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
 	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	if (auto_affine_msi_experimental)
+		return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+				BASE_VECTORS_V3_HW);
+	else
+		return blk_mq_map_queues(qmap);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
 	.scan_start		= hisi_sas_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
+	.map_queues		= hisi_sas_map_queues,
+	.host_tagset		= 1,
 	.this_id		= -1,
 	.sg_tablesize		= HISI_SAS_SGE_PAGE_CNT,
 	.sg_prot_tablesize	= HISI_SAS_SGE_PAGE_CNT,
@@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
+
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (6 preceding siblings ...)
  2019-05-31  2:27 ` [PATCH 7/9] scsi: hisi_sas_v3: " Ming Lei
@ 2019-05-31  2:28 ` Ming Lei
  2019-05-31  6:22   ` Hannes Reinecke
  2019-06-01 21:41   ` Kashyap Desai
  2019-05-31  2:28 ` [PATCH 9/9] scsi: mpt3sas: " Ming Lei
  2019-06-04  8:49 ` [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq " John Garry
  9 siblings, 2 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:28 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

SCSI's reply queue is very similar to blk-mq's hw queue: both are
assigned per IRQ vector, so map the private reply queue into blk-mq's hw
queue via .host_tagset.

Then the private reply queue mapping can be removed from the driver.

Another benefit is that the request/IRQ loss issue can be solved in a
generic way, since managed IRQs may be shut down during CPU hotplug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 50 ++++++++-------------
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +-
 2 files changed, 20 insertions(+), 34 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3dd1df472dc6..b49999b90231 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -33,6 +33,7 @@
 #include <linux/fs.h>
 #include <linux/compat.h>
 #include <linux/blkdev.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/mutex.h>
 #include <linux/poll.h>
 #include <linux/vmalloc.h>
@@ -3165,6 +3166,19 @@ megasas_fw_cmds_outstanding_show(struct device *cdev,
 	return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&instance->fw_outstanding));
 }
 
+static int megasas_map_queues(struct Scsi_Host *shost)
+{
+	struct megasas_instance *instance = (struct megasas_instance *)
+		shost->hostdata;
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	if (smp_affinity_enable && instance->msix_vectors)
+		return blk_mq_pci_map_queues(qmap, instance->pdev, 0);
+	else
+		return blk_mq_map_queues(qmap);
+}
+
+
 static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
 	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
 static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
@@ -3207,7 +3221,9 @@ static struct scsi_host_template megasas_template = {
 	.shost_attrs = megaraid_host_attrs,
 	.bios_param = megasas_bios_param,
 	.change_queue_depth = scsi_change_queue_depth,
+	.map_queues =  megasas_map_queues,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 /**
@@ -5407,26 +5423,6 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
 		instance->use_seqnum_jbod_fp = false;
 }
 
-static void megasas_setup_reply_map(struct megasas_instance *instance)
-{
-	const struct cpumask *mask;
-	unsigned int queue, cpu;
-
-	for (queue = 0; queue < instance->msix_vectors; queue++) {
-		mask = pci_irq_get_affinity(instance->pdev, queue);
-		if (!mask)
-			goto fallback;
-
-		for_each_cpu(cpu, mask)
-			instance->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		instance->reply_map[cpu] = cpu % instance->msix_vectors;
-}
-
 /**
  * megasas_get_device_list -	Get the PD and LD device list from FW.
  * @instance:			Adapter soft state
@@ -5666,8 +5662,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
 			goto fail_init_adapter;
 	}
 
-	megasas_setup_reply_map(instance);
-
 	dev_info(&instance->pdev->dev,
 		"firmware supports msix\t: (%d)", fw_msix_count);
 	dev_info(&instance->pdev->dev,
@@ -6298,6 +6292,8 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	host->nr_hw_queues = instance->msix_vectors ?: 1;
+
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
@@ -6464,11 +6460,6 @@ static inline int megasas_alloc_mfi_ctrl_mem(struct megasas_instance *instance)
  */
 static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
 {
-	instance->reply_map = kcalloc(nr_cpu_ids, sizeof(unsigned int),
-				      GFP_KERNEL);
-	if (!instance->reply_map)
-		return -ENOMEM;
-
 	switch (instance->adapter_type) {
 	case MFI_SERIES:
 		if (megasas_alloc_mfi_ctrl_mem(instance))
@@ -6485,8 +6476,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
 
 	return 0;
  fail:
-	kfree(instance->reply_map);
-	instance->reply_map = NULL;
 	return -ENOMEM;
 }
 
@@ -6499,7 +6488,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
  */
 static inline void megasas_free_ctrl_mem(struct megasas_instance *instance)
 {
-	kfree(instance->reply_map);
 	if (instance->adapter_type == MFI_SERIES) {
 		if (instance->producer)
 			dma_free_coherent(&instance->pdev->dev, sizeof(u32),
@@ -7142,8 +7130,6 @@ megasas_resume(struct pci_dev *pdev)
 	if (rval < 0)
 		goto fail_reenable_msix;
 
-	megasas_setup_reply_map(instance);
-
 	if (instance->adapter_type != MFI_SERIES) {
 		megasas_reset_reply_desc(instance);
 		if (megasas_ioc_init_fusion(instance)) {
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4dfa0685a86c..4f909f32bf5c 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2699,7 +2699,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 	}
 
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		scsi_cmnd_hctx_index(instance->host, scp);
 
 	if (instance->adapter_type >= VENTURA_SERIES) {
 		/* FP for Optimal raid level 1.
@@ -3013,7 +3013,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
 
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		scsi_cmnd_hctx_index(instance->host, scmd);
 
 	if (!fp_possible) {
 		/* system pd firmware path */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 9/9] scsi: mpt3sas: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (7 preceding siblings ...)
  2019-05-31  2:28 ` [PATCH 8/9] scsi: megaraid: " Ming Lei
@ 2019-05-31  2:28 ` Ming Lei
  2019-05-31  6:23   ` Hannes Reinecke
  2019-06-06 11:58   ` Sreekanth Reddy
  2019-06-04  8:49 ` [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq " John Garry
  9 siblings, 2 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  2:28 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

SCSI's reply queue is very similar to blk-mq's hw queue: both are
assigned per IRQ vector, so map the private reply queue into blk-mq's hw
queue via .host_tagset.

Then the private reply queue mapping can be removed from the driver.

Another benefit is that the request/IRQ loss issue can be solved in a
generic way, since managed IRQs may be shut down during CPU hotplug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c  | 74 +++++-----------------------
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 17 +++++++
 3 files changed, 31 insertions(+), 63 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 8aacbd1e7db2..2b207d2925b4 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2855,8 +2855,7 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index)
 static void
 _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
 {
-	unsigned int cpu, nr_cpus, nr_msix, index = 0;
-	struct adapter_reply_queue *reply_q;
+	unsigned int nr_cpus, nr_msix;
 
 	if (!_base_is_controller_msix_enabled(ioc))
 		return;
@@ -2866,50 +2865,9 @@ _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
 		return;
 	}
 
-	memset(ioc->cpu_msix_table, 0, ioc->cpu_msix_table_sz);
-
 	nr_cpus = num_online_cpus();
 	nr_msix = ioc->reply_queue_count = min(ioc->reply_queue_count,
 					       ioc->facts.MaxMSIxVectors);
-	if (!nr_msix)
-		return;
-
-	if (smp_affinity_enable) {
-		list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
-			const cpumask_t *mask = pci_irq_get_affinity(ioc->pdev,
-							reply_q->msix_index);
-			if (!mask) {
-				ioc_warn(ioc, "no affinity for msi %x\n",
-					 reply_q->msix_index);
-				continue;
-			}
-
-			for_each_cpu_and(cpu, mask, cpu_online_mask) {
-				if (cpu >= ioc->cpu_msix_table_sz)
-					break;
-				ioc->cpu_msix_table[cpu] = reply_q->msix_index;
-			}
-		}
-		return;
-	}
-	cpu = cpumask_first(cpu_online_mask);
-
-	list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
-
-		unsigned int i, group = nr_cpus / nr_msix;
-
-		if (cpu >= nr_cpus)
-			break;
-
-		if (index < nr_cpus % nr_msix)
-			group++;
-
-		for (i = 0 ; i < group ; i++) {
-			ioc->cpu_msix_table[cpu] = reply_q->msix_index;
-			cpu = cpumask_next(cpu, cpu_online_mask);
-		}
-		index++;
-	}
 }
 
 /**
@@ -2924,6 +2882,7 @@ _base_disable_msix(struct MPT3SAS_ADAPTER *ioc)
 		return;
 	pci_disable_msix(ioc->pdev);
 	ioc->msix_enable = 0;
+	ioc->smp_affinity_enable = 0;
 }
 
 /**
@@ -2980,6 +2939,9 @@ _base_enable_msix(struct MPT3SAS_ADAPTER *ioc)
 		goto try_ioapic;
 	}
 
+	if (irq_flags & PCI_IRQ_AFFINITY)
+		ioc->smp_affinity_enable = 1;
+
 	ioc->msix_enable = 1;
 	ioc->reply_queue_count = r;
 	for (i = 0; i < ioc->reply_queue_count; i++) {
@@ -3266,7 +3228,7 @@ mpt3sas_base_get_reply_virt_addr(struct MPT3SAS_ADAPTER *ioc, u32 phys_addr)
 }
 
 static inline u8
-_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
+_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 {
 	/* Enables reply_queue load balancing */
 	if (ioc->msix_load_balance)
@@ -3274,7 +3236,7 @@ _base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
 		    base_mod64(atomic64_add_return(1,
 		    &ioc->total_io_cnt), ioc->reply_queue_count) : 0;
 
-	return ioc->cpu_msix_table[raw_smp_processor_id()];
+	return scsi_cmnd_hctx_index(ioc->shost, scmd);
 }
 
 /**
@@ -3325,7 +3287,7 @@ mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
 
 	smid = tag + 1;
 	request->cb_idx = cb_idx;
-	request->msix_io = _base_get_msix_index(ioc);
+	request->msix_io = _base_get_msix_index(ioc, scmd);
 	request->smid = smid;
 	INIT_LIST_HEAD(&request->chain_list);
 	return smid;
@@ -3498,7 +3460,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 	_base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
 					ioc->request_sz);
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3520,7 +3482,7 @@ _base_put_smid_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 
 
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3543,7 +3505,7 @@ mpt3sas_base_put_smid_fast_path(struct MPT3SAS_ADAPTER *ioc, u16 smid,
 
 	descriptor.SCSIIO.RequestFlags =
 	    MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3607,7 +3569,7 @@ mpt3sas_base_put_smid_nvme_encap(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 
 	descriptor.Default.RequestFlags =
 		MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED;
-	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
@@ -3639,7 +3601,7 @@ mpt3sas_base_put_smid_default(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 	}
 	request = (u64 *)&descriptor;
 	descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
-	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
@@ -6524,19 +6486,11 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
 
 	dinitprintk(ioc, ioc_info(ioc, "%s\n", __func__));
 
-	/* setup cpu_msix_table */
 	ioc->cpu_count = num_online_cpus();
 	for_each_online_cpu(cpu_id)
 		last_cpu_id = cpu_id;
 	ioc->cpu_msix_table_sz = last_cpu_id + 1;
-	ioc->cpu_msix_table = kzalloc(ioc->cpu_msix_table_sz, GFP_KERNEL);
 	ioc->reply_queue_count = 1;
-	if (!ioc->cpu_msix_table) {
-		dfailprintk(ioc,
-			    ioc_info(ioc, "allocation for cpu_msix_table failed!!!\n"));
-		r = -ENOMEM;
-		goto out_free_resources;
-	}
 
 	if (ioc->is_warpdrive) {
 		ioc->reply_post_host_index = kcalloc(ioc->cpu_msix_table_sz,
@@ -6748,7 +6702,6 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
 	mpt3sas_base_free_resources(ioc);
 	_base_release_memory_pools(ioc);
 	pci_set_drvdata(ioc->pdev, NULL);
-	kfree(ioc->cpu_msix_table);
 	if (ioc->is_warpdrive)
 		kfree(ioc->reply_post_host_index);
 	kfree(ioc->pd_handles);
@@ -6789,7 +6742,6 @@ mpt3sas_base_detach(struct MPT3SAS_ADAPTER *ioc)
 	_base_release_memory_pools(ioc);
 	mpt3sas_free_enclosure_list(ioc);
 	pci_set_drvdata(ioc->pdev, NULL);
-	kfree(ioc->cpu_msix_table);
 	if (ioc->is_warpdrive)
 		kfree(ioc->reply_post_host_index);
 	kfree(ioc->pd_handles);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 480219f0efc5..4d441e031025 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1022,7 +1022,6 @@ typedef void (*MPT3SAS_FLUSH_RUNNING_CMDS)(struct MPT3SAS_ADAPTER *ioc);
  * @start_scan_failed: means port enable failed, return's the ioc_status
  * @msix_enable: flag indicating msix is enabled
  * @msix_vector_count: number msix vectors
- * @cpu_msix_table: table for mapping cpus to msix index
  * @cpu_msix_table_sz: table size
  * @total_io_cnt: Gives total IO count, used to load balance the interrupts
  * @msix_load_balance: Enables load balancing of interrupts across
@@ -1183,6 +1182,7 @@ struct MPT3SAS_ADAPTER {
 	u16		broadcast_aen_pending;
 	u8		shost_recovery;
 	u8		got_task_abort_from_ioctl;
+	u8		smp_affinity_enable;
 
 	struct mutex	reset_in_progress_mutex;
 	spinlock_t	ioc_reset_in_progress_lock;
@@ -1199,7 +1199,6 @@ struct MPT3SAS_ADAPTER {
 
 	u8		msix_enable;
 	u16		msix_vector_count;
-	u8		*cpu_msix_table;
 	u16		cpu_msix_table_sz;
 	resource_size_t __iomem **reply_post_host_index;
 	u32		ioc_reset_count;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 1ccfbc7eebe0..59c1f9e694a0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -55,6 +55,7 @@
 #include <linux/interrupt.h>
 #include <linux/aer.h>
 #include <linux/raid_class.h>
+#include <linux/blk-mq-pci.h>
 #include <asm/unaligned.h>
 
 #include "mpt3sas_base.h"
@@ -10161,6 +10162,17 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	return 1;
 }
 
+static int mpt3sas_map_queues(struct Scsi_Host *shost)
+{
+	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	if (ioc->smp_affinity_enable)
+		return blk_mq_pci_map_queues(qmap, ioc->pdev, 0);
+	else
+		return blk_mq_map_queues(qmap);
+}
+
 /* shost template for SAS 2.0 HBA devices */
 static struct scsi_host_template mpt2sas_driver_template = {
 	.module				= THIS_MODULE,
@@ -10189,6 +10201,8 @@ static struct scsi_host_template mpt2sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.host_tagset			= 1,
+	.map_queues			= mpt3sas_map_queues,
 };
 
 /* raid transport support for SAS 2.0 HBA devices */
@@ -10227,6 +10241,8 @@ static struct scsi_host_template mpt3sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.host_tagset			= 1,
+	.map_queues			= mpt3sas_map_queues,
 };
 
 /* raid transport support for SAS 3.0 HBA devices */
@@ -10538,6 +10554,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	} else
 		ioc->hide_drives = 0;
 
+	shost->nr_hw_queues = ioc->reply_queue_count;
 	rv = scsi_add_host(shost, &pdev->dev);
 	if (rv) {
 		ioc_err(ioc, "failure at %s:%d/%s()!\n",
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
@ 2019-05-31  6:07   ` Hannes Reinecke
  2019-05-31 15:37   ` Bart Van Assche
  2019-06-05 14:10   ` John Garry
  2 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:07 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> Some SCSI HBAs(such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
> multiple reply queues with single hostwide tags, and the reply queue
> is used for delievery & complete request, and one MSI-X vector is
> assigned to each reply queue.
> 
> Now drivers have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
> for automatic affinity assignment. Given there is only single blk-mq hw
> queue, these drivers have to setup private reply queue mapping for
> figuring out which reply queue is selected for delivery request, and
> the queue mapping is based on managed IRQ affinity, and it is generic,
> should have been done inside blk-mq.
> 
> Based on the following Hannes's patch, introduce BLK_MQ_F_HOST_TAGS for
> converting reply queue into blk-mq hw queue.
> 
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
> 
> Once driver sets BLK_MQ_F_HOST_TAGS, the hostwide tags & request pool is
> shared among all blk-mq hw queues.
> 
> The following patches will map driver's reply queue into blk-mq hw queue
> by applying BLK_MQ_F_HOST_TAGS.
> 
> Compared with the current implementation by single hw queue, performance
> shouldn't be affected by this patch in theory.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-mq-sched.c   |  8 ++++++++
>  block/blk-mq-tag.c     |  6 ++++++
>  block/blk-mq.c         | 14 ++++++++++++++
>  block/elevator.c       |  5 +++--
>  include/linux/blk-mq.h |  1 +
>  6 files changed, 33 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'
  2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
@ 2019-05-31  6:08   ` Hannes Reinecke
  2019-05-31 15:39   ` Bart Van Assche
  2019-06-02  1:56   ` Minwoo Im
  2 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:08 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> Introduces parameter of 'g_host_tags' for testing hostwide tags.
> 
> Not observe performance drop in the following test:
> 
> 1) no 'host_tags', hw queue depth is 16, and 1 hw queue
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=0 submit_queues=1 hw_queue_depth=16
> 
> 2) 'host_tags', global hw queue depth is 16, and 8 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=1 submit_queues=8 hw_queue_depth=16
> 
> 3) fio test command:
> 
> fio --bs=4k --ioengine=libaio --iodepth=16 --filename=/dev/nullb0:/dev/nullb1:/dev/nullb2:/dev/nullb3 --direct=1 --runtime=30 --numjobs=16 --rw=randread --name=test --group_reporting --gtod_reduce=1
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/null_blk_main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/9] scsi: Add template flag 'host_tagset'
  2019-05-31  2:27 ` [PATCH 3/9] scsi: Add template flag 'host_tagset' Ming Lei
@ 2019-05-31  6:08   ` Hannes Reinecke
  0 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:08 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> Add a host template flag 'host_tagset' so hostwide tagset can be
> shared on multiple reply queues after the SCSI device's reply queue
> is converted to blk-mq hw queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/scsi_lib.c  | 2 ++
>  include/scsi/scsi_host.h | 3 +++
>  2 files changed, 5 insertions(+)
> 
(Do I need to review my own patch?)

Anyway.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] scsi_debug: support host tagset
  2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
@ 2019-05-31  6:09   ` Hannes Reinecke
  2019-06-02  2:03   ` Minwoo Im
  2019-06-02 17:01   ` Douglas Gilbert
  2 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:09 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> The 'host_tagset' can be set on scsi_debug device for testing
> shared hostwide tags on multiple blk-mq hw queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/scsi_debug.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index()
  2019-05-31  2:27 ` [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index() Ming Lei
@ 2019-05-31  6:10   ` Hannes Reinecke
  0 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:10 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> For drivers which enable .host_tagset, introduce scsi_cmnd_hctx_index
> to retrieve current reply queue index. If valid scsi command is provided,
> blk-mq's hw queue's index is returned, otherwise return the queue
> mapped from current CPU.
> 
> Prepare for converting device's privete reply queue to blk-mq hw queue.
> 
                                  ^private

> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  include/scsi/scsi_cmnd.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 ` [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue Ming Lei
@ 2019-05-31  6:15   ` Hannes Reinecke
  2019-05-31  6:30     ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:15 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> SCSI's reply queue is very similar to blk-mq's hw queue, both
> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> queue via .host_tagset.
> 
> Then the private reply mapping can be removed.
> 
> Another benefit is that the request/irq lost issue may be solved in a
> generic way, because managed IRQs may be shut down during CPU
> hotplug.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 49 ++++++++++++++++++---------------------------
>  1 file changed, 19 insertions(+), 30 deletions(-)
> 
There had been requests to make the internal interrupt mapping optional;
but I guess we first should
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 1bef1da273c2..c7136f9f0ce1 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>  #include <linux/jiffies.h>
>  #include <linux/percpu-defs.h>
>  #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>  #include <asm/div64.h>
>  #include "hpsa_cmd.h"
> @@ -902,6 +903,18 @@ static ssize_t host_show_legacy_board(struct device *dev,
>  	return snprintf(buf, 20, "%d\n", h->legacy_board ? 1 : 0);
>  }
>  
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +	struct ctlr_info *h = shost_to_hba(shost);
> +	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +	/* Switch to cpu mapping in case that managed IRQ isn't used */
> +	if (shost->nr_hw_queues > 1)
> +		return blk_mq_pci_map_queues(qmap, h->pdev, 0);
> +	else
> +		return blk_mq_map_queues(qmap);
> +}
> +
>  static DEVICE_ATTR_RO(raid_level);
>  static DEVICE_ATTR_RO(lunid);
>  static DEVICE_ATTR_RO(unique_id);
This helper is pretty much shared between all converted drivers.
Shouldn't we have a common function here?
Something like

scsi_mq_host_tag_map(struct Scsi_Host *shost, int offset)?
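
A rough sketch of the idea (hypothetical only, not part of this series;
note that the PCI device and the managed-IRQ decision would still have
to be handed in by each driver):

static int scsi_mq_host_tag_map(struct Scsi_Host *shost,
                                struct pci_dev *pdev, int offset,
                                bool use_managed_irq)
{
        struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];

        /* PCI map only makes sense with managed IRQ affinity and >1 hw queue */
        if (use_managed_irq && shost->nr_hw_queues > 1)
                return blk_mq_pci_map_queues(qmap, pdev, offset);
        return blk_mq_map_queues(qmap);
}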

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  2:27 ` [PATCH 7/9] scsi: hisi_sas_v3: " Ming Lei
@ 2019-05-31  6:20   ` Hannes Reinecke
  2019-05-31  6:34     ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:20 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:27 AM, Ming Lei wrote:
> SCSI's reply queue is very similar to blk-mq's hw queue, both
> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> queue via .host_tagset.
> 
> Then the private reply mapping can be removed.
> 
> Another benefit is that the request/irq lost issue may be solved in a
> generic way, because managed IRQs may be shut down during CPU
> hotplug.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas.h       |  2 +-
>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++++++----------
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 46 +++++++++-----------------
>  3 files changed, 36 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
> index fc87994b5d73..3d48848dbde7 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> @@ -26,6 +26,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/property.h>
>  #include <linux/regmap.h>
> +#include <linux/blk-mq-pci.h>
>  #include <scsi/sas_ata.h>
>  #include <scsi/libsas.h>
>  
> @@ -378,7 +379,6 @@ struct hisi_hba {
>  	u32 intr_coal_count;	/* Interrupt count to coalesce */
>  
>  	int cq_nvecs;
> -	unsigned int *reply_map;
>  
>  	/* debugfs memories */
>  	u32 *debugfs_global_reg;
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 8a7feb8ed8d6..a1c1f30b9fdb 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
>  	return rc;
>  }
>  
> +static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
> +{
> +	if (!task->uldd_task)
> +		return NULL;
> +
> +	if (dev_is_sata(task->dev)) {
> +		struct ata_queued_cmd *qc = task->uldd_task;
> +		return qc->scsicmd;
> +	} else {
> +		return task->uldd_task;
> +	}
> +}
> +
>  static int hisi_sas_task_prep(struct sas_task *task,
>  			      struct hisi_sas_dq **dq_pointer,
>  			      bool is_tmf, struct hisi_sas_tmf_task *tmf,
> @@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
>  	struct hisi_sas_dq *dq;
>  	unsigned long flags;
>  	int wr_q_index;
> +	struct scsi_cmnd *scsi_cmnd;
>  
>  	if (DEV_IS_GONE(sas_dev)) {
>  		if (sas_dev)
> @@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
>  		return -ECOMM;
>  	}
>  
> -	if (hisi_hba->reply_map) {
> -		int cpu = raw_smp_processor_id();
> -		unsigned int dq_index = hisi_hba->reply_map[cpu];
> +	scsi_cmnd = sas_task_to_scsi_cmd(task);
> +	if (hisi_hba->shost->hostt->host_tagset) {
> +		unsigned int dq_index = scsi_cmnd_hctx_index(
> +				hisi_hba->shost, scsi_cmnd);
>  
>  		*dq_pointer = dq = &hisi_hba->dq[dq_index];
>  	} else {
> @@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,
>  
>  	if (hisi_hba->hw->slot_index_alloc)
>  		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
> -	else {
> -		struct scsi_cmnd *scsi_cmnd = NULL;
> -
> -		if (task->uldd_task) {
> -			struct ata_queued_cmd *qc;
> -
> -			if (dev_is_sata(device)) {
> -				qc = task->uldd_task;
> -				scsi_cmnd = qc->scsicmd;
> -			} else {
> -				scsi_cmnd = task->uldd_task;
> -			}
> -		}
> +	else
>  		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
> -	}
>  	if (rc < 0)
>  		goto err_out_dif_dma_unmap;
>  
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> index 49620c2411df..063e50e5b30c 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> @@ -2344,30 +2344,6 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
>  	return IRQ_HANDLED;
>  }
>  
> -static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
> -{
> -	const struct cpumask *mask;
> -	int queue, cpu;
> -
> -	for (queue = 0; queue < nvecs; queue++) {
> -		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
> -
> -		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
> -					    BASE_VECTORS_V3_HW);
> -		if (!mask)
> -			goto fallback;
> -		cq->pci_irq_mask = mask;
> -		for_each_cpu(cpu, mask)
> -			hisi_hba->reply_map[cpu] = queue;
> -	}
> -	return;
> -
> -fallback:
> -	for_each_possible_cpu(cpu)
> -		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
> -	/* Don't clean all CQ masks */
> -}
> -
>  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>  {
>  	struct device *dev = hisi_hba->dev;
> @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>  
>  		min_msi = MIN_AFFINE_VECTORS_V3_HW;
>  
> -		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
> -						   sizeof(unsigned int),
> -						   GFP_KERNEL);
> -		if (!hisi_hba->reply_map)
> -			return -ENOMEM;
>  		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
>  							 min_msi, max_msi,
>  							 PCI_IRQ_MSI |
> @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>  							 &desc);
>  		if (vectors < 0)
>  			return -ENOENT;
> -		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
>  	} else {
>  		min_msi = max_msi;
>  		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
> @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
>  	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
>  }
>  
> +static int hisi_sas_map_queues(struct Scsi_Host *shost)
> +{
> +	struct hisi_hba *hisi_hba = shost_priv(shost);
> +	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +	if (auto_affine_msi_experimental)
> +		return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
> +				BASE_VECTORS_V3_HW);
> +	else
> +		return blk_mq_map_queues(qmap);
> +}
> +
>  static struct scsi_host_template sht_v3_hw = {
>  	.name			= DRV_NAME,
>  	.module			= THIS_MODULE,

As mentioned, we should be using a common function here.

> @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
>  	.scan_start		= hisi_sas_scan_start,
>  	.change_queue_depth	= sas_change_queue_depth,
>  	.bios_param		= sas_bios_param,
> +	.map_queues		= hisi_sas_map_queues,
> +	.host_tagset		= 1,
>  	.this_id		= -1,
>  	.sg_tablesize		= HISI_SAS_SGE_PAGE_CNT,
>  	.sg_prot_tablesize	= HISI_SAS_SGE_PAGE_CNT,
> @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	if (hisi_sas_debugfs_enable)
>  		hisi_sas_debugfs_init(hisi_hba);
>  
> +	shost->nr_hw_queues = hisi_hba->cq_nvecs;
> +
>  	rc = scsi_add_host(shost, dev);
>  	if (rc)
>  		goto err_out_ha;
> 
Well, I'd rather see the v3 hardware converted to 'real' blk-mq first;
the hardware itself is pretty much multiqueue already, so we would be
better off converting it to blk-mq.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-05-31  2:28 ` [PATCH 8/9] scsi: megaraid: " Ming Lei
@ 2019-05-31  6:22   ` Hannes Reinecke
  2019-06-01 21:41   ` Kashyap Desai
  1 sibling, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:22 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:28 AM, Ming Lei wrote:
> SCSI's reply queue is very similar to blk-mq's hw queue, both
> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> queue via .host_tagset.
> 
> Then the private reply mapping can be removed.
> 
> Another benefit is that the request/irq lost issue may be solved in a
> generic way, because managed IRQs may be shut down during CPU
> hotplug.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 50 ++++++++-------------
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +-
>  2 files changed, 20 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 3dd1df472dc6..b49999b90231 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -33,6 +33,7 @@
>  #include <linux/fs.h>
>  #include <linux/compat.h>
>  #include <linux/blkdev.h>
> +#include <linux/blk-mq-pci.h>
>  #include <linux/mutex.h>
>  #include <linux/poll.h>
>  #include <linux/vmalloc.h>
> @@ -3165,6 +3166,19 @@ megasas_fw_cmds_outstanding_show(struct device *cdev,
>  	return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&instance->fw_outstanding));
>  }
>  
> +static int megasas_map_queues(struct Scsi_Host *shost)
> +{
> +	struct megasas_instance *instance = (struct megasas_instance *)
> +		shost->hostdata;
> +	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +	if (smp_affinity_enable && instance->msix_vectors)
> +		return blk_mq_pci_map_queues(qmap, instance->pdev, 0);
> +	else
> +		return blk_mq_map_queues(qmap);
> +}
> +
> +
>  static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
>  	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
>  static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,

As mentioned, we should be using a common function here.

> @@ -3207,7 +3221,9 @@ static struct scsi_host_template megasas_template = {
>  	.shost_attrs = megaraid_host_attrs,
>  	.bios_param = megasas_bios_param,
>  	.change_queue_depth = scsi_change_queue_depth,
> +	.map_queues =  megasas_map_queues,
>  	.no_write_same = 1,
> +	.host_tagset = 1,
>  };
>  
>  /**
> @@ -5407,26 +5423,6 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
>  		instance->use_seqnum_jbod_fp = false;
>  }
>  
> -static void megasas_setup_reply_map(struct megasas_instance *instance)
> -{
> -	const struct cpumask *mask;
> -	unsigned int queue, cpu;
> -
> -	for (queue = 0; queue < instance->msix_vectors; queue++) {
> -		mask = pci_irq_get_affinity(instance->pdev, queue);
> -		if (!mask)
> -			goto fallback;
> -
> -		for_each_cpu(cpu, mask)
> -			instance->reply_map[cpu] = queue;
> -	}
> -	return;
> -
> -fallback:
> -	for_each_possible_cpu(cpu)
> -		instance->reply_map[cpu] = cpu % instance->msix_vectors;
> -}
> -
>  /**
>   * megasas_get_device_list -	Get the PD and LD device list from FW.
>   * @instance:			Adapter soft state
> @@ -5666,8 +5662,6 @@ static int megasas_init_fw(struct megasas_instance *instance)
>  			goto fail_init_adapter;
>  	}
>  
> -	megasas_setup_reply_map(instance);
> -
>  	dev_info(&instance->pdev->dev,
>  		"firmware supports msix\t: (%d)", fw_msix_count);
>  	dev_info(&instance->pdev->dev,
> @@ -6298,6 +6292,8 @@ static int megasas_io_attach(struct megasas_instance *instance)
>  	host->max_lun = MEGASAS_MAX_LUN;
>  	host->max_cmd_len = 16;
>  
> +	host->nr_hw_queues = instance->msix_vectors ?: 1;
> +
>  	/*
>  	 * Notify the mid-layer about the new controller
>  	 */
> @@ -6464,11 +6460,6 @@ static inline int megasas_alloc_mfi_ctrl_mem(struct megasas_instance *instance)
>   */
>  static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>  {
> -	instance->reply_map = kcalloc(nr_cpu_ids, sizeof(unsigned int),
> -				      GFP_KERNEL);
> -	if (!instance->reply_map)
> -		return -ENOMEM;
> -
>  	switch (instance->adapter_type) {
>  	case MFI_SERIES:
>  		if (megasas_alloc_mfi_ctrl_mem(instance))
> @@ -6485,8 +6476,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>  
>  	return 0;
>   fail:
> -	kfree(instance->reply_map);
> -	instance->reply_map = NULL;
>  	return -ENOMEM;
>  }
>  
> @@ -6499,7 +6488,6 @@ static int megasas_alloc_ctrl_mem(struct megasas_instance *instance)
>   */
>  static inline void megasas_free_ctrl_mem(struct megasas_instance *instance)
>  {
> -	kfree(instance->reply_map);
>  	if (instance->adapter_type == MFI_SERIES) {
>  		if (instance->producer)
>  			dma_free_coherent(&instance->pdev->dev, sizeof(u32),
> @@ -7142,8 +7130,6 @@ megasas_resume(struct pci_dev *pdev)
>  	if (rval < 0)
>  		goto fail_reenable_msix;
>  
> -	megasas_setup_reply_map(instance);
> -
>  	if (instance->adapter_type != MFI_SERIES) {
>  		megasas_reset_reply_desc(instance);
>  		if (megasas_ioc_init_fusion(instance)) {
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 4dfa0685a86c..4f909f32bf5c 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -2699,7 +2699,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
>  	}
>  
>  	cmd->request_desc->SCSIIO.MSIxIndex =
> -		instance->reply_map[raw_smp_processor_id()];
> +		scsi_cmnd_hctx_index(instance->host, scp);
>  
>  	if (instance->adapter_type >= VENTURA_SERIES) {
>  		/* FP for Optimal raid level 1.
> @@ -3013,7 +3013,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
>  	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
>  
>  	cmd->request_desc->SCSIIO.MSIxIndex =
> -		instance->reply_map[raw_smp_processor_id()];
> +		scsi_cmnd_hctx_index(instance->host, scmd);
>  
>  	if (!fp_possible) {
>  		/* system pd firmware path */
> 
Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 9/9] scsi: mp3sas: convert private reply queue to blk-mq hw queue
  2019-05-31  2:28 ` [PATCH 9/9] scsi: mp3sas: " Ming Lei
@ 2019-05-31  6:23   ` Hannes Reinecke
  2019-06-06 11:58   ` Sreekanth Reddy
  1 sibling, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:23 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/31/19 4:28 AM, Ming Lei wrote:
> SCSI's reply queue is very similar to blk-mq's hw queue, both
> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> queue via .host_tagset.
> 
> Then the private reply mapping can be removed.
> 
> Another benefit is that the request/irq lost issue may be solved in a
> generic way, because managed IRQs may be shut down during CPU
> hotplug.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c  | 74 +++++-----------------------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 17 +++++++
>  3 files changed, 31 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 8aacbd1e7db2..2b207d2925b4 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -2855,8 +2855,7 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index)
>  static void
>  _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
>  {
> -	unsigned int cpu, nr_cpus, nr_msix, index = 0;
> -	struct adapter_reply_queue *reply_q;
> +	unsigned int nr_cpus, nr_msix;
>  
>  	if (!_base_is_controller_msix_enabled(ioc))
>  		return;
> @@ -2866,50 +2865,9 @@ _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
>  		return;
>  	}
>  
> -	memset(ioc->cpu_msix_table, 0, ioc->cpu_msix_table_sz);
> -
>  	nr_cpus = num_online_cpus();
>  	nr_msix = ioc->reply_queue_count = min(ioc->reply_queue_count,
>  					       ioc->facts.MaxMSIxVectors);
> -	if (!nr_msix)
> -		return;
> -
> -	if (smp_affinity_enable) {
> -		list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
> -			const cpumask_t *mask = pci_irq_get_affinity(ioc->pdev,
> -							reply_q->msix_index);
> -			if (!mask) {
> -				ioc_warn(ioc, "no affinity for msi %x\n",
> -					 reply_q->msix_index);
> -				continue;
> -			}
> -
> -			for_each_cpu_and(cpu, mask, cpu_online_mask) {
> -				if (cpu >= ioc->cpu_msix_table_sz)
> -					break;
> -				ioc->cpu_msix_table[cpu] = reply_q->msix_index;
> -			}
> -		}
> -		return;
> -	}
> -	cpu = cpumask_first(cpu_online_mask);
> -
> -	list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
> -
> -		unsigned int i, group = nr_cpus / nr_msix;
> -
> -		if (cpu >= nr_cpus)
> -			break;
> -
> -		if (index < nr_cpus % nr_msix)
> -			group++;
> -
> -		for (i = 0 ; i < group ; i++) {
> -			ioc->cpu_msix_table[cpu] = reply_q->msix_index;
> -			cpu = cpumask_next(cpu, cpu_online_mask);
> -		}
> -		index++;
> -	}
>  }
>  
>  /**
> @@ -2924,6 +2882,7 @@ _base_disable_msix(struct MPT3SAS_ADAPTER *ioc)
>  		return;
>  	pci_disable_msix(ioc->pdev);
>  	ioc->msix_enable = 0;
> +	ioc->smp_affinity_enable = 0;
>  }
>  
>  /**
> @@ -2980,6 +2939,9 @@ _base_enable_msix(struct MPT3SAS_ADAPTER *ioc)
>  		goto try_ioapic;
>  	}
>  
> +	if (irq_flags & PCI_IRQ_AFFINITY)
> +		ioc->smp_affinity_enable = 1;
> +
>  	ioc->msix_enable = 1;
>  	ioc->reply_queue_count = r;
>  	for (i = 0; i < ioc->reply_queue_count; i++) {
> @@ -3266,7 +3228,7 @@ mpt3sas_base_get_reply_virt_addr(struct MPT3SAS_ADAPTER *ioc, u32 phys_addr)
>  }
>  
>  static inline u8
> -_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
> +_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>  {
>  	/* Enables reply_queue load balancing */
>  	if (ioc->msix_load_balance)
> @@ -3274,7 +3236,7 @@ _base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
>  		    base_mod64(atomic64_add_return(1,
>  		    &ioc->total_io_cnt), ioc->reply_queue_count) : 0;
>  
> -	return ioc->cpu_msix_table[raw_smp_processor_id()];
> +	return scsi_cmnd_hctx_index(ioc->shost, scmd);
>  }
>  
>  /**
> @@ -3325,7 +3287,7 @@ mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
>  
>  	smid = tag + 1;
>  	request->cb_idx = cb_idx;
> -	request->msix_io = _base_get_msix_index(ioc);
> +	request->msix_io = _base_get_msix_index(ioc, scmd);
>  	request->smid = smid;
>  	INIT_LIST_HEAD(&request->chain_list);
>  	return smid;
> @@ -3498,7 +3460,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
>  	_base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
>  					ioc->request_sz);
>  	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
> -	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
> +	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>  	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>  	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>  	descriptor.SCSIIO.LMID = 0;
> @@ -3520,7 +3482,7 @@ _base_put_smid_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
>  
>  
>  	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
> -	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
> +	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>  	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>  	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>  	descriptor.SCSIIO.LMID = 0;
> @@ -3543,7 +3505,7 @@ mpt3sas_base_put_smid_fast_path(struct MPT3SAS_ADAPTER *ioc, u16 smid,
>  
>  	descriptor.SCSIIO.RequestFlags =
>  	    MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO;
> -	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
> +	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
>  	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>  	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>  	descriptor.SCSIIO.LMID = 0;
> @@ -3607,7 +3569,7 @@ mpt3sas_base_put_smid_nvme_encap(struct MPT3SAS_ADAPTER *ioc, u16 smid)
>  
>  	descriptor.Default.RequestFlags =
>  		MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED;
> -	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
> +	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>  	descriptor.Default.SMID = cpu_to_le16(smid);
>  	descriptor.Default.LMID = 0;
>  	descriptor.Default.DescriptorTypeDependent = 0;
> @@ -3639,7 +3601,7 @@ mpt3sas_base_put_smid_default(struct MPT3SAS_ADAPTER *ioc, u16 smid)
>  	}
>  	request = (u64 *)&descriptor;
>  	descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
> -	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
> +	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>  	descriptor.Default.SMID = cpu_to_le16(smid);
>  	descriptor.Default.LMID = 0;
>  	descriptor.Default.DescriptorTypeDependent = 0;
> @@ -6524,19 +6486,11 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
>  
>  	dinitprintk(ioc, ioc_info(ioc, "%s\n", __func__));
>  
> -	/* setup cpu_msix_table */
>  	ioc->cpu_count = num_online_cpus();
>  	for_each_online_cpu(cpu_id)
>  		last_cpu_id = cpu_id;
>  	ioc->cpu_msix_table_sz = last_cpu_id + 1;
> -	ioc->cpu_msix_table = kzalloc(ioc->cpu_msix_table_sz, GFP_KERNEL);
>  	ioc->reply_queue_count = 1;
> -	if (!ioc->cpu_msix_table) {
> -		dfailprintk(ioc,
> -			    ioc_info(ioc, "allocation for cpu_msix_table failed!!!\n"));
> -		r = -ENOMEM;
> -		goto out_free_resources;
> -	}
>  
>  	if (ioc->is_warpdrive) {
>  		ioc->reply_post_host_index = kcalloc(ioc->cpu_msix_table_sz,
> @@ -6748,7 +6702,6 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
>  	mpt3sas_base_free_resources(ioc);
>  	_base_release_memory_pools(ioc);
>  	pci_set_drvdata(ioc->pdev, NULL);
> -	kfree(ioc->cpu_msix_table);
>  	if (ioc->is_warpdrive)
>  		kfree(ioc->reply_post_host_index);
>  	kfree(ioc->pd_handles);
> @@ -6789,7 +6742,6 @@ mpt3sas_base_detach(struct MPT3SAS_ADAPTER *ioc)
>  	_base_release_memory_pools(ioc);
>  	mpt3sas_free_enclosure_list(ioc);
>  	pci_set_drvdata(ioc->pdev, NULL);
> -	kfree(ioc->cpu_msix_table);
>  	if (ioc->is_warpdrive)
>  		kfree(ioc->reply_post_host_index);
>  	kfree(ioc->pd_handles);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 480219f0efc5..4d441e031025 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -1022,7 +1022,6 @@ typedef void (*MPT3SAS_FLUSH_RUNNING_CMDS)(struct MPT3SAS_ADAPTER *ioc);
>   * @start_scan_failed: means port enable failed, return's the ioc_status
>   * @msix_enable: flag indicating msix is enabled
>   * @msix_vector_count: number msix vectors
> - * @cpu_msix_table: table for mapping cpus to msix index
>   * @cpu_msix_table_sz: table size
>   * @total_io_cnt: Gives total IO count, used to load balance the interrupts
>   * @msix_load_balance: Enables load balancing of interrupts across
> @@ -1183,6 +1182,7 @@ struct MPT3SAS_ADAPTER {
>  	u16		broadcast_aen_pending;
>  	u8		shost_recovery;
>  	u8		got_task_abort_from_ioctl;
> +	u8		smp_affinity_enable;
>  
>  	struct mutex	reset_in_progress_mutex;
>  	spinlock_t	ioc_reset_in_progress_lock;
> @@ -1199,7 +1199,6 @@ struct MPT3SAS_ADAPTER {
>  
>  	u8		msix_enable;
>  	u16		msix_vector_count;
> -	u8		*cpu_msix_table;
>  	u16		cpu_msix_table_sz;
>  	resource_size_t __iomem **reply_post_host_index;
>  	u32		ioc_reset_count;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 1ccfbc7eebe0..59c1f9e694a0 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -55,6 +55,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/aer.h>
>  #include <linux/raid_class.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>  
>  #include "mpt3sas_base.h"
> @@ -10161,6 +10162,17 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
>  	return 1;
>  }
>  
> +static int mpt3sas_map_queues(struct Scsi_Host *shost)
> +{
> +	struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +	if (ioc->smp_affinity_enable)
> +		return blk_mq_pci_map_queues(qmap, ioc->pdev, 0);
> +	else
> +		return blk_mq_map_queues(qmap);
> +}
> +
>  /* shost template for SAS 2.0 HBA devices */
>  static struct scsi_host_template mpt2sas_driver_template = {
>  	.module				= THIS_MODULE,

As indicated, we should be using a common function here.

> @@ -10189,6 +10201,8 @@ static struct scsi_host_template mpt2sas_driver_template = {
>  	.sdev_attrs			= mpt3sas_dev_attrs,
>  	.track_queue_depth		= 1,
>  	.cmd_size			= sizeof(struct scsiio_tracker),
> +	.host_tagset			= 1,
> +	.map_queues			= mpt3sas_map_queues,
>  };
>  
>  /* raid transport support for SAS 2.0 HBA devices */
> @@ -10227,6 +10241,8 @@ static struct scsi_host_template mpt3sas_driver_template = {
>  	.sdev_attrs			= mpt3sas_dev_attrs,
>  	.track_queue_depth		= 1,
>  	.cmd_size			= sizeof(struct scsiio_tracker),
> +	.host_tagset			= 1,
> +	.map_queues			= mpt3sas_map_queues,
>  };
>  
>  /* raid transport support for SAS 3.0 HBA devices */
> @@ -10538,6 +10554,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  	} else
>  		ioc->hide_drives = 0;
>  
> +	shost->nr_hw_queues = ioc->reply_queue_count;
>  	rv = scsi_add_host(shost, &pdev->dev);
>  	if (rv) {
>  		ioc_err(ioc, "failure at %s:%d/%s()!\n",
> 
Otherwise:

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue
  2019-05-31  6:15   ` Hannes Reinecke
@ 2019-05-31  6:30     ` Ming Lei
  2019-05-31  6:40       ` Hannes Reinecke
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-05-31  6:30 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, John Garry, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 2:15 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 5/31/19 4:27 AM, Ming Lei wrote:
> > SCSI's reply queue is very similar to blk-mq's hw queue, both
> > assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> > queue via .host_tagset.
> >
> > Then the private reply mapping can be removed.
> >
> > Another benefit is that the request/irq lost issue may be solved in a
> > generic way, because managed IRQs may be shut down during CPU
> > hotplug.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/hpsa.c | 49 ++++++++++++++++++---------------------------
> >  1 file changed, 19 insertions(+), 30 deletions(-)
> >
> There had been requests to make the internal interrupt mapping optional;
> but I guess we first should

For HPSA, either managed IRQs are used or a single MSI-X vector is
allocated. I am pretty sure that both cases are covered in this patch,
so I am not sure what 'optional' means here.

> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 1bef1da273c2..c7136f9f0ce1 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -51,6 +51,7 @@
> >  #include <linux/jiffies.h>
> >  #include <linux/percpu-defs.h>
> >  #include <linux/percpu.h>
> > +#include <linux/blk-mq-pci.h>
> >  #include <asm/unaligned.h>
> >  #include <asm/div64.h>
> >  #include "hpsa_cmd.h"
> > @@ -902,6 +903,18 @@ static ssize_t host_show_legacy_board(struct device *dev,
> >       return snprintf(buf, 20, "%d\n", h->legacy_board ? 1 : 0);
> >  }
> >
> > +static int hpsa_map_queues(struct Scsi_Host *shost)
> > +{
> > +     struct ctlr_info *h = shost_to_hba(shost);
> > +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> > +
> > +     /* Switch to cpu mapping in case that managed IRQ isn't used */
> > +     if (shost->nr_hw_queues > 1)
> > +             return blk_mq_pci_map_queues(qmap, h->pdev, 0);
> > +     else
> > +             return blk_mq_map_queues(qmap);
> > +}
> > +
> >  static DEVICE_ATTR_RO(raid_level);
> >  static DEVICE_ATTR_RO(lunid);
> >  static DEVICE_ATTR_RO(unique_id);
> This helper is pretty much shared between all converted drivers.
> Shouldn't we have a common function here?
> Something like
>
> scsi_mq_host_tag_map(struct Scsi_Host *shost, int offset)?

I am not sure the common helper would help much: the condition for using
the cpu map or the pci map still depends on driver-private state, so we
still have to define each driver's .map_queues anyway.

Also, the PCI device pointer has to be provided.
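
For example (hypothetical sketch, assuming some common
scsi_mq_host_tag_map(shost, pdev, offset, use_managed_irq) helper
existed), hpsa would still end up with a wrapper like:

static int hpsa_map_queues(struct Scsi_Host *shost)
{
        struct ctlr_info *h = shost_to_hba(shost);

        return scsi_mq_host_tag_map(shost, h->pdev, 0,
                                    shost->nr_hw_queues > 1);
}

which isn't much shorter than open-coding the two-branch check.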

thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  6:20   ` Hannes Reinecke
@ 2019-05-31  6:34     ` Ming Lei
  2019-05-31  6:42       ` Hannes Reinecke
  2019-05-31 11:38       ` John Garry
  0 siblings, 2 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  6:34 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, John Garry, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 2:21 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 5/31/19 4:27 AM, Ming Lei wrote:
> > SCSI's reply queue is very similar to blk-mq's hw queue, both
> > assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> > queue via .host_tagset.
> >
> > Then the private reply mapping can be removed.
> >
> > Another benefit is that the request/irq lost issue may be solved in a
> > generic way, because managed IRQs may be shut down during CPU
> > hotplug.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/hisi_sas/hisi_sas.h       |  2 +-
> >  drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++++++----------
> >  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 46 +++++++++-----------------
> >  3 files changed, 36 insertions(+), 48 deletions(-)
> >
> > diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
> > index fc87994b5d73..3d48848dbde7 100644
> > --- a/drivers/scsi/hisi_sas/hisi_sas.h
> > +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> > @@ -26,6 +26,7 @@
> >  #include <linux/platform_device.h>
> >  #include <linux/property.h>
> >  #include <linux/regmap.h>
> > +#include <linux/blk-mq-pci.h>
> >  #include <scsi/sas_ata.h>
> >  #include <scsi/libsas.h>
> >
> > @@ -378,7 +379,6 @@ struct hisi_hba {
> >       u32 intr_coal_count;    /* Interrupt count to coalesce */
> >
> >       int cq_nvecs;
> > -     unsigned int *reply_map;
> >
> >       /* debugfs memories */
> >       u32 *debugfs_global_reg;
> > diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> > index 8a7feb8ed8d6..a1c1f30b9fdb 100644
> > --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> > +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> > @@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
> >       return rc;
> >  }
> >
> > +static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
> > +{
> > +     if (!task->uldd_task)
> > +             return NULL;
> > +
> > +     if (dev_is_sata(task->dev)) {
> > +             struct ata_queued_cmd *qc = task->uldd_task;
> > +             return qc->scsicmd;
> > +     } else {
> > +             return task->uldd_task;
> > +     }
> > +}
> > +
> >  static int hisi_sas_task_prep(struct sas_task *task,
> >                             struct hisi_sas_dq **dq_pointer,
> >                             bool is_tmf, struct hisi_sas_tmf_task *tmf,
> > @@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >       struct hisi_sas_dq *dq;
> >       unsigned long flags;
> >       int wr_q_index;
> > +     struct scsi_cmnd *scsi_cmnd;
> >
> >       if (DEV_IS_GONE(sas_dev)) {
> >               if (sas_dev)
> > @@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >               return -ECOMM;
> >       }
> >
> > -     if (hisi_hba->reply_map) {
> > -             int cpu = raw_smp_processor_id();
> > -             unsigned int dq_index = hisi_hba->reply_map[cpu];
> > +     scsi_cmnd = sas_task_to_scsi_cmd(task);
> > +     if (hisi_hba->shost->hostt->host_tagset) {
> > +             unsigned int dq_index = scsi_cmnd_hctx_index(
> > +                             hisi_hba->shost, scsi_cmnd);
> >
> >               *dq_pointer = dq = &hisi_hba->dq[dq_index];
> >       } else {
> > @@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >
> >       if (hisi_hba->hw->slot_index_alloc)
> >               rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
> > -     else {
> > -             struct scsi_cmnd *scsi_cmnd = NULL;
> > -
> > -             if (task->uldd_task) {
> > -                     struct ata_queued_cmd *qc;
> > -
> > -                     if (dev_is_sata(device)) {
> > -                             qc = task->uldd_task;
> > -                             scsi_cmnd = qc->scsicmd;
> > -                     } else {
> > -                             scsi_cmnd = task->uldd_task;
> > -                     }
> > -             }
> > +     else
> >               rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
> > -     }
> >       if (rc < 0)
> >               goto err_out_dif_dma_unmap;
> >
> > diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > index 49620c2411df..063e50e5b30c 100644
> > --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> > @@ -2344,30 +2344,6 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
> >       return IRQ_HANDLED;
> >  }
> >
> > -static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
> > -{
> > -     const struct cpumask *mask;
> > -     int queue, cpu;
> > -
> > -     for (queue = 0; queue < nvecs; queue++) {
> > -             struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
> > -
> > -             mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
> > -                                         BASE_VECTORS_V3_HW);
> > -             if (!mask)
> > -                     goto fallback;
> > -             cq->pci_irq_mask = mask;
> > -             for_each_cpu(cpu, mask)
> > -                     hisi_hba->reply_map[cpu] = queue;
> > -     }
> > -     return;
> > -
> > -fallback:
> > -     for_each_possible_cpu(cpu)
> > -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
> > -     /* Don't clean all CQ masks */
> > -}
> > -
> >  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >  {
> >       struct device *dev = hisi_hba->dev;
> > @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >
> >               min_msi = MIN_AFFINE_VECTORS_V3_HW;
> >
> > -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
> > -                                                sizeof(unsigned int),
> > -                                                GFP_KERNEL);
> > -             if (!hisi_hba->reply_map)
> > -                     return -ENOMEM;
> >               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
> >                                                        min_msi, max_msi,
> >                                                        PCI_IRQ_MSI |
> > @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >                                                        &desc);
> >               if (vectors < 0)
> >                       return -ENOENT;
> > -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
> >       } else {
> >               min_msi = max_msi;
> >               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
> > @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
> >       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
> >  }
> >
> > +static int hisi_sas_map_queues(struct Scsi_Host *shost)
> > +{
> > +     struct hisi_hba *hisi_hba = shost_priv(shost);
> > +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> > +
> > +     if (auto_affine_msi_experimental)
> > +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
> > +                             BASE_VECTORS_V3_HW);
> > +     else
> > +             return blk_mq_map_queues(qmap);
> > +}
> > +
> >  static struct scsi_host_template sht_v3_hw = {
> >       .name                   = DRV_NAME,
> >       .module                 = THIS_MODULE,
>
> As mentioned, we should be using a common function here.
>
> > @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
> >       .scan_start             = hisi_sas_scan_start,
> >       .change_queue_depth     = sas_change_queue_depth,
> >       .bios_param             = sas_bios_param,
> > +     .map_queues             = hisi_sas_map_queues,
> > +     .host_tagset            = 1,
> >       .this_id                = -1,
> >       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
> >       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
> > @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >       if (hisi_sas_debugfs_enable)
> >               hisi_sas_debugfs_init(hisi_hba);
> >
> > +     shost->nr_hw_queues = hisi_hba->cq_nvecs;
> > +
> >       rc = scsi_add_host(shost, dev);
> >       if (rc)
> >               goto err_out_ha;
> >
> Well, I'd rather see the v3 hardware converted to 'real' blk-mq first;
> the hardware itself is pretty much multiqueue already, so we should be
> better off converting it to blk-mq.

From John Garry's input, the tags are still hostwide, so I am not sure how
to partition the hostwide tags into per-hw-queue tags. That can be quite
hard to do if the queue depth isn't big enough.
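
(Purely as a hypothetical example: with a host-wide depth of, say, 128
tags split statically across 16 hw queues, each queue would be left with
only 8 tags, which a single busy LUN could exhaust quickly.)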

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue
  2019-05-31  6:30     ` Ming Lei
@ 2019-05-31  6:40       ` Hannes Reinecke
  0 siblings, 0 replies; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, John Garry, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On 5/31/19 8:30 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 2:15 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 5/31/19 4:27 AM, Ming Lei wrote:
>>> SCSI's reply queue is very similar to blk-mq's hw queue, both
>>> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
>>> queue via .host_tagset.
>>>
>>> Then the private reply mapping can be removed.
>>>
>>> Another benefit is that the request/irq lost issue may be solved in a
>>> generic way, because managed IRQs may be shut down during CPU
>>> hotplug.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>  drivers/scsi/hpsa.c | 49 ++++++++++++++++++---------------------------
>>>  1 file changed, 19 insertions(+), 30 deletions(-)
>>>
>> There had been requests to make the internal interrupt mapping optional;
>> but I guess we first should
> 
> For HPSA, either managed IRQs are used or a single MSI-X vector is
> allocated. I am pretty sure that both cases are covered in this patch,
> so I am not sure what 'optional' means here.
> 
>>> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
>>> index 1bef1da273c2..c7136f9f0ce1 100644
>>> --- a/drivers/scsi/hpsa.c
>>> +++ b/drivers/scsi/hpsa.c
>>> @@ -51,6 +51,7 @@
>>>  #include <linux/jiffies.h>
>>>  #include <linux/percpu-defs.h>
>>>  #include <linux/percpu.h>
>>> +#include <linux/blk-mq-pci.h>
>>>  #include <asm/unaligned.h>
>>>  #include <asm/div64.h>
>>>  #include "hpsa_cmd.h"
>>> @@ -902,6 +903,18 @@ static ssize_t host_show_legacy_board(struct device *dev,
>>>       return snprintf(buf, 20, "%d\n", h->legacy_board ? 1 : 0);
>>>  }
>>>
>>> +static int hpsa_map_queues(struct Scsi_Host *shost)
>>> +{
>>> +     struct ctlr_info *h = shost_to_hba(shost);
>>> +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
>>> +
>>> +     /* Switch to cpu mapping in case that managed IRQ isn't used */
>>> +     if (shost->nr_hw_queues > 1)
>>> +             return blk_mq_pci_map_queues(qmap, h->pdev, 0);
>>> +     else
>>> +             return blk_mq_map_queues(qmap);
>>> +}
>>> +
>>>  static DEVICE_ATTR_RO(raid_level);
>>>  static DEVICE_ATTR_RO(lunid);
>>>  static DEVICE_ATTR_RO(unique_id);
>> This helper is pretty much shared between all converted drivers.
>> Shouldn't we have a common function here?
>> Something like
>>
>> scsi_mq_host_tag_map(struct Scsi_Host *shost, int offset)?
> 
> I am not sure the common helper would help much: the condition for using
> the cpu map or the pci map still depends on driver-private state, so we
> still have to define each driver's .map_queues anyway.
> 
> Also, the PCI device pointer has to be provided.
> 
Hmm. Okay, so let's keep it this way.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  6:34     ` Ming Lei
@ 2019-05-31  6:42       ` Hannes Reinecke
  2019-05-31  7:14         ` Ming Lei
  2019-05-31 11:38       ` John Garry
  1 sibling, 1 reply; 48+ messages in thread
From: Hannes Reinecke @ 2019-05-31  6:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, John Garry, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On 5/31/19 8:34 AM, Ming Lei wrote:
> On Fri, May 31, 2019 at 2:21 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 5/31/19 4:27 AM, Ming Lei wrote:
>>> SCSI's reply queue is very similar to blk-mq's hw queue, both
>>> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
>>> queue via .host_tagset.
>>>
>>> Then the private reply mapping can be removed.
>>>
>>> Another benefit is that the request/irq lost issue may be solved in a
>>> generic way, because managed IRQs may be shut down during CPU
>>> hotplug.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>>  drivers/scsi/hisi_sas/hisi_sas.h       |  2 +-
>>>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++++++----------
>>>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 46 +++++++++-----------------
>>>  3 files changed, 36 insertions(+), 48 deletions(-)
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
>>> index fc87994b5d73..3d48848dbde7 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>>> @@ -26,6 +26,7 @@
>>>  #include <linux/platform_device.h>
>>>  #include <linux/property.h>
>>>  #include <linux/regmap.h>
>>> +#include <linux/blk-mq-pci.h>
>>>  #include <scsi/sas_ata.h>
>>>  #include <scsi/libsas.h>
>>>
>>> @@ -378,7 +379,6 @@ struct hisi_hba {
>>>       u32 intr_coal_count;    /* Interrupt count to coalesce */
>>>
>>>       int cq_nvecs;
>>> -     unsigned int *reply_map;
>>>
>>>       /* debugfs memories */
>>>       u32 *debugfs_global_reg;
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> index 8a7feb8ed8d6..a1c1f30b9fdb 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> @@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
>>>       return rc;
>>>  }
>>>
>>> +static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
>>> +{
>>> +     if (!task->uldd_task)
>>> +             return NULL;
>>> +
>>> +     if (dev_is_sata(task->dev)) {
>>> +             struct ata_queued_cmd *qc = task->uldd_task;
>>> +             return qc->scsicmd;
>>> +     } else {
>>> +             return task->uldd_task;
>>> +     }
>>> +}
>>> +
>>>  static int hisi_sas_task_prep(struct sas_task *task,
>>>                             struct hisi_sas_dq **dq_pointer,
>>>                             bool is_tmf, struct hisi_sas_tmf_task *tmf,
>>> @@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
>>>       struct hisi_sas_dq *dq;
>>>       unsigned long flags;
>>>       int wr_q_index;
>>> +     struct scsi_cmnd *scsi_cmnd;
>>>
>>>       if (DEV_IS_GONE(sas_dev)) {
>>>               if (sas_dev)
>>> @@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
>>>               return -ECOMM;
>>>       }
>>>
>>> -     if (hisi_hba->reply_map) {
>>> -             int cpu = raw_smp_processor_id();
>>> -             unsigned int dq_index = hisi_hba->reply_map[cpu];
>>> +     scsi_cmnd = sas_task_to_scsi_cmd(task);
>>> +     if (hisi_hba->shost->hostt->host_tagset) {
>>> +             unsigned int dq_index = scsi_cmnd_hctx_index(
>>> +                             hisi_hba->shost, scsi_cmnd);
>>>
>>>               *dq_pointer = dq = &hisi_hba->dq[dq_index];
>>>       } else {
>>> @@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,
>>>
>>>       if (hisi_hba->hw->slot_index_alloc)
>>>               rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
>>> -     else {
>>> -             struct scsi_cmnd *scsi_cmnd = NULL;
>>> -
>>> -             if (task->uldd_task) {
>>> -                     struct ata_queued_cmd *qc;
>>> -
>>> -                     if (dev_is_sata(device)) {
>>> -                             qc = task->uldd_task;
>>> -                             scsi_cmnd = qc->scsicmd;
>>> -                     } else {
>>> -                             scsi_cmnd = task->uldd_task;
>>> -                     }
>>> -             }
>>> +     else
>>>               rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
>>> -     }
>>>       if (rc < 0)
>>>               goto err_out_dif_dma_unmap;
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
>>> index 49620c2411df..063e50e5b30c 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
>>> @@ -2344,30 +2344,6 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
>>>       return IRQ_HANDLED;
>>>  }
>>>
>>> -static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
>>> -{
>>> -     const struct cpumask *mask;
>>> -     int queue, cpu;
>>> -
>>> -     for (queue = 0; queue < nvecs; queue++) {
>>> -             struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
>>> -
>>> -             mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
>>> -                                         BASE_VECTORS_V3_HW);
>>> -             if (!mask)
>>> -                     goto fallback;
>>> -             cq->pci_irq_mask = mask;
>>> -             for_each_cpu(cpu, mask)
>>> -                     hisi_hba->reply_map[cpu] = queue;
>>> -     }
>>> -     return;
>>> -
>>> -fallback:
>>> -     for_each_possible_cpu(cpu)
>>> -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
>>> -     /* Don't clean all CQ masks */
>>> -}
>>> -
>>>  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>  {
>>>       struct device *dev = hisi_hba->dev;
>>> @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>
>>>               min_msi = MIN_AFFINE_VECTORS_V3_HW;
>>>
>>> -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
>>> -                                                sizeof(unsigned int),
>>> -                                                GFP_KERNEL);
>>> -             if (!hisi_hba->reply_map)
>>> -                     return -ENOMEM;
>>>               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
>>>                                                        min_msi, max_msi,
>>>                                                        PCI_IRQ_MSI |
>>> @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>                                                        &desc);
>>>               if (vectors < 0)
>>>                       return -ENOENT;
>>> -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
>>>       } else {
>>>               min_msi = max_msi;
>>>               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
>>> @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
>>>       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
>>>  }
>>>
>>> +static int hisi_sas_map_queues(struct Scsi_Host *shost)
>>> +{
>>> +     struct hisi_hba *hisi_hba = shost_priv(shost);
>>> +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
>>> +
>>> +     if (auto_affine_msi_experimental)
>>> +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
>>> +                             BASE_VECTORS_V3_HW);
>>> +     else
>>> +             return blk_mq_map_queues(qmap);
>>> +}
>>> +
>>>  static struct scsi_host_template sht_v3_hw = {
>>>       .name                   = DRV_NAME,
>>>       .module                 = THIS_MODULE,
>>
>> As mentioned, we should be using a common function here.
>>
>>> @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
>>>       .scan_start             = hisi_sas_scan_start,
>>>       .change_queue_depth     = sas_change_queue_depth,
>>>       .bios_param             = sas_bios_param,
>>> +     .map_queues             = hisi_sas_map_queues,
>>> +     .host_tagset            = 1,
>>>       .this_id                = -1,
>>>       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
>>>       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
>>> @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>       if (hisi_sas_debugfs_enable)
>>>               hisi_sas_debugfs_init(hisi_hba);
>>>
>>> +     shost->nr_hw_queues = hisi_hba->cq_nvecs;
>>> +
>>>       rc = scsi_add_host(shost, dev);
>>>       if (rc)
>>>               goto err_out_ha;
>>>
>> Well, I'd rather see the v3 hardware converted to 'real' blk-mq first;
>> the hardware itself is pretty much multiqueue already, so we should be
>> better off converting it to blk-mq.
> 
> From John Garry's input, the tags are still hostwide, so I am not sure how
> to partition the hostwide tags into per-hw-queue tags. That can be quite
> hard to do if the queue depth isn't big enough.
> 
Shouldn't be much of an issue; the conversion to blk-mq would still be
using a host-wide tag map.
Problem is more the 'v2' hardware, which has some pretty dodgy hardware
limitations. But I'll be looking into it and will be posting a patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  6:42       ` Hannes Reinecke
@ 2019-05-31  7:14         ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2019-05-31  7:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, John Garry, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 2:42 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 5/31/19 8:34 AM, Ming Lei wrote:
> > On Fri, May 31, 2019 at 2:21 PM Hannes Reinecke <hare@suse.de> wrote:
> >>
> >> On 5/31/19 4:27 AM, Ming Lei wrote:
> >>> SCSI's reply queue is very similar to blk-mq's hw queue, both
> >>> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> >>> queue via .host_tagset.
> >>>
> >>> Then the private reply mapping can be removed.
> >>>
> >>> Another benefit is that the request/irq lost issue may be solved in a
> >>> generic way, because managed IRQs may be shut down during CPU
> >>> hotplug.
> >>>
> >>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>> ---
> >>>  drivers/scsi/hisi_sas/hisi_sas.h       |  2 +-
> >>>  drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++++++----------
> >>>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 46 +++++++++-----------------
> >>>  3 files changed, 36 insertions(+), 48 deletions(-)
> >>>
> >>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
> >>> index fc87994b5d73..3d48848dbde7 100644
> >>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> >>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> >>> @@ -26,6 +26,7 @@
> >>>  #include <linux/platform_device.h>
> >>>  #include <linux/property.h>
> >>>  #include <linux/regmap.h>
> >>> +#include <linux/blk-mq-pci.h>
> >>>  #include <scsi/sas_ata.h>
> >>>  #include <scsi/libsas.h>
> >>>
> >>> @@ -378,7 +379,6 @@ struct hisi_hba {
> >>>       u32 intr_coal_count;    /* Interrupt count to coalesce */
> >>>
> >>>       int cq_nvecs;
> >>> -     unsigned int *reply_map;
> >>>
> >>>       /* debugfs memories */
> >>>       u32 *debugfs_global_reg;
> >>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> >>> index 8a7feb8ed8d6..a1c1f30b9fdb 100644
> >>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> >>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> >>> @@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
> >>>       return rc;
> >>>  }
> >>>
> >>> +static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
> >>> +{
> >>> +     if (!task->uldd_task)
> >>> +             return NULL;
> >>> +
> >>> +     if (dev_is_sata(task->dev)) {
> >>> +             struct ata_queued_cmd *qc = task->uldd_task;
> >>> +             return qc->scsicmd;
> >>> +     } else {
> >>> +             return task->uldd_task;
> >>> +     }
> >>> +}
> >>> +
> >>>  static int hisi_sas_task_prep(struct sas_task *task,
> >>>                             struct hisi_sas_dq **dq_pointer,
> >>>                             bool is_tmf, struct hisi_sas_tmf_task *tmf,
> >>> @@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >>>       struct hisi_sas_dq *dq;
> >>>       unsigned long flags;
> >>>       int wr_q_index;
> >>> +     struct scsi_cmnd *scsi_cmnd;
> >>>
> >>>       if (DEV_IS_GONE(sas_dev)) {
> >>>               if (sas_dev)
> >>> @@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >>>               return -ECOMM;
> >>>       }
> >>>
> >>> -     if (hisi_hba->reply_map) {
> >>> -             int cpu = raw_smp_processor_id();
> >>> -             unsigned int dq_index = hisi_hba->reply_map[cpu];
> >>> +     scsi_cmnd = sas_task_to_scsi_cmd(task);
> >>> +     if (hisi_hba->shost->hostt->host_tagset) {
> >>> +             unsigned int dq_index = scsi_cmnd_hctx_index(
> >>> +                             hisi_hba->shost, scsi_cmnd);
> >>>
> >>>               *dq_pointer = dq = &hisi_hba->dq[dq_index];
> >>>       } else {
> >>> @@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >>>
> >>>       if (hisi_hba->hw->slot_index_alloc)
> >>>               rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
> >>> -     else {
> >>> -             struct scsi_cmnd *scsi_cmnd = NULL;
> >>> -
> >>> -             if (task->uldd_task) {
> >>> -                     struct ata_queued_cmd *qc;
> >>> -
> >>> -                     if (dev_is_sata(device)) {
> >>> -                             qc = task->uldd_task;
> >>> -                             scsi_cmnd = qc->scsicmd;
> >>> -                     } else {
> >>> -                             scsi_cmnd = task->uldd_task;
> >>> -                     }
> >>> -             }
> >>> +     else
> >>>               rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
> >>> -     }
> >>>       if (rc < 0)
> >>>               goto err_out_dif_dma_unmap;
> >>>
> >>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> >>> index 49620c2411df..063e50e5b30c 100644
> >>> --- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> >>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
> >>> @@ -2344,30 +2344,6 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
> >>>       return IRQ_HANDLED;
> >>>  }
> >>>
> >>> -static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
> >>> -{
> >>> -     const struct cpumask *mask;
> >>> -     int queue, cpu;
> >>> -
> >>> -     for (queue = 0; queue < nvecs; queue++) {
> >>> -             struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
> >>> -
> >>> -             mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
> >>> -                                         BASE_VECTORS_V3_HW);
> >>> -             if (!mask)
> >>> -                     goto fallback;
> >>> -             cq->pci_irq_mask = mask;
> >>> -             for_each_cpu(cpu, mask)
> >>> -                     hisi_hba->reply_map[cpu] = queue;
> >>> -     }
> >>> -     return;
> >>> -
> >>> -fallback:
> >>> -     for_each_possible_cpu(cpu)
> >>> -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
> >>> -     /* Don't clean all CQ masks */
> >>> -}
> >>> -
> >>>  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >>>  {
> >>>       struct device *dev = hisi_hba->dev;
> >>> @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >>>
> >>>               min_msi = MIN_AFFINE_VECTORS_V3_HW;
> >>>
> >>> -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
> >>> -                                                sizeof(unsigned int),
> >>> -                                                GFP_KERNEL);
> >>> -             if (!hisi_hba->reply_map)
> >>> -                     return -ENOMEM;
> >>>               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
> >>>                                                        min_msi, max_msi,
> >>>                                                        PCI_IRQ_MSI |
> >>> @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> >>>                                                        &desc);
> >>>               if (vectors < 0)
> >>>                       return -ENOENT;
> >>> -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
> >>>       } else {
> >>>               min_msi = max_msi;
> >>>               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
> >>> @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
> >>>       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
> >>>  }
> >>>
> >>> +static int hisi_sas_map_queues(struct Scsi_Host *shost)
> >>> +{
> >>> +     struct hisi_hba *hisi_hba = shost_priv(shost);
> >>> +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> >>> +
> >>> +     if (auto_affine_msi_experimental)
> >>> +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
> >>> +                             BASE_VECTORS_V3_HW);
> >>> +     else
> >>> +             return blk_mq_map_queues(qmap);
> >>> +}
> >>> +
> >>>  static struct scsi_host_template sht_v3_hw = {
> >>>       .name                   = DRV_NAME,
> >>>       .module                 = THIS_MODULE,
> >>
> >> As mentioned, we should be using a common function here.
> >>
> >>> @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
> >>>       .scan_start             = hisi_sas_scan_start,
> >>>       .change_queue_depth     = sas_change_queue_depth,
> >>>       .bios_param             = sas_bios_param,
> >>> +     .map_queues             = hisi_sas_map_queues,
> >>> +     .host_tagset            = 1,
> >>>       .this_id                = -1,
> >>>       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
> >>>       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
> >>> @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >>>       if (hisi_sas_debugfs_enable)
> >>>               hisi_sas_debugfs_init(hisi_hba);
> >>>
> >>> +     shost->nr_hw_queues = hisi_hba->cq_nvecs;
> >>> +
> >>>       rc = scsi_add_host(shost, dev);
> >>>       if (rc)
> >>>               goto err_out_ha;
> >>>
> >> Well, I'd rather see the v3 hardware converted to 'real' blk-mq first;
> >> the hardware itself is pretty much multiqueue already, so we should be
> >> better off converting it to blk-mq.
> >
> > From John Garry's input, the tags are still hostwide, so I'm not sure how to
> > partition the hostwide tags into each hw queue's tags. That can be quite
> > hard to do if the queue depth isn't big enough.
> >
> Shouldn't be much of an issue; the conversion to blk-mq would still be
> using a host-wide tag map.

Could you explain a bit more? Because that is exactly what this patch is doing
(exposing MQ on top of a host-wide tag set).


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31  6:34     ` Ming Lei
  2019-05-31  6:42       ` Hannes Reinecke
@ 2019-05-31 11:38       ` John Garry
  2019-06-03 11:00         ` Ming Lei
  1 sibling, 1 reply; 48+ messages in thread
From: John Garry @ 2019-05-31 11:38 UTC (permalink / raw)
  To: Ming Lei, Hannes Reinecke
  Cc: Ming Lei, Jens Axboe, linux-block, Linux SCSI List,
	Martin K . Petersen, James Bottomley, Bart Van Assche,
	Hannes Reinecke, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig


>>> -fallback:
>>> -     for_each_possible_cpu(cpu)
>>> -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
>>> -     /* Don't clean all CQ masks */
>>> -}
>>> -
>>>  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>  {
>>>       struct device *dev = hisi_hba->dev;
>>> @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>
>>>               min_msi = MIN_AFFINE_VECTORS_V3_HW;
>>>
>>> -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
>>> -                                                sizeof(unsigned int),
>>> -                                                GFP_KERNEL);
>>> -             if (!hisi_hba->reply_map)
>>> -                     return -ENOMEM;
>>>               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
>>>                                                        min_msi, max_msi,
>>>                                                        PCI_IRQ_MSI |
>>> @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>                                                        &desc);
>>>               if (vectors < 0)
>>>                       return -ENOENT;
>>> -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
>>>       } else {
>>>               min_msi = max_msi;
>>>               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
>>> @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
>>>       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
>>>  }
>>>
>>> +static int hisi_sas_map_queues(struct Scsi_Host *shost)
>>> +{
>>> +     struct hisi_hba *hisi_hba = shost_priv(shost);
>>> +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
>>> +
>>> +     if (auto_affine_msi_experimental)
>>> +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
>>> +                             BASE_VECTORS_V3_HW);
>>> +     else
>>> +             return blk_mq_map_queues(qmap);

I don't think that the mapping which blk_mq_map_queues() creates is what we
want. I'm guessing that we would still like a mapping similar to what
blk_mq_pci_map_queues() produces, which is an even spread, putting adjacent
CPUs on the same queue.

For my system with 96 cpus and 16 queues, blk_mq_map_queues() would map 
queue 0 to cpu 0, 16, 32, 48 ..., queue 1 to cpu 1, 17, 33 and so on.
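
As a rough illustration of the difference (user-space sketch only; the 96
CPUs and 16 queues above are the assumed values):

/* sketch: compare a round-robin cpu->queue map with a grouped one */
#include <stdio.h>

#define NR_CPUS   96
#define NR_QUEUES 16

int main(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		/* roughly what blk_mq_map_queues() gives here:
		 * queue 0 <- cpu 0, 16, 32, ..., queue 1 <- cpu 1, 17, 33, ...
		 */
		int rr = cpu % NR_QUEUES;

		/* the grouped spread we would prefer, adjacent CPUs sharing
		 * a queue: queue 0 <- cpu 0..5, queue 1 <- cpu 6..11, ...
		 */
		int grouped = cpu / (NR_CPUS / NR_QUEUES);

		printf("cpu %2d: round-robin q%2d, grouped q%2d\n",
		       cpu, rr, grouped);
	}
	return 0;
}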

>>> +}
>>> +
>>>  static struct scsi_host_template sht_v3_hw = {
>>>       .name                   = DRV_NAME,
>>>       .module                 = THIS_MODULE,
>>
>> As mentioned, we should be using a common function here.
>>
>>> @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
>>>       .scan_start             = hisi_sas_scan_start,
>>>       .change_queue_depth     = sas_change_queue_depth,
>>>       .bios_param             = sas_bios_param,
>>> +     .map_queues             = hisi_sas_map_queues,
>>> +     .host_tagset            = 1,
>>>       .this_id                = -1,
>>>       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
>>>       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
>>> @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>       if (hisi_sas_debugfs_enable)
>>>               hisi_sas_debugfs_init(hisi_hba);
>>>
>>> +     shost->nr_hw_queues = hisi_hba->cq_nvecs;

There's an ordering issue here, which can be fixed without too much trouble.

Value hisi_hba->cq_nvecs is not set until after this point - it is only set in
hisi_sas_v3_probe()->hw->hw_init->hisi_sas_v3_init()->interrupt_init_v3_hw().


Please see revised patch, below.


>>> +
>>>       rc = scsi_add_host(shost, dev);
>>>       if (rc)
>>>               goto err_out_ha;
>>>
>> Well, I'd rather see the v3 hardware converted to 'real' blk-mq first;
>> the hardware itself is pretty much multiqueue already, so we should be
>> better off converting it to blk-mq.
>
> From John Garry's input, the tags are still hostwide, so I'm not sure how to
> partition the hostwide tags into each hw queue's tags. That can be quite
> hard to do if the queue depth isn't big enough.

JFYI, there is no limitation on which command tags can be used on which queue.

And, as I mentioned in response to "hisi_sas_v3: multiqueue support", 
the hw queue depth is configurable, and we make it the same value as the 
max command tags, that being 4096.

>
> Thanks,
> Ming Lei
>
> .
>

Thanks,
John

 From b3c4ded715e1a7282f59fbd216bd2f0e852986aa Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@redhat.com>
Date: Fri, 31 May 2019 10:27:59 +0800
Subject: [PATCH] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw
  queue

SCSI's reply queue is very similar to blk-mq's hw queue, both
assigned by IRQ vector, so map the private reply queue into blk-mq's hw
queue via .host_tagset.

Then the private reply mapping can be removed.

Another benefit is that the request/irq lost issue may be solved in a
generic approach because managed IRQ may be shut down during CPU
hotplug.

Signed-off-by: Ming Lei <ming.lei@redhat.com>

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index fc87994b5d73..3d48848dbde7 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -26,6 +26,7 @@
  #include <linux/platform_device.h>
  #include <linux/property.h>
  #include <linux/regmap.h>
+#include <linux/blk-mq-pci.h>
  #include <scsi/sas_ata.h>
  #include <scsi/libsas.h>

@@ -378,7 +379,6 @@ struct hisi_hba {
  	u32 intr_coal_count;	/* Interrupt count to coalesce */

  	int cq_nvecs;
-	unsigned int *reply_map;

  	/* debugfs memories */
  	u32 *debugfs_global_reg;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 8a7feb8ed8d6..a1c1f30b9fdb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -441,6 +441,19 @@ static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
  	return rc;
  }

+static struct scsi_cmnd *sas_task_to_scsi_cmd(struct sas_task *task)
+{
+	if (!task->uldd_task)
+		return NULL;
+
+	if (dev_is_sata(task->dev)) {
+		struct ata_queued_cmd *qc = task->uldd_task;
+		return qc->scsicmd;
+	} else {
+		return task->uldd_task;
+	}
+}
+
  static int hisi_sas_task_prep(struct sas_task *task,
  			      struct hisi_sas_dq **dq_pointer,
  			      bool is_tmf, struct hisi_sas_tmf_task *tmf,
@@ -459,6 +472,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
  	struct hisi_sas_dq *dq;
  	unsigned long flags;
  	int wr_q_index;
+	struct scsi_cmnd *scsi_cmnd;

  	if (DEV_IS_GONE(sas_dev)) {
  		if (sas_dev)
@@ -471,9 +485,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
  		return -ECOMM;
  	}

-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	scsi_cmnd = sas_task_to_scsi_cmd(task);
+	if (hisi_hba->shost->hostt->host_tagset) {
+		unsigned int dq_index = scsi_cmnd_hctx_index(
+				hisi_hba->shost, scsi_cmnd);

  		*dq_pointer = dq = &hisi_hba->dq[dq_index];
  	} else {
@@ -503,21 +518,8 @@ static int hisi_sas_task_prep(struct sas_task *task,

  	if (hisi_hba->hw->slot_index_alloc)
  		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
-
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
+	else
  		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
  	if (rc < 0)
  		goto err_out_dif_dma_unmap;

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 49620c2411df..0aa750cbefb3 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2344,36 +2344,9 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
  	return IRQ_HANDLED;
  }

-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_pre_init_v3_hw(struct hisi_hba *hisi_hba)
  {
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
-
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
-
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
-}
-
-static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
-{
-	struct device *dev = hisi_hba->dev;
-	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc;
-	int i, k;
+	int vectors;
  	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;

  	if (auto_affine_msi_experimental) {
@@ -2383,11 +2356,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)

  		min_msi = MIN_AFFINE_VECTORS_V3_HW;

-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
  		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
  							 min_msi, max_msi,
  							 PCI_IRQ_MSI |
@@ -2395,7 +2363,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
  							 &desc);
  		if (vectors < 0)
  			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
  	} else {
  		min_msi = max_msi;
  		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
@@ -2403,16 +2370,25 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
  		if (vectors < 0)
  			return vectors;
  	}
-
  	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;

+	return 0;
+}
+
+static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
+{
+	struct device *dev = hisi_hba->dev;
+	struct pci_dev *pdev = hisi_hba->pci_dev;
+	int rc, i, k;
+
+	dev_err(dev,  "%s hisi_hba->cq_nvecs=%d\n", __func__, hisi_hba->cq_nvecs);
+
  	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
  			      int_phy_up_down_bcast_v3_hw, 0,
  			      DRV_NAME " phy", hisi_hba);
  	if (rc) {
  		dev_err(dev, "could not request phy interrupt, rc=%d\n", rc);
-		rc = -ENOENT;
-		goto free_irq_vectors;
+		return -ENOENT;
  	}

  	rc = devm_request_irq(dev, pci_irq_vector(pdev, 2),
@@ -2467,8 +2443,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
  	free_irq(pci_irq_vector(pdev, 2), hisi_hba);
  free_phy_irq:
  	free_irq(pci_irq_vector(pdev, 1), hisi_hba);
-free_irq_vectors:
-	pci_free_irq_vectors(pdev);
  	return rc;
  }

@@ -2896,6 +2870,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
  	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
  }

+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	if (auto_affine_msi_experimental)
+		return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+				BASE_VECTORS_V3_HW);
+	else
+		return blk_mq_map_queues(qmap);
+}
+
  static struct scsi_host_template sht_v3_hw = {
  	.name			= DRV_NAME,
  	.module			= THIS_MODULE,
@@ -2906,6 +2892,8 @@ static struct scsi_host_template sht_v3_hw = {
  	.scan_start		= hisi_sas_scan_start,
  	.change_queue_depth	= sas_change_queue_depth,
  	.bios_param		= sas_bios_param,
+	.map_queues		= hisi_sas_map_queues,
+	.host_tagset		= 1,
  	.this_id		= -1,
  	.sg_tablesize		= HISI_SAS_SGE_PAGE_CNT,
  	.sg_prot_tablesize	= HISI_SAS_SGE_PAGE_CNT,
@@ -3092,15 +3080,21 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  	if (hisi_sas_debugfs_enable)
  		hisi_sas_debugfs_init(hisi_hba);

+
+	rc = interrupt_pre_init_v3_hw(hisi_hba);
+	if (rc < 0)
+		goto err_out_interrupts;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
+
  	rc = scsi_add_host(shost, dev);
  	if (rc)
-		goto err_out_ha;
+		goto err_out_interrupts;

  	rc = sas_register_ha(sha);
  	if (rc)
  		goto err_out_register_ha;

-	rc = hisi_hba->hw->hw_init(hisi_hba);
+	rc = hisi_sas_v3_init(hisi_hba);
  	if (rc)
  		goto err_out_register_ha;

@@ -3110,6 +3104,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)

  err_out_register_ha:
  	scsi_remove_host(shost);
+err_out_interrupts:
+	pci_free_irq_vectors(pdev);
  err_out_ha:
  	scsi_host_put(shost);
  err_out_regions:
-- 
2.17.1






^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
  2019-05-31  6:07   ` Hannes Reinecke
@ 2019-05-31 15:37   ` Bart Van Assche
  2019-06-24  8:44     ` Ming Lei
  2019-06-05 14:10   ` John Garry
  2 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2019-05-31 15:37 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Hannes Reinecke, John Garry, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/30/19 7:27 PM, Ming Lei wrote:
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 6aea0ebc3a73..3d6780504dcb 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -237,6 +237,7 @@ static const char *const alloc_policy_name[] = {
>   static const char *const hctx_flag_name[] = {
>   	HCTX_FLAG_NAME(SHOULD_MERGE),
>   	HCTX_FLAG_NAME(TAG_SHARED),
> +	HCTX_FLAG_NAME(HOST_TAGS),
>   	HCTX_FLAG_NAME(BLOCKING),
>   	HCTX_FLAG_NAME(NO_SCHED),
>   };

The name BLK_MQ_F_HOST_TAGS suggests that tags are shared across a SCSI 
host. That is misleading since this flag means that tags are shared 
across hardware queues. Additionally, the "host" term is a term that 
comes from the SCSI world and this patch is a block layer patch. That 
makes me wonder whether another name should be used to reflect that all 
hardware queues share the same tag set? How about renaming 
BLK_MQ_F_TAG_SHARED into BLK_MQ_F_TAG_QUEUE_SHARED and renaming 
BLK_MQ_F_HOST_TAGS into BLK_MQ_F_TAG_HCTX_SHARED?
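
Purely as an illustration of that naming (not an actual patch), the debugfs
flag name table quoted above might then read:

static const char *const hctx_flag_name[] = {
	HCTX_FLAG_NAME(SHOULD_MERGE),
	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),	/* was TAG_SHARED */
	HCTX_FLAG_NAME(TAG_HCTX_SHARED),	/* was HOST_TAGS */
	HCTX_FLAG_NAME(BLOCKING),
	HCTX_FLAG_NAME(NO_SCHED),
};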

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'
  2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
  2019-05-31  6:08   ` Hannes Reinecke
@ 2019-05-31 15:39   ` Bart Van Assche
  2019-06-24  8:43     ` Ming Lei
  2019-06-02  1:56   ` Minwoo Im
  2 siblings, 1 reply; 48+ messages in thread
From: Bart Van Assche @ 2019-05-31 15:39 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Hannes Reinecke, John Garry, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/30/19 7:27 PM, Ming Lei wrote:
> +static int g_host_tags = 0;

Static variables should not be explicitly initialized to zero.

> +module_param_named(host_tags, g_host_tags, int, S_IRUGO);
> +MODULE_PARM_DESC(host_tags, "All submission queues share one tags");
                                                             ^^^^^^^^
Did you perhaps mean "one tagset"?
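
With both points addressed, the declaration might read something like this
(sketch only, keeping the existing permission bits):

static int g_host_tags;
module_param_named(host_tags, g_host_tags, int, S_IRUGO);
MODULE_PARM_DESC(host_tags, "All submission queues share one tag set");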

Bart.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-05-31  2:28 ` [PATCH 8/9] scsi: megaraid: " Ming Lei
  2019-05-31  6:22   ` Hannes Reinecke
@ 2019-06-01 21:41   ` Kashyap Desai
  2019-06-02  6:42     ` Ming Lei
  1 sibling, 1 reply; 48+ messages in thread
From: Kashyap Desai @ 2019-06-01 21:41 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

> SCSI's reply queue is very similar to blk-mq's hw queue, both assigned by
> IRQ vector, so map the private reply queue into blk-mq's hw queue via
> .host_tagset.
>
> Then the private reply mapping can be removed.
>
> Another benefit is that the request/irq lost issue may be solved in a generic
> approach because managed IRQ may be shut down during CPU hotplug.

Ming,

I quickly tested this patch series on MegaRaid Aero controller. Without
this patch I can get 3.0M IOPS, but once I apply this patch I see only
1.2M IOPS (40% Performance drop)
HBA supports 5089 can_queue.

<perf top> output without  patch -

    3.39%  [megaraid_sas]  [k] complete_cmd_fusion
     3.36%  [kernel]        [k] scsi_queue_rq
     3.26%  [kernel]        [k] entry_SYSCALL_64
     2.57%  [kernel]        [k] syscall_return_via_sysret
     1.95%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     1.88%  [kernel]        [k] _raw_spin_lock_irqsave
     1.79%  [kernel]        [k] gup_pmd_range
     1.73%  [kernel]        [k] _raw_spin_lock
     1.68%  [kernel]        [k] __sched_text_start
     1.19%  [kernel]        [k] irq_entries_start
     1.13%  [kernel]        [k] scsi_dec_host_busy
     1.08%  [kernel]        [k] aio_complete
     1.07%  [kernel]        [k] read_tsc
     1.01%  [kernel]        [k] blk_mq_get_request
     0.93%  [kernel]        [k] __update_load_avg_cfs_rq
     0.92%  [kernel]        [k] aio_read_events
     0.91%  [kernel]        [k] lookup_ioctx
     0.91%  fio             [.] fio_gettime
     0.87%  [kernel]        [k] set_next_entity
     0.87%  [megaraid_sas]  [k] megasas_build_ldio_fusion

<perf top> output with  patch -

    11.30%  [kernel]       [k] native_queued_spin_lock_slowpath
     3.37%  [kernel]       [k] sbitmap_any_bit_set
     2.91%  [kernel]       [k] blk_mq_run_hw_queue
     2.32%  [kernel]       [k] _raw_spin_lock_irqsave
     2.29%  [kernel]       [k] menu_select
     2.04%  [kernel]       [k] entry_SYSCALL_64
     2.03%  [kernel]       [k] __sched_text_start
     1.70%  [kernel]       [k] scsi_queue_rq
     1.66%  [kernel]       [k] _raw_spin_lock
     1.58%  [kernel]       [k] syscall_return_via_sysret
     1.33%  [kernel]       [k] native_write_msr
     1.20%  [kernel]       [k] read_tsc
     1.13%  [kernel]       [k] blk_mq_run_hw_queues
     1.13%  [kernel]       [k] __sbq_wake_up
     1.01%  [kernel]       [k] irq_entries_start
     1.00%  [kernel]       [k] switch_mm_irqs_off
     0.99%  [kernel]       [k] gup_pmd_range
     0.98%  [kernel]       [k] __update_load_avg_cfs_rq
     0.98%  [kernel]       [k] set_next_entity
     0.92%  [kernel]       [k] do_idle

Kashyap

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'
  2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
  2019-05-31  6:08   ` Hannes Reinecke
  2019-05-31 15:39   ` Bart Van Assche
@ 2019-06-02  1:56   ` Minwoo Im
  2 siblings, 0 replies; 48+ messages in thread
From: Minwoo Im @ 2019-06-02  1:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 19-05-31 10:27:54, Ming Lei wrote:
> Introduce a 'g_host_tags' module parameter for testing hostwide tags.
> 
> No performance drop is observed in the following test:
> 
> 1) no 'host_tags', hw queue depth is 16, and 1 hw queue
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=0 submit_queues=1 hw_queue_depth=16
> 
> 2) 'host_tags', global hw queue depth is 16, and 8 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=1 submit_queues=8 hw_queue_depth=16
> 
> 3) fio test command:
> 
> fio --bs=4k --ioengine=libaio --iodepth=16 --filename=/dev/nullb0:/dev/nullb1:/dev/nullb2:/dev/nullb3 --direct=1 --runtime=30 --numjobs=16 --rw=randread --name=test --group_reporting --gtod_reduce=1
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/null_blk_main.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
> index 447d635c79a2..3c04446f3649 100644
> --- a/drivers/block/null_blk_main.c
> +++ b/drivers/block/null_blk_main.c
> @@ -91,6 +91,10 @@ static int g_submit_queues = 1;
>  module_param_named(submit_queues, g_submit_queues, int, 0444);
>  MODULE_PARM_DESC(submit_queues, "Number of submission queues");
>  
> +static int g_host_tags = 0;
> +module_param_named(host_tags, g_host_tags, int, S_IRUGO);
> +MODULE_PARM_DESC(host_tags, "All submission queues share one tags");
> +
>  static int g_home_node = NUMA_NO_NODE;
>  module_param_named(home_node, g_home_node, int, 0444);
>  MODULE_PARM_DESC(home_node, "Home node for the device");
> @@ -1554,6 +1558,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
>  	set->flags = BLK_MQ_F_SHOULD_MERGE;
>  	if (g_no_sched)
>  		set->flags |= BLK_MQ_F_NO_SCHED;
> +	if (g_host_tags)
> +		set->flags |= BLK_MQ_F_HOST_TAGS;

Hi Ming,

I think it would be great if you could also provide a documentation update
for null_blk (Documentation/block/null_blk.txt).  Also, what Bart has
pointed out can be applied, I guess.

Otherwise, it looks good to me.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] scsi_debug: support host tagset
  2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
  2019-05-31  6:09   ` Hannes Reinecke
@ 2019-06-02  2:03   ` Minwoo Im
  2019-06-02 17:01   ` Douglas Gilbert
  2 siblings, 0 replies; 48+ messages in thread
From: Minwoo Im @ 2019-06-02  2:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 19-05-31 10:27:56, Ming Lei wrote:
> The 'host_tagset' can be set on scsi_debug device for testing
> shared hostwide tags on multiple blk-mq hw queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/scsi_debug.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index d323523f5f9d..8cf3f6c3f4f9 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -665,6 +665,7 @@ static bool have_dif_prot;
>  static bool write_since_sync;
>  static bool sdebug_statistics = DEF_STATISTICS;
>  static bool sdebug_wp;
> +static bool sdebug_host_tagset = false;

Hi Ming, 

I think we can leave it without an initialisation just like the others above.
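
i.e. the declaration could simply be (sketch):

static bool sdebug_host_tagset;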

>  
>  static unsigned int sdebug_store_sectors;
>  static sector_t sdebug_capacity;	/* in sectors */
> @@ -4468,6 +4469,7 @@ module_param_named(vpd_use_hostno, sdebug_vpd_use_hostno, int,
>  module_param_named(wp, sdebug_wp, bool, S_IRUGO | S_IWUSR);
>  module_param_named(write_same_length, sdebug_write_same_length, int,
>  		   S_IRUGO | S_IWUSR);
> +module_param_named(host_tagset, sdebug_host_tagset, bool, S_IRUGO | S_IWUSR);
>  
>  MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
>  MODULE_DESCRIPTION("SCSI debug adapter driver");
> @@ -5779,6 +5781,7 @@ static int sdebug_driver_probe(struct device *dev)
>  	sdbg_host = to_sdebug_host(dev);
>  
>  	sdebug_driver_template.can_queue = sdebug_max_queue;
> +	sdebug_driver_template.host_tagset = sdebug_host_tagset;
>  	if (!sdebug_clustering)
>  		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;

Otherwise: It looks good to me in host tagset point of view.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-01 21:41   ` Kashyap Desai
@ 2019-06-02  6:42     ` Ming Lei
  2019-06-02  7:48       ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-06-02  6:42 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

Hi Kashyap,

Thanks for your test.

On Sun, Jun 02, 2019 at 03:11:26AM +0530, Kashyap Desai wrote:
> > SCSI's reply queue is very similar to blk-mq's hw queue, both assigned by
> > IRQ vector, so map the private reply queue into blk-mq's hw queue via
> > .host_tagset.
> >
> > Then the private reply mapping can be removed.
> >
> > Another benefit is that the request/irq lost issue may be solved in a generic
> > approach because managed IRQ may be shut down during CPU hotplug.
> 
> Ming,
> 
> I quickly tested this patch series on MegaRaid Aero controller. Without
> this patch I can get 3.0M IOPS, but once I apply this patch I see only
> 1.2M IOPS (40% Performance drop)
> HBA supports 5089 can_queue.
> 
> <perf top> output without  patch -
> 
>     3.39%  [megaraid_sas]  [k] complete_cmd_fusion
>      3.36%  [kernel]        [k] scsi_queue_rq
>      3.26%  [kernel]        [k] entry_SYSCALL_64
>      2.57%  [kernel]        [k] syscall_return_via_sysret
>      1.95%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
>      1.88%  [kernel]        [k] _raw_spin_lock_irqsave
>      1.79%  [kernel]        [k] gup_pmd_range
>      1.73%  [kernel]        [k] _raw_spin_lock
>      1.68%  [kernel]        [k] __sched_text_start
>      1.19%  [kernel]        [k] irq_entries_start
>      1.13%  [kernel]        [k] scsi_dec_host_busy
>      1.08%  [kernel]        [k] aio_complete
>      1.07%  [kernel]        [k] read_tsc
>      1.01%  [kernel]        [k] blk_mq_get_request
>      0.93%  [kernel]        [k] __update_load_avg_cfs_rq
>      0.92%  [kernel]        [k] aio_read_events
>      0.91%  [kernel]        [k] lookup_ioctx
>      0.91%  fio             [.] fio_gettime
>      0.87%  [kernel]        [k] set_next_entity
>      0.87%  [megaraid_sas]  [k] megasas_build_ldio_fusion
> 
> <perf top> output with  patch -
> 
>     11.30%  [kernel]       [k] native_queued_spin_lock_slowpath

I guess there must be one global lock required in the megaraid submission path.
Could you run 'perf record -g -a' to see which lock it is and what the stack
trace is?


Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-02  6:42     ` Ming Lei
@ 2019-06-02  7:48       ` Ming Lei
  2019-06-02 16:34         ` Kashyap Desai
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-06-02  7:48 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

On Sun, Jun 02, 2019 at 02:42:02PM +0800, Ming Lei wrote:
> Hi Kashyap,
> 
> Thanks for your test.
> 
> On Sun, Jun 02, 2019 at 03:11:26AM +0530, Kashyap Desai wrote:
> > > SCSI's reply queue is very similar to blk-mq's hw queue, both assigned by
> > > IRQ vector, so map the private reply queue into blk-mq's hw queue via
> > > .host_tagset.
> > >
> > > Then the private reply mapping can be removed.
> > >
> > > Another benefit is that the request/irq lost issue may be solved in a generic
> > > approach because managed IRQ may be shut down during CPU hotplug.
> > 
> > Ming,
> > 
> > I quickly tested this patch series on MegaRaid Aero controller. Without
> > this patch I can get 3.0M IOPS, but once I apply this patch I see only
> > 1.2M IOPS (40% Performance drop)
> > HBA supports 5089 can_queue.
> > 
> > <perf top> output without  patch -
> > 
> >     3.39%  [megaraid_sas]  [k] complete_cmd_fusion
> >      3.36%  [kernel]        [k] scsi_queue_rq
> >      3.26%  [kernel]        [k] entry_SYSCALL_64
> >      2.57%  [kernel]        [k] syscall_return_via_sysret
> >      1.95%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
> >      1.88%  [kernel]        [k] _raw_spin_lock_irqsave
> >      1.79%  [kernel]        [k] gup_pmd_range
> >      1.73%  [kernel]        [k] _raw_spin_lock
> >      1.68%  [kernel]        [k] __sched_text_start
> >      1.19%  [kernel]        [k] irq_entries_start
> >      1.13%  [kernel]        [k] scsi_dec_host_busy
> >      1.08%  [kernel]        [k] aio_complete
> >      1.07%  [kernel]        [k] read_tsc
> >      1.01%  [kernel]        [k] blk_mq_get_request
> >      0.93%  [kernel]        [k] __update_load_avg_cfs_rq
> >      0.92%  [kernel]        [k] aio_read_events
> >      0.91%  [kernel]        [k] lookup_ioctx
> >      0.91%  fio             [.] fio_gettime
> >      0.87%  [kernel]        [k] set_next_entity
> >      0.87%  [megaraid_sas]  [k] megasas_build_ldio_fusion
> > 
> > <perf top> output with  patch -
> > 
> >     11.30%  [kernel]       [k] native_queued_spin_lock_slowpath
> 
> I guess there must be one global lock required in the megaraid submission path.
> Could you run 'perf record -g -a' to see which lock it is and what the stack
> trace is?

Meantime, please try the following patch and see if it makes a difference:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 49d73d979cb3..d2abec3b0f60 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -589,7 +589,7 @@ static void __blk_mq_complete_request(struct request *rq)
 	 * So complete IO reqeust in softirq context in case of single queue
 	 * for not degrading IO performance by irqsoff latency.
 	 */
-	if (q->nr_hw_queues == 1) {
+	if (q->nr_hw_queues == 1 || (rq->mq_hctx->flags & BLK_MQ_F_HOST_TAGS)) {
 		__blk_complete_request(rq);
 		return;
 	}
@@ -1977,7 +1977,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		/* bypass scheduler for flush rq */
 		blk_insert_flush(rq);
 		blk_mq_run_hw_queue(data.hctx, true);
-	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs)) {
+	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
+				(data.hctx->flags & BLK_MQ_F_HOST_TAGS))) {
 		/*
 		 * Use plugging if we have a ->commit_rqs() hook as well, as
 		 * we know the driver uses bd->last in a smart fashion.

thanks,
Ming

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* RE: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-02  7:48       ` Ming Lei
@ 2019-06-02 16:34         ` Kashyap Desai
  2019-06-03  3:56           ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: Kashyap Desai @ 2019-06-02 16:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

> Meantime, please try the following patch and see if it makes a difference:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 49d73d979cb3..d2abec3b0f60 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -589,7 +589,7 @@ static void __blk_mq_complete_request(struct request *rq)
>  	 * So complete IO reqeust in softirq context in case of single queue
>  	 * for not degrading IO performance by irqsoff latency.
>  	 */
> -	if (q->nr_hw_queues == 1) {
> +	if (q->nr_hw_queues == 1 || (rq->mq_hctx->flags & BLK_MQ_F_HOST_TAGS)) {
>  		__blk_complete_request(rq);
>  		return;
>  	}
> @@ -1977,7 +1977,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  		/* bypass scheduler for flush rq */
>  		blk_insert_flush(rq);
>  		blk_mq_run_hw_queue(data.hctx, true);
> -	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs)) {
> +	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
> +				(data.hctx->flags & BLK_MQ_F_HOST_TAGS))) {
>  		/*
>  		 * Use plugging if we have a ->commit_rqs() hook as well, as
>  		 * we know the driver uses bd->last in a smart fashion.

Ming -

I tried the above patch and there is no improvement in performance.

Below is the perf record data - the lock contention is while getting the tag
(blk_mq_get_tag).

6.67%     6.67%  fio              [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
   - 6.66% io_submit
      - 6.66% entry_SYSCALL_64
         - do_syscall_64
            - 6.66% __x64_sys_io_submit
               - 6.66% io_submit_one
                  - 6.66% aio_read
                     - 6.66% generic_file_read_iter
                        - 6.66% blkdev_direct_IO
                           - 6.65% submit_bio
                              - generic_make_request
                                 - 6.65% blk_mq_make_request
                                    - 6.65% blk_mq_get_request
                                       - 6.65% blk_mq_get_tag
                                          - 6.58% prepare_to_wait_exclusive
                                             - 6.57% _raw_spin_lock_irqsave
                                                  queued_spin_lock_slowpath

>
> thanks,
> Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/9] scsi_debug: support host tagset
  2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
  2019-05-31  6:09   ` Hannes Reinecke
  2019-06-02  2:03   ` Minwoo Im
@ 2019-06-02 17:01   ` Douglas Gilbert
  2 siblings, 0 replies; 48+ messages in thread
From: Douglas Gilbert @ 2019-06-02 17:01 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

On 2019-05-30 10:27 p.m., Ming Lei wrote:
> The 'host_tagset' can be set on scsi_debug device for testing
> shared hostwide tags on multiple blk-mq hw queue.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Hi,
Attached are my suggestions to clean up this patch a bit. It basically
   - drops the unneeded initialization (pointed out in another review)
   - places new module_param_named() in alphabetical order
   - adds MODULE_PARM_DESC() for 'modinfo scsi_debug' online help

Doug Gilbert

> ---
>   drivers/scsi/scsi_debug.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
> index d323523f5f9d..8cf3f6c3f4f9 100644
> --- a/drivers/scsi/scsi_debug.c
> +++ b/drivers/scsi/scsi_debug.c
> @@ -665,6 +665,7 @@ static bool have_dif_prot;
>   static bool write_since_sync;
>   static bool sdebug_statistics = DEF_STATISTICS;
>   static bool sdebug_wp;
> +static bool sdebug_host_tagset = false;
>   
>   static unsigned int sdebug_store_sectors;
>   static sector_t sdebug_capacity;	/* in sectors */
> @@ -4468,6 +4469,7 @@ module_param_named(vpd_use_hostno, sdebug_vpd_use_hostno, int,
>   module_param_named(wp, sdebug_wp, bool, S_IRUGO | S_IWUSR);
>   module_param_named(write_same_length, sdebug_write_same_length, int,
>   		   S_IRUGO | S_IWUSR);
> +module_param_named(host_tagset, sdebug_host_tagset, bool, S_IRUGO | S_IWUSR);
>   
>   MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
>   MODULE_DESCRIPTION("SCSI debug adapter driver");
> @@ -5779,6 +5781,7 @@ static int sdebug_driver_probe(struct device *dev)
>   	sdbg_host = to_sdebug_host(dev);
>   
>   	sdebug_driver_template.can_queue = sdebug_max_queue;
> +	sdebug_driver_template.host_tagset = sdebug_host_tagset;
>   	if (!sdebug_clustering)
>   		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
>   
> 


[-- Attachment #2: 0002-sg-convert-to-blk_mq-hw-queue-Ming-Lei.patch --]
[-- Type: text/x-patch, Size: 2266 bytes --]

From bb14859f821ade9e3a0ee5f187e66a419d310ec0 Mon Sep 17 00:00:00 2001
From: Douglas Gilbert <dgilbert@interlog.com>
Date: Sat, 1 Jun 2019 18:07:51 -0400
Subject: [PATCH 2/2] sg: convert to blk_mq hw queue; Ming Lei

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/scsi_debug.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index e27f4df24021..a880ac4d13f8 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -669,6 +669,7 @@ static bool sdebug_clustering;
 static bool sdebug_host_lock = DEF_HOST_LOCK;
 static bool sdebug_strict = DEF_STRICT;
 static bool sdebug_any_injecting_opt;
+static bool sdebug_host_tagset;
 static bool sdebug_verbose;
 static bool have_dif_prot;
 static bool write_since_sync;
@@ -4515,6 +4516,7 @@ module_param_named(every_nth, sdebug_every_nth, int, S_IRUGO | S_IWUSR);
 module_param_named(fake_rw, sdebug_fake_rw, int, S_IRUGO | S_IWUSR);
 module_param_named(guard, sdebug_guard, uint, S_IRUGO);
 module_param_named(host_lock, sdebug_host_lock, bool, S_IRUGO | S_IWUSR);
+module_param_named(host_tagset, sdebug_host_tagset, bool, 0644);
 module_param_string(inq_vendor, sdebug_inq_vendor_id,
 		    sizeof(sdebug_inq_vendor_id), S_IRUGO|S_IWUSR);
 module_param_string(inq_product, sdebug_inq_product_id,
@@ -4575,6 +4577,7 @@ MODULE_PARM_DESC(every_nth, "timeout every nth command(def=0)");
 MODULE_PARM_DESC(fake_rw, "fake reads/writes instead of copying (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
 MODULE_PARM_DESC(host_lock, "host_lock is ignored (def=0)");
+MODULE_PARM_DESC(host_tagset, "host_tagset for multiple hw queues (def=0)");
 MODULE_PARM_DESC(inq_vendor, "SCSI INQUIRY vendor string (def=\"Linux\")");
 MODULE_PARM_DESC(inq_product, "SCSI INQUIRY product string (def=\"scsi_debug\")");
 MODULE_PARM_DESC(inq_rev, "SCSI INQUIRY revision string (def=\""
@@ -5866,6 +5869,7 @@ static int sdebug_driver_probe(struct device *dev)
 	sdbg_host = to_sdebug_host(dev);
 
 	sdebug_driver_template.can_queue = sdebug_max_queue;
+	sdebug_driver_template.host_tagset = sdebug_host_tagset;
 	if (!sdebug_clustering)
 		sdebug_driver_template.dma_boundary = PAGE_SIZE - 1;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-02 16:34         ` Kashyap Desai
@ 2019-06-03  3:56           ` Ming Lei
  2019-06-03 10:00             ` Kashyap Desai
  2019-06-07  9:45             ` Kashyap Desai
  0 siblings, 2 replies; 48+ messages in thread
From: Ming Lei @ 2019-06-03  3:56 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

Hi Kashyap,

Thanks for collecting the log.

On Sun, Jun 02, 2019 at 10:04:01PM +0530, Kashyap Desai wrote:
> > Meantime, please try the following patch and see if it makes a difference:
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 49d73d979cb3..d2abec3b0f60 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -589,7 +589,7 @@ static void __blk_mq_complete_request(struct request *rq)
> >  	 * So complete IO reqeust in softirq context in case of single queue
> >  	 * for not degrading IO performance by irqsoff latency.
> >  	 */
> > -	if (q->nr_hw_queues == 1) {
> > +	if (q->nr_hw_queues == 1 || (rq->mq_hctx->flags & BLK_MQ_F_HOST_TAGS)) {
> >  		__blk_complete_request(rq);
> >  		return;
> >  	}
> > @@ -1977,7 +1977,8 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
> >  		/* bypass scheduler for flush rq */
> >  		blk_insert_flush(rq);
> >  		blk_mq_run_hw_queue(data.hctx, true);
> > -	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs)) {
> > +	} else if (plug && (q->nr_hw_queues == 1 || q->mq_ops->commit_rqs ||
> > +				(data.hctx->flags & BLK_MQ_F_HOST_TAGS))) {
> >  		/*
> >  		 * Use plugging if we have a ->commit_rqs() hook as well, as
> >  		 * we know the driver uses bd->last in a smart fashion.
> 
> Ming -
> 
> I tried the above patch and there is no improvement in performance.
> 
> Below is the perf record data - the lock contention is while getting the tag
> (blk_mq_get_tag).
> 
> 6.67%     6.67%  fio              [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
>    - 6.66% io_submit
>       - 6.66% entry_SYSCALL_64
>          - do_syscall_64
>             - 6.66% __x64_sys_io_submit
>                - 6.66% io_submit_one
>                   - 6.66% aio_read
>                      - 6.66% generic_file_read_iter
>                         - 6.66% blkdev_direct_IO
>                            - 6.65% submit_bio
>                               - generic_make_request
>                                  - 6.65% blk_mq_make_request
>                                     - 6.65% blk_mq_get_request
>                                        - 6.65% blk_mq_get_tag
>                                           - 6.58% prepare_to_wait_exclusive
>                                              - 6.57% _raw_spin_lock_irqsave
>                                                   queued_spin_lock_slowpath

Please drop the patch in my last email, and apply the following patch
and see if we can make a difference:

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3d6780504dcb..69d6bffcc8ff 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -627,6 +627,9 @@ static int hctx_active_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
 
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+
 	seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
 	return 0;
 }
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 309ec5079f3f..58ef83a34fda 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -30,6 +30,9 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
  */
 bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
 	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		atomic_inc(&hctx->tags->active_queues);
@@ -55,6 +58,9 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
 	struct blk_mq_tags *tags = hctx->tags;
 
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+
 	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return;
 
@@ -74,6 +80,10 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 
 	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
 		return true;
+
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..84e9b46ffc78 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -36,11 +36,22 @@ extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
 
+static inline struct blk_mq_hw_ctx *blk_mq_master_hctx(
+		struct blk_mq_hw_ctx *hctx)
+{
+	return hctx->queue->queue_hw_ctx[0];
+}
+
+
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
 {
 	if (!hctx)
 		return &bt->ws[0];
+
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+
 	return sbq_wait_ptr(bt, &hctx->wait_index);
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 49d73d979cb3..4196ed3b0085 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -303,7 +303,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	} else {
 		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
 			rq_flags = RQF_MQ_INFLIGHT;
-			atomic_inc(&data->hctx->nr_active);
+			blk_mq_inc_nr_active(data->hctx);
 		}
 		rq->tag = tag;
 		rq->internal_tag = -1;
@@ -517,7 +517,7 @@ void blk_mq_free_request(struct request *rq)
 
 	ctx->rq_completed[rq_is_sync(rq)]++;
 	if (rq->rq_flags & RQF_MQ_INFLIGHT)
-		atomic_dec(&hctx->nr_active);
+		blk_mq_dec_nr_active(hctx);
 
 	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
 		laptop_io_completion(q->backing_dev_info);
@@ -1064,7 +1064,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
 	if (rq->tag >= 0) {
 		if (shared) {
 			rq->rq_flags |= RQF_MQ_INFLIGHT;
-			atomic_inc(&data.hctx->nr_active);
+			blk_mq_inc_nr_active(data.hctx);
 		}
 		data.hctx->tags->rqs[rq->tag] = rq;
 	}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 633a5a77ee8b..f1279b8c2289 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -193,6 +193,20 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
 void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
 			 unsigned int inflight[2]);
 
+static inline void blk_mq_inc_nr_active(struct blk_mq_hw_ctx *hctx)
+{
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+	atomic_inc(&hctx->nr_active);
+}
+
+static inline void blk_mq_dec_nr_active(struct blk_mq_hw_ctx *hctx)
+{
+	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
+		hctx = blk_mq_master_hctx(hctx);
+	atomic_dec(&hctx->nr_active);
+}
+
 static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 {
 	struct request_queue *q = hctx->queue;
@@ -218,7 +232,7 @@ static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
 		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
+		blk_mq_dec_nr_active(hctx);
 	}
 }
 
Thanks,
Ming

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* RE: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-03  3:56           ` Ming Lei
@ 2019-06-03 10:00             ` Kashyap Desai
  2019-06-07  9:45             ` Kashyap Desai
  1 sibling, 0 replies; 48+ messages in thread
From: Kashyap Desai @ 2019-06-03 10:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

>
> Please drop the patch in my last email, and apply the following patch
> and see if we can make a difference:

Ming,

I dropped the earlier patch and applied the one below.  Now I am getting the
expected performance (3.0M IOPS).
The patch below fixes the performance issue.  See the perf report after
applying it -

     8.52%  [kernel]        [k] sbitmap_any_bit_set
     4.19%  [kernel]        [k] blk_mq_run_hw_queue
     3.76%  [megaraid_sas]  [k] complete_cmd_fusion
     3.24%  [kernel]        [k] scsi_queue_rq
     2.53%  [megaraid_sas]  [k] megasas_build_ldio_fusion
     2.34%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
     2.18%  [kernel]        [k] entry_SYSCALL_64
     1.85%  [kernel]        [k] syscall_return_via_sysret
     1.78%  [kernel]        [k] blk_mq_run_hw_queues
     1.59%  [kernel]        [k] gup_pmd_range
     1.49%  [kernel]        [k] _raw_spin_lock_irqsave
     1.24%  [kernel]        [k] scsi_dec_host_busy
     1.23%  [kernel]        [k] blk_mq_free_request
     1.23%  [kernel]        [k] blk_mq_get_request
     0.96%  [kernel]        [k] __slab_free
     0.91%  [kernel]        [k] aio_complete
     0.90%  [kernel]        [k] __sched_text_start
     0.89%  [megaraid_sas]  [k] megasas_queue_command
     0.85%  [kernel]        [k] __fget
     0.84%  [kernel]        [k] scsi_mq_get_budget

I will do some more testing and update the results.

Kashyap

>
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 3d6780504dcb..69d6bffcc8ff 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -627,6 +627,9 @@ static int hctx_active_show(void *data, struct seq_file *m)
>  {
>  	struct blk_mq_hw_ctx *hctx = data;
>
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +
>  	seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
>  	return 0;
>  }
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 309ec5079f3f..58ef83a34fda 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -30,6 +30,9 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
>   */
>  bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +
>  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
>  	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  		atomic_inc(&hctx->tags->active_queues);
> @@ -55,6 +58,9 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>  {
>  	struct blk_mq_tags *tags = hctx->tags;
>
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +
>  	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  		return;
>
> @@ -74,6 +80,10 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>
>  	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
>  		return true;
> +
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +
>  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  		return true;
>
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 61deab0b5a5a..84e9b46ffc78 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -36,11 +36,22 @@ extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
>  void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  		void *priv);
>
> +static inline struct blk_mq_hw_ctx *blk_mq_master_hctx(
> +		struct blk_mq_hw_ctx *hctx)
> +{
> +	return hctx->queue->queue_hw_ctx[0];
> +}
> +
> +
>  static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
>  						 struct blk_mq_hw_ctx *hctx)
>  {
>  	if (!hctx)
>  		return &bt->ws[0];
> +
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +
>  	return sbq_wait_ptr(bt, &hctx->wait_index);
>  }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 49d73d979cb3..4196ed3b0085 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -303,7 +303,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  	} else {
>  		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
>  			rq_flags = RQF_MQ_INFLIGHT;
> -			atomic_inc(&data->hctx->nr_active);
> +			blk_mq_inc_nr_active(data->hctx);
>  		}
>  		rq->tag = tag;
>  		rq->internal_tag = -1;
> @@ -517,7 +517,7 @@ void blk_mq_free_request(struct request *rq)
>
>  	ctx->rq_completed[rq_is_sync(rq)]++;
>  	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> -		atomic_dec(&hctx->nr_active);
> +		blk_mq_dec_nr_active(hctx);
>
>  	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
>  		laptop_io_completion(q->backing_dev_info);
> @@ -1064,7 +1064,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
>  	if (rq->tag >= 0) {
>  		if (shared) {
>  			rq->rq_flags |= RQF_MQ_INFLIGHT;
> -			atomic_inc(&data.hctx->nr_active);
> +			blk_mq_inc_nr_active(data.hctx);
>  		}
>  		data.hctx->tags->rqs[rq->tag] = rq;
>  	}
> diff --git a/block/blk-mq.h b/block/blk-mq.h
> index 633a5a77ee8b..f1279b8c2289 100644
> --- a/block/blk-mq.h
> +++ b/block/blk-mq.h
> @@ -193,6 +193,20 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part);
>  void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
>  			 unsigned int inflight[2]);
>
> +static inline void blk_mq_inc_nr_active(struct blk_mq_hw_ctx *hctx)
> +{
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +	atomic_inc(&hctx->nr_active);
> +}
> +
> +static inline void blk_mq_dec_nr_active(struct blk_mq_hw_ctx *hctx)
> +{
> +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> +		hctx = blk_mq_master_hctx(hctx);
> +	atomic_dec(&hctx->nr_active);
> +}
> +
>  static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx
*hctx)
> {
>  	struct request_queue *q = hctx->queue; @@ -218,7 +232,7 @@ static
> inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
>
>  	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
>  		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
> -		atomic_dec(&hctx->nr_active);
> +		blk_mq_dec_nr_active(hctx);
>  	}
>  }
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-05-31 11:38       ` John Garry
@ 2019-06-03 11:00         ` Ming Lei
  2019-06-03 13:00           ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-06-03 11:00 UTC (permalink / raw)
  To: John Garry
  Cc: Ming Lei, Hannes Reinecke, Jens Axboe, linux-block,
	Linux SCSI List, Martin K . Petersen, James Bottomley,
	Bart Van Assche, Hannes Reinecke, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 12:38:10PM +0100, John Garry wrote:
> 
> > > > -fallback:
> > > > -     for_each_possible_cpu(cpu)
> > > > -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
> > > > -     /* Don't clean all CQ masks */
> > > > -}
> > > > -
> > > >  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > >  {
> > > >       struct device *dev = hisi_hba->dev;
> > > > @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > > 
> > > >               min_msi = MIN_AFFINE_VECTORS_V3_HW;
> > > > 
> > > > -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
> > > > -                                                sizeof(unsigned int),
> > > > -                                                GFP_KERNEL);
> > > > -             if (!hisi_hba->reply_map)
> > > > -                     return -ENOMEM;
> > > >               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
> > > >                                                        min_msi, max_msi,
> > > >                                                        PCI_IRQ_MSI |
> > > > @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > >                                                        &desc);
> > > >               if (vectors < 0)
> > > >                       return -ENOENT;
> > > > -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
> > > >       } else {
> > > >               min_msi = max_msi;
> > > >               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
> > > > @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
> > > >       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
> > > >  }
> > > > 
> > > > +static int hisi_sas_map_queues(struct Scsi_Host *shost)
> > > > +{
> > > > +     struct hisi_hba *hisi_hba = shost_priv(shost);
> > > > +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> > > > +
> > > > +     if (auto_affine_msi_experimental)
> > > > +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
> > > > +                             BASE_VECTORS_V3_HW);
> > > > +     else
> > > > +             return blk_mq_map_queues(qmap);
> 
> I don't think that the mapping which blk_mq_map_queues() creates is
> what we want. I'm guessing that we still would like a mapping similar to
> what blk_mq_pci_map_queues() produces, which is an even spread, putting
> adjacent CPUs on the same queue.
> 
> For my system with 96 cpus and 16 queues, blk_mq_map_queues() would map
> queue 0 to cpu 0, 16, 32, 48 ..., queue 1 to cpu 1, 17, 33 and so on.

blk_mq_map_queues() is the default or fallback mapping for the case where
managed IRQs aren't used. If the mapping isn't good enough, we can still
improve it in the future, and any driver applying it will then get the
improvement.
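
To make the behaviour John describes concrete, here is a purely
illustrative sketch (the helper name is made up and this is not the
actual blk_mq_map_queues() implementation):

/* With 96 CPUs and 16 queues this reproduces the spread John reports:
 * queue 0 serves cpu 0, 16, 32, 48, ...; queue 1 serves cpu 1, 17, 33, ...
 * A managed-affinity mapping like blk_mq_pci_map_queues() would instead
 * put adjacent CPUs (cpu 0-5, cpu 6-11, ...) on the same queue.
 */
static void sketch_default_spread(unsigned int *mq_map,
				  unsigned int nr_cpus,
				  unsigned int nr_queues)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		mq_map[cpu] = cpu % nr_queues;
}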

> 
> > > > +}
> > > > +
> > > >  static struct scsi_host_template sht_v3_hw = {
> > > >       .name                   = DRV_NAME,
> > > >       .module                 = THIS_MODULE,
> > > 
> > > As mentioned, we should be using a common function here.
> > > 
> > > > @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
> > > >       .scan_start             = hisi_sas_scan_start,
> > > >       .change_queue_depth     = sas_change_queue_depth,
> > > >       .bios_param             = sas_bios_param,
> > > > +     .map_queues             = hisi_sas_map_queues,
> > > > +     .host_tagset            = 1,
> > > >       .this_id                = -1,
> > > >       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
> > > >       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
> > > > @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> > > >       if (hisi_sas_debugfs_enable)
> > > >               hisi_sas_debugfs_init(hisi_hba);
> > > > 
> > > > +     shost->nr_hw_queues = hisi_hba->cq_nvecs;
> 
> There's an ordering issue here, which can be fixed without too much trouble.
> 
> Value hisi_hba->cq_nvecs is not set until after this point, in
> hisi_sas_v3_probe()->hw->hw_init->hisi_sas_v3_init()->interrupt_init_v3_hw()
> 
> 
> Please see revised patch, below.

Good catch, will integrate it in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-06-03 11:00         ` Ming Lei
@ 2019-06-03 13:00           ` John Garry
  2019-06-04 13:37             ` Ming Lei
  0 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2019-06-03 13:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Hannes Reinecke, Jens Axboe, linux-block,
	Linux SCSI List, Martin K . Petersen, James Bottomley,
	Bart Van Assche, Hannes Reinecke, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On 03/06/2019 12:00, Ming Lei wrote:
> On Fri, May 31, 2019 at 12:38:10PM +0100, John Garry wrote:
>>
>>>>> -fallback:
>>>>> -     for_each_possible_cpu(cpu)
>>>>> -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
>>>>> -     /* Don't clean all CQ masks */
>>>>> -}
>>>>> -
>>>>>  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>>>  {
>>>>>       struct device *dev = hisi_hba->dev;
>>>>> @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>>>
>>>>>               min_msi = MIN_AFFINE_VECTORS_V3_HW;
>>>>>
>>>>> -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
>>>>> -                                                sizeof(unsigned int),
>>>>> -                                                GFP_KERNEL);
>>>>> -             if (!hisi_hba->reply_map)
>>>>> -                     return -ENOMEM;
>>>>>               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
>>>>>                                                        min_msi, max_msi,
>>>>>                                                        PCI_IRQ_MSI |
>>>>> @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
>>>>>                                                        &desc);
>>>>>               if (vectors < 0)
>>>>>                       return -ENOENT;
>>>>> -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
>>>>>       } else {
>>>>>               min_msi = max_msi;
>>>>>               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
>>>>> @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
>>>>>       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
>>>>>  }
>>>>>
>>>>> +static int hisi_sas_map_queues(struct Scsi_Host *shost)
>>>>> +{
>>>>> +     struct hisi_hba *hisi_hba = shost_priv(shost);
>>>>> +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
>>>>> +
>>>>> +     if (auto_affine_msi_experimental)
>>>>> +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
>>>>> +                             BASE_VECTORS_V3_HW);
>>>>> +     else
>>>>> +             return blk_mq_map_queues(qmap);
>>
>> I don't think that the mapping which blk_mq_map_queues() creates is
>> what we want. I'm guessing that we still would like a mapping similar to
>> what blk_mq_pci_map_queues() produces, which is an even spread, putting
>> adjacent CPUs on the same queue.
>>
>> For my system with 96 cpus and 16 queues, blk_mq_map_queues() would map
>> queue 0 to cpu 0, 16, 32, 48 ..., queue 1 to cpu 1, 17, 33 and so on.
>

Hi Ming,

> blk_mq_map_queues() is the default or fallback mapping in case that managed
> irq isn't used. If the mapping isn't good enough, we still can improve it
> in future, then any driver applying it can got improved.
>

That's the right attitude. However, as I see it, we can only know the
mapping when we know the interrupt affinity or some other mapping
restriction or rule, which we don't know in this case.

For now, I would personally rather only expose multiple queues when
auto_affine_msi_experimental is set. I fear that we may introduce a
performance regression for !auto_affine_msi_experimental with this
patch. We would need to test.

Hopefully we can drop !auto_affine_msi_experimental support once the CPU
hotplug issue is resolved.

>>
>>>>> +}
>>>>> +
>>>>>  static struct scsi_host_template sht_v3_hw = {
>>>>>       .name                   = DRV_NAME,
>>>>>       .module                 = THIS_MODULE,
>>>>
>>>> As mentioned, we should be using a common function here.
>>>>
>>>>> @@ -2906,6 +2888,8 @@ static struct scsi_host_template sht_v3_hw = {
>>>>>       .scan_start             = hisi_sas_scan_start,
>>>>>       .change_queue_depth     = sas_change_queue_depth,
>>>>>       .bios_param             = sas_bios_param,
>>>>> +     .map_queues             = hisi_sas_map_queues,
>>>>> +     .host_tagset            = 1,
>>>>>       .this_id                = -1,
>>>>>       .sg_tablesize           = HISI_SAS_SGE_PAGE_CNT,
>>>>>       .sg_prot_tablesize      = HISI_SAS_SGE_PAGE_CNT,
>>>>> @@ -3092,6 +3076,8 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>>>>       if (hisi_sas_debugfs_enable)
>>>>>               hisi_sas_debugfs_init(hisi_hba);
>>>>>
>>>>> +     shost->nr_hw_queues = hisi_hba->cq_nvecs;
>>
>> There's an ordering issue here, which can be fixed without too much trouble.
>>
>> Value hisi_hba->cq_nvecs is not set until after this point, in
>> hisi_sas_v3_probe()->hw->hw_init->hisi_sas_v3_init()->interrupt_init_v3_hw()
>>
>>
>> Please see revised patch, below.
>
> Good catch, will integrate it in V2.
>

Thanks!

> Thanks,
> Ming
>
> .
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue
  2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
                   ` (8 preceding siblings ...)
  2019-05-31  2:28 ` [PATCH 9/9] scsi: mp3sas: " Ming Lei
@ 2019-06-04  8:49 ` John Garry
  2019-08-13  8:30   ` John Garry
  9 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2019-06-04  8:49 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig, chenxiang

On 31/05/2019 03:27, Ming Lei wrote:
> Hi,
>
> The 1st patch introduces support hostwide tags for multiple hw queues
> via the simplest approach to share single 'struct blk_mq_tags' instance
> among all hw queues. In theory, this way won't cause any performance drop.
> Even small IOPS improvement can be observed on random IO on
> null_blk/scsi_debug.
>
> By applying the hostwide tags for MQ, we can convert some SCSI driver's
> private reply queue into generic blk-mq hw queue, then at least two
> improvement can be obtained:
>
> 1) the private reply queue maping can be removed from drivers, since the
> mapping has been implemented as generic API in blk-mq core
>
> 2) it helps to solve the generic managed IRQ race[1] during CPU hotplug
> in generic way, otherwise we have to re-invent new way to address the
> same issue for these drivers using private reply queue.
>
>
> [1] https://lore.kernel.org/linux-block/20190527150207.11372-1-ming.lei@redhat.com/T/#m6d95e2218bdd712ffda8f6451a0bb73eb2a651af
>
> Any comment and test feedback are appreciated.
>
> Thanks,
> Ming
>

Hi Ming,

I think that we'll wait for a v2 to test this, since you gave an updated
patch for the megaraid performance testing.

Thanks,
John

> Hannes Reinecke (1):
>   scsi: Add template flag 'host_tagset'
>
> Ming Lei (8):
>   blk-mq: allow hw queues to share hostwide tags
>   block: null_blk: introduce module parameter of 'g_host_tags'
>   scsi_debug: support host tagset
>   scsi: introduce scsi_cmnd_hctx_index()
>   scsi: hpsa: convert private reply queue to blk-mq hw queue
>   scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
>   scsi: megaraid: convert private reply queue to blk-mq hw queue
>   scsi: mp3sas: convert private reply queue to blk-mq hw queue
>
>  block/blk-mq-debugfs.c                      |  1 +
>  block/blk-mq-sched.c                        |  8 +++
>  block/blk-mq-tag.c                          |  6 ++
>  block/blk-mq.c                              | 14 ++++
>  block/elevator.c                            |  5 +-
>  drivers/block/null_blk_main.c               |  6 ++
>  drivers/scsi/hisi_sas/hisi_sas.h            |  2 +-
>  drivers/scsi/hisi_sas/hisi_sas_main.c       | 36 +++++-----
>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      | 46 +++++--------
>  drivers/scsi/hpsa.c                         | 49 ++++++--------
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 50 +++++---------
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +-
>  drivers/scsi/mpt3sas/mpt3sas_base.c         | 74 ++++-----------------
>  drivers/scsi/mpt3sas/mpt3sas_base.h         |  3 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c        | 17 +++++
>  drivers/scsi/scsi_debug.c                   |  3 +
>  drivers/scsi/scsi_lib.c                     |  2 +
>  include/linux/blk-mq.h                      |  1 +
>  include/scsi/scsi_cmnd.h                    | 15 +++++
>  include/scsi/scsi_host.h                    |  3 +
>  20 files changed, 168 insertions(+), 177 deletions(-)
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/9] scsi: hisi_sas_v3: convert private reply queue to blk-mq hw queue
  2019-06-03 13:00           ` John Garry
@ 2019-06-04 13:37             ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2019-06-04 13:37 UTC (permalink / raw)
  To: John Garry
  Cc: Ming Lei, Hannes Reinecke, Jens Axboe, linux-block,
	Linux SCSI List, Martin K . Petersen, James Bottomley,
	Bart Van Assche, Hannes Reinecke, Don Brace, Kashyap Desai,
	Sathya Prakash, Christoph Hellwig

On Mon, Jun 03, 2019 at 02:00:19PM +0100, John Garry wrote:
> On 03/06/2019 12:00, Ming Lei wrote:
> > On Fri, May 31, 2019 at 12:38:10PM +0100, John Garry wrote:
> > > 
> > > > > > -fallback:
> > > > > > -     for_each_possible_cpu(cpu)
> > > > > > -             hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
> > > > > > -     /* Don't clean all CQ masks */
> > > > > > -}
> > > > > > -
> > > > > >  static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > > > >  {
> > > > > >       struct device *dev = hisi_hba->dev;
> > > > > > @@ -2383,11 +2359,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > > > > 
> > > > > >               min_msi = MIN_AFFINE_VECTORS_V3_HW;
> > > > > > 
> > > > > > -             hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
> > > > > > -                                                sizeof(unsigned int),
> > > > > > -                                                GFP_KERNEL);
> > > > > > -             if (!hisi_hba->reply_map)
> > > > > > -                     return -ENOMEM;
> > > > > >               vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
> > > > > >                                                        min_msi, max_msi,
> > > > > >                                                        PCI_IRQ_MSI |
> > > > > > @@ -2395,7 +2366,6 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
> > > > > >                                                        &desc);
> > > > > >               if (vectors < 0)
> > > > > >                       return -ENOENT;
> > > > > > -             setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
> > > > > >       } else {
> > > > > >               min_msi = max_msi;
> > > > > >               vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
> > > > > > @@ -2896,6 +2866,18 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
> > > > > >       clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
> > > > > >  }
> > > > > > 
> > > > > > +static int hisi_sas_map_queues(struct Scsi_Host *shost)
> > > > > > +{
> > > > > > +     struct hisi_hba *hisi_hba = shost_priv(shost);
> > > > > > +     struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> > > > > > +
> > > > > > +     if (auto_affine_msi_experimental)
> > > > > > +             return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
> > > > > > +                             BASE_VECTORS_V3_HW);
> > > > > > +     else
> > > > > > +             return blk_mq_map_queues(qmap);
> > > 
> > > I don't think that the mapping which blk_mq_map_queues() creates is
> > > what we want. I'm guessing that we still would like a mapping similar to
> > > what blk_mq_pci_map_queues() produces, which is an even spread, putting
> > > adjacent CPUs on the same queue.
> > > 
> > > For my system with 96 cpus and 16 queues, blk_mq_map_queues() would map
> > > queue 0 to cpu 0, 16, 32, 48 ..., queue 1 to cpu 1, 17, 33 and so on.
> > 
> 
> Hi Ming,
> 
> > blk_mq_map_queues() is the default or fallback mapping in case that managed
> > irq isn't used. If the mapping isn't good enough, we still can improve it
> > in future, then any driver applying it can got improved.
> > 
> 
> That's the right attitude. However, as I see, we can only know the mapping
> when we know the interrupt affinity or some other mapping restriction or
> rule etc, which we don't know in this case.
> 
> For now, personally I would rather if we only expose multiple queues for
> when auto_affine_msi_experimental is set. I fear that we may make a
> performance regression for !auto_affine_msi_experimental with this patch. We
> would need to test.

I suggest using the blk-mq generic helper.

The default queue mapping of blk_mq_map_queues() has been used for a
while and has worked well so far; a very similar approach is already
applied in megaraid_sas and mpt3sas, see _base_assign_reply_queues() and
megasas_setup_reply_map().

If a performance drop does show up, just report it and we can fix it.
Or you can even write a new .map_queues method just for hisi_sas v3.
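
If it comes to that, a minimal sketch of such a driver-private
.map_queues, grouping adjacent CPUs onto the same hw queue, could look
like the following (illustration only; the helper name and the grouping
policy are made up, this is not existing hisi_sas code):

static int hisi_sas_map_queues_grouped(struct Scsi_Host *shost)
{
	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
	unsigned int nr_queues = qmap->nr_queues;
	unsigned int nr_cpus = num_possible_cpus();
	unsigned int group = nr_cpus / nr_queues ? : 1;
	unsigned int cpu, queue;

	/* Put adjacent CPUs on the same hw queue, similar in spirit to the
	 * old private reply_map code in megaraid_sas/mpt3sas.
	 */
	for_each_possible_cpu(cpu) {
		queue = min(cpu / group, nr_queues - 1);
		qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;
}

Then sht_v3_hw.map_queues could point at it for the
!auto_affine_msi_experimental case.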


Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
  2019-05-31  6:07   ` Hannes Reinecke
  2019-05-31 15:37   ` Bart Van Assche
@ 2019-06-05 14:10   ` John Garry
  2019-06-24  8:46     ` Ming Lei
  2 siblings, 1 reply; 48+ messages in thread
From: John Garry @ 2019-06-05 14:10 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 31/05/2019 03:27, Ming Lei wrote:
> index 32b8ad3d341b..49d73d979cb3 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
>  {
>  	int ret = 0;
>

Hi Ming,

> +	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx) {
> +		set->tags[hctx_idx] = set->tags[0];

Here we set the tags of every hctx to be the same as those of hctx index 0.

> +		return true;


As such, I think that the error handling in __blk_mq_alloc_rq_maps() is 
made a little fragile:

__blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
	int i;

	for (i = 0; i < set->nr_hw_queues; i++)
		if (!__blk_mq_alloc_rq_map(set, i))
			goto out_unwind;

	return 0;

out_unwind:
	while (--i >= 0)
		blk_mq_free_rq_map(set->tags[i]);

	return -ENOMEM;
}

If __blk_mq_alloc_rq_map() fails for some i > 1 when the BLK_MQ_F_HOST_TAGS
flag is set (even though today it can't), then we would try to free
set->tags[0] multiple times.

> +	}
> +
>  	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
>  					set->queue_depth, set->reserved_tags);

Thanks,
John

>  	if (!set->tags[hctx_idx])
> @@ -2451,6 +2456,9 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
>  static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
>  					



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 9/9] scsi: mp3sas: convert private reply queue to blk-mq hw queue
  2019-05-31  2:28 ` [PATCH 9/9] scsi: mp3sas: " Ming Lei
  2019-05-31  6:23   ` Hannes Reinecke
@ 2019-06-06 11:58   ` Sreekanth Reddy
  1 sibling, 0 replies; 48+ messages in thread
From: Sreekanth Reddy @ 2019-06-06 11:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig

Ming - We have one outstanding series posted for the mpt3sas driver. Your next
revision may need a rebase considering the update below -

https://marc.info/?l=linux-scsi&m=155930490520681&w=2

Thanks & Regards,
Sreekanth

On Fri, May 31, 2019 at 7:59 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> SCSI's reply queue is very similar to blk-mq's hw queue, both
> assigned by IRQ vector, so map the private reply queue into blk-mq's hw
> queue via .host_tagset.
>
> Then the private reply mapping can be removed.
>
> Another benefit is that the request/IRQ lost issue may be solved in a
> generic way, because a managed IRQ may be shut down during CPU
> hotplug.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c  | 74 +++++-----------------------
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 +-
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 17 +++++++
>  3 files changed, 31 insertions(+), 63 deletions(-)
>
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index 8aacbd1e7db2..2b207d2925b4 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -2855,8 +2855,7 @@ _base_request_irq(struct MPT3SAS_ADAPTER *ioc, u8 index)
>  static void
>  _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
>  {
> -       unsigned int cpu, nr_cpus, nr_msix, index = 0;
> -       struct adapter_reply_queue *reply_q;
> +       unsigned int nr_cpus, nr_msix;
>
>         if (!_base_is_controller_msix_enabled(ioc))
>                 return;
> @@ -2866,50 +2865,9 @@ _base_assign_reply_queues(struct MPT3SAS_ADAPTER *ioc)
>                 return;
>         }
>
> -       memset(ioc->cpu_msix_table, 0, ioc->cpu_msix_table_sz);
> -
>         nr_cpus = num_online_cpus();
>         nr_msix = ioc->reply_queue_count = min(ioc->reply_queue_count,
>                                                ioc->facts.MaxMSIxVectors);
> -       if (!nr_msix)
> -               return;
> -
> -       if (smp_affinity_enable) {
> -               list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
> -                       const cpumask_t *mask = pci_irq_get_affinity(ioc->pdev,
> -                                                       reply_q->msix_index);
> -                       if (!mask) {
> -                               ioc_warn(ioc, "no affinity for msi %x\n",
> -                                        reply_q->msix_index);
> -                               continue;
> -                       }
> -
> -                       for_each_cpu_and(cpu, mask, cpu_online_mask) {
> -                               if (cpu >= ioc->cpu_msix_table_sz)
> -                                       break;
> -                               ioc->cpu_msix_table[cpu] = reply_q->msix_index;
> -                       }
> -               }
> -               return;
> -       }
> -       cpu = cpumask_first(cpu_online_mask);
> -
> -       list_for_each_entry(reply_q, &ioc->reply_queue_list, list) {
> -
> -               unsigned int i, group = nr_cpus / nr_msix;
> -
> -               if (cpu >= nr_cpus)
> -                       break;
> -
> -               if (index < nr_cpus % nr_msix)
> -                       group++;
> -
> -               for (i = 0 ; i < group ; i++) {
> -                       ioc->cpu_msix_table[cpu] = reply_q->msix_index;
> -                       cpu = cpumask_next(cpu, cpu_online_mask);
> -               }
> -               index++;
> -       }
>  }
>
>  /**
> @@ -2924,6 +2882,7 @@ _base_disable_msix(struct MPT3SAS_ADAPTER *ioc)
>                 return;
>         pci_disable_msix(ioc->pdev);
>         ioc->msix_enable = 0;
> +       ioc->smp_affinity_enable = 0;
>  }
>
>  /**
> @@ -2980,6 +2939,9 @@ _base_enable_msix(struct MPT3SAS_ADAPTER *ioc)
>                 goto try_ioapic;
>         }
>
> +       if (irq_flags & PCI_IRQ_AFFINITY)
> +               ioc->smp_affinity_enable = 1;
> +
>         ioc->msix_enable = 1;
>         ioc->reply_queue_count = r;
>         for (i = 0; i < ioc->reply_queue_count; i++) {
> @@ -3266,7 +3228,7 @@ mpt3sas_base_get_reply_virt_addr(struct MPT3SAS_ADAPTER *ioc, u32 phys_addr)
>  }
>
>  static inline u8
> -_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
> +_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
>  {
>         /* Enables reply_queue load balancing */
>         if (ioc->msix_load_balance)
> @@ -3274,7 +3236,7 @@ _base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
>                     base_mod64(atomic64_add_return(1,
>                     &ioc->total_io_cnt), ioc->reply_queue_count) : 0;
>
> -       return ioc->cpu_msix_table[raw_smp_processor_id()];
> +       return scsi_cmnd_hctx_index(ioc->shost, scmd);
>  }
>
>  /**
> @@ -3325,7 +3287,7 @@ mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
>
>         smid = tag + 1;
>         request->cb_idx = cb_idx;
> -       request->msix_io = _base_get_msix_index(ioc);
> +       request->msix_io = _base_get_msix_index(ioc, scmd);
>         request->smid = smid;
>         INIT_LIST_HEAD(&request->chain_list);
>         return smid;
> @@ -3498,7 +3460,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
>         _base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
>                                         ioc->request_sz);
>         descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
> -       descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
> +       descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>         descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>         descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>         descriptor.SCSIIO.LMID = 0;
> @@ -3520,7 +3482,7 @@ _base_put_smid_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
>
>
>         descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
> -       descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
> +       descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>         descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>         descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>         descriptor.SCSIIO.LMID = 0;
> @@ -3543,7 +3505,7 @@ mpt3sas_base_put_smid_fast_path(struct MPT3SAS_ADAPTER *ioc, u16 smid,
>
>         descriptor.SCSIIO.RequestFlags =
>             MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO;
> -       descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
> +       descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
>         descriptor.SCSIIO.SMID = cpu_to_le16(smid);
>         descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
>         descriptor.SCSIIO.LMID = 0;
> @@ -3607,7 +3569,7 @@ mpt3sas_base_put_smid_nvme_encap(struct MPT3SAS_ADAPTER *ioc, u16 smid)
>
>         descriptor.Default.RequestFlags =
>                 MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED;
> -       descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
> +       descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>         descriptor.Default.SMID = cpu_to_le16(smid);
>         descriptor.Default.LMID = 0;
>         descriptor.Default.DescriptorTypeDependent = 0;
> @@ -3639,7 +3601,7 @@ mpt3sas_base_put_smid_default(struct MPT3SAS_ADAPTER *ioc, u16 smid)
>         }
>         request = (u64 *)&descriptor;
>         descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
> -       descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
> +       descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
>         descriptor.Default.SMID = cpu_to_le16(smid);
>         descriptor.Default.LMID = 0;
>         descriptor.Default.DescriptorTypeDependent = 0;
> @@ -6524,19 +6486,11 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
>
>         dinitprintk(ioc, ioc_info(ioc, "%s\n", __func__));
>
> -       /* setup cpu_msix_table */
>         ioc->cpu_count = num_online_cpus();
>         for_each_online_cpu(cpu_id)
>                 last_cpu_id = cpu_id;
>         ioc->cpu_msix_table_sz = last_cpu_id + 1;
> -       ioc->cpu_msix_table = kzalloc(ioc->cpu_msix_table_sz, GFP_KERNEL);
>         ioc->reply_queue_count = 1;
> -       if (!ioc->cpu_msix_table) {
> -               dfailprintk(ioc,
> -                           ioc_info(ioc, "allocation for cpu_msix_table failed!!!\n"));
> -               r = -ENOMEM;
> -               goto out_free_resources;
> -       }
>
>         if (ioc->is_warpdrive) {
>                 ioc->reply_post_host_index = kcalloc(ioc->cpu_msix_table_sz,
> @@ -6748,7 +6702,6 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPTER *ioc)
>         mpt3sas_base_free_resources(ioc);
>         _base_release_memory_pools(ioc);
>         pci_set_drvdata(ioc->pdev, NULL);
> -       kfree(ioc->cpu_msix_table);
>         if (ioc->is_warpdrive)
>                 kfree(ioc->reply_post_host_index);
>         kfree(ioc->pd_handles);
> @@ -6789,7 +6742,6 @@ mpt3sas_base_detach(struct MPT3SAS_ADAPTER *ioc)
>         _base_release_memory_pools(ioc);
>         mpt3sas_free_enclosure_list(ioc);
>         pci_set_drvdata(ioc->pdev, NULL);
> -       kfree(ioc->cpu_msix_table);
>         if (ioc->is_warpdrive)
>                 kfree(ioc->reply_post_host_index);
>         kfree(ioc->pd_handles);
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h
> index 480219f0efc5..4d441e031025 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.h
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
> @@ -1022,7 +1022,6 @@ typedef void (*MPT3SAS_FLUSH_RUNNING_CMDS)(struct MPT3SAS_ADAPTER *ioc);
>   * @start_scan_failed: means port enable failed, return's the ioc_status
>   * @msix_enable: flag indicating msix is enabled
>   * @msix_vector_count: number msix vectors
> - * @cpu_msix_table: table for mapping cpus to msix index
>   * @cpu_msix_table_sz: table size
>   * @total_io_cnt: Gives total IO count, used to load balance the interrupts
>   * @msix_load_balance: Enables load balancing of interrupts across
> @@ -1183,6 +1182,7 @@ struct MPT3SAS_ADAPTER {
>         u16             broadcast_aen_pending;
>         u8              shost_recovery;
>         u8              got_task_abort_from_ioctl;
> +       u8              smp_affinity_enable;
>
>         struct mutex    reset_in_progress_mutex;
>         spinlock_t      ioc_reset_in_progress_lock;
> @@ -1199,7 +1199,6 @@ struct MPT3SAS_ADAPTER {
>
>         u8              msix_enable;
>         u16             msix_vector_count;
> -       u8              *cpu_msix_table;
>         u16             cpu_msix_table_sz;
>         resource_size_t __iomem **reply_post_host_index;
>         u32             ioc_reset_count;
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> index 1ccfbc7eebe0..59c1f9e694a0 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
> @@ -55,6 +55,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/aer.h>
>  #include <linux/raid_class.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>
>  #include "mpt3sas_base.h"
> @@ -10161,6 +10162,17 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
>         return 1;
>  }
>
> +static int mpt3sas_map_queues(struct Scsi_Host *shost)
> +{
> +       struct MPT3SAS_ADAPTER *ioc = shost_priv(shost);
> +       struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +       if (ioc->smp_affinity_enable)
> +               return blk_mq_pci_map_queues(qmap, ioc->pdev, 0);
> +       else
> +               return blk_mq_map_queues(qmap);
> +}
> +
>  /* shost template for SAS 2.0 HBA devices */
>  static struct scsi_host_template mpt2sas_driver_template = {
>         .module                         = THIS_MODULE,
> @@ -10189,6 +10201,8 @@ static struct scsi_host_template mpt2sas_driver_template = {
>         .sdev_attrs                     = mpt3sas_dev_attrs,
>         .track_queue_depth              = 1,
>         .cmd_size                       = sizeof(struct scsiio_tracker),
> +       .host_tagset                    = 1,
> +       .map_queues                     = mpt3sas_map_queues,
>  };
>
>  /* raid transport support for SAS 2.0 HBA devices */
> @@ -10227,6 +10241,8 @@ static struct scsi_host_template mpt3sas_driver_template = {
>         .sdev_attrs                     = mpt3sas_dev_attrs,
>         .track_queue_depth              = 1,
>         .cmd_size                       = sizeof(struct scsiio_tracker),
> +       .host_tagset                    = 1,
> +       .map_queues                     = mpt3sas_map_queues,
>  };
>
>  /* raid transport support for SAS 3.0 HBA devices */
> @@ -10538,6 +10554,7 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>         } else
>                 ioc->hide_drives = 0;
>
> +       shost->nr_hw_queues = ioc->reply_queue_count;
>         rv = scsi_add_host(shost, &pdev->dev);
>         if (rv) {
>                 ioc_err(ioc, "failure at %s:%d/%s()!\n",
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: [PATCH 8/9] scsi: megaraid: convert private reply queue to blk-mq hw queue
  2019-06-03  3:56           ` Ming Lei
  2019-06-03 10:00             ` Kashyap Desai
@ 2019-06-07  9:45             ` Kashyap Desai
  1 sibling, 0 replies; 48+ messages in thread
From: Kashyap Desai @ 2019-06-07  9:45 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, John Garry,
	Don Brace, Sathya Prakash Veerichetty, Christoph Hellwig

>
> >
> > Please drop the patch in my last email, and apply the following patch
> > and see if we can make a difference:
>
> Ming,
>
> I dropped early patch and applied the below patched.  Now, I am getting
> expected performance (3.0M IOPS).
> Below patch fix the performance issue.  See perf report after applying the
> same -
>
>      8.52%  [kernel]        [k] sbitmap_any_bit_set
>      4.19%  [kernel]        [k] blk_mq_run_hw_queue
>      3.76%  [megaraid_sas]  [k] complete_cmd_fusion
>      3.24%  [kernel]        [k] scsi_queue_rq
>      2.53%  [megaraid_sas]  [k] megasas_build_ldio_fusion
>      2.34%  [megaraid_sas]  [k] megasas_build_and_issue_cmd_fusion
>      2.18%  [kernel]        [k] entry_SYSCALL_64
>      1.85%  [kernel]        [k] syscall_return_via_sysret
>      1.78%  [kernel]        [k] blk_mq_run_hw_queues
>      1.59%  [kernel]        [k] gup_pmd_range
>      1.49%  [kernel]        [k] _raw_spin_lock_irqsave
>      1.24%  [kernel]        [k] scsi_dec_host_busy
>      1.23%  [kernel]        [k] blk_mq_free_request
>      1.23%  [kernel]        [k] blk_mq_get_request
>      0.96%  [kernel]        [k] __slab_free
>      0.91%  [kernel]        [k] aio_complete
>      0.90%  [kernel]        [k] __sched_text_start
>      0.89%  [megaraid_sas]  [k] megasas_queue_command
>      0.85%  [kernel]        [k] __fget
>      0.84%  [kernel]        [k] scsi_mq_get_budget
>
> I will do some more testing and update the results.

Ming, I did testing on a dual-socket AMD server (AMD EPYC 7601 32-Core
Processor). The system has a total of 128 logical cores.

Without the patch, performance can go up to 2.8M IOPS. See the perf top
output below.

   7.37%  [megaraid_sas]      [k] complete_cmd_fusion
   2.51%  [kernel]            [k] copy_user_generic_string
   2.48%  [kernel]            [k] read_tsc
   2.10%  fio                 [.] thread_main
   2.06%  [kernel]            [k] gup_pgd_range
   1.98%  [kernel]            [k] __get_user_4
   1.92%  [kernel]            [k] entry_SYSCALL_64
   1.58%  [kernel]            [k] scsi_queue_rq
   1.55%  [megaraid_sas]      [k] megasas_queue_command
   1.52%  [kernel]            [k] irq_entries_start
   1.43%  fio                 [.] get_io_u
   1.39%  [kernel]            [k] blkdev_direct_IO
   1.34%  [kernel]            [k] __audit_syscall_exit
   1.31%  [megaraid_sas]      [k] megasas_build_and_issue_cmd_fusion
   1.27%  [kernel]            [k] syscall_slow_exit_work
   1.23%  [kernel]            [k] io_submit_one
   1.20%  [kernel]            [k] do_syscall_64
   1.17%  fio                 [.] td_io_queue
   1.16%  [kernel]            [k] lookup_ioctx
   1.14%  [kernel]            [k] kmem_cache_alloc
   1.10%  [megaraid_sas]      [k] megasas_build_ldio_fusion
   1.07%  [kernel]            [k] __memset
   1.06%  [kernel]            [k] __virt_addr_valid
   0.98%  [kernel]            [k] blk_mq_get_request
   0.94%  [kernel]            [k] note_interrupt
   0.91%  [kernel]            [k] __get_user_8
   0.91%  [kernel]            [k] aio_read_events
   0.85%  [kernel]            [k] __put_user_4
   0.78%  fio                 [.] fio_libaio_commit
   0.74%  [megaraid_sas]      [k] MR_BuildRaidContext
   0.70%  [kernel]            [k] __x64_sys_io_submit
   0.69%  fio                 [.] utime_since_now


With your patch, performance can go up to 1.7M IOPS. See the perf top
output below.

 23.01%  [kernel]              [k] sbitmap_any_bit_set
   6.42%  [kernel]              [k] blk_mq_run_hw_queue
   4.44%  [megaraid_sas]        [k] complete_cmd_fusion
   4.23%  [kernel]              [k] blk_mq_run_hw_queues
   1.80%  [kernel]              [k] read_tsc
   1.60%  [kernel]              [k] copy_user_generic_string
   1.33%  fio                   [.] thread_main
   1.27%  [kernel]              [k] irq_entries_start
   1.22%  [kernel]              [k] gup_pgd_range
   1.20%  [kernel]              [k] __get_user_4
   1.20%  [kernel]              [k] entry_SYSCALL_64
   1.07%  [kernel]              [k] scsi_queue_rq
   0.88%  fio                   [.] get_io_u
   0.87%  [megaraid_sas]        [k] megasas_queue_command
   0.86%  [kernel]              [k] blkdev_direct_IO
   0.85%  fio                   [.] td_io_queue
   0.80%  [kernel]              [k] note_interrupt
   0.76%  [kernel]              [k] lookup_ioctx
   0.76%  [kernel]              [k] do_syscall_64
   0.75%  [megaraid_sas]        [k] megasas_build_and_issue_cmd_fusion
   0.74%  [megaraid_sas]        [k] megasas_build_ldio_fusion
   0.72%  [kernel]              [k] kmem_cache_alloc
   0.71%  [kernel]              [k] __audit_syscall_exit
   0.67%  [kernel]              [k] __virt_addr_valid
   0.65%  [kernel]              [k] blk_mq_get_request
   0.64%  [kernel]              [k] __memset
   0.62%  [kernel]              [k] syscall_slow_exit_work
   0.60%  [kernel]              [k] io_submit_one
   0.59%  [kernel]              [k] ktime_get
   0.58%  fio                   [.] fio_libaio_commit
   0.57%  [kernel]              [k] aio_read_events
   0.54%  [kernel]              [k] __get_user_8
   0.53%  [kernel]              [k] aio_complete_rw
   0.51%  [kernel]              [k] kmem_cache_free

With your patch plus the logical CPU count reduced to 64 (via CPU hotplug),
performance can go up to 2.2M IOPS. See the perf top output below.

   9.56%  [kernel]            [k] sbitmap_any_bit_set
   4.62%  [megaraid_sas]      [k] complete_cmd_fusion
   3.02%  [kernel]            [k] blk_mq_run_hw_queue
   2.15%  [kernel]            [k] copy_user_generic_string
   2.13%  [kernel]            [k] blk_mq_run_hw_queues
   2.09%  [kernel]            [k] read_tsc
   1.66%  [kernel]            [k] __get_user_4
   1.59%  [kernel]            [k] entry_SYSCALL_64
   1.57%  [kernel]            [k] gup_pgd_range
   1.55%  fio                 [.] thread_main
   1.51%  [kernel]            [k] scsi_queue_rq
   1.31%  [kernel]            [k] __memset
   1.21%  [megaraid_sas]      [k] megasas_build_and_issue_cmd_fusion
   1.16%  [megaraid_sas]      [k] megasas_queue_command
   1.13%  fio                 [.] get_io_u
   1.12%  [kernel]            [k] blk_mq_get_request
   1.07%  [kernel]            [k] blkdev_direct_IO
   1.06%  [kernel]            [k] __put_user_4
   1.05%  fio                 [.] td_io_queue
   1.02%  [kernel]            [k] syscall_slow_exit_work
   1.00%  [megaraid_sas]      [k] megasas_build_ldio_fusion


In summary, part of the performance drop may be correlated with the number
of hctx created in the block layer. I can provide more details and can test
a follow-up patch.

Kashyap


>
> Kashyap
>
> >
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index
> > 3d6780504dcb..69d6bffcc8ff 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -627,6 +627,9 @@ static int hctx_active_show(void *data, struct
> > seq_file
> > *m)  {
> >  	struct blk_mq_hw_ctx *hctx = data;
> >
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	seq_printf(m, "%d\n", atomic_read(&hctx->nr_active));
> >  	return 0;
> >  }
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index
> > 309ec5079f3f..58ef83a34fda 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -30,6 +30,9 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
> >   */
> >  bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)  {
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
> >  	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		atomic_inc(&hctx->tags->active_queues);
> > @@ -55,6 +58,9 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
> {
> >  	struct blk_mq_tags *tags = hctx->tags;
> >
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		return;
> >
> > @@ -74,6 +80,10 @@ static inline bool hctx_may_queue(struct
> > blk_mq_hw_ctx *hctx,
> >
> >  	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
> >  		return true;
> > +
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> >  		return true;
> >
> > diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h index
> > 61deab0b5a5a..84e9b46ffc78 100644
> > --- a/block/blk-mq-tag.h
> > +++ b/block/blk-mq-tag.h
> > @@ -36,11 +36,22 @@ extern void blk_mq_tag_wakeup_all(struct
> > blk_mq_tags *tags, bool);  void blk_mq_queue_tag_busy_iter(struct
> > request_queue *q, busy_iter_fn *fn,
> >  		void *priv);
> >
> > +static inline struct blk_mq_hw_ctx *blk_mq_master_hctx(
> > +		struct blk_mq_hw_ctx *hctx)
> > +{
> > +	return hctx->queue->queue_hw_ctx[0]; }
> > +
> > +
> >  static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue
*bt,
> >  						 struct blk_mq_hw_ctx
*hctx)
> >  {
> >  	if (!hctx)
> >  		return &bt->ws[0];
> > +
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +
> >  	return sbq_wait_ptr(bt, &hctx->wait_index);  }
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c index
> > 49d73d979cb3..4196ed3b0085 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -303,7 +303,7 @@ static struct request *blk_mq_rq_ctx_init(struct
> > blk_mq_alloc_data *data,
> >  	} else {
> >  		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
> >  			rq_flags = RQF_MQ_INFLIGHT;
> > -			atomic_inc(&data->hctx->nr_active);
> > +			blk_mq_inc_nr_active(data->hctx);
> >  		}
> >  		rq->tag = tag;
> >  		rq->internal_tag = -1;
> > @@ -517,7 +517,7 @@ void blk_mq_free_request(struct request *rq)
> >
> >  	ctx->rq_completed[rq_is_sync(rq)]++;
> >  	if (rq->rq_flags & RQF_MQ_INFLIGHT)
> > -		atomic_dec(&hctx->nr_active);
> > +		blk_mq_dec_nr_active(hctx);
> >
> >  	if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
> >  		laptop_io_completion(q->backing_dev_info);
> > @@ -1064,7 +1064,7 @@ bool blk_mq_get_driver_tag(struct request *rq)
> >  	if (rq->tag >= 0) {
> >  		if (shared) {
> >  			rq->rq_flags |= RQF_MQ_INFLIGHT;
> > -			atomic_inc(&data.hctx->nr_active);
> > +			blk_mq_inc_nr_active(data.hctx);
> >  		}
> >  		data.hctx->tags->rqs[rq->tag] = rq;
> >  	}
> > diff --git a/block/blk-mq.h b/block/blk-mq.h index
> > 633a5a77ee8b..f1279b8c2289 100644
> > --- a/block/blk-mq.h
> > +++ b/block/blk-mq.h
> > @@ -193,6 +193,20 @@ unsigned int blk_mq_in_flight(struct
> > request_queue *q, struct hd_struct *part);  void
> > blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part,
> >  			 unsigned int inflight[2]);
> >
> > +static inline void blk_mq_inc_nr_active(struct blk_mq_hw_ctx *hctx) {
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +	atomic_inc(&hctx->nr_active);
> > +}
> > +
> > +static inline void blk_mq_dec_nr_active(struct blk_mq_hw_ctx *hctx) {
> > +	if (hctx->flags & BLK_MQ_F_HOST_TAGS)
> > +		hctx = blk_mq_master_hctx(hctx);
> > +	atomic_dec(&hctx->nr_active);
> > +}
> > +
> >  static inline void blk_mq_put_dispatch_budget(struct blk_mq_hw_ctx
> > *hctx) {
> >  	struct request_queue *q = hctx->queue; @@ -218,7 +232,7 @@ static
> > inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
> >
> >  	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
> >  		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
> > -		atomic_dec(&hctx->nr_active);
> > +		blk_mq_dec_nr_active(hctx);
> >  	}
> >  }
> >
> > Thanks,
> > Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags'
  2019-05-31 15:39   ` Bart Van Assche
@ 2019-06-24  8:43     ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2019-06-24  8:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Hannes Reinecke, John Garry, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 08:39:04AM -0700, Bart Van Assche wrote:
> On 5/30/19 7:27 PM, Ming Lei wrote:
> > +static int g_host_tags = 0;
> 
> Static variables should not be explicitly initialized to zero.

OK

> 
> > +module_param_named(host_tags, g_host_tags, int, S_IRUGO);
> > +MODULE_PARM_DESC(host_tags, "All submission queues share one tags");
>                                                             ^^^^^^^^
> Did you perhaps mean "one tagset"?

A tagset means one set of tags; here all submission queues share one
single tag space (tags), see 'struct blk_mq_tags'.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-05-31 15:37   ` Bart Van Assche
@ 2019-06-24  8:44     ` Ming Lei
  0 siblings, 0 replies; 48+ messages in thread
From: Ming Lei @ 2019-06-24  8:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Hannes Reinecke, John Garry, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Fri, May 31, 2019 at 08:37:39AM -0700, Bart Van Assche wrote:
> On 5/30/19 7:27 PM, Ming Lei wrote:
> > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > index 6aea0ebc3a73..3d6780504dcb 100644
> > --- a/block/blk-mq-debugfs.c
> > +++ b/block/blk-mq-debugfs.c
> > @@ -237,6 +237,7 @@ static const char *const alloc_policy_name[] = {
> >   static const char *const hctx_flag_name[] = {
> >   	HCTX_FLAG_NAME(SHOULD_MERGE),
> >   	HCTX_FLAG_NAME(TAG_SHARED),
> > +	HCTX_FLAG_NAME(HOST_TAGS),
> >   	HCTX_FLAG_NAME(BLOCKING),
> >   	HCTX_FLAG_NAME(NO_SCHED),
> >   };
> 
> The name BLK_MQ_F_HOST_TAGS suggests that tags are shared across a SCSI
> host. That is misleading since this flag means that tags are shared across
> hardware queues. Additionally, the "host" term is a term that comes from the
> SCSI world and this patch is a block layer patch. That makes me wonder
> whether another name should be used to reflect that all hardware queues
> share the same tag set? How about renaming BLK_MQ_F_TAG_SHARED into
> BLK_MQ_F_TAG_QUEUE_SHARED and renaming BLK_MQ_F_HOST_TAGS into
> BLK_MQ_F_TAG_HCTX_SHARED?

Looks fine, will do it in V2.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-06-05 14:10   ` John Garry
@ 2019-06-24  8:46     ` Ming Lei
  2019-06-24 13:14       ` John Garry
  0 siblings, 1 reply; 48+ messages in thread
From: Ming Lei @ 2019-06-24  8:46 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Wed, Jun 05, 2019 at 03:10:51PM +0100, John Garry wrote:
> On 31/05/2019 03:27, Ming Lei wrote:
> > index 32b8ad3d341b..49d73d979cb3 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
> >  {
> >  	int ret = 0;
> > 
> 
> Hi Ming,
> 
> > +	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx) {
> > +		set->tags[hctx_idx] = set->tags[0];
> 
> Here we set all tags same as that of hctx index 0.
> 
> > +		return true;
> 
> 
> As such, I think that the error handling in __blk_mq_alloc_rq_maps() is made
> a little fragile:
> 
> __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> {
> 	int i;
> 
> 	for (i = 0; i < set->nr_hw_queues; i++)
> 		if (!__blk_mq_alloc_rq_map(set, i))
> 			goto out_unwind;
> 
> 	return 0;
> 
> out_unwind:
> 	while (--i >= 0)
> 		blk_mq_free_rq_map(set->tags[i]);
> 
> 	return -ENOMEM;
> }
> 
> If __blk_mq_alloc_rq_map(, i > 1) fails for when BLK_MQ_F_HOST_TAGS FLAG is
> set (even though today it can't), then we would try to free set->tags[0]
> multiple times.

Good catch, and the issue can be addressed easily by setting set->tags[i] to
NULL and then checking 'tags' in blk_mq_free_rq_map().
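
For example, an untested sketch of the unwind path (reusing the function
John quoted above, only to illustrate the idea):

static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
	int i;

	for (i = 0; i < set->nr_hw_queues; i++)
		if (!__blk_mq_alloc_rq_map(set, i))
			goto out_unwind;

	return 0;

out_unwind:
	while (--i >= 0) {
		/* With BLK_MQ_F_HOST_TAGS, set->tags[i > 0] merely aliases
		 * set->tags[0], so clear the alias instead of freeing the
		 * shared tags a second time.
		 */
		if ((set->flags & BLK_MQ_F_HOST_TAGS) && i) {
			set->tags[i] = NULL;
			continue;
		}
		blk_mq_free_rq_map(set->tags[i]);
	}

	return -ENOMEM;
}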

Thanks,
Ming

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags
  2019-06-24  8:46     ` Ming Lei
@ 2019-06-24 13:14       ` John Garry
  0 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2019-06-24 13:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, linux-scsi, Martin K . Petersen,
	James Bottomley, Bart Van Assche, Hannes Reinecke, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 24/06/2019 09:46, Ming Lei wrote:
> On Wed, Jun 05, 2019 at 03:10:51PM +0100, John Garry wrote:
>> On 31/05/2019 03:27, Ming Lei wrote:
>>> index 32b8ad3d341b..49d73d979cb3 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -2433,6 +2433,11 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
>>>  {
>>>  	int ret = 0;
>>>
>>
>> Hi Ming,
>>
>>> +	if ((set->flags & BLK_MQ_F_HOST_TAGS) && hctx_idx) {
>>> +		set->tags[hctx_idx] = set->tags[0];
>>
>> Here we set all tags same as that of hctx index 0.
>>
>>> +		return true;
>>
>>
>> As such, I think that the error handling in __blk_mq_alloc_rq_maps() is made
>> a little fragile:
>>
>> __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>> {
>> 	int i;
>>
>> 	for (i = 0; i < set->nr_hw_queues; i++)
>> 		if (!__blk_mq_alloc_rq_map(set, i))
>> 			goto out_unwind;
>>
>> 	return 0;
>>
>> out_unwind:
>> 	while (--i >= 0)
>> 		blk_mq_free_rq_map(set->tags[i]);
>>
>> 	return -ENOMEM;
>> }
>>
>> If __blk_mq_alloc_rq_map(, i > 1) fails for when BLK_MQ_F_HOST_TAGS FLAG is
>> set (even though today it can't), then we would try to free set->tags[0]
>> multiple times.
>

Hi Ming,

> Good catch, and the issue can be addressed easily by setting set->tags[i] to
> NULL and then checking 'tags' in blk_mq_free_rq_map().

OK, so you could do that. But I just think that it's not a great 
practice in general to have multiple pointers pointing at the same 
dynamic memory.

Thanks,
John

>
> Thanks,
> Ming
>
> .
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue
  2019-06-04  8:49 ` [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq " John Garry
@ 2019-08-13  8:30   ` John Garry
  0 siblings, 0 replies; 48+ messages in thread
From: John Garry @ 2019-08-13  8:30 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, linux-scsi, Martin K . Petersen
  Cc: James Bottomley, Bart Van Assche, Hannes Reinecke, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig, chenxiang

On 04/06/2019 09:49, John Garry wrote:
> On 31/05/2019 03:27, Ming Lei wrote:
>> Hi,


Hi Ming,

I'm raising the hostwide tags issue again, which I brought up in 
https://www.spinics.net/lists/linux-block/msg43754.html

Here's that discussion:

 >> I don't mean to hijack this thread, but JFYI we're getting around to
 >> test https://github.com/ming1/linux/commits/v5.2-rc-host-tags-V2 -
 >> unfortunately we're still seeing a performance regression. I can't see
 >> where it's coming from. We're double-checking the test though.
 >
 > The host-tag patchset is only for the few particular drivers which use a
 > private reply queue as the completion queue.
 >
 > This patchset is for handling the generic blk-mq CPU hotplug issue, and
 > the particular scsi drivers (hisi_sas_v3, hpsa, megaraid_sas and
 > mp3sas) won't be covered so far.
 >
 > I'd suggest moving on with generic blk-mq devices first, given that
 > blk-mq is now the only request IO path.
 >
 > There are at least two choices for us to handle drivers/devices with
 > private completion queue:
 >
 > 1) host-tags
 > - the performance issue shouldn't be hard to solve, given it is the same
 > as single tags in theory, and only corner cases remain.
 >
 > What I am not glad about with this approach is that the blk-mq-tag code
 > becomes a mess.

Right, not so neat. And we see a drop from 3M to 2.4M IOPS with this
patchset. That's without any optimisation like those discussed in
https://lkml.org/lkml/2019/8/10/124

And I don't know what the performance impact would be on hosts which
already expose multiple queues.

Note that it's still much better than the 700K IOPS we see without 
managed interrupts at all.

 >
 > 2) private callback
 > - we could simply define a private callback to drain each completion
 >   queue in the driver.

Yeah, but then that raises the question of why the LLDD can't just
register a hotplug handler itself.
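
For what it's worth, a rough sketch of what such a driver-private callback
could look like is below. All the my_* names (including
my_drain_reply_queue()) are made up for illustration, and the choice of
hotplug state and the locking would need more thought:

#include <linux/cpuhotplug.h>
#include <linux/cpumask.h>
#include <linux/list.h>

/* one instance per private reply queue (hypothetical driver structure) */
struct my_reply_queue {
	struct hlist_node	node;
	const struct cpumask	*affinity_mask;	/* CPUs mapped to this queue */
};

static int my_reply_queue_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
	struct my_reply_queue *rq =
		hlist_entry(node, struct my_reply_queue, node);
	int other;

	/* the dying CPU isn't mapped to this queue: nothing to do */
	if (!cpumask_test_cpu(cpu, rq->affinity_mask))
		return 0;

	/* another online CPU still serves this queue: nothing to drain */
	for_each_cpu_and(other, rq->affinity_mask, cpu_online_mask)
		if (other != cpu)
			return 0;

	/* last mapped CPU is going away: drain pending completions */
	my_drain_reply_queue(rq);		/* driver's own drain routine */
	return 0;
}

/*
 * Registered once at driver init with something like:
 *
 *	state = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
 *					"scsi/my_driver:reply_queue",
 *					NULL, my_reply_queue_cpu_offline);
 *
 * followed by one cpuhp_state_add_instance_nocalls(state, &rq->node)
 * per reply queue.
 */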

Personally I prefer #1. I'll have a look at the issues when I get a chance.

Much appreciated,
John




^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2019-08-13  8:31 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-31  2:27 [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq hw queue Ming Lei
2019-05-31  2:27 ` [PATCH 1/9] blk-mq: allow hw queues to share hostwide tags Ming Lei
2019-05-31  6:07   ` Hannes Reinecke
2019-05-31 15:37   ` Bart Van Assche
2019-06-24  8:44     ` Ming Lei
2019-06-05 14:10   ` John Garry
2019-06-24  8:46     ` Ming Lei
2019-06-24 13:14       ` John Garry
2019-05-31  2:27 ` [PATCH 2/9] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
2019-05-31  6:08   ` Hannes Reinecke
2019-05-31 15:39   ` Bart Van Assche
2019-06-24  8:43     ` Ming Lei
2019-06-02  1:56   ` Minwoo Im
2019-05-31  2:27 ` [PATCH 3/9] scsi: Add template flag 'host_tagset' Ming Lei
2019-05-31  6:08   ` Hannes Reinecke
2019-05-31  2:27 ` [PATCH 4/9] scsi_debug: support host tagset Ming Lei
2019-05-31  6:09   ` Hannes Reinecke
2019-06-02  2:03   ` Minwoo Im
2019-06-02 17:01   ` Douglas Gilbert
2019-05-31  2:27 ` [PATCH 5/9] scsi: introduce scsi_cmnd_hctx_index() Ming Lei
2019-05-31  6:10   ` Hannes Reinecke
2019-05-31  2:27 ` [PATCH 6/9] scsi: hpsa: convert private reply queue to blk-mq hw queue Ming Lei
2019-05-31  6:15   ` Hannes Reinecke
2019-05-31  6:30     ` Ming Lei
2019-05-31  6:40       ` Hannes Reinecke
2019-05-31  2:27 ` [PATCH 7/9] scsi: hisi_sas_v3: " Ming Lei
2019-05-31  6:20   ` Hannes Reinecke
2019-05-31  6:34     ` Ming Lei
2019-05-31  6:42       ` Hannes Reinecke
2019-05-31  7:14         ` Ming Lei
2019-05-31 11:38       ` John Garry
2019-06-03 11:00         ` Ming Lei
2019-06-03 13:00           ` John Garry
2019-06-04 13:37             ` Ming Lei
2019-05-31  2:28 ` [PATCH 8/9] scsi: megaraid: " Ming Lei
2019-05-31  6:22   ` Hannes Reinecke
2019-06-01 21:41   ` Kashyap Desai
2019-06-02  6:42     ` Ming Lei
2019-06-02  7:48       ` Ming Lei
2019-06-02 16:34         ` Kashyap Desai
2019-06-03  3:56           ` Ming Lei
2019-06-03 10:00             ` Kashyap Desai
2019-06-07  9:45             ` Kashyap Desai
2019-05-31  2:28 ` [PATCH 9/9] scsi: mp3sas: " Ming Lei
2019-05-31  6:23   ` Hannes Reinecke
2019-06-06 11:58   ` Sreekanth Reddy
2019-06-04  8:49 ` [PATCH 0/9] blk-mq/scsi: convert private reply queue into blk_mq " John Garry
2019-08-13  8:30   ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).