* [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
@ 2018-02-05 15:20 Ming Lei
  2018-02-05 15:20 ` [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

Hi All,

This patchset supports global tags, a line of work started originally by Hannes:

	https://marc.info/?l=linux-block&m=149132580511346&w=2

It also introduces 'force_blk_mq' and 'host_tagset' in 'struct
scsi_host_template', so that drivers can avoid supporting two IO paths
(legacy and blk-mq), especially since recent discussion suggests that
SCSI_MQ will be enabled by default soon:

	https://marc.info/?l=linux-scsi&m=151727684915589&w=2

With the above changes, it should be easier to convert a SCSI driver's
reply queues into blk-mq hctxs, and the automatic irq affinity issue can
then be solved easily; please see the detailed description in the commit
logs and in the 8th patch, which converts HPSA.

Drivers may also need to complete requests on the submission CPU to avoid
hard/soft lockups, which can be done easily with blk-mq too:

	https://marc.info/?t=151601851400001&r=1&w=2

The 6th patch uses the introduced 'force_blk_mq' to fix virtio_scsi so
that the IO hang in the legacy IO path can be avoided. This issue is
somewhat generic; at least HPSA and virtio-scsi are known to be broken
with v4.15+.

The 7th & 8th patches fix HPSA's IO issue, which is caused by the reply
queue selection algorithm.

Laurence has verified that HPSA works on Linus' tree with this patchset
applied.

The V2 can be found in the following tree/branch:

	https://github.com/ming1/linux/commits/v4.15-for-next-global-tags-v2

V2:
	- fix restart code for global tags
	- fix HPSA's IO hang issue
	- add 'scsi: Add template flag 'host_tagset''
	- reorder patch

Thanks
Ming

Hannes Reinecke (1):
  scsi: Add template flag 'host_tagset'

Ming Lei (7):
  blk-mq: tags: define several fields of tags as pointer
  blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  block: null_blk: introduce module parameter of 'g_global_tags'
  scsi: introduce force_blk_mq
  scsi: virtio_scsi: fix IO hang by irq vector automatic affinity
  scsi: hpsa: call hpsa_hba_inquiry() after adding host
  scsi: hpsa: use blk_mq to solve irq affinity issue

 block/bfq-iosched.c        |  4 +--
 block/blk-mq-debugfs.c     | 11 ++++----
 block/blk-mq-sched.c       | 13 ++++++++-
 block/blk-mq-tag.c         | 67 +++++++++++++++++++++++++++++-----------------
 block/blk-mq-tag.h         | 15 ++++++++---
 block/blk-mq.c             | 31 ++++++++++++++++-----
 block/blk-mq.h             |  3 ++-
 block/kyber-iosched.c      |  2 +-
 drivers/block/null_blk.c   |  6 +++++
 drivers/scsi/hosts.c       |  1 +
 drivers/scsi/hpsa.c        | 56 ++++++++++++++++++++++++--------------
 drivers/scsi/scsi_lib.c    |  2 ++
 drivers/scsi/virtio_scsi.c | 59 +++-------------------------------------
 include/linux/blk-mq.h     |  2 ++
 include/scsi/scsi_host.h   |  6 +++++
 15 files changed, 157 insertions(+), 121 deletions(-)

-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-06 21:41   ` Omar Sandoval
  2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

This patch changes tags->breserved_tags, tags->bitmap_tags and
tags->active_queues to pointers, preparing for global tags support.

No functional change.

Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c    |  4 ++--
 block/blk-mq-debugfs.c | 10 +++++-----
 block/blk-mq-tag.c     | 48 ++++++++++++++++++++++++++----------------------
 block/blk-mq-tag.h     | 10 +++++++---
 block/blk-mq.c         |  2 +-
 block/kyber-iosched.c  |  2 +-
 6 files changed, 42 insertions(+), 34 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 47e6ec7427c4..1e1211814a57 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -534,9 +534,9 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
 	} else
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
 
 	if (unlikely(bfqd->sb_shift != bt->sb.shift))
 		bfq_update_depths(bfqd, bt);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 21cbc1f071c6..0dfafa4b655a 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -433,14 +433,14 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 	seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
 	seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
 	seq_printf(m, "active_queues=%d\n",
-		   atomic_read(&tags->active_queues));
+		   atomic_read(tags->active_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
-	sbitmap_queue_show(&tags->bitmap_tags, m);
+	sbitmap_queue_show(tags->bitmap_tags, m);
 
 	if (tags->nr_reserved_tags) {
 		seq_puts(m, "\nbreserved_tags:\n");
-		sbitmap_queue_show(&tags->breserved_tags, m);
+		sbitmap_queue_show(tags->breserved_tags, m);
 	}
 }
 
@@ -471,7 +471,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->tags)
-		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
@@ -505,7 +505,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	if (res)
 		goto out;
 	if (hctx->sched_tags)
-		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
+		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
 out:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..571797dc36cb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -18,7 +18,7 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
 	if (!tags)
 		return true;
 
-	return sbitmap_any_bit_clear(&tags->bitmap_tags.sb);
+	return sbitmap_any_bit_clear(&tags->bitmap_tags->sb);
 }
 
 /*
@@ -28,7 +28,7 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
 	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
-		atomic_inc(&hctx->tags->active_queues);
+		atomic_inc(hctx->tags->active_queues);
 
 	return true;
 }
@@ -38,9 +38,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->bitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->breserved_tags);
 }
 
 /*
@@ -54,7 +54,7 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return;
 
-	atomic_dec(&tags->active_queues);
+	atomic_dec(tags->active_queues);
 
 	blk_mq_tag_wakeup_all(tags, false);
 }
@@ -79,7 +79,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	if (bt->sb.depth == 1)
 		return true;
 
-	users = atomic_read(&hctx->tags->active_queues);
+	users = atomic_read(hctx->tags->active_queues);
 	if (!users)
 		return true;
 
@@ -117,10 +117,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_TAG_FAIL;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->breserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->bitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -165,9 +165,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->breserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->bitmap_tags;
 
 		finish_wait(&ws->wait, &wait);
 		ws = bt_wait_ptr(bt, data->hctx);
@@ -189,10 +189,10 @@ void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -283,8 +283,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, tags->breserved_tags, fn, priv, true);
+	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, false);
 }
 
 void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
@@ -346,8 +346,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
 	}
 
 }
@@ -365,15 +365,15 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
 	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
 
-	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
+	if (bt_alloc(tags->bitmap_tags, depth, round_robin, node))
 		goto free_tags;
-	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
+	if (bt_alloc(tags->breserved_tags, tags->nr_reserved_tags, round_robin,
 		     node))
 		goto free_bitmap_tags;
 
 	return tags;
 free_bitmap_tags:
-	sbitmap_queue_free(&tags->bitmap_tags);
+	sbitmap_queue_free(tags->bitmap_tags);
 free_tags:
 	kfree(tags);
 	return NULL;
@@ -397,13 +397,17 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	tags->bitmap_tags = &tags->__bitmap_tags;
+	tags->breserved_tags = &tags->__breserved_tags;
+	tags->active_queues = &tags->__active_queues;
+
 	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
 }
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	sbitmap_queue_free(&tags->bitmap_tags);
-	sbitmap_queue_free(&tags->breserved_tags);
+	sbitmap_queue_free(tags->bitmap_tags);
+	sbitmap_queue_free(tags->breserved_tags);
 	kfree(tags);
 }
 
@@ -454,7 +458,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
+		sbitmap_queue_resize(tags->bitmap_tags, tdepth);
 	}
 
 	return 0;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..a68323fa0c02 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -11,10 +11,14 @@ struct blk_mq_tags {
 	unsigned int nr_tags;
 	unsigned int nr_reserved_tags;
 
-	atomic_t active_queues;
+	atomic_t *active_queues;
+	atomic_t __active_queues;
 
-	struct sbitmap_queue bitmap_tags;
-	struct sbitmap_queue breserved_tags;
+	struct sbitmap_queue *bitmap_tags;
+	struct sbitmap_queue *breserved_tags;
+
+	struct sbitmap_queue __bitmap_tags;
+	struct sbitmap_queue __breserved_tags;
 
 	struct request **rqs;
 	struct request **static_rqs;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102e2149..69d4534870af 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1136,7 +1136,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
 		return false;
 	}
 
-	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
+	ws = bt_wait_ptr(this_hctx->tags->bitmap_tags, this_hctx);
 	add_wait_queue(&ws->wait, wait);
 
 	/*
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index f95c60774ce8..4adaced76382 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -281,7 +281,7 @@ static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
-- 
2.9.5


* [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
  2018-02-05 15:20 ` [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-06 20:33   ` Omar Sandoval
                     ` (3 more replies)
  2018-02-05 15:20 ` [PATCH V2 3/8] scsi: Add template flag 'host_tagset' Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 4 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
reply queues, but the tag space is often HBA wide.

These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
for automatic affinity assignment.

Now commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible
CPUs") has been merged into v4.16-rc, and with it an irq vector can easily
end up with only offline CPUs assigned; this can't be avoided even though
the allocation is improved.

So all these drivers have to avoid asking the HBA to complete requests on a
reply queue which has no online CPUs assigned, and HPSA has been broken
with v4.15+:

	https://marc.info/?l=linux-kernel&m=151748144730409&w=2

This issue can be solved generically and easily via blk-mq (scsi-mq)
multiple hw queues, by mapping each reply queue to a hctx, but one tricky
thing is the HBA-wide (instead of hw-queue-wide) tags.

This patch is based on the following patch from Hannes:

	https://marc.info/?l=linux-block&m=149132580511346&w=2

One big difference from Hannes's patch is that this one only makes the
tags sbitmap and the active_queues counter HBA wide; everything else
(requests, hctxs, tags structures, ...) keeps its NUMA locality.

The following patch adds global tags support to null_blk, together with
performance data; no obvious performance loss is observed when the overall
hw queue depth is kept the same.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 13 ++++++++++++-
 block/blk-mq-tag.c     | 23 ++++++++++++++++++-----
 block/blk-mq-tag.h     |  5 ++++-
 block/blk-mq.c         | 29 ++++++++++++++++++++++++-----
 block/blk-mq.h         |  3 ++-
 include/linux/blk-mq.h |  2 ++
 7 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 0dfafa4b655a..0f0fafe03f5d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(SG_MERGE),
+	HCTX_FLAG_NAME(GLOBAL_TAGS),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 55c0a745b427..385bbec73804 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
 	} else
 		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
 
+	/* need to restart all hw queues for global tags */
+	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
+		struct blk_mq_hw_ctx *hctx2;
+		int i;
+
+		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
+			if (blk_mq_run_hw_queue(hctx2, true))
+				return true;
+		return false;
+	}
+
 	return blk_mq_run_hw_queue(hctx, true);
 }
 
@@ -495,7 +506,7 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	int ret;
 
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
-					       set->reserved_tags);
+					       set->reserved_tags, false);
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 571797dc36cb..66377d09eaeb 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -379,9 +379,11 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return NULL;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, int alloc_policy)
+				     int node, int alloc_policy,
+				     bool global_tag)
 {
 	struct blk_mq_tags *tags;
 
@@ -397,6 +399,14 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	WARN_ON(global_tag && !set->global_tags);
+	if (global_tag && set->global_tags) {
+		tags->bitmap_tags = set->global_tags->bitmap_tags;
+		tags->breserved_tags = set->global_tags->breserved_tags;
+		tags->active_queues = set->global_tags->active_queues;
+		tags->global_tags = true;
+		return tags;
+	}
 	tags->bitmap_tags = &tags->__bitmap_tags;
 	tags->breserved_tags = &tags->__breserved_tags;
 	tags->active_queues = &tags->__active_queues;
@@ -406,8 +416,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 void blk_mq_free_tags(struct blk_mq_tags *tags)
 {
-	sbitmap_queue_free(tags->bitmap_tags);
-	sbitmap_queue_free(tags->breserved_tags);
+	if (!tags->global_tags) {
+		sbitmap_queue_free(tags->bitmap_tags);
+		sbitmap_queue_free(tags->breserved_tags);
+	}
 	kfree(tags);
 }
 
@@ -441,7 +453,8 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		if (tdepth > 16 * BLKDEV_MAX_RQ)
 			return -EINVAL;
 
-		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0);
+		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0,
+				(*tagsptr)->global_tags);
 		if (!new)
 			return -ENOMEM;
 		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index a68323fa0c02..a87b5cfa5726 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -20,13 +20,16 @@ struct blk_mq_tags {
 	struct sbitmap_queue __bitmap_tags;
 	struct sbitmap_queue __breserved_tags;
 
+	bool	global_tags;
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+		unsigned int nr_tags, unsigned int reserved_tags, int node,
+		int alloc_policy, bool global_tag);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 69d4534870af..a98466dc71b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2023,7 +2023,8 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags)
+					unsigned int reserved_tags,
+					bool global_tags)
 {
 	struct blk_mq_tags *tags;
 	int node;
@@ -2032,8 +2033,9 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
-				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
+				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
+				global_tags);
 	if (!tags)
 		return NULL;
 
@@ -2336,7 +2338,8 @@ static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 	int ret = 0;
 
 	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
-					set->queue_depth, set->reserved_tags);
+					set->queue_depth, set->reserved_tags,
+					!!(set->flags & BLK_MQ_F_GLOBAL_TAGS));
 	if (!set->tags[hctx_idx])
 		return false;
 
@@ -2891,15 +2894,28 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (set->flags & BLK_MQ_F_GLOBAL_TAGS) {
+		ret = -ENOMEM;
+		set->global_tags = blk_mq_init_tags(set, set->queue_depth,
+				set->reserved_tags, set->numa_node,
+				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
+				false);
+		if (!set->global_tags)
+			goto out_free_mq_map;
+	}
+
 	ret = blk_mq_alloc_rq_maps(set);
 	if (ret)
-		goto out_free_mq_map;
+		goto out_free_global_tags;
 
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
 	return 0;
 
+out_free_global_tags:
+	if (set->global_tags)
+		blk_mq_free_tags(set->global_tags);
 out_free_mq_map:
 	kfree(set->mq_map);
 	set->mq_map = NULL;
@@ -2914,6 +2930,9 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i;
 
+	if (set->global_tags)
+		blk_mq_free_tags(set->global_tags);
+
 	for (i = 0; i < nr_cpu_ids; i++)
 		blk_mq_free_map_and_requests(set, i);
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..d1d9a0a8e1fa 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -61,7 +61,8 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags);
+					unsigned int reserved_tags,
+					bool global_tags);
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..8548c72d6b4a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -82,6 +82,7 @@ struct blk_mq_tag_set {
 	void			*driver_data;
 
 	struct blk_mq_tags	**tags;
+	struct blk_mq_tags	*global_tags;	/* for BLK_MQ_F_GLOBAL_TAGS */
 
 	struct mutex		tag_list_lock;
 	struct list_head	tag_list;
@@ -175,6 +176,7 @@ enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
+	BLK_MQ_F_GLOBAL_TAGS	= 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.9.5


* [PATCH V2 3/8] scsi: Add template flag 'host_tagset'
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
  2018-02-05 15:20 ` [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer Ming Lei
  2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-05 15:20 ` [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags' Ming Lei
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Hannes Reinecke, Ming Lei

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' to enable the use of a global
tagset for block-mq.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 55be2550c555..9ab74ac634ea 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2274,6 +2274,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_GLOBAL_TAGS;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a8b7bf879ced..f6623f887ee4 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -457,6 +457,9 @@ struct scsi_host_template {
 	 */
 	unsigned int max_host_blocked;
 
+	/* True if the host supports a host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Default value for the blocking.  If the queue is empty,
 	 * host_blocked counts down in the request_fn until it restarts
-- 
2.9.5


* [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags'
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (2 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 3/8] scsi: Add template flag 'host_tagset' Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-05 20:26     ` Don Brace
  2018-02-06 21:43   ` Omar Sandoval
  2018-02-05 15:20 ` [PATCH V2 5/8] scsi: introduce force_blk_mq Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

This patch introduces the 'g_global_tags' module parameter so that this
feature can be tested easily with null_blk.

No obvious performance drop is seen with global_tags when the overall hw
queue depth is kept the same:

1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=0 submit_queues=4 hw_queue_depth=1

2) 'global_tags', global hw queue depth is 4, and 4 hw queues
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=1 submit_queues=4 hw_queue_depth=4

3) fio test done in above two settings:
   fio --bs=4k --size=512G  --rw=randread --norandommap --direct=1 --ioengine=libaio --iodepth=4 --runtime=$RUNTIME --group_reporting=1  --name=nullb0 --filename=/dev/nullb0 --name=nullb1 --filename=/dev/nullb1 --name=nullb2 --filename=/dev/nullb2 --name=nullb3 --filename=/dev/nullb3

1M IOPS can be reached in both of the above tests, which were run in one VM.

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/null_blk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 287a09611c0f..ad0834efad42 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -163,6 +163,10 @@ static int g_submit_queues = 1;
 module_param_named(submit_queues, g_submit_queues, int, S_IRUGO);
 MODULE_PARM_DESC(submit_queues, "Number of submission queues");
 
+static int g_global_tags = 0;
+module_param_named(global_tags, g_global_tags, int, S_IRUGO);
+MODULE_PARM_DESC(global_tags, "All submission queues share one tags");
+
 static int g_home_node = NUMA_NO_NODE;
 module_param_named(home_node, g_home_node, int, S_IRUGO);
 MODULE_PARM_DESC(home_node, "Home node for the device");
@@ -1622,6 +1626,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (g_no_sched)
 		set->flags |= BLK_MQ_F_NO_SCHED;
+	if (g_global_tags)
+		set->flags |= BLK_MQ_F_GLOBAL_TAGS;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
-- 
2.9.5


* [PATCH V2 5/8] scsi: introduce force_blk_mq
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (3 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags' Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-06 20:20   ` Omar Sandoval
  2018-02-05 15:20 ` [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

From a SCSI driver's point of view, it is a bit troublesome to support
both blk-mq and non-blk-mq at the same time, especially when drivers need
to support multiple hw queues.

This patch introduces 'force_blk_mq' in scsi_host_template so that drivers
can provide blk-mq-only support and avoid the trouble of supporting both
paths.

Providing blk-mq-only support may clean up drivers a lot; in particular,
it becomes easier to convert multiple reply queues into blk-mq hw queues
for the following purposes:

1) use multiple blk-mq hw queues to deal with irq vectors whose affinity
contains only offline CPUs[1]:

	[1] https://marc.info/?l=linux-kernel&m=151748144730409&w=2

Commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible
CPUs") has now been merged for v4.16-rc, and with it some irq vectors can
easily end up assigned only offline CPUs; this can't be avoided even if the
allocation is improved.

So all these drivers have to avoid asking the HBA to complete requests on a
reply queue that has no online CPUs assigned.

This issue can be solved generically and easily via blk-mq (scsi-mq) multiple
hw queues by mapping each reply queue to a hctx.

2) some drivers[2] need to complete requests on the submission CPU to avoid
hard/soft lockups; this is easily done with blk-mq, so there is no need to
reinvent the wheel.

	[2] https://marc.info/?t=151601851400001&r=1&w=2

Solving the above issues in the non-MQ path may not be easy, or would
introduce unnecessary work, especially since we plan to enable SCSI_MQ by
default soon, as discussed recently[3]:

	[3] https://marc.info/?l=linux-scsi&m=151727684915589&w=2

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hosts.c     | 1 +
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index fe3a0da3ec97..c75cebd7911d 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -471,6 +471,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 		shost->dma_boundary = 0xffffffff;
 
 	shost->use_blk_mq = scsi_use_blk_mq;
+	shost->use_blk_mq = scsi_use_blk_mq || !!shost->hostt->force_blk_mq;
 
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index f6623f887ee4..d67b2eaff8f1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,6 +452,9 @@ struct scsi_host_template {
 	/* True if the controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	/* True if the low-level driver supports blk-mq only */
+	unsigned force_blk_mq:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (4 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 5/8] scsi: introduce force_blk_mq Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-05 15:56   ` Paolo Bonzini
  2018-02-05 15:20 ` [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

Commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible
CPUs") has now been merged for v4.16-rc, and with it some irq vectors can
easily end up assigned only offline CPUs; this can't be avoided even if the
allocation is improved.

For example, on an 8-core VM where CPUs 4-7 are not present/offline and
virtio-scsi has 4 queues, the assigned irq affinity can end up looking like
this:

	irq 36, cpu list 0-7
	irq 37, cpu list 0-7
	irq 38, cpu list 0-7
	irq 39, cpu list 0-1
	irq 40, cpu list 4,6
	irq 41, cpu list 2-3
	irq 42, cpu list 5,7

An IO hang is then triggered in the non-SCSI_MQ case.

Given that storage IO always follows a client/server model, this issue does
not exist with SCSI_MQ (blk-mq): no IO can be submitted to a hw queue if that
hw queue has no online CPUs.

Fix this issue by forcing the use of blk-mq.

As an aside, I have been using virtio-scsi (scsi_mq) for several years and it
has been quite stable, so this shouldn't add extra risk.

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 59 +++-------------------------------------------
 1 file changed, 3 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 7c28e8d4955a..54e3a0f6844c 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -91,9 +91,6 @@ struct virtio_scsi_vq {
 struct virtio_scsi_target_state {
 	seqcount_t tgt_seq;
 
-	/* Count of outstanding requests. */
-	atomic_t reqs;
-
 	/* Currently active virtqueue for requests sent to this target. */
 	struct virtio_scsi_vq *req_vq;
 };
@@ -152,8 +149,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 	struct virtio_scsi_cmd *cmd = buf;
 	struct scsi_cmnd *sc = cmd->sc;
 	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
 	dev_dbg(&sc->device->sdev_gendev,
 		"cmd %p response %u status %#02x sense_len %u\n",
@@ -210,8 +205,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 	}
 
 	sc->scsi_done(sc);
-
-	atomic_dec(&tgt->reqs);
 }
 
 static void virtscsi_vq_done(struct virtio_scsi *vscsi,
@@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
 					struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
-	atomic_inc(&tgt->reqs);
 	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
 }
 
@@ -596,55 +586,11 @@ static struct virtio_scsi_vq *virtscsi_pick_vq_mq(struct virtio_scsi *vscsi,
 	return &vscsi->req_vqs[hwq];
 }
 
-static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
-					       struct virtio_scsi_target_state *tgt)
-{
-	struct virtio_scsi_vq *vq;
-	unsigned long flags;
-	u32 queue_num;
-
-	local_irq_save(flags);
-	if (atomic_inc_return(&tgt->reqs) > 1) {
-		unsigned long seq;
-
-		do {
-			seq = read_seqcount_begin(&tgt->tgt_seq);
-			vq = tgt->req_vq;
-		} while (read_seqcount_retry(&tgt->tgt_seq, seq));
-	} else {
-		/* no writes can be concurrent because of atomic_t */
-		write_seqcount_begin(&tgt->tgt_seq);
-
-		/* keep previous req_vq if a reader just arrived */
-		if (unlikely(atomic_read(&tgt->reqs) > 1)) {
-			vq = tgt->req_vq;
-			goto unlock;
-		}
-
-		queue_num = smp_processor_id();
-		while (unlikely(queue_num >= vscsi->num_queues))
-			queue_num -= vscsi->num_queues;
-		tgt->req_vq = vq = &vscsi->req_vqs[queue_num];
- unlock:
-		write_seqcount_end(&tgt->tgt_seq);
-	}
-	local_irq_restore(flags);
-
-	return vq;
-}
-
 static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
 				       struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
-	struct virtio_scsi_vq *req_vq;
-
-	if (shost_use_blk_mq(sh))
-		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
-	else
-		req_vq = virtscsi_pick_vq(vscsi, tgt);
+	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
 
 	return virtscsi_queuecommand(vscsi, req_vq, sc);
 }
@@ -775,7 +721,6 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 		return -ENOMEM;
 
 	seqcount_init(&tgt->tgt_seq);
-	atomic_set(&tgt->reqs, 0);
 	tgt->req_vq = &vscsi->req_vqs[0];
 
 	starget->hostdata = tgt;
@@ -823,6 +768,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
 	.target_alloc = virtscsi_target_alloc,
 	.target_destroy = virtscsi_target_destroy,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 static struct scsi_host_template virtscsi_host_template_multi = {
@@ -844,6 +790,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 	.target_destroy = virtscsi_target_destroy,
 	.map_queues = virtscsi_map_queues,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 #define virtscsi_config_get(vdev, fld) \
-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (5 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-05 18:55     ` Don Brace
  2018-02-06  8:32   ` Hannes Reinecke
  2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
  2018-02-06 23:15 ` [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Jens Axboe
  8 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

Call hpsa_hba_inquiry() after the host has been added, so that the default
reply queue can be decided from the queue map created while adding the host.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 287e5eb0723f..443eabf63a9f 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -8691,12 +8691,9 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Disable discovery polling.*/
 	h->discovery_polling = 0;
 
-
 	/* Turn the interrupts on so we can service requests */
 	h->access.set_intr_mask(h, HPSA_INTR_ON);
 
-	hpsa_hba_inquiry(h);
-
 	h->lastlogicals = kzalloc(sizeof(*(h->lastlogicals)), GFP_KERNEL);
 	if (!h->lastlogicals)
 		dev_info(&h->pdev->dev,
@@ -8707,6 +8704,8 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto clean7; /* perf, sg, cmd, irq, shost, pci, lu, aer/h */
 
+	hpsa_hba_inquiry(h);
+
 	/* Monitor the controller for firmware lockups */
 	h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
 	INIT_DELAYED_WORK(&h->monitor_ctlr_work, hpsa_monitor_ctlr_worker);
-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (6 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host Ming Lei
@ 2018-02-05 15:20 ` Ming Lei
  2018-02-05 15:58     ` Laurence Oberman
                     ` (2 more replies)
  2018-02-06 23:15 ` [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Jens Axboe
  8 siblings, 3 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-05 15:20 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman, Ming Lei

This patch uses .force_blk_mq to drive HPSA via SCSI_MQ and maps each reply
queue to a blk-mq hw queue, so that .queuecommand can always use the hw queue
as the reply queue. If no online CPU is mapped to a hw queue, no request can
be submitted to it at all, which finally solves the irq affinity issue.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <james.bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 443eabf63a9f..e517a4c74a28 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -51,6 +51,7 @@
 #include <linux/jiffies.h>
 #include <linux/percpu-defs.h>
 #include <linux/percpu.h>
+#include <linux/blk-mq-pci.h>
 #include <asm/unaligned.h>
 #include <asm/div64.h>
 #include "hpsa_cmd.h"
@@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
 #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
 				 HPSA_MAX_CONCURRENT_PASSTHRUS)
 
+static int hpsa_map_queues(struct Scsi_Host *shost)
+{
+	struct ctlr_info *h = shost_to_hba(shost);
+
+	return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
+}
+
 static struct scsi_host_template hpsa_driver_template = {
 	.module			= THIS_MODULE,
 	.name			= HPSA,
@@ -974,10 +982,13 @@ static struct scsi_host_template hpsa_driver_template = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl		= hpsa_compat_ioctl,
 #endif
+	.map_queues = hpsa_map_queues,
 	.sdev_attrs = hpsa_sdev_attrs,
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 1024,
 	.no_write_same = 1,
+	.force_blk_mq = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -1045,11 +1056,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
 		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 		if (unlikely(!h->msix_vectors))
 			return;
-		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-			c->Header.ReplyQueue =
-				raw_smp_processor_id() % h->nreply_queues;
-		else
-			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+		c->Header.ReplyQueue = reply_queue;
 	}
 }
 
@@ -1063,10 +1070,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->ReplyQueue = reply_queue % h->nreply_queues;
+	cp->ReplyQueue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit (bit 0)
@@ -1087,10 +1091,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
 	/* Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/* Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
 	 *  - pull count (bits 0-3)
@@ -1109,10 +1110,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
@@ -1152,11 +1150,27 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
 		h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
 }
 
+static unsigned get_reply_queue(struct ctlr_info *h, struct CommandList *c)
+{
+	struct scsi_cmnd *cmd = c->scsi_cmd;
+	unsigned int *map;
+
+	if (cmd && cmd->request) {
+		u32 tag = blk_mq_unique_tag(cmd->request);
+		return blk_mq_unique_tag_to_hwq(tag);
+	}
+
+	map = h->scsi_host->tag_set.mq_map;
+	return map[raw_smp_processor_id()];
+}
+
 static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	struct CommandList *c, int reply_queue)
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
+
+	reply_queue = get_reply_queue(h, c);
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
 {
 	int rv;
 
+	/* map reply queue to blk_mq hw queue */
+	h->scsi_host->nr_hw_queues = h->msix_vectors;
+
 	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
 	if (rv) {
 		dev_err(&h->pdev->dev, "scsi_add_host failed\n");
-- 
2.9.5

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity
  2018-02-05 15:20 ` [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity Ming Lei
@ 2018-02-05 15:56   ` Paolo Bonzini
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2018-02-05 15:56 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman

On 05/02/2018 16:20, Ming Lei wrote:
> Commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible
> CPUs") has now been merged for v4.16-rc, and with it some irq vectors can
> easily end up assigned only offline CPUs; this can't be avoided even if
> the allocation is improved.
> 
> For example, on an 8-core VM where CPUs 4-7 are not present/offline and
> virtio-scsi has 4 queues, the assigned irq affinity can end up looking
> like this:
> 
> 	irq 36, cpu list 0-7
> 	irq 37, cpu list 0-7
> 	irq 38, cpu list 0-7
> 	irq 39, cpu list 0-1
> 	irq 40, cpu list 4,6
> 	irq 41, cpu list 2-3
> 	irq 42, cpu list 5,7
> 
> An IO hang is then triggered in the non-SCSI_MQ case.
> 
> Given that storage IO always follows a client/server model, this issue
> does not exist with SCSI_MQ (blk-mq): no IO can be submitted to a hw
> queue if that hw queue has no online CPUs.
> 
> Fix this issue by forcing the use of blk-mq.
> 
> As an aside, I have been using virtio-scsi (scsi_mq) for several years
> and it has been quite stable, so this shouldn't add extra risk.

I think that's ok now that we have I/O schedulers for blk-mq.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/virtio_scsi.c | 59 +++-------------------------------------------
>  1 file changed, 3 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 7c28e8d4955a..54e3a0f6844c 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -91,9 +91,6 @@ struct virtio_scsi_vq {
>  struct virtio_scsi_target_state {
>  	seqcount_t tgt_seq;
>  
> -	/* Count of outstanding requests. */
> -	atomic_t reqs;
> -
>  	/* Currently active virtqueue for requests sent to this target. */
>  	struct virtio_scsi_vq *req_vq;
>  };
> @@ -152,8 +149,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
>  	struct virtio_scsi_cmd *cmd = buf;
>  	struct scsi_cmnd *sc = cmd->sc;
>  	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
> -	struct virtio_scsi_target_state *tgt =
> -				scsi_target(sc->device)->hostdata;
>  
>  	dev_dbg(&sc->device->sdev_gendev,
>  		"cmd %p response %u status %#02x sense_len %u\n",
> @@ -210,8 +205,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
>  	}
>  
>  	sc->scsi_done(sc);
> -
> -	atomic_dec(&tgt->reqs);
>  }
>  
>  static void virtscsi_vq_done(struct virtio_scsi *vscsi,
> @@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
>  					struct scsi_cmnd *sc)
>  {
>  	struct virtio_scsi *vscsi = shost_priv(sh);
> -	struct virtio_scsi_target_state *tgt =
> -				scsi_target(sc->device)->hostdata;
>  
> -	atomic_inc(&tgt->reqs);
>  	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
>  }
>  
> @@ -596,55 +586,11 @@ static struct virtio_scsi_vq *virtscsi_pick_vq_mq(struct virtio_scsi *vscsi,
>  	return &vscsi->req_vqs[hwq];
>  }
>  
> -static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
> -					       struct virtio_scsi_target_state *tgt)
> -{
> -	struct virtio_scsi_vq *vq;
> -	unsigned long flags;
> -	u32 queue_num;
> -
> -	local_irq_save(flags);
> -	if (atomic_inc_return(&tgt->reqs) > 1) {
> -		unsigned long seq;
> -
> -		do {
> -			seq = read_seqcount_begin(&tgt->tgt_seq);
> -			vq = tgt->req_vq;
> -		} while (read_seqcount_retry(&tgt->tgt_seq, seq));
> -	} else {
> -		/* no writes can be concurrent because of atomic_t */
> -		write_seqcount_begin(&tgt->tgt_seq);
> -
> -		/* keep previous req_vq if a reader just arrived */
> -		if (unlikely(atomic_read(&tgt->reqs) > 1)) {
> -			vq = tgt->req_vq;
> -			goto unlock;
> -		}
> -
> -		queue_num = smp_processor_id();
> -		while (unlikely(queue_num >= vscsi->num_queues))
> -			queue_num -= vscsi->num_queues;
> -		tgt->req_vq = vq = &vscsi->req_vqs[queue_num];
> - unlock:
> -		write_seqcount_end(&tgt->tgt_seq);
> -	}
> -	local_irq_restore(flags);
> -
> -	return vq;
> -}
> -
>  static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
>  				       struct scsi_cmnd *sc)
>  {
>  	struct virtio_scsi *vscsi = shost_priv(sh);
> -	struct virtio_scsi_target_state *tgt =
> -				scsi_target(sc->device)->hostdata;
> -	struct virtio_scsi_vq *req_vq;
> -
> -	if (shost_use_blk_mq(sh))
> -		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
> -	else
> -		req_vq = virtscsi_pick_vq(vscsi, tgt);
> +	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
>  
>  	return virtscsi_queuecommand(vscsi, req_vq, sc);
>  }
> @@ -775,7 +721,6 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
>  		return -ENOMEM;
>  
>  	seqcount_init(&tgt->tgt_seq);
> -	atomic_set(&tgt->reqs, 0);
>  	tgt->req_vq = &vscsi->req_vqs[0];
>  
>  	starget->hostdata = tgt;
> @@ -823,6 +768,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
>  	.target_alloc = virtscsi_target_alloc,
>  	.target_destroy = virtscsi_target_destroy,
>  	.track_queue_depth = 1,
> +	.force_blk_mq = 1,
>  };
>  
>  static struct scsi_host_template virtscsi_host_template_multi = {
> @@ -844,6 +790,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
>  	.target_destroy = virtscsi_target_destroy,
>  	.map_queues = virtscsi_map_queues,
>  	.track_queue_depth = 1,
> +	.force_blk_mq = 1,
>  };
>  
>  #define virtscsi_config_get(vdev, fld) \
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
@ 2018-02-05 15:58     ` Laurence Oberman
  2018-02-06  2:18     ` chenxiang (M)
  2018-02-06  8:39   ` Hannes Reinecke
  2 siblings, 0 replies; 40+ messages in thread
From: Laurence Oberman @ 2018-02-05 15:58 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, Don Brace
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ and maps each
> reply queue to a blk-mq hw queue, so that .queuecommand can always use
> the hw queue as the reply queue. If no online CPU is mapped to a hw
> queue, no request can be submitted to it at all, which finally solves
> the irq affinity issue.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++---------
> --------
>  1 file changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 443eabf63a9f..e517a4c74a28 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>  #include <linux/jiffies.h>
>  #include <linux/percpu-defs.h>
>  #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>  #include <asm/div64.h>
>  #include "hpsa_cmd.h"
> @@ -956,6 +957,13 @@ static struct device_attribute
> *hpsa_shost_attrs[] = {
>  #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
>  				 HPSA_MAX_CONCURRENT_PASSTHRUS)
>  
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +        struct ctlr_info *h = shost_to_hba(shost);
> +
> +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> +}
> +
>  static struct scsi_host_template hpsa_driver_template = {
>  	.module			= THIS_MODULE,
>  	.name			= HPSA,
> @@ -974,10 +982,13 @@ static struct scsi_host_template
> hpsa_driver_template = {
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl		= hpsa_compat_ioctl,
>  #endif
> +	.map_queues = hpsa_map_queues,
>  	.sdev_attrs = hpsa_sdev_attrs,
>  	.shost_attrs = hpsa_shost_attrs,
>  	.max_sectors = 1024,
>  	.no_write_same = 1,
> +	.force_blk_mq = 1,
> +	.host_tagset = 1,
>  };
>  
>  static inline u32 next_command(struct ctlr_info *h, u8 q)
> @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct
> ctlr_info *h, struct CommandList *c,
>  		c->busaddr |= 1 | (h->blockFetchTable[c-
> >Header.SGList] << 1);
>  		if (unlikely(!h->msix_vectors))
>  			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h-
> >nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h-
> >nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>  	}
>  }
>  
> @@ -1063,10 +1070,7 @@ static void
> set_ioaccel1_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1091,7 @@ static void
> set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>  	/* Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/* Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
>  	 *  - pull count (bits 0-3)
> @@ -1109,10 +1110,7 @@ static void
> set_ioaccel2_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
> @@ -1152,11 +1150,27 @@ static void
> dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
>  		h->heartbeat_sample_interval =
> HEARTBEAT_SAMPLE_INTERVAL;
>  }
>  
> +static unsigned get_reply_queue(struct ctlr_info *h, struct
> CommandList *c)
> +{
> +	struct scsi_cmnd *cmd = c->scsi_cmd;
> +	int *map;
> +
> +	if (cmd && cmd->request) {
> +		u32 tag = blk_mq_unique_tag(cmd->request);
> +		return blk_mq_unique_tag_to_hwq(tag);
> +	}
> +
> +	map = h->scsi_host->tag_set.mq_map;
> +	return map[raw_smp_processor_id()];
> +}
> +
>  static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>  	struct CommandList *c, int reply_queue)
>  {
>  	dial_down_lockup_detection_during_fw_flash(h, c);
>  	atomic_inc(&h->commands_outstanding);
> +
> +	reply_queue = get_reply_queue(h, c);
>  	switch (c->cmd_type) {
>  	case CMD_IOACCEL1:
>  		set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info
> *h)
>  {
>  	int rv;
>  
> +	/* map reply queue to blk_mq hw queue */
> +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> +
>  	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
>  	if (rv) {
>  		dev_err(&h->pdev->dev, "scsi_add_host failed\n");

This is a critical issue for HPSA because Linus's tree already contains the
original commit that causes the system to fail to boot.

All my testing was on DL380 G7 servers with:

Hewlett-Packard Company Smart Array G6 controllers 
Vendor: HP       Model: P410i            Rev: 6.64

Ming's patch fixes this, so we need to try to move this along.

I have a DL380 G8 as well, which is likely also exposed here, so I have added
Don Brace to this thread FYI.

Thanks Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
@ 2018-02-05 15:58     ` Laurence Oberman
  0 siblings, 0 replies; 40+ messages in thread
From: Laurence Oberman @ 2018-02-05 15:58 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, and maps
> each reply queue to a blk_mq hw queue, so that .queuecommand can
> always choose the hw queue as the reply queue. If no online CPU is
> mapped to a hw queue, requests can't be submitted to that hw queue
> at all, so the irq affinity issue is finally solved.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++---------
> --------
>  1 file changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 443eabf63a9f..e517a4c74a28 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>  #include <linux/jiffies.h>
>  #include <linux/percpu-defs.h>
>  #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>  #include <asm/div64.h>
>  #include "hpsa_cmd.h"
> @@ -956,6 +957,13 @@ static struct device_attribute
> *hpsa_shost_attrs[] = {
>  #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
>  				 HPSA_MAX_CONCURRENT_PASSTHRUS)
>  
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +        struct ctlr_info *h = shost_to_hba(shost);
> +
> +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> +}
> +
>  static struct scsi_host_template hpsa_driver_template = {
>  	.module			= THIS_MODULE,
>  	.name			= HPSA,
> @@ -974,10 +982,13 @@ static struct scsi_host_template
> hpsa_driver_template = {
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl		= hpsa_compat_ioctl,
>  #endif
> +	.map_queues = hpsa_map_queues,
>  	.sdev_attrs = hpsa_sdev_attrs,
>  	.shost_attrs = hpsa_shost_attrs,
>  	.max_sectors = 1024,
>  	.no_write_same = 1,
> +	.force_blk_mq = 1,
> +	.host_tagset = 1,
>  };
>  
>  static inline u32 next_command(struct ctlr_info *h, u8 q)
> @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct
> ctlr_info *h, struct CommandList *c,
>  		c->busaddr |= 1 | (h->blockFetchTable[c-
> >Header.SGList] << 1);
>  		if (unlikely(!h->msix_vectors))
>  			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h-
> >nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h-
> >nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>  	}
>  }
>  
> @@ -1063,10 +1070,7 @@ static void
> set_ioaccel1_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1091,7 @@ static void
> set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>  	/* Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/* Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
>  	 *  - pull count (bits 0-3)
> @@ -1109,10 +1110,7 @@ static void
> set_ioaccel2_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for
> this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h-
> >nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
> @@ -1152,11 +1150,27 @@ static void
> dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
>  		h->heartbeat_sample_interval =
> HEARTBEAT_SAMPLE_INTERVAL;
>  }
>  
> +static unsigned get_reply_queue(struct ctlr_info *h, struct
> CommandList *c)
> +{
> +	struct scsi_cmnd *cmd = c->scsi_cmd;
> +	int *map;
> +
> +	if (cmd && cmd->request) {
> +		u32 tag = blk_mq_unique_tag(cmd->request);
> +		return blk_mq_unique_tag_to_hwq(tag);
> +	}
> +
> +	map = h->scsi_host->tag_set.mq_map;
> +	return map[raw_smp_processor_id()];
> +}
> +
>  static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>  	struct CommandList *c, int reply_queue)
>  {
>  	dial_down_lockup_detection_during_fw_flash(h, c);
>  	atomic_inc(&h->commands_outstanding);
> +
> +	reply_queue = get_reply_queue(h, c);
>  	switch (c->cmd_type) {
>  	case CMD_IOACCEL1:
>  		set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info
> *h)
>  {
>  	int rv;
>  
> +	/* map reply queue to blk_mq hw queue */
> +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> +
>  	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
>  	if (rv) {
>  		dev_err(&h->pdev->dev, "scsi_add_host failed\n");

This is a critical issue on HPSA because Linus's tree already has the
original commit that causes the system to fail to boot.

All my testing was on DL380 G7 servers with:

Hewlett-Packard Company Smart Array G6 controllers
Vendor: HP       Model: P410i            Rev: 6.64

Ming's patch fixes this, so we need to try to move this along.

I have a DL380 G8 as well, which is likely also exposed here, so I
added Don Brace to this list as an FYI.

Thanks Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
@ 2018-02-05 16:07       ` Don Brace
  0 siblings, 0 replies; 40+ messages in thread
From: Don Brace @ 2018-02-05 16:07 UTC (permalink / raw)
  To: Laurence Oberman, Ming Lei, Jens Axboe, linux-block,
	Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Paolo Bonzini

> -----Original Message-----
> This is a critical issue on the HPSA because Linus already has the
> original commit that causes the system to fail to boot.
> 
> All my testing was on DL380 G7 servers with:
> 
> Hewlett-Packard Company Smart Array G6 controllers
> Vendor: HP       Model: P410i            Rev: 6.64
> 
> Ming's patch fixes this so we need to try move this along.
> 
> I have a DL380 G8 as well which is also likely exposed here and I added
>  Don Brace for FYI to this list.
> 
> Thanks Ming

Running some tests now.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
@ 2018-02-05 18:54       ` Don Brace
  0 siblings, 0 replies; 40+ messages in thread
From: Don Brace @ 2018-02-05 18:54 UTC (permalink / raw)
  To: Laurence Oberman, Ming Lei, Jens Axboe, linux-block,
	Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Paolo Bonzini

> -----Original Message-----
> From: Laurence Oberman [mailto:loberman@redhat.com]
> Sent: Monday, February 05, 2018 9:58 AM
> To: Ming Lei <ming.lei@redhat.com>; Jens Axboe <axboe@kernel.dk>; linux-
> block@vger.kernel.org; Christoph Hellwig <hch@infradead.org>; Mike Snitzer
> <snitzer@redhat.com>; Don Brace <don.brace@microsemi.com>
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>; Arun Easi
> <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> Petersen <martin.petersen@oracle.com>; James Bottomley
> <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst.de>;
> Don Brace <don.brace@microsemi.com>; Kashyap Desai
> <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom.com>;
> Paolo Bonzini <pbonzini@redhat.com>
> Subject: Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
> 
> EXTERNAL EMAIL
> 
> 
> On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> > This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime
> > maps
> > each reply queue to blk_mq's hw queue, then .queuecommand can always
> > choose the hw queue as the reply queue. And if no any online CPU is
> > mapped to one hw queue, request can't be submitted to this hw queue
> > at all, finally the irq affinity issue is solved.
> >
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Arun Easi <arun.easi@cavium.com>
> > Cc: Omar Sandoval <osandov@fb.com>,
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Tested-by: Laurence Oberman <loberman@redhat.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++---------
> This is a critical issue on the HPSA because Linus already has the
> original commit that causes the system to fail to boot.
> 
> All my testing was on DL380 G7 servers with:
> 
> Hewlett-Packard Company Smart Array G6 controllers
> Vendor: HP       Model: P410i            Rev: 6.64
> 
> Ming's patch fixes this so we need to try move this along.
> 
> I have a DL380 G8 as well which is also likely exposed here and I added
>  Don Brace for FYI to this list.
> 
> Thanks Ming

Tested-by: Don Brace <don.brace@microsemi.com>
P441, P431, P830i, H240

Acked-by: Don Brace <don.brace@microsemi.com>




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host
@ 2018-02-05 18:55     ` Don Brace
  0 siblings, 0 replies; 40+ messages in thread
From: Don Brace @ 2018-02-05 18:55 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Paolo Bonzini, Laurence Oberman



> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Monday, February 05, 2018 9:21 AM
> To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; Christoph
> Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>; Arun Easi
> <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> Petersen <martin.petersen@oracle.com>; James Bottomley
> <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst.de>;
> Don Brace <don.brace@microsemi.com>; Kashyap Desai
> <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom.com>;
> Paolo Bonzini <pbonzini@redhat.com>; Laurence Oberman
> <loberman@redhat.com>; Ming Lei <ming.lei@redhat.com>
> Subject: [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host
> 
> EXTERNAL EMAIL
> 
> 
> So that we can decide the default reply queue by the map created
> during adding host.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 287e5eb0723f..443eabf63a9f 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -8691,12 +8691,9 @@ static int hpsa_init_one(struct pci_dev *pdev, const
> struct pci_device_id *ent)
>         /* Disable discovery polling.*/
>         h->discovery_polling = 0;
> 
> -
>         /* Turn the interrupts on so we can service requests */
>         h->access.set_intr_mask(h, HPSA_INTR_ON);
> 
> -       hpsa_hba_inquiry(h);
> -
>         h->lastlogicals = kzalloc(sizeof(*(h->lastlogicals)), GFP_KERNEL);
>         if (!h->lastlogicals)
>                 dev_info(&h->pdev->dev,
> @@ -8707,6 +8704,8 @@ static int hpsa_init_one(struct pci_dev *pdev, const
> struct pci_device_id *ent)
>         if (rc)
>                 goto clean7; /* perf, sg, cmd, irq, shost, pci, lu, aer/h */
> 
> +       hpsa_hba_inquiry(h);
> +
>         /* Monitor the controller for firmware lockups */
>         h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
>         INIT_DELAYED_WORK(&h->monitor_ctlr_work, hpsa_monitor_ctlr_worker);
> --
> 2.9.5

Tested-by: Don Brace <don.brace@microsemi.com>
P441, P431, P830i, H240

Acked-by: Don Brace <don.brace@microsemi.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags'
@ 2018-02-05 20:26     ` Don Brace
  0 siblings, 0 replies; 40+ messages in thread
From: Don Brace @ 2018-02-05 20:26 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Paolo Bonzini, Laurence Oberman

> This patch introduces the parameter 'g_global_tags' so that we can
> test this feature easily with null_blk.
> 
> No obvious performance drop is seen with global_tags when the whole
> hw queue depth is kept the same:
> 
> 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=0
> submit_queues=4 hw_queue_depth=1
> 
> 2) 'global_tags', global hw queue depth is 4, and 4 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=1
> submit_queues=4 hw_queue_depth=4
> 
> 3) fio test done in above two settings:
>    fio --bs=4k --size=512G  --rw=randread --norandommap --direct=1 --
> ioengine=libaio --iodepth=4 --runtime=$RUNTIME --group_reporting=1  --
> name=nullb0 --filename=/dev/nullb0 --name=nullb1 --filename=/dev/nullb1 --
> name=nullb2 --filename=/dev/nullb2 --name=nullb3 --filename=/dev/nullb3
> 
> 1M IOPS can be reached in both of the above tests, which were run in one VM.
> 
I am getting 1.1M IOPS for both cases.

Tested-by: Don Brace <don.brace@microsemi.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
@ 2018-02-06  2:18     ` chenxiang (M)
  2018-02-06  2:18     ` chenxiang (M)
  2018-02-06  8:39   ` Hannes Reinecke
  2 siblings, 0 replies; 40+ messages in thread
From: chenxiang (M) @ 2018-02-06  2:18 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On 2018/2/5 23:20, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, meantime maps
> each reply queue to blk_mq's hw queue, then .queuecommand can always
> choose the hw queue as the reply queue. And if no any online CPU is
> mapped to one hw queue, request can't be submitted to this hw queue
> at all, finally the irq affinity issue is solved.
>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
>   1 file changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 443eabf63a9f..e517a4c74a28 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>   #include <linux/jiffies.h>
>   #include <linux/percpu-defs.h>
>   #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>   #include <asm/unaligned.h>
>   #include <asm/div64.h>
>   #include "hpsa_cmd.h"
> @@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
>   #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
>   				 HPSA_MAX_CONCURRENT_PASSTHRUS)
>   
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +        struct ctlr_info *h = shost_to_hba(shost);
> +
> +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> +}
> +

Hi Ming Lei,
It is fine to use blk_mq_pci_map_queues to solve the automatic irq
affinity issue when the first interrupt vector for the queues is 0.
But if the first interrupt vector for the queues is not 0, it seems
we can't use blk_mq_pci_map_queues directly; blk_mq_virtio_map_queues,
for example, implements its own interface. Is it possible to provide
a general interface for those situations?


>   static struct scsi_host_template hpsa_driver_template = {
>   	.module			= THIS_MODULE,
>   	.name			= HPSA,
> @@ -974,10 +982,13 @@ static struct scsi_host_template hpsa_driver_template = {
>   #ifdef CONFIG_COMPAT
>   	.compat_ioctl		= hpsa_compat_ioctl,
>   #endif
> +	.map_queues = hpsa_map_queues,
>   	.sdev_attrs = hpsa_sdev_attrs,
>   	.shost_attrs = hpsa_shost_attrs,
>   	.max_sectors = 1024,
>   	.no_write_same = 1,
> +	.force_blk_mq = 1,
> +	.host_tagset = 1,
>   };
>   
>   static inline u32 next_command(struct ctlr_info *h, u8 q)
> @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
>   		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>   		if (unlikely(!h->msix_vectors))
>   			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h->nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>   	}
>   }
>   
> @@ -1063,10 +1070,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
>   	 * Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>   	/*
>   	 * Set the bits in the address sent down to include:
>   	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1091,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>   	/* Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>   	/* Set the bits in the address sent down to include:
>   	 *  - performant mode bit not used in ioaccel mode 2
>   	 *  - pull count (bits 0-3)
> @@ -1109,10 +1110,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
>   	 * Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>   	/*
>   	 * Set the bits in the address sent down to include:
>   	 *  - performant mode bit not used in ioaccel mode 2
> @@ -1152,11 +1150,27 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
>   		h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
>   }
>   
> +static unsigned get_reply_queue(struct ctlr_info *h, struct CommandList *c)
> +{
> +	struct scsi_cmnd *cmd = c->scsi_cmd;
> +	int *map;
> +
> +	if (cmd && cmd->request) {
> +		u32 tag = blk_mq_unique_tag(cmd->request);
> +		return blk_mq_unique_tag_to_hwq(tag);
> +	}
> +
> +	map = h->scsi_host->tag_set.mq_map;
> +	return map[raw_smp_processor_id()];
> +}
> +
>   static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>   	struct CommandList *c, int reply_queue)
>   {
>   	dial_down_lockup_detection_during_fw_flash(h, c);
>   	atomic_inc(&h->commands_outstanding);
> +
> +	reply_queue = get_reply_queue(h, c);
>   	switch (c->cmd_type) {
>   	case CMD_IOACCEL1:
>   		set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
>   {
>   	int rv;
>   
> +	/* map reply queue to blk_mq hw queue */
> +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> +
>   	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
>   	if (rv) {
>   		dev_err(&h->pdev->dev, "scsi_add_host failed\n");

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
@ 2018-02-06  2:18     ` chenxiang (M)
  0 siblings, 0 replies; 40+ messages in thread
From: chenxiang (M) @ 2018-02-06  2:18 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On 2018/2/5 23:20, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, and maps
> each reply queue to a blk_mq hw queue, so that .queuecommand can always
> choose the hw queue as the reply queue. If no online CPU is mapped to
> a hw queue, requests can't be submitted to that hw queue at all, which
> finally solves the irq affinity issue.
>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
>   1 file changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 443eabf63a9f..e517a4c74a28 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>   #include <linux/jiffies.h>
>   #include <linux/percpu-defs.h>
>   #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>   #include <asm/unaligned.h>
>   #include <asm/div64.h>
>   #include "hpsa_cmd.h"
> @@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
>   #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
>   				 HPSA_MAX_CONCURRENT_PASSTHRUS)
>   
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +        struct ctlr_info *h = shost_to_hba(shost);
> +
> +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> +}
> +

Hi Ming Lei,
It is okay to use blk_mq_pci_map_queues() to solve the automatic irq
affinity issue when the first interrupt vector for the queues is 0.
But if the first interrupt vector for the queues is not 0, it seems we
can't use blk_mq_pci_map_queues() directly; blk_mq_virtio_map_queues(),
for example, implements its own interface. Is it possible to provide a
general interface for those situations?


>   static struct scsi_host_template hpsa_driver_template = {
>   	.module			= THIS_MODULE,
>   	.name			= HPSA,
> @@ -974,10 +982,13 @@ static struct scsi_host_template hpsa_driver_template = {
>   #ifdef CONFIG_COMPAT
>   	.compat_ioctl		= hpsa_compat_ioctl,
>   #endif
> +	.map_queues = hpsa_map_queues,
>   	.sdev_attrs = hpsa_sdev_attrs,
>   	.shost_attrs = hpsa_shost_attrs,
>   	.max_sectors = 1024,
>   	.no_write_same = 1,
> +	.force_blk_mq = 1,
> +	.host_tagset = 1,
>   };
>   
>   static inline u32 next_command(struct ctlr_info *h, u8 q)
> @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
>   		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>   		if (unlikely(!h->msix_vectors))
>   			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h->nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>   	}
>   }
>   
> @@ -1063,10 +1070,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
>   	 * Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>   	/*
>   	 * Set the bits in the address sent down to include:
>   	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1091,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>   	/* Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>   	/* Set the bits in the address sent down to include:
>   	 *  - performant mode bit not used in ioaccel mode 2
>   	 *  - pull count (bits 0-3)
> @@ -1109,10 +1110,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
>   	 * Tell the controller to post the reply to the queue for this
>   	 * processor.  This seems to give the best I/O throughput.
>   	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>   	/*
>   	 * Set the bits in the address sent down to include:
>   	 *  - performant mode bit not used in ioaccel mode 2
> @@ -1152,11 +1150,27 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
>   		h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
>   }
>   
> +static unsigned get_reply_queue(struct ctlr_info *h, struct CommandList *c)
> +{
> +	struct scsi_cmnd *cmd = c->scsi_cmd;
> +	int *map;
> +
> +	if (cmd && cmd->request) {
> +		u32 tag = blk_mq_unique_tag(cmd->request);
> +		return blk_mq_unique_tag_to_hwq(tag);
> +	}
> +
> +	map = h->scsi_host->tag_set.mq_map;
> +	return map[raw_smp_processor_id()];
> +}
> +
>   static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>   	struct CommandList *c, int reply_queue)
>   {
>   	dial_down_lockup_detection_during_fw_flash(h, c);
>   	atomic_inc(&h->commands_outstanding);
> +
> +	reply_queue = get_reply_queue(h, c);
>   	switch (c->cmd_type) {
>   	case CMD_IOACCEL1:
>   		set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
>   {
>   	int rv;
>   
> +	/* map reply queue to blk_mq hw queue */
> +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> +
>   	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
>   	if (rv) {
>   		dev_err(&h->pdev->dev, "scsi_add_host failed\n");


* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-06  2:18     ` chenxiang (M)
  (?)
@ 2018-02-06  8:23     ` Ming Lei
  -1 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-06  8:23 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

Hello chenxiang,

On Tue, Feb 06, 2018 at 10:18:19AM +0800, chenxiang (M) wrote:
> On 2018/2/5 23:20, Ming Lei wrote:
> > This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, and maps
> > each reply queue to a blk_mq hw queue, so that .queuecommand can always
> > choose the hw queue as the reply queue. If no online CPU is mapped to
> > a hw queue, requests can't be submitted to that hw queue at all, which
> > finally solves the irq affinity issue.
> > 
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Arun Easi <arun.easi@cavium.com>
> > Cc: Omar Sandoval <osandov@fb.com>,
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Tested-by: Laurence Oberman <loberman@redhat.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
> >   1 file changed, 34 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 443eabf63a9f..e517a4c74a28 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -51,6 +51,7 @@
> >   #include <linux/jiffies.h>
> >   #include <linux/percpu-defs.h>
> >   #include <linux/percpu.h>
> > +#include <linux/blk-mq-pci.h>
> >   #include <asm/unaligned.h>
> >   #include <asm/div64.h>
> >   #include "hpsa_cmd.h"
> > @@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
> >   #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
> >   				 HPSA_MAX_CONCURRENT_PASSTHRUS)
> > +static int hpsa_map_queues(struct Scsi_Host *shost)
> > +{
> > +        struct ctlr_info *h = shost_to_hba(shost);
> > +
> > +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> > +}
> > +
> 
> Hi Ming Lei,
> It is okay to use blk_mq_pci_map_queues() to solve the automatic irq affinity
> issue when the first interrupt vector for the queues is 0.
> But if the first interrupt vector for the queues is not 0, it seems we can't
> use blk_mq_pci_map_queues() directly; blk_mq_virtio_map_queues(), for
> example, implements its own interface. Is it possible to provide a general
> interface for those situations?

I guess it isn't necessary to do that: as you can see, .map_queues has
been introduced to 'scsi_host_template' exactly for dealing with
driver-specific irq vector differences. For virtio-pci, for instance,
'irq_affinity' is needed to exclude the 'pre_vectors' which serve as the
virtio config vector.

But that belongs to another topic, implementing a generic .map_queues
interface, and seems unrelated to this patch, since the usage of
blk_mq_pci_map_queues() in this patch is correct.

Thanks,
Ming


* Re: [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host
  2018-02-05 15:20 ` [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host Ming Lei
  2018-02-05 18:55     ` Don Brace
@ 2018-02-06  8:32   ` Hannes Reinecke
  1 sibling, 0 replies; 40+ messages in thread
From: Hannes Reinecke @ 2018-02-06  8:32 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Arun Easi, Omar Sandoval, Martin K . Petersen,
	James Bottomley, Christoph Hellwig, Don Brace, Kashyap Desai,
	Peter Rivera, Paolo Bonzini, Laurence Oberman

On 02/05/2018 04:20 PM, Ming Lei wrote:
> So that we can decide the default reply queue by the map created
> during adding host.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 287e5eb0723f..443eabf63a9f 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -8691,12 +8691,9 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	/* Disable discovery polling.*/
>  	h->discovery_polling = 0;
>  
> -
>  	/* Turn the interrupts on so we can service requests */
>  	h->access.set_intr_mask(h, HPSA_INTR_ON);
>  
> -	hpsa_hba_inquiry(h);
> -
>  	h->lastlogicals = kzalloc(sizeof(*(h->lastlogicals)), GFP_KERNEL);
>  	if (!h->lastlogicals)
>  		dev_info(&h->pdev->dev,
> @@ -8707,6 +8704,8 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	if (rc)
>  		goto clean7; /* perf, sg, cmd, irq, shost, pci, lu, aer/h */
>  
> +	hpsa_hba_inquiry(h);
> +
>  	/* Monitor the controller for firmware lockups */
>  	h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
>  	INIT_DELAYED_WORK(&h->monitor_ctlr_work, hpsa_monitor_ctlr_worker);
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
  2018-02-05 15:58     ` Laurence Oberman
  2018-02-06  2:18     ` chenxiang (M)
@ 2018-02-06  8:39   ` Hannes Reinecke
  2018-02-06  9:51     ` Ming Lei
  2 siblings, 1 reply; 40+ messages in thread
From: Hannes Reinecke @ 2018-02-06  8:39 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Arun Easi, Omar Sandoval, Martin K . Petersen,
	James Bottomley, Christoph Hellwig, Don Brace, Kashyap Desai,
	Peter Rivera, Paolo Bonzini, Laurence Oberman

On 02/05/2018 04:20 PM, Ming Lei wrote:
> This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, and maps
> each reply queue to a blk_mq hw queue, so that .queuecommand can always
> choose the hw queue as the reply queue. If no online CPU is mapped to
> a hw queue, requests can't be submitted to that hw queue at all, which
> finally solves the irq affinity issue.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 34 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 443eabf63a9f..e517a4c74a28 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -51,6 +51,7 @@
>  #include <linux/jiffies.h>
>  #include <linux/percpu-defs.h>
>  #include <linux/percpu.h>
> +#include <linux/blk-mq-pci.h>
>  #include <asm/unaligned.h>
>  #include <asm/div64.h>
>  #include "hpsa_cmd.h"
> @@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
>  #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
>  				 HPSA_MAX_CONCURRENT_PASSTHRUS)
>  
> +static int hpsa_map_queues(struct Scsi_Host *shost)
> +{
> +        struct ctlr_info *h = shost_to_hba(shost);
> +
> +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> +}
> +
>  static struct scsi_host_template hpsa_driver_template = {
>  	.module			= THIS_MODULE,
>  	.name			= HPSA,
> @@ -974,10 +982,13 @@ static struct scsi_host_template hpsa_driver_template = {
>  #ifdef CONFIG_COMPAT
>  	.compat_ioctl		= hpsa_compat_ioctl,
>  #endif
> +	.map_queues = hpsa_map_queues,
>  	.sdev_attrs = hpsa_sdev_attrs,
>  	.shost_attrs = hpsa_shost_attrs,
>  	.max_sectors = 1024,
>  	.no_write_same = 1,
> +	.force_blk_mq = 1,
> +	.host_tagset = 1,
>  };
>  
>  static inline u32 next_command(struct ctlr_info *h, u8 q)
> @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
>  		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>  		if (unlikely(!h->msix_vectors))
>  			return;
> -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -			c->Header.ReplyQueue =
> -				raw_smp_processor_id() % h->nreply_queues;
> -		else
> -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +		c->Header.ReplyQueue = reply_queue;
>  	}
>  }
>  
> @@ -1063,10 +1070,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> +	cp->ReplyQueue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit (bit 0)
> @@ -1087,10 +1091,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>  	/* Tell the controller to post the reply to the queue for this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/* Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
>  	 *  - pull count (bits 0-3)
> @@ -1109,10 +1110,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
>  	 * Tell the controller to post the reply to the queue for this
>  	 * processor.  This seems to give the best I/O throughput.
>  	 */
> -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -	else
> -		cp->reply_queue = reply_queue % h->nreply_queues;
> +	cp->reply_queue = reply_queue;
>  	/*
>  	 * Set the bits in the address sent down to include:
>  	 *  - performant mode bit not used in ioaccel mode 2
> @@ -1152,11 +1150,27 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
>  		h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
>  }
>  
> +static unsigned get_reply_queue(struct ctlr_info *h, struct CommandList *c)
> +{
> +	struct scsi_cmnd *cmd = c->scsi_cmd;
> +	int *map;
> +
> +	if (cmd && cmd->request) {
> +		u32 tag = blk_mq_unique_tag(cmd->request);
> +		return blk_mq_unique_tag_to_hwq(tag);
> +	}
> +
> +	map = h->scsi_host->tag_set.mq_map;
> +	return map[raw_smp_processor_id()];
> +}
> +
>  static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
>  	struct CommandList *c, int reply_queue)
>  {
>  	dial_down_lockup_detection_during_fw_flash(h, c);
>  	atomic_inc(&h->commands_outstanding);
> +
> +	reply_queue = get_reply_queue(h, c);
>  	switch (c->cmd_type) {
>  	case CMD_IOACCEL1:
>  		set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
>  {
>  	int rv;
>  
> +	/* map reply queue to blk_mq hw queue */
> +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> +
>  	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
>  	if (rv) {
>  		dev_err(&h->pdev->dev, "scsi_add_host failed\n");
> 
I really wonder why we need to call 'raw_smp_processor_id()' here; typically
the queue should be encoded in the tag, so it would be better to call
blk_mq_unique_tag()/blk_mq_unique_tag_to_hwq() here to get the hardware
queue.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue
  2018-02-06  8:39   ` Hannes Reinecke
@ 2018-02-06  9:51     ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-06  9:51 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Arun Easi, Omar Sandoval, Martin K . Petersen,
	James Bottomley, Christoph Hellwig, Don Brace, Kashyap Desai,
	Peter Rivera, Paolo Bonzini, Laurence Oberman

On Tue, Feb 06, 2018 at 09:39:26AM +0100, Hannes Reinecke wrote:
> On 02/05/2018 04:20 PM, Ming Lei wrote:
> > This patch uses .force_blk_mq to drive HPSA via SCSI_MQ, and maps
> > each reply queue to a blk_mq hw queue, so that .queuecommand can always
> > choose the hw queue as the reply queue. If no online CPU is mapped to
> > a hw queue, requests can't be submitted to that hw queue at all, which
> > finally solves the irq affinity issue.
> > 
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Arun Easi <arun.easi@cavium.com>
> > Cc: Omar Sandoval <osandov@fb.com>,
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Tested-by: Laurence Oberman <loberman@redhat.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/hpsa.c | 51 ++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 34 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 443eabf63a9f..e517a4c74a28 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -51,6 +51,7 @@
> >  #include <linux/jiffies.h>
> >  #include <linux/percpu-defs.h>
> >  #include <linux/percpu.h>
> > +#include <linux/blk-mq-pci.h>
> >  #include <asm/unaligned.h>
> >  #include <asm/div64.h>
> >  #include "hpsa_cmd.h"
> > @@ -956,6 +957,13 @@ static struct device_attribute *hpsa_shost_attrs[] = {
> >  #define HPSA_NRESERVED_CMDS	(HPSA_CMDS_RESERVED_FOR_DRIVER +\
> >  				 HPSA_MAX_CONCURRENT_PASSTHRUS)
> >  
> > +static int hpsa_map_queues(struct Scsi_Host *shost)
> > +{
> > +        struct ctlr_info *h = shost_to_hba(shost);
> > +
> > +        return blk_mq_pci_map_queues(&shost->tag_set, h->pdev);
> > +}
> > +
> >  static struct scsi_host_template hpsa_driver_template = {
> >  	.module			= THIS_MODULE,
> >  	.name			= HPSA,
> > @@ -974,10 +982,13 @@ static struct scsi_host_template hpsa_driver_template = {
> >  #ifdef CONFIG_COMPAT
> >  	.compat_ioctl		= hpsa_compat_ioctl,
> >  #endif
> > +	.map_queues = hpsa_map_queues,
> >  	.sdev_attrs = hpsa_sdev_attrs,
> >  	.shost_attrs = hpsa_shost_attrs,
> >  	.max_sectors = 1024,
> >  	.no_write_same = 1,
> > +	.force_blk_mq = 1,
> > +	.host_tagset = 1,
> >  };
> >  
> >  static inline u32 next_command(struct ctlr_info *h, u8 q)
> > @@ -1045,11 +1056,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
> >  		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
> >  		if (unlikely(!h->msix_vectors))
> >  			return;
> > -		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -			c->Header.ReplyQueue =
> > -				raw_smp_processor_id() % h->nreply_queues;
> > -		else
> > -			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> > +		c->Header.ReplyQueue = reply_queue;
> >  	}
> >  }
> >  
> > @@ -1063,10 +1070,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
> >  	 * Tell the controller to post the reply to the queue for this
> >  	 * processor.  This seems to give the best I/O throughput.
> >  	 */
> > -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> > -	else
> > -		cp->ReplyQueue = reply_queue % h->nreply_queues;
> > +	cp->ReplyQueue = reply_queue;
> >  	/*
> >  	 * Set the bits in the address sent down to include:
> >  	 *  - performant mode bit (bit 0)
> > @@ -1087,10 +1091,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
> >  	/* Tell the controller to post the reply to the queue for this
> >  	 * processor.  This seems to give the best I/O throughput.
> >  	 */
> > -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> > -	else
> > -		cp->reply_queue = reply_queue % h->nreply_queues;
> > +	cp->reply_queue = reply_queue;
> >  	/* Set the bits in the address sent down to include:
> >  	 *  - performant mode bit not used in ioaccel mode 2
> >  	 *  - pull count (bits 0-3)
> > @@ -1109,10 +1110,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
> >  	 * Tell the controller to post the reply to the queue for this
> >  	 * processor.  This seems to give the best I/O throughput.
> >  	 */
> > -	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -		cp->reply_queue = smp_processor_id() % h->nreply_queues;
> > -	else
> > -		cp->reply_queue = reply_queue % h->nreply_queues;
> > +	cp->reply_queue = reply_queue;
> >  	/*
> >  	 * Set the bits in the address sent down to include:
> >  	 *  - performant mode bit not used in ioaccel mode 2
> > @@ -1152,11 +1150,27 @@ static void dial_up_lockup_detection_on_fw_flash_complete(struct ctlr_info *h,
> >  		h->heartbeat_sample_interval = HEARTBEAT_SAMPLE_INTERVAL;
> >  }
> >  
> > +static unsigned get_reply_queue(struct ctlr_info *h, struct CommandList *c)
> > +{
> > +	struct scsi_cmnd *cmd = c->scsi_cmd;
> > +	int *map;
> > +
> > +	if (cmd && cmd->request) {
> > +		u32 tag = blk_mq_unique_tag(cmd->request);
> > +		return blk_mq_unique_tag_to_hwq(tag);
> > +	}
> > +
> > +	map = h->scsi_host->tag_set.mq_map;
> > +	return map[raw_smp_processor_id()];
> > +}
> > +
> >  static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
> >  	struct CommandList *c, int reply_queue)
> >  {
> >  	dial_down_lockup_detection_during_fw_flash(h, c);
> >  	atomic_inc(&h->commands_outstanding);
> > +
> > +	reply_queue = get_reply_queue(h, c);
> >  	switch (c->cmd_type) {
> >  	case CMD_IOACCEL1:
> >  		set_ioaccel1_performant_mode(h, c, reply_queue);
> > @@ -5781,6 +5795,9 @@ static int hpsa_scsi_add_host(struct ctlr_info *h)
> >  {
> >  	int rv;
> >  
> > +	/* map reply queue to blk_mq hw queue */
> > +	h->scsi_host->nr_hw_queues = h->msix_vectors;
> > +
> >  	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
> >  	if (rv) {
> >  		dev_err(&h->pdev->dev, "scsi_add_host failed\n");
> > 
> I really wonder why we need to call 'raw_smp_processor_id()' here; typically
> the queue should be encoded in the tag, so it would be better to call
> blk_mq_unique_tag()/blk_mq_unique_tag_to_hwq() here to get the hardware
> queue.

HPSA may send internal commands (with no request attached) to the HBA
directly; that is why raw_smp_processor_id() is used here. As you saw in
patch 7, commands can be submitted to the HBA even before the scsi host
is added...

I admit it is ugly, but it is something we can survive; at least blk-mq
provides one simple approach to get the correct (or default) hw/reply
queue for the submitted request.

Thanks,
Ming


* Re: [PATCH V2 5/8] scsi: introduce force_blk_mq
  2018-02-05 15:20 ` [PATCH V2 5/8] scsi: introduce force_blk_mq Ming Lei
@ 2018-02-06 20:20   ` Omar Sandoval
  2018-02-07  0:46     ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2018-02-06 20:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Mon, Feb 05, 2018 at 11:20:32PM +0800, Ming Lei wrote:
> From the scsi driver's view, it is a bit troublesome to support both blk-mq
> and non-blk-mq at the same time, especially when drivers need to support
> multiple hw queues.
> 
> This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
> can provide blk-mq-only support, and driver code can avoid the trouble
> of supporting both.
> 
> Providing blk-mq-only support may clean up drivers a lot; especially, it
> becomes easier to convert multiple reply queues into blk_mq hw queues for
> the following purposes:
> 
> 1) use blk_mq's multiple hw queues to deal with irq vectors whose assigned
> affinity contains only offline CPUs[1]:
> 
> 	[1] https://marc.info/?l=linux-kernel&m=151748144730409&w=2
> 
> Now 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible CPUs")
> has been merged into v4.16-rc, and some irq vectors can easily end up with
> only offline CPUs assigned; this can't be avoided even if the allocation
> is improved.
> 
> So all these drivers have to avoid asking the HBA to complete requests on a
> reply queue which has no online CPUs assigned.
> 
> This issue can be solved generically and easily via blk_mq (scsi_mq)
> multiple hw queues, by mapping each reply queue to a hctx.
> 
> 2) some drivers[2] require completing requests on the submission CPU to
> avoid hard/soft lockups, which is easily done with blk_mq, so it isn't
> necessary to reinvent the wheel to solve the problem.
> 
> 	[2] https://marc.info/?t=151601851400001&r=1&w=2
> 
> Solving the above issues for the non-MQ path may not be easy, or may
> introduce unnecessary work, especially since we plan to enable SCSI_MQ by
> default soon, as discussed recently[3]:
> 
> 	[3] https://marc.info/?l=linux-scsi&m=151727684915589&w=2
> 
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hosts.c     | 1 +
>  include/scsi/scsi_host.h | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index fe3a0da3ec97..c75cebd7911d 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -471,6 +471,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  		shost->dma_boundary = 0xffffffff;
>  
>  	shost->use_blk_mq = scsi_use_blk_mq;

Not sure if this is a patch formatting issue, but this old line wasn't
deleted.

> +	shost->use_blk_mq = scsi_use_blk_mq || !!shost->hostt->force_blk_mq;

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
@ 2018-02-06 20:33   ` Omar Sandoval
  2018-02-07  0:44     ` Ming Lei
  2018-02-06 23:18   ` Jens Axboe
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Omar Sandoval @ 2018-02-06 20:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Mon, Feb 05, 2018 at 11:20:29PM +0800, Ming Lei wrote:
> Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
> reply queues, but the tag space is often HBA-wide.
> 
> These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
> for automatic affinity assignment.
> 
> Now that commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all
> possible CPUs") has been merged into v4.16-rc, some irq vectors can easily
> end up with only offline CPUs assigned, and this can't be avoided even if
> the allocation is improved.
> 
> So all these drivers have to avoid asking the HBA to complete requests on
> a reply queue that has no online CPUs assigned, and HPSA has been broken
> with v4.15+:
> 
> 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2
> 
> This issue can be solved generically and easily via blk-mq (scsi-mq)
> multiple hw queues, by mapping each reply queue onto an hctx, but one
> tricky thing is the HBA-wide (instead of hw-queue-wide) tags.
> 
> This patch is based on the following Hannes's patch:
> 
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
> 
> One big difference from Hannes's patch is that this one only makes the tags
> sbitmap and the active_queues data structure HBA-wide; the others (request,
> hctx, tags, ...) keep their NUMA locality.
> 
> A following patch adds global tags support to null_blk; performance data is
> also provided there, and no obvious performance loss is observed when the
> total hw queue depth is kept the same.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-mq-sched.c   | 13 ++++++++++++-
>  block/blk-mq-tag.c     | 23 ++++++++++++++++++-----
>  block/blk-mq-tag.h     |  5 ++++-
>  block/blk-mq.c         | 29 ++++++++++++++++++++++++-----
>  block/blk-mq.h         |  3 ++-
>  include/linux/blk-mq.h |  2 ++
>  7 files changed, 63 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 0dfafa4b655a..0f0fafe03f5d 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = {
>  	HCTX_FLAG_NAME(SHOULD_MERGE),
>  	HCTX_FLAG_NAME(TAG_SHARED),
>  	HCTX_FLAG_NAME(SG_MERGE),
> +	HCTX_FLAG_NAME(GLOBAL_TAGS),
>  	HCTX_FLAG_NAME(BLOCKING),
>  	HCTX_FLAG_NAME(NO_SCHED),
>  };
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;

Is it intentional that we stop after the first hw queue does work? That
seems fine but it's a little confusing because the comment claims we
restart everything.

> +		return false;
> +	}
> +

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer
  2018-02-05 15:20 ` [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer Ming Lei
@ 2018-02-06 21:41   ` Omar Sandoval
  0 siblings, 0 replies; 40+ messages in thread
From: Omar Sandoval @ 2018-02-06 21:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Mon, Feb 05, 2018 at 11:20:28PM +0800, Ming Lei wrote:
> This patch changes tags->breserved_tags, tags->bitmap_tags and
> tags->active_queues as pointer, and prepares for supporting global tags.
> 
> No functional change.
> 
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.com>

Assuming it builds :)

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/bfq-iosched.c    |  4 ++--
>  block/blk-mq-debugfs.c | 10 +++++-----
>  block/blk-mq-tag.c     | 48 ++++++++++++++++++++++++++----------------------
>  block/blk-mq-tag.h     | 10 +++++++---
>  block/blk-mq.c         |  2 +-
>  block/kyber-iosched.c  |  2 +-
>  6 files changed, 42 insertions(+), 34 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 47e6ec7427c4..1e1211814a57 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -534,9 +534,9 @@ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
>  			WARN_ON_ONCE(1);
>  			return;
>  		}
> -		bt = &tags->breserved_tags;
> +		bt = tags->breserved_tags;
>  	} else
> -		bt = &tags->bitmap_tags;
> +		bt = tags->bitmap_tags;
>  
>  	if (unlikely(bfqd->sb_shift != bt->sb.shift))
>  		bfq_update_depths(bfqd, bt);
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 21cbc1f071c6..0dfafa4b655a 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -433,14 +433,14 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
>  	seq_printf(m, "nr_tags=%u\n", tags->nr_tags);
>  	seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
>  	seq_printf(m, "active_queues=%d\n",
> -		   atomic_read(&tags->active_queues));
> +		   atomic_read(tags->active_queues));
>  
>  	seq_puts(m, "\nbitmap_tags:\n");
> -	sbitmap_queue_show(&tags->bitmap_tags, m);
> +	sbitmap_queue_show(tags->bitmap_tags, m);
>  
>  	if (tags->nr_reserved_tags) {
>  		seq_puts(m, "\nbreserved_tags:\n");
> -		sbitmap_queue_show(&tags->breserved_tags, m);
> +		sbitmap_queue_show(tags->breserved_tags, m);
>  	}
>  }
>  
> @@ -471,7 +471,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
>  	if (res)
>  		goto out;
>  	if (hctx->tags)
> -		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
> +		sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
>  	mutex_unlock(&q->sysfs_lock);
>  
>  out:
> @@ -505,7 +505,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
>  	if (res)
>  		goto out;
>  	if (hctx->sched_tags)
> -		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
> +		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
>  	mutex_unlock(&q->sysfs_lock);
>  
>  out:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 336dde07b230..571797dc36cb 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -18,7 +18,7 @@ bool blk_mq_has_free_tags(struct blk_mq_tags *tags)
>  	if (!tags)
>  		return true;
>  
> -	return sbitmap_any_bit_clear(&tags->bitmap_tags.sb);
> +	return sbitmap_any_bit_clear(&tags->bitmap_tags->sb);
>  }
>  
>  /*
> @@ -28,7 +28,7 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>  {
>  	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
>  	    !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
> -		atomic_inc(&hctx->tags->active_queues);
> +		atomic_inc(hctx->tags->active_queues);
>  
>  	return true;
>  }
> @@ -38,9 +38,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>   */
>  void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>  {
> -	sbitmap_queue_wake_all(&tags->bitmap_tags);
> +	sbitmap_queue_wake_all(tags->bitmap_tags);
>  	if (include_reserve)
> -		sbitmap_queue_wake_all(&tags->breserved_tags);
> +		sbitmap_queue_wake_all(tags->breserved_tags);
>  }
>  
>  /*
> @@ -54,7 +54,7 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
>  	if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
>  		return;
>  
> -	atomic_dec(&tags->active_queues);
> +	atomic_dec(tags->active_queues);
>  
>  	blk_mq_tag_wakeup_all(tags, false);
>  }
> @@ -79,7 +79,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
>  	if (bt->sb.depth == 1)
>  		return true;
>  
> -	users = atomic_read(&hctx->tags->active_queues);
> +	users = atomic_read(hctx->tags->active_queues);
>  	if (!users)
>  		return true;
>  
> @@ -117,10 +117,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  			WARN_ON_ONCE(1);
>  			return BLK_MQ_TAG_FAIL;
>  		}
> -		bt = &tags->breserved_tags;
> +		bt = tags->breserved_tags;
>  		tag_offset = 0;
>  	} else {
> -		bt = &tags->bitmap_tags;
> +		bt = tags->bitmap_tags;
>  		tag_offset = tags->nr_reserved_tags;
>  	}
>  
> @@ -165,9 +165,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  		data->hctx = blk_mq_map_queue(data->q, data->ctx->cpu);
>  		tags = blk_mq_tags_from_data(data);
>  		if (data->flags & BLK_MQ_REQ_RESERVED)
> -			bt = &tags->breserved_tags;
> +			bt = tags->breserved_tags;
>  		else
> -			bt = &tags->bitmap_tags;
> +			bt = tags->bitmap_tags;
>  
>  		finish_wait(&ws->wait, &wait);
>  		ws = bt_wait_ptr(bt, data->hctx);
> @@ -189,10 +189,10 @@ void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
>  		const int real_tag = tag - tags->nr_reserved_tags;
>  
>  		BUG_ON(real_tag >= tags->nr_tags);
> -		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
> +		sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
>  	} else {
>  		BUG_ON(tag >= tags->nr_reserved_tags);
> -		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
> +		sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
>  	}
>  }
>  
> @@ -283,8 +283,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
>  		busy_tag_iter_fn *fn, void *priv)
>  {
>  	if (tags->nr_reserved_tags)
> -		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
> -	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
> +		bt_tags_for_each(tags, tags->breserved_tags, fn, priv, true);
> +	bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, false);
>  }
>  
>  void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
> @@ -346,8 +346,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  			continue;
>  
>  		if (tags->nr_reserved_tags)
> -			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> -		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> +			bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
> +		bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
>  	}
>  
>  }
> @@ -365,15 +365,15 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  	unsigned int depth = tags->nr_tags - tags->nr_reserved_tags;
>  	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
>  
> -	if (bt_alloc(&tags->bitmap_tags, depth, round_robin, node))
> +	if (bt_alloc(tags->bitmap_tags, depth, round_robin, node))
>  		goto free_tags;
> -	if (bt_alloc(&tags->breserved_tags, tags->nr_reserved_tags, round_robin,
> +	if (bt_alloc(tags->breserved_tags, tags->nr_reserved_tags, round_robin,
>  		     node))
>  		goto free_bitmap_tags;
>  
>  	return tags;
>  free_bitmap_tags:
> -	sbitmap_queue_free(&tags->bitmap_tags);
> +	sbitmap_queue_free(tags->bitmap_tags);
>  free_tags:
>  	kfree(tags);
>  	return NULL;
> @@ -397,13 +397,17 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  	tags->nr_tags = total_tags;
>  	tags->nr_reserved_tags = reserved_tags;
>  
> +	tags->bitmap_tags = &tags->__bitmap_tags;
> +	tags->breserved_tags = &tags->__breserved_tags;
> +	tags->active_queues = &tags->__active_queues;
> +
>  	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
>  }
>  
>  void blk_mq_free_tags(struct blk_mq_tags *tags)
>  {
> -	sbitmap_queue_free(&tags->bitmap_tags);
> -	sbitmap_queue_free(&tags->breserved_tags);
> +	sbitmap_queue_free(tags->bitmap_tags);
> +	sbitmap_queue_free(tags->breserved_tags);
>  	kfree(tags);
>  }
>  
> @@ -454,7 +458,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  		 * Don't need (or can't) update reserved tags here, they
>  		 * remain static and should never need resizing.
>  		 */
> -		sbitmap_queue_resize(&tags->bitmap_tags, tdepth);
> +		sbitmap_queue_resize(tags->bitmap_tags, tdepth);
>  	}
>  
>  	return 0;
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 61deab0b5a5a..a68323fa0c02 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -11,10 +11,14 @@ struct blk_mq_tags {
>  	unsigned int nr_tags;
>  	unsigned int nr_reserved_tags;
>  
> -	atomic_t active_queues;
> +	atomic_t *active_queues;
> +	atomic_t __active_queues;
>  
> -	struct sbitmap_queue bitmap_tags;
> -	struct sbitmap_queue breserved_tags;
> +	struct sbitmap_queue *bitmap_tags;
> +	struct sbitmap_queue *breserved_tags;
> +
> +	struct sbitmap_queue __bitmap_tags;
> +	struct sbitmap_queue __breserved_tags;
>  
>  	struct request **rqs;
>  	struct request **static_rqs;
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102e2149..69d4534870af 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1136,7 +1136,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx **hctx,
>  		return false;
>  	}
>  
> -	ws = bt_wait_ptr(&this_hctx->tags->bitmap_tags, this_hctx);
> +	ws = bt_wait_ptr(this_hctx->tags->bitmap_tags, this_hctx);
>  	add_wait_queue(&ws->wait, wait);
>  
>  	/*
> diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
> index f95c60774ce8..4adaced76382 100644
> --- a/block/kyber-iosched.c
> +++ b/block/kyber-iosched.c
> @@ -281,7 +281,7 @@ static unsigned int kyber_sched_tags_shift(struct kyber_queue_data *kqd)
>  	 * All of the hardware queues have the same depth, so we can just grab
>  	 * the shift of the first one.
>  	 */
> -	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
> +	return kqd->q->queue_hw_ctx[0]->sched_tags->bitmap_tags->sb.shift;
>  }
>  
>  static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags'
  2018-02-05 15:20 ` [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags' Ming Lei
  2018-02-05 20:26     ` Don Brace
@ 2018-02-06 21:43   ` Omar Sandoval
  1 sibling, 0 replies; 40+ messages in thread
From: Omar Sandoval @ 2018-02-06 21:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Mon, Feb 05, 2018 at 11:20:31PM +0800, Ming Lei wrote:
> This patch introduces the parameter 'g_global_tags' so that we can
> test this feature with null_blk easily.
> 
> No obvious performance drop is seen with global_tags when the whole hw
> queue depth is kept the same:
> 
> 1) no 'global_tags', each hw queue depth is 1, and 4 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=0 submit_queues=4 hw_queue_depth=1
> 
> 2) 'global_tags', global hw queue depth is 4, and 4 hw queues
> modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 global_tags=1 submit_queues=4 hw_queue_depth=4
> 
> 3) fio test done in above two settings:
>    fio --bs=4k --size=512G  --rw=randread --norandommap --direct=1 --ioengine=libaio --iodepth=4 --runtime=$RUNTIME --group_reporting=1  --name=nullb0 --filename=/dev/nullb0 --name=nullb1 --filename=/dev/nullb1 --name=nullb2 --filename=/dev/nullb2 --name=nullb3 --filename=/dev/nullb3
> 
> 1M IOPS can be reached in both of the above tests, which were done in one VM.
> 
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>

The module parameter is just called "global_tags", not "g_global_tags",
right? The subject should say the former.

Otherwise,

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/null_blk.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
> index 287a09611c0f..ad0834efad42 100644
> --- a/drivers/block/null_blk.c
> +++ b/drivers/block/null_blk.c
> @@ -163,6 +163,10 @@ static int g_submit_queues = 1;
>  module_param_named(submit_queues, g_submit_queues, int, S_IRUGO);
>  MODULE_PARM_DESC(submit_queues, "Number of submission queues");
>  
> +static int g_global_tags = 0;
> +module_param_named(global_tags, g_global_tags, int, S_IRUGO);
> +MODULE_PARM_DESC(global_tags, "All submission queues share one tags");
> +
>  static int g_home_node = NUMA_NO_NODE;
>  module_param_named(home_node, g_home_node, int, S_IRUGO);
>  MODULE_PARM_DESC(home_node, "Home node for the device");
> @@ -1622,6 +1626,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
>  	set->flags = BLK_MQ_F_SHOULD_MERGE;
>  	if (g_no_sched)
>  		set->flags |= BLK_MQ_F_NO_SCHED;
> +	if (g_global_tags)
> +		set->flags |= BLK_MQ_F_GLOBAL_TAGS;
>  	set->driver_data = NULL;
>  
>  	if ((nullb && nullb->dev->blocking) || g_blocking)
> -- 
> 2.9.5
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq
  2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
                   ` (7 preceding siblings ...)
  2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
@ 2018-02-06 23:15 ` Jens Axboe
  8 siblings, 0 replies; 40+ messages in thread
From: Jens Axboe @ 2018-02-06 23:15 UTC (permalink / raw)
  To: Ming Lei, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On 2/5/18 8:20 AM, Ming Lei wrote:
> Hi All,
> 
> This patchset supports global tags which was started by Hannes originally:
> 
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
> 
> Also introduce 'force_blk_mq' and 'host_tagset' in 'struct scsi_host_template',
> so that drivers can avoid supporting two IO paths (legacy and blk-mq), especially
> since recent discussion mentioned that SCSI_MQ will be enabled by default soon.
> 
> 	https://marc.info/?l=linux-scsi&m=151727684915589&w=2
> 
> With the above changes, it should be easier to convert SCSI drivers'
> reply queues into blk-mq hctxs; the automatic irq affinity issue can then
> be solved easily. Please see the detailed description in the commit logs
> and in the 8th patch, which converts HPSA.
> 
> Also, drivers may need to complete requests on the submission CPU to
> avoid hard/soft lockups, which can be done easily with blk-mq too.
> 
> 	https://marc.info/?t=151601851400001&r=1&w=2
> 
> The 6th patch uses the newly introduced 'force_blk_mq' to fix virtio_scsi
> so that the IO hang issue in the legacy IO path can be avoided. This issue
> is somewhat generic; at least HPSA and virtio-scsi are found broken with v4.15+.
> 
> The 7th & 8th patches fix HPSA's IO issue, which is caused by the reply
> queue selection algorithm.
> 
> Laurence has verified that this patchset makes HPSA work with the Linus
> tree.

Do you have any performance numbers for this patchset? I'd be curious
to know how big the hit is.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
  2018-02-06 20:33   ` Omar Sandoval
@ 2018-02-06 23:18   ` Jens Axboe
  2018-02-07  0:43     ` Ming Lei
  2018-02-07 16:09     ` Bart Van Assche
  2018-02-07 16:59     ` Bart Van Assche
  2018-02-08 15:25     ` Bart Van Assche
  3 siblings, 2 replies; 40+ messages in thread
From: Jens Axboe @ 2018-02-06 23:18 UTC (permalink / raw)
  To: Ming Lei, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On 2/5/18 8:20 AM, Ming Lei wrote:
> Quite a few HBAs (such as HPSA, megaraid, mpt3sas, ...) support multiple
> reply queues, but the tag space is often HBA-wide.
> 
> These HBAs have switched to use pci_alloc_irq_vectors(PCI_IRQ_AFFINITY)
> for automatic affinity assignment.
> 
> Now that commit 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all
> possible CPUs") has been merged into v4.16-rc, some irq vectors can easily
> end up with only offline CPUs assigned, and this can't be avoided even if
> the allocation is improved.
> 
> So all these drivers have to avoid asking the HBA to complete requests on
> a reply queue that has no online CPUs assigned, and HPSA has been broken
> with v4.15+:
> 
> 	https://marc.info/?l=linux-kernel&m=151748144730409&w=2
> 
> This issue can be solved generically and easily via blk-mq (scsi-mq)
> multiple hw queues, by mapping each reply queue onto an hctx, but one
> tricky thing is the HBA-wide (instead of hw-queue-wide) tags.
> 
> This patch is based on the following Hannes's patch:
> 
> 	https://marc.info/?l=linux-block&m=149132580511346&w=2
> 
> One big difference from Hannes's patch is that this one only makes the tags
> sbitmap and the active_queues data structure HBA-wide; the others (request,
> hctx, tags, ...) keep their NUMA locality.
> 
> A following patch adds global tags support to null_blk; performance data is
> also provided there, and no obvious performance loss is observed when the
> total hw queue depth is kept the same.

GLOBAL implies that it's, strangely enough, global. That isn't really the
case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
welcome better names, but global doesn't seem to be a great choice.

BLK_MQ_F_SET_TAGS?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-06 23:18   ` Jens Axboe
@ 2018-02-07  0:43     ` Ming Lei
  2018-02-07 16:09     ` Bart Van Assche
  1 sibling, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-07  0:43 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Mike Snitzer, linux-scsi,
	Hannes Reinecke, Arun Easi, Omar Sandoval, Martin K . Petersen,
	James Bottomley, Christoph Hellwig, Don Brace, Kashyap Desai,
	Peter Rivera, Paolo Bonzini, Laurence Oberman

On Tue, Feb 06, 2018 at 04:18:20PM -0700, Jens Axboe wrote:
> On 2/5/18 8:20 AM, Ming Lei wrote:
...
> 
> GLOBAL implies that it's, strangely enough, global. That isn't really the
> case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
> welcome better names, but global doesn't seem to be a great choice.
> 
> BLK_MQ_F_SET_TAGS?

Good point, I am fine with either BLK_MQ_F_HOST_TAGS or BLK_MQ_F_SET_TAGS,
will update in V3.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-06 20:33   ` Omar Sandoval
@ 2018-02-07  0:44     ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-07  0:44 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Tue, Feb 06, 2018 at 12:33:36PM -0800, Omar Sandoval wrote:
> On Mon, Feb 05, 2018 at 11:20:29PM +0800, Ming Lei wrote:
......
> >  
> > +	/* need to restart all hw queues for global tags */
> > +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> > +		struct blk_mq_hw_ctx *hctx2;
> > +		int i;
> > +
> > +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> > +			if (blk_mq_run_hw_queue(hctx2, true))
> > +				return true;
> 
> Is it intentional that we stop after the first hw queue does work? That
> seems fine but it's a little confusing because the comment claims we
> restart everything.

Good catch, will update comment in V3.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 5/8] scsi: introduce force_blk_mq
  2018-02-06 20:20   ` Omar Sandoval
@ 2018-02-07  0:46     ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2018-02-07  0:46 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On Tue, Feb 06, 2018 at 12:20:43PM -0800, Omar Sandoval wrote:
> On Mon, Feb 05, 2018 at 11:20:32PM +0800, Ming Lei wrote:

...
> >  	shost->use_blk_mq = scsi_use_blk_mq;
> 
> Not sure if this is a patch formatting issue, but this old line wasn't
> deleted.

Good catch, the old line need to be removed.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-06 23:18   ` Jens Axboe
  2018-02-07  0:43     ` Ming Lei
@ 2018-02-07 16:09     ` Bart Van Assche
  1 sibling, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2018-02-07 16:09 UTC (permalink / raw)
  To: Jens Axboe, Ming Lei, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Paolo Bonzini,
	Laurence Oberman

On 02/06/18 15:18, Jens Axboe wrote:
> GLOBAL implies that it's, strangely enough, global. That isn't really the
> case. Why not call this BLK_MQ_F_HOST_TAGS or something like that? I'd
> welcome better names, but global doesn't seem to be a great choice.
> 
> BLK_MQ_F_SET_TAGS?

I like the name BLK_MQ_F_HOST_TAGS because it refers to how these tags will 
be used by SCSI LLDs, namely as a host-wide tag set.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
@ 2018-02-07 16:59     ` Bart Van Assche
  0 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2018-02-07 16:59 UTC (permalink / raw)
  To: hch, linux-block, snitzer, ming.lei, axboe
  Cc: hch, martin.petersen, hare, linux-scsi, don.brace,
	james.bottomley, pbonzini, arun.easi, osandov, loberman,
	kashyap.desai, peter.rivera

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;
> +		return false;
> +	}
> +
>  	return blk_mq_run_hw_queue(hctx, true);
>  }

This new loop looks misplaced to me. If both the BLK_MQ_F_GLOBAL_TAGS and the
BLK_MQ_F_TAG_SHARED flags are set then the outer loop in blk_mq_sched_restart()
and the inner loop in blk_mq_sched_restart_hctx() will cause more calls of
blk_mq_run_hw_queue() than necessary. Have you considered merging the above
loop into blk_mq_sched_restart()?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
  2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
@ 2018-02-08 15:25     ` Bart Van Assche
  2018-02-06 23:18   ` Jens Axboe
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2018-02-08 15:25 UTC (permalink / raw)
  To: hch, linux-block, snitzer, ming.lei, axboe
  Cc: hch, martin.petersen, hare, linux-scsi, don.brace,
	james.bottomley, pbonzini, arun.easi, osandov, loberman,
	kashyap.desai, peter.rivera

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;
> +		return false;
> +	}
> +
>  	return blk_mq_run_hw_queue(hctx, true);
>  }

It seems odd to me that, no matter which hardware queue a restart is
requested for (the hctx argument), the above loop always starts by examining
the hardware queue with index 0. Will this cause fairness and/or cache
line bouncing problems?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS
@ 2018-02-08 15:25     ` Bart Van Assche
  0 siblings, 0 replies; 40+ messages in thread
From: Bart Van Assche @ 2018-02-08 15:25 UTC (permalink / raw)
  To: hch, linux-block, snitzer, ming.lei, axboe
  Cc: hch, martin.petersen, hare, linux-scsi, don.brace,
	james.bottomley, pbonzini, arun.easi, osandov, loberman,
	kashyap.desai, peter.rivera

On Mon, 2018-02-05 at 23:20 +0800, Ming Lei wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55c0a745b427..385bbec73804 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -81,6 +81,17 @@ static bool blk_mq_sched_restart_hctx(struct blk_mq_hw_ctx *hctx)
>  	} else
>  		clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);
>  
> +	/* need to restart all hw queues for global tags */
> +	if (hctx->flags & BLK_MQ_F_GLOBAL_TAGS) {
> +		struct blk_mq_hw_ctx *hctx2;
> +		int i;
> +
> +		queue_for_each_hw_ctx(hctx->queue, hctx2, i)
> +			if (blk_mq_run_hw_queue(hctx2, true))
> +				return true;
> +		return false;
> +	}
> +
>  	return blk_mq_run_hw_queue(hctx, true);
>  }

It seems odd to me that, no matter which hardware queue a restart is
requested for (the hctx argument), the above loop always starts by examining
the hardware queue with index 0. Will this cause fairness and/or cache
line bouncing problems?

Thanks,

Bart.





^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-02-08 15:25 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-05 15:20 [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Ming Lei
2018-02-05 15:20 ` [PATCH V2 1/8] blk-mq: tags: define several fields of tags as pointer Ming Lei
2018-02-06 21:41   ` Omar Sandoval
2018-02-05 15:20 ` [PATCH V2 2/8] blk-mq: introduce BLK_MQ_F_GLOBAL_TAGS Ming Lei
2018-02-06 20:33   ` Omar Sandoval
2018-02-07  0:44     ` Ming Lei
2018-02-06 23:18   ` Jens Axboe
2018-02-07  0:43     ` Ming Lei
2018-02-07 16:09     ` Bart Van Assche
2018-02-07 16:59   ` Bart Van Assche
2018-02-07 16:59     ` Bart Van Assche
2018-02-08 15:25   ` Bart Van Assche
2018-02-08 15:25     ` Bart Van Assche
2018-02-05 15:20 ` [PATCH V2 3/8] scsi: Add template flag 'host_tagset' Ming Lei
2018-02-05 15:20 ` [PATCH V2 4/8] block: null_blk: introduce module parameter of 'g_global_tags' Ming Lei
2018-02-05 20:26   ` Don Brace
2018-02-05 20:26     ` Don Brace
2018-02-06 21:43   ` Omar Sandoval
2018-02-05 15:20 ` [PATCH V2 5/8] scsi: introduce force_blk_mq Ming Lei
2018-02-06 20:20   ` Omar Sandoval
2018-02-07  0:46     ` Ming Lei
2018-02-05 15:20 ` [PATCH V2 6/8] scsi: virtio_scsi: fix IO hang by irq vector automatic affinity Ming Lei
2018-02-05 15:56   ` Paolo Bonzini
2018-02-05 15:20 ` [PATCH V2 7/8] scsi: hpsa: call hpsa_hba_inquiry() after adding host Ming Lei
2018-02-05 18:55   ` Don Brace
2018-02-05 18:55     ` Don Brace
2018-02-06  8:32   ` Hannes Reinecke
2018-02-05 15:20 ` [PATCH V2 8/8] scsi: hpsa: use blk_mq to solve irq affinity issue Ming Lei
2018-02-05 15:58   ` Laurence Oberman
2018-02-05 15:58     ` Laurence Oberman
2018-02-05 16:07     ` Don Brace
2018-02-05 16:07       ` Don Brace
2018-02-05 18:54     ` Don Brace
2018-02-05 18:54       ` Don Brace
2018-02-06  2:18   ` chenxiang (M)
2018-02-06  2:18     ` chenxiang (M)
2018-02-06  8:23     ` Ming Lei
2018-02-06  8:39   ` Hannes Reinecke
2018-02-06  9:51     ` Ming Lei
2018-02-06 23:15 ` [PATCH V2 0/8] blk-mq/scsi-mq: support global tags & introduce force_blk_mq Jens Axboe
