* [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2019-11-19 14:27 John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

This is another stab at solving the problem of hostwide shared tags for SCSI
HBAs.

As discussed previously, Ming Lei's most recent series in [0], which used the
hctx[0] tags for all hctx of a host, was somewhat messy and intrusive, and so
was seen as a no-go. Indeed, blk-mq is designed around separate tags per hctx.

Bart also followed up on my v1 RFC with another implementation along the same
lines, which was neater, but I am concerned that its approach may cause
issues - see [1].

This series takes a different approach to solve that problem: keep the
per-hctx tags, but introduce a new, separate sbitmap per tagset. The shared
sbitmap is used to generate tags which are unique across all hctx of a
tagset.

For now I have only converted the hisi_sas driver to use the shared tags,
but it should not be much trouble to convert others.

Patch #3 is still quite experimental at this point - I have added code
comments marking the open issues. I also threw in a minor tidy-up patch.

[0] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[1] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
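
To illustrate the scheme, here is a rough sketch of what patch #3 does at
tagset allocation time (simplified from the actual patch; see there for
the real code):

	int i;

	/*
	 * Each hctx keeps its own struct blk_mq_tags, but every one of
	 * them points at a single sbitmap_queue owned by the tagset, so
	 * any allocated tag is unique across the whole host.
	 */
	for (i = 0; i < set->nr_hw_queues; i++) {
		struct blk_mq_tags *tags = set->tags[i];

		tags->pbitmap_tags = &set->shared_bitmap_tags;
		tags->pbreserved_tags = &set->shared_breserved_tags;
	}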

Differences to v1:
- Use a shared sbitmap, and not a separate shared tags (a big change!)
	- Drop request.shared_tag
- Add RB tags

Hannes Reinecke (1):
  scsi: Add template flag 'host_tagset'

John Garry (3):
  blk-mq: Remove some unused function arguments
  blk-mq: Facilitate a shared sbitmap per tagset
  scsi: hisi_sas: Switch v3 hw to MQ

Ming Lei (1):
  blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED

 block/bfq-iosched.c                    |  4 +-
 block/blk-mq-debugfs.c                 |  6 +-
 block/blk-mq-sched.c                   | 14 +++++
 block/blk-mq-tag.c                     | 80 +++++++++++++++++------
 block/blk-mq-tag.h                     | 18 ++++--
 block/blk-mq.c                         | 87 ++++++++++++++++++++------
 block/blk-mq.h                         |  7 ++-
 block/kyber-iosched.c                  |  4 +-
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++--------------
 drivers/scsi/scsi_lib.c                |  2 +
 include/linux/blk-mq.h                 |  9 ++-
 include/scsi/scsi_host.h               |  3 +
 14 files changed, 239 insertions(+), 120 deletions(-)

-- 
2.17.1



* [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

The struct blk_mq_hw_ctx * argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c |  4 ++--
 block/blk-mq-tag.h |  3 +--
 block/blk-mq.c     | 10 ++++------
 block/blk-mq.h     |  2 +-
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index fbacde454718..586c9d6e904a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -183,8 +183,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	return tag + tag_offset;
 }
 
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-		    struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+		    unsigned int tag)
 {
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 15bc74acb57e..d0c10d043891 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,7 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-			   struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75228d282fc3..20c498fce4dd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -493,9 +493,9 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
 	if (rq->tag != -1)
-		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
@@ -3354,7 +3354,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 }
 
 static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
-				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
 	unsigned long ret = 0;
@@ -3387,7 +3386,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
-				     struct blk_mq_hw_ctx *hctx,
 				     struct request *rq)
 {
 	struct hrtimer_sleeper hs;
@@ -3407,7 +3405,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
-		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+		nsecs = blk_mq_poll_nsecs(q, rq);
 
 	if (!nsecs)
 		return false;
@@ -3462,7 +3460,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 			return false;
 	}
 
-	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
 /**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..78d38b5f2793 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -208,7 +208,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-- 
2.17.1



* [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

From: Ming Lei <ming.lei@redhat.com>

BLK_MQ_F_TAG_SHARED actually means that the tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA.

So rename it to make that point explicit.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-debugfs.c |  2 +-
 block/blk-mq-tag.c     |  2 +-
 block/blk-mq-tag.h     |  4 ++--
 block/blk-mq.c         | 20 ++++++++++----------
 include/linux/blk-mq.h |  2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..33a40ae1d60f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -236,7 +236,7 @@ static const char *const alloc_policy_name[] = {
 #define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
-	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 586c9d6e904a..42792942b428 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -65,7 +65,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index d0c10d043891..e36bec8e0970 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -53,7 +53,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
 	return __blk_mq_tag_busy(hctx);
@@ -61,7 +61,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return;
 
 	__blk_mq_tag_idle(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20c498fce4dd..2d2956249ae9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -296,7 +296,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		rq->tag = -1;
 		rq->internal_tag = tag;
 	} else {
-		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
+		if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
 			rq_flags = RQF_MQ_INFLIGHT;
 			atomic_inc(&data->hctx->nr_active);
 		}
@@ -1112,7 +1112,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -1243,7 +1243,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				 * For non-shared tags, the RESTART check
 				 * will suffice.
 				 */
-				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+				if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 					no_tag = true;
 				break;
 			}
@@ -2358,7 +2358,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
 	hctx->queue = q;
-	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
 
 	INIT_LIST_HEAD(&hctx->hctx_list);
 
@@ -2575,9 +2575,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared)
-			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		else
-			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 	}
 }
 
@@ -2603,7 +2603,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
-		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, false);
 	}
@@ -2620,12 +2620,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
 	if (!list_empty(&set->tag_list) &&
-	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
-		set->flags |= BLK_MQ_F_TAG_SHARED;
+	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, true);
 	}
-	if (set->flags & BLK_MQ_F_TAG_SHARED)
+	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
 	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 201abcb6bd6e..8952c9dfef79 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -225,7 +225,7 @@ struct blk_mq_ops {
 
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
-	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.17.1



* [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-21  8:55   ` Ming Lei
  2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
  4 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with a single hostwide tag space.

In addition, these drivers want to use managed interrupt affinity via
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0], CPU
hotplug may cause in-flight IO completions to go unserviced when an
interrupt is shut down.

To solve that problem, Ming's patchset to drain hctx's [1] should ensure
that no in-flight IOs are missed.

However, to take advantage of that patchset, we need to map the HBA HW
queues to blk-mq hctx's; to do that, we need to expose the HBA HW queues.

In making that transition, the per-SCSI-command request tags are no longer
unique per Scsi host - they are only unique per hctx. As such, the HBA LLDD
would have to generate a hostwide unique tag internally, which has a certain
performance overhead.

Another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queues) commands. In [2], we removed the Scsi
host busy counter, which had stopped the LLDD from being sent more than
.can_queue commands; we should still ensure that the block layer does not
issue more than .can_queue commands to the Scsi host.
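
For example (figures purely illustrative): with .can_queue = 2048 and 16
exposed HW queues, blk-mq would otherwise assume that the host can accept
16 * 2048 = 32768 outstanding commands, even though the HBA can only take
2048 at a time.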

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

A new flag, BLK_MQ_F_TAG_HCTX_SHARED, should be set when allocating the
tagset to indicate that the shared sbitmap should be used.
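
As a sketch, a hypothetical LLDD would request it like this (only the flag
is new here; the other fields are standard tagset setup):

	set->nr_hw_queues = nr_reply_queues;
	set->queue_depth = can_queue;
	set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
	ret = blk_mq_alloc_tag_set(set);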

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
tags and requests per hctx; the reason for this is that if we were to
allocate tags and requests for only a single hctx - like hctx0 - we may
break block drivers which expect a request to be associated with a specific
hctx, i.e. not hctx0.

This is based on work originally from Ming Lei in [3] and from Bart's
suggestion in [4].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
[3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/bfq-iosched.c    |  4 +--
 block/blk-mq-debugfs.c |  4 +--
 block/blk-mq-sched.c   | 14 ++++++++
 block/blk-mq-tag.c     | 74 +++++++++++++++++++++++++++++++++---------
 block/blk-mq-tag.h     | 15 +++++++--
 block/blk-mq.c         | 57 +++++++++++++++++++++++++++++---
 block/blk-mq.h         |  5 +++
 block/kyber-iosched.c  |  4 +--
 include/linux/blk-mq.h |  7 ++++
 9 files changed, 157 insertions(+), 27 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0319d6339822..9633c864af07 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, tags->pbitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->pbitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 33a40ae1d60f..9cf2f09c08e4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -483,7 +483,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->tags)
+	if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
 		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
@@ -517,7 +517,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->sched_tags)
+	if (hctx->sched_tags) /* We should just iterate the relevant bits for this hctx FIXME */
 		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..b23547f10949 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 {
+	struct blk_mq_tag_set *tag_set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
 	struct elevator_queue *eq;
 	unsigned int i;
@@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		blk_mq_debugfs_register_sched_hctx(q, hctx);
 	}
 
+	if (blk_mq_is_sbitmap_shared(tag_set)) {
+		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		queue_for_each_hw_ctx(q, hctx, i) {
+			struct blk_mq_tags *tags = hctx->sched_tags;
+
+			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
+			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
+		}
+	}
+
 	return 0;
 
 err:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 42792942b428..6625bebb46c3 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->pbitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->pbreserved_tags);
 }
 
 /*
@@ -113,10 +113,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_TAG_FAIL;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->pbreserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->pbitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -162,9 +162,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->pbreserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->pbitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -190,10 +190,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->pbitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->pbreserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -321,8 +321,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, tags->pbreserved_tags, fn, priv, true);
+	bt_tags_for_each(tags, tags->pbitmap_tags, fn, priv, false);
 }
 
 /**
@@ -419,8 +419,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->pbreserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->pbitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -444,6 +444,9 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 		     node))
 		goto free_bitmap_tags;
 
+	tags->pbitmap_tags = &tags->bitmap_tags;
+	tags->pbreserved_tags = &tags->breserved_tags;
+
 	return tags;
 free_bitmap_tags:
 	sbitmap_queue_free(&tags->bitmap_tags);
@@ -452,7 +455,46 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return NULL;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->shared_bitmap_tags);
+	return false;
+}
+
+bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set, unsigned long nr_requests)
+{
+	unsigned int depth = nr_requests - tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->sched_shared_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->sched_shared_breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->sched_shared_bitmap_tags);
+	return false;
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
 {
@@ -470,6 +512,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	if (blk_mq_is_sbitmap_shared(set))
+		return tags;
 	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
 }
 
@@ -526,7 +570,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
+		sbitmap_queue_resize(tags->pbitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index e36bec8e0970..198b6fbe2c22 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,20 +13,31 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
+	/* We should consider deleting these so they are not referenced by mistake */
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
 
+	struct sbitmap_queue *pbitmap_tags;
+	struct sbitmap_queue *pbreserved_tags;
+
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
+					     unsigned long nr_requests);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+					    unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags,
+			   struct blk_mq_ctx *ctx, unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2d2956249ae9..8d5b21919c9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1089,7 +1089,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = &hctx->tags->bitmap_tags;
+		sbq = hctx->tags->pbitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1107,7 +1107,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = hctx->tags->pbitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -2097,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
@@ -3100,6 +3100,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set)) {
+		if (!blk_mq_init_shared_sbitmap(set)) {
+			ret = -ENOMEM;
+			goto out_free_mq_map;
+		}
+
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			struct blk_mq_tags *tags = set->tags[i];
+
+			tags->pbitmap_tags = &set->shared_bitmap_tags;
+			tags->pbreserved_tags = &set->shared_breserved_tags;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
@@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
+	bool sched_tags = false;
 	int i, ret;
 
 	if (!set)
@@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
 		} else {
+			sched_tags = true;
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
 		}
@@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
 
-	if (!ret)
+	/*
+	 * if ret is 0, all queues should have been updated to the same depth
+	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
+	 * if some are updated, we should probably roll back the change altogether. FIXME
+	 */
+	if (!ret) {
+		if (blk_mq_is_sbitmap_shared(set)) {
+			if (sched_tags) {
+				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
+				sbitmap_queue_free(&set->sched_shared_breserved_tags);
+				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
+					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
+				}
+			} else {
+				sbitmap_queue_free(&set->shared_bitmap_tags);
+				sbitmap_queue_free(&set->shared_breserved_tags);
+				if (!blk_mq_init_shared_sbitmap(set))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->tags->pbitmap_tags = &set->shared_bitmap_tags;
+					hctx->tags->pbreserved_tags = &set->shared_breserved_tags;
+				}
+			}
+		}
 		q->nr_requests = nr;
+	}
+	/*
+	 * if ret != 0, q->nr_requests would not be updated, yet the depth
+	 * for some hctx may have changed - is that right?
+	 */
 
 	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 78d38b5f2793..4c1ea206d3f4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0ef637..59fb6afd8e68 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return q->queue_hw_ctx[0]->sched_tags->pbitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
-	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+	sbitmap_queue_min_shallow_depth(hctx->sched_tags->pbitmap_tags,
 					kqd->async_depth);
 
 	return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8952c9dfef79..9b923b42c07b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
 	unsigned int		flags;		/* BLK_MQ_F_* */
 	void			*driver_data;
 
+	struct sbitmap_queue shared_bitmap_tags;
+	struct sbitmap_queue shared_breserved_tags;
+
+	struct sbitmap_queue sched_shared_bitmap_tags;
+	struct sbitmap_queue sched_shared_breserved_tags;
+
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -226,6 +232,7 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.17.1



* [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset'
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (2 preceding siblings ...)
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' so that a hostwide tagset can be
shared across multiple reply queues after the SCSI device's reply queues
are converted to blk-mq hw queues.
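
A hypothetical LLDD would opt in as below (sketch only; the driver is also
expected to set shost->nr_hw_queues to its reply queue count, as patch #5
does for hisi_sas):

	static struct scsi_host_template example_sht = {
		.name		= "example",
		.can_queue	= 2048,
		.host_tagset	= 1,	/* one hostwide tag space */
	};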

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5447738906ac..97c9352f2d16 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1901,6 +1901,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 31e0d6ca1eba..08299aeb822d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -442,6 +442,9 @@ struct scsi_host_template {
 	/* True if the low-level driver supports blk-mq only */
 	unsigned force_blk_mq:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.17.1



* [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (3 preceding siblings ...)
  2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
@ 2019-11-19 14:27 ` John Garry
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

Now that the block layer provides a shared hostwide tag space, we can
switch the driver to expose all HW queues.
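
The key change is that the delivery queue is now derived from the block
layer tag instead of from the driver's private reply_map; roughly, as done
in hisi_sas_task_prep() below:

	u32 blk_tag = blk_mq_unique_tag(scmd->request);
	unsigned int dq_index = blk_mq_unique_tag_to_hwq(blk_tag);

The HW queues themselves are exposed through the new .map_queues and
.host_tagset template fields and shost->nr_hw_queues.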

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++---------------
 3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 720c4d6be939..5dbc59084f63 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
 #define _HISI_SAS_H_
 
 #include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
@@ -390,7 +392,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* bist */
 	enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 0847e682797b..48d6f1c7f9c2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -421,6 +421,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
 	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+	struct scsi_cmnd *scmd = NULL;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -436,10 +437,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	if (task->uldd_task) {
+		struct ata_queued_cmd *qc;
 
+		if (dev_is_sata(device)) {
+			qc = task->uldd_task;
+			scmd = qc->scsicmd;
+		} else {
+			scmd = task->uldd_task;
+		}
+	}
+
+	if (scmd) {
+		unsigned int dq_index;
+		u32 blk_tag;
+
+		blk_tag = blk_mq_unique_tag(scmd->request);
+		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
 		*dq_pointer = dq = sas_dev->dq;
@@ -468,21 +482,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
+	else
+		rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
 
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
-		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index cb8d087762db..5180360482eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2344,66 +2344,35 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 {
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+	int vectors;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	struct irq_affinity desc = {
+		.pre_vectors = BASE_VECTORS_V3_HW,
+	};
+
+	min_msi = MIN_AFFINE_VECTORS_V3_HW;
+	vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+						 min_msi, max_msi,
+						 PCI_IRQ_MSI |
+						 PCI_IRQ_AFFINITY,
+						 &desc);
+	if (vectors < 0)
+		return -ENOENT;
 
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
 
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
+	return 0;
 }
 
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc, i;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
-	if (auto_affine_msi_experimental) {
-		struct irq_affinity desc = {
-			.pre_vectors = BASE_VECTORS_V3_HW,
-		};
-
-		min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
-		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
-							 min_msi, max_msi,
-							 PCI_IRQ_MSI |
-							 PCI_IRQ_AFFINITY,
-							 &desc);
-		if (vectors < 0)
-			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
-	} else {
-		min_msi = max_msi;
-		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
-						max_msi, PCI_IRQ_MSI);
-		if (vectors < 0)
-			return vectors;
-	}
-
-	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	int rc, i;
 
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
@@ -3048,6 +3017,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
 	return 0;
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+					     BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -3056,6 +3034,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
+	.map_queues		= hisi_sas_map_queues,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
@@ -3069,6 +3048,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3240,6 +3220,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	rc = interrupt_preinit_v3_hw(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+	dev_err(dev, "%d hw queues\n", shost->nr_hw_queues);
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.17.1



* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-21  8:55   ` Ming Lei
  2019-11-21 10:24     ` John Garry
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2019-11-21  8:55 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang66

On Tue, Nov 19, 2019 at 10:27:36PM +0800, John Garry wrote:
> [...]
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..b23547f10949 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
>  
>  int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  {
> +	struct blk_mq_tag_set *tag_set = q->tag_set;
>  	struct blk_mq_hw_ctx *hctx;
>  	struct elevator_queue *eq;
>  	unsigned int i;
> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  		blk_mq_debugfs_register_sched_hctx(q, hctx);
>  	}
>  
> +	if (blk_mq_is_sbitmap_shared(tag_set)) {
> +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +		queue_for_each_hw_ctx(q, hctx, i) {
> +			struct blk_mq_tags *tags = hctx->sched_tags;
> +
> +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;

This kind of sharing is wrong: sched tags should be request-queue wide
instead of tagset wide, since each request queue has its own independent
scheduler queue.

> [...]
> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  {
>  	struct blk_mq_tag_set *set = q->tag_set;
>  	struct blk_mq_hw_ctx *hctx;
> +	bool sched_tags = false;
>  	int i, ret;
>  
>  	if (!set)
> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>  							false);
>  		} else {
> +			sched_tags = true;
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>  							nr, true);
>  		}
> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  			q->elevator->type->ops.depth_updated(hctx);
>  	}
>  
> -	if (!ret)
> +	/*
> +	 * if ret is 0, all queues should have been updated to the same depth
> +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> +	 * if some are updated, we should probably roll back the change altogether. FIXME
> +	 */
> +	if (!ret) {
> +		if (blk_mq_is_sbitmap_shared(set)) {
> +			if (sched_tags) {
> +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
> +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> +					return -ENOMEM; /* fixup error handling */
> +
> +				queue_for_each_hw_ctx(q, hctx, i) {
> +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> +				}
> +			} else {
> +				sbitmap_queue_free(&set->shared_bitmap_tags);
> +				sbitmap_queue_free(&set->shared_breserved_tags);
> +				if (!blk_mq_init_shared_sbitmap(set))
> +					return -ENOMEM; /* fixup error handling */

No, we can't re-allocate the driver tags here, which are shared by all LUNs.
And you should see that 'can_grow' is set to false for driver tags
in blk_mq_update_nr_requests(), which can only touch per-request-queue
data, not tagset-wide data.
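
For reference, the guard in mainline looks roughly like this (paraphrased,
not a verbatim quote):

	/*
	 * In blk_mq_tag_update_depth(): growing past the original
	 * depth needs can_grow, and blk_mq_update_nr_requests()
	 * passes can_grow == false for driver tags.
	 */
	if (tdepth > tags->nr_tags) {
		if (!can_grow)
			return -EINVAL;
		/* ... allocate a larger tags set ... */
	}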


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-21  8:55   ` Ming Lei
@ 2019-11-21 10:24     ` John Garry
  2019-11-25  3:00       ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-21 10:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang (M)

>>   
>>   int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>>   {
>> +	struct blk_mq_tag_set *tag_set = q->tag_set;
>>   	struct blk_mq_hw_ctx *hctx;
>>   	struct elevator_queue *eq;
>>   	unsigned int i;
>> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>>   		blk_mq_debugfs_register_sched_hctx(q, hctx);
>>   	}
>>   
>> +	if (blk_mq_is_sbitmap_shared(tag_set)) {
>> +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
>> +			ret = -ENOMEM;
>> +			goto err;
>> +		}
>> +		queue_for_each_hw_ctx(q, hctx, i) {
>> +			struct blk_mq_tags *tags = hctx->sched_tags;
>> +
>> +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
>> +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
> 
> This kind of sharing is wrong, sched tags should be request queue wide
> instead of tagset wide, and each request queue has its own & independent
> scheduler queue.

Right, so if we get a scheduler tag, we still need to get a driver
tag, and this would be the "shared" tag.

That makes things simpler then.
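
So at dispatch time the flow would be roughly the below - an untested
sketch reusing pbitmap_tags from this series, and the helper name is
made up:

static bool blk_mq_get_driver_tag_shared(struct request *rq)
{
	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;

	/*
	 * rq already holds a per-request-queue scheduler tag. Since
	 * hctx->tags->pbitmap_tags points at the tagset-wide shared
	 * sbitmap, the driver tag is unique across all hctx of the
	 * host.
	 */
	rq->tag = __sbitmap_queue_get(hctx->tags->pbitmap_tags);
	return rq->tag != -1;
}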

> 
>> +		}
>> +	}
>> +
>>   	return 0;
>>   
>>   err:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 42792942b428..6625bebb46c3 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>    */
>>   void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>>   {
>> -	sbitmap_queue_wake_all(&tags->bitmap_tags);
>> +	sbitmap_queue_wake_all(tags->pbitmap_tags);
>>   	if (include_reserve)
>> -		sbitmap_queue_wake_all(&tags->breserved_tags);
>> +		sbitmap_queue_wake_all(tags->pbreserved_tags);
>>   }
>>   

[...]


>>   	mutex_init(&set->tag_list_lock);
>>   	INIT_LIST_HEAD(&set->tag_list);
>>   
>> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   {
>>   	struct blk_mq_tag_set *set = q->tag_set;
>>   	struct blk_mq_hw_ctx *hctx;
>> +	bool sched_tags = false;
>>   	int i, ret;
>>   
>>   	if (!set)
>> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>>   							false);
>>   		} else {
>> +			sched_tags = true;
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>   							nr, true);
>>   		}
>> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   			q->elevator->type->ops.depth_updated(hctx);
>>   	}
>>   
>> -	if (!ret)
>> +	/*
>> +	 * if ret is 0, all queues should have been updated to the same depth
>> +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
>> +	 * if some are updated, we should probably roll back the change altogether. FIXME
>> +	 */
>> +	if (!ret) {
>> +		if (blk_mq_is_sbitmap_shared(set)) {
>> +			if (sched_tags) {
>> +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
>> +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
>> +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
>> +					return -ENOMEM; /* fixup error handling */
>> +
>> +				queue_for_each_hw_ctx(q, hctx, i) {
>> +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
>> +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
>> +				}
>> +			} else {
>> +				sbitmap_queue_free(&set->shared_bitmap_tags);
>> +				sbitmap_queue_free(&set->shared_breserved_tags);
>> +				if (!blk_mq_init_shared_sbitmap(set))
>> +					return -ENOMEM; /* fixup error handling */
> 
> No, we can't re-allocate the driver tags here, which are shared by all LUNs.
> And you should see that 'can_grow' is set to false for driver tags
> in blk_mq_update_nr_requests(), which can only touch per-request-queue
> data, not tagset-wide data.

Yeah, I see that. We should just resize the driver tags bitmap.
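
Something like this (untested, reusing the names from this series)
should do for the shared case:

	if (blk_mq_is_sbitmap_shared(set))
		sbitmap_queue_resize(&set->shared_bitmap_tags,
				     nr - set->reserved_tags);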

Personally I think the mainline code is a little loose here: if we could
grow driver tags, then blk_mq_tag_set.tags would be out-of-sync with
hctx->tags. Maybe that should be made more explicit in the code.

BTW, do you have anything to say about this (modified slightly) comment:

/*
  * if ret != 0, q->nr_requests would not be updated, yet the depth
  * for some hctx sched tags may have changed - is that the right thing
  * to do?
  */

Thanks,
John


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-21 10:24     ` John Garry
@ 2019-11-25  3:00       ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2019-11-25  3:00 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang (M)

On Thu, Nov 21, 2019 at 10:24:16AM +0000, John Garry wrote:
> > >   int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >   {
> > > +	struct blk_mq_tag_set *tag_set = q->tag_set;
> > >   	struct blk_mq_hw_ctx *hctx;
> > >   	struct elevator_queue *eq;
> > >   	unsigned int i;
> > > @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >   		blk_mq_debugfs_register_sched_hctx(q, hctx);
> > >   	}
> > > +	if (blk_mq_is_sbitmap_shared(tag_set)) {
> > > +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> > > +			ret = -ENOMEM;
> > > +			goto err;
> > > +		}
> > > +		queue_for_each_hw_ctx(q, hctx, i) {
> > > +			struct blk_mq_tags *tags = hctx->sched_tags;
> > > +
> > > +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> > > +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
> > 
> > This kind of sharing is wrong, sched tags should be request queue wide
> > instead of tagset wide, and each request queue has its own & independent
> > scheduler queue.
> 
> Right, so if we get a scheduler tag, we still need to get a driver tag,
> and this would be the "shared" tag.
> 
> That makes things simpler then.
> 
> > 
> > > +		}
> > > +	}
> > > +
> > >   	return 0;
> > >   err:
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 42792942b428..6625bebb46c3 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> > >    */
> > >   void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
> > >   {
> > > -	sbitmap_queue_wake_all(&tags->bitmap_tags);
> > > +	sbitmap_queue_wake_all(tags->pbitmap_tags);
> > >   	if (include_reserve)
> > > -		sbitmap_queue_wake_all(&tags->breserved_tags);
> > > +		sbitmap_queue_wake_all(tags->pbreserved_tags);
> > >   }
> 
> [...]
> 
> 
> > >   	mutex_init(&set->tag_list_lock);
> > >   	INIT_LIST_HEAD(&set->tag_list);
> > > @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   {
> > >   	struct blk_mq_tag_set *set = q->tag_set;
> > >   	struct blk_mq_hw_ctx *hctx;
> > > +	bool sched_tags = false;
> > >   	int i, ret;
> > >   	if (!set)
> > > @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
> > >   							false);
> > >   		} else {
> > > +			sched_tags = true;
> > >   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> > >   							nr, true);
> > >   		}
> > > @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   			q->elevator->type->ops.depth_updated(hctx);
> > >   	}
> > > -	if (!ret)
> > > +	/*
> > > +	 * if ret is 0, all queues should have been updated to the same depth
> > > +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> > > +	 * if some are updated, we should probably roll back the change altogether. FIXME
> > > +	 */
> > > +	if (!ret) {
> > > +		if (blk_mq_is_sbitmap_shared(set)) {
> > > +			if (sched_tags) {
> > > +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> > > +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
> > > +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> > > +					return -ENOMEM; /* fixup error handling */
> > > +
> > > +				queue_for_each_hw_ctx(q, hctx, i) {
> > > +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> > > +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> > > +				}
> > > +			} else {
> > > +				sbitmap_queue_free(&set->shared_bitmap_tags);
> > > +				sbitmap_queue_free(&set->shared_breserved_tags);
> > > +				if (!blk_mq_init_shared_sbitmap(set))
> > > +					return -ENOMEM; /* fixup error handling */
> > 
> > No, we can't re-allocate the driver tags here, which are shared by all LUNs.
> > And you should see that 'can_grow' is set to false for driver tags
> > in blk_mq_update_nr_requests(), which can only touch per-request-queue
> > data, not tagset-wide data.
> 
> Yeah, I see that. We should just resize the driver tags bitmap.
> 
> Personally I think the mainline code is a little loose here: if we could
> grow driver tags, then blk_mq_tag_set.tags would be out-of-sync with
> hctx->tags. Maybe that should be made more explicit in the code.
> 
> BTW, do you have anything to say about this (modified slightly) comment:
> 
> /*
>  * if ret != 0, q->nr_requests would not be updated, yet the depth
>  * for some hctx sched tags may have changed - is that the right thing
>  * to do?
>  */

In theory, your concern is right, but we currently only support the
same depth across all hctx for both sched tags and driver tags, so it
is not an issue so far.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread
