linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2019-11-19 14:27 John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

This is another stab at solving the problem of hostwide shared tags for SCSI
HBAs.

As discussed previously, Ming Lei's most recent series in [0] to use
hctx[0] tags for all hctx for a host was a bit messy and intrusive, so seen
as a no go. Indeed, blk-mq is designed for separate tags per hctx.

Bart also followed up on my v1 RFC with another implementation along those
same lines, which was neater, but I am concerned that the change in this
approach may cause issues - see [1].

This series introduces a different approach to solve that problem, in
keeping the per-hctx tags but introducing a new separate sbitmap per
tagset. The shared sbitmap is used to generate a unique tag over all hctx per
tagset.

Currently I just fixed up the hisi_sas driver to use the shared tags,
but should not be much trouble to change others over.

Patch #3 is still quite experimental at this point - I added some code
comments on this. I also threw in a minor tidy-up patch.

[0] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[1] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Differences to v1:
- Use a shared sbitmap, and not a separate shared tags (a big change!)
	- Drop request.shared_tag
- Add RB tags

Hannes Reinecke (1):
  scsi: Add template flag 'host_tagset'

John Garry (3):
  blk-mq: Remove some unused function arguments
  blk-mq: Facilitate a shared sbitmap per tagset
  scsi: hisi_sas: Switch v3 hw to MQ

Ming Lei (1):
  blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED

 block/bfq-iosched.c                    |  4 +-
 block/blk-mq-debugfs.c                 |  6 +-
 block/blk-mq-sched.c                   | 14 +++++
 block/blk-mq-tag.c                     | 80 +++++++++++++++++------
 block/blk-mq-tag.h                     | 18 ++++--
 block/blk-mq.c                         | 87 ++++++++++++++++++++------
 block/blk-mq.h                         |  7 ++-
 block/kyber-iosched.c                  |  4 +-
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++--------------
 drivers/scsi/scsi_lib.c                |  2 +
 include/linux/blk-mq.h                 |  9 ++-
 include/scsi/scsi_host.h               |  3 +
 14 files changed, 239 insertions(+), 120 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

The struct blk_mq_hw_ctx * argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-tag.c |  4 ++--
 block/blk-mq-tag.h |  3 +--
 block/blk-mq.c     | 10 ++++------
 block/blk-mq.h     |  2 +-
 4 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index fbacde454718..586c9d6e904a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -183,8 +183,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	return tag + tag_offset;
 }
 
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-		    struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+		    unsigned int tag)
 {
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 15bc74acb57e..d0c10d043891 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,7 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
-			   struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75228d282fc3..20c498fce4dd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -493,9 +493,9 @@ static void __blk_mq_free_request(struct request *rq)
 	blk_pm_mark_last_busy(rq);
 	rq->mq_hctx = NULL;
 	if (rq->tag != -1)
-		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+		blk_mq_put_tag(hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-		blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+		blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
 	blk_mq_sched_restart(hctx);
 	blk_queue_exit(q);
 }
@@ -3354,7 +3354,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
 }
 
 static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
-				       struct blk_mq_hw_ctx *hctx,
 				       struct request *rq)
 {
 	unsigned long ret = 0;
@@ -3387,7 +3386,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
 }
 
 static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
-				     struct blk_mq_hw_ctx *hctx,
 				     struct request *rq)
 {
 	struct hrtimer_sleeper hs;
@@ -3407,7 +3405,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
 	if (q->poll_nsec > 0)
 		nsecs = q->poll_nsec;
 	else
-		nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+		nsecs = blk_mq_poll_nsecs(q, rq);
 
 	if (!nsecs)
 		return false;
@@ -3462,7 +3460,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 			return false;
 	}
 
-	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+	return blk_mq_poll_hybrid_sleep(q, rq);
 }
 
 /**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..78d38b5f2793 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -208,7 +208,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
 static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 					   struct request *rq)
 {
-	blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
 	rq->tag = -1;
 
 	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

From: Ming Lei <ming.lei@redhat.com>

BLK_MQ_F_TAG_SHARED actually means that tags is shared among request
queues, all of which should belong to LUNs attached to same HBA.

So rename it to make the point explicitly.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/blk-mq-debugfs.c |  2 +-
 block/blk-mq-tag.c     |  2 +-
 block/blk-mq-tag.h     |  4 ++--
 block/blk-mq.c         | 20 ++++++++++----------
 include/linux/blk-mq.h |  2 +-
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..33a40ae1d60f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -236,7 +236,7 @@ static const char *const alloc_policy_name[] = {
 #define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
 static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
-	HCTX_FLAG_NAME(TAG_SHARED),
+	HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 586c9d6e904a..42792942b428 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -65,7 +65,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 {
 	unsigned int depth, users;
 
-	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return true;
 	if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
 		return true;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index d0c10d043891..e36bec8e0970 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -53,7 +53,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
 	return __blk_mq_tag_busy(hctx);
@@ -61,7 +61,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 {
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return;
 
 	__blk_mq_tag_idle(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20c498fce4dd..2d2956249ae9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -296,7 +296,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		rq->tag = -1;
 		rq->internal_tag = tag;
 	} else {
-		if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
+		if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
 			rq_flags = RQF_MQ_INFLIGHT;
 			atomic_inc(&data->hctx->nr_active);
 		}
@@ -1112,7 +1112,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait_queue_entry_t *wait;
 	bool ret;
 
-	if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
 
 		/*
@@ -1243,7 +1243,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
 				 * For non-shared tags, the RESTART check
 				 * will suffice.
 				 */
-				if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+				if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 					no_tag = true;
 				break;
 			}
@@ -2358,7 +2358,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
 	hctx->queue = q;
-	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
 
 	INIT_LIST_HEAD(&hctx->hctx_list);
 
@@ -2575,9 +2575,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (shared)
-			hctx->flags |= BLK_MQ_F_TAG_SHARED;
+			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		else
-			hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 	}
 }
 
@@ -2603,7 +2603,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
 	list_del_rcu(&q->tag_set_list);
 	if (list_is_singular(&set->tag_list)) {
 		/* just transitioned to unshared */
-		set->flags &= ~BLK_MQ_F_TAG_SHARED;
+		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, false);
 	}
@@ -2620,12 +2620,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
 	 * Check to see if we're transitioning to shared (from 1 to 2 queues).
 	 */
 	if (!list_empty(&set->tag_list) &&
-	    !(set->flags & BLK_MQ_F_TAG_SHARED)) {
-		set->flags |= BLK_MQ_F_TAG_SHARED;
+	    !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+		set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		/* update existing queue */
 		blk_mq_update_tag_set_depth(set, true);
 	}
-	if (set->flags & BLK_MQ_F_TAG_SHARED)
+	if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
 		queue_set_hctx_shared(q, true);
 	list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 201abcb6bd6e..8952c9dfef79 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -225,7 +225,7 @@ struct blk_mq_ops {
 
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
-	BLK_MQ_F_TAG_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
  2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-21  8:55   ` Ming Lei
  2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
  4 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with single hostwide tags.

In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when an
interrupt is shutdown.

To solve that problem, Ming's patchset to drain hctx's should ensure no
IOs are missed in-flight [1].

However, to take advantage of that patchset, we need to map the HBA HW
queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.

In making that transition, the per-SCSI command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.

However another problem is that blk mq assumes the host may accept
(Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
host busy counter, which would stop the LLDD being sent more than
.can_queue commands; however, we should still ensure that the block layer
does not issue more than .can_queue commands to the Scsi host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.

New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
tags and requests per hctx; the reason for this is that if we only allocate
tags and requests for a single hctx - like hctx0 - we may break block
drivers which expect a request be associated with a specific hctx, i.e.
not hctx0.

This is based on work originally from Ming Lei in [3] and from Bart's
suggestion in [4].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
[3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
---
 block/bfq-iosched.c    |  4 +--
 block/blk-mq-debugfs.c |  4 +--
 block/blk-mq-sched.c   | 14 ++++++++
 block/blk-mq-tag.c     | 74 +++++++++++++++++++++++++++++++++---------
 block/blk-mq-tag.h     | 15 +++++++--
 block/blk-mq.c         | 57 +++++++++++++++++++++++++++++---
 block/blk-mq.h         |  5 +++
 block/kyber-iosched.c  |  4 +--
 include/linux/blk-mq.h |  7 ++++
 9 files changed, 157 insertions(+), 27 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0319d6339822..9633c864af07 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 	struct blk_mq_tags *tags = hctx->sched_tags;
 	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+	min_shallow = bfq_update_depths(bfqd, tags->pbitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->pbitmap_tags, min_shallow);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 33a40ae1d60f..9cf2f09c08e4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -483,7 +483,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->tags)
+	if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
 		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
@@ -517,7 +517,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
 	res = mutex_lock_interruptible(&q->sysfs_lock);
 	if (res)
 		goto out;
-	if (hctx->sched_tags)
+	if (hctx->sched_tags) /* We should just iterate the relevant bits for this hctx FIXME */
 		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
 	mutex_unlock(&q->sysfs_lock);
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..b23547f10949 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
 
 int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 {
+	struct blk_mq_tag_set *tag_set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
 	struct elevator_queue *eq;
 	unsigned int i;
@@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 		blk_mq_debugfs_register_sched_hctx(q, hctx);
 	}
 
+	if (blk_mq_is_sbitmap_shared(tag_set)) {
+		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
+			ret = -ENOMEM;
+			goto err;
+		}
+		queue_for_each_hw_ctx(q, hctx, i) {
+			struct blk_mq_tags *tags = hctx->sched_tags;
+
+			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
+			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
+		}
+	}
+
 	return 0;
 
 err:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 42792942b428..6625bebb46c3 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
  */
 void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
 {
-	sbitmap_queue_wake_all(&tags->bitmap_tags);
+	sbitmap_queue_wake_all(tags->pbitmap_tags);
 	if (include_reserve)
-		sbitmap_queue_wake_all(&tags->breserved_tags);
+		sbitmap_queue_wake_all(tags->pbreserved_tags);
 }
 
 /*
@@ -113,10 +113,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 			WARN_ON_ONCE(1);
 			return BLK_MQ_TAG_FAIL;
 		}
-		bt = &tags->breserved_tags;
+		bt = tags->pbreserved_tags;
 		tag_offset = 0;
 	} else {
-		bt = &tags->bitmap_tags;
+		bt = tags->pbitmap_tags;
 		tag_offset = tags->nr_reserved_tags;
 	}
 
@@ -162,9 +162,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 						data->ctx);
 		tags = blk_mq_tags_from_data(data);
 		if (data->flags & BLK_MQ_REQ_RESERVED)
-			bt = &tags->breserved_tags;
+			bt = tags->pbreserved_tags;
 		else
-			bt = &tags->bitmap_tags;
+			bt = tags->pbitmap_tags;
 
 		/*
 		 * If destination hw queue is changed, fake wake up on
@@ -190,10 +190,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
 		const int real_tag = tag - tags->nr_reserved_tags;
 
 		BUG_ON(real_tag >= tags->nr_tags);
-		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+		sbitmap_queue_clear(tags->pbitmap_tags, real_tag, ctx->cpu);
 	} else {
 		BUG_ON(tag >= tags->nr_reserved_tags);
-		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+		sbitmap_queue_clear(tags->pbreserved_tags, tag, ctx->cpu);
 	}
 }
 
@@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * We can hit rq == NULL here, because the tagging functions
 	 * test and set the bit before assigning ->rqs[].
 	 */
-	if (rq && rq->q == hctx->queue)
+	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
 		return iter_data->fn(hctx, rq, iter_data->data, reserved);
 	return true;
 }
@@ -321,8 +321,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
-		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
-	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+		bt_tags_for_each(tags, tags->pbreserved_tags, fn, priv, true);
+	bt_tags_for_each(tags, tags->pbitmap_tags, fn, priv, false);
 }
 
 /**
@@ -419,8 +419,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			continue;
 
 		if (tags->nr_reserved_tags)
-			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
-		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+			bt_for_each(hctx, tags->pbreserved_tags, fn, priv, true);
+		bt_for_each(hctx, tags->pbitmap_tags, fn, priv, false);
 	}
 	blk_queue_exit(q);
 }
@@ -444,6 +444,9 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 		     node))
 		goto free_bitmap_tags;
 
+	tags->pbitmap_tags = &tags->bitmap_tags;
+	tags->pbreserved_tags = &tags->breserved_tags;
+
 	return tags;
 free_bitmap_tags:
 	sbitmap_queue_free(&tags->bitmap_tags);
@@ -452,7 +455,46 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return NULL;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+	unsigned int depth = tag_set->queue_depth -tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->shared_bitmap_tags);
+	return false;
+}
+
+bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set, unsigned long nr_requests)
+{
+	unsigned int depth = nr_requests -tag_set->reserved_tags;
+	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+	int node = tag_set->numa_node;
+
+	if (bt_alloc(&tag_set->sched_shared_bitmap_tags, depth, round_robin, node))
+		return false;
+	if (bt_alloc(&tag_set->sched_shared_breserved_tags, tag_set->reserved_tags, round_robin,
+			 node))
+		goto free_bitmap_tags;
+
+	return true;
+free_bitmap_tags:
+	sbitmap_queue_free(&tag_set->sched_shared_bitmap_tags);
+	return false;
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
 				     int node, int alloc_policy)
 {
@@ -470,6 +512,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
 
+	if (blk_mq_is_sbitmap_shared(set))
+		return tags;
 	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
 }
 
@@ -526,7 +570,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		 * Don't need (or can't) update reserved tags here, they
 		 * remain static and should never need resizing.
 		 */
-		sbitmap_queue_resize(&tags->bitmap_tags,
+		sbitmap_queue_resize(tags->pbitmap_tags,
 				tdepth - tags->nr_reserved_tags);
 	}
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index e36bec8e0970..198b6fbe2c22 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,20 +13,31 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
+	/* We should consider deleting these so they are not referenced by mistake */
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
 
+	struct sbitmap_queue *pbitmap_tags;
+	struct sbitmap_queue *pbreserved_tags;
+
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
+					     unsigned long nr_requests);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+					    unsigned int nr_tags,
+					    unsigned int reserved_tags,
+					    int node, int alloc_policy);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags,
+			   struct blk_mq_ctx *ctx, unsigned int tag);
 extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 					struct blk_mq_tags **tags,
 					unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2d2956249ae9..8d5b21919c9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1089,7 +1089,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 		struct sbitmap_queue *sbq;
 
 		list_del_init(&wait->entry);
-		sbq = &hctx->tags->bitmap_tags;
+		sbq = hctx->tags->pbitmap_tags;
 		atomic_dec(&sbq->ws_active);
 	}
 	spin_unlock(&hctx->dispatch_wait_lock);
@@ -1107,7 +1107,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
 static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 				 struct request *rq)
 {
-	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+	struct sbitmap_queue *sbq = hctx->tags->pbitmap_tags;
 	struct wait_queue_head *wq;
 	wait_queue_entry_t *wait;
 	bool ret;
@@ -2097,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
 				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
 	if (!tags)
 		return NULL;
@@ -3100,6 +3100,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	if (blk_mq_is_sbitmap_shared(set)) {
+		if (!blk_mq_init_shared_sbitmap(set)) {
+			ret = -ENOMEM;
+			goto out_free_mq_map;
+		}
+
+		for (i = 0; i < set->nr_hw_queues; i++) {
+			struct blk_mq_tags *tags = set->tags[i];
+
+			tags->pbitmap_tags = &set->shared_bitmap_tags;
+			tags->pbreserved_tags = &set->shared_breserved_tags;
+		}
+	}
+
 	mutex_init(&set->tag_list_lock);
 	INIT_LIST_HEAD(&set->tag_list);
 
@@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 {
 	struct blk_mq_tag_set *set = q->tag_set;
 	struct blk_mq_hw_ctx *hctx;
+	bool sched_tags = false;
 	int i, ret;
 
 	if (!set)
@@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
 							false);
 		} else {
+			sched_tags = true;
 			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
 							nr, true);
 		}
@@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
 			q->elevator->type->ops.depth_updated(hctx);
 	}
 
-	if (!ret)
+	/*
+	 * if ret is 0, all queues should have been updated to the same depth
+	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
+	 * if some are updated, we should probably roll back the change altogether. FIXME
+	 */
+	if (!ret) {
+		if (blk_mq_is_sbitmap_shared(set)) {
+			if (sched_tags) {
+				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
+				sbitmap_queue_free(&set->sched_shared_breserved_tags);
+				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
+					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
+				}
+			} else {
+				sbitmap_queue_free(&set->shared_bitmap_tags);
+				sbitmap_queue_free(&set->shared_breserved_tags);
+				if (!blk_mq_init_shared_sbitmap(set))
+					return -ENOMEM; /* fixup error handling */
+
+				queue_for_each_hw_ctx(q, hctx, i) {
+					hctx->tags->pbitmap_tags = &set->shared_bitmap_tags;
+					hctx->tags->pbreserved_tags = &set->shared_breserved_tags;
+				}
+			}
+		}
 		q->nr_requests = nr;
+	}
+	/*
+	 * if ret != 0, q->nr_requests would not be updated, yet the depth
+	 * for some hctx may have changed - is that right?
+	 */
 
 	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 78d38b5f2793..4c1ea206d3f4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
 	struct blk_mq_hw_ctx *hctx;
 };
 
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+	return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
+}
+
 static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
 {
 	if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0ef637..59fb6afd8e68 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
 	 * All of the hardware queues have the same depth, so we can just grab
 	 * the shift of the first one.
 	 */
-	return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+	return q->queue_hw_ctx[0]->sched_tags->pbitmap_tags->sb.shift;
 }
 
 static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 	khd->batching = 0;
 
 	hctx->sched_data = khd;
-	sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+	sbitmap_queue_min_shallow_depth(hctx->sched_tags->pbitmap_tags,
 					kqd->async_depth);
 
 	return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8952c9dfef79..9b923b42c07b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
 	unsigned int		flags;		/* BLK_MQ_F_* */
 	void			*driver_data;
 
+	struct sbitmap_queue shared_bitmap_tags;
+	struct sbitmap_queue shared_breserved_tags;
+
+	struct sbitmap_queue sched_shared_bitmap_tags;
+	struct sbitmap_queue sched_shared_breserved_tags;
+
 	struct blk_mq_tags	**tags;
 
 	struct mutex		tag_list_lock;
@@ -226,6 +232,7 @@ struct blk_mq_ops {
 enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_QUEUE_SHARED	= 1 << 1,
+	BLK_MQ_F_TAG_HCTX_SHARED	= 1 << 2,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset'
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (2 preceding siblings ...)
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-19 14:27 ` John Garry
  2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' so hostwide tagset can be
shared on multiple reply queues after the SCSI device's reply queue
is converted to blk-mq hw queue.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5447738906ac..97c9352f2d16 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1901,6 +1901,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 31e0d6ca1eba..08299aeb822d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -442,6 +442,9 @@ struct scsi_host_template {
 	/* True if the low-level driver supports blk-mq only */
 	unsigned force_blk_mq:1;
 
+	/* True if the host uses host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ
  2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
                   ` (3 preceding siblings ...)
  2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
@ 2019-11-19 14:27 ` John Garry
  4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
  To: axboe, jejb, martin.petersen
  Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
	bvanassche, chenxiang66, John Garry

Now that the block layer provides a shared tag, we can switch the driver
to expose all HW queues.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 36 ++++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++---------------
 3 files changed, 56 insertions(+), 69 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 720c4d6be939..5dbc59084f63 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
 #define _HISI_SAS_H_
 
 #include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
 #include <linux/clk.h>
 #include <linux/debugfs.h>
 #include <linux/dmapool.h>
@@ -390,7 +392,6 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
-	unsigned int *reply_map;
 
 	/* bist */
 	enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 0847e682797b..48d6f1c7f9c2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -421,6 +421,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
 	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+	struct scsi_cmnd *scmd = NULL;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -436,10 +437,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+	if (task->uldd_task) {
+		struct ata_queued_cmd *qc;
 
+		if (dev_is_sata(device)) {
+			qc = task->uldd_task;
+			scmd = qc->scsicmd;
+		} else {
+			scmd = task->uldd_task;
+		}
+	}
+
+	if (scmd) {
+		unsigned int dq_index;
+		u32 blk_tag;
+
+		blk_tag = blk_mq_unique_tag(scmd->request);
+		dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
 		*dq_pointer = dq = sas_dev->dq;
@@ -468,21 +482,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
-	else {
-		struct scsi_cmnd *scsi_cmnd = NULL;
-
-		if (task->uldd_task) {
-			struct ata_queued_cmd *qc;
+	else
+		rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
 
-			if (dev_is_sata(device)) {
-				qc = task->uldd_task;
-				scsi_cmnd = qc->scsicmd;
-			} else {
-				scsi_cmnd = task->uldd_task;
-			}
-		}
-		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
-	}
 	if (rc < 0)
 		goto err_out_dif_dma_unmap;
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index cb8d087762db..5180360482eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2344,66 +2344,35 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 {
-	const struct cpumask *mask;
-	int queue, cpu;
-
-	for (queue = 0; queue < nvecs; queue++) {
-		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+	int vectors;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+	struct Scsi_Host *shost = hisi_hba->shost;
+	struct irq_affinity desc = {
+		.pre_vectors = BASE_VECTORS_V3_HW,
+	};
+
+	min_msi = MIN_AFFINE_VECTORS_V3_HW;
+	vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+						 min_msi, max_msi,
+						 PCI_IRQ_MSI |
+						 PCI_IRQ_AFFINITY,
+						 &desc);
+	if (vectors < 0)
+		return -ENOENT;
 
-		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
-					    BASE_VECTORS_V3_HW);
-		if (!mask)
-			goto fallback;
-		cq->pci_irq_mask = mask;
-		for_each_cpu(cpu, mask)
-			hisi_hba->reply_map[cpu] = queue;
-	}
-	return;
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	shost->nr_hw_queues = hisi_hba->cq_nvecs;
 
-fallback:
-	for_each_possible_cpu(cpu)
-		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
-	/* Don't clean all CQ masks */
+	return 0;
 }
 
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
-	int vectors, rc, i;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
-	if (auto_affine_msi_experimental) {
-		struct irq_affinity desc = {
-			.pre_vectors = BASE_VECTORS_V3_HW,
-		};
-
-		min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
-		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
-						   sizeof(unsigned int),
-						   GFP_KERNEL);
-		if (!hisi_hba->reply_map)
-			return -ENOMEM;
-		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
-							 min_msi, max_msi,
-							 PCI_IRQ_MSI |
-							 PCI_IRQ_AFFINITY,
-							 &desc);
-		if (vectors < 0)
-			return -ENOENT;
-		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
-	} else {
-		min_msi = max_msi;
-		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
-						max_msi, PCI_IRQ_MSI);
-		if (vectors < 0)
-			return vectors;
-	}
-
-	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+	int rc, i;
 
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
@@ -3048,6 +3017,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
 	return 0;
 }
 
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+	struct hisi_hba *hisi_hba = shost_priv(shost);
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+					     BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -3056,6 +3034,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
+	.map_queues		= hisi_sas_map_queues,
 	.change_queue_depth	= sas_change_queue_depth,
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
@@ -3069,6 +3048,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.host_tagset		= 1,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3240,6 +3220,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (hisi_sas_debugfs_enable)
 		hisi_sas_debugfs_init(hisi_hba);
 
+	rc = interrupt_preinit_v3_hw(hisi_hba);
+	if (rc)
+		goto err_out_ha;
+	dev_err(dev, "%d hw qeues\n", shost->nr_hw_queues);
 	rc = scsi_add_host(shost, dev);
 	if (rc)
 		goto err_out_ha;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-21  8:55   ` Ming Lei
  2019-11-21 10:24     ` John Garry
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2019-11-21  8:55 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang66

On Tue, Nov 19, 2019 at 10:27:36PM +0800, John Garry wrote:
> Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
> multiple reply queues with single hostwide tags.
> 
> In addition, these drivers want to use interrupt assignment in
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
> CPU hotplug may cause in-flight IO completion to not be serviced when an
> interrupt is shutdown.
> 
> To solve that problem, Ming's patchset to drain hctx's should ensure no
> IOs are missed in-flight [1].
> 
> However, to take advantage of that patchset, we need to map the HBA HW
> queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.
> 
> In making that transition, the per-SCSI command request tags are no
> longer unique per Scsi host - they are just unique per hctx. As such, the
> HBA LLDD would have to generate this tag internally, which has a certain
> performance overhead.
> 
> However another problem is that blk mq assumes the host may accept
> (Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
> host busy counter, which would stop the LLDD being sent more than
> .can_queue commands; however, we should still ensure that the block layer
> does not issue more than .can_queue commands to the Scsi host.
> 
> To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
> which may be requested at init time.
> 
> New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
> tagset to indicate whether the shared sbitmap should be used.
> 
> Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
> tags and requests per hctx; the reason for this is that if we only allocate
> tags and requests for a single hctx - like hctx0 - we may break block
> drivers which expect a request be associated with a specific hctx, i.e.
> not hctx0.
> 
> This is based on work originally from Ming Lei in [3] and from Bart's
> suggestion in [4].
> 
> [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
> [1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
> [2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
> [3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
> [4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  block/bfq-iosched.c    |  4 +--
>  block/blk-mq-debugfs.c |  4 +--
>  block/blk-mq-sched.c   | 14 ++++++++
>  block/blk-mq-tag.c     | 74 +++++++++++++++++++++++++++++++++---------
>  block/blk-mq-tag.h     | 15 +++++++--
>  block/blk-mq.c         | 57 +++++++++++++++++++++++++++++---
>  block/blk-mq.h         |  5 +++
>  block/kyber-iosched.c  |  4 +--
>  include/linux/blk-mq.h |  7 ++++
>  9 files changed, 157 insertions(+), 27 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 0319d6339822..9633c864af07 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
>  	struct blk_mq_tags *tags = hctx->sched_tags;
>  	unsigned int min_shallow;
>  
> -	min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
> -	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
> +	min_shallow = bfq_update_depths(bfqd, tags->pbitmap_tags);
> +	sbitmap_queue_min_shallow_depth(tags->pbitmap_tags, min_shallow);
>  }
>  
>  static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 33a40ae1d60f..9cf2f09c08e4 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -483,7 +483,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
>  	res = mutex_lock_interruptible(&q->sysfs_lock);
>  	if (res)
>  		goto out;
> -	if (hctx->tags)
> +	if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
>  		sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
>  	mutex_unlock(&q->sysfs_lock);
>  
> @@ -517,7 +517,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
>  	res = mutex_lock_interruptible(&q->sysfs_lock);
>  	if (res)
>  		goto out;
> -	if (hctx->sched_tags)
> +	if (hctx->sched_tags) /* We should just iterate the relevant bits for this hctx FIXME */
>  		sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
>  	mutex_unlock(&q->sysfs_lock);
>  
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..b23547f10949 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
>  
>  int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  {
> +	struct blk_mq_tag_set *tag_set = q->tag_set;
>  	struct blk_mq_hw_ctx *hctx;
>  	struct elevator_queue *eq;
>  	unsigned int i;
> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>  		blk_mq_debugfs_register_sched_hctx(q, hctx);
>  	}
>  
> +	if (blk_mq_is_sbitmap_shared(tag_set)) {
> +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +		queue_for_each_hw_ctx(q, hctx, i) {
> +			struct blk_mq_tags *tags = hctx->sched_tags;
> +
> +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;

This kind of sharing is wrong, sched tags should be request queue wide
instead of tagset wide, and each request queue has its own & independent
scheduler queue.

> +		}
> +	}
> +
>  	return 0;
>  
>  err:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 42792942b428..6625bebb46c3 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>   */
>  void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>  {
> -	sbitmap_queue_wake_all(&tags->bitmap_tags);
> +	sbitmap_queue_wake_all(tags->pbitmap_tags);
>  	if (include_reserve)
> -		sbitmap_queue_wake_all(&tags->breserved_tags);
> +		sbitmap_queue_wake_all(tags->pbreserved_tags);
>  }
>  
>  /*
> @@ -113,10 +113,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  			WARN_ON_ONCE(1);
>  			return BLK_MQ_TAG_FAIL;
>  		}
> -		bt = &tags->breserved_tags;
> +		bt = tags->pbreserved_tags;
>  		tag_offset = 0;
>  	} else {
> -		bt = &tags->bitmap_tags;
> +		bt = tags->pbitmap_tags;
>  		tag_offset = tags->nr_reserved_tags;
>  	}
>  
> @@ -162,9 +162,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>  						data->ctx);
>  		tags = blk_mq_tags_from_data(data);
>  		if (data->flags & BLK_MQ_REQ_RESERVED)
> -			bt = &tags->breserved_tags;
> +			bt = tags->pbreserved_tags;
>  		else
> -			bt = &tags->bitmap_tags;
> +			bt = tags->pbitmap_tags;
>  
>  		/*
>  		 * If destination hw queue is changed, fake wake up on
> @@ -190,10 +190,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
>  		const int real_tag = tag - tags->nr_reserved_tags;
>  
>  		BUG_ON(real_tag >= tags->nr_tags);
> -		sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
> +		sbitmap_queue_clear(tags->pbitmap_tags, real_tag, ctx->cpu);
>  	} else {
>  		BUG_ON(tag >= tags->nr_reserved_tags);
> -		sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
> +		sbitmap_queue_clear(tags->pbreserved_tags, tag, ctx->cpu);
>  	}
>  }
>  
> @@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	 * We can hit rq == NULL here, because the tagging functions
>  	 * test and set the bit before assigning ->rqs[].
>  	 */
> -	if (rq && rq->q == hctx->queue)
> +	if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
>  		return iter_data->fn(hctx, rq, iter_data->data, reserved);
>  	return true;
>  }
> @@ -321,8 +321,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
>  		busy_tag_iter_fn *fn, void *priv)
>  {
>  	if (tags->nr_reserved_tags)
> -		bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
> -	bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
> +		bt_tags_for_each(tags, tags->pbreserved_tags, fn, priv, true);
> +	bt_tags_for_each(tags, tags->pbitmap_tags, fn, priv, false);
>  }
>  
>  /**
> @@ -419,8 +419,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  			continue;
>  
>  		if (tags->nr_reserved_tags)
> -			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> -		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> +			bt_for_each(hctx, tags->pbreserved_tags, fn, priv, true);
> +		bt_for_each(hctx, tags->pbitmap_tags, fn, priv, false);
>  	}
>  	blk_queue_exit(q);
>  }
> @@ -444,6 +444,9 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  		     node))
>  		goto free_bitmap_tags;
>  
> +	tags->pbitmap_tags = &tags->bitmap_tags;
> +	tags->pbreserved_tags = &tags->breserved_tags;
> +
>  	return tags;
>  free_bitmap_tags:
>  	sbitmap_queue_free(&tags->bitmap_tags);
> @@ -452,7 +455,46 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
>  	return NULL;
>  }
>  
> -struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> +bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
> +{
> +	unsigned int depth = tag_set->queue_depth -tag_set->reserved_tags;
> +	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
> +	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
> +	int node = tag_set->numa_node;
> +
> +	if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
> +		return false;
> +	if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
> +			 node))
> +		goto free_bitmap_tags;
> +
> +	return true;
> +free_bitmap_tags:
> +	sbitmap_queue_free(&tag_set->shared_bitmap_tags);
> +	return false;
> +}
> +
> +bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set, unsigned long nr_requests)
> +{
> +	unsigned int depth = nr_requests -tag_set->reserved_tags;
> +	int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
> +	bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
> +	int node = tag_set->numa_node;
> +
> +	if (bt_alloc(&tag_set->sched_shared_bitmap_tags, depth, round_robin, node))
> +		return false;
> +	if (bt_alloc(&tag_set->sched_shared_breserved_tags, tag_set->reserved_tags, round_robin,
> +			 node))
> +		goto free_bitmap_tags;
> +
> +	return true;
> +free_bitmap_tags:
> +	sbitmap_queue_free(&tag_set->sched_shared_bitmap_tags);
> +	return false;
> +}
> +
> +struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
> +				     unsigned int total_tags,
>  				     unsigned int reserved_tags,
>  				     int node, int alloc_policy)
>  {
> @@ -470,6 +512,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
>  	tags->nr_tags = total_tags;
>  	tags->nr_reserved_tags = reserved_tags;
>  
> +	if (blk_mq_is_sbitmap_shared(set))
> +		return tags;
>  	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
>  }
>  
> @@ -526,7 +570,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  		 * Don't need (or can't) update reserved tags here, they
>  		 * remain static and should never need resizing.
>  		 */
> -		sbitmap_queue_resize(&tags->bitmap_tags,
> +		sbitmap_queue_resize(tags->pbitmap_tags,
>  				tdepth - tags->nr_reserved_tags);
>  	}
>  
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index e36bec8e0970..198b6fbe2c22 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -13,20 +13,31 @@ struct blk_mq_tags {
>  
>  	atomic_t active_queues;
>  
> +	/* We should consider deleting these so they are not referenced by mistake */
>  	struct sbitmap_queue bitmap_tags;
>  	struct sbitmap_queue breserved_tags;
>  
> +	struct sbitmap_queue *pbitmap_tags;
> +	struct sbitmap_queue *pbreserved_tags;
> +
>  	struct request **rqs;
>  	struct request **static_rqs;
>  	struct list_head page_list;
>  };
>  
>  
> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
> +extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
> +extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
> +					     unsigned long nr_requests);
> +extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
> +					    unsigned int nr_tags,
> +					    unsigned int reserved_tags,
> +					    int node, int alloc_policy);
>  extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>  
>  extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
> -extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
> +extern void blk_mq_put_tag(struct blk_mq_tags *tags,
> +			   struct blk_mq_ctx *ctx, unsigned int tag);
>  extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  					struct blk_mq_tags **tags,
>  					unsigned int depth, bool can_grow);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2d2956249ae9..8d5b21919c9a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1089,7 +1089,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
>  		struct sbitmap_queue *sbq;
>  
>  		list_del_init(&wait->entry);
> -		sbq = &hctx->tags->bitmap_tags;
> +		sbq = hctx->tags->pbitmap_tags;
>  		atomic_dec(&sbq->ws_active);
>  	}
>  	spin_unlock(&hctx->dispatch_wait_lock);
> @@ -1107,7 +1107,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
>  static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
>  				 struct request *rq)
>  {
> -	struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
> +	struct sbitmap_queue *sbq = hctx->tags->pbitmap_tags;
>  	struct wait_queue_head *wq;
>  	wait_queue_entry_t *wait;
>  	bool ret;
> @@ -2097,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>  	if (node == NUMA_NO_NODE)
>  		node = set->numa_node;
>  
> -	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
> +	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
>  				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>  	if (!tags)
>  		return NULL;
> @@ -3100,6 +3100,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>  	if (ret)
>  		goto out_free_mq_map;
>  
> +	if (blk_mq_is_sbitmap_shared(set)) {
> +		if (!blk_mq_init_shared_sbitmap(set)) {
> +			ret = -ENOMEM;
> +			goto out_free_mq_map;
> +		}
> +
> +		for (i = 0; i < set->nr_hw_queues; i++) {
> +			struct blk_mq_tags *tags = set->tags[i];
> +
> +			tags->pbitmap_tags = &set->shared_bitmap_tags;
> +			tags->pbreserved_tags = &set->shared_breserved_tags;
> +		}
> +	}
> +
>  	mutex_init(&set->tag_list_lock);
>  	INIT_LIST_HEAD(&set->tag_list);
>  
> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  {
>  	struct blk_mq_tag_set *set = q->tag_set;
>  	struct blk_mq_hw_ctx *hctx;
> +	bool sched_tags = false;
>  	int i, ret;
>  
>  	if (!set)
> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>  							false);
>  		} else {
> +			sched_tags = true;
>  			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>  							nr, true);
>  		}
> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>  			q->elevator->type->ops.depth_updated(hctx);
>  	}
>  
> -	if (!ret)
> +	/*
> +	 * if ret is 0, all queues should have been updated to the same depth
> +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> +	 * if some are updated, we should probably roll back the change altogether. FIXME
> +	 */
> +	if (!ret) {
> +		if (blk_mq_is_sbitmap_shared(set)) {
> +			if (sched_tags) {
> +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
> +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> +					return -ENOMEM; /* fixup error handling */
> +
> +				queue_for_each_hw_ctx(q, hctx, i) {
> +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> +				}
> +			} else {
> +				sbitmap_queue_free(&set->shared_bitmap_tags);
> +				sbitmap_queue_free(&set->shared_breserved_tags);
> +				if (!blk_mq_init_shared_sbitmap(set))
> +					return -ENOMEM; /* fixup error handling */

No, we can't re-allocate driver tags here which are shared by all LUNs.
And you should see that 'can_grow' is set as false for driver tags
in blk_mq_update_nr_requests(), which can only touch per-request-queue
data, not tagset wide data.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-21  8:55   ` Ming Lei
@ 2019-11-21 10:24     ` John Garry
  2019-11-25  3:00       ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-21 10:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang (M)

>>   
>>   int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>>   {
>> +	struct blk_mq_tag_set *tag_set = q->tag_set;
>>   	struct blk_mq_hw_ctx *hctx;
>>   	struct elevator_queue *eq;
>>   	unsigned int i;
>> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>>   		blk_mq_debugfs_register_sched_hctx(q, hctx);
>>   	}
>>   
>> +	if (blk_mq_is_sbitmap_shared(tag_set)) {
>> +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
>> +			ret = -ENOMEM;
>> +			goto err;
>> +		}
>> +		queue_for_each_hw_ctx(q, hctx, i) {
>> +			struct blk_mq_tags *tags = hctx->sched_tags;
>> +
>> +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
>> +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
> 
> This kind of sharing is wrong, sched tags should be request queue wide
> instead of tagset wide, and each request queue has its own & independent
> scheduler queue.

Right, so if we get get a scheduler tag we still need to get a driver 
tag, and this would be the "shared" tag.

That makes things simpler then.

> 
>> +		}
>> +	}
>> +
>>   	return 0;
>>   
>>   err:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 42792942b428..6625bebb46c3 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>>    */
>>   void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>>   {
>> -	sbitmap_queue_wake_all(&tags->bitmap_tags);
>> +	sbitmap_queue_wake_all(tags->pbitmap_tags);
>>   	if (include_reserve)
>> -		sbitmap_queue_wake_all(&tags->breserved_tags);
>> +		sbitmap_queue_wake_all(tags->pbreserved_tags);
>>   }
>>   

[...]


>>   	mutex_init(&set->tag_list_lock);
>>   	INIT_LIST_HEAD(&set->tag_list);
>>   
>> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   {
>>   	struct blk_mq_tag_set *set = q->tag_set;
>>   	struct blk_mq_hw_ctx *hctx;
>> +	bool sched_tags = false;
>>   	int i, ret;
>>   
>>   	if (!set)
>> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>>   							false);
>>   		} else {
>> +			sched_tags = true;
>>   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>>   							nr, true);
>>   		}
>> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>>   			q->elevator->type->ops.depth_updated(hctx);
>>   	}
>>   
>> -	if (!ret)
>> +	/*
>> +	 * if ret is 0, all queues should have been updated to the same depth
>> +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
>> +	 * if some are updated, we should probably roll back the change altogether. FIXME
>> +	 */
>> +	if (!ret) {
>> +		if (blk_mq_is_sbitmap_shared(set)) {
>> +			if (sched_tags) {
>> +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
>> +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
>> +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
>> +					return -ENOMEM; /* fixup error handling */
>> +
>> +				queue_for_each_hw_ctx(q, hctx, i) {
>> +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
>> +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
>> +				}
>> +			} else {
>> +				sbitmap_queue_free(&set->shared_bitmap_tags);
>> +				sbitmap_queue_free(&set->shared_breserved_tags);
>> +				if (!blk_mq_init_shared_sbitmap(set))
>> +					return -ENOMEM; /* fixup error handling */
> 
> No, we can't re-allocate driver tags here which are shared by all LUNs. > And you should see that 'can_grow' is set as false for driver tags
> in blk_mq_update_nr_requests(), which can only touch per-request-queue
> data, not tagset wide data.

Yeah, I see that. We should just resize for driver tags bitmap.

Personally I think the mainline code is a little loose here, as if we 
could grow driver tags, then blk_mq_tagset.tags would be out-of-sync 
with the hctx->tags. Maybe that should be made more explicit in the code.

BTW, do you have anything to say about this (modified slightly) comment:

/*
  * if ret != 0, q->nr_requests would not be updated, yet the depth
  * for some hctx sched tags may have changed - is that the right thing
  * to do?
  */

Thanks,
John


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
  2019-11-21 10:24     ` John Garry
@ 2019-11-25  3:00       ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2019-11-25  3:00 UTC (permalink / raw)
  To: John Garry
  Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
	linux-scsi, hare, bvanassche, chenxiang (M)

On Thu, Nov 21, 2019 at 10:24:16AM +0000, John Garry wrote:
> > >   int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >   {
> > > +	struct blk_mq_tag_set *tag_set = q->tag_set;
> > >   	struct blk_mq_hw_ctx *hctx;
> > >   	struct elevator_queue *eq;
> > >   	unsigned int i;
> > > @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > >   		blk_mq_debugfs_register_sched_hctx(q, hctx);
> > >   	}
> > > +	if (blk_mq_is_sbitmap_shared(tag_set)) {
> > > +		if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> > > +			ret = -ENOMEM;
> > > +			goto err;
> > > +		}
> > > +		queue_for_each_hw_ctx(q, hctx, i) {
> > > +			struct blk_mq_tags *tags = hctx->sched_tags;
> > > +
> > > +			tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> > > +			tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
> > 
> > This kind of sharing is wrong, sched tags should be request queue wide
> > instead of tagset wide, and each request queue has its own & independent
> > scheduler queue.
> 
> Right, so if we get get a scheduler tag we still need to get a driver tag,
> and this would be the "shared" tag.
> 
> That makes things simpler then.
> 
> > 
> > > +		}
> > > +	}
> > > +
> > >   	return 0;
> > >   err:
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 42792942b428..6625bebb46c3 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> > >    */
> > >   void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
> > >   {
> > > -	sbitmap_queue_wake_all(&tags->bitmap_tags);
> > > +	sbitmap_queue_wake_all(tags->pbitmap_tags);
> > >   	if (include_reserve)
> > > -		sbitmap_queue_wake_all(&tags->breserved_tags);
> > > +		sbitmap_queue_wake_all(tags->pbreserved_tags);
> > >   }
> 
> [...]
> 
> 
> > >   	mutex_init(&set->tag_list_lock);
> > >   	INIT_LIST_HEAD(&set->tag_list);
> > > @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   {
> > >   	struct blk_mq_tag_set *set = q->tag_set;
> > >   	struct blk_mq_hw_ctx *hctx;
> > > +	bool sched_tags = false;
> > >   	int i, ret;
> > >   	if (!set)
> > > @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   			ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
> > >   							false);
> > >   		} else {
> > > +			sched_tags = true;
> > >   			ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> > >   							nr, true);
> > >   		}
> > > @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > >   			q->elevator->type->ops.depth_updated(hctx);
> > >   	}
> > > -	if (!ret)
> > > +	/*
> > > +	 * if ret is 0, all queues should have been updated to the same depth
> > > +	 * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> > > +	 * if some are updated, we should probably roll back the change altogether. FIXME
> > > +	 */
> > > +	if (!ret) {
> > > +		if (blk_mq_is_sbitmap_shared(set)) {
> > > +			if (sched_tags) {
> > > +				sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> > > +				sbitmap_queue_free(&set->sched_shared_breserved_tags);
> > > +				if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> > > +					return -ENOMEM; /* fixup error handling */
> > > +
> > > +				queue_for_each_hw_ctx(q, hctx, i) {
> > > +					hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> > > +					hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> > > +				}
> > > +			} else {
> > > +				sbitmap_queue_free(&set->shared_bitmap_tags);
> > > +				sbitmap_queue_free(&set->shared_breserved_tags);
> > > +				if (!blk_mq_init_shared_sbitmap(set))
> > > +					return -ENOMEM; /* fixup error handling */
> > 
> > No, we can't re-allocate driver tags here which are shared by all LUNs. > And you should see that 'can_grow' is set as false for driver tags
> > in blk_mq_update_nr_requests(), which can only touch per-request-queue
> > data, not tagset wide data.
> 
> Yeah, I see that. We should just resize for driver tags bitmap.
> 
> Personally I think the mainline code is a little loose here, as if we could
> grow driver tags, then blk_mq_tagset.tags would be out-of-sync with the
> hctx->tags. Maybe that should be made more explicit in the code.
> 
> BTW, do you have anything to say about this (modified slightly) comment:
> 
> /*
>  * if ret != 0, q->nr_requests would not be updated, yet the depth
>  * for some hctx sched tags may have changed - is that the right thing
>  * to do?
>  */

In theory, your concern is right, but so far we only support same
depth of hctx for either sched tags or driver tags, so not an issue
so far.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-11-25  3:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
2019-11-21  8:55   ` Ming Lei
2019-11-21 10:24     ` John Garry
2019-11-25  3:00       ` Ming Lei
2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).