* [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs
@ 2019-11-19 14:27 John Garry
2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
This is another stab at solving the problem of hostwide shared tags for SCSI
HBAs.
As discussed previously, Ming Lei's most recent series in [0], which used the
hctx[0] tags for every hctx of a host, was a bit messy and intrusive, and so was
seen as a no-go. Indeed, blk-mq is designed for separate tags per hctx.
Bart also followed up on my v1 RFC with another implementation along those
same lines, which was neater, but I am concerned that the change in this
approach may cause issues - see [1].
This series introduces a different approach to solving that problem: it keeps
the per-hctx tags but introduces a new, separate sbitmap per tagset. The shared
sbitmap is used to generate a tag which is unique across all hctx of a tagset.
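To illustrate the idea outside the kernel (all names here are invented for the
sketch, this is not the sbitmap API): a single bitmap owned by the tagset hands
out every tag, so two different hctx can never hold the same tag value:

```c
#include <assert.h>

/*
 * Toy model of the approach, with made-up names: one bitmap per
 * "tagset", consulted by every hctx, so any allocated tag is unique
 * across the whole host.
 */
#define DEPTH 8

struct shared_tags {
	unsigned long map;		/* one bit per tag */
};

static int shared_tag_get(struct shared_tags *st)
{
	int tag;

	for (tag = 0; tag < DEPTH; tag++) {
		if (!(st->map & (1UL << tag))) {
			st->map |= 1UL << tag;
			return tag;	/* unique over all hctx */
		}
	}
	return -1;			/* host saturated, no tag free */
}

static void shared_tag_put(struct shared_tags *st, int tag)
{
	st->map &= ~(1UL << tag);
}
```

Whichever hctx calls shared_tag_get() next can never receive a tag already held
through another hctx, which is the hostwide-uniqueness property wanted here.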
Currently I have only fixed up the hisi_sas driver to use the shared tags,
but it should not be much trouble to change others over.
Patch #3 is still quite experimental at this point - I added some code
comments on this. I also threw in a minor tidy-up patch.
[0] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[1] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
Differences to v1:
- Use a shared sbitmap, and not a separate shared tags (a big change!)
- Drop request.shared_tag
- Add RB tags
Hannes Reinecke (1):
scsi: Add template flag 'host_tagset'
John Garry (3):
blk-mq: Remove some unused function arguments
blk-mq: Facilitate a shared sbitmap per tagset
scsi: hisi_sas: Switch v3 hw to MQ
Ming Lei (1):
blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
block/bfq-iosched.c | 4 +-
block/blk-mq-debugfs.c | 6 +-
block/blk-mq-sched.c | 14 +++++
block/blk-mq-tag.c | 80 +++++++++++++++++------
block/blk-mq-tag.h | 18 ++++--
block/blk-mq.c | 87 ++++++++++++++++++++------
block/blk-mq.h | 7 ++-
block/kyber-iosched.c | 4 +-
drivers/scsi/hisi_sas/hisi_sas.h | 3 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 36 ++++++-----
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++--------------
drivers/scsi/scsi_lib.c | 2 +
include/linux/blk-mq.h | 9 ++-
include/scsi/scsi_host.h | 3 +
14 files changed, 239 insertions(+), 120 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
@ 2019-11-19 14:27 ` John Garry
2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
` (3 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
The struct blk_mq_hw_ctx * argument in blk_mq_put_tag(),
blk_mq_poll_nsecs(), and blk_mq_poll_hybrid_sleep() is unused, so remove
it.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
---
block/blk-mq-tag.c | 4 ++--
block/blk-mq-tag.h | 3 +--
block/blk-mq.c | 10 ++++------
block/blk-mq.h | 2 +-
4 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index fbacde454718..586c9d6e904a 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -183,8 +183,8 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
return tag + tag_offset;
}
-void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
- struct blk_mq_ctx *ctx, unsigned int tag)
+void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
+ unsigned int tag)
{
if (!blk_mq_tag_is_reserved(tags, tag)) {
const int real_tag = tag - tags->nr_reserved_tags;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 15bc74acb57e..d0c10d043891 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -26,8 +26,7 @@ extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int r
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
- struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 75228d282fc3..20c498fce4dd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -493,9 +493,9 @@ static void __blk_mq_free_request(struct request *rq)
blk_pm_mark_last_busy(rq);
rq->mq_hctx = NULL;
if (rq->tag != -1)
- blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
+ blk_mq_put_tag(hctx->tags, ctx, rq->tag);
if (sched_tag != -1)
- blk_mq_put_tag(hctx, hctx->sched_tags, ctx, sched_tag);
+ blk_mq_put_tag(hctx->sched_tags, ctx, sched_tag);
blk_mq_sched_restart(hctx);
blk_queue_exit(q);
}
@@ -3354,7 +3354,6 @@ static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb)
}
static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
- struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
unsigned long ret = 0;
@@ -3387,7 +3386,6 @@ static unsigned long blk_mq_poll_nsecs(struct request_queue *q,
}
static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
- struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
struct hrtimer_sleeper hs;
@@ -3407,7 +3405,7 @@ static bool blk_mq_poll_hybrid_sleep(struct request_queue *q,
if (q->poll_nsec > 0)
nsecs = q->poll_nsec;
else
- nsecs = blk_mq_poll_nsecs(q, hctx, rq);
+ nsecs = blk_mq_poll_nsecs(q, rq);
if (!nsecs)
return false;
@@ -3462,7 +3460,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
return false;
}
- return blk_mq_poll_hybrid_sleep(q, hctx, rq);
+ return blk_mq_poll_hybrid_sleep(q, rq);
}
/**
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 32c62c64e6c2..78d38b5f2793 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -208,7 +208,7 @@ static inline bool blk_mq_get_dispatch_budget(struct blk_mq_hw_ctx *hctx)
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
- blk_mq_put_tag(hctx, hctx->tags, rq->mq_ctx, rq->tag);
+ blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
rq->tag = -1;
if (rq->rq_flags & RQF_MQ_INFLIGHT) {
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
@ 2019-11-19 14:27 ` John Garry
2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
From: Ming Lei <ming.lei@redhat.com>
BLK_MQ_F_TAG_SHARED actually means that the tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA.
So rename it to make the point explicit.
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
block/blk-mq-debugfs.c | 2 +-
block/blk-mq-tag.c | 2 +-
block/blk-mq-tag.h | 4 ++--
block/blk-mq.c | 20 ++++++++++----------
include/linux/blk-mq.h | 2 +-
5 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index b3f2ba483992..33a40ae1d60f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -236,7 +236,7 @@ static const char *const alloc_policy_name[] = {
#define HCTX_FLAG_NAME(name) [ilog2(BLK_MQ_F_##name)] = #name
static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(SHOULD_MERGE),
- HCTX_FLAG_NAME(TAG_SHARED),
+ HCTX_FLAG_NAME(TAG_QUEUE_SHARED),
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
};
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 586c9d6e904a..42792942b428 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -65,7 +65,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
{
unsigned int depth, users;
- if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_SHARED))
+ if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return true;
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return true;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index d0c10d043891..e36bec8e0970 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -53,7 +53,7 @@ extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
- if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+ if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return false;
return __blk_mq_tag_busy(hctx);
@@ -61,7 +61,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
- if (!(hctx->flags & BLK_MQ_F_TAG_SHARED))
+ if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
return;
__blk_mq_tag_idle(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 20c498fce4dd..2d2956249ae9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -296,7 +296,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
rq->tag = -1;
rq->internal_tag = tag;
} else {
- if (data->hctx->flags & BLK_MQ_F_TAG_SHARED) {
+ if (data->hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) {
rq_flags = RQF_MQ_INFLIGHT;
atomic_inc(&data->hctx->nr_active);
}
@@ -1112,7 +1112,7 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
wait_queue_entry_t *wait;
bool ret;
- if (!(hctx->flags & BLK_MQ_F_TAG_SHARED)) {
+ if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
blk_mq_sched_mark_restart_hctx(hctx);
/*
@@ -1243,7 +1243,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
* For non-shared tags, the RESTART check
* will suffice.
*/
- if (hctx->flags & BLK_MQ_F_TAG_SHARED)
+ if (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
no_tag = true;
break;
}
@@ -2358,7 +2358,7 @@ blk_mq_alloc_hctx(struct request_queue *q, struct blk_mq_tag_set *set,
spin_lock_init(&hctx->lock);
INIT_LIST_HEAD(&hctx->dispatch);
hctx->queue = q;
- hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+ hctx->flags = set->flags & ~BLK_MQ_F_TAG_QUEUE_SHARED;
INIT_LIST_HEAD(&hctx->hctx_list);
@@ -2575,9 +2575,9 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
queue_for_each_hw_ctx(q, hctx, i) {
if (shared)
- hctx->flags |= BLK_MQ_F_TAG_SHARED;
+ hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
else
- hctx->flags &= ~BLK_MQ_F_TAG_SHARED;
+ hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
}
}
@@ -2603,7 +2603,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue *q)
list_del_rcu(&q->tag_set_list);
if (list_is_singular(&set->tag_list)) {
/* just transitioned to unshared */
- set->flags &= ~BLK_MQ_F_TAG_SHARED;
+ set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
/* update existing queue */
blk_mq_update_tag_set_depth(set, false);
}
@@ -2620,12 +2620,12 @@ static void blk_mq_add_queue_tag_set(struct blk_mq_tag_set *set,
* Check to see if we're transitioning to shared (from 1 to 2 queues).
*/
if (!list_empty(&set->tag_list) &&
- !(set->flags & BLK_MQ_F_TAG_SHARED)) {
- set->flags |= BLK_MQ_F_TAG_SHARED;
+ !(set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+ set->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
/* update existing queue */
blk_mq_update_tag_set_depth(set, true);
}
- if (set->flags & BLK_MQ_F_TAG_SHARED)
+ if (set->flags & BLK_MQ_F_TAG_QUEUE_SHARED)
queue_set_hctx_shared(q, true);
list_add_tail_rcu(&q->tag_set_list, &set->tag_list);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 201abcb6bd6e..8952c9dfef79 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -225,7 +225,7 @@ struct blk_mq_ops {
enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
- BLK_MQ_F_TAG_SHARED = 1 << 1,
+ BLK_MQ_F_TAG_QUEUE_SHARED = 1 << 1,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
@ 2019-11-19 14:27 ` John Garry
2019-11-21 8:55 ` Ming Lei
2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
4 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
multiple reply queues with a single set of hostwide tags.
In addition, these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completions to go unserviced when an
interrupt is shut down.
To solve that problem, Ming's patchset to drain hctxs should ensure that no
in-flight IOs are missed [1].
However, to take advantage of that patchset, we need to map the HBA HW
queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.
In making that transition, the per-SCSI-command request tags are no
longer unique per Scsi host - they are only unique per hctx. As such, the
HBA LLDD would have to generate this tag internally, which has a certain
performance overhead.
However, another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queues) commands. In [2], we removed the Scsi
host busy counter, which would have stopped the LLDD from being sent more than
.can_queue commands; however, we should still ensure that the block layer
does not issue more than .can_queue commands to the Scsi host.
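As a back-of-envelope illustration of that mismatch (function names and the
numbers are invented for the example):

```c
#include <assert.h>

/*
 * Illustrative arithmetic only. With independent per-hctx tag sets,
 * blk-mq may have up to (depth * nr_hw_queues) requests in flight,
 * which can exceed the HBA's .can_queue. A single shared sbitmap of
 * .can_queue bits bounds the total regardless of how many hw queues
 * are exposed.
 */
static unsigned int max_inflight_per_hctx_tags(unsigned int depth,
					       unsigned int nr_hw_queues)
{
	return depth * nr_hw_queues;
}

static unsigned int max_inflight_shared_sbitmap(unsigned int can_queue)
{
	return can_queue;	/* one bitmap bounds the whole host */
}
```

So a host with can_queue = 4096 exposing 16 hw queues could otherwise see up
to 16x more tags allocated than it can actually accept.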
To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time.
New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
tagset to indicate whether the shared sbitmap should be used.
Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
tags and requests per hctx; the reason for this is that if we were only to
allocate tags and requests for a single hctx - like hctx0 - we may break block
drivers which expect a request to be associated with a specific hctx, i.e.
not hctx0.
This is based on work originally from Ming Lei in [3] and from Bart's
suggestion in [4].
[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
[3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
Signed-off-by: John Garry <john.garry@huawei.com>
---
block/bfq-iosched.c | 4 +--
block/blk-mq-debugfs.c | 4 +--
block/blk-mq-sched.c | 14 ++++++++
block/blk-mq-tag.c | 74 +++++++++++++++++++++++++++++++++---------
block/blk-mq-tag.h | 15 +++++++--
block/blk-mq.c | 57 +++++++++++++++++++++++++++++---
block/blk-mq.h | 5 +++
block/kyber-iosched.c | 4 +--
include/linux/blk-mq.h | 7 ++++
9 files changed, 157 insertions(+), 27 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0319d6339822..9633c864af07 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
struct blk_mq_tags *tags = hctx->sched_tags;
unsigned int min_shallow;
- min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
- sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
+ min_shallow = bfq_update_depths(bfqd, tags->pbitmap_tags);
+ sbitmap_queue_min_shallow_depth(tags->pbitmap_tags, min_shallow);
}
static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 33a40ae1d60f..9cf2f09c08e4 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -483,7 +483,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
- if (hctx->tags)
+ if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
@@ -517,7 +517,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
res = mutex_lock_interruptible(&q->sysfs_lock);
if (res)
goto out;
- if (hctx->sched_tags)
+ if (hctx->sched_tags) /* We should just iterate the relevant bits for this hctx FIXME */
sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd47b3d..b23547f10949 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
{
+ struct blk_mq_tag_set *tag_set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
struct elevator_queue *eq;
unsigned int i;
@@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
blk_mq_debugfs_register_sched_hctx(q, hctx);
}
+ if (blk_mq_is_sbitmap_shared(tag_set)) {
+ if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ queue_for_each_hw_ctx(q, hctx, i) {
+ struct blk_mq_tags *tags = hctx->sched_tags;
+
+ tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
+ tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
+ }
+ }
+
return 0;
err:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 42792942b428..6625bebb46c3 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
*/
void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
{
- sbitmap_queue_wake_all(&tags->bitmap_tags);
+ sbitmap_queue_wake_all(tags->pbitmap_tags);
if (include_reserve)
- sbitmap_queue_wake_all(&tags->breserved_tags);
+ sbitmap_queue_wake_all(tags->pbreserved_tags);
}
/*
@@ -113,10 +113,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
WARN_ON_ONCE(1);
return BLK_MQ_TAG_FAIL;
}
- bt = &tags->breserved_tags;
+ bt = tags->pbreserved_tags;
tag_offset = 0;
} else {
- bt = &tags->bitmap_tags;
+ bt = tags->pbitmap_tags;
tag_offset = tags->nr_reserved_tags;
}
@@ -162,9 +162,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
data->ctx);
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
- bt = &tags->breserved_tags;
+ bt = tags->pbreserved_tags;
else
- bt = &tags->bitmap_tags;
+ bt = tags->pbitmap_tags;
/*
* If destination hw queue is changed, fake wake up on
@@ -190,10 +190,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
const int real_tag = tag - tags->nr_reserved_tags;
BUG_ON(real_tag >= tags->nr_tags);
- sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
+ sbitmap_queue_clear(tags->pbitmap_tags, real_tag, ctx->cpu);
} else {
BUG_ON(tag >= tags->nr_reserved_tags);
- sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
+ sbitmap_queue_clear(tags->pbreserved_tags, tag, ctx->cpu);
}
}
@@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* We can hit rq == NULL here, because the tagging functions
* test and set the bit before assigning ->rqs[].
*/
- if (rq && rq->q == hctx->queue)
+ if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
return iter_data->fn(hctx, rq, iter_data->data, reserved);
return true;
}
@@ -321,8 +321,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
busy_tag_iter_fn *fn, void *priv)
{
if (tags->nr_reserved_tags)
- bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
- bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
+ bt_tags_for_each(tags, tags->pbreserved_tags, fn, priv, true);
+ bt_tags_for_each(tags, tags->pbitmap_tags, fn, priv, false);
}
/**
@@ -419,8 +419,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
continue;
if (tags->nr_reserved_tags)
- bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
- bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+ bt_for_each(hctx, tags->pbreserved_tags, fn, priv, true);
+ bt_for_each(hctx, tags->pbitmap_tags, fn, priv, false);
}
blk_queue_exit(q);
}
@@ -444,6 +444,9 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
node))
goto free_bitmap_tags;
+ tags->pbitmap_tags = &tags->bitmap_tags;
+ tags->pbreserved_tags = &tags->breserved_tags;
+
return tags;
free_bitmap_tags:
sbitmap_queue_free(&tags->bitmap_tags);
@@ -452,7 +455,46 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
return NULL;
}
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
+{
+ unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
+ int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+ bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+ int node = tag_set->numa_node;
+
+ if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
+ return false;
+ if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
+ node))
+ goto free_bitmap_tags;
+
+ return true;
+free_bitmap_tags:
+ sbitmap_queue_free(&tag_set->shared_bitmap_tags);
+ return false;
+}
+
+bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set, unsigned long nr_requests)
+{
+ unsigned int depth = nr_requests - tag_set->reserved_tags;
+ int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
+ bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
+ int node = tag_set->numa_node;
+
+ if (bt_alloc(&tag_set->sched_shared_bitmap_tags, depth, round_robin, node))
+ return false;
+ if (bt_alloc(&tag_set->sched_shared_breserved_tags, tag_set->reserved_tags, round_robin,
+ node))
+ goto free_bitmap_tags;
+
+ return true;
+free_bitmap_tags:
+ sbitmap_queue_free(&tag_set->sched_shared_bitmap_tags);
+ return false;
+}
+
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+ unsigned int total_tags,
unsigned int reserved_tags,
int node, int alloc_policy)
{
@@ -470,6 +512,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_tags = total_tags;
tags->nr_reserved_tags = reserved_tags;
+ if (blk_mq_is_sbitmap_shared(set))
+ return tags;
return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
}
@@ -526,7 +570,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
- sbitmap_queue_resize(&tags->bitmap_tags,
+ sbitmap_queue_resize(tags->pbitmap_tags,
tdepth - tags->nr_reserved_tags);
}
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index e36bec8e0970..198b6fbe2c22 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,20 +13,31 @@ struct blk_mq_tags {
atomic_t active_queues;
+ /* We should consider deleting these so they are not referenced by mistake */
struct sbitmap_queue bitmap_tags;
struct sbitmap_queue breserved_tags;
+ struct sbitmap_queue *pbitmap_tags;
+ struct sbitmap_queue *pbreserved_tags;
+
struct request **rqs;
struct request **static_rqs;
struct list_head page_list;
};
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
+extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
+ unsigned long nr_requests);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
+ unsigned int nr_tags,
+ unsigned int reserved_tags,
+ int node, int alloc_policy);
extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
-extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
+extern void blk_mq_put_tag(struct blk_mq_tags *tags,
+ struct blk_mq_ctx *ctx, unsigned int tag);
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2d2956249ae9..8d5b21919c9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1089,7 +1089,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
struct sbitmap_queue *sbq;
list_del_init(&wait->entry);
- sbq = &hctx->tags->bitmap_tags;
+ sbq = hctx->tags->pbitmap_tags;
atomic_dec(&sbq->ws_active);
}
spin_unlock(&hctx->dispatch_wait_lock);
@@ -1107,7 +1107,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
- struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+ struct sbitmap_queue *sbq = hctx->tags->pbitmap_tags;
struct wait_queue_head *wq;
wait_queue_entry_t *wait;
bool ret;
@@ -2097,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
if (node == NUMA_NO_NODE)
node = set->numa_node;
- tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+ tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
if (!tags)
return NULL;
@@ -3100,6 +3100,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
if (ret)
goto out_free_mq_map;
+ if (blk_mq_is_sbitmap_shared(set)) {
+ if (!blk_mq_init_shared_sbitmap(set)) {
+ ret = -ENOMEM;
+ goto out_free_mq_map;
+ }
+
+ for (i = 0; i < set->nr_hw_queues; i++) {
+ struct blk_mq_tags *tags = set->tags[i];
+
+ tags->pbitmap_tags = &set->shared_bitmap_tags;
+ tags->pbreserved_tags = &set->shared_breserved_tags;
+ }
+ }
+
mutex_init(&set->tag_list_lock);
INIT_LIST_HEAD(&set->tag_list);
@@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
{
struct blk_mq_tag_set *set = q->tag_set;
struct blk_mq_hw_ctx *hctx;
+ bool sched_tags = false;
int i, ret;
if (!set)
@@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
false);
} else {
+ sched_tags = true;
ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
nr, true);
}
@@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
q->elevator->type->ops.depth_updated(hctx);
}
- if (!ret)
+ /*
+ * if ret is 0, all queues should have been updated to the same depth
+ * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
+ * if some are updated, we should probably roll back the change altogether. FIXME
+ */
+ if (!ret) {
+ if (blk_mq_is_sbitmap_shared(set)) {
+ if (sched_tags) {
+ sbitmap_queue_free(&set->sched_shared_bitmap_tags);
+ sbitmap_queue_free(&set->sched_shared_breserved_tags);
+ if (!blk_mq_init_sched_shared_sbitmap(set, nr))
+ return -ENOMEM; /* fixup error handling */
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
+ hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
+ }
+ } else {
+ sbitmap_queue_free(&set->shared_bitmap_tags);
+ sbitmap_queue_free(&set->shared_breserved_tags);
+ if (!blk_mq_init_shared_sbitmap(set))
+ return -ENOMEM; /* fixup error handling */
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ hctx->tags->pbitmap_tags = &set->shared_bitmap_tags;
+ hctx->tags->pbreserved_tags = &set->shared_breserved_tags;
+ }
+ }
+ }
q->nr_requests = nr;
+ }
+ /*
+ * if ret != 0, q->nr_requests would not be updated, yet the depth
+ * for some hctx may have changed - is that right?
+ */
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 78d38b5f2793..4c1ea206d3f4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -166,6 +166,11 @@ struct blk_mq_alloc_data {
struct blk_mq_hw_ctx *hctx;
};
+static inline bool blk_mq_is_sbitmap_shared(struct blk_mq_tag_set *tag_set)
+{
+ return !!(tag_set->flags & BLK_MQ_F_TAG_HCTX_SHARED);
+}
+
static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data)
{
if (data->flags & BLK_MQ_REQ_INTERNAL)
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0ef637..59fb6afd8e68 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -359,7 +359,7 @@ static unsigned int kyber_sched_tags_shift(struct request_queue *q)
* All of the hardware queues have the same depth, so we can just grab
* the shift of the first one.
*/
- return q->queue_hw_ctx[0]->sched_tags->bitmap_tags.sb.shift;
+ return q->queue_hw_ctx[0]->sched_tags->pbitmap_tags->sb.shift;
}
static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
@@ -502,7 +502,7 @@ static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
khd->batching = 0;
hctx->sched_data = khd;
- sbitmap_queue_min_shallow_depth(&hctx->sched_tags->bitmap_tags,
+ sbitmap_queue_min_shallow_depth(hctx->sched_tags->pbitmap_tags,
kqd->async_depth);
return 0;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8952c9dfef79..9b923b42c07b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -109,6 +109,12 @@ struct blk_mq_tag_set {
unsigned int flags; /* BLK_MQ_F_* */
void *driver_data;
+ struct sbitmap_queue shared_bitmap_tags;
+ struct sbitmap_queue shared_breserved_tags;
+
+ struct sbitmap_queue sched_shared_bitmap_tags;
+ struct sbitmap_queue sched_shared_breserved_tags;
+
struct blk_mq_tags **tags;
struct mutex tag_list_lock;
@@ -226,6 +232,7 @@ struct blk_mq_ops {
enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
BLK_MQ_F_TAG_QUEUE_SHARED = 1 << 1,
+ BLK_MQ_F_TAG_HCTX_SHARED = 1 << 2,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
--
2.17.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset'
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
` (2 preceding siblings ...)
2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-19 14:27 ` John Garry
2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry
4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
From: Hannes Reinecke <hare@suse.com>
Add a host template flag 'host_tagset' so that a hostwide tagset can be
shared across multiple reply queues after the SCSI device's reply queues
are converted to blk-mq hw queues.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
drivers/scsi/scsi_lib.c | 2 ++
include/scsi/scsi_host.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 5447738906ac..97c9352f2d16 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1901,6 +1901,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
shost->tag_set.flags |=
BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+ if (shost->hostt->host_tagset)
+ shost->tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
shost->tag_set.driver_data = shost;
return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 31e0d6ca1eba..08299aeb822d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -442,6 +442,9 @@ struct scsi_host_template {
/* True if the low-level driver supports blk-mq only */
unsigned force_blk_mq:1;
+ /* True if the host uses host-wide tagspace */
+ unsigned host_tagset:1;
+
/*
* Countdown for host blocking with no commands outstanding.
*/
--
2.17.1
* [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
` (3 preceding siblings ...)
2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
@ 2019-11-19 14:27 ` John Garry
4 siblings, 0 replies; 9+ messages in thread
From: John Garry @ 2019-11-19 14:27 UTC (permalink / raw)
To: axboe, jejb, martin.petersen
Cc: linux-block, linux-kernel, linux-scsi, ming.lei, hare,
bvanassche, chenxiang66, John Garry
Now that the block layer provides a shared tag, we can switch the driver
to expose all HW queues.
Signed-off-by: John Garry <john.garry@huawei.com>
---
drivers/scsi/hisi_sas/hisi_sas.h | 3 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 36 ++++++-----
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 86 +++++++++++---------------
3 files changed, 56 insertions(+), 69 deletions(-)
diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 720c4d6be939..5dbc59084f63 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -8,6 +8,8 @@
#define _HISI_SAS_H_
#include <linux/acpi.h>
+#include <linux/blk-mq.h>
+#include <linux/blk-mq-pci.h>
#include <linux/clk.h>
#include <linux/debugfs.h>
#include <linux/dmapool.h>
@@ -390,7 +392,6 @@ struct hisi_hba {
u32 intr_coal_count; /* Interrupt count to coalesce */
int cq_nvecs;
- unsigned int *reply_map;
/* bist */
enum sas_linkrate debugfs_bist_linkrate;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 0847e682797b..48d6f1c7f9c2 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -421,6 +421,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
struct device *dev = hisi_hba->dev;
int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
int n_elem = 0, n_elem_dif = 0, n_elem_req = 0;
+ struct scsi_cmnd *scmd = NULL;
struct hisi_sas_dq *dq;
unsigned long flags;
int wr_q_index;
@@ -436,10 +437,23 @@ static int hisi_sas_task_prep(struct sas_task *task,
return -ECOMM;
}
- if (hisi_hba->reply_map) {
- int cpu = raw_smp_processor_id();
- unsigned int dq_index = hisi_hba->reply_map[cpu];
+ if (task->uldd_task) {
+ struct ata_queued_cmd *qc;
+ if (dev_is_sata(device)) {
+ qc = task->uldd_task;
+ scmd = qc->scsicmd;
+ } else {
+ scmd = task->uldd_task;
+ }
+ }
+
+ if (scmd) {
+ unsigned int dq_index;
+ u32 blk_tag;
+
+ blk_tag = blk_mq_unique_tag(scmd->request);
+ dq_index = blk_mq_unique_tag_to_hwq(blk_tag);
*dq_pointer = dq = &hisi_hba->dq[dq_index];
} else {
*dq_pointer = dq = sas_dev->dq;
@@ -468,21 +482,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
if (hisi_hba->hw->slot_index_alloc)
rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
- else {
- struct scsi_cmnd *scsi_cmnd = NULL;
-
- if (task->uldd_task) {
- struct ata_queued_cmd *qc;
+ else
+ rc = hisi_sas_slot_index_alloc(hisi_hba, scmd);
- if (dev_is_sata(device)) {
- qc = task->uldd_task;
- scsi_cmnd = qc->scsicmd;
- } else {
- scsi_cmnd = task->uldd_task;
- }
- }
- rc = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
- }
if (rc < 0)
goto err_out_dif_dma_unmap;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index cb8d087762db..5180360482eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2344,66 +2344,35 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
return IRQ_HANDLED;
}
-static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
{
- const struct cpumask *mask;
- int queue, cpu;
-
- for (queue = 0; queue < nvecs; queue++) {
- struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+ int vectors;
+ int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+ struct Scsi_Host *shost = hisi_hba->shost;
+ struct irq_affinity desc = {
+ .pre_vectors = BASE_VECTORS_V3_HW,
+ };
+
+ min_msi = MIN_AFFINE_VECTORS_V3_HW;
+ vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+ min_msi, max_msi,
+ PCI_IRQ_MSI |
+ PCI_IRQ_AFFINITY,
+ &desc);
+ if (vectors < 0)
+ return -ENOENT;
- mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
- BASE_VECTORS_V3_HW);
- if (!mask)
- goto fallback;
- cq->pci_irq_mask = mask;
- for_each_cpu(cpu, mask)
- hisi_hba->reply_map[cpu] = queue;
- }
- return;
+ hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+ shost->nr_hw_queues = hisi_hba->cq_nvecs;
-fallback:
- for_each_possible_cpu(cpu)
- hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
- /* Don't clean all CQ masks */
+ return 0;
}
static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
{
struct device *dev = hisi_hba->dev;
struct pci_dev *pdev = hisi_hba->pci_dev;
- int vectors, rc, i;
- int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
-
- if (auto_affine_msi_experimental) {
- struct irq_affinity desc = {
- .pre_vectors = BASE_VECTORS_V3_HW,
- };
-
- min_msi = MIN_AFFINE_VECTORS_V3_HW;
-
- hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
- sizeof(unsigned int),
- GFP_KERNEL);
- if (!hisi_hba->reply_map)
- return -ENOMEM;
- vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
- min_msi, max_msi,
- PCI_IRQ_MSI |
- PCI_IRQ_AFFINITY,
- &desc);
- if (vectors < 0)
- return -ENOENT;
- setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
- } else {
- min_msi = max_msi;
- vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
- max_msi, PCI_IRQ_MSI);
- if (vectors < 0)
- return vectors;
- }
-
- hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+ int rc, i;
rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
int_phy_up_down_bcast_v3_hw, 0,
@@ -3048,6 +3017,15 @@ static int debugfs_set_bist_v3_hw(struct hisi_hba *hisi_hba, bool enable)
return 0;
}
+static int hisi_sas_map_queues(struct Scsi_Host *shost)
+{
+ struct hisi_hba *hisi_hba = shost_priv(shost);
+ struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+ return blk_mq_pci_map_queues(qmap, hisi_hba->pci_dev,
+ BASE_VECTORS_V3_HW);
+}
+
static struct scsi_host_template sht_v3_hw = {
.name = DRV_NAME,
.module = THIS_MODULE,
@@ -3056,6 +3034,7 @@ static struct scsi_host_template sht_v3_hw = {
.slave_configure = hisi_sas_slave_configure,
.scan_finished = hisi_sas_scan_finished,
.scan_start = hisi_sas_scan_start,
+ .map_queues = hisi_sas_map_queues,
.change_queue_depth = sas_change_queue_depth,
.bios_param = sas_bios_param,
.this_id = -1,
@@ -3069,6 +3048,7 @@ static struct scsi_host_template sht_v3_hw = {
.shost_attrs = host_attrs_v3_hw,
.tag_alloc_policy = BLK_TAG_ALLOC_RR,
.host_reset = hisi_sas_host_reset,
+ .host_tagset = 1,
};
static const struct hisi_sas_hw hisi_sas_v3_hw = {
@@ -3240,6 +3220,10 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (hisi_sas_debugfs_enable)
hisi_sas_debugfs_init(hisi_hba);
+ rc = interrupt_preinit_v3_hw(hisi_hba);
+ if (rc)
+ goto err_out_ha;
+ dev_err(dev, "%d hw queues\n", shost->nr_hw_queues);
rc = scsi_add_host(shost, dev);
if (rc)
goto err_out_ha;
--
2.17.1
* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
@ 2019-11-21 8:55 ` Ming Lei
2019-11-21 10:24 ` John Garry
0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2019-11-21 8:55 UTC (permalink / raw)
To: John Garry
Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
linux-scsi, hare, bvanassche, chenxiang66
On Tue, Nov 19, 2019 at 10:27:36PM +0800, John Garry wrote:
> Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support
> multiple reply queues with single hostwide tags.
>
> In addition, these drivers want to use interrupt assignment in
> pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
> CPU hotplug may cause in-flight IO completion to not be serviced when an
> interrupt is shutdown.
>
> To solve that problem, Ming's patchset to drain hctx's should ensure no
> IOs are missed in-flight [1].
>
> However, to take advantage of that patchset, we need to map the HBA HW
> queues to blk mq hctx's; to do that, we need to expose the HBA HW queues.
>
> In making that transition, the per-SCSI command request tags are no
> longer unique per Scsi host - they are just unique per hctx. As such, the
> HBA LLDD would have to generate this tag internally, which has a certain
> performance overhead.
>
> However another problem is that blk mq assumes the host may accept
> (Scsi_host.can_queue * #hw queue) commands. In [2], we removed the Scsi
> host busy counter, which would stop the LLDD being sent more than
> .can_queue commands; however, we should still ensure that the block layer
> does not issue more than .can_queue commands to the Scsi host.
>
> To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
> which may be requested at init time.
>
> New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the
> tagset to indicate whether the shared sbitmap should be used.
>
> Even when BLK_MQ_F_TAG_HCTX_SHARED is set, we still allocate a full set of
> tags and requests per hctx; the reason for this is that if we only allocate
> tags and requests for a single hctx - like hctx0 - we may break block
> drivers which expect a request be associated with a specific hctx, i.e.
> not hctx0.
>
> This is based on work originally from Ming Lei in [3] and from Bart's
> suggestion in [4].
>
> [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
> [1] https://lore.kernel.org/linux-block/20191014015043.25029-1-ming.lei@redhat.com/
> [2] https://lore.kernel.org/linux-scsi/20191025065855.6309-1-ming.lei@redhat.com/
> [3] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
> [4] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be
>
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
> block/bfq-iosched.c | 4 +--
> block/blk-mq-debugfs.c | 4 +--
> block/blk-mq-sched.c | 14 ++++++++
> block/blk-mq-tag.c | 74 +++++++++++++++++++++++++++++++++---------
> block/blk-mq-tag.h | 15 +++++++--
> block/blk-mq.c | 57 +++++++++++++++++++++++++++++---
> block/blk-mq.h | 5 +++
> block/kyber-iosched.c | 4 +--
> include/linux/blk-mq.h | 7 ++++
> 9 files changed, 157 insertions(+), 27 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 0319d6339822..9633c864af07 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -6327,8 +6327,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
> struct blk_mq_tags *tags = hctx->sched_tags;
> unsigned int min_shallow;
>
> - min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
> - sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
> + min_shallow = bfq_update_depths(bfqd, tags->pbitmap_tags);
> + sbitmap_queue_min_shallow_depth(tags->pbitmap_tags, min_shallow);
> }
>
> static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 33a40ae1d60f..9cf2f09c08e4 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -483,7 +483,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
> res = mutex_lock_interruptible(&q->sysfs_lock);
> if (res)
> goto out;
> - if (hctx->tags)
> + if (hctx->tags) /* We should just iterate the relevant bits for this hctx FIXME */
> sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
> mutex_unlock(&q->sysfs_lock);
>
> @@ -517,7 +517,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
> res = mutex_lock_interruptible(&q->sysfs_lock);
> if (res)
> goto out;
> - if (hctx->sched_tags)
> + if (hctx->sched_tags) /* We should just iterate the relevant bits for this hctx FIXME */
> sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
> mutex_unlock(&q->sysfs_lock);
>
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index ca22afd47b3d..b23547f10949 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -492,6 +492,7 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q)
>
> int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> {
> + struct blk_mq_tag_set *tag_set = q->tag_set;
> struct blk_mq_hw_ctx *hctx;
> struct elevator_queue *eq;
> unsigned int i;
> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> blk_mq_debugfs_register_sched_hctx(q, hctx);
> }
>
> + if (blk_mq_is_sbitmap_shared(tag_set)) {
> + if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> + ret = -ENOMEM;
> + goto err;
> + }
> + queue_for_each_hw_ctx(q, hctx, i) {
> + struct blk_mq_tags *tags = hctx->sched_tags;
> +
> + tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> + tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
This kind of sharing is wrong, sched tags should be request queue wide
instead of tagset wide, and each request queue has its own & independent
scheduler queue.
> + }
> + }
> +
> return 0;
>
> err:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 42792942b428..6625bebb46c3 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> */
> void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
> {
> - sbitmap_queue_wake_all(&tags->bitmap_tags);
> + sbitmap_queue_wake_all(tags->pbitmap_tags);
> if (include_reserve)
> - sbitmap_queue_wake_all(&tags->breserved_tags);
> + sbitmap_queue_wake_all(tags->pbreserved_tags);
> }
>
> /*
> @@ -113,10 +113,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
> WARN_ON_ONCE(1);
> return BLK_MQ_TAG_FAIL;
> }
> - bt = &tags->breserved_tags;
> + bt = tags->pbreserved_tags;
> tag_offset = 0;
> } else {
> - bt = &tags->bitmap_tags;
> + bt = tags->pbitmap_tags;
> tag_offset = tags->nr_reserved_tags;
> }
>
> @@ -162,9 +162,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
> data->ctx);
> tags = blk_mq_tags_from_data(data);
> if (data->flags & BLK_MQ_REQ_RESERVED)
> - bt = &tags->breserved_tags;
> + bt = tags->pbreserved_tags;
> else
> - bt = &tags->bitmap_tags;
> + bt = tags->pbitmap_tags;
>
> /*
> * If destination hw queue is changed, fake wake up on
> @@ -190,10 +190,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
> const int real_tag = tag - tags->nr_reserved_tags;
>
> BUG_ON(real_tag >= tags->nr_tags);
> - sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
> + sbitmap_queue_clear(tags->pbitmap_tags, real_tag, ctx->cpu);
> } else {
> BUG_ON(tag >= tags->nr_reserved_tags);
> - sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
> + sbitmap_queue_clear(tags->pbreserved_tags, tag, ctx->cpu);
> }
> }
>
> @@ -220,7 +220,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> * We can hit rq == NULL here, because the tagging functions
> * test and set the bit before assigning ->rqs[].
> */
> - if (rq && rq->q == hctx->queue)
> + if (rq && rq->q == hctx->queue && rq->mq_hctx == hctx)
> return iter_data->fn(hctx, rq, iter_data->data, reserved);
> return true;
> }
> @@ -321,8 +321,8 @@ static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
> busy_tag_iter_fn *fn, void *priv)
> {
> if (tags->nr_reserved_tags)
> - bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
> - bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
> + bt_tags_for_each(tags, tags->pbreserved_tags, fn, priv, true);
> + bt_tags_for_each(tags, tags->pbitmap_tags, fn, priv, false);
> }
>
> /**
> @@ -419,8 +419,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
> continue;
>
> if (tags->nr_reserved_tags)
> - bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
> - bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
> + bt_for_each(hctx, tags->pbreserved_tags, fn, priv, true);
> + bt_for_each(hctx, tags->pbitmap_tags, fn, priv, false);
> }
> blk_queue_exit(q);
> }
> @@ -444,6 +444,9 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
> node))
> goto free_bitmap_tags;
>
> + tags->pbitmap_tags = &tags->bitmap_tags;
> + tags->pbreserved_tags = &tags->breserved_tags;
> +
> return tags;
> free_bitmap_tags:
> sbitmap_queue_free(&tags->bitmap_tags);
> @@ -452,7 +455,46 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
> return NULL;
> }
>
> -struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> +bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set)
> +{
> + unsigned int depth = tag_set->queue_depth - tag_set->reserved_tags;
> + int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
> + bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
> + int node = tag_set->numa_node;
> +
> + if (bt_alloc(&tag_set->shared_bitmap_tags, depth, round_robin, node))
> + return false;
> + if (bt_alloc(&tag_set->shared_breserved_tags, tag_set->reserved_tags, round_robin,
> + node))
> + goto free_bitmap_tags;
> +
> + return true;
> +free_bitmap_tags:
> + sbitmap_queue_free(&tag_set->shared_bitmap_tags);
> + return false;
> +}
> +
> +bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set, unsigned long nr_requests)
> +{
> + unsigned int depth = nr_requests - tag_set->reserved_tags;
> + int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(tag_set->flags);
> + bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR;
> + int node = tag_set->numa_node;
> +
> + if (bt_alloc(&tag_set->sched_shared_bitmap_tags, depth, round_robin, node))
> + return false;
> + if (bt_alloc(&tag_set->sched_shared_breserved_tags, tag_set->reserved_tags, round_robin,
> + node))
> + goto free_bitmap_tags;
> +
> + return true;
> +free_bitmap_tags:
> + sbitmap_queue_free(&tag_set->sched_shared_bitmap_tags);
> + return false;
> +}
> +
> +struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
> + unsigned int total_tags,
> unsigned int reserved_tags,
> int node, int alloc_policy)
> {
> @@ -470,6 +512,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
> tags->nr_tags = total_tags;
> tags->nr_reserved_tags = reserved_tags;
>
> + if (blk_mq_is_sbitmap_shared(set))
> + return tags;
> return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
> }
>
> @@ -526,7 +570,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
> * Don't need (or can't) update reserved tags here, they
> * remain static and should never need resizing.
> */
> - sbitmap_queue_resize(&tags->bitmap_tags,
> + sbitmap_queue_resize(tags->pbitmap_tags,
> tdepth - tags->nr_reserved_tags);
> }
>
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index e36bec8e0970..198b6fbe2c22 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -13,20 +13,31 @@ struct blk_mq_tags {
>
> atomic_t active_queues;
>
> + /* We should consider deleting these so they are not referenced by mistake */
> struct sbitmap_queue bitmap_tags;
> struct sbitmap_queue breserved_tags;
>
> + struct sbitmap_queue *pbitmap_tags;
> + struct sbitmap_queue *pbreserved_tags;
> +
> struct request **rqs;
> struct request **static_rqs;
> struct list_head page_list;
> };
>
>
> -extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
> +extern bool blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *tag_set);
> +extern bool blk_mq_init_sched_shared_sbitmap(struct blk_mq_tag_set *tag_set,
> + unsigned long nr_requests);
> +extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *tag_set,
> + unsigned int nr_tags,
> + unsigned int reserved_tags,
> + int node, int alloc_policy);
> extern void blk_mq_free_tags(struct blk_mq_tags *tags);
>
> extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
> -extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag);
> +extern void blk_mq_put_tag(struct blk_mq_tags *tags,
> + struct blk_mq_ctx *ctx, unsigned int tag);
> extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
> struct blk_mq_tags **tags,
> unsigned int depth, bool can_grow);
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2d2956249ae9..8d5b21919c9a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1089,7 +1089,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
> struct sbitmap_queue *sbq;
>
> list_del_init(&wait->entry);
> - sbq = &hctx->tags->bitmap_tags;
> + sbq = hctx->tags->pbitmap_tags;
> atomic_dec(&sbq->ws_active);
> }
> spin_unlock(&hctx->dispatch_wait_lock);
> @@ -1107,7 +1107,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
> static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
> struct request *rq)
> {
> - struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
> + struct sbitmap_queue *sbq = hctx->tags->pbitmap_tags;
> struct wait_queue_head *wq;
> wait_queue_entry_t *wait;
> bool ret;
> @@ -2097,7 +2097,7 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
> if (node == NUMA_NO_NODE)
> node = set->numa_node;
>
> - tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
> + tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
> BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
> if (!tags)
> return NULL;
> @@ -3100,6 +3100,20 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> if (ret)
> goto out_free_mq_map;
>
> + if (blk_mq_is_sbitmap_shared(set)) {
> + if (!blk_mq_init_shared_sbitmap(set)) {
> + ret = -ENOMEM;
> + goto out_free_mq_map;
> + }
> +
> + for (i = 0; i < set->nr_hw_queues; i++) {
> + struct blk_mq_tags *tags = set->tags[i];
> +
> + tags->pbitmap_tags = &set->shared_bitmap_tags;
> + tags->pbreserved_tags = &set->shared_breserved_tags;
> + }
> + }
> +
> mutex_init(&set->tag_list_lock);
> INIT_LIST_HEAD(&set->tag_list);
>
> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> {
> struct blk_mq_tag_set *set = q->tag_set;
> struct blk_mq_hw_ctx *hctx;
> + bool sched_tags = false;
> int i, ret;
>
> if (!set)
> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
> false);
> } else {
> + sched_tags = true;
> ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> nr, true);
> }
> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> q->elevator->type->ops.depth_updated(hctx);
> }
>
> - if (!ret)
> + /*
> + * if ret is 0, all queues should have been updated to the same depth
> + * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> + * if some are updated, we should probably roll back the change altogether. FIXME
> + */
> + if (!ret) {
> + if (blk_mq_is_sbitmap_shared(set)) {
> + if (sched_tags) {
> + sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> + sbitmap_queue_free(&set->sched_shared_breserved_tags);
> + if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> + return -ENOMEM; /* fixup error handling */
> +
> + queue_for_each_hw_ctx(q, hctx, i) {
> + hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> + hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> + }
> + } else {
> + sbitmap_queue_free(&set->shared_bitmap_tags);
> + sbitmap_queue_free(&set->shared_breserved_tags);
> + if (!blk_mq_init_shared_sbitmap(set))
> + return -ENOMEM; /* fixup error handling */
No, we can't re-allocate driver tags here which are shared by all LUNs.
And you should see that 'can_grow' is set as false for driver tags
in blk_mq_update_nr_requests(), which can only touch per-request-queue
data, not tagset wide data.
Thanks,
Ming
* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
2019-11-21 8:55 ` Ming Lei
@ 2019-11-21 10:24 ` John Garry
2019-11-25 3:00 ` Ming Lei
0 siblings, 1 reply; 9+ messages in thread
From: John Garry @ 2019-11-21 10:24 UTC (permalink / raw)
To: Ming Lei
Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
linux-scsi, hare, bvanassche, chenxiang (M)
>>
>> int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>> {
>> + struct blk_mq_tag_set *tag_set = q->tag_set;
>> struct blk_mq_hw_ctx *hctx;
>> struct elevator_queue *eq;
>> unsigned int i;
>> @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
>> blk_mq_debugfs_register_sched_hctx(q, hctx);
>> }
>>
>> + if (blk_mq_is_sbitmap_shared(tag_set)) {
>> + if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
>> + ret = -ENOMEM;
>> + goto err;
>> + }
>> + queue_for_each_hw_ctx(q, hctx, i) {
>> + struct blk_mq_tags *tags = hctx->sched_tags;
>> +
>> + tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
>> + tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
>
> This kind of sharing is wrong, sched tags should be request queue wide
> instead of tagset wide, and each request queue has its own & independent
> scheduler queue.
Right, so if we get a scheduler tag we still need to get a driver
tag, and this would be the "shared" tag.
That makes things simpler then.
>
>> + }
>> + }
>> +
>> return 0;
>>
>> err:
>> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
>> index 42792942b428..6625bebb46c3 100644
>> --- a/block/blk-mq-tag.c
>> +++ b/block/blk-mq-tag.c
>> @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
>> */
>> void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
>> {
>> - sbitmap_queue_wake_all(&tags->bitmap_tags);
>> + sbitmap_queue_wake_all(tags->pbitmap_tags);
>> if (include_reserve)
>> - sbitmap_queue_wake_all(&tags->breserved_tags);
>> + sbitmap_queue_wake_all(tags->pbreserved_tags);
>> }
>>
[...]
>> mutex_init(&set->tag_list_lock);
>> INIT_LIST_HEAD(&set->tag_list);
>>
>> @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>> {
>> struct blk_mq_tag_set *set = q->tag_set;
>> struct blk_mq_hw_ctx *hctx;
>> + bool sched_tags = false;
>> int i, ret;
>>
>> if (!set)
>> @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>> ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
>> false);
>> } else {
>> + sched_tags = true;
>> ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
>> nr, true);
>> }
>> @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
>> q->elevator->type->ops.depth_updated(hctx);
>> }
>>
>> - if (!ret)
>> + /*
>> + * if ret is 0, all queues should have been updated to the same depth
>> + * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
>> + * if some are updated, we should probably roll back the change altogether. FIXME
>> + */
>> + if (!ret) {
>> + if (blk_mq_is_sbitmap_shared(set)) {
>> + if (sched_tags) {
>> + sbitmap_queue_free(&set->sched_shared_bitmap_tags);
>> + sbitmap_queue_free(&set->sched_shared_breserved_tags);
>> + if (!blk_mq_init_sched_shared_sbitmap(set, nr))
>> + return -ENOMEM; /* fixup error handling */
>> +
>> + queue_for_each_hw_ctx(q, hctx, i) {
>> + hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
>> + hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
>> + }
>> + } else {
>> + sbitmap_queue_free(&set->shared_bitmap_tags);
>> + sbitmap_queue_free(&set->shared_breserved_tags);
>> + if (!blk_mq_init_shared_sbitmap(set))
>> + return -ENOMEM; /* fixup error handling */
>
> No, we can't re-allocate driver tags here which are shared by all LUNs.
> And you should see that 'can_grow' is set as false for driver tags
> in blk_mq_update_nr_requests(), which can only touch per-request-queue
> data, not tagset wide data.
Yeah, I see that. We should just resize for driver tags bitmap.
Personally I think the mainline code is a little loose here, as if we
could grow driver tags, then blk_mq_tag_set.tags would be out-of-sync
with the hctx->tags. Maybe that should be made more explicit in the code.
BTW, do you have anything to say about this (modified slightly) comment:
/*
* if ret != 0, q->nr_requests would not be updated, yet the depth
* for some hctx sched tags may have changed - is that the right thing
* to do?
*/
Thanks,
John
* Re: [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset
2019-11-21 10:24 ` John Garry
@ 2019-11-25 3:00 ` Ming Lei
0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2019-11-25 3:00 UTC (permalink / raw)
To: John Garry
Cc: axboe, jejb, martin.petersen, linux-block, linux-kernel,
linux-scsi, hare, bvanassche, chenxiang (M)
On Thu, Nov 21, 2019 at 10:24:16AM +0000, John Garry wrote:
> > > int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > > {
> > > + struct blk_mq_tag_set *tag_set = q->tag_set;
> > > struct blk_mq_hw_ctx *hctx;
> > > struct elevator_queue *eq;
> > > unsigned int i;
> > > @@ -537,6 +538,19 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
> > > blk_mq_debugfs_register_sched_hctx(q, hctx);
> > > }
> > > + if (blk_mq_is_sbitmap_shared(tag_set)) {
> > > + if (!blk_mq_init_sched_shared_sbitmap(tag_set, q->nr_requests)) {
> > > + ret = -ENOMEM;
> > > + goto err;
> > > + }
> > > + queue_for_each_hw_ctx(q, hctx, i) {
> > > + struct blk_mq_tags *tags = hctx->sched_tags;
> > > +
> > > + tags->pbitmap_tags = &tag_set->sched_shared_bitmap_tags;
> > > + tags->pbreserved_tags = &tag_set->sched_shared_breserved_tags;
> >
> > This kind of sharing is wrong, sched tags should be request queue wide
> > instead of tagset wide, and each request queue has its own & independent
> > scheduler queue.
>
> Right, so if we get a scheduler tag we still need to get a driver tag,
> and this would be the "shared" tag.
>
> That makes things simpler then.
>
> >
> > > + }
> > > + }
> > > +
> > > return 0;
> > > err:
> > > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > > index 42792942b428..6625bebb46c3 100644
> > > --- a/block/blk-mq-tag.c
> > > +++ b/block/blk-mq-tag.c
> > > @@ -35,9 +35,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
> > > */
> > > void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
> > > {
> > > - sbitmap_queue_wake_all(&tags->bitmap_tags);
> > > + sbitmap_queue_wake_all(tags->pbitmap_tags);
> > > if (include_reserve)
> > > - sbitmap_queue_wake_all(&tags->breserved_tags);
> > > + sbitmap_queue_wake_all(tags->pbreserved_tags);
> > > }
>
> [...]
>
>
> > > mutex_init(&set->tag_list_lock);
> > > INIT_LIST_HEAD(&set->tag_list);
> > > @@ -3137,6 +3151,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > > {
> > > struct blk_mq_tag_set *set = q->tag_set;
> > > struct blk_mq_hw_ctx *hctx;
> > > + bool sched_tags = false;
> > > int i, ret;
> > > if (!set)
> > > @@ -3160,6 +3175,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > > ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
> > > false);
> > > } else {
> > > + sched_tags = true;
> > > ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
> > > nr, true);
> > > }
> > > @@ -3169,8 +3185,41 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
> > > q->elevator->type->ops.depth_updated(hctx);
> > > }
> > > - if (!ret)
> > > + /*
> > > + * if ret is 0, all queues should have been updated to the same depth
> > > + * if not, then maybe some have been updated - yuk, need to handle this for shared sbitmap...
> > > + * if some are updated, we should probably roll back the change altogether. FIXME
> > > + */
> > > + if (!ret) {
> > > + if (blk_mq_is_sbitmap_shared(set)) {
> > > + if (sched_tags) {
> > > + sbitmap_queue_free(&set->sched_shared_bitmap_tags);
> > > + sbitmap_queue_free(&set->sched_shared_breserved_tags);
> > > + if (!blk_mq_init_sched_shared_sbitmap(set, nr))
> > > + return -ENOMEM; /* fixup error handling */
> > > +
> > > + queue_for_each_hw_ctx(q, hctx, i) {
> > > + hctx->sched_tags->pbitmap_tags = &set->sched_shared_bitmap_tags;
> > > + hctx->sched_tags->pbreserved_tags = &set->sched_shared_breserved_tags;
> > > + }
> > > + } else {
> > > + sbitmap_queue_free(&set->shared_bitmap_tags);
> > > + sbitmap_queue_free(&set->shared_breserved_tags);
> > > + if (!blk_mq_init_shared_sbitmap(set))
> > > + return -ENOMEM; /* fixup error handling */
> >
> > No, we can't re-allocate driver tags here which are shared by all LUNs.
> > And you should see that 'can_grow' is set as false for driver tags
> > in blk_mq_update_nr_requests(), which can only touch per-request-queue
> > data, not tagset wide data.
>
> Yeah, I see that. We should just resize for driver tags bitmap.
>
> Personally I think the mainline code is a little loose here, as if we could
> grow driver tags, then blk_mq_tagset.tags would be out-of-sync with the
> hctx->tags. Maybe that should be made more explicit in the code.
>
> BTW, do you have anything to say about this (modified slightly) comment:
>
> /*
> * if ret != 0, q->nr_requests would not be updated, yet the depth
> * for some hctx sched tags may have changed - is that the right thing
> * to do?
> */
In theory, your concern is right, but so far we only support the same
depth across all hctx, for both sched tags and driver tags, so it isn't
an issue for now.
Thanks,
Ming
Thread overview: 9+ messages
2019-11-19 14:27 [PATCH RFC v2 0/5] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry
2019-11-19 14:27 ` [PATCH RFC v2 1/5] blk-mq: Remove some unused function arguments John Garry
2019-11-19 14:27 ` [PATCH RFC V2 2/5] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED John Garry
2019-11-19 14:27 ` [PATCH RFC V2 3/5] blk-mq: Facilitate a shared sbitmap per tagset John Garry
2019-11-21 8:55 ` Ming Lei
2019-11-21 10:24 ` John Garry
2019-11-25 3:00 ` Ming Lei
2019-11-19 14:27 ` [PATCH RFC V2 4/5] scsi: Add template flag 'host_tagset' John Garry
2019-11-19 14:27 ` [PATCH RFC V2 5/5] scsi: hisi_sas: Switch v3 hw to MQ John Garry