* [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix
@ 2022-02-28 9:04 Ming Lei
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
` (5 more replies)
0 siblings, 6 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
Hi,
The 1st patch figures out correct numa node for each kind of hw queue.
The 2nd patch simplifies reallocation of q->queue_hw_ctx a bit.
The 3rd patch re-configures poll capability after queue map is changed.
The 4th patch changes mtip32xx to avoid to refer to q->queue_hw_ctx
directly.
The 5th & 6th patches fix use-after-free on q->queue_hw_ctx.
Ming Lei (6):
blk-mq: figure out correct numa node for hw queue
blk-mq: simplify reallocation of hw ctxs a bit
blk-mq: re-config poll after queue map is changed
block: mtip32xx: don't touch q->queue_hw_ctx
blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index
blk-mq: manage hctx map via xarray
block/blk-mq-sysfs.c | 2 +-
block/blk-mq-tag.c | 2 +-
block/blk-mq.c | 125 +++++++++++++++++-------------
block/blk-mq.h | 2 +-
drivers/block/mtip32xx/mtip32xx.c | 4 +-
include/linux/blk-mq.h | 8 +-
include/linux/blkdev.h | 2 +-
7 files changed, 84 insertions(+), 61 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-03-01 13:30 ` Christoph Hellwig
2022-03-01 19:19 ` John Garry
2022-02-28 9:04 ` [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit Ming Lei
` (4 subsequent siblings)
5 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
The current code always uses default queue map and hw queue index
for figuring out the numa node for hw queue, this way isn't correct
because blk-mq supports three queue maps, and the correct queue map
should be used for the specified hw queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 36 ++++++++++++++++++++++++++++++------
1 file changed, 30 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ce7725031..931add81813b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3107,15 +3107,41 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
blk_mq_free_tags(tags);
}
+static int hctx_idx_to_type(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx)
+{
+ int j;
+
+ for (j = 0; j < set->nr_maps; j++) {
+ unsigned int start = set->map[j].queue_offset;
+ unsigned int end = start + set->map[j].nr_queues;
+
+ if (hctx_idx >= start && hctx_idx < end)
+ break;
+ }
+
+ if (j >= set->nr_maps)
+ j = HCTX_TYPE_DEFAULT;
+
+ return j;
+}
+
+static int blk_mq_get_hctx_node(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx)
+{
+ int type = hctx_idx_to_type(set, hctx_idx);
+
+ return blk_mq_hw_queue_to_node(&set->map[type], hctx_idx);
+}
+
static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
unsigned int hctx_idx,
unsigned int nr_tags,
unsigned int reserved_tags)
{
struct blk_mq_tags *tags;
- int node;
+ int node = blk_mq_get_hctx_node(set, hctx_idx);
- node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
if (node == NUMA_NO_NODE)
node = set->numa_node;
@@ -3165,9 +3191,8 @@ static int blk_mq_alloc_rqs(struct blk_mq_tag_set *set,
{
unsigned int i, j, entries_per_page, max_order = 4;
size_t rq_size, left;
- int node;
+ int node = blk_mq_get_hctx_node(set, hctx_idx);
- node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
if (node == NUMA_NO_NODE)
node = set->numa_node;
@@ -3941,10 +3966,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
/* protect against switching io scheduler */
mutex_lock(&q->sysfs_lock);
for (i = 0; i < set->nr_hw_queues; i++) {
- int node;
+ int node = blk_mq_get_hctx_node(set, i);
struct blk_mq_hw_ctx *hctx;
- node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], i);
/*
* If the hw queue has been mapped to another numa node,
* we need to realloc the hctx. If allocation fails, fallback
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-03-01 13:30 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 3/6] blk-mq: re-config poll after queue map is changed Ming Lei
` (3 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
blk_mq_alloc_and_init_hctx() has already taken reuse into account, so
no need to do it outside, then we can simplify blk_mq_realloc_hw_ctxs().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 33 ++++++++++++++-------------------
1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 931add81813b..361903f07c84 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3966,29 +3966,24 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
/* protect against switching io scheduler */
mutex_lock(&q->sysfs_lock);
for (i = 0; i < set->nr_hw_queues; i++) {
+ int old_node;
int node = blk_mq_get_hctx_node(set, i);
- struct blk_mq_hw_ctx *hctx;
+ struct blk_mq_hw_ctx *old_hctx = hctxs[i];
- /*
- * If the hw queue has been mapped to another numa node,
- * we need to realloc the hctx. If allocation fails, fallback
- * to use the previous one.
- */
- if (hctxs[i] && (hctxs[i]->numa_node == node))
- continue;
+ if (old_hctx) {
+ old_node = old_hctx->numa_node;
+ blk_mq_exit_hctx(q, set, old_hctx, i);
+ }
- hctx = blk_mq_alloc_and_init_hctx(set, q, i, node);
- if (hctx) {
- if (hctxs[i])
- blk_mq_exit_hctx(q, set, hctxs[i], i);
- hctxs[i] = hctx;
- } else {
- if (hctxs[i])
- pr_warn("Allocate new hctx on node %d fails,\
- fallback to previous one on node %d\n",
- node, hctxs[i]->numa_node);
- else
+ hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i, node);
+ if (!hctxs[i]) {
+ if (!old_hctx)
break;
+ pr_warn("Allocate new hctx on node %d fails, fallback to previous one on node %d\n",
+ node, old_node);
+ hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i,
+ old_node);
+ WARN_ON_ONCE(!hctxs[i]);
}
}
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/6] blk-mq: re-config poll after queue map is changed
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
2022-02-28 9:04 ` [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-03-01 13:31 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx Ming Lei
` (2 subsequent siblings)
5 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
queue map can be changed when updating nr_hw_queues, so we need to
re-config queue's poll capability. Add one helper for doing this job.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 361903f07c84..5ce6a8289cff 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4010,6 +4010,17 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
mutex_unlock(&q->sysfs_lock);
}
+static void blk_mq_config_poll(struct request_queue *q)
+{
+ struct blk_mq_tag_set *set = q->tag_set;
+
+ if (set->nr_maps > HCTX_TYPE_POLL &&
+ set->map[HCTX_TYPE_POLL].nr_queues)
+ blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+ else
+ blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
+}
+
int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
struct request_queue *q)
{
@@ -4044,9 +4055,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
q->tag_set = set;
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
- if (set->nr_maps > HCTX_TYPE_POLL &&
- set->map[HCTX_TYPE_POLL].nr_queues)
- blk_queue_flag_set(QUEUE_FLAG_POLL, q);
+ blk_mq_config_poll(q);
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
INIT_LIST_HEAD(&q->requeue_list);
@@ -4512,6 +4521,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_realloc_hw_ctxs(set, q);
+ blk_mq_config_poll(q);
if (q->nr_hw_queues != set->nr_hw_queues) {
int i = prev_nr_hw_queues;
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
` (2 preceding siblings ...)
2022-02-28 9:04 ` [PATCH 3/6] blk-mq: re-config poll after queue map is changed Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-03-01 13:32 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index Ming Lei
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
5 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
q->queue_hw_ctx is really one blk-mq internal structure for retrieving
hctx via its index, not supposed to be used by drivers. Meantime drivers
can get the tags structure easily from tagset.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/mtip32xx/mtip32xx.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index cba956881d55..2d43ab5b5cc0 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -160,9 +160,7 @@ static bool mtip_check_surprise_removal(struct driver_data *dd)
static struct mtip_cmd *mtip_cmd_from_tag(struct driver_data *dd,
unsigned int tag)
{
- struct blk_mq_hw_ctx *hctx = dd->queue->queue_hw_ctx[0];
-
- return blk_mq_rq_to_pdu(blk_mq_tag_to_rq(hctx->tags, tag));
+ return blk_mq_rq_to_pdu(blk_mq_tag_to_rq(dd->tags.tags[0], tag));
}
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
` (3 preceding siblings ...)
2022-02-28 9:04 ` [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-03-01 13:33 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
5 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
Prepare for managing hctx mapping via xarray by adding one helper of
blk_mq_get_hctx().
No functional change.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-sysfs.c | 2 +-
block/blk-mq.c | 4 ++--
block/blk-mq.h | 2 +-
include/linux/blk-mq.h | 8 +++++++-
4 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 674786574075..22950e764536 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -279,7 +279,7 @@ int __blk_mq_register_dev(struct device *dev, struct request_queue *q)
unreg:
while (--i >= 0)
- blk_mq_unregister_hctx(q->queue_hw_ctx[i]);
+ blk_mq_unregister_hctx(blk_mq_get_hctx(q, i));
kobject_uevent(q->mq_kobj, KOBJ_REMOVE);
kobject_del(q->mq_kobj);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ce6a8289cff..d01c1693a3d4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -71,7 +71,7 @@ static int blk_mq_poll_stats_bkt(const struct request *rq)
static inline struct blk_mq_hw_ctx *blk_qc_to_hctx(struct request_queue *q,
blk_qc_t qc)
{
- return q->queue_hw_ctx[(qc & ~BLK_QC_T_INTERNAL) >> BLK_QC_T_SHIFT];
+ return blk_mq_get_hctx(q, (qc & ~BLK_QC_T_INTERNAL) >> BLK_QC_T_SHIFT);
}
static inline struct request *blk_qc_to_rq(struct blk_mq_hw_ctx *hctx,
@@ -573,7 +573,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
* If not tell the caller that it should skip this queue.
*/
ret = -EXDEV;
- data.hctx = q->queue_hw_ctx[hctx_idx];
+ data.hctx = blk_mq_get_hctx(q, hctx_idx);
if (!blk_mq_hw_queue_mapped(data.hctx))
goto out_queue_exit;
cpu = cpumask_first_and(data.hctx->cpumask, cpu_online_mask);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 948791ea2a3e..804668457f88 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -83,7 +83,7 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *
enum hctx_type type,
unsigned int cpu)
{
- return q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];
+ return blk_mq_get_hctx(q, q->tag_set->map[type].mq_map[cpu]);
}
static inline enum hctx_type blk_mq_get_hctx_type(unsigned int flags)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3a41d50b85d3..3d949b1968f3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -916,9 +916,15 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
return rq + 1;
}
+static inline struct blk_mq_hw_ctx *blk_mq_get_hctx(struct request_queue *q,
+ unsigned int hctx_idx)
+{
+ return q->queue_hw_ctx[hctx_idx];
+}
+
#define queue_for_each_hw_ctx(q, hctx, i) \
for ((i) = 0; (i) < (q)->nr_hw_queues && \
- ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++)
+ ({ hctx = blk_mq_get_hctx((q), (i)); 1; }); (i)++)
#define hctx_for_each_ctx(hctx, ctx, i) \
for ((i) = 0; (i) < (hctx)->nr_ctx && \
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
` (4 preceding siblings ...)
2022-02-28 9:04 ` [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index Ming Lei
@ 2022-02-28 9:04 ` Ming Lei
2022-02-28 17:36 ` kernel test robot
` (2 more replies)
5 siblings, 3 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-28 9:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Yu Kuai, Ming Lei
Firstly code becomes more clean by switching to xarray from plain array.
Secondly use-after-free on q->queue_hw_ctx can be fixed because
queue_for_each_hw_ctx() may be run when updating nr_hw_queues is
in-progress. With this patch, q->hctx_table is defined as xarray, and
this structure will share same lifetime with request queue, so
queue_for_each_hw_ctx() can use q->hctx_table to lookup hctx reliably.
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq-tag.c | 2 +-
block/blk-mq.c | 48 +++++++++++++++++-------------------------
include/linux/blk-mq.h | 4 ++--
include/linux/blkdev.h | 2 +-
4 files changed, 23 insertions(+), 33 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 0fd409b8e86e..eb33667d63b2 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -498,7 +498,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv)
{
/*
- * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
+ * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and hctx_table
* while the queue is frozen. So we can use q_usage_counter to avoid
* racing with it.
*/
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d01c1693a3d4..273137dc3ffd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3437,6 +3437,8 @@ static void blk_mq_exit_hctx(struct request_queue *q,
blk_mq_remove_cpuhp(hctx);
+ xa_erase(&q->hctx_table, hctx_idx);
+
spin_lock(&q->unused_hctx_lock);
list_add(&hctx->hctx_list, &q->unused_hctx_list);
spin_unlock(&q->unused_hctx_lock);
@@ -3476,8 +3478,15 @@ static int blk_mq_init_hctx(struct request_queue *q,
if (blk_mq_init_request(set, hctx->fq->flush_rq, hctx_idx,
hctx->numa_node))
goto exit_hctx;
+
+ if (xa_insert(&q->hctx_table, hctx_idx, hctx, GFP_KERNEL))
+ goto exit_flush_rq;
+
return 0;
+ exit_flush_rq:
+ if (set->ops->exit_request)
+ set->ops->exit_request(set, hctx->fq->flush_rq, hctx_idx);
exit_hctx:
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
@@ -3855,7 +3864,7 @@ void blk_mq_release(struct request_queue *q)
kobject_put(&hctx->kobj);
}
- kfree(q->queue_hw_ctx);
+ xa_destroy(&q->hctx_table);
/*
* release .mq_kobj and sw queue's kobject now because
@@ -3945,45 +3954,26 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
struct request_queue *q)
{
int i, j, end;
- struct blk_mq_hw_ctx **hctxs = q->queue_hw_ctx;
-
- if (q->nr_hw_queues < set->nr_hw_queues) {
- struct blk_mq_hw_ctx **new_hctxs;
-
- new_hctxs = kcalloc_node(set->nr_hw_queues,
- sizeof(*new_hctxs), GFP_KERNEL,
- set->numa_node);
- if (!new_hctxs)
- return;
- if (hctxs)
- memcpy(new_hctxs, hctxs, q->nr_hw_queues *
- sizeof(*hctxs));
- q->queue_hw_ctx = new_hctxs;
- kfree(hctxs);
- hctxs = new_hctxs;
- }
/* protect against switching io scheduler */
mutex_lock(&q->sysfs_lock);
for (i = 0; i < set->nr_hw_queues; i++) {
int old_node;
int node = blk_mq_get_hctx_node(set, i);
- struct blk_mq_hw_ctx *old_hctx = hctxs[i];
+ struct blk_mq_hw_ctx *old_hctx = blk_mq_get_hctx(q, i);
if (old_hctx) {
old_node = old_hctx->numa_node;
blk_mq_exit_hctx(q, set, old_hctx, i);
}
- hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i, node);
- if (!hctxs[i]) {
+ if (!blk_mq_alloc_and_init_hctx(set, q, i, node)) {
if (!old_hctx)
break;
pr_warn("Allocate new hctx on node %d fails, fallback to previous one on node %d\n",
node, old_node);
- hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i,
- old_node);
- WARN_ON_ONCE(!hctxs[i]);
+ WARN_ON_ONCE(!blk_mq_alloc_and_init_hctx(set, q, i,
+ old_node));
}
}
/*
@@ -4000,12 +3990,10 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
}
for (; j < end; j++) {
- struct blk_mq_hw_ctx *hctx = hctxs[j];
+ struct blk_mq_hw_ctx *hctx = blk_mq_get_hctx(q, j);
- if (hctx) {
+ if (hctx)
blk_mq_exit_hctx(q, set, hctx, j);
- hctxs[j] = NULL;
- }
}
mutex_unlock(&q->sysfs_lock);
}
@@ -4045,6 +4033,8 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
INIT_LIST_HEAD(&q->unused_hctx_list);
spin_lock_init(&q->unused_hctx_lock);
+ xa_init(&q->hctx_table);
+
blk_mq_realloc_hw_ctxs(set, q);
if (!q->nr_hw_queues)
goto err_hctxs;
@@ -4074,7 +4064,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
return 0;
err_hctxs:
- kfree(q->queue_hw_ctx);
+ xa_destroy(&q->hctx_table);
q->nr_hw_queues = 0;
blk_mq_sysfs_deinit(q);
err_poll:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3d949b1968f3..b102ff64e56a 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -919,12 +919,12 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
static inline struct blk_mq_hw_ctx *blk_mq_get_hctx(struct request_queue *q,
unsigned int hctx_idx)
{
- return q->queue_hw_ctx[hctx_idx];
+ return xa_load(&q->hctx_table, hctx_idx);
}
#define queue_for_each_hw_ctx(q, hctx, i) \
for ((i) = 0; (i) < (q)->nr_hw_queues && \
- ({ hctx = blk_mq_get_hctx((q), (i)); 1; }); (i)++)
+ (hctx = blk_mq_get_hctx((q), (i))); (i)++)
#define hctx_for_each_ctx(hctx, ctx, i) \
for ((i) = 0; (i) < (hctx)->nr_ctx && \
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f757f9c2871f..a53ae40aaded 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -355,7 +355,7 @@ struct request_queue {
unsigned int queue_depth;
/* hw dispatch queues */
- struct blk_mq_hw_ctx **queue_hw_ctx;
+ struct xarray hctx_table;
unsigned int nr_hw_queues;
/*
--
2.31.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
@ 2022-02-28 17:36 ` kernel test robot
2022-02-28 17:57 ` kernel test robot
2022-03-01 13:37 ` Christoph Hellwig
2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-02-28 17:36 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: kbuild-all, linux-block, Yu Kuai, Ming Lei
Hi Ming,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on v5.17-rc6 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: alpha-allyesconfig (https://download.01.org/0day-ci/archive/20220301/202203010021.zTuzL2PG-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/176e39bc0acb20f8fd869d170b429b7253b089c4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
git checkout 176e39bc0acb20f8fd869d170b429b7253b089c4
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=alpha SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from block/blk-mq-debugfs-zoned.c:7:
>> block/blk-mq-debugfs.h:24:42: warning: 'struct blk_mq_hw_ctx' declared inside parameter list will not be visible outside of this definition or declaration
24 | struct blk_mq_hw_ctx *hctx);
| ^~~~~~~~~~~~~
block/blk-mq-debugfs.h:25:44: warning: 'struct blk_mq_hw_ctx' declared inside parameter list will not be visible outside of this definition or declaration
25 | void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
| ^~~~~~~~~~~~~
block/blk-mq-debugfs.h:32:47: warning: 'struct blk_mq_hw_ctx' declared inside parameter list will not be visible outside of this definition or declaration
32 | struct blk_mq_hw_ctx *hctx);
| ^~~~~~~~~~~~~
block/blk-mq-debugfs.h:33:50: warning: 'struct blk_mq_hw_ctx' declared inside parameter list will not be visible outside of this definition or declaration
33 | void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
| ^~~~~~~~~~~~~
vim +24 block/blk-mq-debugfs.h
16b738f651c83a0 Omar Sandoval 2017-05-04 20
6cfc0081b046ebf Greg Kroah-Hartman 2019-06-12 21 void blk_mq_debugfs_register(struct request_queue *q);
d173a25165c1244 Omar Sandoval 2017-05-04 22 void blk_mq_debugfs_unregister(struct request_queue *q);
6cfc0081b046ebf Greg Kroah-Hartman 2019-06-12 23 void blk_mq_debugfs_register_hctx(struct request_queue *q,
9c1051aacde8280 Omar Sandoval 2017-05-04 @24 struct blk_mq_hw_ctx *hctx);
9c1051aacde8280 Omar Sandoval 2017-05-04 25 void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
6cfc0081b046ebf Greg Kroah-Hartman 2019-06-12 26 void blk_mq_debugfs_register_hctxs(struct request_queue *q);
9c1051aacde8280 Omar Sandoval 2017-05-04 27 void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
d332ce091813d11 Omar Sandoval 2017-05-04 28
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
2022-02-28 17:36 ` kernel test robot
@ 2022-02-28 17:57 ` kernel test robot
2022-03-01 9:08 ` Ming Lei
2022-03-01 13:37 ` Christoph Hellwig
2 siblings, 1 reply; 25+ messages in thread
From: kernel test robot @ 2022-02-28 17:57 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: llvm, kbuild-all, linux-block, Yu Kuai, Ming Lei
Hi Ming,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on v5.17-rc6 next-20220225]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-a002-20220228 (https://download.01.org/0day-ci/archive/20220301/202203010133.JIpKpP2Z-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/176e39bc0acb20f8fd869d170b429b7253b089c4
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
git checkout 176e39bc0acb20f8fd869d170b429b7253b089c4
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from block/blk-mq-debugfs-zoned.c:7:
>> block/blk-mq-debugfs.h:24:14: warning: declaration of 'struct blk_mq_hw_ctx' will not be visible outside of this function [-Wvisibility]
struct blk_mq_hw_ctx *hctx);
^
block/blk-mq-debugfs.h:25:44: warning: declaration of 'struct blk_mq_hw_ctx' will not be visible outside of this function [-Wvisibility]
void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
^
block/blk-mq-debugfs.h:32:19: warning: declaration of 'struct blk_mq_hw_ctx' will not be visible outside of this function [-Wvisibility]
struct blk_mq_hw_ctx *hctx);
^
block/blk-mq-debugfs.h:33:50: warning: declaration of 'struct blk_mq_hw_ctx' will not be visible outside of this function [-Wvisibility]
void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
^
4 warnings generated.
vim +24 block/blk-mq-debugfs.h
16b738f651c83a Omar Sandoval 2017-05-04 20
6cfc0081b046eb Greg Kroah-Hartman 2019-06-12 21 void blk_mq_debugfs_register(struct request_queue *q);
d173a25165c124 Omar Sandoval 2017-05-04 22 void blk_mq_debugfs_unregister(struct request_queue *q);
6cfc0081b046eb Greg Kroah-Hartman 2019-06-12 23 void blk_mq_debugfs_register_hctx(struct request_queue *q,
9c1051aacde828 Omar Sandoval 2017-05-04 @24 struct blk_mq_hw_ctx *hctx);
9c1051aacde828 Omar Sandoval 2017-05-04 25 void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
6cfc0081b046eb Greg Kroah-Hartman 2019-06-12 26 void blk_mq_debugfs_register_hctxs(struct request_queue *q);
9c1051aacde828 Omar Sandoval 2017-05-04 27 void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
d332ce091813d1 Omar Sandoval 2017-05-04 28
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-02-28 17:57 ` kernel test robot
@ 2022-03-01 9:08 ` Ming Lei
0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2022-03-01 9:08 UTC (permalink / raw)
To: kernel test robot; +Cc: Jens Axboe, llvm, kbuild-all, linux-block, Yu Kuai
Hi,
Thanks for the report!
On Tue, Mar 01, 2022 at 01:57:33AM +0800, kernel test robot wrote:
> Hi Ming,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on axboe-block/for-next]
> [also build test WARNING on v5.17-rc6 next-20220225]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url: https://github.com/0day-ci/linux/commits/Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
> base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
> config: i386-randconfig-a002-20220228 (https://download.01.org/0day-ci/archive/20220301/202203010133.JIpKpP2Z-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # https://github.com/0day-ci/linux/commit/176e39bc0acb20f8fd869d170b429b7253b089c4
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Ming-Lei/blk-mq-update_nr_hw_queues-related-improvement-bugfix/20220228-170706
> git checkout 176e39bc0acb20f8fd869d170b429b7253b089c4
> # save the config file to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All warnings (new ones prefixed by >>):
>
> In file included from block/blk-mq-debugfs-zoned.c:7:
> >> block/blk-mq-debugfs.h:24:14: warning: declaration of 'struct blk_mq_hw_ctx' will not be visible outside of this function [-Wvisibility]
> struct blk_mq_hw_ctx *hctx);
The warning is nothing to do with this patchset.
You can trigger that by just changing modify time of include/linux/blkdev.h.
The following patch can fix the warning, and I will post one formal
patch soon.
diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
index a68aa6041a10..37fe2716e96e 100644
--- a/block/blk-mq-debugfs.h
+++ b/block/blk-mq-debugfs.h
@@ -5,6 +5,7 @@
#ifdef CONFIG_BLK_DEBUG_FS
#include <linux/seq_file.h>
+#include <linux/blk-mq.h>
struct blk_mq_debugfs_attr {
const char *name;
Thanks,
Ming
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
@ 2022-03-01 13:30 ` Christoph Hellwig
2022-03-01 19:19 ` John Garry
1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:30 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit
2022-02-28 9:04 ` [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit Ming Lei
@ 2022-03-01 13:30 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:30 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On Mon, Feb 28, 2022 at 05:04:26PM +0800, Ming Lei wrote:
> blk_mq_alloc_and_init_hctx() has already taken reuse into account, so
> no need to do it outside, then we can simplify blk_mq_realloc_hw_ctxs().
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/6] blk-mq: re-config poll after queue map is changed
2022-02-28 9:04 ` [PATCH 3/6] blk-mq: re-config poll after queue map is changed Ming Lei
@ 2022-03-01 13:31 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:31 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On Mon, Feb 28, 2022 at 05:04:27PM +0800, Ming Lei wrote:
> queue map can be changed when updating nr_hw_queues, so we need to
> re-config queue's poll capability. Add one helper for doing this job.
s/re-config/reconfigure/g
> +static void blk_mq_config_poll(struct request_queue *q)
> +{
> + struct blk_mq_tag_set *set = q->tag_set;
> +
> + if (set->nr_maps > HCTX_TYPE_POLL &&
> + set->map[HCTX_TYPE_POLL].nr_queues)
> + blk_queue_flag_set(QUEUE_FLAG_POLL, q);
> + else
> + blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
> +}
Maybe name this blk_mq_update_poll_flag?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx
2022-02-28 9:04 ` [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx Ming Lei
@ 2022-03-01 13:32 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:32 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On Mon, Feb 28, 2022 at 05:04:28PM +0800, Ming Lei wrote:
> q->queue_hw_ctx is really one blk-mq internal structure for retrieving
> hctx via its index, not supposed to be used by drivers. Meantime drivers
> can get the tags structure easily from tagset.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index
2022-02-28 9:04 ` [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index Ming Lei
@ 2022-03-01 13:33 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:33 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On Mon, Feb 28, 2022 at 05:04:29PM +0800, Ming Lei wrote:
> Prepare for managing hctx mapping via xarray by adding one helper of
> blk_mq_get_hctx().
I'd rather merge this into the next patch for reason I'll explain there.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
2022-02-28 17:36 ` kernel test robot
2022-02-28 17:57 ` kernel test robot
@ 2022-03-01 13:37 ` Christoph Hellwig
2022-03-02 2:06 ` Ming Lei
2 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-01 13:37 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
> - hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i,
> - old_node);
> - WARN_ON_ONCE(!hctxs[i]);
> + WARN_ON_ONCE(!blk_mq_alloc_and_init_hctx(set, q, i,
> + old_node));
Please avoid doing the actual work inside a WARN_ON statement.
>
> for (; j < end; j++) {
> - struct blk_mq_hw_ctx *hctx = hctxs[j];
> + struct blk_mq_hw_ctx *hctx = blk_mq_get_hctx(q, j);
>
> - if (hctx) {
> + if (hctx)
> blk_mq_exit_hctx(q, set, hctx, j);
> - hctxs[j] = NULL;
> - }
> }
Instead of a for loop that does xa_loads repeatedly this can just
use xa_for_each_range. Same for a bunch of other loops like that,
e.g. in blk_mq_unregister_dev or the __blk_mq_register_dev failure
path.
> @@ -919,12 +919,12 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
> static inline struct blk_mq_hw_ctx *blk_mq_get_hctx(struct request_queue *q,
> unsigned int hctx_idx)
> {
> - return q->queue_hw_ctx[hctx_idx];
> + return xa_load(&q->hctx_table, hctx_idx);
> }
>
> #define queue_for_each_hw_ctx(q, hctx, i) \
> for ((i) = 0; (i) < (q)->nr_hw_queues && \
> - ({ hctx = blk_mq_get_hctx((q), (i)); 1; }); (i)++)
> + (hctx = blk_mq_get_hctx((q), (i))); (i)++)
This should be using a xa_for_each loop.
With various places converted to loops I'm not even sure we really
want the blk_mq_get_hctx helper at all.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
2022-03-01 13:30 ` Christoph Hellwig
@ 2022-03-01 19:19 ` John Garry
2022-03-02 1:47 ` Ming Lei
1 sibling, 1 reply; 25+ messages in thread
From: John Garry @ 2022-03-01 19:19 UTC (permalink / raw)
To: Ming Lei, Jens Axboe; +Cc: linux-block, Yu Kuai
On 28/02/2022 09:04, Ming Lei wrote:
> The current code always uses default queue map and hw queue index
> for figuring out the numa node for hw queue, this way isn't correct
> because blk-mq supports three queue maps, and the correct queue map
> should be used for the specified hw queue.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
Hi Ming,
Just some small comments to consider if you need to respin.
Thanks,
John
> block/blk-mq.c | 36 ++++++++++++++++++++++++++++++------
> 1 file changed, 30 insertions(+), 6 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a05ce7725031..931add81813b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3107,15 +3107,41 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
> blk_mq_free_tags(tags);
> }
>
> +static int
enum hctx_type?
> hctx_idx_to_type(struct blk_mq_tag_set *set,
> + unsigned int hctx_idx)
> +{
> + int j;
super nit: normally use i
> +
> + for (j = 0; j < set->nr_maps; j++) {
> + unsigned int start = set->map[j].queue_offset;
nit: double whitespace intentional?
> + unsigned int end = start + set->map[j].nr_queues;
> +
> + if (hctx_idx >= start && hctx_idx < end)
> + break;
> + }
> +
> + if (j >= set->nr_maps)
> + j = HCTX_TYPE_DEFAULT;
> +
> + return j;
> +}
> +
> +static int blk_mq_get_hctx_node(struct blk_mq_tag_set *set,
> + unsigned int hctx_idx)
> +{
> + int type = hctx_idx_to_type(set, hctx_idx);
> +
> + return blk_mq_hw_queue_to_node(&set->map[type], hctx_idx);
> +}
> +
> static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
> unsigned int hctx_idx,
> unsigned int nr_tags,
> unsigned int reserved_tags)
> {
> struct blk_mq_tags *tags;
> - int node;
> + int node = blk_mq_get_hctx_node(set, hctx_idx);
nit: the code originally had reverse firtree ordering, which I suppose
is not by mistake
>
> - node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
> if (node == NUMA_NO_NODE)
> node = set->numa_node;
>
> @@ -3165,9 +3191,8 @@ static int blk_mq_alloc_rqs(struct blk_mq_tag_set *set,
> {
> unsigned int i, j, entries_per_page, max_order = 4;
> size_t rq_size, left;
> - int node;
> + int node = blk_mq_get_hctx_node(set, hctx_idx);
and here
>
> - node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], hctx_idx);
> if (node == NUMA_NO_NODE)
> node = set->numa_node;
>
> @@ -3941,10 +3966,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> /* protect against switching io scheduler */
> mutex_lock(&q->sysfs_lock);
> for (i = 0; i < set->nr_hw_queues; i++) {
> - int node;
> + int node = blk_mq_get_hctx_node(set, i);
> struct blk_mq_hw_ctx *hctx;
>
> - node = blk_mq_hw_queue_to_node(&set->map[HCTX_TYPE_DEFAULT], i);
> /*
> * If the hw queue has been mapped to another numa node,
> * we need to realloc the hctx. If allocation fails, fallback
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-03-01 19:19 ` John Garry
@ 2022-03-02 1:47 ` Ming Lei
2022-03-02 9:02 ` John Garry
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-03-02 1:47 UTC (permalink / raw)
To: John Garry; +Cc: Jens Axboe, linux-block, Yu Kuai
On Tue, Mar 01, 2022 at 07:19:43PM +0000, John Garry wrote:
> On 28/02/2022 09:04, Ming Lei wrote:
> > The current code always uses default queue map and hw queue index
> > for figuring out the numa node for hw queue, this way isn't correct
> > because blk-mq supports three queue maps, and the correct queue map
> > should be used for the specified hw queue.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
>
> Hi Ming,
>
> Just some small comments to consider if you need to respin.
>
> Thanks,
> John
>
> > block/blk-mq.c | 36 ++++++++++++++++++++++++++++++------
> > 1 file changed, 30 insertions(+), 6 deletions(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index a05ce7725031..931add81813b 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3107,15 +3107,41 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
> > blk_mq_free_tags(tags);
> > }
> > +static int
>
> enum hctx_type?
>
> > hctx_idx_to_type(struct blk_mq_tag_set *set,
> > + unsigned int hctx_idx)
> > +{
> > + int j;
>
> super nit: normally use i
OK
>
> > +
> > + for (j = 0; j < set->nr_maps; j++) {
> > + unsigned int start = set->map[j].queue_offset;
>
> nit: double whitespace intentional?
will fix it.
>
> > + unsigned int end = start + set->map[j].nr_queues;
> > +
> > + if (hctx_idx >= start && hctx_idx < end)
> > + break;
> > + }
> > +
> > + if (j >= set->nr_maps)
> > + j = HCTX_TYPE_DEFAULT;
> > +
> > + return j;
> > +}
> > +
> > +static int blk_mq_get_hctx_node(struct blk_mq_tag_set *set,
> > + unsigned int hctx_idx)
> > +{
> > + int type = hctx_idx_to_type(set, hctx_idx);
> > +
> > + return blk_mq_hw_queue_to_node(&set->map[type], hctx_idx);
> > +}
> > +
> > static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
> > unsigned int hctx_idx,
> > unsigned int nr_tags,
> > unsigned int reserved_tags)
> > {
> > struct blk_mq_tags *tags;
> > - int node;
> > + int node = blk_mq_get_hctx_node(set, hctx_idx);
>
> nit: the code originally had reverse firtree ordering, which I suppose is
> not by mistake
What is reverse firtree ordering here? I don't know what is wrong
with the above one line change from patch style viewpoint, and
checkpatch complains nothing here.
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-03-01 13:37 ` Christoph Hellwig
@ 2022-03-02 2:06 ` Ming Lei
2022-03-02 8:24 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-03-02 2:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yu Kuai
On Tue, Mar 01, 2022 at 05:37:17AM -0800, Christoph Hellwig wrote:
> > - hctxs[i] = blk_mq_alloc_and_init_hctx(set, q, i,
> > - old_node);
> > - WARN_ON_ONCE(!hctxs[i]);
> > + WARN_ON_ONCE(!blk_mq_alloc_and_init_hctx(set, q, i,
> > + old_node));
>
>
> Please avoid doing the actual work inside a WARN_ON statement.
OK.
>
> >
> > for (; j < end; j++) {
> > - struct blk_mq_hw_ctx *hctx = hctxs[j];
> > + struct blk_mq_hw_ctx *hctx = blk_mq_get_hctx(q, j);
> >
> > - if (hctx) {
> > + if (hctx)
> > blk_mq_exit_hctx(q, set, hctx, j);
> > - hctxs[j] = NULL;
> > - }
> > }
>
> Instead of a for loop that does xa_loads repeatedly this can just
> use xa_for_each_range. Same for a bunch of other loops like that,
> e.g. in blk_mq_unregister_dev or the __blk_mq_register_dev failure
> path.
>
> > @@ -919,12 +919,12 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq)
> > static inline struct blk_mq_hw_ctx *blk_mq_get_hctx(struct request_queue *q,
> > unsigned int hctx_idx)
> > {
> > - return q->queue_hw_ctx[hctx_idx];
> > + return xa_load(&q->hctx_table, hctx_idx);
> > }
> >
> > #define queue_for_each_hw_ctx(q, hctx, i) \
> > for ((i) = 0; (i) < (q)->nr_hw_queues && \
> > - ({ hctx = blk_mq_get_hctx((q), (i)); 1; }); (i)++)
> > + (hctx = blk_mq_get_hctx((q), (i))); (i)++)
>
> This should be using a xa_for_each loop.
I did considered xa_for_each(), but it requires rcu read lock.
Also queue_for_each_hw_ctx() is supposed to not run in fast path,
meantime xa_load() is lightweight enough too, so repeated xa_load()
is fine here.
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-03-02 2:06 ` Ming Lei
@ 2022-03-02 8:24 ` Christoph Hellwig
2022-03-02 9:53 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-02 8:24 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block, Yu Kuai
On Wed, Mar 02, 2022 at 10:06:04AM +0800, Ming Lei wrote:
> I did considered xa_for_each(), but it requires rcu read lock.
No, I doesn't. It just takes a RCU lock internally.
> Also queue_for_each_hw_ctx() is supposed to not run in fast path,
> meantime xa_load() is lightweight enough too, so repeated xa_load()
> is fine here.
I'd rather have the clarity of the proper iterators.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-03-02 1:47 ` Ming Lei
@ 2022-03-02 9:02 ` John Garry
2022-03-02 9:22 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: John Garry @ 2022-03-02 9:02 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On 02/03/2022 01:47, Ming Lei wrote:
>>> static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>>> unsigned int hctx_idx,
>>> unsigned int nr_tags,
>>> unsigned int reserved_tags)
>>> {
>>> struct blk_mq_tags *tags;
>>> - int node;
>>> + int node = blk_mq_get_hctx_node(set, hctx_idx);
>> nit: the code originally had reverse firtree ordering, which I suppose is
>> not by mistake
> What is reverse firtree ordering here? I don't know what is wrong
> with the above one line change from patch style viewpoint, and
> checkpatch complains nothing here.
checkpath would not complain about this. I'm talking about:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587
The original code had:
struct blk_mq_tags *tags;
int node;
as opposed to:
int node;
struct blk_mq_tags *tags;
That's all. The block code seems to mostly follow this style when
possible. It's just a style issue.
thanks,
John
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-03-02 9:02 ` John Garry
@ 2022-03-02 9:22 ` Ming Lei
2022-03-02 10:11 ` John Garry
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-03-02 9:22 UTC (permalink / raw)
To: John Garry; +Cc: Jens Axboe, linux-block, Yu Kuai
On Wed, Mar 02, 2022 at 09:02:53AM +0000, John Garry wrote:
> On 02/03/2022 01:47, Ming Lei wrote:
> > > > static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
> > > > unsigned int hctx_idx,
> > > > unsigned int nr_tags,
> > > > unsigned int reserved_tags)
> > > > {
> > > > struct blk_mq_tags *tags;
> > > > - int node;
> > > > + int node = blk_mq_get_hctx_node(set, hctx_idx);
> > > nit: the code originally had reverse firtree ordering, which I suppose is
> > > not by mistake
> > What is reverse firtree ordering here? I don't know what is wrong
> > with the above one line change from patch style viewpoint, and
> > checkpatch complains nothing here.
>
> checkpath would not complain about this. I'm talking about:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587
>
> The original code had:
>
> struct blk_mq_tags *tags;
> int node;
>
> as opposed to:
>
> int node;
> struct blk_mq_tags *tags;
>
> That's all. The block code seems to mostly follow this style when possible.
> It's just a style issue.
But my patch doesn't change the order, :-)
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-03-02 8:24 ` Christoph Hellwig
@ 2022-03-02 9:53 ` Ming Lei
2022-03-02 9:59 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-03-02 9:53 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, linux-block, Yu Kuai
On Wed, Mar 02, 2022 at 12:24:34AM -0800, Christoph Hellwig wrote:
> On Wed, Mar 02, 2022 at 10:06:04AM +0800, Ming Lei wrote:
> > I did considered xa_for_each(), but it requires rcu read lock.
>
> No, I doesn't. It just takes a RCU lock internally.
OK.
>
> > Also queue_for_each_hw_ctx() is supposed to not run in fast path,
> > meantime xa_load() is lightweight enough too, so repeated xa_load()
> > is fine here.
>
> I'd rather have the clarity of the proper iterators.
Another point is that 'unsigned long *' is passed to xa_find() as index.
However, almost all users of queue_for_each_hw_ctx() defines hctx index
as 'unsigned int'.
If we switch to xa_for_each(), the type of hctx index needs to be changed
to 'unsigned long' for every queue_for_each_hw_ctx(). But xa_load()
needn't such change.
Also from user viewpoint, looks 'unsigned long' change for hctx index is
still a bit confusing, IMO.
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/6] blk-mq: manage hctx map via xarray
2022-03-02 9:53 ` Ming Lei
@ 2022-03-02 9:59 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-03-02 9:59 UTC (permalink / raw)
To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block, Yu Kuai
On Wed, Mar 02, 2022 at 05:53:44PM +0800, Ming Lei wrote:
> Another point is that 'unsigned long *' is passed to xa_find() as index.
> However, almost all users of queue_for_each_hw_ctx() defines hctx index
> as 'unsigned int'.
>
> If we switch to xa_for_each(), the type of hctx index needs to be changed
> to 'unsigned long' for every queue_for_each_hw_ctx(). But xa_load()
> needn't such change.
>
> Also from user viewpoint, looks 'unsigned long' change for hctx index is
> still a bit confusing, IMO.
If you use xarray you need to fit the convention it expects, that should
not be in any way a surprise.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/6] blk-mq: figure out correct numa node for hw queue
2022-03-02 9:22 ` Ming Lei
@ 2022-03-02 10:11 ` John Garry
0 siblings, 0 replies; 25+ messages in thread
From: John Garry @ 2022-03-02 10:11 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Yu Kuai
On 02/03/2022 09:22, Ming Lei wrote:
>>> What is reverse firtree ordering here? I don't know what is wrong
>>> with the above one line change from patch style viewpoint, and
>>> checkpatch complains nothing here.
>> checkpath would not complain about this. I'm talking about:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst#n587
>>
>> The original code had:
>>
>> struct blk_mq_tags *tags;
>> int node;
>>
>> as opposed to:
>>
>> int node;
>> struct blk_mq_tags *tags;
>>
>> That's all. The block code seems to mostly follow this style when possible.
>> It's just a style issue.
> But my patch doesn't change the order,:-)
Right, and I am just saying that if you were to maintain this coding
style then it should. But, as I alluded before, for only 2 declarations
is not so important.
Thanks,
John
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2022-03-02 10:11 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-28 9:04 [PATCH 0/6] blk-mq: update_nr_hw_queues related improvement & bugfix Ming Lei
2022-02-28 9:04 ` [PATCH 1/6] blk-mq: figure out correct numa node for hw queue Ming Lei
2022-03-01 13:30 ` Christoph Hellwig
2022-03-01 19:19 ` John Garry
2022-03-02 1:47 ` Ming Lei
2022-03-02 9:02 ` John Garry
2022-03-02 9:22 ` Ming Lei
2022-03-02 10:11 ` John Garry
2022-02-28 9:04 ` [PATCH 2/6] blk-mq: simplify reallocation of hw ctxs a bit Ming Lei
2022-03-01 13:30 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 3/6] blk-mq: re-config poll after queue map is changed Ming Lei
2022-03-01 13:31 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 4/6] block: mtip32xx: don't touch q->queue_hw_ctx Ming Lei
2022-03-01 13:32 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 5/6] blk-mq: add helper of blk_mq_get_hctx to retrieve hctx via its index Ming Lei
2022-03-01 13:33 ` Christoph Hellwig
2022-02-28 9:04 ` [PATCH 6/6] blk-mq: manage hctx map via xarray Ming Lei
2022-02-28 17:36 ` kernel test robot
2022-02-28 17:57 ` kernel test robot
2022-03-01 9:08 ` Ming Lei
2022-03-01 13:37 ` Christoph Hellwig
2022-03-02 2:06 ` Ming Lei
2022-03-02 8:24 ` Christoph Hellwig
2022-03-02 9:53 ` Ming Lei
2022-03-02 9:59 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).