All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/3] blk-mq: three misc patches
@ 2022-06-16  1:43 Ming Lei
  2022-06-16  1:43 ` [PATCH V3 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-16  1:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei

Hello Guys,

The 1st two patches make referring to q->elevator more reliable, and
avoid potential use-after-free on q->elevator in two code paths.

The 3rd patch improves boot time for scsi host with lots of hw queues.

V3:
	- clear QUEUE_FLAG_SQ_SCHED for none & kyber explicitly

V2:
	- follow Christoph's suggestion to set QUEUE_FLAG_SQ_SCHED inside
	mq-deadline and bfq directly


Ming Lei (3):
  blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
  blk-mq: avoid to touch q->elevator without any protection
  blk-mq: don't clear flush_rq from tags->rqs[]

 block/bfq-iosched.c    |  3 +++
 block/blk-mq-sched.c   |  1 +
 block/blk-mq.c         | 27 ++++++++-------------------
 block/kyber-iosched.c  |  3 ++-
 block/mq-deadline.c    |  3 +++
 include/linux/blkdev.h |  4 ++--
 6 files changed, 19 insertions(+), 22 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V3 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
  2022-06-16  1:43 [PATCH V3 0/3] blk-mq: three misc patches Ming Lei
@ 2022-06-16  1:43 ` Ming Lei
  2022-06-16  1:44 ` [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-16  1:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei

elevator can be tore down by sysfs switch interface or disk release, so
hold ->sysfs_lock before referring to q->elevator, then potential
use-after-free can be avoided.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e9bf950983c7..22a89c758f70 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4438,12 +4438,14 @@ static bool blk_mq_elv_switch_none(struct list_head *head,
 	if (!qe)
 		return false;
 
+	/* q->elevator needs protection from ->sysfs_lock */
+	mutex_lock(&q->sysfs_lock);
+
 	INIT_LIST_HEAD(&qe->node);
 	qe->q = q;
 	qe->type = q->elevator->type;
 	list_add(&qe->node, head);
 
-	mutex_lock(&q->sysfs_lock);
 	/*
 	 * After elevator_switch_mq, the previous elevator_queue will be
 	 * released by elevator_release. The reference of the io scheduler
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection
  2022-06-16  1:43 [PATCH V3 0/3] blk-mq: three misc patches Ming Lei
  2022-06-16  1:43 ` [PATCH V3 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
@ 2022-06-16  1:44 ` Ming Lei
  2022-06-16  6:15   ` Christoph Hellwig
  2022-06-16  1:44 ` [PATCH V3 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
  2022-06-16 20:46 ` [PATCH V3 0/3] blk-mq: three misc patches Jens Axboe
  3 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2022-06-16  1:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Jan Kara

q->elevator is referred in blk_mq_has_sqsched() without any protection,
no .q_usage_counter is held, no queue srcu and rcu read lock is held,
so potential use-after-free may be triggered.

Fix the issue by adding one queue flag for checking if the elevator
uses single queue style dispatch. Meantime the elevator feature flag
of ELEVATOR_F_MQ_AWARE isn't needed any more.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bfq-iosched.c    |  3 +++
 block/blk-mq-sched.c   |  1 +
 block/blk-mq.c         | 18 ++----------------
 block/kyber-iosched.c  |  3 ++-
 block/mq-deadline.c    |  3 +++
 include/linux/blkdev.h |  4 ++--
 6 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 0d46cb728bbf..caa55a5624bc 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -7188,6 +7188,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
 	bfq_init_root_group(bfqd->root_group, bfqd);
 	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
 
+	/* We dispatch from request queue wide instead of hw queue */
+	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
+
 	wbt_disable_default(q);
 	return 0;
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 9e56a69422b6..eb3c65a21362 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -564,6 +564,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
 	int ret;
 
 	if (!e) {
+		blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
 		q->elevator = NULL;
 		q->nr_requests = q->tag_set->queue_depth;
 		return 0;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22a89c758f70..112dce569192 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2140,20 +2140,6 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queue);
 
-/*
- * Is the request queue handled by an IO scheduler that does not respect
- * hardware queues when dispatching?
- */
-static bool blk_mq_has_sqsched(struct request_queue *q)
-{
-	struct elevator_queue *e = q->elevator;
-
-	if (e && e->type->ops.dispatch_request &&
-	    !(e->type->elevator_features & ELEVATOR_F_MQ_AWARE))
-		return true;
-	return false;
-}
-
 /*
  * Return prefered queue to dispatch from (if any) for non-mq aware IO
  * scheduler.
@@ -2186,7 +2172,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	unsigned long i;
 
 	sq_hctx = NULL;
-	if (blk_mq_has_sqsched(q))
+	if (blk_queue_sq_sched(q))
 		sq_hctx = blk_mq_get_sq_hctx(q);
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
@@ -2214,7 +2200,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
 	unsigned long i;
 
 	sq_hctx = NULL;
-	if (blk_mq_has_sqsched(q))
+	if (blk_queue_sq_sched(q))
 		sq_hctx = blk_mq_get_sq_hctx(q);
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 70ff2a599ef6..8f7c745b4a57 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -421,6 +421,8 @@ static int kyber_init_sched(struct request_queue *q, struct elevator_type *e)
 
 	blk_stat_enable_accounting(q);
 
+	blk_queue_flag_clear(QUEUE_FLAG_SQ_SCHED, q);
+
 	eq->elevator_data = kqd;
 	q->elevator = eq;
 
@@ -1033,7 +1035,6 @@ static struct elevator_type kyber_sched = {
 #endif
 	.elevator_attrs = kyber_sched_attrs,
 	.elevator_name = "kyber",
-	.elevator_features = ELEVATOR_F_MQ_AWARE,
 	.elevator_owner = THIS_MODULE,
 };
 
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 6ed602b2f80a..1a9e835e816c 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -642,6 +642,9 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	spin_lock_init(&dd->lock);
 	spin_lock_init(&dd->zone_lock);
 
+	/* We dispatch from request queue wide instead of hw queue */
+	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);
+
 	q->elevator = eq;
 	return 0;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 608d577734c2..bb6e3c31b3b7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -575,6 +575,7 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT       29	/* device supports NOWAIT */
+#define QUEUE_FLAG_SQ_SCHED     30	/* single queue style io dispatch */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\
@@ -616,6 +617,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
 #define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_registered(q)	test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
 #define blk_queue_nowait(q)	test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
+#define blk_queue_sq_sched(q)	test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
 
 extern void blk_set_pm_only(struct request_queue *q);
 extern void blk_clear_pm_only(struct request_queue *q);
@@ -1006,8 +1008,6 @@ void disk_set_independent_access_ranges(struct gendisk *disk,
  */
 /* Supports zoned block devices sequential write constraint */
 #define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
-/* Supports scheduling on multiple hardware queues */
-#define ELEVATOR_F_MQ_AWARE		(1U << 1)
 
 extern void blk_queue_required_elevator_features(struct request_queue *q,
 						 unsigned int features);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH V3 3/3] blk-mq: don't clear flush_rq from tags->rqs[]
  2022-06-16  1:43 [PATCH V3 0/3] blk-mq: three misc patches Ming Lei
  2022-06-16  1:43 ` [PATCH V3 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
  2022-06-16  1:44 ` [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-16  1:44 ` Ming Lei
  2022-06-16 20:46 ` [PATCH V3 0/3] blk-mq: three misc patches Jens Axboe
  3 siblings, 0 replies; 6+ messages in thread
From: Ming Lei @ 2022-06-16  1:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Ming Lei, Yu Kuai

commit 364b61818f65 ("blk-mq: clearing flush request reference in
tags->rqs[]") is added to clear the to-be-free flush request from
tags->rqs[] for avoiding use-after-free on the flush rq.

Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time
by ~8s because running scsi probe which may create and remove lots of
unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and
each request queue has lots of hw queues.

Improve the situation by not running blk_mq_clear_flush_rq_mapping if
disk isn't added when there can't be any flush request issued.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 112dce569192..992997f6acbd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3429,8 +3429,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	if (blk_mq_hw_queue_mapped(hctx))
 		blk_mq_tag_idle(hctx);
 
-	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
-			set->queue_depth, flush_rq);
+	if (blk_queue_init_done(q))
+		blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
+				set->queue_depth, flush_rq);
 	if (set->ops->exit_request)
 		set->ops->exit_request(set, flush_rq, hctx_idx);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection
  2022-06-16  1:44 ` [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
@ 2022-06-16  6:15   ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2022-06-16  6:15 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Christoph Hellwig, linux-block, Jan Kara

On Thu, Jun 16, 2022 at 09:44:00AM +0800, Ming Lei wrote:
> q->elevator is referred in blk_mq_has_sqsched() without any protection,
> no .q_usage_counter is held, no queue srcu and rcu read lock is held,
> so potential use-after-free may be triggered.
> 
> Fix the issue by adding one queue flag for checking if the elevator
> uses single queue style dispatch. Meantime the elevator feature flag
> of ELEVATOR_F_MQ_AWARE isn't needed any more.

I think clearing in common code would be safer, but this does work
as-is, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V3 0/3] blk-mq: three misc patches
  2022-06-16  1:43 [PATCH V3 0/3] blk-mq: three misc patches Ming Lei
                   ` (2 preceding siblings ...)
  2022-06-16  1:44 ` [PATCH V3 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
@ 2022-06-16 20:46 ` Jens Axboe
  3 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-06-16 20:46 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-block, Christoph Hellwig

On Thu, 16 Jun 2022 09:43:58 +0800, Ming Lei wrote:
> The 1st two patches make referring to q->elevator more reliable, and
> avoid potential use-after-free on q->elevator in two code paths.
> 
> The 3rd patch improves boot time for scsi host with lots of hw queues.
> 
> V3:
> 	- clear QUEUE_FLAG_SQ_SCHED for none & kyber explicitly
> 
> [...]

Applied, thanks!

[1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
      commit: 5fd7a84a09e640016fe106dd3e992f5210e23dc7
[2/3] blk-mq: avoid to touch q->elevator without any protection
      commit: 4d337cebcb1c27d9b48c48b9a98e939d4552d584
[3/3] blk-mq: don't clear flush_rq from tags->rqs[]
      commit: 6cfeadbff3f8905f2854735ebb88e581402c16c4

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-06-16 20:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-16  1:43 [PATCH V3 0/3] blk-mq: three misc patches Ming Lei
2022-06-16  1:43 ` [PATCH V3 1/3] blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none Ming Lei
2022-06-16  1:44 ` [PATCH V3 2/3] blk-mq: avoid to touch q->elevator without any protection Ming Lei
2022-06-16  6:15   ` Christoph Hellwig
2022-06-16  1:44 ` [PATCH V3 3/3] blk-mq: don't clear flush_rq from tags->rqs[] Ming Lei
2022-06-16 20:46 ` [PATCH V3 0/3] blk-mq: three misc patches Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.