* [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-05-31 16:05 UTC
To: axboe; +Cc: ming.lei, linux-block, syzbot+3e3f419f4a7816471838
The elevator is only used for file system requests, which are stopped in
del_gendisk. Move disabling the elevator and freeing the scheduler tags
to the end of del_gendisk instead of doing that work in disk_release and
blk_cleanup_queue to avoid a use after free on q->tag_set from
disk_release as the tag_set might not be alive at that point.
Move the rq_qos_exit call as well, as it just depends on the elevator
exit and would be the only reason to keep the not exactly cheap queue
freeze in disk_release.
Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
---
block/blk-core.c | 13 -------------
block/genhd.c | 38 ++++++++++----------------------------
2 files changed, 10 insertions(+), 41 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 80fa73c419a99..19cfa71e33728 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -322,19 +322,6 @@ void blk_cleanup_queue(struct request_queue *q)
blk_mq_exit_queue(q);
}
- /*
- * In theory, request pool of sched_tags belongs to request queue.
- * However, the current implementation requires tag_set for freeing
- * requests, so free the pool now.
- *
- * Queue has become frozen, there can't be any in-queue requests, so
- * it is safe to free requests now.
- */
- mutex_lock(&q->sysfs_lock);
- if (q->elevator)
- blk_mq_sched_free_rqs(q);
- mutex_unlock(&q->sysfs_lock);
-
/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
}
diff --git a/block/genhd.c b/block/genhd.c
index 8ff5b187791af..9914d0f24fecd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -649,6 +649,16 @@ void del_gendisk(struct gendisk *disk)
blk_sync_queue(q);
blk_flush_integrity();
+ blk_mq_cancel_work_sync(q);
+
+ if (q->elevator) {
+ mutex_lock(&q->sysfs_lock);
+ elevator_exit(q);
+ mutex_unlock(&q->sysfs_lock);
+ }
+
+ rq_qos_exit(q);
+
/*
* Allow using passthrough request again after the queue is torn down.
*/
@@ -1117,31 +1127,6 @@ static const struct attribute_group *disk_attr_groups[] = {
NULL
};
-static void disk_release_mq(struct request_queue *q)
-{
- blk_mq_cancel_work_sync(q);
-
- /*
- * There can't be any non non-passthrough bios in flight here, but
- * requests stay around longer, including passthrough ones so we
- * still need to freeze the queue here.
- */
- blk_mq_freeze_queue(q);
-
- /*
- * Since the I/O scheduler exit code may access cgroup information,
- * perform I/O scheduler exit before disassociating from the block
- * cgroup controller.
- */
- if (q->elevator) {
- mutex_lock(&q->sysfs_lock);
- elevator_exit(q);
- mutex_unlock(&q->sysfs_lock);
- }
- rq_qos_exit(q);
- __blk_mq_unfreeze_queue(q, true);
-}
-
/**
* disk_release - releases all allocated resources of the gendisk
* @dev: the device representing this disk
@@ -1163,9 +1148,6 @@ static void disk_release(struct device *dev)
might_sleep();
WARN_ON_ONCE(disk_live(disk));
- if (queue_is_mq(disk->queue))
- disk_release_mq(disk->queue);
-
blkcg_exit_queue(disk->queue);
disk_release_events(disk);
--
2.30.2
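For orientation, here is the teardown ordering del_gendisk ends up with
once the patch is applied, condensed from the hunks above into a single
sketch (surrounding code elided; not the literal upstream function):

void del_gendisk(struct gendisk *disk)
{
        struct request_queue *q = disk->queue;

        /* ... partition and sysfs teardown, file system I/O drained ... */

        blk_sync_queue(q);
        blk_flush_integrity();
        blk_mq_cancel_work_sync(q);

        /*
         * The elevator only serves file system requests, and those are
         * gone by now, so the scheduler and its tags can be freed here
         * while q->tag_set is still guaranteed to be alive.
         */
        if (q->elevator) {
                mutex_lock(&q->sysfs_lock);
                elevator_exit(q);
                mutex_unlock(&q->sysfs_lock);
        }

        rq_qos_exit(q);

        /* ... re-enable passthrough requests, drop disk references ... */
}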
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01 0:54 UTC
To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838
On Tue, May 31, 2022 at 06:05:35PM +0200, Christoph Hellwig wrote:
> The elevator is only used for file system requests, which are stopped in
> del_gendisk. Move disabling the elevator and freeing the scheduler tags
> to the end of del_gendisk instead of doing that work in disk_release and
> blk_cleanup_queue to avoid a use after free on q->tag_set from
> disk_release as the tag_set might not be alive at that point.
>
> Move the rq_qos_exit call as well, as it just depends on the elevator
> exit and would be the only reason to keep the not exactly cheap queue
> freeze in disk_release.
>
> Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
> Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
> ---
> block/blk-core.c | 13 -------------
> block/genhd.c | 38 ++++++++++----------------------------
> 2 files changed, 10 insertions(+), 41 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 80fa73c419a99..19cfa71e33728 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -322,19 +322,6 @@ void blk_cleanup_queue(struct request_queue *q)
> blk_mq_exit_queue(q);
> }
>
> - /*
> - * In theory, request pool of sched_tags belongs to request queue.
> - * However, the current implementation requires tag_set for freeing
> - * requests, so free the pool now.
> - *
> - * Queue has become frozen, there can't be any in-queue requests, so
> - * it is safe to free requests now.
> - */
> - mutex_lock(&q->sysfs_lock);
> - if (q->elevator)
> - blk_mq_sched_free_rqs(q);
> - mutex_unlock(&q->sysfs_lock);
> -
> /* @q is and will stay empty, shutdown and put */
> blk_put_queue(q);
> }
> diff --git a/block/genhd.c b/block/genhd.c
> index 8ff5b187791af..9914d0f24fecd 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -649,6 +649,16 @@ void del_gendisk(struct gendisk *disk)
>
> blk_sync_queue(q);
> blk_flush_integrity();
> + blk_mq_cancel_work_sync(q);
> +
> + if (q->elevator) {
> + mutex_lock(&q->sysfs_lock);
> + elevator_exit(q);
> + mutex_unlock(&q->sysfs_lock);
> + }
This way can't be safe: who can guarantee that all sync submission
activities are gone after the queue is frozen? We have had lots of
reports of blk_mq_sched_has_work() triggering use-after-free.
Thanks,
Ming
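To make the race concrete, a minimal interleaving sketch (the CPU 1
call site is illustrative, standing in for any stale hctx run; the
point is that the hot path reads q->elevator without taking sysfs_lock):

        CPU 0 (del_gendisk teardown)       CPU 1 (stale queue run)
        ----------------------------       -----------------------
        mutex_lock(&q->sysfs_lock);
        elevator_exit(q);
          /* frees q->elevator and
             the scheduler tags */
        mutex_unlock(&q->sysfs_lock);
                                           blk_mq_sched_has_work(hctx)
                                             /* dereferences the freed
                                                q->elevator: UAF */

Freezing the queue stops new file system requests from entering, but it
does not by itself wait for submission paths that already started, which
is the gap Ming is pointing at.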
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01 6:43 UTC
To: Ming Lei
Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838
On Wed, Jun 01, 2022 at 08:54:30AM +0800, Ming Lei wrote:
> This way can't be safe: who can guarantee that all sync submission
> activities are gone after the queue is frozen? We have had lots of
> reports of blk_mq_sched_has_work() triggering use-after-free.
Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
patch below. Do you have any good reproducer, though?
diff --git a/block/genhd.c b/block/genhd.c
index 9914d0f24fecd..155b64ff991f6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -652,9 +652,13 @@ void del_gendisk(struct gendisk *disk)
blk_mq_cancel_work_sync(q);
if (q->elevator) {
+ blk_mq_quiesce_queue(q);
+
mutex_lock(&q->sysfs_lock);
elevator_exit(q);
mutex_unlock(&q->sysfs_lock);
+
+ blk_mq_unquiesce_queue(q);
}
rq_qos_exit(q);
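What the quiesce buys, in a simplified model of the usual blk-mq
semantics (paraphrased, not quoted from the code): blk_mq_quiesce_queue()
marks the queue quiesced and then waits out an RCU grace period (an SRCU
one for BLK_MQ_F_BLOCKING drivers), so any queue-run path that might
still call into the scheduler has either observed the quiesced state or
finished before elevator_exit() runs:

        blk_mq_quiesce_queue(q);   /* implies a grace-period wait */
        /* no hctx run can be inside the elevator callbacks anymore */
        elevator_exit(q);
        blk_mq_unquiesce_queue(q);

That grace-period wait is exactly the cost Ming raises next.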
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01 7:09 UTC
To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838
On Wed, Jun 01, 2022 at 08:43:29AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 08:54:30AM +0800, Ming Lei wrote:
> > This way can't be safe: who can guarantee that all sync submission
> > activities are gone after the queue is frozen? We have had lots of
> > reports of blk_mq_sched_has_work() triggering use-after-free.
>
> Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> patch below. Do you have any good reproducer, though?
blktests block/027 should cover this.
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 9914d0f24fecd..155b64ff991f6 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -652,9 +652,13 @@ void del_gendisk(struct gendisk *disk)
> blk_mq_cancel_work_sync(q);
>
> if (q->elevator) {
> + blk_mq_quiesce_queue(q);
> +
> mutex_lock(&q->sysfs_lock);
> elevator_exit(q);
> mutex_unlock(&q->sysfs_lock);
> +
> + blk_mq_unquiesce_queue(q);
> }
>
I am afraid the above way may slow down disk shutdown a lot; see the
following commit. That is also the reason why I moved this into the
disk release handler, where all sync I/O submission is done.
commit 1311326cf4755c7ffefd20f576144ecf46d9906b
Author: Ming Lei <ming.lei@redhat.com>
Date: Mon Jun 25 19:31:49 2018 +0800
blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()
SCSI probing may synchronously create and destroy a lot of request_queues
for non-existent devices. Any synchronize_rcu() in queue creation or
destroy path may introduce long latency during booting, see detailed
description in comment of blk_register_queue().
This patch removes one synchronize_rcu() inside blk_cleanup_queue()
for this case, commit c2856ae2f315d75 ("blk-mq: quiesce queue before freeing queue")
needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but
when queue isn't initialized, it isn't necessary to do that since
only pass-through requests are involved, no original issue in
scsi_execute() at all.
Without this patch and the previous one, it may take 20+ seconds for
virtio-scsi to complete disk probe. With the two patches, the time becomes
less than 100ms.
Thanks,
Ming
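Rough arithmetic behind the concern (illustrative figures, not
measurements from this thread): a synchronize_rcu() can take on the
order of milliseconds on a loaded system, and with the quiesce in
del_gendisk every removed disk pays for one, serially:

        500 LUNs  x  ~10 ms per grace period  ~=  5 s of extra teardown

which is the same class of slowdown the 2018 commit above addressed for
probe.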
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01 7:14 UTC
To: Ming Lei
Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838
On Wed, Jun 01, 2022 at 03:09:26PM +0800, Ming Lei wrote:
> > Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> > patch below. Do you have any good reproducer, though?
>
> blktests block/027 should cover this.
That did not trigger the problem for me.
> > if (q->elevator) {
> > + blk_mq_quiesce_queue(q);
> > +
> > mutex_lock(&q->sysfs_lock);
> > elevator_exit(q);
> > mutex_unlock(&q->sysfs_lock);
> > +
> > + blk_mq_unquiesce_queue(q);
> > }
> >
>
> I am afraid the above way may slow down disk shutdown a lot; see the
> following commit. That is also the reason why I moved this into the
> disk release handler, where all sync I/O submission is done.
SCSI devices that are just probed and never had a disk attached will
not have q->elevator set, and so will not hit this quiesce at all.
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01 9:07 UTC
To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838
On Wed, Jun 01, 2022 at 09:14:29AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 03:09:26PM +0800, Ming Lei wrote:
> > > Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> > > patch below. Do you have any good reproducer, though?
> >
> > blktests block/027 should cover this.
>
> That did not trigger the problem for me.
This kind of issue is often not 100% reproducible.
>
> > > if (q->elevator) {
> > > + blk_mq_quiesce_queue(q);
> > > +
> > > mutex_lock(&q->sysfs_lock);
> > > elevator_exit(q);
> > > mutex_unlock(&q->sysfs_lock);
> > > +
> > > + blk_mq_unquiesce_queue(q);
> > > }
> > >
> >
> > I am afraid the above way may slow down disk shutdown a lot; see the
> > following commit. That is also the reason why I moved this into the
> > disk release handler, where all sync I/O submission is done.
>
> SCSI devices that are just probed and never had a disk attached will
> not have q->elevator set, and so will not hit this quiesce at all.
Yes, but a host with hundreds of real LUNs may shut down slowly too,
since sd_remove() is not called asynchronously.
Thanks,
Ming
* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01 12:01 UTC
To: Ming Lei
Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838
On Wed, Jun 01, 2022 at 05:07:53PM +0800, Ming Lei wrote:
> > > I am afraid the above way may slow down disk shutdown a lot; see the
> > > following commit. That is also the reason why I moved this into the
> > > disk release handler, where all sync I/O submission is done.
> >
> > SCSI devices that are just probed and never had a disk attached will
> > not have q->elevator set, and so will not hit this quiesce at all.
>
> Yes, but a host with hundreds of real LUNs may shut down slowly too,
> since sd_remove() is not called asynchronously.
So maybe teardown will slow down. But we fix a reproducible bug, and
we get the lifetimes right. The sched requests are enabled in add_disk
(or later, when a scheduler is enabled if there was none at add_disk
time), so we should tear them down in del_gendisk. The fact that we've
been playing so loose with these lifetime rules in the block layer has
been a constant source of bugs.
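Laid out explicitly, the pairing being argued for (a sketch of the
intended lifetimes, using the call sites named in this thread):

        add_disk()            elevator and sched tags set up
           ...                disk live, file system requests flowing
        del_gendisk()         elevator and sched tags torn down (this patch)
        blk_cleanup_queue()   hardware queues and tag_set torn down
        disk_release()        remaining gendisk/queue memory freed

Teardown mirrors setup at the same layer, so nothing the scheduler
depends on (in particular the tag_set) can go away before the scheduler
itself does.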