* [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-05-31 16:05 UTC
  To: axboe; +Cc: ming.lei, linux-block, syzbot+3e3f419f4a7816471838

The elevator is only used for file system requests, which are stopped in
del_gendisk.  Move disabling the elevator and freeing the scheduler tags
to the end of del_gendisk instead of doing that work in disk_release and
blk_cleanup_queue to avoid a use after free on q->tag_set from
disk_release as the tag_set might not be alive at that point.

Move the rq_qos_exit call as well, as it just depends on the elevator
exit and would be the only reason to keep the not exactly cheap queue
freeze in disk_release.

Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
---
 block/blk-core.c | 13 -------------
 block/genhd.c    | 38 ++++++++++----------------------------
 2 files changed, 10 insertions(+), 41 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 80fa73c419a99..19cfa71e33728 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -322,19 +322,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		blk_mq_exit_queue(q);
 	}
 
-	/*
-	 * In theory, request pool of sched_tags belongs to request queue.
-	 * However, the current implementation requires tag_set for freeing
-	 * requests, so free the pool now.
-	 *
-	 * Queue has become frozen, there can't be any in-queue requests, so
-	 * it is safe to free requests now.
-	 */
-	mutex_lock(&q->sysfs_lock);
-	if (q->elevator)
-		blk_mq_sched_free_rqs(q);
-	mutex_unlock(&q->sysfs_lock);
-
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/genhd.c b/block/genhd.c
index 8ff5b187791af..9914d0f24fecd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -649,6 +649,16 @@ void del_gendisk(struct gendisk *disk)
 
 	blk_sync_queue(q);
 	blk_flush_integrity();
+	blk_mq_cancel_work_sync(q);
+
+	if (q->elevator) {
+		mutex_lock(&q->sysfs_lock);
+		elevator_exit(q);
+		mutex_unlock(&q->sysfs_lock);
+	}
+
+	rq_qos_exit(q);
+
 	/*
 	 * Allow using passthrough request again after the queue is torn down.
 	 */
@@ -1117,31 +1127,6 @@ static const struct attribute_group *disk_attr_groups[] = {
 	NULL
 };
 
-static void disk_release_mq(struct request_queue *q)
-{
-	blk_mq_cancel_work_sync(q);
-
-	/*
-	 * There can't be any non non-passthrough bios in flight here, but
-	 * requests stay around longer, including passthrough ones so we
-	 * still need to freeze the queue here.
-	 */
-	blk_mq_freeze_queue(q);
-
-	/*
-	 * Since the I/O scheduler exit code may access cgroup information,
-	 * perform I/O scheduler exit before disassociating from the block
-	 * cgroup controller.
-	 */
-	if (q->elevator) {
-		mutex_lock(&q->sysfs_lock);
-		elevator_exit(q);
-		mutex_unlock(&q->sysfs_lock);
-	}
-	rq_qos_exit(q);
-	__blk_mq_unfreeze_queue(q, true);
-}
-
 /**
  * disk_release - releases all allocated resources of the gendisk
  * @dev: the device representing this disk
@@ -1163,9 +1148,6 @@ static void disk_release(struct device *dev)
 	might_sleep();
 	WARN_ON_ONCE(disk_live(disk));
 
-	if (queue_is_mq(disk->queue))
-		disk_release_mq(disk->queue);
-
 	blkcg_exit_queue(disk->queue);
 
 	disk_release_events(disk);
-- 
2.30.2



* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01  0:54 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838

On Tue, May 31, 2022 at 06:05:35PM +0200, Christoph Hellwig wrote:
> The elevator is only used for file system requests, which are stopped in
> del_gendisk.  Move disabling the elevator and freeing the scheduler tags
> to the end of del_gendisk instead of doing that work in disk_release and
> blk_cleanup_queue to avoid a use after free on q->tag_set from
> disk_release as the tag_set might not be alive at that point.
> 
> Move the rq_qos_exit call as well, as it just depends on the elevator
> exit and would be the only reason to keep the not exactly cheap queue
> freeze in disk_release.
> 
> Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
> Reported-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: syzbot+3e3f419f4a7816471838@syzkaller.appspotmail.com
> ---
>  block/blk-core.c | 13 -------------
>  block/genhd.c    | 38 ++++++++++----------------------------
>  2 files changed, 10 insertions(+), 41 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 80fa73c419a99..19cfa71e33728 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -322,19 +322,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  		blk_mq_exit_queue(q);
>  	}
>  
> -	/*
> -	 * In theory, request pool of sched_tags belongs to request queue.
> -	 * However, the current implementation requires tag_set for freeing
> -	 * requests, so free the pool now.
> -	 *
> -	 * Queue has become frozen, there can't be any in-queue requests, so
> -	 * it is safe to free requests now.
> -	 */
> -	mutex_lock(&q->sysfs_lock);
> -	if (q->elevator)
> -		blk_mq_sched_free_rqs(q);
> -	mutex_unlock(&q->sysfs_lock);
> -
>  	/* @q is and will stay empty, shutdown and put */
>  	blk_put_queue(q);
>  }
> diff --git a/block/genhd.c b/block/genhd.c
> index 8ff5b187791af..9914d0f24fecd 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -649,6 +649,16 @@ void del_gendisk(struct gendisk *disk)
>  
>  	blk_sync_queue(q);
>  	blk_flush_integrity();
> +	blk_mq_cancel_work_sync(q);
> +
> +	if (q->elevator) {
> +		mutex_lock(&q->sysfs_lock);
> +		elevator_exit(q);
> +		mutex_unlock(&q->sysfs_lock);
> +	}

This can't be safe: who guarantees that all sync submission activity
is gone once the queue is frozen? We have had lots of reports of
use-after-frees triggered from blk_mq_sched_has_work().
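
To illustrate the window, here is a simplified sketch of the
dispatch-side check (close to, but not verbatim, the helper in
block/blk-mq-sched.h):

/*
 * Simplified sketch: the run-queue path loads q->elevator without
 * holding q->sysfs_lock, so nothing stops elevator_exit() on another
 * CPU from freeing the elevator_queue between the NULL check and the
 * indirect call below.
 */
static inline bool blk_mq_sched_has_work(struct blk_mq_hw_ctx *hctx)
{
	struct elevator_queue *e = hctx->queue->elevator;

	return e && e->type->ops.has_work && e->type->ops.has_work(hctx);
}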

Thanks,
Ming



* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01  6:43 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838

On Wed, Jun 01, 2022 at 08:54:30AM +0800, Ming Lei wrote:
> This can't be safe: who guarantees that all sync submission activity
> is gone once the queue is frozen? We have had lots of reports of
> use-after-frees triggered from blk_mq_sched_has_work().

Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
patch below.  Do you have any good reproducer, though?

diff --git a/block/genhd.c b/block/genhd.c
index 9914d0f24fecd..155b64ff991f6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -652,9 +652,13 @@ void del_gendisk(struct gendisk *disk)
 	blk_mq_cancel_work_sync(q);
 
 	if (q->elevator) {
+		blk_mq_quiesce_queue(q);
+
 		mutex_lock(&q->sysfs_lock);
 		elevator_exit(q);
 		mutex_unlock(&q->sysfs_lock);
+
+		blk_mq_unquiesce_queue(q);
 	}
 
 	rq_qos_exit(q);


* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01  7:09 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838

On Wed, Jun 01, 2022 at 08:43:29AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 08:54:30AM +0800, Ming Lei wrote:
> > This can't be safe: who guarantees that all sync submission activity
> > is gone once the queue is frozen? We have had lots of reports of
> > use-after-frees triggered from blk_mq_sched_has_work().
> 
> Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> patch below.  Do you have any good reproducer, though?

blktests block/027 should cover this.

> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 9914d0f24fecd..155b64ff991f6 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -652,9 +652,13 @@ void del_gendisk(struct gendisk *disk)
>  	blk_mq_cancel_work_sync(q);
>  
>  	if (q->elevator) {
> +		blk_mq_quiesce_queue(q);
> +
>  		mutex_lock(&q->sysfs_lock);
>  		elevator_exit(q);
>  		mutex_unlock(&q->sysfs_lock);
> +
> +		blk_mq_unquiesce_queue(q);
>  	}
>  

I am afraid the above approach may slow down disk shutdown a lot, see
the following commit. That latency is also the reason why I moved this
work into the disk release handler, by which point all sync I/O
submission is done.

commit 1311326cf4755c7ffefd20f576144ecf46d9906b
Author: Ming Lei <ming.lei@redhat.com>
Date:   Mon Jun 25 19:31:49 2018 +0800

    blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()

    SCSI probing may synchronously create and destroy a lot of request_queues
    for non-existent devices. Any synchronize_rcu() in queue creation or
    destroy path may introduce long latency during booting, see detailed
    description in comment of blk_register_queue().

    This patch removes one synchronize_rcu() inside blk_cleanup_queue()
    for this case. Commit c2856ae2f315d75 (blk-mq: quiesce queue before freeing queue)
    needs synchronize_rcu() for implementing blk_mq_quiesce_queue(), but
    when queue isn't initialized, it isn't necessary to do that since
    only pass-through requests are involved, no original issue in
    scsi_execute() at all.

    Without this patch and the previous one, it may take 20+ seconds for
    virtio-scsi to complete disk probe. With the two patches, the time becomes
    less than 100ms.
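
The expensive part is the grace period wait inside the quiesce.
Roughly (a simplified sketch of blk_mq_quiesce_queue(), ignoring the
SRCU variant used for BLK_MQ_F_BLOCKING hctxs):

void blk_mq_quiesce_queue(struct request_queue *q)
{
	/* Mark the queue quiesced so no new dispatches are started ... */
	blk_mq_quiesce_queue_nowait(q);

	/*
	 * ... then wait out every dispatch section that is already
	 * running.  This grace period wait is what makes one quiesce
	 * per queue so costly when many queues are torn down in a row.
	 */
	synchronize_rcu();
}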



Thanks,
Ming



* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01  7:14 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838

On Wed, Jun 01, 2022 at 03:09:26PM +0800, Ming Lei wrote:
> > Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> > patch below.  Do you have any good reproducer, though?
> 
> blktests block/027 should cover this.

That did not trigger the problem for me.

> >  	if (q->elevator) {
> > +		blk_mq_quiesce_queue(q);
> > +
> >  		mutex_lock(&q->sysfs_lock);
> >  		elevator_exit(q);
> >  		mutex_unlock(&q->sysfs_lock);
> > +
> > +		blk_mq_unquiesce_queue(q);
> >  	}
> >  
> 
> I am afraid the above approach may slow down disk shutdown a lot, see
> the following commit. That latency is also the reason why I moved this
> work into the disk release handler, by which point all sync I/O
> submission is done.

SCSI devices that are just probed and never had a disk attached will
not have q->elevator set and will not hit this quiesce at all.


* Re: [PATCH] block: disable the elevator in del_gendisk
From: Ming Lei @ 2022-06-01  9:07 UTC
  To: Christoph Hellwig; +Cc: axboe, linux-block, syzbot+3e3f419f4a7816471838

On Wed, Jun 01, 2022 at 09:14:29AM +0200, Christoph Hellwig wrote:
> On Wed, Jun 01, 2022 at 03:09:26PM +0800, Ming Lei wrote:
> > > Yes, we probably need a blk_mq_quiesce_queue call like in the incremental
> > > patch below.  Do you have any good reproducer, though?
> > 
> > blktests block/027 should cover this.
> 
> That did not trigger the problem for me.

This kind of issue is often not 100% reproducible.

> 
> > >  	if (q->elevator) {
> > > +		blk_mq_quiesce_queue(q);
> > > +
> > >  		mutex_lock(&q->sysfs_lock);
> > >  		elevator_exit(q);
> > >  		mutex_unlock(&q->sysfs_lock);
> > > +
> > > +		blk_mq_unquiesce_queue(q);
> > >  	}
> > >  
> > 
> > I am afraid the above approach may slow down disk shutdown a lot, see
> > the following commit. That latency is also the reason why I moved this
> > work into the disk release handler, by which point all sync I/O
> > submission is done.
> 
> SCSI devices that are just probed and never had a disk attached will
> not have q->elevator set and will not hit this quiesce at all.

Yes, but a host with hundreds of real LUNs may shut down slowly too,
since sd_remove() won't be called in an async way.


Thanks,
Ming



* Re: [PATCH] block: disable the elevator in del_gendisk
From: Christoph Hellwig @ 2022-06-01 12:01 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, axboe, linux-block, syzbot+3e3f419f4a7816471838

On Wed, Jun 01, 2022 at 05:07:53PM +0800, Ming Lei wrote:
> > > I am afraid the above approach may slow down disk shutdown a lot, see
> > > the following commit. That latency is also the reason why I moved this
> > > work into the disk release handler, by which point all sync I/O
> > > submission is done.
> > 
> > SCSI devices that are just probed and never had a disk attached will
> > not have q->elevator set and will not hit this quiesce at all.
> 
> Yes, but a host with hundreds of real LUNs may shut down slowly too,
> since sd_remove() won't be called in an async way.

So maybe teardown will slow down.  But we fix a reproducible bug, and
do get the lifetimes right.  The sched requests are set up in add_disk
(or later, when a scheduler is enabled if there was none), so we
should tear them down in del_gendisk.  The fact that we've been
playing so loose with these lifetime rules in the block layer has been
a constant source of bugs.
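
To spell out the pairing this patch establishes (my summary, not text
from the patch itself):

    add_disk()      sets up the elevator and the sched tags
    del_gendisk()   stops file system I/O, exits the elevator and
                    rq_qos, frees the sched tags
    disk_release()  final teardown only, must no longer touch
                    q->tag_set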

