linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] scsi: core: cleanup request queue before releasing gendisk
@ 2021-09-15  9:25 Ming Lei
  2021-09-15 13:40 ` Christoph Hellwig
  2021-09-15 13:40 ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Ming Lei @ 2021-09-15  9:25 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi
  Cc: Ming Lei, Christoph Hellwig, Sven Schnelle

gendisk instance has to be released after request queue is cleaned up
because bdi is referred from gendisk since commit edb0872f44ec ("block:
move the bdi from the request_queue to the gendisk").

For sd and sr, gendisk can be removed in the release handler(sd_remove/
sr_remove) of sdev->sdev_gendev, which is triggered in device_del(sdev->sdev_gendev)
in __scsi_remove_device(), when the request queue isn't cleaned up yet.

So kernel oops could be triggered when referring bdi via gendisk.

Fix the issue by moving blk_cleanup_queue() into sd_remove() and
sr_remove().

Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_priv.h   | 6 ++++++
 drivers/scsi/scsi_sysfs.c  | 3 ++-
 drivers/scsi/sd.c          | 2 ++
 drivers/scsi/sr.c          | 4 +++-
 include/scsi/scsi_device.h | 1 +
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 6d9152031a40..6aef88d772a3 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -145,6 +145,12 @@ extern void __scsi_remove_device(struct scsi_device *);
 extern struct bus_type scsi_bus_type;
 extern const struct attribute_group *scsi_sysfs_shost_attr_groups[];
 
+static inline void scsi_sysfs_cleanup_sdev(struct scsi_device *sdev)
+{
+	sdev->request_queue_cleaned = true;
+	blk_cleanup_queue(sdev->request_queue);
+}
+
 /* scsi_netlink.c */
 #ifdef CONFIG_SCSI_NETLINK
 extern void scsi_netlink_init(void);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 86793259e541..9d54fc9c89ea 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1463,7 +1463,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	scsi_device_set_state(sdev, SDEV_DEL);
 	mutex_unlock(&sdev->state_mutex);
 
-	blk_cleanup_queue(sdev->request_queue);
+	if (!sdev->request_queue_cleaned)
+		blk_cleanup_queue(sdev->request_queue);
 	cancel_work_sync(&sdev->requeue_work);
 
 	if (sdev->host->hostt->slave_destroy)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cbd9999f93a6..8132f9833488 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3498,6 +3498,8 @@ static int sd_remove(struct device *dev)
 	del_gendisk(sdkp->disk);
 	sd_shutdown(dev);
 
+	scsi_sysfs_cleanup_sdev(to_scsi_device(dev));
+
 	free_opal_dev(sdkp->opal_dev);
 
 	mutex_lock(&sd_ref_mutex);
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 8b17b35283aa..ef85940f3168 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -64,7 +64,7 @@
 
 #include "scsi_logging.h"
 #include "sr.h"
-
+#include "scsi_priv.h"
 
 MODULE_DESCRIPTION("SCSI cdrom (sr) driver");
 MODULE_LICENSE("GPL");
@@ -1045,6 +1045,8 @@ static int sr_remove(struct device *dev)
 	del_gendisk(cd->disk);
 	dev_set_drvdata(dev, NULL);
 
+	scsi_sysfs_cleanup_sdev(to_scsi_device(dev));
+
 	mutex_lock(&sr_ref_mutex);
 	kref_put(&cd->kref, sr_kref_release);
 	mutex_unlock(&sr_ref_mutex);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 09a17f6e93a7..d102bc2c423b 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -207,6 +207,7 @@ struct scsi_device {
 	unsigned rpm_autosuspend:1;	/* Enable runtime autosuspend at device
 					 * creation time */
 	unsigned ignore_media_change:1; /* Ignore MEDIA CHANGE on resume */
+	unsigned request_queue_cleaned:1; /* Request queue has been cleaned up */
 
 	bool offline_already;		/* Device offline message logged */
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-15  9:25 [PATCH] scsi: core: cleanup request queue before releasing gendisk Ming Lei
@ 2021-09-15 13:40 ` Christoph Hellwig
  2021-09-16  1:36   ` Ming Lei
  2021-09-15 13:40 ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-15 13:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi,
	Christoph Hellwig, Sven Schnelle

On Wed, Sep 15, 2021 at 05:25:47PM +0800, Ming Lei wrote:
> gendisk instance has to be released after request queue is cleaned up
> because bdi is referred from gendisk since commit edb0872f44ec ("block:
> move the bdi from the request_queue to the gendisk").
> 
> For sd and sr, gendisk can be removed in the release handler(sd_remove/
> sr_remove) of sdev->sdev_gendev, which is triggered in device_del(sdev->sdev_gendev)
> in __scsi_remove_device(), when the request queue isn't cleaned up yet.
> 
> So kernel oops could be triggered when referring bdi via gendisk.
> 
> Fix the issue by moving blk_cleanup_queue() into sd_remove() and
> sr_remove().

This looks like a bit of a bandaid to me.  I think the proper fix
is to move the parts of blk_cleanup_queue that need a disk or bdi
to del_gendisk.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-15  9:25 [PATCH] scsi: core: cleanup request queue before releasing gendisk Ming Lei
  2021-09-15 13:40 ` Christoph Hellwig
@ 2021-09-15 13:40 ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-15 13:40 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi,
	Christoph Hellwig, Sven Schnelle

Btw, do you have a good reproducer for issues in this area?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-15 13:40 ` Christoph Hellwig
@ 2021-09-16  1:36   ` Ming Lei
  2021-09-16 10:14     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-16  1:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Wed, Sep 15, 2021 at 03:40:08PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 15, 2021 at 05:25:47PM +0800, Ming Lei wrote:
> > gendisk instance has to be released after request queue is cleaned up
> > because bdi is referred from gendisk since commit edb0872f44ec ("block:
> > move the bdi from the request_queue to the gendisk").
> > 
> > For sd and sr, gendisk can be removed in the release handler(sd_remove/
> > sr_remove) of sdev->sdev_gendev, which is triggered in device_del(sdev->sdev_gendev)
> > in __scsi_remove_device(), when the request queue isn't cleaned up yet.
> > 
> > So kernel oops could be triggered when referring bdi via gendisk.
> > 
> > Fix the issue by moving blk_cleanup_queue() into sd_remove() and
> > sr_remove().
> 
> This looks like a bit of a bandaid to me.  I think the proper fix
> is to move the parts of blk_cleanup_queue that need a disk or bdi
> to del_gendisk.

From correctness viewpoint, we need to call blk_cleanup_queue
before releasing gendisk and after del_gendisk(). Now you have invented
blk_cleanup_disk(), do you plan to do the three in one helper? :-)

We don't have to put del_gendisk & blk_cleanup_queue together,
and it may cause other trouble at least for scsi disk since sd_shutdown()
follows del_gendisk() and has to be called before blk_cleanup_queue().

BTW, you asked the reproducer of the issue, I just observed the issue
one or two time when running blktests block/009, but my scsi lifetime
bpftrace script does show that gendisk is released before blk_cleanup_queue().

Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-16  1:36   ` Ming Lei
@ 2021-09-16 10:14     ` Christoph Hellwig
  2021-09-16 12:38       ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-16 10:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen,
	linux-scsi, Sven Schnelle

On Thu, Sep 16, 2021 at 09:36:23AM +0800, Ming Lei wrote:
> >From correctness viewpoint, we need to call blk_cleanup_queue
> before releasing gendisk and after del_gendisk(). Now you have invented
> blk_cleanup_disk(), do you plan to do the three in one helper? :-)

No.  In retrospective blk_cleanup_disk wan't the best idea for a few
reasons.  But at least it consolidated some of the code.

> We don't have to put del_gendisk & blk_cleanup_queue together,

I don't want all of it together.  The important thing is that we have
two different concepts:

 - the gendisk is required to do file system style I/O
 - a standalone request_queue can be used for passthrough I/O.

> and it may cause other trouble at least for scsi disk since sd_shutdown()
> follows del_gendisk() and has to be called before blk_cleanup_queue().

Yes.  So we need to move the bits of blk_cleanup_queue that deal with
the file system I/O state to del_gendisk, and keep blk_cleanup_queue
for anything actually needed for the low-level queue.

To take SCSI as the example.  We can unload the sd/sr drivers and the
queue needs to still be around and work for use with the sg driver.

> BTW, you asked the reproducer of the issue, I just observed the issue
> one or two time when running blktests block/009, but my scsi lifetime
> bpftrace script does show that gendisk is released before blk_cleanup_queue().

Interesting.  What were the symptoms in this case?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-16 10:14     ` Christoph Hellwig
@ 2021-09-16 12:38       ` Ming Lei
  2021-09-16 14:20         ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-16 12:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Thu, Sep 16, 2021 at 12:14:51PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 16, 2021 at 09:36:23AM +0800, Ming Lei wrote:
> > >From correctness viewpoint, we need to call blk_cleanup_queue
> > before releasing gendisk and after del_gendisk(). Now you have invented
> > blk_cleanup_disk(), do you plan to do the three in one helper? :-)
> 
> No.  In retrospective blk_cleanup_disk wan't the best idea for a few
> reasons.  But at least it consolidated some of the code.
> 
> > We don't have to put del_gendisk & blk_cleanup_queue together,
> 
> I don't want all of it together.  The important thing is that we have
> two different concepts:
> 
>  - the gendisk is required to do file system style I/O
>  - a standalone request_queue can be used for passthrough I/O.

request_queue is also abstract in block I/O's implementation, which
can be thought as one lower level concept of gendisk too, IMO.

> 
> > and it may cause other trouble at least for scsi disk since sd_shutdown()
> > follows del_gendisk() and has to be called before blk_cleanup_queue().
> 
> Yes.  So we need to move the bits of blk_cleanup_queue that deal with
> the file system I/O state to del_gendisk, and keep blk_cleanup_queue
> for anything actually needed for the low-level queue.

Can you explain what the bits are in blk_cleanup_queue() for dealing with FS
I/O state? blk_cleanup_queue() drains and shutdown the queue basically,
all shouldn't be related with gendisk, and it is fine to implement one
queue without gendisk involved, such as nvme admin, connect queue or
sort of stuff.

Wrt. this reported issue, rq_qos_exit() needs to run before releasing
gendisk, but queue has to put into freezing before calling
rq_qos_exit(), so looks you suggest to move the following code into
del_gendisk()?


        WARN_ON_ONCE(blk_queue_registered(q));

        /* mark @q DYING, no new request or merges will be allowed afterwards */
        blk_set_queue_dying(q);

        blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
        blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);

        /*
         * Drain all requests queued before DYING marking. Set DEAD flag to
         * prevent that blk_mq_run_hw_queues() accesses the hardware queues
         * after draining finished.
         */
        blk_freeze_queue(q);

        rq_qos_exit(q);

If we move the above into del_gendisk(), some corner cases have to be
taken care of, such as request queue without disk involved.

> 
> To take SCSI as the example.  We can unload the sd/sr drivers and the
> queue needs to still be around and work for use with the sg driver.
> 
> > BTW, you asked the reproducer of the issue, I just observed the issue
> > one or two time when running blktests block/009, but my scsi lifetime
> > bpftrace script does show that gendisk is released before blk_cleanup_queue().
> 
> Interesting.  What were the symptoms in this case?

It is same with recent report of 'general protection fault in wb_timer_fn'.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-16 12:38       ` Ming Lei
@ 2021-09-16 14:20         ` Christoph Hellwig
  2021-09-17  3:39           ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-16 14:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen,
	linux-scsi, Sven Schnelle

On Thu, Sep 16, 2021 at 08:38:16PM +0800, Ming Lei wrote:
> > > and it may cause other trouble at least for scsi disk since sd_shutdown()
> > > follows del_gendisk() and has to be called before blk_cleanup_queue().
> > 
> > Yes.  So we need to move the bits of blk_cleanup_queue that deal with
> > the file system I/O state to del_gendisk, and keep blk_cleanup_queue
> > for anything actually needed for the low-level queue.
> 
> Can you explain what the bits are in blk_cleanup_queue() for dealing with FS
> I/O state? blk_cleanup_queue() drains and shutdown the queue basically,
> all shouldn't be related with gendisk, and it is fine to implement one
> queue without gendisk involved, such as nvme admin, connect queue or
> sort of stuff.
> 
> Wrt. this reported issue, rq_qos_exit() needs to run before releasing
> gendisk, but queue has to put into freezing before calling
> rq_qos_exit(),

I was curious what you hit, but yes rq_qos_exit is obvious.
blk_flush_integrity also is very much about fs I/O state.



> so looks you suggest to move the following code into
> del_gendisk()?

something like that.  I think we need to split the dying flag into
one for the gendisk and one for the queue first, and make sure the
queue freeze in del_gendisk is released again so that passthrough
still works after.

> If we move the above into del_gendisk(), some corner cases have to be
> taken care of, such as request queue without disk involved.

Yes.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-16 14:20         ` Christoph Hellwig
@ 2021-09-17  3:39           ` Ming Lei
  2021-09-17  6:53             ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-17  3:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Thu, Sep 16, 2021 at 04:20:09PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 16, 2021 at 08:38:16PM +0800, Ming Lei wrote:
> > > > and it may cause other trouble at least for scsi disk since sd_shutdown()
> > > > follows del_gendisk() and has to be called before blk_cleanup_queue().
> > > 
> > > Yes.  So we need to move the bits of blk_cleanup_queue that deal with
> > > the file system I/O state to del_gendisk, and keep blk_cleanup_queue
> > > for anything actually needed for the low-level queue.
> > 
> > Can you explain what the bits are in blk_cleanup_queue() for dealing with FS
> > I/O state? blk_cleanup_queue() drains and shutdown the queue basically,
> > all shouldn't be related with gendisk, and it is fine to implement one
> > queue without gendisk involved, such as nvme admin, connect queue or
> > sort of stuff.
> > 
> > Wrt. this reported issue, rq_qos_exit() needs to run before releasing
> > gendisk, but queue has to put into freezing before calling
> > rq_qos_exit(),
> 
> I was curious what you hit, but yes rq_qos_exit is obvious.
> blk_flush_integrity also is very much about fs I/O state.
> 
> 
> 
> > so looks you suggest to move the following code into
> > del_gendisk()?
> 
> something like that.  I think we need to split the dying flag into
> one for the gendisk and one for the queue first, and make sure the
> queue freeze in del_gendisk is released again so that passthrough
> still works after.

If we do that, q->disk is really unnecessary, so looks the fix of
'd152c682f03c block: add an explicit ->disk backpointer to the request_queue'
isn't good. The original issue added in 'edb0872f44ec block: move the
bdi from the request_queue to the gendisk' can be fixed simply by moving
the two lines code in blk_unregister_queue() to blk_cleanup_queue():

        kobject_uevent(&q->kobj, KOBJ_REMOVE);
        kobject_del(&q->kobj);


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17  3:39           ` Ming Lei
@ 2021-09-17  6:53             ` Christoph Hellwig
  2021-09-17  7:41               ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-17  6:53 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen,
	linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 11:39:48AM +0800, Ming Lei wrote:
> If we do that, q->disk is really unnecessary, so looks the fix of
> 'd152c682f03c block: add an explicit ->disk backpointer to the request_queue'
> isn't good.
> The original issue added in 'edb0872f44ec block: move the
> bdi from the request_queue to the gendisk' can be fixed simply by moving
> the two lines code in blk_unregister_queue() to blk_cleanup_queue():
> 
>         kobject_uevent(&q->kobj, KOBJ_REMOVE);
>         kobject_del(&q->kobj);

Well, it is still useful to not do the device model dance, especially as
uses of queue->disk will grow in 5.16.

Anyway, can you give this patch a spin?  I've done some basic testing
including nvme surprise removal locally.  Note that just like
blk_cleanup_queue this doesn't actually wait for the freeze to finish.
Eventually we should do that, but I'm sure this will show tons of issues
with drivers not properly cleaning up.

diff --git a/block/blk-core.c b/block/blk-core.c
index 5454db2fa263b..18287c443ae81 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -49,7 +49,6 @@
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
-#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -385,13 +384,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
-	rq_qos_exit(q);
-
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
-	/* for synchronous bio-based driver finish in-flight integrity i/o */
-	blk_flush_integrity();
-
 	blk_sync_queue(q);
 	if (queue_is_mq(q))
 		blk_mq_exit_queue(q);
@@ -470,19 +464,26 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 static inline int bio_queue_enter(struct bio *bio)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
 	bool nowait = bio->bi_opf & REQ_NOWAIT;
 	int ret;
 
-	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
+	ret = blk_queue_enter(disk->queue, nowait ? BLK_MQ_REQ_NOWAIT : 0);
 	if (unlikely(ret)) {
-		if (nowait && !blk_queue_dying(q))
+		if (nowait && !blk_queue_dying(disk->queue))
 			bio_wouldblock_error(bio);
 		else
 			bio_io_error(bio);
+		return ret;
 	}
 
-	return ret;
+	if (unlikely(!disk_live(disk))) {
+		blk_queue_exit(disk->queue);
+		bio_io_error(bio);
+		return -ENODEV;
+	}
+
+	return 0;
 }
 
 void blk_queue_exit(struct request_queue *q)
diff --git a/block/genhd.c b/block/genhd.c
index 7b6e5e1cf9564..8abe12dca76f2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,7 @@
 #include <linux/badblocks.h>
 
 #include "blk.h"
+#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void del_gendisk(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
+
 	might_sleep();
 
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
@@ -572,6 +575,21 @@ void del_gendisk(struct gendisk *disk)
 	blk_drop_partitions(disk);
 	mutex_unlock(&disk->open_mutex);
 
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_freeze_queue_start(q);
+	if (queue_is_mq(q))
+		blk_mq_wake_waiters(q);
+	/* Make blk_queue_enter() reexamine the DYING flag. */
+	wake_up_all(&q->mq_freeze_wq);
+
+	rq_qos_exit(q);
+	blk_sync_queue(q);
+	blk_flush_integrity();
+	/* allow using passthrough request again after the queue is torn down */
+	blk_mq_unfreeze_queue(q);
+
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17  6:53             ` Christoph Hellwig
@ 2021-09-17  7:41               ` Ming Lei
  2021-09-17  7:56                 ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-17  7:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 08:53:05AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 17, 2021 at 11:39:48AM +0800, Ming Lei wrote:
> > If we do that, q->disk is really unnecessary, so looks the fix of
> > 'd152c682f03c block: add an explicit ->disk backpointer to the request_queue'
> > isn't good.
> > The original issue added in 'edb0872f44ec block: move the
> > bdi from the request_queue to the gendisk' can be fixed simply by moving
> > the two lines code in blk_unregister_queue() to blk_cleanup_queue():
> > 
> >         kobject_uevent(&q->kobj, KOBJ_REMOVE);
> >         kobject_del(&q->kobj);
> 
> Well, it is still useful to not do the device model dance, especially as
> uses of queue->disk will grow in 5.16.

It isn't an device model dance, since 

> 
> Anyway, can you give this patch a spin?  I've done some basic testing
> including nvme surprise removal locally.  Note that just like
> blk_cleanup_queue this doesn't actually wait for the freeze to finish.
> Eventually we should do that, but I'm sure this will show tons of issues
> with drivers not properly cleaning up.
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5454db2fa263b..18287c443ae81 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -49,7 +49,6 @@
>  #include "blk-mq.h"
>  #include "blk-mq-sched.h"
>  #include "blk-pm.h"
> -#include "blk-rq-qos.h"
>  
>  struct dentry *blk_debugfs_root;
>  
> @@ -385,13 +384,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	 */
>  	blk_freeze_queue(q);
>  
> -	rq_qos_exit(q);
> -
>  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>  
> -	/* for synchronous bio-based driver finish in-flight integrity i/o */
> -	blk_flush_integrity();
> -
>  	blk_sync_queue(q);
>  	if (queue_is_mq(q))
>  		blk_mq_exit_queue(q);
> @@ -470,19 +464,26 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  
>  static inline int bio_queue_enter(struct bio *bio)
>  {
> -	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +	struct gendisk *disk = bio->bi_bdev->bd_disk;
>  	bool nowait = bio->bi_opf & REQ_NOWAIT;
>  	int ret;
>  
> -	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
> +	ret = blk_queue_enter(disk->queue, nowait ? BLK_MQ_REQ_NOWAIT : 0);
>  	if (unlikely(ret)) {
> -		if (nowait && !blk_queue_dying(q))
> +		if (nowait && !blk_queue_dying(disk->queue))
>  			bio_wouldblock_error(bio);
>  		else
>  			bio_io_error(bio);
> +		return ret;
>  	}
>  
> -	return ret;
> +	if (unlikely(!disk_live(disk))) {
> +		blk_queue_exit(disk->queue);
> +		bio_io_error(bio);
> +		return -ENODEV;
> +	}

Is it safe to fail IO in this way? There is still opened bdev, and
usually IO can be done successfully even when disk is being deleted.

Not mention it adds one extra check in real fast path.

> +
> +	return 0;
>  }
>  
>  void blk_queue_exit(struct request_queue *q)
> diff --git a/block/genhd.c b/block/genhd.c
> index 7b6e5e1cf9564..8abe12dca76f2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -26,6 +26,7 @@
>  #include <linux/badblocks.h>
>  
>  #include "blk.h"
> +#include "blk-rq-qos.h"
>  
>  static struct kobject *block_depr;
>  
> @@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
>   */
>  void del_gendisk(struct gendisk *disk)
>  {
> +	struct request_queue *q = disk->queue;
> +
>  	might_sleep();
>  
>  	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
> @@ -572,6 +575,21 @@ void del_gendisk(struct gendisk *disk)
>  	blk_drop_partitions(disk);
>  	mutex_unlock(&disk->open_mutex);
>  
> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_freeze_queue_start(q);
> +	if (queue_is_mq(q))
> +		blk_mq_wake_waiters(q);
> +	/* Make blk_queue_enter() reexamine the DYING flag. */
> +	wake_up_all(&q->mq_freeze_wq);
> +
> +	rq_qos_exit(q);

rq_qos_exit() requires the queue to be frozen, otherwise kernel oops
can be triggered. There may be old submitted bios not completed yet,
and rq_qos_done() can be referring to q->rq_qos.

But if waiting for queue frozen is added, one extra freeze is added,
which will slow done remove disk/queue.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17  7:41               ` Ming Lei
@ 2021-09-17  7:56                 ` Christoph Hellwig
  2021-09-17  8:32                   ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-17  7:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen,
	linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 03:41:05PM +0800, Ming Lei wrote:
> >  
> > -	return ret;
> > +	if (unlikely(!disk_live(disk))) {
> > +		blk_queue_exit(disk->queue);
> > +		bio_io_error(bio);
> > +		return -ENODEV;
> > +	}
> 
> Is it safe to fail IO in this way? There is still opened bdev, and
> usually IO can be done successfully even when disk is being deleted.

"normal" I/O should not really happen by the time it is deleted.  That
being said we should do this only after the fsync is done. While no
one should rely on that I'm pretty sure some file systems do.
So we'll actually need a deleted flag.

> Not mention it adds one extra check in real fast path.

I'm not really woried about the check itself.  That being
sais this inode cache line is not hot right now, so moving it to
disk->state will help as we need to check the read-only flag in
the the I/O submission path anyway.

> > +	/*
> > +	 * Prevent new I/O from crossing bio_queue_enter().
> > +	 */
> > +	blk_freeze_queue_start(q);
> > +	if (queue_is_mq(q))
> > +		blk_mq_wake_waiters(q);
> > +	/* Make blk_queue_enter() reexamine the DYING flag. */
> > +	wake_up_all(&q->mq_freeze_wq);
> > +
> > +	rq_qos_exit(q);
> 
> rq_qos_exit() requires the queue to be frozen, otherwise kernel oops
> can be triggered. There may be old submitted bios not completed yet,
> and rq_qos_done() can be referring to q->rq_qos.

Yes.  I actually misread the old code - it atually does two
blk_freeze_queue_start, but it also includes the wait.

> But if waiting for queue frozen is added, one extra freeze is added,
> which will slow done remove disk/queue.

Is it?  For the typical case the second free in blk_cleanp_queue will
be bsically free because there is no pending I/O.  The only case
where that matters if if there is pending passthrough I/O again,
which can only happen with SCSI, and even there is highly unusual.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17  7:56                 ` Christoph Hellwig
@ 2021-09-17  8:32                   ` Ming Lei
  2021-09-17 12:37                     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-17  8:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 09:56:50AM +0200, Christoph Hellwig wrote:
> On Fri, Sep 17, 2021 at 03:41:05PM +0800, Ming Lei wrote:
> > >  
> > > -	return ret;
> > > +	if (unlikely(!disk_live(disk))) {
> > > +		blk_queue_exit(disk->queue);
> > > +		bio_io_error(bio);
> > > +		return -ENODEV;
> > > +	}
> > 
> > Is it safe to fail IO in this way? There is still opened bdev, and
> > usually IO can be done successfully even when disk is being deleted.
> 
> "normal" I/O should not really happen by the time it is deleted.  That
> being said we should do this only after the fsync is done. While no
> one should rely on that I'm pretty sure some file systems do.
> So we'll actually need a deleted flag.
> 
> > Not mention it adds one extra check in real fast path.
> 
> I'm not really woried about the check itself.  That being
> sais this inode cache line is not hot right now, so moving it to
> disk->state will help as we need to check the read-only flag in
> the the I/O submission path anyway.

When the deleted flag is set in del_gendisk(), it may not be observed
in time in bio_submit() because of memory/compiler re-order, so the
check is still sort of relax constraint. I guess the same is true for
read-only check.

> 
> > > +	/*
> > > +	 * Prevent new I/O from crossing bio_queue_enter().
> > > +	 */
> > > +	blk_freeze_queue_start(q);
> > > +	if (queue_is_mq(q))
> > > +		blk_mq_wake_waiters(q);
> > > +	/* Make blk_queue_enter() reexamine the DYING flag. */
> > > +	wake_up_all(&q->mq_freeze_wq);
> > > +
> > > +	rq_qos_exit(q);
> > 
> > rq_qos_exit() requires the queue to be frozen, otherwise kernel oops
> > can be triggered. There may be old submitted bios not completed yet,
> > and rq_qos_done() can be referring to q->rq_qos.
> 
> Yes.  I actually misread the old code - it atually does two
> blk_freeze_queue_start, but it also includes the wait.

Only blk_freeze_queue() includes the wait, and blk_freeze_queue_start()
doesn't.

> 
> > But if waiting for queue frozen is added, one extra freeze is added,
> > which will slow done remove disk/queue.
> 
> Is it?  For the typical case the second free in blk_cleanp_queue will
> be bsically free because there is no pending I/O.  The only case
> where that matters if if there is pending passthrough I/O again,
> which can only happen with SCSI, and even there is highly unusual.

RCU grace period is involved in blk_freeze_queue(). One way you can
avoid it is to keep the percpu ref at atomic mode when running
blk_mq_unfreeze_queue() in del_gendisk().



Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17  8:32                   ` Ming Lei
@ 2021-09-17 12:37                     ` Christoph Hellwig
  2021-09-17 13:41                       ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-17 12:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Martin K . Petersen,
	linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 04:32:15PM +0800, Ming Lei wrote:
> > Is it?  For the typical case the second free in blk_cleanp_queue will
> > be bsically free because there is no pending I/O.  The only case
> > where that matters if if there is pending passthrough I/O again,
> > which can only happen with SCSI, and even there is highly unusual.
> 
> RCU grace period is involved in blk_freeze_queue().

where?

> One way you can
> avoid it is to keep the percpu ref at atomic mode when running
> blk_mq_unfreeze_queue() in del_gendisk().
> 
> 
> 
> Thanks,
> Ming
---end quoted text---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] scsi: core: cleanup request queue before releasing gendisk
  2021-09-17 12:37                     ` Christoph Hellwig
@ 2021-09-17 13:41                       ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2021-09-17 13:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Martin K . Petersen, linux-scsi, Sven Schnelle

On Fri, Sep 17, 2021 at 02:37:19PM +0200, Christoph Hellwig wrote:
> On Fri, Sep 17, 2021 at 04:32:15PM +0800, Ming Lei wrote:
> > > Is it?  For the typical case the second free in blk_cleanp_queue will
> > > be bsically free because there is no pending I/O.  The only case
> > > where that matters if if there is pending passthrough I/O again,
> > > which can only happen with SCSI, and even there is highly unusual.
> > 
> > RCU grace period is involved in blk_freeze_queue().
> 
> where?

__percpu_ref_switch_to_atomic().

-- 
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-09-17 13:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-15  9:25 [PATCH] scsi: core: cleanup request queue before releasing gendisk Ming Lei
2021-09-15 13:40 ` Christoph Hellwig
2021-09-16  1:36   ` Ming Lei
2021-09-16 10:14     ` Christoph Hellwig
2021-09-16 12:38       ` Ming Lei
2021-09-16 14:20         ` Christoph Hellwig
2021-09-17  3:39           ` Ming Lei
2021-09-17  6:53             ` Christoph Hellwig
2021-09-17  7:41               ` Ming Lei
2021-09-17  7:56                 ` Christoph Hellwig
2021-09-17  8:32                   ` Ming Lei
2021-09-17 12:37                     ` Christoph Hellwig
2021-09-17 13:41                       ` Ming Lei
2021-09-15 13:40 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).