* move more work to disk_release
@ 2022-02-22 14:14 Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

Hi all,

this series resurrects and forward-ports large parts of Ming's
"block: don't drain file system I/O on del_gendisk" series, but instead
of removing the draining in del_gendisk it removes the draining in the
sd driver, which was always a bit ad hoc.  As part of that, sd and sr
are switched to the new ->free_disk method so that they no longer have
to clear disk->private_data, and the way the SCSI ULP is looked up is
cleaned up as well.

Git branch:

    git://git.infradead.org/users/hch/block.git freeze-5.18

Gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/freeze-5.18


Diffstat:
 block/blk-core.c           |    7 --
 block/blk-mq.c             |   10 +--
 block/blk-sysfs.c          |   25 --------
 block/blk.h                |    2 
 block/elevator.c           |    7 +-
 block/genhd.c              |   43 ++++++++++++++-
 drivers/scsi/sd.c          |   95 +++++----------------------------
 drivers/scsi/sd.h          |    3 -
 drivers/scsi/sr.c          |  129 +++++++++------------------------------------
 drivers/scsi/sr.h          |    5 -
 drivers/scsi/st.c          |    1 
 drivers/scsi/st.h          |    1 
 include/scsi/scsi_cmnd.h   |    9 ---
 include/scsi/scsi_driver.h |    9 ++-
 14 files changed, 104 insertions(+), 242 deletions(-)


* [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-23  2:08   ` Ming Lei
  2022-02-22 14:14 ` [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

I/O accounting buckets I/O into the read/write/discard categories,
into which passthrough I/O does not fit at all.  It also accounts to
the block_device, which may not even exist for passthrough I/O.
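
A minimal userspace sketch of the new accounting predicate, assuming
simplified hypothetical types (sketch_request, SKETCH_RQF_IO_STAT and
the request "kind" field are stand-ins, not kernel definitions).  It
shows why __blk_account_io_start() can take the bdev unconditionally
once passthrough requests are rejected up front.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions, simplified for userspace. */
#define SKETCH_RQF_IO_STAT (1u << 0)

enum sketch_req_kind { SKETCH_REQ_FS, SKETCH_REQ_PASSTHROUGH };

struct sketch_request {
	unsigned int rq_flags;
	enum sketch_req_kind kind;
	const void *bdev;	/* may be NULL for passthrough I/O */
};

static bool sketch_rq_is_passthrough(const struct sketch_request *rq)
{
	return rq->kind == SKETCH_REQ_PASSTHROUGH;
}

/* Same shape as the patched blk_do_io_stat(): passthrough is never accounted. */
static bool sketch_do_io_stat(const struct sketch_request *rq)
{
	return (rq->rq_flags & SKETCH_RQF_IO_STAT) && !sketch_rq_is_passthrough(rq);
}

/*
 * Because passthrough requests are filtered out first, the accounting
 * target can be taken from the request's bdev without a fallback.
 */
static const void *sketch_account_target(const struct sketch_request *rq)
{
	return sketch_do_io_stat(rq) ? rq->bdev : NULL;
}
```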

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 6 +-----
 block/blk.h    | 2 +-
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ce77250316..ee80853473d1e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -883,11 +883,7 @@ static inline void blk_account_io_done(struct request *req, u64 now)
 
 static void __blk_account_io_start(struct request *rq)
 {
-	/* passthrough requests can hold bios that do not have ->bi_bdev set */
-	if (rq->bio && rq->bio->bi_bdev)
-		rq->part = rq->bio->bi_bdev;
-	else if (rq->q->disk)
-		rq->part = rq->q->disk->part0;
+	rq->part = rq->bio->bi_bdev;
 
 	part_stat_lock();
 	update_io_ticks(rq->part, jiffies, false);
diff --git a/block/blk.h b/block/blk.h
index ebaa59ca46ca6..6f21859c7f0ff 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -325,7 +325,7 @@ int blk_dev_init(void);
  */
 static inline bool blk_do_io_stat(struct request *rq)
 {
-	return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
+	return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
 }
 
 void update_io_ticks(struct block_device *part, unsigned long now, bool end);
-- 
2.30.2



* [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

From: Ming Lei <ming.lei@redhat.com>

To simplify further changes, allow blk_mq_free_rqs to be called twice
on the same queue.
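
A userspace sketch of why the list_empty() guard makes a second call
harmless, assuming hypothetical sketch_* types in place of the real
blk_mq_tags.  The exit_request_calls counter stands in for per-request
driver teardown that must not run twice; the list helpers only loosely
mirror list_head.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct sketch_list { struct sketch_list *next, *prev; };

static void sketch_list_init(struct sketch_list *h) { h->next = h->prev = h; }

static int sketch_list_empty(const struct sketch_list *h) { return h->next == h; }

static void sketch_list_add_tail(struct sketch_list *n, struct sketch_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical page record; the list node is the first member. */
struct sketch_page { struct sketch_list lru; };

/* Hypothetical tags: pages backing the requests plus a teardown counter. */
struct sketch_tags {
	struct sketch_list page_list;
	int nr_requests;
	int exit_request_calls;	/* per-request driver teardown, must run once */
};

/* Returns the number of pages freed; a second call is a harmless no-op. */
static int sketch_free_rqs(struct sketch_tags *tags)
{
	int freed = 0;

	/* The guard added by the patch: already freed, nothing to do. */
	if (sketch_list_empty(&tags->page_list))
		return 0;

	for (int i = 0; i < tags->nr_requests; i++)
		tags->exit_request_calls++;

	while (!sketch_list_empty(&tags->page_list)) {
		struct sketch_list *n = tags->page_list.next;

		n->prev->next = n->next;	/* unlink n */
		n->next->prev = n->prev;
		free((struct sketch_page *)n);
		freed++;
	}
	return freed;
}
```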

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: split out from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ee80853473d1e..63e2d3fd60946 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3061,6 +3061,9 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	struct blk_mq_tags *drv_tags;
 	struct page *page;
 
+	if (list_empty(&tags->page_list))
+		return;
+
 	if (blk_mq_is_shared_tags(set->flags))
 		drv_tags = set->shared_tags;
 	else
-- 
2.30.2



* [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

Requiring every ULP to have the scsi_driver pointer as the first
member of its private data is rather fragile and not necessary anyway.
Just use the driver hanging off the SCSI device instead.
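
A userspace sketch of the new lookup path, assuming simplified
hypothetical mirrors of the driver-model structures (all sketch_*
names are illustrative).  The ULP is recovered with container_of()
from the generic driver bound to the SCSI device, so nothing depends on
the layout of disk->private_data.

```c
#include <stddef.h>

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical mirrors of the driver-model structures. */
struct sketch_device_driver { const char *name; };
struct sketch_device { struct sketch_device_driver *driver; };

struct sketch_scsi_driver {
	struct sketch_device_driver gendrv;	/* embedded generic driver */
	int (*init_command)(void);		/* placeholder ULP callback */
};

struct sketch_scsi_device { struct sketch_device sdev_gendev; };
struct sketch_scsi_cmnd { struct sketch_scsi_device *device; };

static struct sketch_scsi_driver *
sketch_to_scsi_driver(struct sketch_device_driver *drv)
{
	return sketch_container_of(drv, struct sketch_scsi_driver, gendrv);
}

/* Find the ULP via the driver bound to the SCSI device, not via private_data. */
static struct sketch_scsi_driver *
sketch_scsi_cmd_to_driver(struct sketch_scsi_cmnd *cmd)
{
	return sketch_to_scsi_driver(cmd->device->sdev_gendev.driver);
}
```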

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c          | 3 +--
 drivers/scsi/sd.h          | 3 +--
 drivers/scsi/sr.c          | 5 ++---
 drivers/scsi/sr.h          | 1 -
 drivers/scsi/st.c          | 1 -
 drivers/scsi/st.h          | 1 -
 include/scsi/scsi_cmnd.h   | 9 ---------
 include/scsi/scsi_driver.h | 9 +++++++--
 8 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2d648d27bfd71..2a1e19e871d30 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3515,7 +3515,6 @@ static int sd_probe(struct device *dev)
 	}
 
 	sdkp->device = sdp;
-	sdkp->driver = &sd_template;
 	sdkp->disk = gd;
 	sdkp->index = index;
 	sdkp->max_retries = SD_MAX_RETRIES;
@@ -3548,7 +3547,7 @@ static int sd_probe(struct device *dev)
 	gd->minors = SD_MINORS;
 
 	gd->fops = &sd_fops;
-	gd->private_data = &sdkp->driver;
+	gd->private_data = sdkp;
 
 	/* defaults, until the device tells us otherwise */
 	sdp->sector_size = 512;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 2e5932bde43d1..303aa1c23aefb 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -68,7 +68,6 @@ enum {
 };
 
 struct scsi_disk {
-	struct scsi_driver *driver;	/* always &sd_template */
 	struct scsi_device *device;
 	struct device	dev;
 	struct gendisk	*disk;
@@ -131,7 +130,7 @@ struct scsi_disk {
 
 static inline struct scsi_disk *scsi_disk(struct gendisk *disk)
 {
-	return container_of(disk->private_data, struct scsi_disk, driver);
+	return disk->private_data;
 }
 
 #define sd_printk(prefix, sdsk, fmt, a...)				\
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index f925b1f1f9ada..569bda76a5175 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -147,7 +147,7 @@ static void sr_kref_release(struct kref *kref);
 
 static inline struct scsi_cd *scsi_cd(struct gendisk *disk)
 {
-	return container_of(disk->private_data, struct scsi_cd, driver);
+	return disk->private_data;
 }
 
 static int sr_runtime_suspend(struct device *dev)
@@ -692,7 +692,6 @@ static int sr_probe(struct device *dev)
 
 	cd->device = sdev;
 	cd->disk = disk;
-	cd->driver = &sr_template;
 	cd->capacity = 0x1fffff;
 	cd->device->changed = 1;	/* force recheck CD type */
 	cd->media_present = 1;
@@ -713,7 +712,7 @@ static int sr_probe(struct device *dev)
 	sr_vendor_init(cd);
 
 	set_capacity(disk, cd->capacity);
-	disk->private_data = &cd->driver;
+	disk->private_data = cd;
 
 	if (register_cdrom(disk, &cd->cdi))
 		goto fail_minor;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 1609f02ed29ac..d80af3fcb6f97 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -32,7 +32,6 @@ struct scsi_device;
 
 
 typedef struct scsi_cd {
-	struct scsi_driver *driver;
 	unsigned capacity;	/* size in blocks                       */
 	struct scsi_device *device;
 	unsigned int vendor;	/* vendor code, see sr_vendor.c         */
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index e869e90e05afe..ebe9412c86f43 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4276,7 +4276,6 @@ static int st_probe(struct device *dev)
 		goto out_buffer_free;
 	}
 	kref_init(&tpnt->kref);
-	tpnt->driver = &st_template;
 
 	tpnt->device = SDp;
 	if (SDp->scsi_level <= 2)
diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h
index c0ef0d9aaf8a2..7a68eaba7e810 100644
--- a/drivers/scsi/st.h
+++ b/drivers/scsi/st.h
@@ -117,7 +117,6 @@ struct scsi_tape_stats {
 
 /* The tape drive descriptor */
 struct scsi_tape {
-	struct scsi_driver *driver;
 	struct scsi_device *device;
 	struct mutex lock;	/* For serialization */
 	struct completion wait;	/* For SCSI commands */
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 6794d7322cbde..e3a4c67794b14 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -13,7 +13,6 @@
 #include <scsi/scsi_request.h>
 
 struct Scsi_Host;
-struct scsi_driver;
 
 /*
  * MAX_COMMAND_SIZE is:
@@ -159,14 +158,6 @@ static inline void *scsi_cmd_priv(struct scsi_cmnd *cmd)
 	return cmd + 1;
 }
 
-/* make sure not to use it with passthrough commands */
-static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
-{
-	struct request *rq = scsi_cmd_to_rq(cmd);
-
-	return *(struct scsi_driver **)rq->q->disk->private_data;
-}
-
 void scsi_done(struct scsi_cmnd *cmd);
 
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 6dffa8555a390..4ce1988b2ba01 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -4,11 +4,10 @@
 
 #include <linux/blk_types.h>
 #include <linux/device.h>
+#include <scsi/scsi_cmnd.h>
 
 struct module;
 struct request;
-struct scsi_cmnd;
-struct scsi_device;
 
 struct scsi_driver {
 	struct device_driver	gendrv;
@@ -31,4 +30,10 @@ extern int scsi_register_interface(struct class_interface *);
 #define scsi_unregister_interface(intf) \
 	class_interface_unregister(intf)
 
+/* make sure not to use it with passthrough commands */
+static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
+{
+	return to_scsi_driver(cmd->device->sdev_gendev.driver);
+}
+
 #endif /* _SCSI_SCSI_DRIVER_H */
-- 
2.30.2



* [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 05/12] sd: remove the extra sdev_gendev reference Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

Implement the ->free_disk method to put struct scsi_disk when the last
gendisk reference goes away.  This removes the need to clear
->private_data and thus to freeze the queue on unbind.
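
A userspace sketch of the ->free_disk pattern, assuming a hypothetical
refcounted sketch_gendisk rather than the real gendisk.  The
driver-private data stays reachable through private_data until the last
reference is dropped, at which point the callback releases it.  In the
patch the callback does put_device(&sdkp->dev); the sketch just frees
the structure.

```c
#include <stdlib.h>

struct sketch_gendisk;

/* Hypothetical subset of block_device_operations. */
struct sketch_bdops {
	void (*free_disk)(struct sketch_gendisk *disk);
};

struct sketch_gendisk {
	int refcount;
	const struct sketch_bdops *fops;
	void *private_data;	/* never cleared; valid until free_disk */
};

/* Drop a reference; on the last one, let the driver free its data. */
static void sketch_put_disk(struct sketch_gendisk *disk)
{
	if (--disk->refcount > 0)
		return;
	if (disk->fops && disk->fops->free_disk)
		disk->fops->free_disk(disk);
	free(disk);
}

/* Hypothetical driver-private structure, analogous to struct scsi_disk. */
struct sketch_scsi_disk { int *freed_counter; };

static void sketch_scsi_disk_free_disk(struct sketch_gendisk *disk)
{
	struct sketch_scsi_disk *sdkp = disk->private_data;

	(*sdkp->freed_counter)++;
	free(sdkp);
}

static const struct sketch_bdops sketch_sd_fops = {
	.free_disk = sketch_scsi_disk_free_disk,
};
```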

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c | 89 ++++++++---------------------------------------
 1 file changed, 15 insertions(+), 74 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2a1e19e871d30..4eaa5deafc3dc 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -121,11 +121,6 @@ static void scsi_disk_release(struct device *cdev);
 
 static DEFINE_IDA(sd_index_ida);
 
-/* This semaphore is used to mediate the 0->1 reference get in the
- * face of object destruction (i.e. we can't allow a get on an
- * object after last put) */
-static DEFINE_MUTEX(sd_ref_mutex);
-
 static struct kmem_cache *sd_cdb_cache;
 static mempool_t *sd_cdb_pool;
 static mempool_t *sd_page_pool;
@@ -663,33 +658,6 @@ static int sd_major(int major_idx)
 	}
 }
 
-static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
-{
-	struct scsi_disk *sdkp = NULL;
-
-	mutex_lock(&sd_ref_mutex);
-
-	if (disk->private_data) {
-		sdkp = scsi_disk(disk);
-		if (scsi_device_get(sdkp->device) == 0)
-			get_device(&sdkp->dev);
-		else
-			sdkp = NULL;
-	}
-	mutex_unlock(&sd_ref_mutex);
-	return sdkp;
-}
-
-static void scsi_disk_put(struct scsi_disk *sdkp)
-{
-	struct scsi_device *sdev = sdkp->device;
-
-	mutex_lock(&sd_ref_mutex);
-	put_device(&sdkp->dev);
-	scsi_device_put(sdev);
-	mutex_unlock(&sd_ref_mutex);
-}
-
 #ifdef CONFIG_BLK_SED_OPAL
 static int sd_sec_submit(void *data, u16 spsp, u8 secp, void *buffer,
 		size_t len, bool send)
@@ -1418,17 +1386,15 @@ static bool sd_need_revalidate(struct block_device *bdev,
  **/
 static int sd_open(struct block_device *bdev, fmode_t mode)
 {
-	struct scsi_disk *sdkp = scsi_disk_get(bdev->bd_disk);
-	struct scsi_device *sdev;
+	struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
+	struct scsi_device *sdev = sdkp->device;
 	int retval;
 
-	if (!sdkp)
+	if (scsi_device_get(sdev))
 		return -ENXIO;
 
 	SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
 
-	sdev = sdkp->device;
-
 	/*
 	 * If the device is in error recovery, wait until it is done.
 	 * If the device is offline, then disallow any access to it.
@@ -1473,7 +1439,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
 	return 0;
 
 error_out:
-	scsi_disk_put(sdkp);
+	scsi_device_put(sdkp->device);
 	return retval;	
 }
 
@@ -1502,7 +1468,7 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
 			scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
 	}
 
-	scsi_disk_put(sdkp);
+	scsi_device_put(sdkp->device);
 }
 
 static int sd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
@@ -1616,7 +1582,7 @@ static int media_not_present(struct scsi_disk *sdkp,
  **/
 static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
 {
-	struct scsi_disk *sdkp = scsi_disk_get(disk);
+	struct scsi_disk *sdkp = disk->private_data;
 	struct scsi_device *sdp;
 	int retval;
 	bool disk_changed;
@@ -1679,7 +1645,6 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
 	 */
 	disk_changed = sdp->changed;
 	sdp->changed = 0;
-	scsi_disk_put(sdkp);
 	return disk_changed ? DISK_EVENT_MEDIA_CHANGE : 0;
 }
 
@@ -1887,6 +1852,13 @@ static const struct pr_ops sd_pr_ops = {
 	.pr_clear	= sd_pr_clear,
 };
 
+static void scsi_disk_free_disk(struct gendisk *disk)
+{
+	struct scsi_disk *sdkp = disk->private_data;
+
+	put_device(&sdkp->dev);
+}
+
 static const struct block_device_operations sd_fops = {
 	.owner			= THIS_MODULE,
 	.open			= sd_open,
@@ -1898,6 +1870,7 @@ static const struct block_device_operations sd_fops = {
 	.unlock_native_capacity	= sd_unlock_native_capacity,
 	.report_zones		= sd_zbc_report_zones,
 	.get_unique_id		= sd_get_unique_id,
+	.free_disk		= scsi_disk_free_disk,
 	.pr_ops			= &sd_pr_ops,
 };
 
@@ -3623,9 +3596,8 @@ static int sd_probe(struct device *dev)
  **/
 static int sd_remove(struct device *dev)
 {
-	struct scsi_disk *sdkp;
+	struct scsi_disk *sdkp = dev_get_drvdata(dev);
 
-	sdkp = dev_get_drvdata(dev);
 	scsi_autopm_get_device(sdkp->device);
 
 	device_del(&sdkp->dev);
@@ -3634,48 +3606,17 @@ static int sd_remove(struct device *dev)
 
 	free_opal_dev(sdkp->opal_dev);
 
-	mutex_lock(&sd_ref_mutex);
-	dev_set_drvdata(dev, NULL);
 	put_device(&sdkp->dev);
-	mutex_unlock(&sd_ref_mutex);
-
 	return 0;
 }
 
-/**
- *	scsi_disk_release - Called to free the scsi_disk structure
- *	@dev: pointer to embedded class device
- *
- *	sd_ref_mutex must be held entering this routine.  Because it is
- *	called on last put, you should always use the scsi_disk_get()
- *	scsi_disk_put() helpers which manipulate the semaphore directly
- *	and never do a direct put_device.
- **/
 static void scsi_disk_release(struct device *dev)
 {
 	struct scsi_disk *sdkp = to_scsi_disk(dev);
-	struct gendisk *disk = sdkp->disk;
-	struct request_queue *q = disk->queue;
 
 	ida_free(&sd_index_ida, sdkp->index);
-
-	/*
-	 * Wait until all requests that are in progress have completed.
-	 * This is necessary to avoid that e.g. scsi_end_request() crashes
-	 * due to clearing the disk->private_data pointer. Wait from inside
-	 * scsi_disk_release() instead of from sd_release() to avoid that
-	 * freezing and unfreezing the request queue affects user space I/O
-	 * in case multiple processes open a /dev/sd... node concurrently.
-	 */
-	blk_mq_freeze_queue(q);
-	blk_mq_unfreeze_queue(q);
-
-	disk->private_data = NULL;
-	put_disk(disk);
 	put_device(&sdkp->device->sdev_gendev);
-
 	sd_zbc_release_disk(sdkp);
-
 	kfree(sdkp);
 }
 
-- 
2.30.2



* [PATCH 05/12] sd: remove the extra sdev_gendev reference
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 06/12] sr: implement ->free_disk Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

device_add already takes a reference on the parent, so there is no
need to take an extra one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4eaa5deafc3dc..041c21c9483f6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3503,7 +3503,7 @@ static int sd_probe(struct device *dev)
 	}
 
 	device_initialize(&sdkp->dev);
-	sdkp->dev.parent = get_device(dev);
+	sdkp->dev.parent = dev;
 	sdkp->dev.class = &sd_disk_class;
 	dev_set_name(&sdkp->dev, "%s", dev_name(dev));
 
@@ -3615,7 +3615,6 @@ static void scsi_disk_release(struct device *dev)
 	struct scsi_disk *sdkp = to_scsi_disk(dev);
 
 	ida_free(&sd_index_ida, sdkp->index);
-	put_device(&sdkp->device->sdev_gendev);
 	sd_zbc_release_disk(sdkp);
 	kfree(sdkp);
 }
-- 
2.30.2



* [PATCH 06/12] sr: implement ->free_disk
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 05/12] sd: remove the extra sdev_gendev reference Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

Simplify the refcounting and remove the need to clear disk->private_data
by implementing the ->free_disk method.
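
A userspace sketch of the simplified open/release refcounting, assuming
hypothetical sketch_* types.  Open pins only the SCSI device, fails
with -ENXIO if that is no longer possible, and drops the pin again on
its own error path; the scsi_cd itself stays valid through the gendisk
reference, so no separate kref is needed.  The "media_ok" flag and -EIO
are illustrative stand-ins for cdrom_open() failing.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical SCSI device refcount; not the real scsi_device_get(). */
struct sketch_sdev {
	int refcount;
	bool being_removed;
};

static int sketch_scsi_device_get(struct sketch_sdev *sdev)
{
	if (sdev->being_removed)
		return -ENXIO;
	sdev->refcount++;
	return 0;
}

static void sketch_scsi_device_put(struct sketch_sdev *sdev)
{
	sdev->refcount--;
}

struct sketch_cd { struct sketch_sdev *device; };

/* Open only pins the SCSI device; cd stays valid via the gendisk reference. */
static int sketch_sr_block_open(struct sketch_cd *cd, bool media_ok)
{
	int ret;

	if (sketch_scsi_device_get(cd->device))
		return -ENXIO;
	ret = media_ok ? 0 : -EIO;
	if (ret)
		sketch_scsi_device_put(cd->device);	/* undo on failure */
	return ret;
}

static void sketch_sr_block_release(struct sketch_cd *cd)
{
	sketch_scsi_device_put(cd->device);
}
```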

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sr.c | 124 ++++++++++------------------------------------
 drivers/scsi/sr.h |   4 --
 2 files changed, 26 insertions(+), 102 deletions(-)

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 569bda76a5175..11fbdc75bb711 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -109,11 +109,6 @@ static DEFINE_SPINLOCK(sr_index_lock);
 
 static struct lock_class_key sr_bio_compl_lkclass;
 
-/* This semaphore is used to mediate the 0->1 reference get in the
- * face of object destruction (i.e. we can't allow a get on an
- * object after last put) */
-static DEFINE_MUTEX(sr_ref_mutex);
-
 static int sr_open(struct cdrom_device_info *, int);
 static void sr_release(struct cdrom_device_info *);
 
@@ -143,8 +138,6 @@ static const struct cdrom_device_ops sr_dops = {
 	.capability		= SR_CAPABILITIES,
 };
 
-static void sr_kref_release(struct kref *kref);
-
 static inline struct scsi_cd *scsi_cd(struct gendisk *disk)
 {
 	return disk->private_data;
@@ -163,38 +156,6 @@ static int sr_runtime_suspend(struct device *dev)
 		return 0;
 }
 
-/*
- * The get and put routines for the struct scsi_cd.  Note this entity
- * has a scsi_device pointer and owns a reference to this.
- */
-static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk)
-{
-	struct scsi_cd *cd = NULL;
-
-	mutex_lock(&sr_ref_mutex);
-	if (disk->private_data == NULL)
-		goto out;
-	cd = scsi_cd(disk);
-	kref_get(&cd->kref);
-	if (scsi_device_get(cd->device)) {
-		kref_put(&cd->kref, sr_kref_release);
-		cd = NULL;
-	}
- out:
-	mutex_unlock(&sr_ref_mutex);
-	return cd;
-}
-
-static void scsi_cd_put(struct scsi_cd *cd)
-{
-	struct scsi_device *sdev = cd->device;
-
-	mutex_lock(&sr_ref_mutex);
-	kref_put(&cd->kref, sr_kref_release);
-	scsi_device_put(sdev);
-	mutex_unlock(&sr_ref_mutex);
-}
-
 static unsigned int sr_get_events(struct scsi_device *sdev)
 {
 	u8 buf[8];
@@ -522,15 +483,13 @@ static void sr_revalidate_disk(struct scsi_cd *cd)
 
 static int sr_block_open(struct block_device *bdev, fmode_t mode)
 {
-	struct scsi_cd *cd;
-	struct scsi_device *sdev;
+	struct scsi_cd *cd = scsi_cd(bdev->bd_disk);
+	struct scsi_device *sdev = cd->device;
 	int ret = -ENXIO;
 
-	cd = scsi_cd_get(bdev->bd_disk);
-	if (!cd)
-		goto out;
+	if (scsi_device_get(cd->device))
+		return -ENXIO;
 
-	sdev = cd->device;
 	scsi_autopm_get_device(sdev);
 	if (bdev_check_media_change(bdev))
 		sr_revalidate_disk(cd);
@@ -541,9 +500,7 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
 
 	scsi_autopm_put_device(sdev);
 	if (ret)
-		scsi_cd_put(cd);
-
-out:
+		scsi_device_put(cd->device);
 	return ret;
 }
 
@@ -555,7 +512,7 @@ static void sr_block_release(struct gendisk *disk, fmode_t mode)
 	cdrom_release(&cd->cdi, mode);
 	mutex_unlock(&cd->lock);
 
-	scsi_cd_put(cd);
+	scsi_device_put(cd->device);
 }
 
 static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
@@ -595,18 +552,24 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 static unsigned int sr_block_check_events(struct gendisk *disk,
 					  unsigned int clearing)
 {
-	unsigned int ret = 0;
-	struct scsi_cd *cd;
+	struct scsi_cd *cd = disk->private_data;
 
-	cd = scsi_cd_get(disk);
-	if (!cd)
+	if (atomic_read(&cd->device->disk_events_disable_depth))
 		return 0;
+	return cdrom_check_events(&cd->cdi, clearing);
+}
 
-	if (!atomic_read(&cd->device->disk_events_disable_depth))
-		ret = cdrom_check_events(&cd->cdi, clearing);
+static void sr_free_disk(struct gendisk *disk)
+{
+	struct scsi_cd *cd = disk->private_data;
 
-	scsi_cd_put(cd);
-	return ret;
+	spin_lock(&sr_index_lock);
+	clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
+	spin_unlock(&sr_index_lock);
+
+	unregister_cdrom(&cd->cdi);
+	mutex_destroy(&cd->lock);
+	kfree(cd);
 }
 
 static const struct block_device_operations sr_bdops =
@@ -617,6 +580,7 @@ static const struct block_device_operations sr_bdops =
 	.ioctl		= sr_block_ioctl,
 	.compat_ioctl	= blkdev_compat_ptr_ioctl,
 	.check_events	= sr_block_check_events,
+	.free_disk	= sr_free_disk,
 };
 
 static int sr_open(struct cdrom_device_info *cdi, int purpose)
@@ -660,8 +624,6 @@ static int sr_probe(struct device *dev)
 	if (!cd)
 		goto fail;
 
-	kref_init(&cd->kref);
-
 	disk = __alloc_disk_node(sdev->request_queue, NUMA_NO_NODE,
 				 &sr_bio_compl_lkclass);
 	if (!disk)
@@ -727,10 +689,8 @@ static int sr_probe(struct device *dev)
 	sr_revalidate_disk(cd);
 
 	error = device_add_disk(&sdev->sdev_gendev, disk, NULL);
-	if (error) {
-		kref_put(&cd->kref, sr_kref_release);
-		goto fail;
-	}
+	if (error)
+		goto unregister_cdrom;
 
 	sdev_printk(KERN_DEBUG, sdev,
 		    "Attached scsi CD-ROM %s\n", cd->cdi.name);
@@ -738,6 +698,8 @@ static int sr_probe(struct device *dev)
 
 	return 0;
 
+unregister_cdrom:
+	unregister_cdrom(&cd->cdi);
 fail_minor:
 	spin_lock(&sr_index_lock);
 	clear_bit(minor, sr_index_bits);
@@ -1009,36 +971,6 @@ static int sr_read_cdda_bpc(struct cdrom_device_info *cdi, void __user *ubuf,
 	return ret;
 }
 
-
-/**
- *	sr_kref_release - Called to free the scsi_cd structure
- *	@kref: pointer to embedded kref
- *
- *	sr_ref_mutex must be held entering this routine.  Because it is
- *	called on last put, you should always use the scsi_cd_get()
- *	scsi_cd_put() helpers which manipulate the semaphore directly
- *	and never do a direct kref_put().
- **/
-static void sr_kref_release(struct kref *kref)
-{
-	struct scsi_cd *cd = container_of(kref, struct scsi_cd, kref);
-	struct gendisk *disk = cd->disk;
-
-	spin_lock(&sr_index_lock);
-	clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
-	spin_unlock(&sr_index_lock);
-
-	unregister_cdrom(&cd->cdi);
-
-	disk->private_data = NULL;
-
-	put_disk(disk);
-
-	mutex_destroy(&cd->lock);
-
-	kfree(cd);
-}
-
 static int sr_remove(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
@@ -1046,11 +978,7 @@ static int sr_remove(struct device *dev)
 	scsi_autopm_get_device(cd->device);
 
 	del_gendisk(cd->disk);
-	dev_set_drvdata(dev, NULL);
-
-	mutex_lock(&sr_ref_mutex);
-	kref_put(&cd->kref, sr_kref_release);
-	mutex_unlock(&sr_ref_mutex);
+	put_disk(cd->disk);
 
 	return 0;
 }
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index d80af3fcb6f97..1175f2e213b56 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -18,7 +18,6 @@
 #ifndef _SR_H
 #define _SR_H
 
-#include <linux/kref.h>
 #include <linux/mutex.h>
 
 #define MAX_RETRIES	3
@@ -51,9 +50,6 @@ typedef struct scsi_cd {
 
 	struct cdrom_device_info cdi;
 	struct mutex lock;
-	/* We hold gendisk and scsi_device references on probe and use
-	 * the refs on this kref to decide when to release them */
-	struct kref kref;
 	struct gendisk *disk;
 } Scsi_CD;
 
-- 
2.30.2



* [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 06/12] sr: implement ->free_disk Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi, Bart Van Assche

From: Ming Lei <ming.lei@redhat.com>

blkcg works at the FS bio level, so it is reasonable to make blkcg and
the gendisk share the same lifetime.  Meanwhile, there won't be any FS
I/O when the disk is released, so it is safe to move blkcg
initialization/destruction into the disk allocation/release handlers.

Long term, we can move blkcg into gendisk completely.
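
A userspace sketch of the resulting lifetime pairing, assuming a
hypothetical step log instead of real allocations.  blkcg setup now
sits inside the disk allocation path (with its own unwind label, as in
the out_erase_part0 hunk below) and teardown sits in the disk release
path, so the two always appear as a matched pair.

```c
#include <stdbool.h>
#include <string.h>

/* Records setup/teardown steps so the ordering can be checked. */
static char sketch_log[16];

static void sketch_step(const char *s)
{
	strncat(sketch_log, s, sizeof(sketch_log) - strlen(sketch_log) - 1);
}

/*
 * Hypothetical model of the disk allocation path: "P" = part0 inserted,
 * "B" = blkcg initialized; lowercase letters mark the matching undo.
 */
static bool sketch_alloc_disk(bool blkcg_init_fails)
{
	sketch_step("P");		/* xa_insert(&disk->part_tbl, 0, ...) */
	if (blkcg_init_fails)
		goto out_erase_part0;	/* blkcg_init_queue(q) failed */
	sketch_step("B");
	return true;

out_erase_part0:
	sketch_step("p");		/* xa_erase(&disk->part_tbl, 0) */
	return false;
}

/* Hypothetical model of disk_release(): blkcg now goes away with the disk. */
static void sketch_disk_release(void)
{
	sketch_step("b");		/* blkcg_exit_queue(disk->queue) */
	sketch_step("p");		/* partition table torn down */
}
```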

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c  |  5 -----
 block/blk-sysfs.c |  7 -------
 block/genhd.c     | 13 +++++++++++++
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d2..b2f2c65774812 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -496,17 +496,12 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
 				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
 		goto fail_stats;
 
-	if (blkcg_init_queue(q))
-		goto fail_ref;
-
 	blk_queue_dma_alignment(q, 511);
 	blk_set_default_limits(&q->limits);
 	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
 	return q;
 
-fail_ref:
-	percpu_ref_exit(&q->q_usage_counter);
 fail_stats:
 	blk_free_queue_stats(q->stats);
 fail_split:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4c6b7dff71e5b..5f723d2ff8948 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -751,13 +751,6 @@ static void blk_exit_queue(struct request_queue *q)
 		ioc_clear_queue(q);
 		elevator_exit(q);
 	}
-
-	/*
-	 * Remove all references to @q from the block cgroup controller before
-	 * restoring @q->queue_lock to avoid that restoring this pointer causes
-	 * e.g. blkcg_print_blkgs() to crash.
-	 */
-	blkcg_exit_queue(q);
 }
 
 /**
diff --git a/block/genhd.c b/block/genhd.c
index e351fac41bf25..ebf0e0be1c545 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1115,9 +1115,17 @@ static void disk_release(struct device *dev)
 
 	blk_mq_cancel_work_sync(disk->queue);
 
+	/*
+	 * Remove all references to @q from the block cgroup controller before
+	 * restoring @q->queue_lock to avoid that restoring this pointer causes
+	 * e.g. blkcg_print_blkgs() to crash.
+	 */
+	blkcg_exit_queue(disk->queue);
+
 	disk_release_events(disk);
 	kfree(disk->random);
 	xa_destroy(&disk->part_tbl);
+
 	disk->queue->disk = NULL;
 	blk_put_queue(disk->queue);
 
@@ -1318,6 +1326,9 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 	if (xa_insert(&disk->part_tbl, 0, disk->part0, GFP_KERNEL))
 		goto out_destroy_part_tbl;
 
+	if (blkcg_init_queue(q))
+		goto out_erase_part0;
+
 	rand_initialize_disk(disk);
 	disk_to_dev(disk)->class = &block_class;
 	disk_to_dev(disk)->type = &disk_type;
@@ -1330,6 +1341,8 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 #endif
 	return disk;
 
+out_erase_part0:
+	xa_erase(&disk->part_tbl, 0);
 out_destroy_part_tbl:
 	xa_destroy(&disk->part_tbl);
 	disk->part0->bd_disk = NULL;
-- 
2.30.2



* [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-23  4:06   ` Bart Van Assche
  2022-02-22 14:14 ` [PATCH 09/12] block: move q_usage_counter release into blk_queue_release Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

From: Ming Lei <ming.lei@redhat.com>

The queue's top-level debugfs dir is removed from blk_release_queue(),
which also removes all hctx debugfs dirs beneath it.  Given that
blk_mq_exit_queue() is only called from blk_cleanup_queue(), it isn't
necessary to remove the hctx debugfs dirs from blk_mq_exit_queue().

So remove it from blk_mq_exit_queue().
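
A userspace sketch of why per-hctx removal is redundant, assuming a
hypothetical in-memory directory tree rather than debugfs itself.
Removing the top-level directory recursively removes every directory
below it, so the hctx directories disappear together with the queue
directory.

```c
#include <stdlib.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical directory node; children are owned by their parent. */
struct sketch_dir {
	struct sketch_dir *child[SKETCH_MAX_CHILDREN];
	int nr_children;
};

static struct sketch_dir *sketch_mkdir(struct sketch_dir *parent)
{
	struct sketch_dir *d;

	if (parent && parent->nr_children >= SKETCH_MAX_CHILDREN)
		return NULL;
	d = calloc(1, sizeof(*d));
	if (d && parent)
		parent->child[parent->nr_children++] = d;
	return d;
}

/* Recursive remove: deleting a dir deletes everything below it. */
static int sketch_remove_recursive(struct sketch_dir *d)
{
	int removed = 1;

	if (!d)
		return 0;
	for (int i = 0; i < d->nr_children; i++)
		removed += sketch_remove_recursive(d->child[i]);
	free(d);
	return removed;
}
```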

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 63e2d3fd60946..540c8da30da72 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (i == nr_queue)
 			break;
-		blk_mq_debugfs_unregister_hctx(hctx);
 		blk_mq_exit_hctx(q, set, hctx, i);
 	}
 }
-- 
2.30.2
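
A minimal userspace sketch of the reasoning in this patch, assuming
debugfs-like semantics where removing a directory recursively removes
everything below it. The struct dnode type and its helpers are
hypothetical stand-ins, not the debugfs API: once the queue's top-level
directory is removed recursively, no per-hctx entry is left behind, so a
separate per-hctx removal is redundant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a debugfs directory entry (not the real API). */
struct dnode {
	char name[32];
	struct dnode *child;   /* first child */
	struct dnode *sibling; /* next sibling under the same parent */
};

static int live_nodes; /* number of currently allocated nodes */

static struct dnode *dnode_create(const char *name, struct dnode *parent)
{
	struct dnode *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	snprintf(d->name, sizeof(d->name), "%s", name);
	if (parent) {
		d->sibling = parent->child;
		parent->child = d;
	}
	live_nodes++;
	return d;
}

/* Remove a node and, recursively, everything below it. */
static void dnode_remove_recursive(struct dnode *d)
{
	struct dnode *c, *next;

	if (!d)
		return;
	for (c = d->child; c; c = next) {
		next = c->sibling;
		dnode_remove_recursive(c);
	}
	free(d);
	live_nodes--;
}

/* Build a queue dir with nr_hctx children, then remove only the top dir. */
static int remove_queue_dir_only(int nr_hctx)
{
	struct dnode *qdir = dnode_create("sda", NULL);
	char name[32];
	int i;

	if (!qdir)
		return -1;
	for (i = 0; i < nr_hctx; i++) {
		snprintf(name, sizeof(name), "hctx%d", i);
		dnode_create(name, qdir);
	}
	dnode_remove_recursive(qdir);
	return live_nodes; /* 0 means no hctx entry was left behind */
}
```

The sketch says nothing about the name-reuse question raised later in
the thread; it only models the parent/child removal relationship.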



* [PATCH 09/12] block: move q_usage_counter release into blk_queue_release
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC
  To: Jens Axboe
  Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi, Bart Van Assche

From: Ming Lei <ming.lei@redhat.com>

After blk_cleanup_queue() returns, the disk may not have been released
yet, so bios may still be submitted and ->q_usage_counter may still be
touched. This currently seems safe, but it is not good from an API point
of view.

Move the release of q_usage_counter into blk_release_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c  | 2 --
 block/blk-sysfs.c | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b2f2c65774812..a8c59913dd78d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -342,8 +342,6 @@ void blk_cleanup_queue(struct request_queue *q)
 		blk_mq_sched_free_rqs(q);
 	mutex_unlock(&q->sysfs_lock);
 
-	percpu_ref_exit(&q->q_usage_counter);
-
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5f723d2ff8948..4ea22169b5186 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -780,6 +780,8 @@ static void blk_release_queue(struct kobject *kobj)
 
 	might_sleep();
 
+	percpu_ref_exit(&q->q_usage_counter);
+
 	if (q->poll_stat)
 		blk_stat_remove_callback(q, q->poll_cb);
 	blk_stat_free_callback(q->poll_cb);
-- 
2.30.2
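
A simplified userspace model of the lifetime rule this patch adopts,
assuming plain integers in place of percpu_ref and the kobject refcount
(all names here are hypothetical): cleanup only marks the queue dying
and drops the owner's reference, while the usage counter is freed in
the final put. A late caller that still holds a reference can therefore
touch the counter safely and simply fails to enter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical model: 'usage' stands in for q->q_usage_counter and
 * 'refs' for the reference count of the request_queue itself.
 */
struct queue {
	int refs;    /* object lifetime references */
	int *usage;  /* separately allocated usage counter */
	bool dying;
};

static struct queue *queue_alloc(void)
{
	struct queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->usage = calloc(1, sizeof(*q->usage));
	if (!q->usage) {
		free(q);
		return NULL;
	}
	q->refs = 1;
	return q;
}

static void queue_get(struct queue *q)
{
	q->refs++;
}

/* Final put: tear down the usage counter together with the object. */
static void queue_put(struct queue *q)
{
	if (--q->refs == 0) {
		free(q->usage); /* plays the role of percpu_ref_exit() in release */
		free(q);
	}
}

/* Cleanup marks the queue dying but no longer frees the counter. */
static void queue_cleanup(struct queue *q)
{
	q->dying = true;
	queue_put(q); /* drop the owner's reference */
}

/* A late submitter holding a reference may still touch the counter. */
static bool queue_enter(struct queue *q)
{
	(*q->usage)++;
	if (q->dying) {
		(*q->usage)--;
		return false;
	}
	return true;
}
```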



* [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 09/12] block: move q_usage_counter release into blk_queue_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 18:29   ` Bart Van Assche
  2022-02-22 14:14 ` [PATCH 11/12] block: do more work in elevator_exit Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

From: Ming Lei <ming.lei@redhat.com>

There can't be any FS I/O in disk_release(), so move blk_exit_queue()
there.

We still need to freeze the queue here, since a request is freed only
after its bio has completed, and passthrough requests rely on scheduler
tags as well.

The disk can be released before or after the queue is cleaned up, and we
have to free the scheduler request pool before blk_cleanup_queue()
returns, while the static request pool has to be freed before the I/O
scheduler exits.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 16 ----------------
 block/genhd.c     | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4ea22169b5186..faf8577578929 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -739,20 +739,6 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
 	kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
 }
 
-/* Unconfigure the I/O scheduler and dissociate from the cgroup controller. */
-static void blk_exit_queue(struct request_queue *q)
-{
-	/*
-	 * Since the I/O scheduler exit code may access cgroup information,
-	 * perform I/O scheduler exit before disassociating from the block
-	 * cgroup controller.
-	 */
-	if (q->elevator) {
-		ioc_clear_queue(q);
-		elevator_exit(q);
-	}
-}
-
 /**
  * blk_release_queue - releases all allocated resources of the request_queue
  * @kobj: pointer to a kobject, whose container is a request_queue
@@ -786,8 +772,6 @@ static void blk_release_queue(struct kobject *kobj)
 		blk_stat_remove_callback(q, q->poll_cb);
 	blk_stat_free_callback(q->poll_cb);
 
-	blk_exit_queue(q);
-
 	blk_free_queue_stats(q->stats);
 	kfree(q->poll_stat);
 
diff --git a/block/genhd.c b/block/genhd.c
index ebf0e0be1c545..40ef013382872 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,6 +29,7 @@
 #include "blk.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
+#include "blk-cgroup.h"
 
 static struct kobject *block_depr;
 
@@ -1092,6 +1093,34 @@ static const struct attribute_group *disk_attr_groups[] = {
 	NULL
 };
 
+static void blk_mq_release_queue(struct request_queue *q)
+{
+	blk_mq_cancel_work_sync(q);
+
+	/*
+	 * There can't be any non-passthrough bios in flight here, but
+	 * requests stay around longer, including passthrough ones, so we
+	 * still need to freeze the queue here.
+	 */
+	blk_mq_freeze_queue(q);
+
+	/*
+	 * Since the I/O scheduler exit code may access cgroup information,
+	 * perform I/O scheduler exit before disassociating from the block
+	 * cgroup controller.
+	 */
+	if (q->elevator) {
+		ioc_clear_queue(q);
+
+		mutex_lock(&q->sysfs_lock);
+		blk_mq_sched_free_rqs(q);
+		elevator_exit(q);
+		mutex_unlock(&q->sysfs_lock);
+	}
+
+	__blk_mq_unfreeze_queue(q, true);
+}
+
 /**
  * disk_release - releases all allocated resources of the gendisk
  * @dev: the device representing this disk
@@ -1113,7 +1142,8 @@ static void disk_release(struct device *dev)
 	might_sleep();
 	WARN_ON_ONCE(disk_live(disk));
 
-	blk_mq_cancel_work_sync(disk->queue);
+	if (queue_is_mq(disk->queue))
+		blk_mq_release_queue(disk->queue);
 
 	/*
 	 * Remove all references to @q from the block cgroup controller before
-- 
2.30.2



* [PATCH 11/12] block: do more work in elevator_exit
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-22 14:14 ` [PATCH 12/12] block: move rq_qos_exit() into disk_release() Christoph Hellwig
  2022-02-26  4:46 ` move more work to disk_release Bart Van Assche
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

Move the calls to ioc_clear_queue() and blk_mq_sched_free_rqs() into
elevator_exit().  These calls always go together, except for one error
path where we know there can't be any io_cq structures yet, and the
extra call there is harmless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/elevator.c | 7 +++----
 block/genhd.c    | 3 ---
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 6847ab6e7aa50..4664cae50da86 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -192,6 +192,9 @@ void elevator_exit(struct request_queue *q)
 {
 	struct elevator_queue *e = q->elevator;
 
+	ioc_clear_queue(q);
+	blk_mq_sched_free_rqs(q);
+
 	mutex_lock(&e->sysfs_lock);
 	blk_mq_exit_sched(q, e);
 	mutex_unlock(&e->sysfs_lock);
@@ -595,9 +598,6 @@ int elevator_switch_mq(struct request_queue *q,
 	if (q->elevator) {
 		if (q->elevator->registered)
 			elv_unregister_queue(q);
-
-		ioc_clear_queue(q);
-		blk_mq_sched_free_rqs(q);
 		elevator_exit(q);
 	}
 
@@ -608,7 +608,6 @@ int elevator_switch_mq(struct request_queue *q,
 	if (new_e) {
 		ret = elv_register_queue(q, true);
 		if (ret) {
-			blk_mq_sched_free_rqs(q);
 			elevator_exit(q);
 			goto out;
 		}
diff --git a/block/genhd.c b/block/genhd.c
index 40ef013382872..40edff4331758 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1110,10 +1110,7 @@ static void blk_mq_release_queue(struct request_queue *q)
 	 * cgroup controller.
 	 */
 	if (q->elevator) {
-		ioc_clear_queue(q);
-
 		mutex_lock(&q->sysfs_lock);
-		blk_mq_sched_free_rqs(q);
 		elevator_exit(q);
 		mutex_unlock(&q->sysfs_lock);
 	}
-- 
2.30.2
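
A minimal sketch of the refactoring pattern, assuming hypothetical types
in which two heap pointers stand in for the io_cq structures and the
scheduler request pool: the exit function now performs both cleanups
itself, and because both helpers tolerate having nothing to free,
running them on an error path where no io_cq exists yet is a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model; the pointers stand in for the real resources. */
struct elv_q {
	int *io_cq;     /* per-process I/O context data, may be NULL */
	int *sched_rqs; /* scheduler request pool, may be NULL */
	int exit_calls; /* counts scheduler exits (blk_mq_exit_sched stand-in) */
};

/* Safe to call when there is nothing to clear. */
static void clear_ioc(struct elv_q *q)
{
	free(q->io_cq);
	q->io_cq = NULL;
}

/* Safe to call when the pool was never allocated or is already freed. */
static void free_sched_rqs(struct elv_q *q)
{
	free(q->sched_rqs);
	q->sched_rqs = NULL;
}

/* After the change: the exit path does all of the teardown itself. */
static void elv_exit(struct elv_q *q)
{
	clear_ioc(q);
	free_sched_rqs(q);
	q->exit_calls++;
}
```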



* [PATCH 12/12] block: move rq_qos_exit() into disk_release()
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 11/12] block: do more work in elevator_exit Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
  2022-02-26  4:46 ` move more work to disk_release Bart Van Assche
  12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC
  To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

From: Ming Lei <ming.lei@redhat.com>

There can't be any FS I/O in disk_release(), so it is safe to move
rq_qos_exit() there.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/genhd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 40edff4331758..33d61bc10addc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -627,7 +627,6 @@ void del_gendisk(struct gendisk *disk)
 
 	blk_mq_freeze_queue_wait(q);
 
-	rq_qos_exit(q);
 	blk_sync_queue(q);
 	blk_flush_integrity();
 	/*
@@ -1114,7 +1113,7 @@ static void blk_mq_release_queue(struct request_queue *q)
 		elevator_exit(q);
 		mutex_unlock(&q->sysfs_lock);
 	}
-
+	rq_qos_exit(q);
 	__blk_mq_unfreeze_queue(q, true);
 }
 
-- 
2.30.2



* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
@ 2022-02-22 18:29   ` Bart Van Assche
  2022-02-23  6:56     ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-22 18:29 UTC
  To: Christoph Hellwig, Jens Axboe
  Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

On 2/22/22 06:14, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> There can't be FS IO in disk_release(), so move blk_exit_queue() there.
> 
> We still need to freeze queue here since the request is freed after the
> bio is completed and passthrough request rely on scheduler tags as well.
> 
> The disk can be released before or after queue is cleaned up, and we have
> to free the scheduler request pool before blk_cleanup_queue returns,
> while the static request pool has to be freed before exiting the
> I/O scheduler.

This patch looks dubious to me because:
- The blk_freeze_queue() call in blk_cleanup_queue() waits for pending
   requests to finish, so why move blk_exit_queue() from
   blk_cleanup_queue() into disk_release()?
- I'm concerned that this patch will break user space, e.g. scripts that
   try to unload an I/O scheduler kernel module immediately after having
   removed a request queue.

> +static void blk_mq_release_queue(struct request_queue *q)
> +{
> +	blk_mq_cancel_work_sync(q);
> +
> +	/*
> +	 * There can't be any non non-passthrough bios in flight here, but
> +	 * requests stay around longer, including passthrough ones so we
> +	 * still need to freeze the queue here.
> +	 */
> +	blk_mq_freeze_queue(q);

The above comment should be elaborated, since what matters in this
context is not whether any bios are still in flight but what happens
to the request structures. As you know, blk_queue_enter() fails after
the DYING flag has been set, a flag that is set by blk_cleanup_queue(),
and blk_cleanup_queue() already freezes the queue. So why is it
necessary to call blk_mq_freeze_queue() from blk_mq_release_queue()?

Bart.


* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
  2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
@ 2022-02-23  2:08   ` Ming Lei
  2022-02-23  6:42     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23  2:08 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, Martin K. Petersen, linux-block, linux-scsi

On Tue, Feb 22, 2022 at 03:14:39PM +0100, Christoph Hellwig wrote:
> I/O accounting buckets I/O into the read/write/discard categories into
> which passthrough I/O does not fit at all.  It also accounts to the
> block_device, which may not even exist for passthrough I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-mq.c | 6 +-----
>  block/blk.h    | 2 +-
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a05ce77250316..ee80853473d1e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -883,11 +883,7 @@ static inline void blk_account_io_done(struct request *req, u64 now)
>  
>  static void __blk_account_io_start(struct request *rq)
>  {
> -	/* passthrough requests can hold bios that do not have ->bi_bdev set */
> -	if (rq->bio && rq->bio->bi_bdev)
> -		rq->part = rq->bio->bi_bdev;
> -	else if (rq->q->disk)
> -		rq->part = rq->q->disk->part0;
> +	rq->part = rq->bio->bi_bdev;
>  
>  	part_stat_lock();
>  	update_io_ticks(rq->part, jiffies, false);
> diff --git a/block/blk.h b/block/blk.h
> index ebaa59ca46ca6..6f21859c7f0ff 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -325,7 +325,7 @@ int blk_dev_init(void);
>   */
>  static inline bool blk_do_io_stat(struct request *rq)
>  {
> -	return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> +	return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);

I guess this may look like a regression, from the user's point of view,
for workloads with lots of userspace passthrough I/O?


Thanks,
Ming



* Re: [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
  2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
@ 2022-02-23  4:06   ` Bart Van Assche
  2022-02-23  6:41     ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-23  4:06 UTC
  To: Christoph Hellwig, Jens Axboe
  Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

On 2/22/22 06:14, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> The queue's top debugfs dir is removed from blk_release_queue(), so all
> hctx's debugfs dirs are removed from there. Given blk_mq_exit_queue()
> is only called from blk_cleanup_queue(), it isn't necessary to remove
> hctx debugfs from blk_mq_exit_queue().
> 
> So remove it from blk_mq_exit_queue().
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 63e2d3fd60946..540c8da30da72 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
>   	queue_for_each_hw_ctx(q, hctx, i) {
>   		if (i == nr_queue)
>   			break;
> -		blk_mq_debugfs_unregister_hctx(hctx);
>   		blk_mq_exit_hctx(q, set, hctx, i);
>   	}
>   }

What will happen if a new queue with the same name as a removed queue is 
created before blk_release_queue() for the removed queue has finished? 
Will that cause registration of debugfs attributes for the newly created 
queue to fail?

Thanks,

Bart.


* Re: [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
  2022-02-23  4:06   ` Bart Van Assche
@ 2022-02-23  6:41     ` Ming Lei
  0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-23  6:41 UTC
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
	linux-scsi

On Tue, Feb 22, 2022 at 08:06:31PM -0800, Bart Van Assche wrote:
> On 2/22/22 06:14, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> > 
> > The queue's top debugfs dir is removed from blk_release_queue(), so all
> > hctx's debugfs dirs are removed from there. Given blk_mq_exit_queue()
> > is only called from blk_cleanup_queue(), it isn't necessary to remove
> > hctx debugfs from blk_mq_exit_queue().
> > 
> > So remove it from blk_mq_exit_queue().
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >   block/blk-mq.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 63e2d3fd60946..540c8da30da72 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
> >   	queue_for_each_hw_ctx(q, hctx, i) {
> >   		if (i == nr_queue)
> >   			break;
> > -		blk_mq_debugfs_unregister_hctx(hctx);
> >   		blk_mq_exit_hctx(q, set, hctx, i);
> >   	}
> >   }
> 
> What will happen if a new queue with the same name as a removed queue is
> created before blk_release_queue() for the removed queue has finished? Will
> that cause registration of debugfs attributes for the newly created queue to
> fail?

That may happen, but it isn't related to this patch, since this patch
just delays the removal of the hctx debugfs entries, and q->debugfs_dir
is already removed in blk_release_queue().

So far, a request queue doesn't have a name of its own and just uses the
disk's name when creating its debugfs entries, so this problem should
have been there for a long time.


Thanks,
Ming



* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
  2022-02-23  2:08   ` Ming Lei
@ 2022-02-23  6:42     ` Christoph Hellwig
  2022-02-23  7:02       ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-23  6:42 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
	linux-scsi

On Wed, Feb 23, 2022 at 10:08:20AM +0800, Ming Lei wrote:
> > -	return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> > +	return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> 
> I guess this way may cause regression for workloads with lots of userspace IO
> from user viewpoint?

I'd say it fixes it, as the accounting right now is completely bogus.


* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-22 18:29   ` Bart Van Assche
@ 2022-02-23  6:56     ` Ming Lei
  2022-02-23 20:04       ` Bart Van Assche
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23  6:56 UTC
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
	linux-scsi

On Tue, Feb 22, 2022 at 10:29:47AM -0800, Bart Van Assche wrote:
> On 2/22/22 06:14, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> > 
> > There can't be FS IO in disk_release(), so move blk_exit_queue() there.
> > 
> > We still need to freeze queue here since the request is freed after the
> > bio is completed and passthrough request rely on scheduler tags as well.
> > 
> > The disk can be released before or after queue is cleaned up, and we have
> > to free the scheduler request pool before blk_cleanup_queue returns,
> > while the static request pool has to be freed before exiting the
> > I/O scheduler.
> 
> This patch looks dubious to me because:
> - The blk_freeze_queue() call in blk_cleanup_queue() waits for pending
>   requests to finish, so why to move blk_exit_queue() from
>   blk_cleanup_queue() into disk_release()?

A SCSI disk may be released before blk_cleanup_queue() is called, and we
want to tear down all FS-related state (cgroup, rq-qos, elevator) in
disk_release().

FS bios have already been drained by the time the disk is released.

> - I'm concerned that this patch will break user space, e.g. scripts that
>   try to unload an I/O scheduler kernel module immediately after having
>   removed a request queue.

By the time a request queue is removed, the associated disk has already
been removed and the queue's kobject has been deleted too, so how could
userspace unload the I/O scheduler at that point?

> 
> > +static void blk_mq_release_queue(struct request_queue *q)
> > +{
> > +	blk_mq_cancel_work_sync(q);
> > +
> > +	/*
> > +	 * There can't be any non non-passthrough bios in flight here, but
> > +	 * requests stay around longer, including passthrough ones so we
> > +	 * still need to freeze the queue here.
> > +	 */
> > +	blk_mq_freeze_queue(q);
> 
> The above comment should be elaborated since what matters in this context is
> not whether or not any bios are still in flight but what happens with the
> request structures.

Yes, the bios have completed, but a request is only completed after its
bio has ended (see blk_update_request()); that is why we added
blk_mq_freeze_queue() here.

> As you know blk_queue_enter() fails after the DYING flag
> has been set, a flag that is set by blk_cleanup_queue(). blk_cleanup_queue()
> already freezes the queue. So why is it necessary to call
> blk_mq_freeze_queue() from blk_mq_release_queue()?

The disk may be released before blk_cleanup_queue() is called.

But I admit the name blk_mq_release_queue() is very misleading here;
maybe blk_mq_release_io_queue() would be better?


Thanks,
Ming



* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
  2022-02-23  6:42     ` Christoph Hellwig
@ 2022-02-23  7:02       ` Ming Lei
  2022-02-23  7:36         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23  7:02 UTC
  To: Christoph Hellwig; +Cc: Jens Axboe, Martin K. Petersen, linux-block, linux-scsi

On Wed, Feb 23, 2022 at 07:42:26AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 23, 2022 at 10:08:20AM +0800, Ming Lei wrote:
> > > -	return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> > > +	return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> > 
> > I guess this way may cause regression for workloads with lots of userspace IO
> > from user viewpoint?
> 
> I'd say it fixes it as the accounting right now is completely bogus.

There is a small number of in-kernel passthrough requests (admin or
driver-private) which shouldn't be accounted, but there can be lots of
passthrough read/write requests from userspace, and users may rely on
diskstats to account for them.


Thanks,
Ming



* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
  2022-02-23  7:02       ` Ming Lei
@ 2022-02-23  7:36         ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-23  7:36 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
	linux-scsi

On Wed, Feb 23, 2022 at 03:02:08PM +0800, Ming Lei wrote:
> There are small amount of in-kernel passthrough requests(admin, or driver
> private) which shouldn't be accounted, but passthrough RW IO requests from
> userspace can be lots of, and user may rely on diskstat to account them.

/dev/sg isn't accounted either.  But most importantly, they are
accounted wrongly: the accounting buckets I/O into read/write/discard,
and most passthrough commands are anything but.

Also, the way this accounting works is completely broken.
Passthrough requests are sent through a request_queue, and it does
not make sense to account them to a block_device, which sits way above
that.
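
To make the behavioural difference discussed in this subthread concrete,
here is a simplified model of the blk_do_io_stat() predicate before and
after the patch. The types, the flag value and the operation names are
hypothetical; is_passthrough() only mirrors the idea that driver-private
operations (REQ_OP_DRV_IN/REQ_OP_DRV_OUT in the kernel) count as
passthrough.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request model; the flag value is made up. */
enum req_op { OP_READ, OP_WRITE, OP_DISCARD, OP_DRV_IN, OP_DRV_OUT };

#define RQF_IO_STAT_SKETCH (1u << 0)

struct rq_sketch {
	unsigned int rq_flags;
	enum req_op op;
	bool has_disk; /* stands in for rq->q->disk != NULL */
};

/* Driver-private operations are treated as passthrough. */
static bool is_passthrough(const struct rq_sketch *rq)
{
	return rq->op == OP_DRV_IN || rq->op == OP_DRV_OUT;
}

/* Old rule: any flagged request on a queue with a disk is accounted. */
static bool do_io_stat_old(const struct rq_sketch *rq)
{
	return (rq->rq_flags & RQF_IO_STAT_SKETCH) && rq->has_disk;
}

/* New rule: passthrough requests are never accounted. */
static bool do_io_stat_new(const struct rq_sketch *rq)
{
	return (rq->rq_flags & RQF_IO_STAT_SKETCH) && !is_passthrough(rq);
}
```

With the old rule a passthrough request on a queue that has a disk is
accounted; with the new rule it is not, which is the diskstats change
being debated above.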


* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-23  6:56     ` Ming Lei
@ 2022-02-23 20:04       ` Bart Van Assche
  2022-02-24  7:25         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-23 20:04 UTC
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
	linux-scsi

On 2/22/22 22:56, Ming Lei wrote:
> But I admit here the name of blk_mq_release_queue() is very misleading,
> maybe blk_mq_release_io_queue() is better?

I'm not sure what the best name for that function would be. Anyway, 
thanks for having clarified that disk structures are removed before the 
request queue is cleaned up. That's something I was missing.

Bart.


* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-23 20:04       ` Bart Van Assche
@ 2022-02-24  7:25         ` Christoph Hellwig
  2022-02-25  1:26           ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-24  7:25 UTC
  To: Bart Van Assche
  Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	linux-block, linux-scsi

On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> On 2/22/22 22:56, Ming Lei wrote:
>> But I admit here the name of blk_mq_release_queue() is very misleading,
>> maybe blk_mq_release_io_queue() is better?
>
> I'm not sure what the best name for that function would be. Anyway, thanks 
> for having clarified that disk structures are removed before the request 
> queue is cleaned up. That's something I was missing.

Maybe disk_release_mq?


* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
  2022-02-24  7:25         ` Christoph Hellwig
@ 2022-02-25  1:26           ` Ming Lei
  0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-25  1:26 UTC
  To: Christoph Hellwig
  Cc: Bart Van Assche, Jens Axboe, Martin K. Petersen, linux-block, linux-scsi

On Thu, Feb 24, 2022 at 08:25:24AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> > On 2/22/22 22:56, Ming Lei wrote:
> >> But I admit here the name of blk_mq_release_queue() is very misleading,
> >> maybe blk_mq_release_io_queue() is better?
> >
> > I'm not sure what the best name for that function would be. Anyway, thanks 
> > for having clarified that disk structures are removed before the request 
> > queue is cleaned up. That's something I was missing.
> 
> Maybe disk_release_mq?

disk_release_mq() looks much better.

Thanks,
Ming



* Re: move more work to disk_release
  2022-02-22 14:14 move more work to disk_release Christoph Hellwig
                   ` (11 preceding siblings ...)
  2022-02-22 14:14 ` [PATCH 12/12] block: move rq_qos_exit() into disk_release() Christoph Hellwig
@ 2022-02-26  4:46 ` Bart Van Assche
  12 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2022-02-26  4:46 UTC
  To: Christoph Hellwig, Jens Axboe
  Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi

On 2/22/22 06:14, Christoph Hellwig wrote:
> Git branch:
> 
>      git://git.infradead.org/users/hch/block.git freeze-5.18

A patch in or before this patch series may need some additional
work. This is what I see in the kernel log if I verify the above
kernel branch with blktests:

run blktests block/027 at 2022-02-26 03:54:57
[ ... ]
==================================================================
BUG: KASAN: use-after-free in sd_release+0x2a/0x100 [sd_mod]
Read of size 8 at addr ffff888115a0a000 by task fio/7217

CPU: 1 PID: 7217 Comm: fio Not tainted 5.17.0-rc2-dbg+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
sd 9:0:0:1: [sde] Synchronizing SCSI cache
  <TASK>
  show_stack+0x52/0x58
  dump_stack_lvl+0x5b/0x82
  print_address_description.constprop.0+0x24/0x160
  ? sd_release+0x2a/0x100 [sd_mod]
  kasan_report.cold+0x82/0xdb
  ? perf_trace_sched_numa_pair_template+0x340/0x350
  ? sd_release+0x2a/0x100 [sd_mod]
  __asan_load8+0x69/0x90
  sd_release+0x2a/0x100 [sd_mod]
  blkdev_put+0x15a/0x3b0
  blkdev_close+0x3c/0x50
  __fput+0x13d/0x430
  ____fput+0xe/0x10
  task_work_run+0x8e/0xe0
  do_exit+0x2b6/0x5e0
  do_group_exit+0x71/0x150
  __x64_sys_exit_group+0x31/0x40
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8d243d0ed1
Code: Unable to access opcode bytes at RIP 0x7f8d243d0ea7.
RSP: 002b:00007ffe2c7aae48 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f8d243d0ed1
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000013
RBP: 00007f8d1214ae90 R08: ffffffffffffe168 R09: a53fa94fea53fa95
R10: 0000000000000002 R11: 0000000000000206 R12: 00007f8d253d3c30
R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
  </TASK>

Allocated by task 5692:
  kasan_save_stack+0x26/0x50
  __kasan_kmalloc+0x88/0xa0
  kmem_cache_alloc_trace+0x1a3/0x2c0
  sd_probe+0x9a/0x700 [sd_mod]
  really_probe+0x141/0x5d0
  __driver_probe_device+0x1aa/0x240
  driver_probe_device+0x4e/0x110
  __device_attach_driver+0xf6/0x160
  bus_for_each_drv+0xfd/0x160
  __device_attach_async_helper+0x138/0x190
  async_run_entry_fn+0x63/0x240
  process_one_work+0x594/0xad0
  worker_thread+0x2de/0x6b0
  kthread+0x15f/0x190
  ret_from_fork+0x1f/0x30

Freed by task 6426:
  kasan_save_stack+0x26/0x50
  kasan_set_track+0x25/0x30
  kasan_set_free_info+0x24/0x40
  __kasan_slab_free+0x100/0x140
  kfree+0xd1/0x510
  scsi_disk_release+0x41/0x50 [sd_mod]
  device_release+0x60/0x100
  kobject_cleanup+0x7f/0x1c0
  kobject_put+0x76/0x90
  put_device+0x13/0x20
  sd_remove+0x63/0x70 [sd_mod]
  __device_release_driver+0x37e/0x390
  device_release_driver+0x2b/0x40
  bus_remove_device+0x1aa/0x270
  device_del+0x2d4/0x640
  __scsi_remove_device+0x168/0x1a0
  sdev_store_delete+0x75/0xe0
  dev_attr_store+0x3e/0x60
  sysfs_kf_write+0x87/0xa0
  kernfs_fop_write_iter+0x1c7/0x270
  new_sync_write+0x296/0x3c0
  vfs_write+0x43c/0x580
  ksys_write+0xd9/0x180
  __x64_sys_write+0x42/0x50
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
  kasan_save_stack+0x26/0x50
  __kasan_record_aux_stack+0xa8/0xc0
  kasan_record_aux_stack_noalloc+0xb/0x10
  insert_work+0x3b/0x170
  __queue_work+0x32f/0x7d0
  queue_work_on+0x7e/0x90
  rpm_idle+0x432/0x460
  __pm_runtime_set_status+0x1da/0x520
  pm_runtime_remove+0xb3/0xc0
  device_pm_remove+0x108/0x190
  device_del+0x2dc/0x640
  __scsi_remove_device+0x168/0x1a0
  sdev_store_delete+0x75/0xe0
  dev_attr_store+0x3e/0x60
  sysfs_kf_write+0x87/0xa0
  kernfs_fop_write_iter+0x1c7/0x270
  new_sync_write+0x296/0x3c0
  vfs_write+0x43c/0x580
  ksys_write+0xd9/0x180
  __x64_sys_write+0x42/0x50
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
  kasan_save_stack+0x26/0x50
  __kasan_record_aux_stack+0xa8/0xc0
  kasan_record_aux_stack_noalloc+0xb/0x10
  insert_work+0x3b/0x170
  __queue_work+0x32f/0x7d0
  queue_work_on+0x7e/0x90
  queue_release_one_tty+0xbf/0xd0
  release_tty+0x241/0x290
  tty_release_struct+0x92/0xb0
  tty_release+0x5b1/0x5f0
  __fput+0x13d/0x430
  ____fput+0xe/0x10
  task_work_run+0x8e/0xe0
  exit_to_user_mode_loop+0xee/0xf0
  exit_to_user_mode_prepare+0xd6/0x100
  syscall_exit_to_user_mode+0x1e/0x50
  do_syscall_64+0x42/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff888115a0a000
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 0 bytes inside of
  2048-byte region [ffff888115a0a000, ffff888115a0a800)
The buggy address belongs to the page:
page:00000000fac6ce95 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888115a0f000 pfn:0x115a08
head:00000000fac6ce95 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2000000000010200(slab|head|node=0|zone=2)
raw: 2000000000010200 ffffea00041d5408 ffffea000407d808 ffff888100042f00
raw: ffff888115a0f000 0000000000080006 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Thread overview: 25+ messages
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
2022-02-23  2:08   ` Ming Lei
2022-02-23  6:42     ` Christoph Hellwig
2022-02-23  7:02       ` Ming Lei
2022-02-23  7:36         ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs Christoph Hellwig
2022-02-22 14:14 ` [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver Christoph Hellwig
2022-02-22 14:14 ` [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting Christoph Hellwig
2022-02-22 14:14 ` [PATCH 05/12] sd: remove the extra sdev_gendev reference Christoph Hellwig
2022-02-22 14:14 ` [PATCH 06/12] sr: implement ->free_disk Christoph Hellwig
2022-02-22 14:14 ` [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler Christoph Hellwig
2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
2022-02-23  4:06   ` Bart Van Assche
2022-02-23  6:41     ` Ming Lei
2022-02-22 14:14 ` [PATCH 09/12] block: move q_usage_counter release into blk_queue_release Christoph Hellwig
2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
2022-02-22 18:29   ` Bart Van Assche
2022-02-23  6:56     ` Ming Lei
2022-02-23 20:04       ` Bart Van Assche
2022-02-24  7:25         ` Christoph Hellwig
2022-02-25  1:26           ` Ming Lei
2022-02-22 14:14 ` [PATCH 11/12] block: do more work in elevator_exit Christoph Hellwig
2022-02-22 14:14 ` [PATCH 12/12] block: move rq_qos_exit() into disk_release() Christoph Hellwig
2022-02-26  4:46 ` move more work to disk_release Bart Van Assche