* [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-23 2:08 ` Ming Lei
2022-02-22 14:14 ` [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs Christoph Hellwig
` (11 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
I/O accounting buckets I/O into the read/write/discard categories into
which passthrough I/O does not fit at all. It also accounts to the
block_device, which may not even exist for passthrough I/O.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 6 +-----
block/blk.h | 2 +-
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ce77250316..ee80853473d1e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -883,11 +883,7 @@ static inline void blk_account_io_done(struct request *req, u64 now)
static void __blk_account_io_start(struct request *rq)
{
- /* passthrough requests can hold bios that do not have ->bi_bdev set */
- if (rq->bio && rq->bio->bi_bdev)
- rq->part = rq->bio->bi_bdev;
- else if (rq->q->disk)
- rq->part = rq->q->disk->part0;
+ rq->part = rq->bio->bi_bdev;
part_stat_lock();
update_io_ticks(rq->part, jiffies, false);
diff --git a/block/blk.h b/block/blk.h
index ebaa59ca46ca6..6f21859c7f0ff 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -325,7 +325,7 @@ int blk_dev_init(void);
*/
static inline bool blk_do_io_stat(struct request *rq)
{
- return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
+ return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
}
void update_io_ticks(struct block_device *part, unsigned long now, bool end);
--
2.30.2
* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
@ 2022-02-23 2:08 ` Ming Lei
2022-02-23 6:42 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23 2:08 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, Martin K. Petersen, linux-block, linux-scsi
On Tue, Feb 22, 2022 at 03:14:39PM +0100, Christoph Hellwig wrote:
> I/O accounting buckets I/O into the read/write/discard categories into
> which passthrough I/O does not fit at all. It also accounts to the
> block_device, which may not even exist for passthrough I/O.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-mq.c | 6 +-----
> block/blk.h | 2 +-
> 2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a05ce77250316..ee80853473d1e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -883,11 +883,7 @@ static inline void blk_account_io_done(struct request *req, u64 now)
>
> static void __blk_account_io_start(struct request *rq)
> {
> - /* passthrough requests can hold bios that do not have ->bi_bdev set */
> - if (rq->bio && rq->bio->bi_bdev)
> - rq->part = rq->bio->bi_bdev;
> - else if (rq->q->disk)
> - rq->part = rq->q->disk->part0;
> + rq->part = rq->bio->bi_bdev;
>
> part_stat_lock();
> update_io_ticks(rq->part, jiffies, false);
> diff --git a/block/blk.h b/block/blk.h
> index ebaa59ca46ca6..6f21859c7f0ff 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -325,7 +325,7 @@ int blk_dev_init(void);
> */
> static inline bool blk_do_io_stat(struct request *rq)
> {
> - return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> + return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
I guess this could be seen as a regression from the user's viewpoint for
workloads with lots of userspace IO?
Thanks,
Ming
* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
2022-02-23 2:08 ` Ming Lei
@ 2022-02-23 6:42 ` Christoph Hellwig
2022-02-23 7:02 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-23 6:42 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
linux-scsi
On Wed, Feb 23, 2022 at 10:08:20AM +0800, Ming Lei wrote:
> > - return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> > + return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
>
> I guess this could be seen as a regression from the user's viewpoint for
> workloads with lots of userspace IO?
I'd say it fixes it as the accounting right now is completely bogus.
* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
2022-02-23 6:42 ` Christoph Hellwig
@ 2022-02-23 7:02 ` Ming Lei
2022-02-23 7:36 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23 7:02 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Jens Axboe, Martin K. Petersen, linux-block, linux-scsi
On Wed, Feb 23, 2022 at 07:42:26AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 23, 2022 at 10:08:20AM +0800, Ming Lei wrote:
> > > - return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
> > > + return (rq->rq_flags & RQF_IO_STAT) && !blk_rq_is_passthrough(rq);
> >
> > I guess this could be seen as a regression from the user's viewpoint for
> > workloads with lots of userspace IO?
>
> I'd say it fixes it as the accounting right now is completely bogus.
There is a small number of in-kernel passthrough requests (admin, or driver
private) which shouldn't be accounted, but there can be lots of passthrough
RW IO requests from userspace, and users may rely on diskstat to account them.
Thanks,
Ming
* Re: [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting
2022-02-23 7:02 ` Ming Lei
@ 2022-02-23 7:36 ` Christoph Hellwig
0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-23 7:36 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
linux-scsi
On Wed, Feb 23, 2022 at 03:02:08PM +0800, Ming Lei wrote:
> There is a small number of in-kernel passthrough requests (admin, or driver
> private) which shouldn't be accounted, but there can be lots of passthrough
> RW IO requests from userspace, and users may rely on diskstat to account them.
/dev/sg won't be accounted either. But most importantly they are
accounted wrongly: the accounting buckets into read/write/discard. And
most passthrough commands are anything but.
Also the way this accounting works is completely broken.
Passthrough requests are sent through a request_queue, and it does
not make sense to account them to a block_device which sits way above
that.
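Editorial illustration, not part of the original mail: the change to
blk_do_io_stat() debated in this subthread can be modeled in userspace as
below. This is only a sketch. Every toy_* name is hypothetical, and the
opcode enum merely stands in for the kernel's REQ_OP_DRV_IN/REQ_OP_DRV_OUT
passthrough opcodes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for request opcodes; not the kernel's values. */
enum toy_req_op {
	TOY_OP_READ,
	TOY_OP_WRITE,
	TOY_OP_DISCARD,
	TOY_OP_DRV_IN,	/* passthrough, data from the device */
	TOY_OP_DRV_OUT,	/* passthrough, data to the device */
};

#define TOY_RQF_IO_STAT (1u << 0)

struct toy_request {
	enum toy_req_op op;
	unsigned int rq_flags;
	bool queue_has_disk;	/* models rq->q->disk != NULL */
};

static bool toy_rq_is_passthrough(const struct toy_request *rq)
{
	assert(rq);
	return rq->op == TOY_OP_DRV_IN || rq->op == TOY_OP_DRV_OUT;
}

/* Old check: account whenever the queue has a disk attached. */
static bool toy_do_io_stat_old(const struct toy_request *rq)
{
	assert(rq);
	return (rq->rq_flags & TOY_RQF_IO_STAT) && rq->queue_has_disk;
}

/* New check: never account passthrough requests. */
static bool toy_do_io_stat_new(const struct toy_request *rq)
{
	assert(rq);
	return (rq->rq_flags & TOY_RQF_IO_STAT) && !toy_rq_is_passthrough(rq);
}
```

In this model a TOY_OP_DRV_IN request on a queue with a disk is accounted
by the old check and skipped by the new one, while plain reads and writes
are accounted by both. That difference is the behavior change Ming raises.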
* [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver Christoph Hellwig
` (10 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
From: Ming Lei <ming.lei@redhat.com>
To simplify further changes, allow blk_mq_free_rqs to be called twice on
a queue.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: split out from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ee80853473d1e..63e2d3fd60946 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3061,6 +3061,9 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
struct blk_mq_tags *drv_tags;
struct page *page;
+ if (list_empty(&tags->page_list))
+ return;
+
if (blk_mq_is_shared_tags(set->flags))
drv_tags = set->shared_tags;
else
--
2.30.2
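Editorial illustration, not part of the original mail: the early return
added above makes the free routine safe to call twice. A minimal userspace
sketch of that pattern follows. The toy_* names are hypothetical, and a
NULL list head stands in for list_empty(&tags->page_list).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a tag set owning a singly linked list of pages. */
struct toy_page {
	struct toy_page *next;
};

struct toy_tags {
	struct toy_page *page_list;	/* NULL models an empty page list */
	unsigned int nr_freed;		/* pages released so far */
	unsigned int teardown_runs;	/* times the full teardown path ran */
};

/* Add one page; returns 0 on success, -1 on allocation failure. */
static int toy_tags_add_page(struct toy_tags *tags)
{
	struct toy_page *page = malloc(sizeof(*page));

	if (!page)
		return -1;
	page->next = tags->page_list;
	tags->page_list = page;
	return 0;
}

/* Free all pages; a repeated call finds an empty list and returns early. */
static void toy_free_rqs(struct toy_tags *tags)
{
	assert(tags);
	if (!tags->page_list)
		return;

	/* Stands in for the per-request teardown done before freeing pages. */
	tags->teardown_runs++;

	while (tags->page_list) {
		struct toy_page *page = tags->page_list;

		tags->page_list = page->next;
		free(page);
		tags->nr_freed++;
	}
}
```

Calling toy_free_rqs() twice on the same toy_tags runs the teardown path
once and leaves the second call as a no-op.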
* [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
2022-02-22 14:14 ` [PATCH 01/12] blk-mq: do not include passthrough requests in I/O accounting Christoph Hellwig
2022-02-22 14:14 ` [PATCH 02/12] blk-mq: handle already freed tags gracefully in blk_mq_free_rqs Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting Christoph Hellwig
` (9 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
Requiring every ULP to have the scsi_driver as the first member of the
private data is rather fragile and not necessary anyway. Just use
the driver hanging off the SCSI device instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 3 +--
drivers/scsi/sd.h | 3 +--
drivers/scsi/sr.c | 5 ++---
drivers/scsi/sr.h | 1 -
drivers/scsi/st.c | 1 -
drivers/scsi/st.h | 1 -
include/scsi/scsi_cmnd.h | 9 ---------
include/scsi/scsi_driver.h | 9 +++++++--
8 files changed, 11 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2d648d27bfd71..2a1e19e871d30 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3515,7 +3515,6 @@ static int sd_probe(struct device *dev)
}
sdkp->device = sdp;
- sdkp->driver = &sd_template;
sdkp->disk = gd;
sdkp->index = index;
sdkp->max_retries = SD_MAX_RETRIES;
@@ -3548,7 +3547,7 @@ static int sd_probe(struct device *dev)
gd->minors = SD_MINORS;
gd->fops = &sd_fops;
- gd->private_data = &sdkp->driver;
+ gd->private_data = sdkp;
/* defaults, until the device tells us otherwise */
sdp->sector_size = 512;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 2e5932bde43d1..303aa1c23aefb 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -68,7 +68,6 @@ enum {
};
struct scsi_disk {
- struct scsi_driver *driver; /* always &sd_template */
struct scsi_device *device;
struct device dev;
struct gendisk *disk;
@@ -131,7 +130,7 @@ struct scsi_disk {
static inline struct scsi_disk *scsi_disk(struct gendisk *disk)
{
- return container_of(disk->private_data, struct scsi_disk, driver);
+ return disk->private_data;
}
#define sd_printk(prefix, sdsk, fmt, a...) \
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index f925b1f1f9ada..569bda76a5175 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -147,7 +147,7 @@ static void sr_kref_release(struct kref *kref);
static inline struct scsi_cd *scsi_cd(struct gendisk *disk)
{
- return container_of(disk->private_data, struct scsi_cd, driver);
+ return disk->private_data;
}
static int sr_runtime_suspend(struct device *dev)
@@ -692,7 +692,6 @@ static int sr_probe(struct device *dev)
cd->device = sdev;
cd->disk = disk;
- cd->driver = &sr_template;
cd->capacity = 0x1fffff;
cd->device->changed = 1; /* force recheck CD type */
cd->media_present = 1;
@@ -713,7 +712,7 @@ static int sr_probe(struct device *dev)
sr_vendor_init(cd);
set_capacity(disk, cd->capacity);
- disk->private_data = &cd->driver;
+ disk->private_data = cd;
if (register_cdrom(disk, &cd->cdi))
goto fail_minor;
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index 1609f02ed29ac..d80af3fcb6f97 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -32,7 +32,6 @@ struct scsi_device;
typedef struct scsi_cd {
- struct scsi_driver *driver;
unsigned capacity; /* size in blocks */
struct scsi_device *device;
unsigned int vendor; /* vendor code, see sr_vendor.c */
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index e869e90e05afe..ebe9412c86f43 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4276,7 +4276,6 @@ static int st_probe(struct device *dev)
goto out_buffer_free;
}
kref_init(&tpnt->kref);
- tpnt->driver = &st_template;
tpnt->device = SDp;
if (SDp->scsi_level <= 2)
diff --git a/drivers/scsi/st.h b/drivers/scsi/st.h
index c0ef0d9aaf8a2..7a68eaba7e810 100644
--- a/drivers/scsi/st.h
+++ b/drivers/scsi/st.h
@@ -117,7 +117,6 @@ struct scsi_tape_stats {
/* The tape drive descriptor */
struct scsi_tape {
- struct scsi_driver *driver;
struct scsi_device *device;
struct mutex lock; /* For serialization */
struct completion wait; /* For SCSI commands */
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 6794d7322cbde..e3a4c67794b14 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -13,7 +13,6 @@
#include <scsi/scsi_request.h>
struct Scsi_Host;
-struct scsi_driver;
/*
* MAX_COMMAND_SIZE is:
@@ -159,14 +158,6 @@ static inline void *scsi_cmd_priv(struct scsi_cmnd *cmd)
return cmd + 1;
}
-/* make sure not to use it with passthrough commands */
-static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
-{
- struct request *rq = scsi_cmd_to_rq(cmd);
-
- return *(struct scsi_driver **)rq->q->disk->private_data;
-}
-
void scsi_done(struct scsi_cmnd *cmd);
extern void scsi_finish_command(struct scsi_cmnd *cmd);
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 6dffa8555a390..4ce1988b2ba01 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -4,11 +4,10 @@
#include <linux/blk_types.h>
#include <linux/device.h>
+#include <scsi/scsi_cmnd.h>
struct module;
struct request;
-struct scsi_cmnd;
-struct scsi_device;
struct scsi_driver {
struct device_driver gendrv;
@@ -31,4 +30,10 @@ extern int scsi_register_interface(struct class_interface *);
#define scsi_unregister_interface(intf) \
class_interface_unregister(intf)
+/* make sure not to use it with passthrough commands */
+static inline struct scsi_driver *scsi_cmd_to_driver(struct scsi_cmnd *cmd)
+{
+ return to_scsi_driver(cmd->device->sdev_gendev.driver);
+}
+
#endif /* _SCSI_SCSI_DRIVER_H */
--
2.30.2
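Editorial illustration, not part of the original mail: the two ways of
reaching the per-disk structure and its driver can be compared in a small
userspace sketch. All toy_* names are hypothetical. toy_container_of is a
simplified version of the kernel macro without its type checking.

```c
#include <assert.h>
#include <stddef.h>

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_driver {
	const char *name;
};

/* Old layout: private_data points at the embedded driver pointer. */
struct toy_disk_old {
	struct toy_driver *driver;
	int index;
};

/* New layout: the driver hangs off the device; no driver member here. */
struct toy_device {
	struct toy_driver *driver;
};

struct toy_disk_new {
	struct toy_device *device;
	int index;
};

static struct toy_disk_old *toy_disk_from_private_old(void *private_data)
{
	assert(private_data);
	return toy_container_of(private_data, struct toy_disk_old, driver);
}

/* Only valid if private_data really points at a struct toy_driver *. */
static struct toy_driver *toy_driver_from_private_old(void *private_data)
{
	assert(private_data);
	return *(struct toy_driver **)private_data;
}

/* New scheme: private_data is the per-disk structure itself. */
static struct toy_disk_new *toy_disk_from_private_new(void *private_data)
{
	assert(private_data);
	return private_data;
}

static struct toy_driver *toy_driver_from_disk_new(struct toy_disk_new *disk)
{
	assert(disk && disk->device);
	return disk->device->driver;
}
```

The old lookup silently depends on what private_data points at, which is
the fragility the commit message describes. The new lookup needs no such
layout convention.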
* [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (2 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 03/12] scsi: don't use disk->private_data to find the scsi_driver Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 05/12] sd: remove the extra sdev_gendev reference Christoph Hellwig
` (8 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
Implement the ->free_disk method to put struct scsi_disk when the last
gendisk reference goes away. This removes the need to clear
->private_data and thus freeze the queue on unbind.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 89 ++++++++---------------------------------------
1 file changed, 15 insertions(+), 74 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2a1e19e871d30..4eaa5deafc3dc 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -121,11 +121,6 @@ static void scsi_disk_release(struct device *cdev);
static DEFINE_IDA(sd_index_ida);
-/* This semaphore is used to mediate the 0->1 reference get in the
- * face of object destruction (i.e. we can't allow a get on an
- * object after last put) */
-static DEFINE_MUTEX(sd_ref_mutex);
-
static struct kmem_cache *sd_cdb_cache;
static mempool_t *sd_cdb_pool;
static mempool_t *sd_page_pool;
@@ -663,33 +658,6 @@ static int sd_major(int major_idx)
}
}
-static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
-{
- struct scsi_disk *sdkp = NULL;
-
- mutex_lock(&sd_ref_mutex);
-
- if (disk->private_data) {
- sdkp = scsi_disk(disk);
- if (scsi_device_get(sdkp->device) == 0)
- get_device(&sdkp->dev);
- else
- sdkp = NULL;
- }
- mutex_unlock(&sd_ref_mutex);
- return sdkp;
-}
-
-static void scsi_disk_put(struct scsi_disk *sdkp)
-{
- struct scsi_device *sdev = sdkp->device;
-
- mutex_lock(&sd_ref_mutex);
- put_device(&sdkp->dev);
- scsi_device_put(sdev);
- mutex_unlock(&sd_ref_mutex);
-}
-
#ifdef CONFIG_BLK_SED_OPAL
static int sd_sec_submit(void *data, u16 spsp, u8 secp, void *buffer,
size_t len, bool send)
@@ -1418,17 +1386,15 @@ static bool sd_need_revalidate(struct block_device *bdev,
**/
static int sd_open(struct block_device *bdev, fmode_t mode)
{
- struct scsi_disk *sdkp = scsi_disk_get(bdev->bd_disk);
- struct scsi_device *sdev;
+ struct scsi_disk *sdkp = scsi_disk(bdev->bd_disk);
+ struct scsi_device *sdev = sdkp->device;
int retval;
- if (!sdkp)
+ if (scsi_device_get(sdev))
return -ENXIO;
SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_open\n"));
- sdev = sdkp->device;
-
/*
* If the device is in error recovery, wait until it is done.
* If the device is offline, then disallow any access to it.
@@ -1473,7 +1439,7 @@ static int sd_open(struct block_device *bdev, fmode_t mode)
return 0;
error_out:
- scsi_disk_put(sdkp);
+ scsi_device_put(sdkp->device);
return retval;
}
@@ -1502,7 +1468,7 @@ static void sd_release(struct gendisk *disk, fmode_t mode)
scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
}
- scsi_disk_put(sdkp);
+ scsi_device_put(sdkp->device);
}
static int sd_getgeo(struct block_device *bdev, struct hd_geometry *geo)
@@ -1616,7 +1582,7 @@ static int media_not_present(struct scsi_disk *sdkp,
**/
static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
{
- struct scsi_disk *sdkp = scsi_disk_get(disk);
+ struct scsi_disk *sdkp = disk->private_data;
struct scsi_device *sdp;
int retval;
bool disk_changed;
@@ -1679,7 +1645,6 @@ static unsigned int sd_check_events(struct gendisk *disk, unsigned int clearing)
*/
disk_changed = sdp->changed;
sdp->changed = 0;
- scsi_disk_put(sdkp);
return disk_changed ? DISK_EVENT_MEDIA_CHANGE : 0;
}
@@ -1887,6 +1852,13 @@ static const struct pr_ops sd_pr_ops = {
.pr_clear = sd_pr_clear,
};
+static void scsi_disk_free_disk(struct gendisk *disk)
+{
+ struct scsi_disk *sdkp = disk->private_data;
+
+ put_device(&sdkp->dev);
+}
+
static const struct block_device_operations sd_fops = {
.owner = THIS_MODULE,
.open = sd_open,
@@ -1898,6 +1870,7 @@ static const struct block_device_operations sd_fops = {
.unlock_native_capacity = sd_unlock_native_capacity,
.report_zones = sd_zbc_report_zones,
.get_unique_id = sd_get_unique_id,
+ .free_disk = scsi_disk_free_disk,
.pr_ops = &sd_pr_ops,
};
@@ -3623,9 +3596,8 @@ static int sd_probe(struct device *dev)
**/
static int sd_remove(struct device *dev)
{
- struct scsi_disk *sdkp;
+ struct scsi_disk *sdkp = dev_get_drvdata(dev);
- sdkp = dev_get_drvdata(dev);
scsi_autopm_get_device(sdkp->device);
device_del(&sdkp->dev);
@@ -3634,48 +3606,17 @@ static int sd_remove(struct device *dev)
free_opal_dev(sdkp->opal_dev);
- mutex_lock(&sd_ref_mutex);
- dev_set_drvdata(dev, NULL);
put_device(&sdkp->dev);
- mutex_unlock(&sd_ref_mutex);
-
return 0;
}
-/**
- * scsi_disk_release - Called to free the scsi_disk structure
- * @dev: pointer to embedded class device
- *
- * sd_ref_mutex must be held entering this routine. Because it is
- * called on last put, you should always use the scsi_disk_get()
- * scsi_disk_put() helpers which manipulate the semaphore directly
- * and never do a direct put_device.
- **/
static void scsi_disk_release(struct device *dev)
{
struct scsi_disk *sdkp = to_scsi_disk(dev);
- struct gendisk *disk = sdkp->disk;
- struct request_queue *q = disk->queue;
ida_free(&sd_index_ida, sdkp->index);
-
- /*
- * Wait until all requests that are in progress have completed.
- * This is necessary to avoid that e.g. scsi_end_request() crashes
- * due to clearing the disk->private_data pointer. Wait from inside
- * scsi_disk_release() instead of from sd_release() to avoid that
- * freezing and unfreezing the request queue affects user space I/O
- * in case multiple processes open a /dev/sd... node concurrently.
- */
- blk_mq_freeze_queue(q);
- blk_mq_unfreeze_queue(q);
-
- disk->private_data = NULL;
- put_disk(disk);
put_device(&sdkp->device->sdev_gendev);
-
sd_zbc_release_disk(sdkp);
-
kfree(sdkp);
}
--
2.30.2
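Editorial illustration, not part of the original mail: the idea of freeing
driver-private data from a ->free_disk callback on the last disk reference
can be sketched in userspace as below. All toy_* names are hypothetical.
The counter is a plain integer, so the sketch assumes a single thread. It
also collapses the real put_device() on the embedded struct device into a
direct free().

```c
#include <assert.h>
#include <stdlib.h>

struct toy_gendisk;

struct toy_disk_ops {
	void (*free_disk)(struct toy_gendisk *disk);
};

struct toy_gendisk {
	unsigned int refcount;		/* not atomic: single-threaded sketch */
	const struct toy_disk_ops *ops;
	void *private_data;
};

struct toy_scsi_disk {
	int *freed_flag;		/* set to 1 when the structure is freed */
};

/* Driver callback: runs only when the last disk reference is dropped. */
static void toy_sd_free_disk(struct toy_gendisk *disk)
{
	struct toy_scsi_disk *sdkp = disk->private_data;

	*sdkp->freed_flag = 1;
	free(sdkp);
}

static const struct toy_disk_ops toy_sd_ops = {
	.free_disk = toy_sd_free_disk,
};

static void toy_get_disk(struct toy_gendisk *disk)
{
	assert(disk && disk->refcount > 0);
	disk->refcount++;
}

static void toy_put_disk(struct toy_gendisk *disk)
{
	assert(disk && disk->refcount > 0);
	if (--disk->refcount > 0)
		return;
	if (disk->ops && disk->ops->free_disk)
		disk->ops->free_disk(disk);
	free(disk);
}

/* Allocate a disk with one reference held by the driver. */
static struct toy_gendisk *toy_sd_probe(int *freed_flag)
{
	struct toy_gendisk *disk = calloc(1, sizeof(*disk));
	struct toy_scsi_disk *sdkp = calloc(1, sizeof(*sdkp));

	if (!disk || !sdkp) {
		free(disk);
		free(sdkp);
		return NULL;
	}
	sdkp->freed_flag = freed_flag;
	disk->refcount = 1;
	disk->ops = &toy_sd_ops;
	disk->private_data = sdkp;
	return disk;
}
```

Because private_data stays valid until the callback runs, an opener that
still holds a reference after the driver's put can dereference it safely.
In this model nothing has to clear the pointer or wait for in-flight users
at unbind time.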
* [PATCH 05/12] sd: remove the extra sdev_gendev reference
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (3 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 04/12] sd: make use of ->free_disk to simplify refcounting Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 06/12] sr: implement ->free_disk Christoph Hellwig
` (7 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
device_add already takes a reference on the parent, so there is no need to
take an extra one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 4eaa5deafc3dc..041c21c9483f6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3503,7 +3503,7 @@ static int sd_probe(struct device *dev)
}
device_initialize(&sdkp->dev);
- sdkp->dev.parent = get_device(dev);
+ sdkp->dev.parent = dev;
sdkp->dev.class = &sd_disk_class;
dev_set_name(&sdkp->dev, "%s", dev_name(dev));
@@ -3615,7 +3615,6 @@ static void scsi_disk_release(struct device *dev)
struct scsi_disk *sdkp = to_scsi_disk(dev);
ida_free(&sd_index_ida, sdkp->index);
- put_device(&sdkp->device->sdev_gendev);
sd_zbc_release_disk(sdkp);
kfree(sdkp);
}
--
2.30.2
* [PATCH 06/12] sr: implement ->free_disk
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (4 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 05/12] sd: remove the extra sdev_gendev reference Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler Christoph Hellwig
` (6 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
Simplify the refcounting and remove the need to clear disk->private_data
by implementing the ->free_disk method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sr.c | 124 ++++++++++------------------------------------
drivers/scsi/sr.h | 4 --
2 files changed, 26 insertions(+), 102 deletions(-)
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 569bda76a5175..11fbdc75bb711 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -109,11 +109,6 @@ static DEFINE_SPINLOCK(sr_index_lock);
static struct lock_class_key sr_bio_compl_lkclass;
-/* This semaphore is used to mediate the 0->1 reference get in the
- * face of object destruction (i.e. we can't allow a get on an
- * object after last put) */
-static DEFINE_MUTEX(sr_ref_mutex);
-
static int sr_open(struct cdrom_device_info *, int);
static void sr_release(struct cdrom_device_info *);
@@ -143,8 +138,6 @@ static const struct cdrom_device_ops sr_dops = {
.capability = SR_CAPABILITIES,
};
-static void sr_kref_release(struct kref *kref);
-
static inline struct scsi_cd *scsi_cd(struct gendisk *disk)
{
return disk->private_data;
@@ -163,38 +156,6 @@ static int sr_runtime_suspend(struct device *dev)
return 0;
}
-/*
- * The get and put routines for the struct scsi_cd. Note this entity
- * has a scsi_device pointer and owns a reference to this.
- */
-static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk)
-{
- struct scsi_cd *cd = NULL;
-
- mutex_lock(&sr_ref_mutex);
- if (disk->private_data == NULL)
- goto out;
- cd = scsi_cd(disk);
- kref_get(&cd->kref);
- if (scsi_device_get(cd->device)) {
- kref_put(&cd->kref, sr_kref_release);
- cd = NULL;
- }
- out:
- mutex_unlock(&sr_ref_mutex);
- return cd;
-}
-
-static void scsi_cd_put(struct scsi_cd *cd)
-{
- struct scsi_device *sdev = cd->device;
-
- mutex_lock(&sr_ref_mutex);
- kref_put(&cd->kref, sr_kref_release);
- scsi_device_put(sdev);
- mutex_unlock(&sr_ref_mutex);
-}
-
static unsigned int sr_get_events(struct scsi_device *sdev)
{
u8 buf[8];
@@ -522,15 +483,13 @@ static void sr_revalidate_disk(struct scsi_cd *cd)
static int sr_block_open(struct block_device *bdev, fmode_t mode)
{
- struct scsi_cd *cd;
- struct scsi_device *sdev;
+ struct scsi_cd *cd = scsi_cd(bdev->bd_disk);
+ struct scsi_device *sdev = cd->device;
int ret = -ENXIO;
- cd = scsi_cd_get(bdev->bd_disk);
- if (!cd)
- goto out;
+ if (scsi_device_get(cd->device))
+ return -ENXIO;
- sdev = cd->device;
scsi_autopm_get_device(sdev);
if (bdev_check_media_change(bdev))
sr_revalidate_disk(cd);
@@ -541,9 +500,7 @@ static int sr_block_open(struct block_device *bdev, fmode_t mode)
scsi_autopm_put_device(sdev);
if (ret)
- scsi_cd_put(cd);
-
-out:
+ scsi_device_put(cd->device);
return ret;
}
@@ -555,7 +512,7 @@ static void sr_block_release(struct gendisk *disk, fmode_t mode)
cdrom_release(&cd->cdi, mode);
mutex_unlock(&cd->lock);
- scsi_cd_put(cd);
+ scsi_device_put(cd->device);
}
static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
@@ -595,18 +552,24 @@ static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
static unsigned int sr_block_check_events(struct gendisk *disk,
unsigned int clearing)
{
- unsigned int ret = 0;
- struct scsi_cd *cd;
+ struct scsi_cd *cd = disk->private_data;
- cd = scsi_cd_get(disk);
- if (!cd)
+ if (atomic_read(&cd->device->disk_events_disable_depth))
return 0;
+ return cdrom_check_events(&cd->cdi, clearing);
+}
- if (!atomic_read(&cd->device->disk_events_disable_depth))
- ret = cdrom_check_events(&cd->cdi, clearing);
+static void sr_free_disk(struct gendisk *disk)
+{
+ struct scsi_cd *cd = disk->private_data;
- scsi_cd_put(cd);
- return ret;
+ spin_lock(&sr_index_lock);
+ clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
+ spin_unlock(&sr_index_lock);
+
+ unregister_cdrom(&cd->cdi);
+ mutex_destroy(&cd->lock);
+ kfree(cd);
}
static const struct block_device_operations sr_bdops =
@@ -617,6 +580,7 @@ static const struct block_device_operations sr_bdops =
.ioctl = sr_block_ioctl,
.compat_ioctl = blkdev_compat_ptr_ioctl,
.check_events = sr_block_check_events,
+ .free_disk = sr_free_disk,
};
static int sr_open(struct cdrom_device_info *cdi, int purpose)
@@ -660,8 +624,6 @@ static int sr_probe(struct device *dev)
if (!cd)
goto fail;
- kref_init(&cd->kref);
-
disk = __alloc_disk_node(sdev->request_queue, NUMA_NO_NODE,
&sr_bio_compl_lkclass);
if (!disk)
@@ -727,10 +689,8 @@ static int sr_probe(struct device *dev)
sr_revalidate_disk(cd);
error = device_add_disk(&sdev->sdev_gendev, disk, NULL);
- if (error) {
- kref_put(&cd->kref, sr_kref_release);
- goto fail;
- }
+ if (error)
+ goto unregister_cdrom;
sdev_printk(KERN_DEBUG, sdev,
"Attached scsi CD-ROM %s\n", cd->cdi.name);
@@ -738,6 +698,8 @@ static int sr_probe(struct device *dev)
return 0;
+unregister_cdrom:
+ unregister_cdrom(&cd->cdi);
fail_minor:
spin_lock(&sr_index_lock);
clear_bit(minor, sr_index_bits);
@@ -1009,36 +971,6 @@ static int sr_read_cdda_bpc(struct cdrom_device_info *cdi, void __user *ubuf,
return ret;
}
-
-/**
- * sr_kref_release - Called to free the scsi_cd structure
- * @kref: pointer to embedded kref
- *
- * sr_ref_mutex must be held entering this routine. Because it is
- * called on last put, you should always use the scsi_cd_get()
- * scsi_cd_put() helpers which manipulate the semaphore directly
- * and never do a direct kref_put().
- **/
-static void sr_kref_release(struct kref *kref)
-{
- struct scsi_cd *cd = container_of(kref, struct scsi_cd, kref);
- struct gendisk *disk = cd->disk;
-
- spin_lock(&sr_index_lock);
- clear_bit(MINOR(disk_devt(disk)), sr_index_bits);
- spin_unlock(&sr_index_lock);
-
- unregister_cdrom(&cd->cdi);
-
- disk->private_data = NULL;
-
- put_disk(disk);
-
- mutex_destroy(&cd->lock);
-
- kfree(cd);
-}
-
static int sr_remove(struct device *dev)
{
struct scsi_cd *cd = dev_get_drvdata(dev);
@@ -1046,11 +978,7 @@ static int sr_remove(struct device *dev)
scsi_autopm_get_device(cd->device);
del_gendisk(cd->disk);
- dev_set_drvdata(dev, NULL);
-
- mutex_lock(&sr_ref_mutex);
- kref_put(&cd->kref, sr_kref_release);
- mutex_unlock(&sr_ref_mutex);
+ put_disk(cd->disk);
return 0;
}
diff --git a/drivers/scsi/sr.h b/drivers/scsi/sr.h
index d80af3fcb6f97..1175f2e213b56 100644
--- a/drivers/scsi/sr.h
+++ b/drivers/scsi/sr.h
@@ -18,7 +18,6 @@
#ifndef _SR_H
#define _SR_H
-#include <linux/kref.h>
#include <linux/mutex.h>
#define MAX_RETRIES 3
@@ -51,9 +50,6 @@ typedef struct scsi_cd {
struct cdrom_device_info cdi;
struct mutex lock;
- /* We hold gendisk and scsi_device references on probe and use
- * the refs on this kref to decide when to release them */
- struct kref kref;
struct gendisk *disk;
} Scsi_CD;
--
2.30.2
* [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (5 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 06/12] sr: implement ->free_disk Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
` (5 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe
Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi, Bart Van Assche
From: Ming Lei <ming.lei@redhat.com>
blkcg works at the FS bio level, so it is reasonable for blkcg and the
gendisk to share the same lifetime. Meanwhile, there won't be any FS IO
when releasing the disk, so it is safe to move blkcg initialization/destroy
into the disk allocation/release handlers.
Long term, we can move blkcg into gendisk completely.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-core.c | 5 -----
block/blk-sysfs.c | 7 -------
block/genhd.c | 13 +++++++++++++
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d2..b2f2c65774812 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -496,17 +496,12 @@ struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu)
PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
goto fail_stats;
- if (blkcg_init_queue(q))
- goto fail_ref;
-
blk_queue_dma_alignment(q, 511);
blk_set_default_limits(&q->limits);
q->nr_requests = BLKDEV_DEFAULT_RQ;
return q;
-fail_ref:
- percpu_ref_exit(&q->q_usage_counter);
fail_stats:
blk_free_queue_stats(q->stats);
fail_split:
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4c6b7dff71e5b..5f723d2ff8948 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -751,13 +751,6 @@ static void blk_exit_queue(struct request_queue *q)
ioc_clear_queue(q);
elevator_exit(q);
}
-
- /*
- * Remove all references to @q from the block cgroup controller before
- * restoring @q->queue_lock to avoid that restoring this pointer causes
- * e.g. blkcg_print_blkgs() to crash.
- */
- blkcg_exit_queue(q);
}
/**
diff --git a/block/genhd.c b/block/genhd.c
index e351fac41bf25..ebf0e0be1c545 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1115,9 +1115,17 @@ static void disk_release(struct device *dev)
blk_mq_cancel_work_sync(disk->queue);
+ /*
+ * Remove all references to @q from the block cgroup controller before
+ * restoring @q->queue_lock to avoid that restoring this pointer causes
+ * e.g. blkcg_print_blkgs() to crash.
+ */
+ blkcg_exit_queue(disk->queue);
+
disk_release_events(disk);
kfree(disk->random);
xa_destroy(&disk->part_tbl);
+
disk->queue->disk = NULL;
blk_put_queue(disk->queue);
@@ -1318,6 +1326,9 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
if (xa_insert(&disk->part_tbl, 0, disk->part0, GFP_KERNEL))
goto out_destroy_part_tbl;
+ if (blkcg_init_queue(q))
+ goto out_erase_part0;
+
rand_initialize_disk(disk);
disk_to_dev(disk)->class = &block_class;
disk_to_dev(disk)->type = &disk_type;
@@ -1330,6 +1341,8 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
#endif
return disk;
+out_erase_part0:
+ xa_erase(&disk->part_tbl, 0);
out_destroy_part_tbl:
xa_destroy(&disk->part_tbl);
disk->part0->bd_disk = NULL;
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
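The out_erase_part0 label added to __alloc_disk_node() in the patch above
follows the usual pattern of unwinding in the reverse order of setup. A
minimal userspace sketch of that pattern, assuming hypothetical toy_* names
in place of the real xarray and blkcg calls (nothing here is kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state set up in __alloc_disk_node(). */
struct toy_disk {
	bool part_tbl_ready;	/* models the part_tbl xarray being usable */
	bool part0_inserted;	/* models part0 having been inserted */
	bool blkcg_ready;	/* models blkcg init having succeeded */
};

/*
 * Set up the three steps in order. The fail_* flags inject a failure at
 * the matching step. Returns 0 on success and -1 on failure.
 */
static int toy_alloc_disk(struct toy_disk *d, bool fail_insert, bool fail_blkcg)
{
	assert(d);

	d->part_tbl_ready = true;
	d->part0_inserted = false;
	d->blkcg_ready = false;

	if (fail_insert)
		goto out_destroy_part_tbl;
	d->part0_inserted = true;

	if (fail_blkcg)
		goto out_erase_part0;
	d->blkcg_ready = true;
	return 0;

	/* Each label undoes one step and falls through to the earlier ones. */
out_erase_part0:
	d->part0_inserted = false;
out_destroy_part_tbl:
	d->part_tbl_ready = false;
	return -1;
}
```

A failure in the newest step jumps to the newest label, so every earlier
step is undone exactly once and nothing is undone that was never set up.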
* [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (6 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 07/12] block: move blkcg initialization/destroy into disk allocation/release handler Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-23 4:06 ` Bart Van Assche
2022-02-22 14:14 ` [PATCH 09/12] block: move q_usage_counter release into blk_queue_release Christoph Hellwig
` (4 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
From: Ming Lei <ming.lei@redhat.com>
The queue's top-level debugfs dir is removed from blk_release_queue(),
which also removes all hctx debugfs dirs beneath it. Given that
blk_mq_exit_queue() is only called from blk_cleanup_queue(), it isn't
necessary to remove the hctx debugfs dirs from blk_mq_exit_queue().
So remove that call from blk_mq_exit_queue().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 63e2d3fd60946..540c8da30da72 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
queue_for_each_hw_ctx(q, hctx, i) {
if (i == nr_queue)
break;
- blk_mq_debugfs_unregister_hctx(hctx);
blk_mq_exit_hctx(q, set, hctx, i);
}
}
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
@ 2022-02-23 4:06 ` Bart Van Assche
2022-02-23 6:41 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-23 4:06 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
On 2/22/22 06:14, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> The queue's top debugfs dir is removed from blk_release_queue(), so all
> hctx's debugfs dirs are removed from there. Given blk_mq_exit_queue()
> is only called from blk_cleanup_queue(), it isn't necessary to remove
> hctx debugfs from blk_mq_exit_queue().
>
> So remove it from blk_mq_exit_queue().
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-mq.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 63e2d3fd60946..540c8da30da72 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
> queue_for_each_hw_ctx(q, hctx, i) {
> if (i == nr_queue)
> break;
> - blk_mq_debugfs_unregister_hctx(hctx);
> blk_mq_exit_hctx(q, set, hctx, i);
> }
> }
What will happen if a new queue with the same name as a removed queue is
created before blk_release_queue() for the removed queue has finished?
Will that cause registration of debugfs attributes for the newly created
queue to fail?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue
2022-02-23 4:06 ` Bart Van Assche
@ 2022-02-23 6:41 ` Ming Lei
0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-23 6:41 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
linux-scsi
On Tue, Feb 22, 2022 at 08:06:31PM -0800, Bart Van Assche wrote:
> On 2/22/22 06:14, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> >
> > The queue's top debugfs dir is removed from blk_release_queue(), so all
> > hctx's debugfs dirs are removed from there. Given blk_mq_exit_queue()
> > is only called from blk_cleanup_queue(), it isn't necessary to remove
> > hctx debugfs from blk_mq_exit_queue().
> >
> > So remove it from blk_mq_exit_queue().
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> > block/blk-mq.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 63e2d3fd60946..540c8da30da72 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3425,7 +3425,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
> > queue_for_each_hw_ctx(q, hctx, i) {
> > if (i == nr_queue)
> > break;
> > - blk_mq_debugfs_unregister_hctx(hctx);
> > blk_mq_exit_hctx(q, set, hctx, i);
> > }
> > }
>
> What will happen if a new queue with the same name as a removed queue is
> created before blk_release_queue() for the removed queue has finished? Will
> that cause registration of debugfs attributes for the newly created queue to
> fail?
That may happen, but it is not related to this patch, since this patch
only delays the removal of the hctx debugfs entries, and q->debugfs_dir
is still removed from blk_release_queue().
So far the request queue doesn't have a name of its own and just uses the
disk's name when creating its debugfs entry, so this problem should have
existed for a long time.
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
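The scenario discussed above can be illustrated with a toy namespace in
which a name stays taken until its previous owner removes it. This is only
a userspace model: struct toy_ns, toy_mkdir() and toy_rmdir() are
hypothetical, and reporting a clash as -EEXIST is an assumption made for
illustration, not something taken from the debugfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DIRS 8
#define TOY_NAME_LEN 32

/* Toy directory namespace: a fixed table of names that are in use. */
struct toy_ns {
	char names[TOY_MAX_DIRS][TOY_NAME_LEN];
	int used[TOY_MAX_DIRS];
};

static void toy_ns_init(struct toy_ns *ns)
{
	memset(ns, 0, sizeof(*ns));
}

/* Create a directory; fails with -EEXIST while the name is still taken. */
static int toy_mkdir(struct toy_ns *ns, const char *name)
{
	int free_slot = -1;

	assert(ns && name);
	for (int i = 0; i < TOY_MAX_DIRS; i++) {
		if (ns->used[i] && strcmp(ns->names[i], name) == 0)
			return -EEXIST;
		if (!ns->used[i] && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -ENOSPC;

	snprintf(ns->names[free_slot], TOY_NAME_LEN, "%s", name);
	ns->used[free_slot] = 1;
	return 0;
}

/* Remove a directory; only after this can the name be reused. */
static int toy_rmdir(struct toy_ns *ns, const char *name)
{
	assert(ns && name);
	for (int i = 0; i < TOY_MAX_DIRS; i++) {
		if (ns->used[i] && strcmp(ns->names[i], name) == 0) {
			ns->used[i] = 0;
			return 0;
		}
	}
	return -ENOENT;
}
```

In this model the second creation of "sda" fails for as long as the first
entry exists, whichever function is responsible for removing it. That
matches the point made in the reply: the window is defined by when the
top-level entry goes away, not by when the per-hctx entries do.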
* [PATCH 09/12] block: move q_usage_counter release into blk_queue_release
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (7 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 08/12] block: don't remove hctx debugfs dir from blk_mq_exit_queue Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
` (3 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe
Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi, Bart Van Assche
From: Ming Lei <ming.lei@redhat.com>
After blk_cleanup_queue() returns, the disk may not have been released
yet, so bios may still be submitted and ->q_usage_counter may still be
touched. So far this seems to be safe, but it is not good from an API
point of view.
Move the release of q_usage_counter into blk_release_queue().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-core.c | 2 --
block/blk-sysfs.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index b2f2c65774812..a8c59913dd78d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -342,8 +342,6 @@ void blk_cleanup_queue(struct request_queue *q)
blk_mq_sched_free_rqs(q);
mutex_unlock(&q->sysfs_lock);
- percpu_ref_exit(&q->q_usage_counter);
-
/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5f723d2ff8948..4ea22169b5186 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -780,6 +780,8 @@ static void blk_release_queue(struct kobject *kobj)
might_sleep();
+ percpu_ref_exit(&q->q_usage_counter);
+
if (q->poll_stat)
blk_stat_remove_callback(q, q->poll_cb);
blk_stat_free_callback(q->poll_cb);
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
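The idea behind the patch above is that state which late users may still
touch is best torn down in the release callback, which only runs once the
last reference is gone. A minimal userspace sketch under that assumption
(struct toy_queue and the toy_* helpers are hypothetical and only model a
plain reference count, not the kobject or percpu_ref APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy queue: 'refs' models the object reference count. */
struct toy_queue {
	int refs;
	bool usage_counter_live;	/* models q_usage_counter being usable */
	bool released;
};

/* Release callback: the only place where the usage counter is torn down. */
static void toy_release(struct toy_queue *q)
{
	q->usage_counter_live = false;
	q->released = true;
}

static void toy_get(struct toy_queue *q)
{
	assert(!q->released);
	q->refs++;
}

static void toy_put(struct toy_queue *q)
{
	assert(q->refs > 0);
	if (--q->refs == 0)
		toy_release(q);
}

/* Models cleanup after the patch: it just drops its own reference. */
static void toy_cleanup(struct toy_queue *q)
{
	toy_put(q);
}
```

With a second holder still around (the disk, in the commit message's
scenario), toy_cleanup() leaves the counter usable, and it only goes away
when that holder drops its reference too.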
* [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (8 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 09/12] block: move q_usage_counter release into blk_queue_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 18:29 ` Bart Van Assche
2022-02-22 14:14 ` [PATCH 11/12] block: do more work in elevator_exit Christoph Hellwig
` (2 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
From: Ming Lei <ming.lei@redhat.com>
There can't be FS IO in disk_release(), so move blk_exit_queue() there.
We still need to freeze the queue here, since a request is freed after its
bio is completed, and passthrough requests rely on scheduler tags as well.
The disk can be released before or after the queue is cleaned up, and we
have to free the scheduler request pool before blk_cleanup_queue() returns,
while the static request pool has to be freed before exiting the
I/O scheduler.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-sysfs.c | 16 ----------------
block/genhd.c | 32 +++++++++++++++++++++++++++++++-
2 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4ea22169b5186..faf8577578929 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -739,20 +739,6 @@ static void blk_free_queue_rcu(struct rcu_head *rcu_head)
kmem_cache_free(blk_get_queue_kmem_cache(blk_queue_has_srcu(q)), q);
}
-/* Unconfigure the I/O scheduler and dissociate from the cgroup controller. */
-static void blk_exit_queue(struct request_queue *q)
-{
- /*
- * Since the I/O scheduler exit code may access cgroup information,
- * perform I/O scheduler exit before disassociating from the block
- * cgroup controller.
- */
- if (q->elevator) {
- ioc_clear_queue(q);
- elevator_exit(q);
- }
-}
-
/**
* blk_release_queue - releases all allocated resources of the request_queue
* @kobj: pointer to a kobject, whose container is a request_queue
@@ -786,8 +772,6 @@ static void blk_release_queue(struct kobject *kobj)
blk_stat_remove_callback(q, q->poll_cb);
blk_stat_free_callback(q->poll_cb);
- blk_exit_queue(q);
-
blk_free_queue_stats(q->stats);
kfree(q->poll_stat);
diff --git a/block/genhd.c b/block/genhd.c
index ebf0e0be1c545..40ef013382872 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,6 +29,7 @@
#include "blk.h"
#include "blk-mq-sched.h"
#include "blk-rq-qos.h"
+#include "blk-cgroup.h"
static struct kobject *block_depr;
@@ -1092,6 +1093,34 @@ static const struct attribute_group *disk_attr_groups[] = {
NULL
};
+static void blk_mq_release_queue(struct request_queue *q)
+{
+ blk_mq_cancel_work_sync(q);
+
+ /*
+ * There can't be any non-passthrough bios in flight here, but
+ * requests stay around longer, including passthrough ones so we
+ * still need to freeze the queue here.
+ */
+ blk_mq_freeze_queue(q);
+
+ /*
+ * Since the I/O scheduler exit code may access cgroup information,
+ * perform I/O scheduler exit before disassociating from the block
+ * cgroup controller.
+ */
+ if (q->elevator) {
+ ioc_clear_queue(q);
+
+ mutex_lock(&q->sysfs_lock);
+ blk_mq_sched_free_rqs(q);
+ elevator_exit(q);
+ mutex_unlock(&q->sysfs_lock);
+ }
+
+ __blk_mq_unfreeze_queue(q, true);
+}
+
/**
* disk_release - releases all allocated resources of the gendisk
* @dev: the device representing this disk
@@ -1113,7 +1142,8 @@ static void disk_release(struct device *dev)
might_sleep();
WARN_ON_ONCE(disk_live(disk));
- blk_mq_cancel_work_sync(disk->queue);
+ if (queue_is_mq(disk->queue))
+ blk_mq_release_queue(disk->queue);
/*
* Remove all references to @q from the block cgroup controller before
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
@ 2022-02-22 18:29 ` Bart Van Assche
2022-02-23 6:56 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-22 18:29 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
On 2/22/22 06:14, Christoph Hellwig wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> There can't be FS IO in disk_release(), so move blk_exit_queue() there.
>
> We still need to freeze queue here since the request is freed after the
> bio is completed and passthrough request rely on scheduler tags as well.
>
> The disk can be released before or after queue is cleaned up, and we have
> to free the scheduler request pool before blk_cleanup_queue returns,
> while the static request pool has to be freed before exiting the
> I/O scheduler.
This patch looks dubious to me because:
- The blk_freeze_queue() call in blk_cleanup_queue() waits for pending
requests to finish, so why move blk_exit_queue() from
blk_cleanup_queue() into disk_release()?
- I'm concerned that this patch will break user space, e.g. scripts that
try to unload an I/O scheduler kernel module immediately after having
removed a request queue.
> +static void blk_mq_release_queue(struct request_queue *q)
> +{
> + blk_mq_cancel_work_sync(q);
> +
> + /*
> + * There can't be any non non-passthrough bios in flight here, but
> + * requests stay around longer, including passthrough ones so we
> + * still need to freeze the queue here.
> + */
> + blk_mq_freeze_queue(q);
The above comment should be elaborated since what matters in this
context is not whether or not any bios are still in flight but what
happens with the request structures. As you know blk_queue_enter() fails
after the DYING flag has been set, a flag that is set by
blk_cleanup_queue(). blk_cleanup_queue() already freezes the queue. So
why is it necessary to call blk_mq_freeze_queue() from
blk_mq_release_queue()?
Bart.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-22 18:29 ` Bart Van Assche
@ 2022-02-23 6:56 ` Ming Lei
2022-02-23 20:04 ` Bart Van Assche
0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2022-02-23 6:56 UTC (permalink / raw)
To: Bart Van Assche
Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
linux-scsi
On Tue, Feb 22, 2022 at 10:29:47AM -0800, Bart Van Assche wrote:
> On 2/22/22 06:14, Christoph Hellwig wrote:
> > From: Ming Lei <ming.lei@redhat.com>
> >
> > There can't be FS IO in disk_release(), so move blk_exit_queue() there.
> >
> > We still need to freeze queue here since the request is freed after the
> > bio is completed and passthrough request rely on scheduler tags as well.
> >
> > The disk can be released before or after queue is cleaned up, and we have
> > to free the scheduler request pool before blk_cleanup_queue returns,
> > while the static request pool has to be freed before exiting the
> > I/O scheduler.
>
> This patch looks dubious to me because:
> - The blk_freeze_queue() call in blk_cleanup_queue() waits for pending
> requests to finish, so why to move blk_exit_queue() from
> blk_cleanup_queue() into disk_release()?
A SCSI disk may be released before blk_cleanup_queue() is called, and we
want to tear down all FS-related state (cgroup, rqos, elevator) in
disk_release(). FS bios have already been drained by the time the disk is
released.
> - I'm concerned that this patch will break user space, e.g. scripts that
> try to unload an I/O scheduler kernel module immediately after having
> removed a request queue.
When a request queue is removed, the associated disk has already been
removed and the queue's kobject has been deleted too, so how can userspace
unload the I/O scheduler at that time?
>
> > +static void blk_mq_release_queue(struct request_queue *q)
> > +{
> > + blk_mq_cancel_work_sync(q);
> > +
> > + /*
> > + * There can't be any non non-passthrough bios in flight here, but
> > + * requests stay around longer, including passthrough ones so we
> > + * still need to freeze the queue here.
> > + */
> > + blk_mq_freeze_queue(q);
>
> The above comment should be elaborated since what matters in this context is
> not whether or not any bios are still in flight but what happens with the
> request structures.
Yeah, the bios are done, but a request is only completed after its bio has
ended, see blk_update_request(). That is why we added
blk_mq_freeze_queue() here.
> As you know blk_queue_enter() fails after the DYING flag
> has been set, a flag that is set by blk_cleanup_queue(). blk_cleanup_queue()
> already freezes the queue. So why is it necessary to call
> blk_mq_freeze_queue() from blk_mq_release_queue()?
The disk may be released before blk_cleanup_queue() is called.
But I admit that the name blk_mq_release_queue() is very misleading here,
maybe blk_mq_release_io_queue() is better?
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
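The point made in the reply above, that a request outlives its bio and so
the queue still has to be frozen, can be sketched with a toy usage counter.
This is a userspace model only: struct toy_queue_usage, struct toy_request
and the toy_* helpers are hypothetical, and the assumption is simply that
the usage reference is dropped when the request is freed rather than when
the bio ends.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: 'usage' stands in for the queue usage counter. */
struct toy_queue_usage {
	int usage;
};

struct toy_request {
	struct toy_queue_usage *q;
	bool bio_ended;
	bool freed;
};

/* Starting a request takes a queue usage reference. */
static void toy_request_start(struct toy_queue_usage *q, struct toy_request *rq)
{
	q->usage++;
	rq->q = q;
	rq->bio_ended = false;
	rq->freed = false;
}

/* Ending the bio does not drop the usage reference. */
static void toy_bio_end(struct toy_request *rq)
{
	rq->bio_ended = true;
}

/* Only freeing the request drops the usage reference. */
static void toy_request_free(struct toy_request *rq)
{
	assert(!rq->freed);
	rq->freed = true;
	rq->q->usage--;
}

/* A freeze can only complete once no usage references remain. */
static bool toy_freeze_would_complete(const struct toy_queue_usage *q)
{
	return q->usage == 0;
}
```

Between toy_bio_end() and toy_request_free() the bio is finished but the
freeze would still have to wait, which is the window the
blk_mq_freeze_queue() call in the patch is meant to cover.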
* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-23 6:56 ` Ming Lei
@ 2022-02-23 20:04 ` Bart Van Assche
2022-02-24 7:25 ` Christoph Hellwig
0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2022-02-23 20:04 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen, linux-block,
linux-scsi
On 2/22/22 22:56, Ming Lei wrote:
> But I admit here the name of blk_mq_release_queue() is very misleading,
> maybe blk_mq_release_io_queue() is better?
I'm not sure what the best name for that function would be. Anyway,
thanks for having clarified that disk structures are removed before the
request queue is cleaned up. That's something I was missing.
Bart.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-23 20:04 ` Bart Van Assche
@ 2022-02-24 7:25 ` Christoph Hellwig
2022-02-25 1:26 ` Ming Lei
0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-24 7:25 UTC (permalink / raw)
To: Bart Van Assche
Cc: Ming Lei, Christoph Hellwig, Jens Axboe, Martin K. Petersen,
linux-block, linux-scsi
On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> On 2/22/22 22:56, Ming Lei wrote:
>> But I admit here the name of blk_mq_release_queue() is very misleading,
>> maybe blk_mq_release_io_queue() is better?
>
> I'm not sure what the best name for that function would be. Anyway, thanks
> for having clarified that disk structures are removed before the request
> queue is cleaned up. That's something I was missing.
Maybe disk_release_mq?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 10/12] block: move blk_exit_queue into disk_release
2022-02-24 7:25 ` Christoph Hellwig
@ 2022-02-25 1:26 ` Ming Lei
0 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2022-02-25 1:26 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Bart Van Assche, Jens Axboe, Martin K. Petersen, linux-block, linux-scsi
On Thu, Feb 24, 2022 at 08:25:24AM +0100, Christoph Hellwig wrote:
> On Wed, Feb 23, 2022 at 12:04:03PM -0800, Bart Van Assche wrote:
> > On 2/22/22 22:56, Ming Lei wrote:
> >> But I admit here the name of blk_mq_release_queue() is very misleading,
> >> maybe blk_mq_release_io_queue() is better?
> >
> > I'm not sure what the best name for that function would be. Anyway, thanks
> > for having clarified that disk structures are removed before the request
> > queue is cleaned up. That's something I was missing.
>
> Maybe disk_release_mq?
disk_release_mq() looks much better.
Thanks,
Ming
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 11/12] block: do more work in elevator_exit
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (9 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 10/12] block: move blk_exit_queue into disk_release Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-22 14:14 ` [PATCH 12/12] block: move rq_qos_exit() into disk_release() Christoph Hellwig
2022-02-26 4:46 ` move more work to disk_release Bart Van Assche
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
Move the calls to ioc_clear_queue and blk_mq_sched_free_rqs into
elevator_exit. Except for one call site where we know we can't have io_cq
structures yet, these always go together, and the extra call in that
error path is harmless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/elevator.c | 7 +++----
block/genhd.c | 3 ---
2 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/block/elevator.c b/block/elevator.c
index 6847ab6e7aa50..4664cae50da86 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -192,6 +192,9 @@ void elevator_exit(struct request_queue *q)
{
struct elevator_queue *e = q->elevator;
+ ioc_clear_queue(q);
+ blk_mq_sched_free_rqs(q);
+
mutex_lock(&e->sysfs_lock);
blk_mq_exit_sched(q, e);
mutex_unlock(&e->sysfs_lock);
@@ -595,9 +598,6 @@ int elevator_switch_mq(struct request_queue *q,
if (q->elevator) {
if (q->elevator->registered)
elv_unregister_queue(q);
-
- ioc_clear_queue(q);
- blk_mq_sched_free_rqs(q);
elevator_exit(q);
}
@@ -608,7 +608,6 @@ int elevator_switch_mq(struct request_queue *q,
if (new_e) {
ret = elv_register_queue(q, true);
if (ret) {
- blk_mq_sched_free_rqs(q);
elevator_exit(q);
goto out;
}
diff --git a/block/genhd.c b/block/genhd.c
index 40ef013382872..40edff4331758 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1110,10 +1110,7 @@ static void blk_mq_release_queue(struct request_queue *q)
* cgroup controller.
*/
if (q->elevator) {
- ioc_clear_queue(q);
-
mutex_lock(&q->sysfs_lock);
- blk_mq_sched_free_rqs(q);
elevator_exit(q);
mutex_unlock(&q->sysfs_lock);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 12/12] block: move rq_qos_exit() into disk_release()
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (10 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 11/12] block: do more work in elevator_exit Christoph Hellwig
@ 2022-02-22 14:14 ` Christoph Hellwig
2022-02-26 4:46 ` move more work to disk_release Bart Van Assche
12 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2022-02-22 14:14 UTC (permalink / raw)
To: Jens Axboe; +Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
From: Ming Lei <ming.lei@redhat.com>
There can't be FS IO in disk_release(), so it is safe to move rq_qos_exit()
there.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/genhd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/genhd.c b/block/genhd.c
index 40edff4331758..33d61bc10addc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -627,7 +627,6 @@ void del_gendisk(struct gendisk *disk)
blk_mq_freeze_queue_wait(q);
- rq_qos_exit(q);
blk_sync_queue(q);
blk_flush_integrity();
/*
@@ -1114,7 +1113,7 @@ static void blk_mq_release_queue(struct request_queue *q)
elevator_exit(q);
mutex_unlock(&q->sysfs_lock);
}
-
+ rq_qos_exit(q);
__blk_mq_unfreeze_queue(q, true);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: move more work to disk_release
2022-02-22 14:14 move more work to disk_release Christoph Hellwig
` (11 preceding siblings ...)
2022-02-22 14:14 ` [PATCH 12/12] block: move rq_qos_exit() into disk_release() Christoph Hellwig
@ 2022-02-26 4:46 ` Bart Van Assche
12 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2022-02-26 4:46 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Martin K. Petersen, Ming Lei, linux-block, linux-scsi
On 2/22/22 06:14, Christoph Hellwig wrote:
> Git branch:
>
> git://git.infradead.org/users/hch/block.git freeze-5.18
A patch in or before this patch series may need some additional
work. This is what I see in the kernel log if I verify the above
kernel branch with blktests:
run blktests block/027 at 2022-02-26 03:54:57
[ ... ]
==================================================================
BUG: KASAN: use-after-free in sd_release+0x2a/0x100 [sd_mod]
Read of size 8 at addr ffff888115a0a000 by task fio/7217
CPU: 1 PID: 7217 Comm: fio Not tainted 5.17.0-rc2-dbg+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
sd 9:0:0:1: [sde] Synchronizing SCSI cache
<TASK>
show_stack+0x52/0x58
dump_stack_lvl+0x5b/0x82
print_address_description.constprop.0+0x24/0x160
? sd_release+0x2a/0x100 [sd_mod]
kasan_report.cold+0x82/0xdb
? perf_trace_sched_numa_pair_template+0x340/0x350
? sd_release+0x2a/0x100 [sd_mod]
__asan_load8+0x69/0x90
sd_release+0x2a/0x100 [sd_mod]
blkdev_put+0x15a/0x3b0
blkdev_close+0x3c/0x50
__fput+0x13d/0x430
____fput+0xe/0x10
task_work_run+0x8e/0xe0
do_exit+0x2b6/0x5e0
do_group_exit+0x71/0x150
__x64_sys_exit_group+0x31/0x40
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f8d243d0ed1
Code: Unable to access opcode bytes at RIP 0x7f8d243d0ea7.
RSP: 002b:00007ffe2c7aae48 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f8d243d0ed1
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000013
RBP: 00007f8d1214ae90 R08: ffffffffffffe168 R09: a53fa94fea53fa95
R10: 0000000000000002 R11: 0000000000000206 R12: 00007f8d253d3c30
R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
</TASK>
Allocated by task 5692:
kasan_save_stack+0x26/0x50
__kasan_kmalloc+0x88/0xa0
kmem_cache_alloc_trace+0x1a3/0x2c0
sd_probe+0x9a/0x700 [sd_mod]
really_probe+0x141/0x5d0
__driver_probe_device+0x1aa/0x240
driver_probe_device+0x4e/0x110
__device_attach_driver+0xf6/0x160
bus_for_each_drv+0xfd/0x160
__device_attach_async_helper+0x138/0x190
async_run_entry_fn+0x63/0x240
process_one_work+0x594/0xad0
worker_thread+0x2de/0x6b0
kthread+0x15f/0x190
ret_from_fork+0x1f/0x30
Freed by task 6426:
kasan_save_stack+0x26/0x50
kasan_set_track+0x25/0x30
kasan_set_free_info+0x24/0x40
__kasan_slab_free+0x100/0x140
kfree+0xd1/0x510
scsi_disk_release+0x41/0x50 [sd_mod]
device_release+0x60/0x100
kobject_cleanup+0x7f/0x1c0
kobject_put+0x76/0x90
put_device+0x13/0x20
sd_remove+0x63/0x70 [sd_mod]
__device_release_driver+0x37e/0x390
device_release_driver+0x2b/0x40
bus_remove_device+0x1aa/0x270
device_del+0x2d4/0x640
__scsi_remove_device+0x168/0x1a0
sdev_store_delete+0x75/0xe0
dev_attr_store+0x3e/0x60
sysfs_kf_write+0x87/0xa0
kernfs_fop_write_iter+0x1c7/0x270
new_sync_write+0x296/0x3c0
vfs_write+0x43c/0x580
ksys_write+0xd9/0x180
__x64_sys_write+0x42/0x50
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Last potentially related work creation:
kasan_save_stack+0x26/0x50
__kasan_record_aux_stack+0xa8/0xc0
kasan_record_aux_stack_noalloc+0xb/0x10
insert_work+0x3b/0x170
__queue_work+0x32f/0x7d0
queue_work_on+0x7e/0x90
rpm_idle+0x432/0x460
__pm_runtime_set_status+0x1da/0x520
pm_runtime_remove+0xb3/0xc0
device_pm_remove+0x108/0x190
device_del+0x2dc/0x640
__scsi_remove_device+0x168/0x1a0
sdev_store_delete+0x75/0xe0
dev_attr_store+0x3e/0x60
sysfs_kf_write+0x87/0xa0
kernfs_fop_write_iter+0x1c7/0x270
new_sync_write+0x296/0x3c0
vfs_write+0x43c/0x580
ksys_write+0xd9/0x180
__x64_sys_write+0x42/0x50
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation:
kasan_save_stack+0x26/0x50
__kasan_record_aux_stack+0xa8/0xc0
kasan_record_aux_stack_noalloc+0xb/0x10
insert_work+0x3b/0x170
__queue_work+0x32f/0x7d0
queue_work_on+0x7e/0x90
queue_release_one_tty+0xbf/0xd0
release_tty+0x241/0x290
tty_release_struct+0x92/0xb0
tty_release+0x5b1/0x5f0
__fput+0x13d/0x430
____fput+0xe/0x10
task_work_run+0x8e/0xe0
exit_to_user_mode_loop+0xee/0xf0
exit_to_user_mode_prepare+0xd6/0x100
syscall_exit_to_user_mode+0x1e/0x50
do_syscall_64+0x42/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff888115a0a000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 0 bytes inside of
2048-byte region [ffff888115a0a000, ffff888115a0a800)
The buggy address belongs to the page:
page:00000000fac6ce95 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888115a0f000 pfn:0x115a08
head:00000000fac6ce95 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x2000000000010200(slab|head|node=0|zone=2)
raw: 2000000000010200 ffffea00041d5408 ffffea000407d808 ffff888100042f00
raw: ffff888115a0f000 0000000000080006 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
^ permalink raw reply [flat|nested] 25+ messages in thread
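The report above shows an object freed on the removal path (via
scsi_disk_release()) and then read on the close path (in sd_release()).
Which patch is responsible is not established in this thread. The sketch
below only illustrates the general class of bug, assuming a plain reference
count: struct toy_scsi_disk and the toy_* helpers are hypothetical, and the
'freed' flag stands in for kfree() so that the example never touches freed
memory itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object: 'freed' records that the last reference was dropped. */
struct toy_scsi_disk {
	int refs;
	bool freed;
};

static void toy_disk_get(struct toy_scsi_disk *s)
{
	assert(!s->freed);
	s->refs++;
}

static void toy_disk_put(struct toy_scsi_disk *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->freed = true;
}

/* Would a close-time access to the object still be valid? */
static bool toy_release_is_safe(const struct toy_scsi_disk *s)
{
	return !s->freed;
}
```

If the opener never pins the object, the removal path drops the only
reference and a later close would be a use-after-free. If the opener takes
its own reference first, the object survives removal and is freed by the
final put at close time.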