* [PATCH 01/15] nbd: use the correct block_device in nbd_bdev_reset
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-04-18 12:54 ` Jens Axboe
2022-03-30 5:29 ` [PATCH 02/15] zram: cleanup reset_store Christoph Hellwig
` (14 subsequent siblings)
15 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
The bdev parameter to ->ioctl contains the block device that the ioctl
is called on, which can be the partition. But the openers check in
nbd_bdev_reset really needs to check use the whole device, so switch to
using that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/nbd.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5a1f98494dddf..2f29da41fbc80 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1217,11 +1217,11 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
return -ENOSPC;
}
-static void nbd_bdev_reset(struct block_device *bdev)
+static void nbd_bdev_reset(struct nbd_device *nbd)
{
- if (bdev->bd_openers > 1)
+ if (nbd->disk->part0->bd_openers > 1)
return;
- set_capacity(bdev->bd_disk, 0);
+ set_capacity(nbd->disk, 0);
}
static void nbd_parse_flags(struct nbd_device *nbd)
@@ -1389,7 +1389,7 @@ static int nbd_start_device(struct nbd_device *nbd)
return nbd_set_size(nbd, config->bytesize, nbd_blksize(config));
}
-static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev)
+static int nbd_start_device_ioctl(struct nbd_device *nbd)
{
struct nbd_config *config = nbd->config;
int ret;
@@ -1408,7 +1408,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
flush_workqueue(nbd->recv_workq);
mutex_lock(&nbd->config_lock);
- nbd_bdev_reset(bdev);
+ nbd_bdev_reset(nbd);
/* user requested, ignore socket errors */
if (test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags))
ret = 0;
@@ -1422,7 +1422,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
{
sock_shutdown(nbd);
__invalidate_device(bdev, true);
- nbd_bdev_reset(bdev);
+ nbd_bdev_reset(nbd);
if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
nbd_config_put(nbd);
@@ -1468,7 +1468,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
config->flags = arg;
return 0;
case NBD_DO_IT:
- return nbd_start_device_ioctl(nbd, bdev);
+ return nbd_start_device_ioctl(nbd);
case NBD_CLEAR_QUE:
/*
* This is for compatibility only. The queue is always cleared
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 01/15] nbd: use the correct block_device in nbd_bdev_reset
2022-03-30 5:29 ` [PATCH 01/15] nbd: use the correct block_device in nbd_bdev_reset Christoph Hellwig
@ 2022-04-18 12:54 ` Jens Axboe
0 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2022-04-18 12:54 UTC (permalink / raw)
To: minchan, josef, ngupta, Christoph Hellwig
Cc: mcroce, linux-block, penguin-kernel, nbd, jack, djwong, ming.lei
On Wed, 30 Mar 2022 07:29:03 +0200, Christoph Hellwig wrote:
> The bdev parameter to ->ioctl contains the block device that the ioctl
> is called on, which can be the partition. But the openers check in
> nbd_bdev_reset really needs to check use the whole device, so switch to
> using that.
>
>
Applied, thanks!
[01/15] nbd: use the correct block_device in nbd_bdev_reset
commit: 2a852a693f8839bb877fc731ffbc9ece3a9c16d7
[02/15] zram: cleanup reset_store
commit: d666e20e2e7983d03bbf5e208b8485541ae616a1
[03/15] zram: cleanup zram_remove
commit: 7a86d6dc1493326feb0d3ce5af2f34401dd3defa
[04/15] block: add a disk_openers helper
commit: dbdc1be32591af023db2812706f01e6cd2f42bfc
[05/15] block: turn bdev->bd_openers into an atomic_t
commit: 9acf381f3e8f715175c29f4b6d722f6b6797d139
[06/15] loop: de-duplicate the idle worker freeing code
commit: 2cf429b53c1041a0e90943e1d2a5a7a7f89accb0
[07/15] loop: initialize the worker tracking fields once
commit: b15ed54694fbba714931dd81790f86797cf8bed2
[08/15] loop: remove the racy bd_inode->i_mapping->nrpages asserts
commit: 98ded54a33839e7b8f8bed772e01a544f48e33a7
[09/15] loop: don't freeze the queue in lo_release
commit: 46dc967445bde5300eee7e567a67796de2217586
[10/15] loop: only freeze the queue in __loop_clr_fd when needed
commit: 1fe0b1acb14dd3113b7dc975a118cd7f08af8316
[11/15] loop: implement ->free_disk
commit: d2c7f56f8b5256d57f9e3fc7794c31361d43bdd9
[12/15] loop: suppress uevents while reconfiguring the device
commit: 498ef5c777d9c89693b70cc453b40c392120ea1b
[13/15] loop: avoid loop_validate_mutex/lo_mutex in ->release
commit: 158eaeba4b8edf9940f64daa83cbd1ac7db7593c
[14/15] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
commit: a0e286b6a5b61d4da01bdf865071c4da417046d6
[15/15] loop: don't destroy lo->workqueue in __loop_clr_fd
commit: d292dc80686aeafc418d809b4f9598578a7f294f
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 02/15] zram: cleanup reset_store
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
2022-03-30 5:29 ` [PATCH 01/15] nbd: use the correct block_device in nbd_bdev_reset Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 03/15] zram: cleanup zram_remove Christoph Hellwig
` (13 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Use a local variable for the gendisk instead of the part0 block_device,
as the gendisk is what this function actually operates on.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/zram/zram_drv.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e9474b02012de..fd83fad59beb1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1786,7 +1786,7 @@ static ssize_t reset_store(struct device *dev,
int ret;
unsigned short do_reset;
struct zram *zram;
- struct block_device *bdev;
+ struct gendisk *disk;
ret = kstrtou16(buf, 10, &do_reset);
if (ret)
@@ -1796,26 +1796,26 @@ static ssize_t reset_store(struct device *dev,
return -EINVAL;
zram = dev_to_zram(dev);
- bdev = zram->disk->part0;
+ disk = zram->disk;
- mutex_lock(&bdev->bd_disk->open_mutex);
+ mutex_lock(&disk->open_mutex);
/* Do not reset an active device or claimed device */
- if (bdev->bd_openers || zram->claim) {
- mutex_unlock(&bdev->bd_disk->open_mutex);
+ if (disk->part0->bd_openers || zram->claim) {
+ mutex_unlock(&disk->open_mutex);
return -EBUSY;
}
/* From now on, anyone can't open /dev/zram[0-9] */
zram->claim = true;
- mutex_unlock(&bdev->bd_disk->open_mutex);
+ mutex_unlock(&disk->open_mutex);
/* Make sure all the pending I/O are finished */
- sync_blockdev(bdev);
+ sync_blockdev(disk->part0);
zram_reset_device(zram);
- mutex_lock(&bdev->bd_disk->open_mutex);
+ mutex_lock(&disk->open_mutex);
zram->claim = false;
- mutex_unlock(&bdev->bd_disk->open_mutex);
+ mutex_unlock(&disk->open_mutex);
return len;
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 03/15] zram: cleanup zram_remove
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
2022-03-30 5:29 ` [PATCH 01/15] nbd: use the correct block_device in nbd_bdev_reset Christoph Hellwig
2022-03-30 5:29 ` [PATCH 02/15] zram: cleanup reset_store Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 04/15] block: add a disk_openers helper Christoph Hellwig
` (12 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Remove the bdev variable and just use the gendisk pointed to by the
zram_device directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/zram/zram_drv.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fd83fad59beb1..863606f1722b1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1987,19 +1987,18 @@ static int zram_add(void)
static int zram_remove(struct zram *zram)
{
- struct block_device *bdev = zram->disk->part0;
bool claimed;
- mutex_lock(&bdev->bd_disk->open_mutex);
- if (bdev->bd_openers) {
- mutex_unlock(&bdev->bd_disk->open_mutex);
+ mutex_lock(&zram->disk->open_mutex);
+ if (zram->disk->part0->bd_openers) {
+ mutex_unlock(&zram->disk->open_mutex);
return -EBUSY;
}
claimed = zram->claim;
if (!claimed)
zram->claim = true;
- mutex_unlock(&bdev->bd_disk->open_mutex);
+ mutex_unlock(&zram->disk->open_mutex);
zram_debugfs_unregister(zram);
@@ -2011,7 +2010,7 @@ static int zram_remove(struct zram *zram)
;
} else {
/* Make sure all the pending I/O are finished */
- sync_blockdev(bdev);
+ sync_blockdev(zram->disk->part0);
zram_reset_device(zram);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 04/15] block: add a disk_openers helper
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (2 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 03/15] zram: cleanup zram_remove Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 05/15] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
` (11 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Add a helper that returns the openers for a given gendisk to avoid having
drivers poke into disk->part0 to get at this information in a somewhat
cumbersome way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/nbd.c | 4 ++--
drivers/block/zram/zram_drv.c | 4 ++--
include/linux/blkdev.h | 15 +++++++++++++++
3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 2f29da41fbc80..7473f3d4e1270 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1219,7 +1219,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
static void nbd_bdev_reset(struct nbd_device *nbd)
{
- if (nbd->disk->part0->bd_openers > 1)
+ if (disk_openers(nbd->disk) > 1)
return;
set_capacity(nbd->disk, 0);
}
@@ -1579,7 +1579,7 @@ static void nbd_release(struct gendisk *disk, fmode_t mode)
struct nbd_device *nbd = disk->private_data;
if (test_bit(NBD_RT_DISCONNECT_ON_CLOSE, &nbd->config->runtime_flags) &&
- disk->part0->bd_openers == 0)
+ disk_openers(disk) == 0)
nbd_disconnect_and_put(nbd);
nbd_config_put(nbd);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 863606f1722b1..2362385f782a9 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1800,7 +1800,7 @@ static ssize_t reset_store(struct device *dev,
mutex_lock(&disk->open_mutex);
/* Do not reset an active device or claimed device */
- if (disk->part0->bd_openers || zram->claim) {
+ if (disk_openers(disk) || zram->claim) {
mutex_unlock(&disk->open_mutex);
return -EBUSY;
}
@@ -1990,7 +1990,7 @@ static int zram_remove(struct zram *zram)
bool claimed;
mutex_lock(&zram->disk->open_mutex);
- if (zram->disk->part0->bd_openers) {
+ if (disk_openers(zram->disk)) {
mutex_unlock(&zram->disk->open_mutex);
return -EBUSY;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 60d0161389971..1abd5a56a5c99 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -176,6 +176,21 @@ static inline bool disk_live(struct gendisk *disk)
return !inode_unhashed(disk->part0->bd_inode);
}
+/**
+ * disk_openers - returns how many openers are there for a disk
+ * @disk: disk to check
+ *
+ * This returns the number of openers for a disk. Note that this value is only
+ * stable if disk->open_mutex is held.
+ *
+ * Note: Due to a quirk in the block layer open code, each open partition is
+ * only counted once even if there are multiple openers.
+ */
+static inline unsigned int disk_openers(struct gendisk *disk)
+{
+ return disk->part0->bd_openers;
+}
+
/*
* The gendisk is refcounted by the part0 block_device, and the bd_device
* therein is also used for device model presentation in sysfs.
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 05/15] block: turn bdev->bd_openers into an atomic_t
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (3 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 04/15] block: add a disk_openers helper Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 06/15] loop: de-duplicate the idle worker freeing code Christoph Hellwig
` (10 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
All manipulation of bd_openers is under disk->open_mutex and will remain
so for the foreseeable future. But at least one place reads it without
the lock (blkdev_get) and there are more to be added. So make sure the
compiler does not do turn the increments and decrements into non-atomic
sequences by using an atomic_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
block/bdev.c | 16 ++++++++--------
block/partitions/core.c | 2 +-
include/linux/blk_types.h | 2 +-
include/linux/blkdev.h | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index 13de871fa8169..7bf88e591aaf3 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -673,17 +673,17 @@ static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
}
}
- if (!bdev->bd_openers)
+ if (!atomic_read(&bdev->bd_openers))
set_init_blocksize(bdev);
if (test_bit(GD_NEED_PART_SCAN, &disk->state))
bdev_disk_changed(disk, false);
- bdev->bd_openers++;
+ atomic_inc(&bdev->bd_openers);
return 0;
}
static void blkdev_put_whole(struct block_device *bdev, fmode_t mode)
{
- if (!--bdev->bd_openers)
+ if (atomic_dec_and_test(&bdev->bd_openers))
blkdev_flush_mapping(bdev);
if (bdev->bd_disk->fops->release)
bdev->bd_disk->fops->release(bdev->bd_disk, mode);
@@ -694,7 +694,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
struct gendisk *disk = part->bd_disk;
int ret;
- if (part->bd_openers)
+ if (atomic_read(&part->bd_openers))
goto done;
ret = blkdev_get_whole(bdev_whole(part), mode);
@@ -708,7 +708,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
disk->open_partitions++;
set_init_blocksize(part);
done:
- part->bd_openers++;
+ atomic_inc(&part->bd_openers);
return 0;
out_blkdev_put:
@@ -720,7 +720,7 @@ static void blkdev_put_part(struct block_device *part, fmode_t mode)
{
struct block_device *whole = bdev_whole(part);
- if (--part->bd_openers)
+ if (!atomic_dec_and_test(&part->bd_openers))
return;
blkdev_flush_mapping(part);
whole->bd_disk->open_partitions--;
@@ -899,7 +899,7 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
* of the world and we want to avoid long (could be several minute)
* syncs while holding the mutex.
*/
- if (bdev->bd_openers == 1)
+ if (atomic_read(&bdev->bd_openers) == 1)
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
@@ -1044,7 +1044,7 @@ void sync_bdevs(bool wait)
bdev = I_BDEV(inode);
mutex_lock(&bdev->bd_disk->open_mutex);
- if (!bdev->bd_openers) {
+ if (!atomic_read(&bdev->bd_openers)) {
; /* skip */
} else if (wait) {
/*
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 2ef8dfa1e5c85..373ed748dcf26 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -486,7 +486,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
goto out_unlock;
ret = -EBUSY;
- if (part->bd_openers)
+ if (atomic_read(&part->bd_openers))
goto out_unlock;
delete_partition(part);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dd0763a1c6740..5fdda716620e6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -44,7 +44,7 @@ struct block_device {
unsigned long bd_stamp;
bool bd_read_only; /* read-only policy */
dev_t bd_dev;
- int bd_openers;
+ atomic_t bd_openers;
struct inode * bd_inode; /* will die */
struct super_block * bd_super;
void * bd_claiming;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1abd5a56a5c99..3b11625825d3f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -188,7 +188,7 @@ static inline bool disk_live(struct gendisk *disk)
*/
static inline unsigned int disk_openers(struct gendisk *disk)
{
- return disk->part0->bd_openers;
+ return atomic_read(&disk->part0->bd_openers);
}
/*
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 06/15] loop: de-duplicate the idle worker freeing code
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (4 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 05/15] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 07/15] loop: initialize the worker tracking fields once Christoph Hellwig
` (9 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Use a common helper for both timer based and uncoditional freeing of idle
workers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
drivers/block/loop.c | 73 +++++++++++++++++++++-----------------------
1 file changed, 35 insertions(+), 38 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3e636a75c83a8..762f0a18295d7 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -809,7 +809,6 @@ struct loop_worker {
static void loop_workfn(struct work_struct *work);
static void loop_rootcg_workfn(struct work_struct *work);
-static void loop_free_idle_workers(struct timer_list *timer);
#ifdef CONFIG_BLK_CGROUP
static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
@@ -893,6 +892,39 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
spin_unlock_irq(&lo->lo_work_lock);
}
+static void loop_set_timer(struct loop_device *lo)
+{
+ timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
+}
+
+static void loop_free_idle_workers(struct loop_device *lo, bool delete_all)
+{
+ struct loop_worker *pos, *worker;
+
+ spin_lock_irq(&lo->lo_work_lock);
+ list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
+ idle_list) {
+ if (!delete_all &&
+ time_is_after_jiffies(worker->last_ran_at +
+ LOOP_IDLE_WORKER_TIMEOUT))
+ break;
+ list_del(&worker->idle_list);
+ rb_erase(&worker->rb_node, &lo->worker_tree);
+ css_put(worker->blkcg_css);
+ kfree(worker);
+ }
+ if (!list_empty(&lo->idle_worker_list))
+ loop_set_timer(lo);
+ spin_unlock_irq(&lo->lo_work_lock);
+}
+
+static void loop_free_idle_workers_timer(struct timer_list *timer)
+{
+ struct loop_device *lo = container_of(timer, struct loop_device, timer);
+
+ return loop_free_idle_workers(lo, false);
+}
+
static void loop_update_rotational(struct loop_device *lo)
{
struct file *file = lo->lo_backing_file;
@@ -1027,7 +1059,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
INIT_LIST_HEAD(&lo->rootcg_cmd_list);
INIT_LIST_HEAD(&lo->idle_worker_list);
lo->worker_tree = RB_ROOT;
- timer_setup(&lo->timer, loop_free_idle_workers,
+ timer_setup(&lo->timer, loop_free_idle_workers_timer,
TIMER_DEFERRABLE);
lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
lo->lo_device = bdev;
@@ -1091,7 +1123,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
{
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- struct loop_worker *pos, *worker;
/*
* Flush loop_configure() and loop_change_fd(). It is acceptable for
@@ -1121,15 +1152,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
blk_mq_freeze_queue(lo->lo_queue);
destroy_workqueue(lo->workqueue);
- spin_lock_irq(&lo->lo_work_lock);
- list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
- idle_list) {
- list_del(&worker->idle_list);
- rb_erase(&worker->rb_node, &lo->worker_tree);
- css_put(worker->blkcg_css);
- kfree(worker);
- }
- spin_unlock_irq(&lo->lo_work_lock);
+ loop_free_idle_workers(lo, true);
del_timer_sync(&lo->timer);
spin_lock_irq(&lo->lo_lock);
@@ -1887,11 +1910,6 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
}
}
-static void loop_set_timer(struct loop_device *lo)
-{
- timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
-}
-
static void loop_process_work(struct loop_worker *worker,
struct list_head *cmd_list, struct loop_device *lo)
{
@@ -1940,27 +1958,6 @@ static void loop_rootcg_workfn(struct work_struct *work)
loop_process_work(NULL, &lo->rootcg_cmd_list, lo);
}
-static void loop_free_idle_workers(struct timer_list *timer)
-{
- struct loop_device *lo = container_of(timer, struct loop_device, timer);
- struct loop_worker *pos, *worker;
-
- spin_lock_irq(&lo->lo_work_lock);
- list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
- idle_list) {
- if (time_is_after_jiffies(worker->last_ran_at +
- LOOP_IDLE_WORKER_TIMEOUT))
- break;
- list_del(&worker->idle_list);
- rb_erase(&worker->rb_node, &lo->worker_tree);
- css_put(worker->blkcg_css);
- kfree(worker);
- }
- if (!list_empty(&lo->idle_worker_list))
- loop_set_timer(lo);
- spin_unlock_irq(&lo->lo_work_lock);
-}
-
static const struct blk_mq_ops loop_mq_ops = {
.queue_rq = loop_queue_rq,
.complete = lo_complete_rq,
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 07/15] loop: initialize the worker tracking fields once
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (5 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 06/15] loop: de-duplicate the idle worker freeing code Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 08/15] loop: remove the racy bd_inode->i_mapping->nrpages asserts Christoph Hellwig
` (8 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd, Chaitanya Kulkarni
There is no need to reinitialize idle_worker_list, worker_tree and timer
every time a loop device is configured. Just initialize them once at
allocation time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
drivers/block/loop.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 762f0a18295d7..d1c1086beedce 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1057,10 +1057,6 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
INIT_LIST_HEAD(&lo->rootcg_cmd_list);
- INIT_LIST_HEAD(&lo->idle_worker_list);
- lo->worker_tree = RB_ROOT;
- timer_setup(&lo->timer, loop_free_idle_workers_timer,
- TIMER_DEFERRABLE);
lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
lo->lo_device = bdev;
lo->lo_backing_file = file;
@@ -1973,6 +1969,9 @@ static int loop_add(int i)
lo = kzalloc(sizeof(*lo), GFP_KERNEL);
if (!lo)
goto out;
+ lo->worker_tree = RB_ROOT;
+ INIT_LIST_HEAD(&lo->idle_worker_list);
+ timer_setup(&lo->timer, loop_free_idle_workers_timer, TIMER_DEFERRABLE);
lo->lo_state = Lo_unbound;
err = mutex_lock_killable(&loop_ctl_mutex);
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 08/15] loop: remove the racy bd_inode->i_mapping->nrpages asserts
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (6 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 07/15] loop: initialize the worker tracking fields once Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 09/15] loop: don't freeze the queue in lo_release Christoph Hellwig
` (7 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Nothing prevents a file system or userspace opener of the block device
from redirtying the page right afte sync_blockdev returned. Fortunately
data in the page cache during a block device change is mostly harmless
anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
drivers/block/loop.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d1c1086beedce..25a71fd7b59da 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1276,15 +1276,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
/* I/O need to be drained during transfer transition */
blk_mq_freeze_queue(lo->lo_queue);
- if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) {
- /* If any pages were dirtied after invalidate_bdev(), try again */
- err = -EAGAIN;
- pr_warn("%s: loop%d (%s) still has dirty pages (nrpages=%lu)\n",
- __func__, lo->lo_number, lo->lo_file_name,
- lo->lo_device->bd_inode->i_mapping->nrpages);
- goto out_unfreeze;
- }
-
prev_lo_flags = lo->lo_flags;
err = loop_set_status_from_info(lo, info);
@@ -1495,21 +1486,10 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
invalidate_bdev(lo->lo_device);
blk_mq_freeze_queue(lo->lo_queue);
-
- /* invalidate_bdev should have truncated all the pages */
- if (lo->lo_device->bd_inode->i_mapping->nrpages) {
- err = -EAGAIN;
- pr_warn("%s: loop%d (%s) still has dirty pages (nrpages=%lu)\n",
- __func__, lo->lo_number, lo->lo_file_name,
- lo->lo_device->bd_inode->i_mapping->nrpages);
- goto out_unfreeze;
- }
-
blk_queue_logical_block_size(lo->lo_queue, arg);
blk_queue_physical_block_size(lo->lo_queue, arg);
blk_queue_io_min(lo->lo_queue, arg);
loop_update_dio(lo);
-out_unfreeze:
blk_mq_unfreeze_queue(lo->lo_queue);
return err;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 09/15] loop: don't freeze the queue in lo_release
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (7 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 08/15] loop: remove the racy bd_inode->i_mapping->nrpages asserts Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 10/15] loop: only freeze the queue in __loop_clr_fd when needed Christoph Hellwig
` (6 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
By the time the final ->release is called there can't be outstanding I/O.
For non-final ->release there is no need for driver action at all.
Thus remove the useless queue freeze.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
drivers/block/loop.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 25a71fd7b59da..c57acbcf9be6a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1753,13 +1753,6 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
*/
__loop_clr_fd(lo, true);
return;
- } else if (lo->lo_state == Lo_bound) {
- /*
- * Otherwise keep thread (if running) and config,
- * but flush possible ongoing bios in thread.
- */
- blk_mq_freeze_queue(lo->lo_queue);
- blk_mq_unfreeze_queue(lo->lo_queue);
}
out_unlock:
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 10/15] loop: only freeze the queue in __loop_clr_fd when needed
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (8 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 09/15] loop: don't freeze the queue in lo_release Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 11/15] loop: implement ->free_disk Christoph Hellwig
` (5 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
->release is only called after all outstanding I/O has completed, so only
freeze the queue when clearing the backing file of a live loop device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
drivers/block/loop.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index c57acbcf9be6a..a5dd259958ee2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1144,8 +1144,13 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
- /* freeze request queue during the transition */
- blk_mq_freeze_queue(lo->lo_queue);
+ /*
+ * Freeze the request queue when unbinding on a live file descriptor and
+ * thus an open device. When called from ->release we are guaranteed
+ * that there is no I/O in progress already.
+ */
+ if (!release)
+ blk_mq_freeze_queue(lo->lo_queue);
destroy_workqueue(lo->workqueue);
loop_free_idle_workers(lo, true);
@@ -1170,7 +1175,8 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
mapping_set_gfp_mask(filp->f_mapping, gfp);
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
- blk_mq_unfreeze_queue(lo->lo_queue);
+ if (!release)
+ blk_mq_unfreeze_queue(lo->lo_queue);
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 11/15] loop: implement ->free_disk
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (9 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 10/15] loop: only freeze the queue in __loop_clr_fd when needed Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 12/15] loop: suppress uevents while reconfiguring the device Christoph Hellwig
` (4 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Ensure that the lo_device which is stored in the gendisk private
data is valid until the gendisk is freed. Currently the loop driver
uses a lot of effort to make sure a device is not freed when it is
still in use, but to to fix a potential deadlock this will be relaxed
a bit soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/loop.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a5dd259958ee2..b3170e8cdbe95 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1765,6 +1765,14 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
mutex_unlock(&lo->lo_mutex);
}
+static void lo_free_disk(struct gendisk *disk)
+{
+ struct loop_device *lo = disk->private_data;
+
+ mutex_destroy(&lo->lo_mutex);
+ kfree(lo);
+}
+
static const struct block_device_operations lo_fops = {
.owner = THIS_MODULE,
.open = lo_open,
@@ -1773,6 +1781,7 @@ static const struct block_device_operations lo_fops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = lo_compat_ioctl,
#endif
+ .free_disk = lo_free_disk,
};
/*
@@ -2064,15 +2073,14 @@ static void loop_remove(struct loop_device *lo)
{
/* Make this loop device unreachable from pathname. */
del_gendisk(lo->lo_disk);
- blk_cleanup_disk(lo->lo_disk);
+ blk_cleanup_queue(lo->lo_disk->queue);
blk_mq_free_tag_set(&lo->tag_set);
mutex_lock(&loop_ctl_mutex);
idr_remove(&loop_index_idr, lo->lo_number);
mutex_unlock(&loop_ctl_mutex);
- /* There is no route which can find this loop device. */
- mutex_destroy(&lo->lo_mutex);
- kfree(lo);
+
+ put_disk(lo->lo_disk);
}
static void loop_probe(dev_t dev)
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 12/15] loop: suppress uevents while reconfiguring the device
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (10 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 11/15] loop: implement ->free_disk Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 13/15] loop: avoid loop_validate_mutex/lo_mutex in ->release Christoph Hellwig
` (3 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Currently, udev change event is generated for a loop device before the
device is ready for IO. Due to serialization on lo->lo_mutex in
lo_open() this does not matter because anybody is able to open the
device and do IO only after the configuration is finished. However this
synchronization in lo_open() is going away so make sure userspace
reacting to the change event will see the new device state by generating
the event only when the device is setup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/loop.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b3170e8cdbe95..bfd21af7aa38b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -572,6 +572,10 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
if (!file)
return -EBADF;
+
+ /* suppress uevents while reconfiguring the device */
+ dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
+
is_loop = is_loop_device(file);
error = loop_global_lock_killable(lo, is_loop);
if (error)
@@ -626,13 +630,18 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
fput(old_file);
if (partscan)
loop_reread_partitions(lo);
- return 0;
+
+ error = 0;
+done:
+ /* enable and uncork uevent now that we are done */
+ dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+ return error;
out_err:
loop_global_unlock(lo, is_loop);
out_putf:
fput(file);
- return error;
+ goto done;
}
/* loop sysfs attributes */
@@ -999,6 +1008,9 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
/* This is safe, since we have a reference from open(). */
__module_get(THIS_MODULE);
+ /* suppress uevents while reconfiguring the device */
+ dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
+
/*
* If we don't hold exclusive handle for the device, upgrade to it
* here to avoid changing device under exclusive owner.
@@ -1101,7 +1113,12 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
loop_reread_partitions(lo);
if (!(mode & FMODE_EXCL))
bd_abort_claiming(bdev, loop_configure);
- return 0;
+
+ error = 0;
+done:
+ /* enable and uncork uevent now that we are done */
+ dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+ return error;
out_unlock:
loop_global_unlock(lo, is_loop);
@@ -1112,7 +1129,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
fput(file);
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
- return error;
+ goto done;
}
static void __loop_clr_fd(struct loop_device *lo, bool release)
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 13/15] loop: avoid loop_validate_mutex/lo_mutex in ->release
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (11 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 12/15] loop: suppress uevents while reconfiguring the device Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 14/15] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
` (2 subsequent siblings)
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd, Tetsuo Handa
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Since ->release is called with disk->open_mutex held, and __loop_clr_fd()
from lo_release() is called via ->release when disk_openers() == 0, we are
guaranteed that "struct file" which will be passed to loop_validate_file()
via fget() cannot be the loop device __loop_clr_fd(lo, true) will clear.
Thus, there is no need to hold loop_validate_mutex from __loop_clr_fd()
if release == true.
When I made commit 3ce6e1f662a91097 ("loop: reintroduce global lock for
safe loop_validate_file() traversal"), I wrote "It is acceptable for
loop_validate_file() to succeed, for actual clear operation has not started
yet.". But now I came to feel why it is acceptable to succeed.
It seems that the loop driver was added in Linux 1.3.68, and
if (lo->lo_refcnt > 1)
return -EBUSY;
check in loop_clr_fd() was there from the beginning. The intent of this
check was unclear. But now I think that current
disk_openers(lo->lo_disk) > 1
form is there for three reasons.
(1) Avoid I/O errors when some process which opens and reads from this
loop device in response to uevent notification (e.g. systemd-udevd),
as described in commit a1ecac3b0656a682 ("loop: Make explicit loop
device destruction lazy"). This opener is short-lived because it is
likely that the file descriptor used by that process is closed soon.
(2) Avoid I/O errors caused by underlying layer of stacked loop devices
(i.e. ioctl(some_loop_fd, LOOP_SET_FD, other_loop_fd)) being suddenly
disappeared. This opener is long-lived because this reference is
associated with not a file descriptor but lo->lo_backing_file.
(3) Avoid I/O errors caused by underlying layer of mounted loop device
(i.e. mount(some_loop_device, some_mount_point)) being suddenly
disappeared. This opener is long-lived because this reference is
associated with not a file descriptor but mount.
While race in (1) might be acceptable, (2) and (3) should be checked
racelessly. That is, make sure that __loop_clr_fd() will not run if
loop_validate_file() succeeds, by doing refcount check with global lock
held when explicit loop device destruction is requested.
As a result of no longer waiting for lo->lo_mutex after setting Lo_rundown,
we can remove pointless BUG_ON(lo->lo_state != Lo_rundown) check.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 38 +++++++++++++-------------------------
1 file changed, 13 insertions(+), 25 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bfd21af7aa38b..ea3b0e055657f 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1137,27 +1137,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- /*
- * Flush loop_configure() and loop_change_fd(). It is acceptable for
- * loop_validate_file() to succeed, for actual clear operation has not
- * started yet.
- */
- mutex_lock(&loop_validate_mutex);
- mutex_unlock(&loop_validate_mutex);
- /*
- * loop_validate_file() now fails because l->lo_state != Lo_bound
- * became visible.
- */
-
- /*
- * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
- * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
- * lo->lo_state has changed while waiting for lo->lo_mutex.
- */
- mutex_lock(&lo->lo_mutex);
- BUG_ON(lo->lo_state != Lo_rundown);
- mutex_unlock(&lo->lo_mutex);
-
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
@@ -1244,11 +1223,20 @@ static int loop_clr_fd(struct loop_device *lo)
{
int err;
- err = mutex_lock_killable(&lo->lo_mutex);
+ /*
+ * Since lo_ioctl() is called without locks held, it is possible that
+ * loop_configure()/loop_change_fd() and loop_clr_fd() run in parallel.
+ *
+ * Therefore, use global lock when setting Lo_rundown state in order to
+ * make sure that loop_validate_file() will fail if the "struct file"
+ * which loop_configure()/loop_change_fd() found via fget() was this
+ * loop device.
+ */
+ err = loop_global_lock_killable(lo, true);
if (err)
return err;
if (lo->lo_state != Lo_bound) {
- mutex_unlock(&lo->lo_mutex);
+ loop_global_unlock(lo, true);
return -ENXIO;
}
/*
@@ -1263,11 +1251,11 @@ static int loop_clr_fd(struct loop_device *lo)
*/
if (atomic_read(&lo->lo_refcnt) > 1) {
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
- mutex_unlock(&lo->lo_mutex);
+ loop_global_unlock(lo, true);
return 0;
}
lo->lo_state = Lo_rundown;
- mutex_unlock(&lo->lo_mutex);
+ loop_global_unlock(lo, true);
__loop_clr_fd(lo, false);
return 0;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 14/15] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (12 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 13/15] loop: avoid loop_validate_mutex/lo_mutex in ->release Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-03-30 5:29 ` [PATCH 15/15] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
2022-04-04 7:42 ` yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
lo_refcount counts how many openers a loop device has, but that count
is already provided by the block layer in the bd_openers field of the
whole-disk block_device. Remove lo_refcount and allow opens to
succeed even on devices beeing deleted - now that ->free_disk is
implemented we can handle that race gracefull and all I/O on it will
just fail. Similarly there is a small race window now where
loop_control_remove does not synchronize the delete vs the remove
due do bd_openers not being under lo_mutex protection, but we can
handle that just as gracefully.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
drivers/block/loop.c | 37 +++++++------------------------------
drivers/block/loop.h | 1 -
2 files changed, 7 insertions(+), 31 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ea3b0e055657f..ca7103807d34a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1249,7 +1249,7 @@ static int loop_clr_fd(struct loop_device *lo)
* <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
* command to fail with EBUSY.
*/
- if (atomic_read(&lo->lo_refcnt) > 1) {
+ if (disk_openers(lo->lo_disk) > 1) {
lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
loop_global_unlock(lo, true);
return 0;
@@ -1729,33 +1729,15 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
}
#endif
-static int lo_open(struct block_device *bdev, fmode_t mode)
-{
- struct loop_device *lo = bdev->bd_disk->private_data;
- int err;
-
- err = mutex_lock_killable(&lo->lo_mutex);
- if (err)
- return err;
- if (lo->lo_state == Lo_deleting)
- err = -ENXIO;
- else
- atomic_inc(&lo->lo_refcnt);
- mutex_unlock(&lo->lo_mutex);
- return err;
-}
-
static void lo_release(struct gendisk *disk, fmode_t mode)
{
struct loop_device *lo = disk->private_data;
- mutex_lock(&lo->lo_mutex);
- if (atomic_dec_return(&lo->lo_refcnt))
- goto out_unlock;
+ if (disk_openers(disk) > 0)
+ return;
- if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
- if (lo->lo_state != Lo_bound)
- goto out_unlock;
+ mutex_lock(&lo->lo_mutex);
+ if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
/*
@@ -1765,8 +1747,6 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
__loop_clr_fd(lo, true);
return;
}
-
-out_unlock:
mutex_unlock(&lo->lo_mutex);
}
@@ -1780,7 +1760,6 @@ static void lo_free_disk(struct gendisk *disk)
static const struct block_device_operations lo_fops = {
.owner = THIS_MODULE,
- .open = lo_open,
.release = lo_release,
.ioctl = lo_ioctl,
#ifdef CONFIG_COMPAT
@@ -2034,7 +2013,6 @@ static int loop_add(int i)
*/
if (!part_shift)
disk->flags |= GENHD_FL_NO_PART;
- atomic_set(&lo->lo_refcnt, 0);
mutex_init(&lo->lo_mutex);
lo->lo_number = i;
spin_lock_init(&lo->lo_lock);
@@ -2124,13 +2102,12 @@ static int loop_control_remove(int idx)
ret = mutex_lock_killable(&lo->lo_mutex);
if (ret)
goto mark_visible;
- if (lo->lo_state != Lo_unbound ||
- atomic_read(&lo->lo_refcnt) > 0) {
+ if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) {
mutex_unlock(&lo->lo_mutex);
ret = -EBUSY;
goto mark_visible;
}
- /* Mark this loop device no longer open()-able. */
+ /* Mark this loop device as no more bound, but not quite unbound yet */
lo->lo_state = Lo_deleting;
mutex_unlock(&lo->lo_mutex);
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a6..449d562738c52 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -28,7 +28,6 @@ struct loop_func_table;
struct loop_device {
int lo_number;
- atomic_t lo_refcnt;
loff_t lo_offset;
loff_t lo_sizelimit;
int lo_flags;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 15/15] loop: don't destroy lo->workqueue in __loop_clr_fd
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (13 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 14/15] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
@ 2022-03-30 5:29 ` Christoph Hellwig
2022-04-04 7:42 ` yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
15 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-03-30 5:29 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd, syzbot+6479585dfd4dedd3f7e1
There is no need to destroy the workqueue when clearing unbinding
a loop device from a backing file. Not doing so on the other hand
avoid creating a complex lock dependency chain involving the global
system_transition_mutex.
Based on a patch from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>.
Reported-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
---
drivers/block/loop.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ca7103807d34a..8ff62c15f269b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -817,7 +817,6 @@ struct loop_worker {
};
static void loop_workfn(struct work_struct *work);
-static void loop_rootcg_workfn(struct work_struct *work);
#ifdef CONFIG_BLK_CGROUP
static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
@@ -1055,20 +1054,19 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
!file->f_op->write_iter)
lo->lo_flags |= LO_FLAGS_READ_ONLY;
- lo->workqueue = alloc_workqueue("loop%d",
- WQ_UNBOUND | WQ_FREEZABLE,
- 0,
- lo->lo_number);
if (!lo->workqueue) {
- error = -ENOMEM;
- goto out_unlock;
+ lo->workqueue = alloc_workqueue("loop%d",
+ WQ_UNBOUND | WQ_FREEZABLE,
+ 0, lo->lo_number);
+ if (!lo->workqueue) {
+ error = -ENOMEM;
+ goto out_unlock;
+ }
}
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
- INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
- INIT_LIST_HEAD(&lo->rootcg_cmd_list);
lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
lo->lo_device = bdev;
lo->lo_backing_file = file;
@@ -1148,10 +1146,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (!release)
blk_mq_freeze_queue(lo->lo_queue);
- destroy_workqueue(lo->workqueue);
- loop_free_idle_workers(lo, true);
- del_timer_sync(&lo->timer);
-
spin_lock_irq(&lo->lo_lock);
filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;
@@ -1754,6 +1748,10 @@ static void lo_free_disk(struct gendisk *disk)
{
struct loop_device *lo = disk->private_data;
+ if (lo->workqueue)
+ destroy_workqueue(lo->workqueue);
+ loop_free_idle_workers(lo, true);
+ del_timer_sync(&lo->timer);
mutex_destroy(&lo->lo_mutex);
kfree(lo);
}
@@ -2017,6 +2015,8 @@ static int loop_add(int i)
lo->lo_number = i;
spin_lock_init(&lo->lo_lock);
spin_lock_init(&lo->lo_work_lock);
+ INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
+ INIT_LIST_HEAD(&lo->rootcg_cmd_list);
disk->major = LOOP_MAJOR;
disk->first_minor = i << part_shift;
disk->minors = 1 << part_shift;
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-03-30 5:29 yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
` (14 preceding siblings ...)
2022-03-30 5:29 ` [PATCH 15/15] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
@ 2022-04-04 7:42 ` Christoph Hellwig
2022-04-04 9:39 ` Tetsuo Handa
15 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2022-04-04 7:42 UTC (permalink / raw)
To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
Any more comments? It would be good to settle this saga for 5.18.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-04-04 7:42 ` yet another approach to fix the loop lock order inversions v6 Christoph Hellwig
@ 2022-04-04 9:39 ` Tetsuo Handa
2022-04-05 6:28 ` Christoph Hellwig
2022-04-18 9:39 ` Tetsuo Handa
0 siblings, 2 replies; 23+ messages in thread
From: Tetsuo Handa @ 2022-04-04 9:39 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce, linux-block, nbd
On 2022/04/04 16:42, Christoph Hellwig wrote:
> Any more comments? It would be good to settle this saga for 5.18.
5 hours ago I added this series to my tree so that we can immediately
send to linux.git via linux-block.git#5.18 if nothing wrong happens.
https://osdn.net/projects/tomoyo/scm/git/tomoyo-test1/commits/99499a2b0ff01d8a5c0a06132ab33aaed4433b89
Two bugs which Jan has found in /bin/mount might not be yet fixed in
versions developers/users are using. Thus, let's wait for a while
before committing to linux.git.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-04-04 9:39 ` Tetsuo Handa
@ 2022-04-05 6:28 ` Christoph Hellwig
2022-04-05 6:38 ` Tetsuo Handa
2022-04-05 9:10 ` Jan Kara
2022-04-18 9:39 ` Tetsuo Handa
1 sibling, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2022-04-05 6:28 UTC (permalink / raw)
To: Jan Kara, Tetsuo Handa
Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Jan Kara,
Darrick J . Wong, Ming Lei, Matteo Croce, linux-block, nbd
On Mon, Apr 04, 2022 at 06:39:31PM +0900, Tetsuo Handa wrote:
> Two bugs which Jan has found in /bin/mount might not be yet fixed in
> versions developers/users are using. Thus, let's wait for a while
> before committing to linux.git.
Jan, which loop bugs might be relevant here?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-04-05 6:28 ` Christoph Hellwig
@ 2022-04-05 6:38 ` Tetsuo Handa
2022-04-05 9:10 ` Jan Kara
1 sibling, 0 replies; 23+ messages in thread
From: Tetsuo Handa @ 2022-04-05 6:38 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
Darrick J . Wong, Ming Lei, Matteo Croce, linux-block, nbd
On 2022/04/05 15:28, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 06:39:31PM +0900, Tetsuo Handa wrote:
>> Two bugs which Jan has found in /bin/mount might not be yet fixed in
>> versions developers/users are using. Thus, let's wait for a while
>> before committing to linux.git.
>
> Jan, which loop bugs might be relevant here?
Since async __loop_clr_fd() was reverted, these bugs should
no longer be relevant. But different bugs might be found with
this series. Thus, let's stay careful for a while...
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-04-05 6:28 ` Christoph Hellwig
2022-04-05 6:38 ` Tetsuo Handa
@ 2022-04-05 9:10 ` Jan Kara
1 sibling, 0 replies; 23+ messages in thread
From: Jan Kara @ 2022-04-05 9:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Tetsuo Handa, Jens Axboe, Josef Bacik, Minchan Kim,
Nitin Gupta, Darrick J . Wong, Ming Lei, Matteo Croce,
linux-block, nbd
On Tue 05-04-22 08:28:38, Christoph Hellwig wrote:
> On Mon, Apr 04, 2022 at 06:39:31PM +0900, Tetsuo Handa wrote:
> > Two bugs which Jan has found in /bin/mount might not be yet fixed in
> > versions developers/users are using. Thus, let's wait for a while
> > before committing to linux.git.
>
> Jan, which loop bugs might be relevant here?
So there was a bug in libmount code trying to reuse already setup loop
devices and changes in timing & removing the lo_mutex from lo_open() +
suitable racing with systemd-udev probing devices caused these bugs to
manifest. The visible effect was that a loop like:
while :; do mount -o loop,ro isofs.iso isofs/; umount isofs/; done
often failed with:
mount: /mnt: can't read superblock on /dev/loop0.
umount: /mnt: not mounted.
Bugs are fixed now by commits 3e1fc3bbe ("mount: Fix race in loop device
reuse code"), fb4b6b115 ("loopdev: Properly translate errors from
ul_path_read_*()"), 562990b552 ("loopdev: Do not treat errors when
detecting overlap as fatal") in util-linux git.
The only real-world impact I've heard about was LTP failing due to these
problems and those guys should be capable enough to update their util-linux
when we tell them. So I think we should be OK with pushing the kernel fixes
upstream but it may generate some noise from automated testers before
util-linux is updated on all of them...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yet another approach to fix the loop lock order inversions v6
2022-04-04 9:39 ` Tetsuo Handa
2022-04-05 6:28 ` Christoph Hellwig
@ 2022-04-18 9:39 ` Tetsuo Handa
1 sibling, 0 replies; 23+ messages in thread
From: Tetsuo Handa @ 2022-04-18 9:39 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
Cc: Jan Kara, Darrick J . Wong, Ming Lei, Matteo Croce, linux-block, nbd
On 2022/04/04 18:39, Tetsuo Handa wrote:
> On 2022/04/04 16:42, Christoph Hellwig wrote:
>> Any more comments? It would be good to settle this saga for 5.18.
>
> 5 hours ago I added this series to my tree so that we can immediately
> send to linux.git via linux-block.git#5.18 if nothing wrong happens.
>
> https://osdn.net/projects/tomoyo/scm/git/tomoyo-test1/commits/99499a2b0ff01d8a5c0a06132ab33aaed4433b89
This series was tested for 2 weeks using linux-next.git and got no problem report.
I think we can send this series to 5.18.
^ permalink raw reply [flat|nested] 23+ messages in thread