yet another approach to fix the loop lock order inversions v4

All of lore.kernel.org
 help / color / mirror / Atom feed

* yet another approach to fix the loop lock order inversions v4
@ 2022-03-24  7:51 Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Hi all,

this series uses the approach from Tetsuo to delay the destroy_workueue
call, extended by a delayed teardown of the workers to fix a potential
race window then the workqueue can be still round after finishing the
commands. 

Changes since v3:
 - change bd_openers into a atomic_t, including a bunch of cleanups
   and fix found while adding those

Changes since v2:
 - rebased to the lastest block for-next tree, which has the async
   clear reverted and ->free_disk
 - impkement ->free_disk for loop to handle open vs delete races
   more gracefully
 - get rid of lo_refcnt entirely

Changes since v1:
 - add comments to document the lo_refcnt synchronization
 - fix comment typos

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 12:20   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 02/13] zram: cleanup reset_store Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

The bdev parameter to ->ioctl contains the block device that the ioctl
is called on, which can be the partition.  But the code in nbd_ioctl
that uses it really wants the whole device for things like the bd_openers
check.  Switch to not pass the bdev along and always use nbd->disk->part0
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/nbd.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 5a1f98494dddf..795f65a5c9661 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1217,11 +1217,11 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
 	return -ENOSPC;
 }
 
-static void nbd_bdev_reset(struct block_device *bdev)
+static void nbd_bdev_reset(struct nbd_device *nbd)
 {
-	if (bdev->bd_openers > 1)
+	if (nbd->disk->part0->bd_openers > 1)
 		return;
-	set_capacity(bdev->bd_disk, 0);
+	set_capacity(nbd->disk, 0);
 }
 
 static void nbd_parse_flags(struct nbd_device *nbd)
@@ -1389,7 +1389,7 @@ static int nbd_start_device(struct nbd_device *nbd)
 	return nbd_set_size(nbd, config->bytesize, nbd_blksize(config));
 }
 
-static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev)
+static int nbd_start_device_ioctl(struct nbd_device *nbd)
 {
 	struct nbd_config *config = nbd->config;
 	int ret;
@@ -1408,7 +1408,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 	flush_workqueue(nbd->recv_workq);
 
 	mutex_lock(&nbd->config_lock);
-	nbd_bdev_reset(bdev);
+	nbd_bdev_reset(nbd);
 	/* user requested, ignore socket errors */
 	if (test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags))
 		ret = 0;
@@ -1417,12 +1417,11 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 	return ret;
 }
 
-static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
-				 struct block_device *bdev)
+static void nbd_clear_sock_ioctl(struct nbd_device *nbd)
 {
 	sock_shutdown(nbd);
-	__invalidate_device(bdev, true);
-	nbd_bdev_reset(bdev);
+	__invalidate_device(nbd->disk->part0, true);
+	nbd_bdev_reset(nbd);
 	if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
 			       &nbd->config->runtime_flags))
 		nbd_config_put(nbd);
@@ -1448,7 +1447,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 	case NBD_DISCONNECT:
 		return nbd_disconnect(nbd);
 	case NBD_CLEAR_SOCK:
-		nbd_clear_sock_ioctl(nbd, bdev);
+		nbd_clear_sock_ioctl(nbd);
 		return 0;
 	case NBD_SET_SOCK:
 		return nbd_add_socket(nbd, arg, false);
@@ -1468,7 +1467,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		config->flags = arg;
 		return 0;
 	case NBD_DO_IT:
-		return nbd_start_device_ioctl(nbd, bdev);
+		return nbd_start_device_ioctl(nbd);
 	case NBD_CLEAR_QUE:
 		/*
 		 * This is for compatibility only.  The queue is always cleared
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/13] zram: cleanup reset_store
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 12:21   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 03/13] zram: cleanup zram_remove Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Use a local variable for the gendisk instead of the part0 block_device,
as the gendisk is what this function actually operates on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/zram/zram_drv.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e9474b02012de..fd83fad59beb1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1786,7 +1786,7 @@ static ssize_t reset_store(struct device *dev,
 	int ret;
 	unsigned short do_reset;
 	struct zram *zram;
-	struct block_device *bdev;
+	struct gendisk *disk;
 
 	ret = kstrtou16(buf, 10, &do_reset);
 	if (ret)
@@ -1796,26 +1796,26 @@ static ssize_t reset_store(struct device *dev,
 		return -EINVAL;
 
 	zram = dev_to_zram(dev);
-	bdev = zram->disk->part0;
+	disk = zram->disk;
 
-	mutex_lock(&bdev->bd_disk->open_mutex);
+	mutex_lock(&disk->open_mutex);
 	/* Do not reset an active device or claimed device */
-	if (bdev->bd_openers || zram->claim) {
-		mutex_unlock(&bdev->bd_disk->open_mutex);
+	if (disk->part0->bd_openers || zram->claim) {
+		mutex_unlock(&disk->open_mutex);
 		return -EBUSY;
 	}
 
 	/* From now on, anyone can't open /dev/zram[0-9] */
 	zram->claim = true;
-	mutex_unlock(&bdev->bd_disk->open_mutex);
+	mutex_unlock(&disk->open_mutex);
 
 	/* Make sure all the pending I/O are finished */
-	sync_blockdev(bdev);
+	sync_blockdev(disk->part0);
 	zram_reset_device(zram);
 
-	mutex_lock(&bdev->bd_disk->open_mutex);
+	mutex_lock(&disk->open_mutex);
 	zram->claim = false;
-	mutex_unlock(&bdev->bd_disk->open_mutex);
+	mutex_unlock(&disk->open_mutex);
 
 	return len;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/13] zram: cleanup zram_remove
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 02/13] zram: cleanup reset_store Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 13:15   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 04/13] block: add a disk_openers helper Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Remove the bdev variable and just use the gendisk pointed to by the
zram_device directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/zram/zram_drv.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fd83fad59beb1..863606f1722b1 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1987,19 +1987,18 @@ static int zram_add(void)
 
 static int zram_remove(struct zram *zram)
 {
-	struct block_device *bdev = zram->disk->part0;
 	bool claimed;
 
-	mutex_lock(&bdev->bd_disk->open_mutex);
-	if (bdev->bd_openers) {
-		mutex_unlock(&bdev->bd_disk->open_mutex);
+	mutex_lock(&zram->disk->open_mutex);
+	if (zram->disk->part0->bd_openers) {
+		mutex_unlock(&zram->disk->open_mutex);
 		return -EBUSY;
 	}
 
 	claimed = zram->claim;
 	if (!claimed)
 		zram->claim = true;
-	mutex_unlock(&bdev->bd_disk->open_mutex);
+	mutex_unlock(&zram->disk->open_mutex);
 
 	zram_debugfs_unregister(zram);
 
@@ -2011,7 +2010,7 @@ static int zram_remove(struct zram *zram)
 		;
 	} else {
 		/* Make sure all the pending I/O are finished */
-		sync_blockdev(bdev);
+		sync_blockdev(zram->disk->part0);
 		zram_reset_device(zram);
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/13] block: add a disk_openers helper
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 03/13] zram: cleanup zram_remove Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 13:24   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Add a helper that returns the openers for a given gendisk to avoid having
drivers poke into disk->part0 to get at this information in a somewhat
cumbersome way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/nbd.c           |  4 ++--
 drivers/block/zram/zram_drv.c |  4 ++--
 include/linux/blkdev.h        | 15 +++++++++++++++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 795f65a5c9661..93af7587d5ed6 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1219,7 +1219,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
 
 static void nbd_bdev_reset(struct nbd_device *nbd)
 {
-	if (nbd->disk->part0->bd_openers > 1)
+	if (disk_openers(nbd->disk) > 1)
 		return;
 	set_capacity(nbd->disk, 0);
 }
@@ -1578,7 +1578,7 @@ static void nbd_release(struct gendisk *disk, fmode_t mode)
 	struct nbd_device *nbd = disk->private_data;
 
 	if (test_bit(NBD_RT_DISCONNECT_ON_CLOSE, &nbd->config->runtime_flags) &&
-			disk->part0->bd_openers == 0)
+			disk_openers(disk) == 0)
 		nbd_disconnect_and_put(nbd);
 
 	nbd_config_put(nbd);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 863606f1722b1..2362385f782a9 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1800,7 +1800,7 @@ static ssize_t reset_store(struct device *dev,
 
 	mutex_lock(&disk->open_mutex);
 	/* Do not reset an active device or claimed device */
-	if (disk->part0->bd_openers || zram->claim) {
+	if (disk_openers(disk) || zram->claim) {
 		mutex_unlock(&disk->open_mutex);
 		return -EBUSY;
 	}
@@ -1990,7 +1990,7 @@ static int zram_remove(struct zram *zram)
 	bool claimed;
 
 	mutex_lock(&zram->disk->open_mutex);
-	if (zram->disk->part0->bd_openers) {
+	if (disk_openers(zram->disk)) {
 		mutex_unlock(&zram->disk->open_mutex);
 		return -EBUSY;
 	}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eb27312a1b8f3..9824ebc9b4d31 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -176,6 +176,21 @@ static inline bool disk_live(struct gendisk *disk)
 	return !inode_unhashed(disk->part0->bd_inode);
 }
 
+/**
+ * disk_openers - returns how many openers are there for a disk
+ * @disk: disk to check
+ *
+ * This returns the number of openers for a disk.  Note that this value is only
+ * stable if disk->open_mutex is held.
+ *
+ * Note: Due to a quirk in the block layer open code, each open partition is
+ * only counted once even if there are multiple openers.
+ */
+static inline unsigned int disk_openers(struct gendisk *disk)
+{
+	return disk->part0->bd_openers;
+}
+
 /*
  * The gendisk is refcounted by the part0 block_device, and the bd_device
  * therein is also used for device model presentation in sysfs.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 04/13] block: add a disk_openers helper Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 13:31   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 06/13] loop: de-duplicate the idle worker freeing code Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

All manipulation of bd_openers is under disk->open_mutex and will remain
so for the foreseeable future.  But at least one place reads it without
the lock (blkdev_get) and there are more to be added.  So make sure the
compiler does not do turn the increments and decrements into non-atomic
sequences by using an atomic_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bdev.c              | 16 ++++++++--------
 block/partitions/core.c   |  2 +-
 include/linux/blk_types.h |  2 +-
 include/linux/blkdev.h    |  2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 13de871fa8169..7bf88e591aaf3 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -673,17 +673,17 @@ static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
 		}
 	}
 
-	if (!bdev->bd_openers)
+	if (!atomic_read(&bdev->bd_openers))
 		set_init_blocksize(bdev);
 	if (test_bit(GD_NEED_PART_SCAN, &disk->state))
 		bdev_disk_changed(disk, false);
-	bdev->bd_openers++;
+	atomic_inc(&bdev->bd_openers);
 	return 0;
 }
 
 static void blkdev_put_whole(struct block_device *bdev, fmode_t mode)
 {
-	if (!--bdev->bd_openers)
+	if (atomic_dec_and_test(&bdev->bd_openers))
 		blkdev_flush_mapping(bdev);
 	if (bdev->bd_disk->fops->release)
 		bdev->bd_disk->fops->release(bdev->bd_disk, mode);
@@ -694,7 +694,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
 	struct gendisk *disk = part->bd_disk;
 	int ret;
 
-	if (part->bd_openers)
+	if (atomic_read(&part->bd_openers))
 		goto done;
 
 	ret = blkdev_get_whole(bdev_whole(part), mode);
@@ -708,7 +708,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
 	disk->open_partitions++;
 	set_init_blocksize(part);
 done:
-	part->bd_openers++;
+	atomic_inc(&part->bd_openers);
 	return 0;
 
 out_blkdev_put:
@@ -720,7 +720,7 @@ static void blkdev_put_part(struct block_device *part, fmode_t mode)
 {
 	struct block_device *whole = bdev_whole(part);
 
-	if (--part->bd_openers)
+	if (!atomic_dec_and_test(&part->bd_openers))
 		return;
 	blkdev_flush_mapping(part);
 	whole->bd_disk->open_partitions--;
@@ -899,7 +899,7 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
 	 * of the world and we want to avoid long (could be several minute)
 	 * syncs while holding the mutex.
 	 */
-	if (bdev->bd_openers == 1)
+	if (atomic_read(&bdev->bd_openers) == 1)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
@@ -1044,7 +1044,7 @@ void sync_bdevs(bool wait)
 		bdev = I_BDEV(inode);
 
 		mutex_lock(&bdev->bd_disk->open_mutex);
-		if (!bdev->bd_openers) {
+		if (!atomic_read(&bdev->bd_openers)) {
 			; /* skip */
 		} else if (wait) {
 			/*
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 2ef8dfa1e5c85..373ed748dcf26 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -486,7 +486,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
 		goto out_unlock;
 
 	ret = -EBUSY;
-	if (part->bd_openers)
+	if (atomic_read(&part->bd_openers))
 		goto out_unlock;
 
 	delete_partition(part);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 0c3563b45fe90..b1ced43ed0d3f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -44,7 +44,7 @@ struct block_device {
 	unsigned long		bd_stamp;
 	bool			bd_read_only;	/* read-only policy */
 	dev_t			bd_dev;
-	int			bd_openers;
+	atomic_t		bd_openers;
 	struct inode *		bd_inode;	/* will die */
 	struct super_block *	bd_super;
 	void *			bd_claiming;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9824ebc9b4d31..6b7c5af1d01df 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -188,7 +188,7 @@ static inline bool disk_live(struct gendisk *disk)
  */
 static inline unsigned int disk_openers(struct gendisk *disk)
 {
-	return disk->part0->bd_openers;
+	return atomic_read(&disk->part0->bd_openers);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/13] loop: de-duplicate the idle worker freeing code
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 07/13] loop: initialize the worker tracking fields once Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Use a common helper for both timer based and uncoditional freeing of idle
workers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 drivers/block/loop.c | 73 +++++++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 38 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3e636a75c83a8..762f0a18295d7 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -809,7 +809,6 @@ struct loop_worker {
 
 static void loop_workfn(struct work_struct *work);
 static void loop_rootcg_workfn(struct work_struct *work);
-static void loop_free_idle_workers(struct timer_list *timer);
 
 #ifdef CONFIG_BLK_CGROUP
 static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
@@ -893,6 +892,39 @@ static void loop_queue_work(struct loop_device *lo, struct loop_cmd *cmd)
 	spin_unlock_irq(&lo->lo_work_lock);
 }
 
+static void loop_set_timer(struct loop_device *lo)
+{
+	timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
+}
+
+static void loop_free_idle_workers(struct loop_device *lo, bool delete_all)
+{
+	struct loop_worker *pos, *worker;
+
+	spin_lock_irq(&lo->lo_work_lock);
+	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
+				idle_list) {
+		if (!delete_all &&
+		    time_is_after_jiffies(worker->last_ran_at +
+					  LOOP_IDLE_WORKER_TIMEOUT))
+			break;
+		list_del(&worker->idle_list);
+		rb_erase(&worker->rb_node, &lo->worker_tree);
+		css_put(worker->blkcg_css);
+		kfree(worker);
+	}
+	if (!list_empty(&lo->idle_worker_list))
+		loop_set_timer(lo);
+	spin_unlock_irq(&lo->lo_work_lock);
+}
+
+static void loop_free_idle_workers_timer(struct timer_list *timer)
+{
+	struct loop_device *lo = container_of(timer, struct loop_device, timer);
+
+	return loop_free_idle_workers(lo, false);
+}
+
 static void loop_update_rotational(struct loop_device *lo)
 {
 	struct file *file = lo->lo_backing_file;
@@ -1027,7 +1059,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
 	INIT_LIST_HEAD(&lo->idle_worker_list);
 	lo->worker_tree = RB_ROOT;
-	timer_setup(&lo->timer, loop_free_idle_workers,
+	timer_setup(&lo->timer, loop_free_idle_workers_timer,
 		TIMER_DEFERRABLE);
 	lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
 	lo->lo_device = bdev;
@@ -1091,7 +1123,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 {
 	struct file *filp;
 	gfp_t gfp = lo->old_gfp_mask;
-	struct loop_worker *pos, *worker;
 
 	/*
 	 * Flush loop_configure() and loop_change_fd(). It is acceptable for
@@ -1121,15 +1152,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	blk_mq_freeze_queue(lo->lo_queue);
 
 	destroy_workqueue(lo->workqueue);
-	spin_lock_irq(&lo->lo_work_lock);
-	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
-				idle_list) {
-		list_del(&worker->idle_list);
-		rb_erase(&worker->rb_node, &lo->worker_tree);
-		css_put(worker->blkcg_css);
-		kfree(worker);
-	}
-	spin_unlock_irq(&lo->lo_work_lock);
+	loop_free_idle_workers(lo, true);
 	del_timer_sync(&lo->timer);
 
 	spin_lock_irq(&lo->lo_lock);
@@ -1887,11 +1910,6 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
 	}
 }
 
-static void loop_set_timer(struct loop_device *lo)
-{
-	timer_reduce(&lo->timer, jiffies + LOOP_IDLE_WORKER_TIMEOUT);
-}
-
 static void loop_process_work(struct loop_worker *worker,
 			struct list_head *cmd_list, struct loop_device *lo)
 {
@@ -1940,27 +1958,6 @@ static void loop_rootcg_workfn(struct work_struct *work)
 	loop_process_work(NULL, &lo->rootcg_cmd_list, lo);
 }
 
-static void loop_free_idle_workers(struct timer_list *timer)
-{
-	struct loop_device *lo = container_of(timer, struct loop_device, timer);
-	struct loop_worker *pos, *worker;
-
-	spin_lock_irq(&lo->lo_work_lock);
-	list_for_each_entry_safe(worker, pos, &lo->idle_worker_list,
-				idle_list) {
-		if (time_is_after_jiffies(worker->last_ran_at +
-						LOOP_IDLE_WORKER_TIMEOUT))
-			break;
-		list_del(&worker->idle_list);
-		rb_erase(&worker->rb_node, &lo->worker_tree);
-		css_put(worker->blkcg_css);
-		kfree(worker);
-	}
-	if (!list_empty(&lo->idle_worker_list))
-		loop_set_timer(lo);
-	spin_unlock_irq(&lo->lo_work_lock);
-}
-
 static const struct blk_mq_ops loop_mq_ops = {
 	.queue_rq       = loop_queue_rq,
 	.complete	= lo_complete_rq,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/13] loop: initialize the worker tracking fields once
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 06/13] loop: de-duplicate the idle worker freeing code Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 08/13] loop: remove the racy bd_inode->i_mapping->nrpages asserts Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block,
	nbd, Chaitanya Kulkarni

There is no need to reinitialize idle_worker_list, worker_tree and timer
every time a loop device is configured.  Just initialize them once at
allocation time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 drivers/block/loop.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 762f0a18295d7..d1c1086beedce 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1057,10 +1057,6 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 
 	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
 	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
-	INIT_LIST_HEAD(&lo->idle_worker_list);
-	lo->worker_tree = RB_ROOT;
-	timer_setup(&lo->timer, loop_free_idle_workers_timer,
-		TIMER_DEFERRABLE);
 	lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
 	lo->lo_device = bdev;
 	lo->lo_backing_file = file;
@@ -1973,6 +1969,9 @@ static int loop_add(int i)
 	lo = kzalloc(sizeof(*lo), GFP_KERNEL);
 	if (!lo)
 		goto out;
+	lo->worker_tree = RB_ROOT;
+	INIT_LIST_HEAD(&lo->idle_worker_list);
+	timer_setup(&lo->timer, loop_free_idle_workers_timer, TIMER_DEFERRABLE);
 	lo->lo_state = Lo_unbound;
 
 	err = mutex_lock_killable(&loop_ctl_mutex);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/13] loop: remove the racy bd_inode->i_mapping->nrpages asserts
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 07/13] loop: initialize the worker tracking fields once Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 09/13] loop: don't freeze the queue in lo_release Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Nothing prevents a file system or userspace opener of the block device
from redirtying the page right afte sync_blockdev returned.  Fortunately
data in the page cache during a block device change is mostly harmless
anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 drivers/block/loop.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d1c1086beedce..25a71fd7b59da 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1276,15 +1276,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	/* I/O need to be drained during transfer transition */
 	blk_mq_freeze_queue(lo->lo_queue);
 
-	if (size_changed && lo->lo_device->bd_inode->i_mapping->nrpages) {
-		/* If any pages were dirtied after invalidate_bdev(), try again */
-		err = -EAGAIN;
-		pr_warn("%s: loop%d (%s) still has dirty pages (nrpages=%lu)\n",
-			__func__, lo->lo_number, lo->lo_file_name,
-			lo->lo_device->bd_inode->i_mapping->nrpages);
-		goto out_unfreeze;
-	}
-
 	prev_lo_flags = lo->lo_flags;
 
 	err = loop_set_status_from_info(lo, info);
@@ -1495,21 +1486,10 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
 	invalidate_bdev(lo->lo_device);
 
 	blk_mq_freeze_queue(lo->lo_queue);
-
-	/* invalidate_bdev should have truncated all the pages */
-	if (lo->lo_device->bd_inode->i_mapping->nrpages) {
-		err = -EAGAIN;
-		pr_warn("%s: loop%d (%s) still has dirty pages (nrpages=%lu)\n",
-			__func__, lo->lo_number, lo->lo_file_name,
-			lo->lo_device->bd_inode->i_mapping->nrpages);
-		goto out_unfreeze;
-	}
-
 	blk_queue_logical_block_size(lo->lo_queue, arg);
 	blk_queue_physical_block_size(lo->lo_queue, arg);
 	blk_queue_io_min(lo->lo_queue, arg);
 	loop_update_dio(lo);
-out_unfreeze:
 	blk_mq_unfreeze_queue(lo->lo_queue);
 
 	return err;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/13] loop: don't freeze the queue in lo_release
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 08/13] loop: remove the racy bd_inode->i_mapping->nrpages asserts Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 10/13] loop: only freeze the queue in __loop_clr_fd when needed Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

By the time the final ->release is called there can't be outstanding I/O.
For non-final ->release there is no need for driver action at all.

Thus remove the useless queue freeze.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 drivers/block/loop.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 25a71fd7b59da..c57acbcf9be6a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1753,13 +1753,6 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
 		 */
 		__loop_clr_fd(lo, true);
 		return;
-	} else if (lo->lo_state == Lo_bound) {
-		/*
-		 * Otherwise keep thread (if running) and config,
-		 * but flush possible ongoing bios in thread.
-		 */
-		blk_mq_freeze_queue(lo->lo_queue);
-		blk_mq_unfreeze_queue(lo->lo_queue);
 	}
 
 out_unlock:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/13] loop: only freeze the queue in __loop_clr_fd when needed
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 09/13] loop: don't freeze the queue in lo_release Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 11/13] loop: implement ->free_disk Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

->release is only called after all outstanding I/O has completed, so only
freeze the queue when clearing the backing file of a live loop device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 drivers/block/loop.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index c57acbcf9be6a..a5dd259958ee2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1144,8 +1144,13 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
 		blk_queue_write_cache(lo->lo_queue, false, false);
 
-	/* freeze request queue during the transition */
-	blk_mq_freeze_queue(lo->lo_queue);
+	/*
+	 * Freeze the request queue when unbinding on a live file descriptor and
+	 * thus an open device.  When called from ->release we are guaranteed
+	 * that there is no I/O in progress already.
+	 */
+	if (!release)
+		blk_mq_freeze_queue(lo->lo_queue);
 
 	destroy_workqueue(lo->workqueue);
 	loop_free_idle_workers(lo, true);
@@ -1170,7 +1175,8 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	mapping_set_gfp_mask(filp->f_mapping, gfp);
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
-	blk_mq_unfreeze_queue(lo->lo_queue);
+	if (!release)
+		blk_mq_unfreeze_queue(lo->lo_queue);
 
 	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/13] loop: implement ->free_disk
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 10/13] loop: only freeze the queue in __loop_clr_fd when needed Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
  2022-03-24  7:51 ` [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
  12 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

Ensure that the lo_device which is stored in the gendisk private
data is valid until the gendisk is freed.  Currently the loop driver
uses a lot of effort to make sure a device is not freed when it is
still in use, but to to fix a potential deadlock this will be relaxed
a bit soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a5dd259958ee2..b3170e8cdbe95 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1765,6 +1765,14 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
 	mutex_unlock(&lo->lo_mutex);
 }
 
+static void lo_free_disk(struct gendisk *disk)
+{
+	struct loop_device *lo = disk->private_data;
+
+	mutex_destroy(&lo->lo_mutex);
+	kfree(lo);
+}
+
 static const struct block_device_operations lo_fops = {
 	.owner =	THIS_MODULE,
 	.open =		lo_open,
@@ -1773,6 +1781,7 @@ static const struct block_device_operations lo_fops = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl =	lo_compat_ioctl,
 #endif
+	.free_disk =	lo_free_disk,
 };
 
 /*
@@ -2064,15 +2073,14 @@ static void loop_remove(struct loop_device *lo)
 {
 	/* Make this loop device unreachable from pathname. */
 	del_gendisk(lo->lo_disk);
-	blk_cleanup_disk(lo->lo_disk);
+	blk_cleanup_queue(lo->lo_disk->queue);
 	blk_mq_free_tag_set(&lo->tag_set);
 
 	mutex_lock(&loop_ctl_mutex);
 	idr_remove(&loop_index_idr, lo->lo_number);
 	mutex_unlock(&loop_ctl_mutex);
-	/* There is no route which can find this loop device. */
-	mutex_destroy(&lo->lo_mutex);
-	kfree(lo);
+
+	put_disk(lo->lo_disk);
 }
 
 static void loop_probe(dev_t dev)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 11/13] loop: implement ->free_disk Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 14:13   ` Jan Kara
  2022-03-24  7:51 ` [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

lo_refcount counts how many openers a loop device has, but that count
is already provided by the block layer in the bd_openers field of the
whole-disk block_device.  Remove lo_refcount and allow opens to
succeed even on devices beeing deleted - now that ->free_disk is
implemented we can handle that race gracefull and all I/O on it will
just fail. Similarly there is a small race window now where
loop_control_remove does not synchronize the delete vs the remove
due do bd_openers not being under lo_mutex protection, but we can
handle that just as gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 37 +++++++------------------------------
 drivers/block/loop.h |  1 -
 2 files changed, 7 insertions(+), 31 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b3170e8cdbe95..e1eb925d3f855 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
 	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
 	 * command to fail with EBUSY.
 	 */
-	if (atomic_read(&lo->lo_refcnt) > 1) {
+	if (disk_openers(lo->lo_disk) > 1) {
 		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
 		mutex_unlock(&lo->lo_mutex);
 		return 0;
@@ -1724,33 +1724,15 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
 }
 #endif
 
-static int lo_open(struct block_device *bdev, fmode_t mode)
-{
-	struct loop_device *lo = bdev->bd_disk->private_data;
-	int err;
-
-	err = mutex_lock_killable(&lo->lo_mutex);
-	if (err)
-		return err;
-	if (lo->lo_state == Lo_deleting)
-		err = -ENXIO;
-	else
-		atomic_inc(&lo->lo_refcnt);
-	mutex_unlock(&lo->lo_mutex);
-	return err;
-}
-
 static void lo_release(struct gendisk *disk, fmode_t mode)
 {
 	struct loop_device *lo = disk->private_data;
 
-	mutex_lock(&lo->lo_mutex);
-	if (atomic_dec_return(&lo->lo_refcnt))
-		goto out_unlock;
+	if (disk_openers(disk) > 0)
+		return;
 
-	if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
-		if (lo->lo_state != Lo_bound)
-			goto out_unlock;
+	mutex_lock(&lo->lo_mutex);
+	if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
 		lo->lo_state = Lo_rundown;
 		mutex_unlock(&lo->lo_mutex);
 		/*
@@ -1760,8 +1742,6 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
 		__loop_clr_fd(lo, true);
 		return;
 	}
-
-out_unlock:
 	mutex_unlock(&lo->lo_mutex);
 }
 
@@ -1775,7 +1755,6 @@ static void lo_free_disk(struct gendisk *disk)
 
 static const struct block_device_operations lo_fops = {
 	.owner =	THIS_MODULE,
-	.open =		lo_open,
 	.release =	lo_release,
 	.ioctl =	lo_ioctl,
 #ifdef CONFIG_COMPAT
@@ -2029,7 +2008,6 @@ static int loop_add(int i)
 	 */
 	if (!part_shift)
 		disk->flags |= GENHD_FL_NO_PART;
-	atomic_set(&lo->lo_refcnt, 0);
 	mutex_init(&lo->lo_mutex);
 	lo->lo_number		= i;
 	spin_lock_init(&lo->lo_lock);
@@ -2119,13 +2097,12 @@ static int loop_control_remove(int idx)
 	ret = mutex_lock_killable(&lo->lo_mutex);
 	if (ret)
 		goto mark_visible;
-	if (lo->lo_state != Lo_unbound ||
-	    atomic_read(&lo->lo_refcnt) > 0) {
+	if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) {
 		mutex_unlock(&lo->lo_mutex);
 		ret = -EBUSY;
 		goto mark_visible;
 	}
-	/* Mark this loop device no longer open()-able. */
+	/* Mark this loop device as no more bound, but not quite unbound yet */
 	lo->lo_state = Lo_deleting;
 	mutex_unlock(&lo->lo_mutex);
 
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a6..449d562738c52 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -28,7 +28,6 @@ struct loop_func_table;
 
 struct loop_device {
 	int		lo_number;
-	atomic_t	lo_refcnt;
 	loff_t		lo_offset;
 	loff_t		lo_sizelimit;
 	int		lo_flags;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd
  2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2022-03-24  7:51 ` [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
@ 2022-03-24  7:51 ` Christoph Hellwig
  2022-03-24 14:14   ` Jan Kara
  12 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24  7:51 UTC (permalink / raw)
  To: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta
  Cc: Tetsuo Handa, Jan Kara, Darrick J . Wong, Ming Lei, linux-block,
	nbd, syzbot+6479585dfd4dedd3f7e1

There is no need to destroy the workqueue when clearing unbinding
a loop device from a backing file.  Not doing so on the other hand
avoid creating a complex lock dependency chain involving the global
system_transition_mutex.

Based on a patch from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>.

Reported-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index e1eb925d3f855..84613eb2fdd57 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -808,7 +808,6 @@ struct loop_worker {
 };
 
 static void loop_workfn(struct work_struct *work);
-static void loop_rootcg_workfn(struct work_struct *work);
 
 #ifdef CONFIG_BLK_CGROUP
 static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
@@ -1043,20 +1042,19 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	    !file->f_op->write_iter)
 		lo->lo_flags |= LO_FLAGS_READ_ONLY;
 
-	lo->workqueue = alloc_workqueue("loop%d",
-					WQ_UNBOUND | WQ_FREEZABLE,
-					0,
-					lo->lo_number);
 	if (!lo->workqueue) {
-		error = -ENOMEM;
-		goto out_unlock;
+		lo->workqueue = alloc_workqueue("loop%d",
+						WQ_UNBOUND | WQ_FREEZABLE,
+						0, lo->lo_number);
+		if (!lo->workqueue) {
+			error = -ENOMEM;
+			goto out_unlock;
+		}
 	}
 
 	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 	set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
-	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
-	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
 	lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
 	lo->lo_device = bdev;
 	lo->lo_backing_file = file;
@@ -1152,10 +1150,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	if (!release)
 		blk_mq_freeze_queue(lo->lo_queue);
 
-	destroy_workqueue(lo->workqueue);
-	loop_free_idle_workers(lo, true);
-	del_timer_sync(&lo->timer);
-
 	spin_lock_irq(&lo->lo_lock);
 	filp = lo->lo_backing_file;
 	lo->lo_backing_file = NULL;
@@ -1749,6 +1743,10 @@ static void lo_free_disk(struct gendisk *disk)
 {
 	struct loop_device *lo = disk->private_data;
 
+	if (lo->workqueue)
+		destroy_workqueue(lo->workqueue);
+	loop_free_idle_workers(lo, true);
+	del_timer_sync(&lo->timer);
 	mutex_destroy(&lo->lo_mutex);
 	kfree(lo);
 }
@@ -2012,6 +2010,8 @@ static int loop_add(int i)
 	lo->lo_number		= i;
 	spin_lock_init(&lo->lo_lock);
 	spin_lock_init(&lo->lo_work_lock);
+	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
+	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
 	disk->major		= LOOP_MAJOR;
 	disk->first_minor	= i << part_shift;
 	disk->minors		= 1 << part_shift;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl
  2022-03-24  7:51 ` [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl Christoph Hellwig
@ 2022-03-24 12:20   ` Jan Kara
  2022-03-24 13:23     ` Jan Kara
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Kara @ 2022-03-24 12:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 08:51:07, Christoph Hellwig wrote:
> The bdev parameter to ->ioctl contains the block device that the ioctl
> is called on, which can be the partition.  But the code in nbd_ioctl
> that uses it really wants the whole device for things like the bd_openers
> check.  Switch to not pass the bdev along and always use nbd->disk->part0
> instead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/nbd.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 5a1f98494dddf..795f65a5c9661 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1217,11 +1217,11 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
>  	return -ENOSPC;
>  }
>  
> -static void nbd_bdev_reset(struct block_device *bdev)
> +static void nbd_bdev_reset(struct nbd_device *nbd)
>  {
> -	if (bdev->bd_openers > 1)
> +	if (nbd->disk->part0->bd_openers > 1)
>  		return;
> -	set_capacity(bdev->bd_disk, 0);
> +	set_capacity(nbd->disk, 0);
>  }
>  
>  static void nbd_parse_flags(struct nbd_device *nbd)
> @@ -1389,7 +1389,7 @@ static int nbd_start_device(struct nbd_device *nbd)
>  	return nbd_set_size(nbd, config->bytesize, nbd_blksize(config));
>  }
>  
> -static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *bdev)
> +static int nbd_start_device_ioctl(struct nbd_device *nbd)
>  {
>  	struct nbd_config *config = nbd->config;
>  	int ret;
> @@ -1408,7 +1408,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>  	flush_workqueue(nbd->recv_workq);
>  
>  	mutex_lock(&nbd->config_lock);
> -	nbd_bdev_reset(bdev);
> +	nbd_bdev_reset(nbd);
>  	/* user requested, ignore socket errors */
>  	if (test_bit(NBD_RT_DISCONNECT_REQUESTED, &config->runtime_flags))
>  		ret = 0;
> @@ -1417,12 +1417,11 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>  	return ret;
>  }
>  
> -static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
> -				 struct block_device *bdev)
> +static void nbd_clear_sock_ioctl(struct nbd_device *nbd)
>  {
>  	sock_shutdown(nbd);
> -	__invalidate_device(bdev, true);
> -	nbd_bdev_reset(bdev);
> +	__invalidate_device(nbd->disk->part0, true);
> +	nbd_bdev_reset(nbd);
>  	if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
>  			       &nbd->config->runtime_flags))
>  		nbd_config_put(nbd);
> @@ -1448,7 +1447,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>  	case NBD_DISCONNECT:
>  		return nbd_disconnect(nbd);
>  	case NBD_CLEAR_SOCK:
> -		nbd_clear_sock_ioctl(nbd, bdev);
> +		nbd_clear_sock_ioctl(nbd);
>  		return 0;
>  	case NBD_SET_SOCK:
>  		return nbd_add_socket(nbd, arg, false);
> @@ -1468,7 +1467,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
>  		config->flags = arg;
>  		return 0;
>  	case NBD_DO_IT:
> -		return nbd_start_device_ioctl(nbd, bdev);
> +		return nbd_start_device_ioctl(nbd);
>  	case NBD_CLEAR_QUE:
>  		/*
>  		 * This is for compatibility only.  The queue is always cleared
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/13] zram: cleanup reset_store
  2022-03-24  7:51 ` [PATCH 02/13] zram: cleanup reset_store Christoph Hellwig
@ 2022-03-24 12:21   ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-24 12:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 08:51:08, Christoph Hellwig wrote:
> Use a local variable for the gendisk instead of the part0 block_device,
> as the gendisk is what this function actually operates on.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/zram/zram_drv.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index e9474b02012de..fd83fad59beb1 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1786,7 +1786,7 @@ static ssize_t reset_store(struct device *dev,
>  	int ret;
>  	unsigned short do_reset;
>  	struct zram *zram;
> -	struct block_device *bdev;
> +	struct gendisk *disk;
>  
>  	ret = kstrtou16(buf, 10, &do_reset);
>  	if (ret)
> @@ -1796,26 +1796,26 @@ static ssize_t reset_store(struct device *dev,
>  		return -EINVAL;
>  
>  	zram = dev_to_zram(dev);
> -	bdev = zram->disk->part0;
> +	disk = zram->disk;
>  
> -	mutex_lock(&bdev->bd_disk->open_mutex);
> +	mutex_lock(&disk->open_mutex);
>  	/* Do not reset an active device or claimed device */
> -	if (bdev->bd_openers || zram->claim) {
> -		mutex_unlock(&bdev->bd_disk->open_mutex);
> +	if (disk->part0->bd_openers || zram->claim) {
> +		mutex_unlock(&disk->open_mutex);
>  		return -EBUSY;
>  	}
>  
>  	/* From now on, anyone can't open /dev/zram[0-9] */
>  	zram->claim = true;
> -	mutex_unlock(&bdev->bd_disk->open_mutex);
> +	mutex_unlock(&disk->open_mutex);
>  
>  	/* Make sure all the pending I/O are finished */
> -	sync_blockdev(bdev);
> +	sync_blockdev(disk->part0);
>  	zram_reset_device(zram);
>  
> -	mutex_lock(&bdev->bd_disk->open_mutex);
> +	mutex_lock(&disk->open_mutex);
>  	zram->claim = false;
> -	mutex_unlock(&bdev->bd_disk->open_mutex);
> +	mutex_unlock(&disk->open_mutex);
>  
>  	return len;
>  }
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/13] zram: cleanup zram_remove
  2022-03-24  7:51 ` [PATCH 03/13] zram: cleanup zram_remove Christoph Hellwig
@ 2022-03-24 13:15   ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-24 13:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 08:51:09, Christoph Hellwig wrote:
> Remove the bdev variable and just use the gendisk pointed to by the
> zram_device directly.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/zram/zram_drv.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index fd83fad59beb1..863606f1722b1 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1987,19 +1987,18 @@ static int zram_add(void)
>  
>  static int zram_remove(struct zram *zram)
>  {
> -	struct block_device *bdev = zram->disk->part0;
>  	bool claimed;
>  
> -	mutex_lock(&bdev->bd_disk->open_mutex);
> -	if (bdev->bd_openers) {
> -		mutex_unlock(&bdev->bd_disk->open_mutex);
> +	mutex_lock(&zram->disk->open_mutex);
> +	if (zram->disk->part0->bd_openers) {
> +		mutex_unlock(&zram->disk->open_mutex);
>  		return -EBUSY;
>  	}
>  
>  	claimed = zram->claim;
>  	if (!claimed)
>  		zram->claim = true;
> -	mutex_unlock(&bdev->bd_disk->open_mutex);
> +	mutex_unlock(&zram->disk->open_mutex);
>  
>  	zram_debugfs_unregister(zram);
>  
> @@ -2011,7 +2010,7 @@ static int zram_remove(struct zram *zram)
>  		;
>  	} else {
>  		/* Make sure all the pending I/O are finished */
> -		sync_blockdev(bdev);
> +		sync_blockdev(zram->disk->part0);
>  		zram_reset_device(zram);
>  	}
>  
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl
  2022-03-24 12:20   ` Jan Kara
@ 2022-03-24 13:23     ` Jan Kara
  2022-03-24 17:11       ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Kara @ 2022-03-24 13:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 13:20:41, Jan Kara wrote:
> On Thu 24-03-22 08:51:07, Christoph Hellwig wrote:
> > The bdev parameter to ->ioctl contains the block device that the ioctl
> > is called on, which can be the partition.  But the code in nbd_ioctl
> > that uses it really wants the whole device for things like the bd_openers
> > check.  Switch to not pass the bdev along and always use nbd->disk->part0
> > instead.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks good. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Hum, thinking about this some more...

> > -static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
> > -				 struct block_device *bdev)
> > +static void nbd_clear_sock_ioctl(struct nbd_device *nbd)
> >  {
> >  	sock_shutdown(nbd);
> > -	__invalidate_device(bdev, true);
> > -	nbd_bdev_reset(bdev);
> > +	__invalidate_device(nbd->disk->part0, true);
> > +	nbd_bdev_reset(nbd);

Should't we call __invalidate_device() for the partition bdev here? Because
if the NBD device has partitions, filesystem will be mounted on this
partition and we want to invalidate it. Similarly the partition buffer
cache is different from the buffer cache of the whole device and we should
invalidate the partition one. In fact in cases like this I think we need
to invalidate all the partitions and filesystems that are there on this
disk so neither the old, nor the new code looks quite correct to me. Am I
missing something?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/13] block: add a disk_openers helper
  2022-03-24  7:51 ` [PATCH 04/13] block: add a disk_openers helper Christoph Hellwig
@ 2022-03-24 13:24   ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-24 13:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 08:51:10, Christoph Hellwig wrote:
> Add a helper that returns the openers for a given gendisk to avoid having
> drivers poke into disk->part0 to get at this information in a somewhat
> cumbersome way.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/nbd.c           |  4 ++--
>  drivers/block/zram/zram_drv.c |  4 ++--
>  include/linux/blkdev.h        | 15 +++++++++++++++
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 795f65a5c9661..93af7587d5ed6 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1219,7 +1219,7 @@ static int nbd_reconnect_socket(struct nbd_device *nbd, unsigned long arg)
>  
>  static void nbd_bdev_reset(struct nbd_device *nbd)
>  {
> -	if (nbd->disk->part0->bd_openers > 1)
> +	if (disk_openers(nbd->disk) > 1)
>  		return;
>  	set_capacity(nbd->disk, 0);
>  }
> @@ -1578,7 +1578,7 @@ static void nbd_release(struct gendisk *disk, fmode_t mode)
>  	struct nbd_device *nbd = disk->private_data;
>  
>  	if (test_bit(NBD_RT_DISCONNECT_ON_CLOSE, &nbd->config->runtime_flags) &&
> -			disk->part0->bd_openers == 0)
> +			disk_openers(disk) == 0)
>  		nbd_disconnect_and_put(nbd);
>  
>  	nbd_config_put(nbd);
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 863606f1722b1..2362385f782a9 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1800,7 +1800,7 @@ static ssize_t reset_store(struct device *dev,
>  
>  	mutex_lock(&disk->open_mutex);
>  	/* Do not reset an active device or claimed device */
> -	if (disk->part0->bd_openers || zram->claim) {
> +	if (disk_openers(disk) || zram->claim) {
>  		mutex_unlock(&disk->open_mutex);
>  		return -EBUSY;
>  	}
> @@ -1990,7 +1990,7 @@ static int zram_remove(struct zram *zram)
>  	bool claimed;
>  
>  	mutex_lock(&zram->disk->open_mutex);
> -	if (zram->disk->part0->bd_openers) {
> +	if (disk_openers(zram->disk)) {
>  		mutex_unlock(&zram->disk->open_mutex);
>  		return -EBUSY;
>  	}
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index eb27312a1b8f3..9824ebc9b4d31 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -176,6 +176,21 @@ static inline bool disk_live(struct gendisk *disk)
>  	return !inode_unhashed(disk->part0->bd_inode);
>  }
>  
> +/**
> + * disk_openers - returns how many openers are there for a disk
> + * @disk: disk to check
> + *
> + * This returns the number of openers for a disk.  Note that this value is only
> + * stable if disk->open_mutex is held.
> + *
> + * Note: Due to a quirk in the block layer open code, each open partition is
> + * only counted once even if there are multiple openers.
> + */
> +static inline unsigned int disk_openers(struct gendisk *disk)
> +{
> +	return disk->part0->bd_openers;
> +}
> +
>  /*
>   * The gendisk is refcounted by the part0 block_device, and the bd_device
>   * therein is also used for device model presentation in sysfs.
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t
  2022-03-24  7:51 ` [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
@ 2022-03-24 13:31   ` Jan Kara
  2022-03-24 17:12     ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Kara @ 2022-03-24 13:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 08:51:11, Christoph Hellwig wrote:
> All manipulation of bd_openers is under disk->open_mutex and will remain
> so for the foreseeable future.  But at least one place reads it without
> the lock (blkdev_get) and there are more to be added.  So make sure the
> compiler does not do turn the increments and decrements into non-atomic
> sequences by using an atomic_t.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

BTW I suspect that drivers/block/aoe/ could now use bd_openers instead of
its driver specific mirror of it (->nopen). But that's certainly a separate
cleanup for some other time.

								Honza

> ---
>  block/bdev.c              | 16 ++++++++--------
>  block/partitions/core.c   |  2 +-
>  include/linux/blk_types.h |  2 +-
>  include/linux/blkdev.h    |  2 +-
>  4 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 13de871fa8169..7bf88e591aaf3 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -673,17 +673,17 @@ static int blkdev_get_whole(struct block_device *bdev, fmode_t mode)
>  		}
>  	}
>  
> -	if (!bdev->bd_openers)
> +	if (!atomic_read(&bdev->bd_openers))
>  		set_init_blocksize(bdev);
>  	if (test_bit(GD_NEED_PART_SCAN, &disk->state))
>  		bdev_disk_changed(disk, false);
> -	bdev->bd_openers++;
> +	atomic_inc(&bdev->bd_openers);
>  	return 0;
>  }
>  
>  static void blkdev_put_whole(struct block_device *bdev, fmode_t mode)
>  {
> -	if (!--bdev->bd_openers)
> +	if (atomic_dec_and_test(&bdev->bd_openers))
>  		blkdev_flush_mapping(bdev);
>  	if (bdev->bd_disk->fops->release)
>  		bdev->bd_disk->fops->release(bdev->bd_disk, mode);
> @@ -694,7 +694,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
>  	struct gendisk *disk = part->bd_disk;
>  	int ret;
>  
> -	if (part->bd_openers)
> +	if (atomic_read(&part->bd_openers))
>  		goto done;
>  
>  	ret = blkdev_get_whole(bdev_whole(part), mode);
> @@ -708,7 +708,7 @@ static int blkdev_get_part(struct block_device *part, fmode_t mode)
>  	disk->open_partitions++;
>  	set_init_blocksize(part);
>  done:
> -	part->bd_openers++;
> +	atomic_inc(&part->bd_openers);
>  	return 0;
>  
>  out_blkdev_put:
> @@ -720,7 +720,7 @@ static void blkdev_put_part(struct block_device *part, fmode_t mode)
>  {
>  	struct block_device *whole = bdev_whole(part);
>  
> -	if (--part->bd_openers)
> +	if (!atomic_dec_and_test(&part->bd_openers))
>  		return;
>  	blkdev_flush_mapping(part);
>  	whole->bd_disk->open_partitions--;
> @@ -899,7 +899,7 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
>  	 * of the world and we want to avoid long (could be several minute)
>  	 * syncs while holding the mutex.
>  	 */
> -	if (bdev->bd_openers == 1)
> +	if (atomic_read(&bdev->bd_openers) == 1)
>  		sync_blockdev(bdev);
>  
>  	mutex_lock(&disk->open_mutex);
> @@ -1044,7 +1044,7 @@ void sync_bdevs(bool wait)
>  		bdev = I_BDEV(inode);
>  
>  		mutex_lock(&bdev->bd_disk->open_mutex);
> -		if (!bdev->bd_openers) {
> +		if (!atomic_read(&bdev->bd_openers)) {
>  			; /* skip */
>  		} else if (wait) {
>  			/*
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 2ef8dfa1e5c85..373ed748dcf26 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -486,7 +486,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
>  		goto out_unlock;
>  
>  	ret = -EBUSY;
> -	if (part->bd_openers)
> +	if (atomic_read(&part->bd_openers))
>  		goto out_unlock;
>  
>  	delete_partition(part);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 0c3563b45fe90..b1ced43ed0d3f 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -44,7 +44,7 @@ struct block_device {
>  	unsigned long		bd_stamp;
>  	bool			bd_read_only;	/* read-only policy */
>  	dev_t			bd_dev;
> -	int			bd_openers;
> +	atomic_t		bd_openers;
>  	struct inode *		bd_inode;	/* will die */
>  	struct super_block *	bd_super;
>  	void *			bd_claiming;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 9824ebc9b4d31..6b7c5af1d01df 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -188,7 +188,7 @@ static inline bool disk_live(struct gendisk *disk)
>   */
>  static inline unsigned int disk_openers(struct gendisk *disk)
>  {
> -	return disk->part0->bd_openers;
> +	return atomic_read(&disk->part0->bd_openers);
>  }
>  
>  /*
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24  7:51 ` [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
@ 2022-03-24 14:13   ` Jan Kara
  2022-03-24 14:24     ` Tetsuo Handa
  2022-03-24 17:15     ` Christoph Hellwig
  0 siblings, 2 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-24 14:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd

[-- Attachment #1: Type: text/plain, Size: 4331 bytes --]

On Thu 24-03-22 08:51:18, Christoph Hellwig wrote:
> lo_refcount counts how many openers a loop device has, but that count
> is already provided by the block layer in the bd_openers field of the
> whole-disk block_device.  Remove lo_refcount and allow opens to
> succeed even on devices beeing deleted - now that ->free_disk is
> implemented we can handle that race gracefull and all I/O on it will
> just fail. Similarly there is a small race window now where
> loop_control_remove does not synchronize the delete vs the remove
> due do bd_openers not being under lo_mutex protection, but we can
> handle that just as gracefully.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good but I still think we need something like attached preparatory
patch to not regress e.g. filesystem probing triggered by udev events. What
do you think?

Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/loop.c | 37 +++++++------------------------------
>  drivers/block/loop.h |  1 -
>  2 files changed, 7 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index b3170e8cdbe95..e1eb925d3f855 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
>  	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
>  	 * command to fail with EBUSY.
>  	 */
> -	if (atomic_read(&lo->lo_refcnt) > 1) {
> +	if (disk_openers(lo->lo_disk) > 1) {
>  		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>  		mutex_unlock(&lo->lo_mutex);
>  		return 0;
> @@ -1724,33 +1724,15 @@ static int lo_compat_ioctl(struct block_device *bdev, fmode_t mode,
>  }
>  #endif
>  
> -static int lo_open(struct block_device *bdev, fmode_t mode)
> -{
> -	struct loop_device *lo = bdev->bd_disk->private_data;
> -	int err;
> -
> -	err = mutex_lock_killable(&lo->lo_mutex);
> -	if (err)
> -		return err;
> -	if (lo->lo_state == Lo_deleting)
> -		err = -ENXIO;
> -	else
> -		atomic_inc(&lo->lo_refcnt);
> -	mutex_unlock(&lo->lo_mutex);
> -	return err;
> -}
> -
>  static void lo_release(struct gendisk *disk, fmode_t mode)
>  {
>  	struct loop_device *lo = disk->private_data;
>  
> -	mutex_lock(&lo->lo_mutex);
> -	if (atomic_dec_return(&lo->lo_refcnt))
> -		goto out_unlock;
> +	if (disk_openers(disk) > 0)
> +		return;
>  
> -	if (lo->lo_flags & LO_FLAGS_AUTOCLEAR) {
> -		if (lo->lo_state != Lo_bound)
> -			goto out_unlock;
> +	mutex_lock(&lo->lo_mutex);
> +	if (lo->lo_state == Lo_bound && (lo->lo_flags & LO_FLAGS_AUTOCLEAR)) {
>  		lo->lo_state = Lo_rundown;
>  		mutex_unlock(&lo->lo_mutex);
>  		/*
> @@ -1760,8 +1742,6 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
>  		__loop_clr_fd(lo, true);
>  		return;
>  	}
> -
> -out_unlock:
>  	mutex_unlock(&lo->lo_mutex);
>  }
>  
> @@ -1775,7 +1755,6 @@ static void lo_free_disk(struct gendisk *disk)
>  
>  static const struct block_device_operations lo_fops = {
>  	.owner =	THIS_MODULE,
> -	.open =		lo_open,
>  	.release =	lo_release,
>  	.ioctl =	lo_ioctl,
>  #ifdef CONFIG_COMPAT
> @@ -2029,7 +2008,6 @@ static int loop_add(int i)
>  	 */
>  	if (!part_shift)
>  		disk->flags |= GENHD_FL_NO_PART;
> -	atomic_set(&lo->lo_refcnt, 0);
>  	mutex_init(&lo->lo_mutex);
>  	lo->lo_number		= i;
>  	spin_lock_init(&lo->lo_lock);
> @@ -2119,13 +2097,12 @@ static int loop_control_remove(int idx)
>  	ret = mutex_lock_killable(&lo->lo_mutex);
>  	if (ret)
>  		goto mark_visible;
> -	if (lo->lo_state != Lo_unbound ||
> -	    atomic_read(&lo->lo_refcnt) > 0) {
> +	if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) {
>  		mutex_unlock(&lo->lo_mutex);
>  		ret = -EBUSY;
>  		goto mark_visible;
>  	}
> -	/* Mark this loop device no longer open()-able. */
> +	/* Mark this loop device as no more bound, but not quite unbound yet */
>  	lo->lo_state = Lo_deleting;
>  	mutex_unlock(&lo->lo_mutex);
>  
> diff --git a/drivers/block/loop.h b/drivers/block/loop.h
> index 082d4b6bfc6a6..449d562738c52 100644
> --- a/drivers/block/loop.h
> +++ b/drivers/block/loop.h
> @@ -28,7 +28,6 @@ struct loop_func_table;
>  
>  struct loop_device {
>  	int		lo_number;
> -	atomic_t	lo_refcnt;
>  	loff_t		lo_offset;
>  	loff_t		lo_sizelimit;
>  	int		lo_flags;
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-loop-Send-udev-change-event-after-device-is-ready-fo.patch --]
[-- Type: text/x-patch, Size: 1933 bytes --]

From 663c2a1d0781a5fa34665a3e8544c5543f5aa70f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 24 Mar 2022 14:58:05 +0100
Subject: [PATCH] loop: Send udev change event after device is ready for IO

Currently, udev change event is generated for a loop device before the
device is ready for IO. Due to serialization on lo->lo_mutex in
lo_open() this does not matter because anybody is able to open the
device and do IO only after the configuration is finished. However this
synchronization in lo_open() is going away so make sure userspace
reacting to the change event will see the new device state by generating
the event only when the device is setup.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 drivers/block/loop.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3e636a75c83a..e401afc3e6dc 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -608,6 +608,8 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	loop_update_dio(lo);
 	blk_mq_unfreeze_queue(lo->lo_queue);
 	partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
+	/* Notify userspace about device change when it is ready for IO */
+	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 	loop_global_unlock(lo, is_loop);
 
 	/*
@@ -1020,7 +1022,6 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 		goto out_unlock;
 	}
 
-	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 	set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
 
 	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
@@ -1068,6 +1069,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	if (partscan)
 		lo->lo_disk->flags &= ~GENHD_FL_NO_PART;
 
+	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
 	loop_global_unlock(lo, is_loop);
 	if (partscan)
 		loop_reread_partitions(lo);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd
  2022-03-24  7:51 ` [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
@ 2022-03-24 14:14   ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-24 14:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta, Tetsuo Handa,
	Jan Kara, Darrick J . Wong, Ming Lei, linux-block, nbd,
	syzbot+6479585dfd4dedd3f7e1

On Thu 24-03-22 08:51:19, Christoph Hellwig wrote:
> There is no need to destroy the workqueue when clearing unbinding
> a loop device from a backing file.  Not doing so on the other hand
> avoid creating a complex lock dependency chain involving the global
> system_transition_mutex.
> 
> Based on a patch from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>.
> 
> Reported-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/loop.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index e1eb925d3f855..84613eb2fdd57 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -808,7 +808,6 @@ struct loop_worker {
>  };
>  
>  static void loop_workfn(struct work_struct *work);
> -static void loop_rootcg_workfn(struct work_struct *work);
>  
>  #ifdef CONFIG_BLK_CGROUP
>  static inline int queue_on_root_worker(struct cgroup_subsys_state *css)
> @@ -1043,20 +1042,19 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  	    !file->f_op->write_iter)
>  		lo->lo_flags |= LO_FLAGS_READ_ONLY;
>  
> -	lo->workqueue = alloc_workqueue("loop%d",
> -					WQ_UNBOUND | WQ_FREEZABLE,
> -					0,
> -					lo->lo_number);
>  	if (!lo->workqueue) {
> -		error = -ENOMEM;
> -		goto out_unlock;
> +		lo->workqueue = alloc_workqueue("loop%d",
> +						WQ_UNBOUND | WQ_FREEZABLE,
> +						0, lo->lo_number);
> +		if (!lo->workqueue) {
> +			error = -ENOMEM;
> +			goto out_unlock;
> +		}
>  	}
>  
>  	disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
>  	set_disk_ro(lo->lo_disk, (lo->lo_flags & LO_FLAGS_READ_ONLY) != 0);
>  
> -	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
> -	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
>  	lo->use_dio = lo->lo_flags & LO_FLAGS_DIRECT_IO;
>  	lo->lo_device = bdev;
>  	lo->lo_backing_file = file;
> @@ -1152,10 +1150,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
>  	if (!release)
>  		blk_mq_freeze_queue(lo->lo_queue);
>  
> -	destroy_workqueue(lo->workqueue);
> -	loop_free_idle_workers(lo, true);
> -	del_timer_sync(&lo->timer);
> -
>  	spin_lock_irq(&lo->lo_lock);
>  	filp = lo->lo_backing_file;
>  	lo->lo_backing_file = NULL;
> @@ -1749,6 +1743,10 @@ static void lo_free_disk(struct gendisk *disk)
>  {
>  	struct loop_device *lo = disk->private_data;
>  
> +	if (lo->workqueue)
> +		destroy_workqueue(lo->workqueue);
> +	loop_free_idle_workers(lo, true);
> +	del_timer_sync(&lo->timer);
>  	mutex_destroy(&lo->lo_mutex);
>  	kfree(lo);
>  }
> @@ -2012,6 +2010,8 @@ static int loop_add(int i)
>  	lo->lo_number		= i;
>  	spin_lock_init(&lo->lo_lock);
>  	spin_lock_init(&lo->lo_work_lock);
> +	INIT_WORK(&lo->rootcg_work, loop_rootcg_workfn);
> +	INIT_LIST_HEAD(&lo->rootcg_cmd_list);
>  	disk->major		= LOOP_MAJOR;
>  	disk->first_minor	= i << part_shift;
>  	disk->minors		= 1 << part_shift;
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 14:13   ` Jan Kara
@ 2022-03-24 14:24     ` Tetsuo Handa
  2022-03-24 17:23       ` Christoph Hellwig
  2022-03-24 17:15     ` Christoph Hellwig
  1 sibling, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2022-03-24 14:24 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
	Darrick J . Wong, Ming Lei, linux-block, nbd

On 2022/03/24 23:13, Jan Kara wrote:
>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>> index b3170e8cdbe95..e1eb925d3f855 100644
>> --- a/drivers/block/loop.c
>> +++ b/drivers/block/loop.c
>> @@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
>>  	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
>>  	 * command to fail with EBUSY.
>>  	 */
>> -	if (atomic_read(&lo->lo_refcnt) > 1) {
>> +	if (disk_openers(lo->lo_disk) > 1) {

This is sometimes "if (0) {" due to not holding disk->open_mutex.

>>  		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
>>  		mutex_unlock(&lo->lo_mutex);
>>  		return 0;

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl
  2022-03-24 13:23     ` Jan Kara
@ 2022-03-24 17:11       ` Christoph Hellwig
  2022-03-25  9:39         ` Jan Kara
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24 17:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim,
	Nitin Gupta, Tetsuo Handa, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Thu, Mar 24, 2022 at 02:23:22PM +0100, Jan Kara wrote:
> Should't we call __invalidate_device() for the partition bdev here? Because
> if the NBD device has partitions, filesystem will be mounted on this
> partition and we want to invalidate it. Similarly the partition buffer
> cache is different from the buffer cache of the whole device and we should
> invalidate the partition one. In fact in cases like this I think we need
> to invalidate all the partitions and filesystems that are there on this
> disk so neither the old, nor the new code looks quite correct to me. Am I
> missing something?

Well, that assumes just one partition is used, which kinda defeats the
purpose of partitions.  I can exclude the __invalidate_device to not
change from one kind of broken to another, but I suspect the real
question is why we have this __invalidate_device call at all.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t
  2022-03-24 13:31   ` Jan Kara
@ 2022-03-24 17:12     ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24 17:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim,
	Nitin Gupta, Tetsuo Handa, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Thu, Mar 24, 2022 at 02:31:36PM +0100, Jan Kara wrote:
> BTW I suspect that drivers/block/aoe/ could now use bd_openers instead of
> its driver specific mirror of it (->nopen). But that's certainly a separate
> cleanup for some other time.

Yes.  There actually are a lot of places that should drop their internal
number of openers counters.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 14:13   ` Jan Kara
  2022-03-24 14:24     ` Tetsuo Handa
@ 2022-03-24 17:15     ` Christoph Hellwig
  2022-03-24 17:47       ` Christoph Hellwig
  1 sibling, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24 17:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim,
	Nitin Gupta, Tetsuo Handa, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Thu, Mar 24, 2022 at 03:13:21PM +0100, Jan Kara wrote:
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Looks good but I still think we need something like attached preparatory
> patch to not regress e.g. filesystem probing triggered by udev events. What
> do you think?

Yes, I think it makes sense to add that.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 14:24     ` Tetsuo Handa
@ 2022-03-24 17:23       ` Christoph Hellwig
  2022-03-25 10:54         ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24 17:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Jan Kara, Christoph Hellwig, Jens Axboe, Josef Bacik,
	Minchan Kim, Nitin Gupta, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Thu, Mar 24, 2022 at 11:24:43PM +0900, Tetsuo Handa wrote:
> On 2022/03/24 23:13, Jan Kara wrote:
> >> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> >> index b3170e8cdbe95..e1eb925d3f855 100644
> >> --- a/drivers/block/loop.c
> >> +++ b/drivers/block/loop.c
> >> @@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
> >>  	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
> >>  	 * command to fail with EBUSY.
> >>  	 */
> >> -	if (atomic_read(&lo->lo_refcnt) > 1) {
> >> +	if (disk_openers(lo->lo_disk) > 1) {
> 
> This is sometimes "if (0) {" due to not holding disk->open_mutex.

Yes, as clearly documented in the commit log.  In fact turning it
into an explicit if 0 (that is removing this code) might be a not
bad idea either - the code was added to work around a

	if (lo->lo_refcnt > 1)  /* we needed one fd for the ioctl */
		return -EBUSY;

that already did not make much sense to start with (but goes
back to before git history).

But for now I'd really prefer to stop moving the goalpost further and
further.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 17:15     ` Christoph Hellwig
@ 2022-03-24 17:47       ` Christoph Hellwig
  2022-03-25  9:34         ` Jan Kara
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-24 17:47 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, Josef Bacik, Minchan Kim,
	Nitin Gupta, Tetsuo Handa, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Thu, Mar 24, 2022 at 06:15:18PM +0100, Christoph Hellwig wrote:
> On Thu, Mar 24, 2022 at 03:13:21PM +0100, Jan Kara wrote:
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > 
> > Looks good but I still think we need something like attached preparatory
> > patch to not regress e.g. filesystem probing triggered by udev events. What
> > do you think?
> 
> Yes, I think it makes sense to add that.

Actually, looking at it in a little more detail: this misses the
explicit kobject_uevent calls for the capacity changes.  I think the
best idea might be something like this:

---
From db5ab8ab0fbcf07af769023a894fafc22b662cd9 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 24 Mar 2022 18:41:28 +0100
Subject: loop: suppress uevents while reconfiguring the device

Currently, udev change event is generated for a loop device before the
device is ready for IO. Due to serialization on lo->lo_mutex in
lo_open() this does not matter because anybody is able to open the
device and do IO only after the configuration is finished. However this
synchronization in lo_open() is going away so make sure userspace
reacting to the change event will see the new device state by generating
the event only when the device is setup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index b3170e8cdbe95..bfd21af7aa38b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -572,6 +572,10 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 
 	if (!file)
 		return -EBADF;
+
+	/* suppress uevents while reconfiguring the device */
+	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
+
 	is_loop = is_loop_device(file);
 	error = loop_global_lock_killable(lo, is_loop);
 	if (error)
@@ -626,13 +630,18 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
 	fput(old_file);
 	if (partscan)
 		loop_reread_partitions(lo);
-	return 0;
+
+	error = 0;
+done:
+	/* enable and uncork uevent now that we are done */
+	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+	return error;
 
 out_err:
 	loop_global_unlock(lo, is_loop);
 out_putf:
 	fput(file);
-	return error;
+	goto done;
 }
 
 /* loop sysfs attributes */
@@ -999,6 +1008,9 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	/* This is safe, since we have a reference from open(). */
 	__module_get(THIS_MODULE);
 
+	/* suppress uevents while reconfiguring the device */
+	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
+
 	/*
 	 * If we don't hold exclusive handle for the device, upgrade to it
 	 * here to avoid changing device under exclusive owner.
@@ -1101,7 +1113,12 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 		loop_reread_partitions(lo);
 	if (!(mode & FMODE_EXCL))
 		bd_abort_claiming(bdev, loop_configure);
-	return 0;
+
+	error = 0;
+done:
+	/* enable and uncork uevent now that we are done */
+	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+	return error;
 
 out_unlock:
 	loop_global_unlock(lo, is_loop);
@@ -1112,7 +1129,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	fput(file);
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
-	return error;
+	goto done;
 }
 
 static void __loop_clr_fd(struct loop_device *lo, bool release)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 17:47       ` Christoph Hellwig
@ 2022-03-25  9:34         ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-25  9:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
	Tetsuo Handa, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 18:47:01, Christoph Hellwig wrote:
> On Thu, Mar 24, 2022 at 06:15:18PM +0100, Christoph Hellwig wrote:
> > On Thu, Mar 24, 2022 at 03:13:21PM +0100, Jan Kara wrote:
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > 
> > > Looks good but I still think we need something like attached preparatory
> > > patch to not regress e.g. filesystem probing triggered by udev events. What
> > > do you think?
> > 
> > Yes, I think it makes sense to add that.
> 
> Actually, looking at it in a little more detail: this misses the
> explicit kobject_uevent calls for the capacity changes.  I think the
> best idea might be something like this:
> 
> ---
> From db5ab8ab0fbcf07af769023a894fafc22b662cd9 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 24 Mar 2022 18:41:28 +0100
> Subject: loop: suppress uevents while reconfiguring the device
> 
> Currently, udev change event is generated for a loop device before the
> device is ready for IO. Due to serialization on lo->lo_mutex in
> lo_open() this does not matter because anybody is able to open the
> device and do IO only after the configuration is finished. However this
> synchronization in lo_open() is going away so make sure userspace
> reacting to the change event will see the new device state by generating
> the event only when the device is setup.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Yeah, even better. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/block/loop.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index b3170e8cdbe95..bfd21af7aa38b 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -572,6 +572,10 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
>  
>  	if (!file)
>  		return -EBADF;
> +
> +	/* suppress uevents while reconfiguring the device */
> +	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
> +
>  	is_loop = is_loop_device(file);
>  	error = loop_global_lock_killable(lo, is_loop);
>  	if (error)
> @@ -626,13 +630,18 @@ static int loop_change_fd(struct loop_device *lo, struct block_device *bdev,
>  	fput(old_file);
>  	if (partscan)
>  		loop_reread_partitions(lo);
> -	return 0;
> +
> +	error = 0;
> +done:
> +	/* enable and uncork uevent now that we are done */
> +	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
> +	return error;
>  
>  out_err:
>  	loop_global_unlock(lo, is_loop);
>  out_putf:
>  	fput(file);
> -	return error;
> +	goto done;
>  }
>  
>  /* loop sysfs attributes */
> @@ -999,6 +1008,9 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  	/* This is safe, since we have a reference from open(). */
>  	__module_get(THIS_MODULE);
>  
> +	/* suppress uevents while reconfiguring the device */
> +	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 1);
> +
>  	/*
>  	 * If we don't hold exclusive handle for the device, upgrade to it
>  	 * here to avoid changing device under exclusive owner.
> @@ -1101,7 +1113,12 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  		loop_reread_partitions(lo);
>  	if (!(mode & FMODE_EXCL))
>  		bd_abort_claiming(bdev, loop_configure);
> -	return 0;
> +
> +	error = 0;
> +done:
> +	/* enable and uncork uevent now that we are done */
> +	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
> +	return error;
>  
>  out_unlock:
>  	loop_global_unlock(lo, is_loop);
> @@ -1112,7 +1129,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  	fput(file);
>  	/* This is safe: open() is still holding a reference. */
>  	module_put(THIS_MODULE);
> -	return error;
> +	goto done;
>  }
>  
>  static void __loop_clr_fd(struct loop_device *lo, bool release)
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl
  2022-03-24 17:11       ` Christoph Hellwig
@ 2022-03-25  9:39         ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-25  9:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
	Tetsuo Handa, Darrick J . Wong, Ming Lei, linux-block, nbd

On Thu 24-03-22 18:11:48, Christoph Hellwig wrote:
> On Thu, Mar 24, 2022 at 02:23:22PM +0100, Jan Kara wrote:
> > Should't we call __invalidate_device() for the partition bdev here? Because
> > if the NBD device has partitions, filesystem will be mounted on this
> > partition and we want to invalidate it. Similarly the partition buffer
> > cache is different from the buffer cache of the whole device and we should
> > invalidate the partition one. In fact in cases like this I think we need
> > to invalidate all the partitions and filesystems that are there on this
> > disk so neither the old, nor the new code looks quite correct to me. Am I
> > missing something?
> 
> Well, that assumes just one partition is used, which kinda defeats the
> purpose of partitions.  I can exclude the __invalidate_device to not
> change from one kind of broken to another, but I suspect the real
> question is why we have this __invalidate_device call at all.

I suppose it tries to cleanup after effectively hot-unplugging the device.
But I think we don't need to untangle that in this patch set. I'd just
prefer we would not change one questionable behavior for another similarly
questionable one...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-24 17:23       ` Christoph Hellwig
@ 2022-03-25 10:54         ` Tetsuo Handa
  2022-03-25 16:23           ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2022-03-25 10:54 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner
  Cc: Jan Kara, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
	Darrick J . Wong, Ming Lei, linux-block, nbd

On 2022/03/25 2:23, Christoph Hellwig wrote:
> On Thu, Mar 24, 2022 at 11:24:43PM +0900, Tetsuo Handa wrote:
>> On 2022/03/24 23:13, Jan Kara wrote:
>>>> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
>>>> index b3170e8cdbe95..e1eb925d3f855 100644
>>>> --- a/drivers/block/loop.c
>>>> +++ b/drivers/block/loop.c
>>>> @@ -1244,7 +1244,7 @@ static int loop_clr_fd(struct loop_device *lo)
>>>>  	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
>>>>  	 * command to fail with EBUSY.
>>>>  	 */
>>>> -	if (atomic_read(&lo->lo_refcnt) > 1) {
>>>> +	if (disk_openers(lo->lo_disk) > 1) {
>>
>> This is sometimes "if (0) {" due to not holding disk->open_mutex.
> 
> Yes, as clearly documented in the commit log.  In fact turning it
> into an explicit if 0 (that is removing this code) might be a not
> bad idea either - the code was added to work around a
> 
> 	if (lo->lo_refcnt > 1)  /* we needed one fd for the ioctl */
> 		return -EBUSY;
> 
> that already did not make much sense to start with (but goes
> back to before git history).
> 
> But for now I'd really prefer to stop moving the goalpost further and
> further.

Then, why not kill this code?

 drivers/block/loop.c | 29 +++++++----------------------
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3e636a75c83a..26c8808d02d0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1205,30 +1205,15 @@ static int loop_clr_fd(struct loop_device *lo)
 	err = mutex_lock_killable(&lo->lo_mutex);
 	if (err)
 		return err;
-	if (lo->lo_state != Lo_bound) {
-		mutex_unlock(&lo->lo_mutex);
-		return -ENXIO;
-	}
-	/*
-	 * If we've explicitly asked to tear down the loop device,
-	 * and it has an elevated reference count, set it for auto-teardown when
-	 * the last reference goes away. This stops $!~#$@ udev from
-	 * preventing teardown because it decided that it needs to run blkid on
-	 * the loopback device whenever they appear. xfstests is notorious for
-	 * failing tests because blkid via udev races with a losetup
-	 * <dev>/do something like mkfs/losetup -d <dev> causing the losetup -d
-	 * command to fail with EBUSY.
-	 */
-	if (atomic_read(&lo->lo_refcnt) > 1) {
-		lo->lo_flags |= LO_FLAGS_AUTOCLEAR;
-		mutex_unlock(&lo->lo_mutex);
-		return 0;
-	}
-	lo->lo_state = Lo_rundown;
+	if (lo->lo_state != Lo_bound)
+		err = -ENXIO;
+	else
+		lo->lo_state = Lo_rundown;
 	mutex_unlock(&lo->lo_mutex);
 
-	__loop_clr_fd(lo, false);
-	return 0;
+	if (!err)
+		__loop_clr_fd(lo, false);
+	return err;
 }
 
 static int
-- 
2.32.0

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-25 10:54         ` Tetsuo Handa
@ 2022-03-25 16:23           ` Christoph Hellwig
  2022-03-28  8:30             ` Jan Kara
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-25 16:23 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christoph Hellwig, Dave Chinner, Jan Kara, Jens Axboe,
	Josef Bacik, Minchan Kim, Nitin Gupta, Darrick J . Wong,
	Ming Lei, linux-block, nbd

On Fri, Mar 25, 2022 at 07:54:15PM +0900, Tetsuo Handa wrote:
> > But for now I'd really prefer to stop moving the goalpost further and
> > further.
> 
> Then, why not kill this code?

I think we should eventually do that, and I've indeed tested a patch
that is only cosmetically different.  I wasn't really convinced we
should do it in this series, but if there is consensus that we should
do it now I can respin the series with a patch like this included.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-25 16:23           ` Christoph Hellwig
@ 2022-03-28  8:30             ` Jan Kara
  2022-03-29  6:39               ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Jan Kara @ 2022-03-28  8:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Tetsuo Handa, Dave Chinner, Jan Kara, Jens Axboe, Josef Bacik,
	Minchan Kim, Nitin Gupta, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Fri 25-03-22 17:23:31, Christoph Hellwig wrote:
> On Fri, Mar 25, 2022 at 07:54:15PM +0900, Tetsuo Handa wrote:
> > > But for now I'd really prefer to stop moving the goalpost further and
> > > further.
> > 
> > Then, why not kill this code?
> 
> I think we should eventually do that, and I've indeed tested a patch
> that is only cosmetically different.  I wasn't really convinced we
> should do it in this series, but if there is consensus that we should
> do it now I can respin the series with a patch like this included.

I'd defer it to a separate patchset. Because as much as the change to
disallow LOOP_CLR_FD ioctl for used loop device makes sense, I'm not sure
there isn't some framework using loop devices somewhere which relies on
this just getting magically translated to setting LO_AUTOCLEAR flag. So IMO
this has a big potential of userspace visible regression and as such I'd
prefer doing it separately from the bugfixes.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-28  8:30             ` Jan Kara
@ 2022-03-29  6:39               ` Christoph Hellwig
  2022-03-29  9:42                 ` Jan Kara
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-29  6:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Tetsuo Handa, Dave Chinner, Jens Axboe,
	Josef Bacik, Minchan Kim, Nitin Gupta, Darrick J . Wong,
	Ming Lei, linux-block, nbd

On Mon, Mar 28, 2022 at 10:30:45AM +0200, Jan Kara wrote:
> On Fri 25-03-22 17:23:31, Christoph Hellwig wrote:
> > On Fri, Mar 25, 2022 at 07:54:15PM +0900, Tetsuo Handa wrote:
> > > > But for now I'd really prefer to stop moving the goalpost further and
> > > > further.
> > > 
> > > Then, why not kill this code?
> > 
> > I think we should eventually do that, and I've indeed tested a patch
> > that is only cosmetically different.  I wasn't really convinced we
> > should do it in this series, but if there is consensus that we should
> > do it now I can respin the series with a patch like this included.
> 
> I'd defer it to a separate patchset. Because as much as the change to
> disallow LOOP_CLR_FD ioctl for used loop device makes sense, I'm not sure
> there isn't some framework using loop devices somewhere which relies on
> this just getting magically translated to setting LO_AUTOCLEAR flag. So IMO
> this has a big potential of userspace visible regression and as such I'd
> prefer doing it separately from the bugfixes.

At least my idea would not be to disallow LOOP_CLR_FD on a used block
devices as that would go back to the udev problems before Dave turned
it into a magic LO_AUTOCLEAR.  But to remove the lo_refcnt check
entirely, as loop_clr_fd now is safe against concurrent users - it
has to anyway as there can be other users even without an open.

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
---end quoted text---

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-29  6:39               ` Christoph Hellwig
@ 2022-03-29  9:42                 ` Jan Kara
  2022-03-29  9:49                   ` Tetsuo Handa
  2022-03-29 13:14                   ` Christoph Hellwig
  0 siblings, 2 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-29  9:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Tetsuo Handa, Dave Chinner, Jens Axboe, Josef Bacik,
	Minchan Kim, Nitin Gupta, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Tue 29-03-22 08:39:21, Christoph Hellwig wrote:
> On Mon, Mar 28, 2022 at 10:30:45AM +0200, Jan Kara wrote:
> > On Fri 25-03-22 17:23:31, Christoph Hellwig wrote:
> > > On Fri, Mar 25, 2022 at 07:54:15PM +0900, Tetsuo Handa wrote:
> > > > > But for now I'd really prefer to stop moving the goalpost further and
> > > > > further.
> > > > 
> > > > Then, why not kill this code?
> > > 
> > > I think we should eventually do that, and I've indeed tested a patch
> > > that is only cosmetically different.  I wasn't really convinced we
> > > should do it in this series, but if there is consensus that we should
> > > do it now I can respin the series with a patch like this included.
> > 
> > I'd defer it to a separate patchset. Because as much as the change to
> > disallow LOOP_CLR_FD ioctl for used loop device makes sense, I'm not sure
> > there isn't some framework using loop devices somewhere which relies on
> > this just getting magically translated to setting LO_AUTOCLEAR flag. So IMO
> > this has a big potential of userspace visible regression and as such I'd
> > prefer doing it separately from the bugfixes.
> 
> At least my idea would not be to disallow LOOP_CLR_FD on a used block
> devices as that would go back to the udev problems before Dave turned
> it into a magic LO_AUTOCLEAR.  But to remove the lo_refcnt check
> entirely, as loop_clr_fd now is safe against concurrent users - it
> has to anyway as there can be other users even without an open.

Ah, OK, so you'd always set LO_AUTOCLEAR and leave cleanup to happen
from lo_release()? That makes sense to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-29  9:42                 ` Jan Kara
@ 2022-03-29  9:49                   ` Tetsuo Handa
  2022-03-29 13:14                   ` Christoph Hellwig
  1 sibling, 0 replies; 38+ messages in thread
From: Tetsuo Handa @ 2022-03-29  9:49 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: Dave Chinner, Jens Axboe, Josef Bacik, Minchan Kim, Nitin Gupta,
	Darrick J . Wong, Ming Lei, linux-block, nbd

On 2022/03/29 18:42, Jan Kara wrote:
> Ah, OK, so you'd always set LO_AUTOCLEAR and leave cleanup to happen
> from lo_release()? That makes sense to me.

"loop: remove lo_refcount and avoid lo_mutex in ->open / ->release" is going to
make "not always" set LO_AUTOCLEAR due to lack of disk->open_mutex serialization.

That's why this topic is discussed.

We could use loop_global_lock_killable()/loop_global_unlock() in order to compensate
for lack of disk->open_mutex serialization, as described in the bottom half of
https://lkml.kernel.org/r/03628e13-ca56-4ed0-da5a-ee698c83f48d@I-love.SAKURA.ne.jp .

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-29  9:42                 ` Jan Kara
  2022-03-29  9:49                   ` Tetsuo Handa
@ 2022-03-29 13:14                   ` Christoph Hellwig
  2022-03-30  7:58                     ` Jan Kara
  1 sibling, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2022-03-29 13:14 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Tetsuo Handa, Dave Chinner, Jens Axboe,
	Josef Bacik, Minchan Kim, Nitin Gupta, Darrick J . Wong,
	Ming Lei, linux-block, nbd

On Tue, Mar 29, 2022 at 11:42:03AM +0200, Jan Kara wrote:
> > entirely, as loop_clr_fd now is safe against concurrent users - it
> > has to anyway as there can be other users even without an open.
> 
> Ah, OK, so you'd always set LO_AUTOCLEAR and leave cleanup to happen
> from lo_release()? That makes sense to me.

No, my idea was to never set LO_AUTOCLEAR.  We have a frozen queue and
all protections in place to make clearing the file perfectly safe.
In fact the change_fd case also allows this.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
  2022-03-29 13:14                   ` Christoph Hellwig
@ 2022-03-30  7:58                     ` Jan Kara
  0 siblings, 0 replies; 38+ messages in thread
From: Jan Kara @ 2022-03-30  7:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Tetsuo Handa, Dave Chinner, Jens Axboe, Josef Bacik,
	Minchan Kim, Nitin Gupta, Darrick J . Wong, Ming Lei,
	linux-block, nbd

On Tue 29-03-22 15:14:27, Christoph Hellwig wrote:
> On Tue, Mar 29, 2022 at 11:42:03AM +0200, Jan Kara wrote:
> > > entirely, as loop_clr_fd now is safe against concurrent users - it
> > > has to anyway as there can be other users even without an open.
> > 
> > Ah, OK, so you'd always set LO_AUTOCLEAR and leave cleanup to happen
> > from lo_release()? That makes sense to me.
> 
> No, my idea was to never set LO_AUTOCLEAR.  We have a frozen queue and
> all protections in place to make clearing the file perfectly safe.
> In fact the change_fd case also allows this.

I see, thanks for explanation. But then there's the risk of userspace
regressions if someone relies on the current behavior that LOOP_CLR_FD
ioctl does only delayed teardown of the device if someone is using it.
Personally I don't think the risk of regression is worth the benefit of the
cleanup but maybe it's worth trying. At least I know that for loop device
handling inside mount(8) (which plays tricks with reusing existing bound
loop devices), this change should be safe.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2022-03-30  7:58 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-24  7:51 yet another approach to fix the loop lock order inversions v4 Christoph Hellwig
2022-03-24  7:51 ` [PATCH 01/13] nbd: use the correct block_device in nbd_ioctl Christoph Hellwig
2022-03-24 12:20   ` Jan Kara
2022-03-24 13:23     ` Jan Kara
2022-03-24 17:11       ` Christoph Hellwig
2022-03-25  9:39         ` Jan Kara
2022-03-24  7:51 ` [PATCH 02/13] zram: cleanup reset_store Christoph Hellwig
2022-03-24 12:21   ` Jan Kara
2022-03-24  7:51 ` [PATCH 03/13] zram: cleanup zram_remove Christoph Hellwig
2022-03-24 13:15   ` Jan Kara
2022-03-24  7:51 ` [PATCH 04/13] block: add a disk_openers helper Christoph Hellwig
2022-03-24 13:24   ` Jan Kara
2022-03-24  7:51 ` [PATCH 05/13] block: turn bdev->bd_openers into an atomic_t Christoph Hellwig
2022-03-24 13:31   ` Jan Kara
2022-03-24 17:12     ` Christoph Hellwig
2022-03-24  7:51 ` [PATCH 06/13] loop: de-duplicate the idle worker freeing code Christoph Hellwig
2022-03-24  7:51 ` [PATCH 07/13] loop: initialize the worker tracking fields once Christoph Hellwig
2022-03-24  7:51 ` [PATCH 08/13] loop: remove the racy bd_inode->i_mapping->nrpages asserts Christoph Hellwig
2022-03-24  7:51 ` [PATCH 09/13] loop: don't freeze the queue in lo_release Christoph Hellwig
2022-03-24  7:51 ` [PATCH 10/13] loop: only freeze the queue in __loop_clr_fd when needed Christoph Hellwig
2022-03-24  7:51 ` [PATCH 11/13] loop: implement ->free_disk Christoph Hellwig
2022-03-24  7:51 ` [PATCH 12/13] loop: remove lo_refcount and avoid lo_mutex in ->open / ->release Christoph Hellwig
2022-03-24 14:13   ` Jan Kara
2022-03-24 14:24     ` Tetsuo Handa
2022-03-24 17:23       ` Christoph Hellwig
2022-03-25 10:54         ` Tetsuo Handa
2022-03-25 16:23           ` Christoph Hellwig
2022-03-28  8:30             ` Jan Kara
2022-03-29  6:39               ` Christoph Hellwig
2022-03-29  9:42                 ` Jan Kara
2022-03-29  9:49                   ` Tetsuo Handa
2022-03-29 13:14                   ` Christoph Hellwig
2022-03-30  7:58                     ` Jan Kara
2022-03-24 17:15     ` Christoph Hellwig
2022-03-24 17:47       ` Christoph Hellwig
2022-03-25  9:34         ` Jan Kara
2022-03-24  7:51 ` [PATCH 13/13] loop: don't destroy lo->workqueue in __loop_clr_fd Christoph Hellwig
2022-03-24 14:14   ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.