linux-xfs.vger.kernel.org archive mirror
* introduce bdev holder ops and a file system shutdown method v3
@ 2023-06-01  9:44 Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 01/16] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
                   ` (17 more replies)
  0 siblings, 18 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Hi all,

This series fixes the long-standing problem that we never had a good way
to communicate block device events to the user of the block device.

It fixes this by introducing a new set of holder ops registered at
blkdev_get_by_* time for the exclusive holder, and then wiring that up
to a shutdown super operation to report the block device removal to the
file systems.
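
As a rough illustration of the idea (not the code in this series), the
flow can be modelled in userspace.  All demo_* names below are
hypothetical; only the mark_dead callback name is taken from the title
of patch 10, and the real structures and signatures may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
struct demo_bdev;

struct demo_holder_ops {
	/* Called when the underlying device goes away. */
	void (*mark_dead)(struct demo_bdev *bdev);
};

struct demo_bdev {
	void *holder;                        /* exclusive holder, e.g. a superblock */
	const struct demo_holder_ops *hops;  /* callbacks registered at claim time */
	bool dead;
};

struct demo_sb {
	bool shut_down;
};

/* The file system's callback: turn the device event into a shutdown. */
static void demo_fs_mark_dead(struct demo_bdev *bdev)
{
	struct demo_sb *sb = bdev->holder;

	sb->shut_down = true;
}

static const struct demo_holder_ops demo_fs_holder_ops = {
	.mark_dead = demo_fs_mark_dead,
};

/* Claim the device exclusively and register the holder's callbacks. */
static void demo_claim(struct demo_bdev *bdev, void *holder,
		       const struct demo_holder_ops *hops)
{
	assert(holder != NULL);
	bdev->holder = holder;
	bdev->hops = hops;
}

/* Device removal: notify the holder if it registered a callback. */
static void demo_remove_device(struct demo_bdev *bdev)
{
	bdev->dead = true;
	if (bdev->hops && bdev->hops->mark_dead)
		bdev->hops->mark_dead(bdev);
}
```

A holder that passes NULL ops simply receives no notification in this
model.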

Changes since v2:
 - rename a method in xfs
 - add ext4 support

Changes since v1:
 - add a patch to refactor bd_may_claim
 - add a sanity check for mismatching holder ops in bd_may_claim
 - move partition removal later in del_gendisk so that partitions
   are still around for the shutdown notification
 - add SHUTDOWN_DEVICE_REMOVED to XFS_SHUTDOWN_STRINGS

Diffstat:
 block/bdev.c                        |  159 ++++++++++++++++++++----------------
 block/blk.h                         |    2 
 block/fops.c                        |    2 
 block/genhd.c                       |   78 +++++++++++++----
 block/ioctl.c                       |    3 
 block/partitions/core.c             |   31 +++----
 drivers/block/drbd/drbd_nl.c        |    3 
 drivers/block/loop.c                |    2 
 drivers/block/pktcdvd.c             |    5 -
 drivers/block/rnbd/rnbd-srv.c       |    2 
 drivers/block/xen-blkback/xenbus.c  |    2 
 drivers/block/zram/zram_drv.c       |    2 
 drivers/md/bcache/super.c           |    2 
 drivers/md/dm.c                     |    2 
 drivers/md/md.c                     |    2 
 drivers/mtd/devices/block2mtd.c     |    4 
 drivers/nvme/target/io-cmd-bdev.c   |    2 
 drivers/s390/block/dasd_genhd.c     |    2 
 drivers/target/target_core_iblock.c |    2 
 drivers/target/target_core_pscsi.c  |    3 
 fs/btrfs/dev-replace.c              |    2 
 fs/btrfs/volumes.c                  |    6 -
 fs/erofs/super.c                    |    2 
 fs/ext4/ext4.h                      |    1 
 fs/ext4/ioctl.c                     |   24 +++--
 fs/ext4/super.c                     |   18 +++-
 fs/f2fs/super.c                     |    4 
 fs/jfs/jfs_logmgr.c                 |    2 
 fs/nfs/blocklayout/dev.c            |    5 -
 fs/nilfs2/super.c                   |    2 
 fs/ocfs2/cluster/heartbeat.c        |    2 
 fs/reiserfs/journal.c               |    5 -
 fs/super.c                          |   21 ++++
 fs/xfs/xfs_fsops.c                  |    3 
 fs/xfs/xfs_mount.h                  |    4 
 fs/xfs/xfs_super.c                  |   21 ++++
 include/linux/blk_types.h           |    2 
 include/linux/blkdev.h              |   12 ++
 include/linux/fs.h                  |    1 
 kernel/power/swap.c                 |    4 
 mm/swapfile.c                       |    3 
 41 files changed, 297 insertions(+), 157 deletions(-)


* [PATCH 01/16] block: factor out a bd_end_claim helper from blkdev_put
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 02/16] block: refactor bd_may_claim Christoph Hellwig
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Move all the logic to release an exclusive claim into a helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c | 63 +++++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef3237a..317bfd9cba40fa 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -589,6 +589,37 @@ void bd_abort_claiming(struct block_device *bdev, void *holder)
 }
 EXPORT_SYMBOL(bd_abort_claiming);
 
+static void bd_end_claim(struct block_device *bdev)
+{
+	struct block_device *whole = bdev_whole(bdev);
+	bool unblock = false;
+
+	/*
+	 * Release a claim on the device.  The holder fields are protected with
+	 * bdev_lock.  open_mutex is used to synchronize disk_holder unlinking.
+	 */
+	spin_lock(&bdev_lock);
+	WARN_ON_ONCE(--bdev->bd_holders < 0);
+	WARN_ON_ONCE(--whole->bd_holders < 0);
+	if (!bdev->bd_holders) {
+		bdev->bd_holder = NULL;
+		if (bdev->bd_write_holder)
+			unblock = true;
+	}
+	if (!whole->bd_holders)
+		whole->bd_holder = NULL;
+	spin_unlock(&bdev_lock);
+
+	/*
+	 * If this was the last claim, remove holder link and unblock evpoll if
+	 * it was a write holder.
+	 */
+	if (unblock) {
+		disk_unblock_events(bdev->bd_disk);
+		bdev->bd_write_holder = false;
+	}
+}
+
 static void blkdev_flush_mapping(struct block_device *bdev)
 {
 	WARN_ON_ONCE(bdev->bd_holders);
@@ -843,36 +874,8 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
-	if (mode & FMODE_EXCL) {
-		struct block_device *whole = bdev_whole(bdev);
-		bool bdev_free;
-
-		/*
-		 * Release a claim on the device.  The holder fields
-		 * are protected with bdev_lock.  open_mutex is to
-		 * synchronize disk_holder unlinking.
-		 */
-		spin_lock(&bdev_lock);
-
-		WARN_ON_ONCE(--bdev->bd_holders < 0);
-		WARN_ON_ONCE(--whole->bd_holders < 0);
-
-		if ((bdev_free = !bdev->bd_holders))
-			bdev->bd_holder = NULL;
-		if (!whole->bd_holders)
-			whole->bd_holder = NULL;
-
-		spin_unlock(&bdev_lock);
-
-		/*
-		 * If this was the last claim, remove holder link and
-		 * unblock evpoll if it was a write holder.
-		 */
-		if (bdev_free && bdev->bd_write_holder) {
-			disk_unblock_events(disk);
-			bdev->bd_write_holder = false;
-		}
-	}
+	if (mode & FMODE_EXCL)
+		bd_end_claim(bdev);
 
 	/*
 	 * Trigger event checking and tell drivers to flush MEDIA_CHANGE
-- 
2.39.2



* [PATCH 02/16] block: refactor bd_may_claim
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 01/16] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 03/16] block: turn bdev_lock into a mutex Christoph Hellwig
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

The long if/else chain obfuscates the actual logic.  Tidy it up to be
more structured.  Also drop the whole argument, as it can be trivially
derived from bdev using bdev_whole, and having the bdev_whole in the
function makes it easier to follow.
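
For readers who want to play with the rules, here is a small userspace
model of the restructured check.  The claim_model_* names and the
sentinel are assumptions made for this sketch; the kernel uses the
address of bd_may_claim itself as the "a partition is claimed" marker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct claim_model_bdev {
	struct claim_model_bdev *whole;  /* containing device, may be itself */
	void *holder;                    /* current exclusive holder, or NULL */
};

/* Sentinel standing in for the kernel's use of bd_may_claim as a marker. */
static char claim_model_part_marker;
#define CLAIM_MODEL_PART_CLAIMED ((void *)&claim_model_part_marker)

static bool claim_model_may_claim(const struct claim_model_bdev *bdev,
				  const void *holder)
{
	const struct claim_model_bdev *whole = bdev->whole;

	assert(whole != NULL);

	/* Already held: only the same holder may claim again. */
	if (bdev->holder)
		return bdev->holder == holder;

	/*
	 * Unheld partition: refuse only if the whole device is held by a
	 * real holder rather than by the partition-claimed marker.
	 */
	if (whole != bdev && whole->holder &&
	    whole->holder != CLAIM_MODEL_PART_CLAIMED)
		return false;
	return true;
}
```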

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 317bfd9cba40fa..080b5c83bfbc72 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -463,7 +463,6 @@ long nr_blockdev_pages(void)
 /**
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
- * @whole: whole block device containing @bdev, may equal @bdev
  * @holder: holder trying to claim @bdev
  *
  * Test whether @bdev can be claimed by @holder.
@@ -474,22 +473,27 @@ long nr_blockdev_pages(void)
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
-			 void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder)
 {
-	if (bdev->bd_holder == holder)
-		return true;	 /* already a holder */
-	else if (bdev->bd_holder != NULL)
-		return false; 	 /* held by someone else */
-	else if (whole == bdev)
-		return true;  	 /* is a whole device which isn't held */
-
-	else if (whole->bd_holder == bd_may_claim)
-		return true; 	 /* is a partition of a device that is being partitioned */
-	else if (whole->bd_holder != NULL)
-		return false;	 /* is a partition of a held device */
-	else
-		return true;	 /* is a partition of an un-held device */
+	struct block_device *whole = bdev_whole(bdev);
+
+	if (bdev->bd_holder) {
+		/*
+		 * The same holder can always re-claim.
+		 */
+		if (bdev->bd_holder == holder)
+			return true;
+		return false;
+	}
+
+	/*
+	 * If the whole devices holder is set to bd_may_claim, a partition on
+	 * the device is claimed, but not the whole device.
+	 */
+	if (whole != bdev &&
+	    whole->bd_holder && whole->bd_holder != bd_may_claim)
+		return false;
+	return true;
 }
 
 /**
@@ -513,7 +517,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 retry:
 	spin_lock(&bdev_lock);
 	/* if someone else claimed, fail */
-	if (!bd_may_claim(bdev, whole, holder)) {
+	if (!bd_may_claim(bdev, holder)) {
 		spin_unlock(&bdev_lock);
 		return -EBUSY;
 	}
@@ -559,7 +563,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	struct block_device *whole = bdev_whole(bdev);
 
 	spin_lock(&bdev_lock);
-	BUG_ON(!bd_may_claim(bdev, whole, holder));
+	BUG_ON(!bd_may_claim(bdev, holder));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
 	 * and bd_holder will be set to bd_may_claim before being set to holder
-- 
2.39.2



* [PATCH 03/16] block: turn bdev_lock into a mutex
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 01/16] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 02/16] block: refactor bd_may_claim Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 04/16] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

There is no reason for this lock to spin, and being able to sleep under
it will come in handy soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 080b5c83bfbc72..f5ffcac762e0cd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -308,7 +308,7 @@ EXPORT_SYMBOL(thaw_bdev);
  * pseudo-fs
  */
 
-static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(bdev_lock);
+static  __cacheline_aligned_in_smp DEFINE_MUTEX(bdev_lock);
 static struct kmem_cache * bdev_cachep __read_mostly;
 
 static struct inode *bdev_alloc_inode(struct super_block *sb)
@@ -467,9 +467,6 @@ long nr_blockdev_pages(void)
  *
  * Test whether @bdev can be claimed by @holder.
  *
- * CONTEXT:
- * spin_lock(&bdev_lock).
- *
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
@@ -477,6 +474,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
+	lockdep_assert_held(&bdev_lock);
+
 	if (bdev->bd_holder) {
 		/*
 		 * The same holder can always re-claim.
@@ -515,10 +514,10 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 	if (WARN_ON_ONCE(!holder))
 		return -EINVAL;
 retry:
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	/* if someone else claimed, fail */
 	if (!bd_may_claim(bdev, holder)) {
-		spin_unlock(&bdev_lock);
+		mutex_unlock(&bdev_lock);
 		return -EBUSY;
 	}
 
@@ -528,7 +527,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 		DEFINE_WAIT(wait);
 
 		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&bdev_lock);
+		mutex_unlock(&bdev_lock);
 		schedule();
 		finish_wait(wq, &wait);
 		goto retry;
@@ -536,7 +535,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 
 	/* yay, all mine */
 	whole->bd_claiming = holder;
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
@@ -562,7 +561,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	BUG_ON(!bd_may_claim(bdev, holder));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
@@ -573,7 +572,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	bdev->bd_holders++;
 	bdev->bd_holder = holder;
 	bd_clear_claiming(whole, holder);
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 }
 
 /**
@@ -587,9 +586,9 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
  */
 void bd_abort_claiming(struct block_device *bdev, void *holder)
 {
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	bd_clear_claiming(bdev_whole(bdev), holder);
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 }
 EXPORT_SYMBOL(bd_abort_claiming);
 
@@ -602,7 +601,7 @@ static void bd_end_claim(struct block_device *bdev)
 	 * Release a claim on the device.  The holder fields are protected with
 	 * bdev_lock.  open_mutex is used to synchronize disk_holder unlinking.
 	 */
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	WARN_ON_ONCE(--bdev->bd_holders < 0);
 	WARN_ON_ONCE(--whole->bd_holders < 0);
 	if (!bdev->bd_holders) {
@@ -612,7 +611,7 @@ static void bd_end_claim(struct block_device *bdev)
 	}
 	if (!whole->bd_holders)
 		whole->bd_holder = NULL;
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 
 	/*
 	 * If this was the last claim, remove holder link and unblock evpoll if
-- 
2.39.2



* [PATCH 04/16] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 03/16] block: turn bdev_lock into a mutex Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 05/16] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

blk_mark_disk_dead does very similar work as a section of del_gendisk:

 - set the GD_DEAD flag
 - set the capacity to zero
 - start a queue drain

but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
the disk, sets the capacity to zero before starting the drain, and does
so without sending a uevent or kernel message for this fake capacity
change.

Move the exact logic from the more heavily used del_gendisk into
blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/genhd.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 3537b7d7c484d7..aa327314905e63 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -572,13 +572,22 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void blk_mark_disk_dead(struct gendisk *disk)
 {
+	/*
+	 * Fail any new I/O.
+	 */
 	set_bit(GD_DEAD, &disk->state);
-	blk_queue_start_drain(disk->queue);
+	if (test_bit(GD_OWNS_QUEUE, &disk->state))
+		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
 	 */
-	set_capacity_and_notify(disk, 0);
+	set_capacity(disk, 0);
+
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_queue_start_drain(disk->queue);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
@@ -620,18 +629,7 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
-	/*
-	 * Fail any new I/O.
-	 */
-	set_bit(GD_DEAD, &disk->state);
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-	set_capacity(disk, 0);
-
-	/*
-	 * Prevent new I/O from crossing bio_queue_enter().
-	 */
-	blk_queue_start_drain(q);
+	blk_mark_disk_dead(disk);
 
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-- 
2.39.2



* [PATCH 05/16] block: avoid repeated work in blk_mark_disk_dead
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 04/16] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 06/16] block: unhash the inode earlier in delete_partition Christoph Hellwig
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Check if GD_DEAD is already set in blk_mark_disk_dead, and don't
duplicate the work already done.
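
The pattern can be sketched in userspace with C11 atomics.  This is only
a model: the dead_model_* names, the bit position and the counter are
invented for illustration, and atomic_fetch_or stands in for the
kernel's test_and_set_bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEAD_MODEL_GD_DEAD (1UL << 0)  /* hypothetical bit position */

struct dead_model_disk {
	atomic_ulong state;
	unsigned long capacity;
	int drains_started;  /* counts how often the teardown work ran */
};

/* Set the bits in mask and report whether they were already set. */
static bool dead_model_test_and_set(atomic_ulong *word, unsigned long mask)
{
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

static void dead_model_mark_disk_dead(struct dead_model_disk *disk)
{
	assert(disk != NULL);

	/* Only the first caller sees the bit clear and does the work. */
	if (dead_model_test_and_set(&disk->state, DEAD_MODEL_GD_DEAD))
		return;

	disk->capacity = 0;
	disk->drains_started++;
}
```

Because the test and the set happen in one atomic step, two racing
callers cannot both observe the bit as clear.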

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/genhd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index aa327314905e63..6fa926a02d8534 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -575,7 +575,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
 	/*
 	 * Fail any new I/O.
 	 */
-	set_bit(GD_DEAD, &disk->state);
+	if (test_and_set_bit(GD_DEAD, &disk->state))
+		return;
+
 	if (test_bit(GD_OWNS_QUEUE, &disk->state))
 		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
-- 
2.39.2



* [PATCH 06/16] block: unhash the inode earlier in delete_partition
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 05/16] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 07/16] block: delete partitions later in del_gendisk Christoph Hellwig
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Move the call to remove_inode_hash to the beginning of delete_partition,
as we want to prevent opening a block_device that is about to be removed
ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/partitions/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 82d26427deae25..9d1debaa5caf9a 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -267,6 +267,12 @@ static void delete_partition(struct block_device *part)
 {
 	lockdep_assert_held(&part->bd_disk->open_mutex);
 
+	/*
+	 * Remove the block device from the inode hash, so that it cannot be
+	 * looked up any more even when openers still hold references.
+	 */
+	remove_inode_hash(part->bd_inode);
+
 	fsync_bdev(part);
 	__invalidate_device(part, true);
 
@@ -274,12 +280,6 @@ static void delete_partition(struct block_device *part)
 	kobject_put(part->bd_holder_dir);
 	device_del(&part->bd_device);
 
-	/*
-	 * Remove the block device from the inode hash, so that it cannot be
-	 * looked up any more even when openers still hold references.
-	 */
-	remove_inode_hash(part->bd_inode);
-
 	put_device(&part->bd_device);
 }
 
-- 
2.39.2



* [PATCH 07/16] block: delete partitions later in del_gendisk
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 06/16] block: unhash the inode earlier in delete_partition Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 08/16] block: remove blk_drop_partitions Christoph Hellwig
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Delay dropping the block_devices for partitions in del_gendisk until
after the call to blk_mark_disk_dead, so that we can implement
notification of removed devices in blk_mark_disk_dead.

This requires splitting a lower-level drop_partition helper out of
delete_partition and using that from del_gendisk, while having a
common loop for the whole device and partitions that calls
remove_inode_hash, fsync_bdev and __invalidate_device before the
call to blk_mark_disk_dead.
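
The resulting ordering can be modelled in userspace as three phases.
Everything named gd_model_* is made up for this sketch, and the
per-partition notification inside the mark-dead step is an assumption
about what later patches add, not something this patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GD_MODEL_MAX_PARTS 4

struct gd_model_part {
	bool present;   /* slot in the partition table is populated */
	bool hashed;    /* still reachable by new openers */
	bool synced;    /* dirty data has been written back */
	bool notified;  /* saw the "disk is dead" notification */
};

struct gd_model_disk {
	struct gd_model_part parts[GD_MODEL_MAX_PARTS];  /* index 0: whole device */
	bool dead;
};

static void gd_model_mark_disk_dead(struct gd_model_disk *disk)
{
	disk->dead = true;
	for (size_t i = 0; i < GD_MODEL_MAX_PARTS; i++)
		if (disk->parts[i].present)
			disk->parts[i].notified = true;
}

static void gd_model_del_disk(struct gd_model_disk *disk)
{
	assert(disk != NULL);

	/* Phase 1: block new openers and write back, whole device included. */
	for (size_t i = 0; i < GD_MODEL_MAX_PARTS; i++) {
		if (!disk->parts[i].present)
			continue;
		disk->parts[i].hashed = false;
		disk->parts[i].synced = true;
	}

	/* Phase 2: mark the disk dead while the partitions still exist. */
	gd_model_mark_disk_dead(disk);

	/* Phase 3: drop the partitions, skipping index 0 (the whole device). */
	for (size_t i = 1; i < GD_MODEL_MAX_PARTS; i++)
		disk->parts[i].present = false;
}
```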

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/blk.h             |  2 +-
 block/genhd.c           | 24 +++++++++++++++++++-----
 block/partitions/core.c | 19 ++++++++++++-------
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 7ad7cb6ffa0135..9582fcd0df4123 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -409,7 +409,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
 int bdev_del_partition(struct gendisk *disk, int partno);
 int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
 		sector_t length);
-void blk_drop_partitions(struct gendisk *disk);
+void drop_partition(struct block_device *part);
 
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
 
diff --git a/block/genhd.c b/block/genhd.c
index 6fa926a02d8534..a668d2f0208766 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -615,6 +615,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 void del_gendisk(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
+	struct block_device *part;
+	unsigned long idx;
 
 	might_sleep();
 
@@ -623,16 +625,28 @@ void del_gendisk(struct gendisk *disk)
 
 	disk_del_events(disk);
 
+	/*
+	 * Prevent new openers by unlinking the bdev inode, and write out
+	 * dirty data before marking the disk dead and stopping all I/O.
+	 */
 	mutex_lock(&disk->open_mutex);
-	remove_inode_hash(disk->part0->bd_inode);
-	blk_drop_partitions(disk);
+	xa_for_each(&disk->part_tbl, idx, part) {
+		remove_inode_hash(part->bd_inode);
+		fsync_bdev(part);
+		__invalidate_device(part, true);
+	}
 	mutex_unlock(&disk->open_mutex);
 
-	fsync_bdev(disk->part0);
-	__invalidate_device(disk->part0, true);
-
 	blk_mark_disk_dead(disk);
 
+	/*
+	 * Drop all partitions now that the disk is marked dead.
+	 */
+	mutex_lock(&disk->open_mutex);
+	xa_for_each_start(&disk->part_tbl, idx, part, 1)
+		drop_partition(part);
+	mutex_unlock(&disk->open_mutex);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 9d1debaa5caf9a..c3c12671a949d2 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -263,10 +263,19 @@ const struct device_type part_type = {
 	.uevent		= part_uevent,
 };
 
-static void delete_partition(struct block_device *part)
+void drop_partition(struct block_device *part)
 {
 	lockdep_assert_held(&part->bd_disk->open_mutex);
 
+	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
+	kobject_put(part->bd_holder_dir);
+
+	device_del(&part->bd_device);
+	put_device(&part->bd_device);
+}
+
+static void delete_partition(struct block_device *part)
+{
 	/*
 	 * Remove the block device from the inode hash, so that it cannot be
 	 * looked up any more even when openers still hold references.
@@ -276,11 +285,7 @@ static void delete_partition(struct block_device *part)
 	fsync_bdev(part);
 	__invalidate_device(part, true);
 
-	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
-	kobject_put(part->bd_holder_dir);
-	device_del(&part->bd_device);
-
-	put_device(&part->bd_device);
+	drop_partition(part);
 }
 
 static ssize_t whole_disk_show(struct device *dev,
@@ -519,7 +524,7 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
 	return true;
 }
 
-void blk_drop_partitions(struct gendisk *disk)
+static void blk_drop_partitions(struct gendisk *disk)
 {
 	struct block_device *part;
 	unsigned long idx;
-- 
2.39.2



* [PATCH 08/16] block: remove blk_drop_partitions
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 07/16] block: delete partitions later in del_gendisk Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 09/16] block: introduce holder ops Christoph Hellwig
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

There is only a single caller left, so fold the loop into that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/partitions/core.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index c3c12671a949d2..87a21942d60667 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -524,17 +524,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
 	return true;
 }
 
-static void blk_drop_partitions(struct gendisk *disk)
-{
-	struct block_device *part;
-	unsigned long idx;
-
-	lockdep_assert_held(&disk->open_mutex);
-
-	xa_for_each_start(&disk->part_tbl, idx, part, 1)
-		delete_partition(part);
-}
-
 static bool blk_add_partition(struct gendisk *disk,
 		struct parsed_partitions *state, int p)
 {
@@ -651,6 +640,8 @@ static int blk_add_partitions(struct gendisk *disk)
 
 int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 {
+	struct block_device *part;
+	unsigned long idx;
 	int ret = 0;
 
 	lockdep_assert_held(&disk->open_mutex);
@@ -663,8 +654,9 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 		return -EBUSY;
 	sync_blockdev(disk->part0);
 	invalidate_bdev(disk->part0);
-	blk_drop_partitions(disk);
 
+	xa_for_each_start(&disk->part_tbl, idx, part, 1)
+		delete_partition(part);
 	clear_bit(GD_NEED_PART_SCAN, &disk->state);
 
 	/*
-- 
2.39.2



* [PATCH 09/16] block: introduce holder ops
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 08/16] block: remove blk_drop_partitions Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 10/16] block: add a mark_dead holder operation Christoph Hellwig
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims.  It will be used to
allow the block layer to call back into the user of the block device for
things like notification of a removed device or a device resize.
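
A minimal userspace sketch of the claim rule with holder ops follows.
The hops_model_* names are hypothetical; the point is only that a
re-claim by the same holder is accepted when the ops match and rejected
when they do not (the kernel additionally warns in that case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hops_model_ops {
	const char *name;  /* placeholder member, only to tell tables apart */
};

struct hops_model_bdev {
	const void *holder;
	const struct hops_model_ops *hops;
	int holders;
};

static bool hops_model_may_claim(const struct hops_model_bdev *bdev,
				 const void *holder,
				 const struct hops_model_ops *hops)
{
	if (bdev->holder) {
		/* A different holder can never claim a held device. */
		if (bdev->holder != holder)
			return false;
		/* The same holder must also come with the same ops. */
		return bdev->hops == hops;
	}
	return true;
}

static bool hops_model_claim(struct hops_model_bdev *bdev, const void *holder,
			     const struct hops_model_ops *hops)
{
	assert(holder != NULL);

	if (!hops_model_may_claim(bdev, holder, hops))
		return false;
	bdev->holder = holder;
	bdev->hops = hops;  /* installed for exclusive claims */
	bdev->holders++;
	return true;
}
```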

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c                        | 41 ++++++++++++++++++++---------
 block/fops.c                        |  2 +-
 block/genhd.c                       |  6 +++--
 block/ioctl.c                       |  3 ++-
 drivers/block/drbd/drbd_nl.c        |  3 ++-
 drivers/block/loop.c                |  2 +-
 drivers/block/pktcdvd.c             |  5 ++--
 drivers/block/rnbd/rnbd-srv.c       |  2 +-
 drivers/block/xen-blkback/xenbus.c  |  2 +-
 drivers/block/zram/zram_drv.c       |  2 +-
 drivers/md/bcache/super.c           |  2 +-
 drivers/md/dm.c                     |  2 +-
 drivers/md/md.c                     |  2 +-
 drivers/mtd/devices/block2mtd.c     |  4 +--
 drivers/nvme/target/io-cmd-bdev.c   |  2 +-
 drivers/s390/block/dasd_genhd.c     |  2 +-
 drivers/target/target_core_iblock.c |  2 +-
 drivers/target/target_core_pscsi.c  |  3 ++-
 fs/btrfs/dev-replace.c              |  2 +-
 fs/btrfs/volumes.c                  |  6 ++---
 fs/erofs/super.c                    |  2 +-
 fs/ext4/super.c                     |  3 ++-
 fs/f2fs/super.c                     |  4 +--
 fs/jfs/jfs_logmgr.c                 |  2 +-
 fs/nfs/blocklayout/dev.c            |  5 ++--
 fs/nilfs2/super.c                   |  2 +-
 fs/ocfs2/cluster/heartbeat.c        |  2 +-
 fs/reiserfs/journal.c               |  5 ++--
 fs/super.c                          |  4 +--
 fs/xfs/xfs_super.c                  |  2 +-
 include/linux/blk_types.h           |  2 ++
 include/linux/blkdev.h              | 11 +++++---
 kernel/power/swap.c                 |  4 +--
 mm/swapfile.c                       |  3 ++-
 34 files changed, 90 insertions(+), 56 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f5ffcac762e0cd..5c46ff10770638 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -102,7 +102,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
 	 * under live filesystem.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
+		int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
 		if (err)
 			goto invalidate;
 	}
@@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	bdev = I_BDEV(inode);
 	mutex_init(&bdev->bd_fsfreeze_mutex);
 	spin_lock_init(&bdev->bd_size_lock);
+	mutex_init(&bdev->bd_holder_lock);
 	bdev->bd_partno = partno;
 	bdev->bd_inode = inode;
 	bdev->bd_queue = disk->queue;
@@ -464,13 +465,15 @@ long nr_blockdev_pages(void)
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops
  *
  * Test whether @bdev can be claimed by @holder.
  *
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
@@ -480,8 +483,11 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
 		/*
 		 * The same holder can always re-claim.
 		 */
-		if (bdev->bd_holder == holder)
+		if (bdev->bd_holder == holder) {
+			if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
+				return false;
 			return true;
+		}
 		return false;
 	}
 
@@ -499,6 +505,7 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
  * bd_prepare_to_claim - claim a block device
  * @bdev: block device of interest
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops.
  *
  * Claim @bdev.  This function fails if @bdev is already claimed by another
  * holder and waits if another claiming is in progress. return, the caller
@@ -507,7 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
  * RETURNS:
  * 0 if @bdev can be claimed, -EBUSY otherwise.
  */
-int bd_prepare_to_claim(struct block_device *bdev, void *holder)
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
@@ -516,7 +524,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 retry:
 	mutex_lock(&bdev_lock);
 	/* if someone else claimed, fail */
-	if (!bd_may_claim(bdev, holder)) {
+	if (!bd_may_claim(bdev, holder, hops)) {
 		mutex_unlock(&bdev_lock);
 		return -EBUSY;
 	}
@@ -557,12 +565,13 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
  * Finish exclusive open of a block device. Mark the device as exlusively
  * open by the holder and wake up all waiters for exclusive open to finish.
  */
-static void bd_finish_claiming(struct block_device *bdev, void *holder)
+static void bd_finish_claiming(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
 	mutex_lock(&bdev_lock);
-	BUG_ON(!bd_may_claim(bdev, holder));
+	BUG_ON(!bd_may_claim(bdev, holder, hops));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
 	 * and bd_holder will be set to bd_may_claim before being set to holder
@@ -570,7 +579,10 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	whole->bd_holders++;
 	whole->bd_holder = bd_may_claim;
 	bdev->bd_holders++;
+	mutex_lock(&bdev->bd_holder_lock);
 	bdev->bd_holder = holder;
+	bdev->bd_holder_ops = hops;
+	mutex_unlock(&bdev->bd_holder_lock);
 	bd_clear_claiming(whole, holder);
 	mutex_unlock(&bdev_lock);
 }
@@ -605,7 +617,10 @@ static void bd_end_claim(struct block_device *bdev)
 	WARN_ON_ONCE(--bdev->bd_holders < 0);
 	WARN_ON_ONCE(--whole->bd_holders < 0);
 	if (!bdev->bd_holders) {
+		mutex_lock(&bdev->bd_holder_lock);
 		bdev->bd_holder = NULL;
+		bdev->bd_holder_ops = NULL;
+		mutex_unlock(&bdev->bd_holder_lock);
 		if (bdev->bd_write_holder)
 			unblock = true;
 	}
@@ -735,6 +750,7 @@ void blkdev_put_no_open(struct block_device *bdev)
  * @dev: device number of block device to open
  * @mode: FMODE_* mask
  * @holder: exclusive holder identifier
+ * @hops: holder operations
  *
  * Open the block device described by device number @dev. If @mode includes
  * %FMODE_EXCL, the block device is opened with exclusive access.  Specifying
@@ -751,7 +767,8 @@ void blkdev_put_no_open(struct block_device *bdev)
  * RETURNS:
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	bool unblock_events = true;
 	struct block_device *bdev;
@@ -771,7 +788,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 	disk = bdev->bd_disk;
 
 	if (mode & FMODE_EXCL) {
-		ret = bd_prepare_to_claim(bdev, holder);
+		ret = bd_prepare_to_claim(bdev, holder, hops);
 		if (ret)
 			goto put_blkdev;
 	}
@@ -791,7 +808,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 	if (ret)
 		goto put_module;
 	if (mode & FMODE_EXCL) {
-		bd_finish_claiming(bdev, holder);
+		bd_finish_claiming(bdev, holder, hops);
 
 		/*
 		 * Block event polling for write claims if requested.  Any write
@@ -842,7 +859,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
 struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-					void *holder)
+		void *holder, const struct blk_holder_ops *hops)
 {
 	struct block_device *bdev;
 	dev_t dev;
@@ -852,7 +869,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 	if (error)
 		return ERR_PTR(error);
 
-	bdev = blkdev_get_by_dev(dev, mode, holder);
+	bdev = blkdev_get_by_dev(dev, mode, holder, hops);
 	if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
 		blkdev_put(bdev, mode);
 		return ERR_PTR(-EACCES);
diff --git a/block/fops.c b/block/fops.c
index b12c4b2a3a69f2..6a3087b750a6cd 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -490,7 +490,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
 	if ((filp->f_flags & O_ACCMODE) == 3)
 		filp->f_mode |= FMODE_WRITE_IOCTL;
 
-	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
+	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
diff --git a/block/genhd.c b/block/genhd.c
index a668d2f0208766..b3bd58e9fbea81 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -370,13 +370,15 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	 * scanners.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
+		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
+					  NULL);
 		if (ret)
 			return ret;
 	}
 
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
-	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
+	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL,
+				 NULL);
 	if (IS_ERR(bdev))
 		ret =  PTR_ERR(bdev);
 	else
diff --git a/block/ioctl.c b/block/ioctl.c
index 9c5f637ff153f8..c7d7d4345edb4f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -454,7 +454,8 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
 	if (mode & FMODE_EXCL)
 		return set_blocksize(bdev, n);
 
-	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
+	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev,
+			NULL)))
 		return -EBUSY;
 	ret = set_blocksize(bdev, n);
 	blkdev_put(bdev, mode | FMODE_EXCL);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1a5d3d72d91d27..cab59dab3410aa 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1641,7 +1641,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
 	int err = 0;
 
 	bdev = blkdev_get_by_path(bdev_path,
-				  FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
+				  FMODE_READ | FMODE_WRITE | FMODE_EXCL,
+				  claim_ptr, NULL);
 	if (IS_ERR(bdev)) {
 		drbd_err(device, "open(\"%s\") failed with %ld\n",
 				bdev_path, PTR_ERR(bdev));
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2cb..a73c857f5bfed0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1015,7 +1015,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	 * here to avoid changing device under exclusive owner.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		error = bd_prepare_to_claim(bdev, loop_configure);
+		error = bd_prepare_to_claim(bdev, loop_configure, NULL);
 		if (error)
 			goto out_putf;
 	}
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index d5d7884cedd477..377f8b34535294 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2125,7 +2125,8 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
 	 * to read/write from/to it. It is already opened in O_NONBLOCK mode
 	 * so open should not fail.
 	 */
-	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
+	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd,
+				 NULL);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		goto out;
@@ -2530,7 +2531,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 		}
 	}
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 	sdev = scsi_device_from_queue(bdev->bd_disk->queue);
diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 2cfed2e58d646f..cec22bbae2f9a5 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -719,7 +719,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
 		goto reject;
 	}
 
-	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
+	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE, NULL);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4807af1d580593..43b36da9b3544d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	vbd->pdevice  = MKDEV(major, minor);
 
 	bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
-				 FMODE_READ : FMODE_WRITE, NULL);
+				 FMODE_READ : FMODE_WRITE, NULL, NULL);
 
 	if (IS_ERR(bdev)) {
 		pr_warn("xen_vbd_create: device %08x could not be opened\n",
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b86691d2133ea8..0bc779446c6f8f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -508,7 +508,7 @@ static ssize_t backing_dev_store(struct device *dev,
 	}
 
 	bdev = blkdev_get_by_dev(inode->i_rdev,
-			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
+			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram, NULL);
 	if (IS_ERR(bdev)) {
 		err = PTR_ERR(bdev);
 		bdev = NULL;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7e9d19fd21ddd5..d84c09a73af803 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2560,7 +2560,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	err = "failed to open device";
 	bdev = blkdev_get_by_path(strim(path),
 				  FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				  sb);
+				  sb, NULL);
 	if (IS_ERR(bdev)) {
 		if (bdev == ERR_PTR(-EBUSY)) {
 			dev_t dev;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106e6..d759f8bdb3df2f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
 		return ERR_PTR(-ENOMEM);
 	refcount_set(&td->count, 1);
 
-	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr, NULL);
 	if (IS_ERR(bdev)) {
 		r = PTR_ERR(bdev);
 		goto out_free_td;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6a559a7e89c07f..fabf9c543735b6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3642,7 +3642,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 
 	rdev->bdev = blkdev_get_by_dev(newdev,
 			FMODE_READ | FMODE_WRITE | FMODE_EXCL,
-			super_format == -2 ? &claim_rdev : rdev);
+			super_format == -2 ? &claim_rdev : rdev, NULL);
 	if (IS_ERR(rdev->bdev)) {
 		pr_warn("md: could not open device unknown-block(%u,%u).\n",
 			MAJOR(newdev), MINOR(newdev));
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 4cd37ec45762b6..7ac82c6fe35024 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -235,7 +235,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		return NULL;
 
 	/* Get a handle on the device */
-	bdev = blkdev_get_by_path(devname, mode, dev);
+	bdev = blkdev_get_by_path(devname, mode, dev, NULL);
 
 #ifndef MODULE
 	/*
@@ -257,7 +257,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		devt = name_to_dev_t(devname);
 		if (!devt)
 			continue;
-		bdev = blkdev_get_by_dev(devt, mode, dev);
+		bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
 	}
 #endif
 
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c2d6cea0236b0a..9b6d6d85c72544 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 		return -ENOTBLK;
 
 	ns->bdev = blkdev_get_by_path(ns->device_path,
-			FMODE_READ | FMODE_WRITE, NULL);
+			FMODE_READ | FMODE_WRITE, NULL, NULL);
 	if (IS_ERR(ns->bdev)) {
 		ret = PTR_ERR(ns->bdev);
 		if (ret != -ENOTBLK) {
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 998a961e170417..f21198bc483e1a 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -130,7 +130,7 @@ int dasd_scan_partitions(struct dasd_block *block)
 	struct block_device *bdev;
 	int rc;
 
-	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
+	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL, NULL);
 	if (IS_ERR(bdev)) {
 		DBF_DEV_EVENT(DBF_ERR, block->base,
 			      "scan partitions error, blkdev_get returned %ld",
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index cc838ffd129472..a5cbbefa78ee4e 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -114,7 +114,7 @@ static int iblock_configure_device(struct se_device *dev)
 	else
 		dev->dev_flags |= DF_READ_ONLY;
 
-	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
+	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
 	if (IS_ERR(bd)) {
 		ret = PTR_ERR(bd);
 		goto out_free_bioset;
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index e7425549e39c73..e3494e036c6c85 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -367,7 +367,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
 	 * for TYPE_DISK and TYPE_ZBC using supplied udev_path
 	 */
 	bd = blkdev_get_by_path(dev->udev_path,
-				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
+				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv,
+				NULL);
 	if (IS_ERR(bd)) {
 		pr_err("pSCSI: blkdev_get_by_path() failed\n");
 		scsi_device_put(sd);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 78696d331639bd..4de4984fa99ba3 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -258,7 +258,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	}
 
 	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
+				  fs_info->bdev_holder, NULL);
 	if (IS_ERR(bdev)) {
 		btrfs_err(fs_info, "target device %s is invalid!", device_path);
 		return PTR_ERR(bdev);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 841e799dece51b..784ccc8f6c69c1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -496,7 +496,7 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 {
 	int ret;
 
-	*bdev = blkdev_get_by_path(device_path, flags, holder);
+	*bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
 
 	if (IS_ERR(*bdev)) {
 		ret = PTR_ERR(*bdev);
@@ -1377,7 +1377,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	 * values temporarily, as the device paths of the fsid are the only
 	 * required information for assembling the volume.
 	 */
-	bdev = blkdev_get_by_path(path, flags, holder);
+	bdev = blkdev_get_by_path(path, flags, holder, NULL);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
@@ -2629,7 +2629,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		return -EROFS;
 
 	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
+				  fs_info->bdev_holder, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 811ab66d805ede..6c263e9cd38b2a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -254,7 +254,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 		dif->fscache = fscache;
 	} else if (!sbi->devs->flatdev) {
 		bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
-					  sb->s_type);
+					  sb->s_type, NULL);
 		if (IS_ERR(bdev))
 			return PTR_ERR(bdev);
 		dif->bdev = bdev;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9680fe753e599a..865625089ecca3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1103,7 +1103,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
 {
 	struct block_device *bdev;
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
+				 NULL);
 	if (IS_ERR(bdev))
 		goto fail;
 	return bdev;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9f15b03037dba9..7c34ab082f1382 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4025,7 +4025,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
 			/* Single zoned block device mount */
 			FDEV(0).bdev =
 				blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
-					sbi->sb->s_mode, sbi->sb->s_type);
+					sbi->sb->s_mode, sbi->sb->s_type, NULL);
 		} else {
 			/* Multi-device mount */
 			memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
@@ -4044,7 +4044,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
 					sbi->log_blocks_per_seg) - 1;
 			}
 			FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
-					sbi->sb->s_mode, sbi->sb->s_type);
+					sbi->sb->s_mode, sbi->sb->s_type, NULL);
 		}
 		if (IS_ERR(FDEV(i).bdev))
 			return PTR_ERR(FDEV(i).bdev);
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 15c645827decae..46d393c8088a74 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1101,7 +1101,7 @@ int lmLogOpen(struct super_block *sb)
 	 */
 
 	bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				 log);
+				 log, NULL);
 	if (IS_ERR(bdev)) {
 		rc = PTR_ERR(bdev);
 		goto free;
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index fea5f8821da5ef..38b066ca699ed7 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -243,7 +243,7 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
 	if (!dev)
 		return -EIO;
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL, NULL);
 	if (IS_ERR(bdev)) {
 		printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
 			MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
@@ -312,7 +312,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
 	if (!devname)
 		return ERR_PTR(-ENOMEM);
 
-	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
+	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL,
+				  NULL);
 	if (IS_ERR(bdev)) {
 		pr_warn("pNFS: failed to open device %s (%ld)\n",
 			devname, PTR_ERR(bdev));
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 77f1e5778d1c84..91bfbd973d1d53 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1285,7 +1285,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
 	if (IS_ERR(sd.bdev))
 		return ERR_CAST(sd.bdev);
 
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 60b97c92e2b25e..6b13b8c3f2b8af 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1786,7 +1786,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 		goto out2;
 
 	reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
-					 FMODE_WRITE | FMODE_READ, NULL);
+					 FMODE_WRITE | FMODE_READ, NULL, NULL);
 	if (IS_ERR(reg->hr_bdev)) {
 		ret = PTR_ERR(reg->hr_bdev);
 		reg->hr_bdev = NULL;
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 4d11d60f493c14..5e4db9a0c8e5a3 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2616,7 +2616,7 @@ static int journal_init_dev(struct super_block *super,
 		if (jdev == super->s_dev)
 			blkdev_mode &= ~FMODE_EXCL;
 		journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
-						      journal);
+						      journal, NULL);
 		journal->j_dev_mode = blkdev_mode;
 		if (IS_ERR(journal->j_dev_bd)) {
 			result = PTR_ERR(journal->j_dev_bd);
@@ -2632,7 +2632,8 @@ static int journal_init_dev(struct super_block *super,
 	}
 
 	journal->j_dev_mode = blkdev_mode;
-	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
+	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal,
+					       NULL);
 	if (IS_ERR(journal->j_dev_bd)) {
 		result = PTR_ERR(journal->j_dev_bd);
 		journal->j_dev_bd = NULL;
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2bc3..012ce140080375 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1248,7 +1248,7 @@ int get_tree_bdev(struct fs_context *fc,
 	if (!fc->source)
 		return invalf(fc, "No source specified");
 
-	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
+	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
 	if (IS_ERR(bdev)) {
 		errorf(fc, "%s: Can't open blockdev", fc->source);
 		return PTR_ERR(bdev);
@@ -1333,7 +1333,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+	bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7e706255f16502..5684c538eb76dc 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -386,7 +386,7 @@ xfs_blkdev_get(
 	int			error = 0;
 
 	*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				    mp);
+				    mp, NULL);
 	if (IS_ERR(*bdevp)) {
 		error = PTR_ERR(*bdevp);
 		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 8ef209e3aa96b8..deb69eeab6bd7b 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -55,6 +55,8 @@ struct block_device {
 	struct super_block *	bd_super;
 	void *			bd_claiming;
 	void *			bd_holder;
+	const struct blk_holder_ops *bd_holder_ops;
+	struct mutex		bd_holder_lock;
 	/* The counter of freeze processes */
 	int			bd_fsfreeze_count;
 	int			bd_holders;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d89c2da1469872..44f2a8bc57e87a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1470,10 +1470,15 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
 #define BLKDEV_MAJOR_MAX	0
 #endif
 
+struct blk_holder_ops {
+};
+
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+		const struct blk_holder_ops *hops);
 struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-		void *holder);
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
-int bd_prepare_to_claim(struct block_device *bdev, void *holder);
+		void *holder, const struct blk_holder_ops *hops);
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
 void blkdev_put(struct block_device *bdev, fmode_t mode);
 
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 92e41ed292ada8..801c411530d11c 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -357,7 +357,7 @@ static int swsusp_swap_check(void)
 	root_swap = res;
 
 	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
-			NULL);
+			NULL, NULL);
 	if (IS_ERR(hib_resume_bdev))
 		return PTR_ERR(hib_resume_bdev);
 
@@ -1524,7 +1524,7 @@ int swsusp_check(void)
 		mode |= FMODE_EXCL;
 
 	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
-					    mode, &holder);
+					    mode, &holder, NULL);
 	if (!IS_ERR(hib_resume_bdev)) {
 		set_blocksize(hib_resume_bdev, PAGE_SIZE);
 		clear_page(swsusp_header);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 274bbf79748006..cfbcf7d5705f5f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2770,7 +2770,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 
 	if (S_ISBLK(inode->i_mode)) {
 		p->bdev = blkdev_get_by_dev(inode->i_rdev,
-				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
+				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p,
+				   NULL);
 		if (IS_ERR(p->bdev)) {
 			error = PTR_ERR(p->bdev);
 			p->bdev = NULL;
-- 
2.39.2


* [PATCH 10/16] block: add a mark_dead holder operation
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 09/16] block: introduce holder ops Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 11/16] fs: add a method to shut down the file system Christoph Hellwig
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
to notify the holder that the block device it is using has been marked dead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
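A rough user-space model of the notification walk described above, for
illustration only.  The struct layouts are simplified stand-ins for the kernel
ones, MAX_PARTS, model_report_disk_dead, demo_holder and demo_run are
hypothetical names, and the RCU, reference counting and bd_holder_lock handling
of the real blk_report_disk_dead are deliberately left out:

```c
#include <stddef.h>

struct block_device;

/* Same shape as the patch: one optional callback per holder. */
struct blk_holder_ops {
	void (*mark_dead)(struct block_device *bdev);
};

/* Simplified stand-in, not the kernel definition. */
struct block_device {
	void *bd_holder;
	const struct blk_holder_ops *bd_holder_ops;
};

#define MAX_PARTS 4	/* hypothetical fixed-size stand-in for disk->part_tbl */

struct gendisk {
	struct block_device *part_tbl[MAX_PARTS];
};

/*
 * Visit every partition and call ->mark_dead where the holder registered one.
 * Returns the number of holders notified (the kernel function returns void;
 * the count only exists to make the model easy to check).
 */
int model_report_disk_dead(struct gendisk *disk)
{
	int notified = 0;

	for (size_t i = 0; i < MAX_PARTS; i++) {
		struct block_device *bdev = disk->part_tbl[i];

		if (!bdev)
			continue;
		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead) {
			bdev->bd_holder_ops->mark_dead(bdev);
			notified++;
		}
	}
	return notified;
}

/* Hypothetical holder that just records that it was told. */
struct demo_holder {
	int dead;
};

void demo_mark_dead(struct block_device *bdev)
{
	struct demo_holder *h = bdev->bd_holder;

	h->dead = 1;
}

const struct blk_holder_ops demo_holder_ops = {
	.mark_dead	= demo_mark_dead,
};

/* One partition with holder ops, one without; returns 1 if only the first
 * was notified. */
int demo_run(void)
{
	struct demo_holder h = { 0 };
	struct block_device with_ops = {
		.bd_holder = &h,
		.bd_holder_ops = &demo_holder_ops,
	};
	struct block_device without_ops = { NULL, NULL };
	struct gendisk disk = { .part_tbl = { &with_ops, &without_ops } };

	return model_report_disk_dead(&disk) == 1 && h.dead == 1;
}
```

Holders that passed NULL holder ops at open time are skipped silently, which
matches the NULL checks in the hunk below.
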
 block/genhd.c          | 24 ++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index b3bd58e9fbea81..a07c4d6a147637 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -565,6 +565,28 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 }
 EXPORT_SYMBOL(device_add_disk);
 
+static void blk_report_disk_dead(struct gendisk *disk)
+{
+	struct block_device *bdev;
+	unsigned long idx;
+
+	rcu_read_lock();
+	xa_for_each(&disk->part_tbl, idx, bdev) {
+		if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
+			continue;
+		rcu_read_unlock();
+
+		mutex_lock(&bdev->bd_holder_lock);
+		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
+			bdev->bd_holder_ops->mark_dead(bdev);
+		mutex_unlock(&bdev->bd_holder_lock);
+
+		put_device(&bdev->bd_device);
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
+}
+
 /**
  * blk_mark_disk_dead - mark a disk as dead
  * @disk: disk to mark as dead
@@ -592,6 +614,8 @@ void blk_mark_disk_dead(struct gendisk *disk)
 	 * Prevent new I/O from crossing bio_queue_enter().
 	 */
 	blk_queue_start_drain(disk->queue);
+
+	blk_report_disk_dead(disk);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 44f2a8bc57e87a..9e9a9e4edee94b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1471,6 +1471,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
 #endif
 
 struct blk_holder_ops {
+	void (*mark_dead)(struct block_device *bdev);
 };
 
 struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
-- 
2.39.2


* [PATCH 11/16] fs: add a method to shut down the file system
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 10/16] block: add a mark_dead holder operation Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 12/16] xfs: wire up sops->shutdown Christoph Hellwig
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a new ->shutdown super operation that can be used to tell the file
system to shut down, and call it from newly created holder ops when the
block device under a file system shuts down.

This only covers the main block device for "simple" file systems using
get_tree_bdev / mount_bdev.  File systems that use their own get_tree
method or open additional devices will need to set up their own
blk_holder_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
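The call chain described above (holder op -> super block lookup -> optional
->shutdown) can be modelled in user space roughly as follows.  This is a sketch
under stated assumptions: the structs are cut-down stand-ins, model_get_super
replaces the real get_super()/drop_super() pair with a plain pointer lookup,
and the shut_down field and the demo_* names are hypothetical:

```c
#include <stddef.h>

struct super_block;

struct super_operations {
	void (*shutdown)(struct super_block *sb);	/* optional */
};

struct super_block {
	const struct super_operations *s_op;
	int shut_down;		/* hypothetical flag, only for the demo */
};

struct block_device {
	struct super_block *bd_super;
};

/* Stand-in for get_super(); the kernel version also takes a reference and
 * locks the super block, which drop_super() later undoes. */
struct super_block *model_get_super(struct block_device *bdev)
{
	return bdev->bd_super;
}

/* Mirrors the shape of fs_mark_dead: no super block or no ->shutdown method
 * means the event is silently ignored. */
void model_fs_mark_dead(struct block_device *bdev)
{
	struct super_block *sb = model_get_super(bdev);

	if (!sb)
		return;
	if (sb->s_op->shutdown)
		sb->s_op->shutdown(sb);
}

void demo_shutdown(struct super_block *sb)
{
	sb->shut_down = 1;
}

const struct super_operations demo_sops_with_shutdown = {
	.shutdown	= demo_shutdown,
};

const struct super_operations demo_sops_plain = { NULL };

/* Returns 1 if only the file system that implements ->shutdown reacted. */
int model_shutdown_demo(void)
{
	struct super_block with_hook = { .s_op = &demo_sops_with_shutdown };
	struct super_block without_hook = { .s_op = &demo_sops_plain };
	struct block_device a = { .bd_super = &with_hook };
	struct block_device b = { .bd_super = &without_hook };
	struct block_device c = { .bd_super = NULL };

	model_fs_mark_dead(&a);
	model_fs_mark_dead(&b);
	model_fs_mark_dead(&c);
	return with_hook.shut_down == 1 && without_hook.shut_down == 0;
}
```

File systems that do not implement ->shutdown keep their current behaviour,
so the method can be wired up one file system at a time.
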
 fs/super.c         | 21 +++++++++++++++++++--
 include/linux/fs.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 012ce140080375..f127589700ab25 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1206,6 +1206,22 @@ int get_tree_keyed(struct fs_context *fc,
 EXPORT_SYMBOL(get_tree_keyed);
 
 #ifdef CONFIG_BLOCK
+static void fs_mark_dead(struct block_device *bdev)
+{
+	struct super_block *sb;
+
+	sb = get_super(bdev);
+	if (!sb)
+		return;
+
+	if (sb->s_op->shutdown)
+		sb->s_op->shutdown(sb);
+	drop_super(sb);
+}
+
+static const struct blk_holder_ops fs_holder_ops = {
+	.mark_dead		= fs_mark_dead,
+};
 
 static int set_bdev_super(struct super_block *s, void *data)
 {
@@ -1248,7 +1264,8 @@ int get_tree_bdev(struct fs_context *fc,
 	if (!fc->source)
 		return invalf(fc, "No source specified");
 
-	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
+	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type,
+				  &fs_holder_ops);
 	if (IS_ERR(bdev)) {
 		errorf(fc, "%s: Can't open blockdev", fc->source);
 		return PTR_ERR(bdev);
@@ -1333,7 +1350,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
+	bdev = blkdev_get_by_path(dev_name, mode, fs_type, &fs_holder_ops);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 08ba2ae1d3ce97..7b2053649820cc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1932,6 +1932,7 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*shutdown)(struct super_block *sb);
 };
 
 /*
-- 
2.39.2


* [PATCH 12/16] xfs: wire up sops->shutdown
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 11/16] fs: add a method to shut down the file system Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 13/16] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead.  Add a new message to
clearly distinguish this shutdown reason from other shutdowns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c | 3 +++
 fs/xfs/xfs_mount.h | 4 +++-
 fs/xfs/xfs_super.c | 8 ++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 13851c0d640bc8..9ebb8333a30800 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -534,6 +534,9 @@ xfs_do_force_shutdown(
 	} else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
 		tag = XFS_PTAG_SHUTDOWN_CORRUPT;
 		why = "Corruption of on-disk metadata";
+	} else if (flags & SHUTDOWN_DEVICE_REMOVED) {
+		tag = XFS_PTAG_SHUTDOWN_IOERROR;
+		why = "Block device removal";
 	} else {
 		tag = XFS_PTAG_SHUTDOWN_IOERROR;
 		why = "Metadata I/O Error";
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index aaaf5ec13492d2..429a5e12c1036e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -457,12 +457,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
 #define SHUTDOWN_FORCE_UMOUNT	(1u << 2) /* shutdown from a forced unmount */
 #define SHUTDOWN_CORRUPT_INCORE	(1u << 3) /* corrupt in-memory structures */
 #define SHUTDOWN_CORRUPT_ONDISK	(1u << 4)  /* corrupt metadata on device */
+#define SHUTDOWN_DEVICE_REMOVED	(1u << 5) /* device removed underneath us */
 
 #define XFS_SHUTDOWN_STRINGS \
 	{ SHUTDOWN_META_IO_ERROR,	"metadata_io" }, \
 	{ SHUTDOWN_LOG_IO_ERROR,	"log_io" }, \
 	{ SHUTDOWN_FORCE_UMOUNT,	"force_umount" }, \
-	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }
+	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }, \
+	{ SHUTDOWN_DEVICE_REMOVED,	"device_removed" }
 
 /*
  * Flags for xfs_mountfs
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5684c538eb76dc..eb469b8f9a0497 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1159,6 +1159,13 @@ xfs_fs_free_cached_objects(
 	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
+static void
+xfs_fs_shutdown(
+	struct super_block	*sb)
+{
+	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
+}
+
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
@@ -1172,6 +1179,7 @@ static const struct super_operations xfs_super_operations = {
 	.show_options		= xfs_fs_show_options,
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
+	.shutdown		= xfs_fs_shutdown,
 };
 
 static int
-- 
2.39.2
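
The reason selection touched by the xfs_fsops.c hunk can be sketched in userspace roughly as follows. This is an illustration only, not kernel code: mock_shutdown_reason() is a hypothetical helper, the two flag values are copied from the xfs_mount.h hunk, and only the three branches visible in the diff context are reproduced (the earlier branches of the real if/else chain are not shown in this patch and are left out).

```c
/* Userspace sketch; not the XFS implementation. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as they appear in the xfs_mount.h hunk of this patch. */
#define SHUTDOWN_CORRUPT_ONDISK	(1u << 4)	/* corrupt metadata on device */
#define SHUTDOWN_DEVICE_REMOVED	(1u << 5)	/* device removed underneath us */

/*
 * Hypothetical helper mirroring the tail of the if/else chain in
 * xfs_do_force_shutdown(): the first matching flag decides the message,
 * and anything unmatched falls back to the metadata I/O error text.
 */
static const char *mock_shutdown_reason(uint32_t flags)
{
	if (flags & SHUTDOWN_CORRUPT_ONDISK)
		return "Corruption of on-disk metadata";
	if (flags & SHUTDOWN_DEVICE_REMOVED)
		return "Block device removal";
	return "Metadata I/O Error";
}
```

Because the chain is ordered, a caller passing both flags would get the corruption message; device removal only wins when no earlier branch matches.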


* [PATCH 13/16] xfs: wire up the ->mark_dead holder operation for log and RT devices
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 12/16] xfs: wire up sops->shutdown Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01  9:44 ` [PATCH 14/16] ext4: split ext4_shutdown Christoph Hellwig
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Implement a set of holder_ops that shut down the file system when the
block device used as log or RT device is removed underneath the file
system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eb469b8f9a0497..1b4bd5c88f4a11 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -377,6 +377,17 @@ xfs_setup_dax_always(
 	return 0;
 }
 
+static void
+xfs_bdev_mark_dead(
+	struct block_device	*bdev)
+{
+	xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
+}
+
+static const struct blk_holder_ops xfs_holder_ops = {
+	.mark_dead		= xfs_bdev_mark_dead,
+};
+
 STATIC int
 xfs_blkdev_get(
 	xfs_mount_t		*mp,
@@ -386,7 +397,7 @@ xfs_blkdev_get(
 	int			error = 0;
 
 	*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				    mp, NULL);
+				    mp, &xfs_holder_ops);
 	if (IS_ERR(*bdevp)) {
 		error = PTR_ERR(*bdevp);
 		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
-- 
2.39.2
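
The holder-ops pattern used here can be illustrated with a small userspace mock. This is a sketch under stated assumptions, not the block layer code: every mock_* name is invented for the example, and the real struct block_device and struct blk_holder_ops carry more state than the two fields modelled below. The only things taken from the patch are the shape of the callback (it receives the block device and recovers its owner from the holder cookie) and the fact that the ops pointer is supplied when the device is opened.

```c
/* Userspace mock of the holder-ops idea; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_bdev;

/* Stand-in for struct blk_holder_ops: callbacks supplied by the holder. */
struct mock_holder_ops {
	void (*mark_dead)(struct mock_bdev *bdev);
};

/* Stand-in for struct block_device, reduced to the two holder fields. */
struct mock_bdev {
	void *holder;					/* opaque owner cookie */
	const struct mock_holder_ops *holder_ops;	/* may be NULL */
};

/* Stand-in for the file system's per-mount state. */
struct mock_fs {
	bool shut_down;
};

/* Holder callback: recover the mount from the cookie and shut it down. */
static void mock_fs_mark_dead(struct mock_bdev *bdev)
{
	struct mock_fs *fs = bdev->holder;

	fs->shut_down = true;
}

static const struct mock_holder_ops mock_fs_holder_ops = {
	.mark_dead = mock_fs_mark_dead,
};

/* Open-time claim: record who holds the device and how to notify them. */
static void mock_claim(struct mock_bdev *bdev, void *holder,
		       const struct mock_holder_ops *ops)
{
	bdev->holder = holder;
	bdev->holder_ops = ops;
}

/* Removal path: notify the holder only if it registered a callback. */
static void mock_device_removed(struct mock_bdev *bdev)
{
	if (bdev->holder_ops && bdev->holder_ops->mark_dead)
		bdev->holder_ops->mark_dead(bdev);
}
```

A holder that passes NULL ops, as the callers did before this series, is simply never notified, which is why the mock checks both pointers before dispatching.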


* [PATCH 14/16] ext4: split ext4_shutdown
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 13/16] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01 10:10   ` Jan Kara
  2023-06-01  9:44 ` [PATCH 15/16] ext4: wire up sops->shutdown Christoph Hellwig
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Split ext4_shutdown into a low-level helper that will be reused for
implementing the shutdown super operation and a wrapper for the ioctl
handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/ext4.h  |  1 +
 fs/ext4/ioctl.c | 24 +++++++++++++++---------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 6948d673bba2e8..2d60bbe8d171d9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2965,6 +2965,7 @@ int ext4_fileattr_set(struct mnt_idmap *idmap,
 int ext4_fileattr_get(struct dentry *dentry, struct fileattr *fa);
 extern void ext4_reset_inode_seed(struct inode *inode);
 int ext4_update_overhead(struct super_block *sb, bool force);
+int ext4_force_shutdown(struct super_block *sb, u32 flags);
 
 /* migrate.c */
 extern int ext4_ext_migrate(struct inode *);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a43015206323..961284cc9b65cc 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -793,16 +793,9 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
 }
 #endif
 
-static int ext4_shutdown(struct super_block *sb, unsigned long arg)
+int ext4_force_shutdown(struct super_block *sb, u32 flags)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	__u32 flags;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	if (get_user(flags, (__u32 __user *)arg))
-		return -EFAULT;
 
 	if (flags > EXT4_GOING_FLAGS_NOLOGFLUSH)
 		return -EINVAL;
@@ -838,6 +831,19 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
 	return 0;
 }
 
+static int ext4_ioctl_shutdown(struct super_block *sb, unsigned long arg)
+{
+	u32 flags;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (get_user(flags, (__u32 __user *)arg))
+		return -EFAULT;
+
+	return ext4_force_shutdown(sb, flags);
+}
+
 struct getfsmap_info {
 	struct super_block	*gi_sb;
 	struct fsmap_head __user *gi_data;
@@ -1566,7 +1572,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return ext4_ioctl_get_es_cache(filp, arg);
 
 	case EXT4_IOC_SHUTDOWN:
-		return ext4_shutdown(sb, arg);
+		return ext4_ioctl_shutdown(sb, arg);
 
 	case FS_IOC_ENABLE_VERITY:
 		if (!ext4_has_feature_verity(sb))
-- 
2.39.2
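
The split can be pictured with a userspace mock: a low-level helper that only validates the flags and acts, an ioctl-style wrapper that performs the caller checks first, and an in-kernel style caller that skips those checks. This is a sketch only. All mock_* names are hypothetical, the privilege check and the user pointer are reduced to a bool and a plain pointer, and the upper bound of 2 is an assumed stand-in for EXT4_GOING_FLAGS_NOLOGFLUSH.

```c
/* Userspace mock of the helper/wrapper split; not ext4 code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for EXT4_GOING_FLAGS_NOLOGFLUSH. */
#define MOCK_GOING_FLAGS_MAX	2u

/* Stand-in for the super block, reduced to a single state bit. */
struct mock_sb {
	bool shut_down;
};

/* Low-level helper: validates the flags and performs the shutdown. */
static int mock_force_shutdown(struct mock_sb *sb, uint32_t flags)
{
	if (flags > MOCK_GOING_FLAGS_MAX)
		return -EINVAL;
	sb->shut_down = true;
	return 0;
}

/*
 * ioctl-style wrapper: the permission check and the fetch of the
 * argument stay here, so internal callers do not have to pass them.
 */
static int mock_ioctl_shutdown(struct mock_sb *sb, bool privileged,
			       const uint32_t *user_arg)
{
	uint32_t flags;

	if (!privileged)
		return -EPERM;
	if (!user_arg)
		return -EFAULT;
	flags = *user_arg;

	return mock_force_shutdown(sb, flags);
}

/* Internal caller, e.g. a shutdown super operation: no caller checks. */
static void mock_sop_shutdown(struct mock_sb *sb)
{
	(void)mock_force_shutdown(sb, MOCK_GOING_FLAGS_MAX);
}
```

Keeping the flag range check in the helper means both entry points reject invalid values the same way, while only the ioctl path pays for the capability and copy-in checks.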


* [PATCH 15/16] ext4: wire up sops->shutdown
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 14/16] ext4: split ext4_shutdown Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01 10:10   ` Jan Kara
  2023-06-01  9:44 ` [PATCH 16/16] ext4: wire up the ->mark_dead holder operation for log devices Christoph Hellwig
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/super.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 865625089ecca3..a177a16c4d2fe5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1450,6 +1450,11 @@ static void ext4_destroy_inode(struct inode *inode)
 			 EXT4_I(inode)->i_reserved_data_blocks);
 }
 
+static void ext4_shutdown(struct super_block *sb)
+{
+       ext4_force_shutdown(sb, EXT4_GOING_FLAGS_NOLOGFLUSH);
+}
+
 static void init_once(void *foo)
 {
 	struct ext4_inode_info *ei = foo;
@@ -1610,6 +1615,7 @@ static const struct super_operations ext4_sops = {
 	.unfreeze_fs	= ext4_unfreeze,
 	.statfs		= ext4_statfs,
 	.show_options	= ext4_show_options,
+	.shutdown	= ext4_shutdown,
 #ifdef CONFIG_QUOTA
 	.quota_read	= ext4_quota_read,
 	.quota_write	= ext4_quota_write,
-- 
2.39.2


* [PATCH 16/16] ext4: wire up the ->mark_dead holder operation for log devices
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 15/16] ext4: wire up sops->shutdown Christoph Hellwig
@ 2023-06-01  9:44 ` Christoph Hellwig
  2023-06-01 10:11   ` Jan Kara
  2023-06-01 21:48 ` introduce bdev holder ops and a file system shutdown method v3 Dave Chinner
  2023-06-05 17:22 ` Jens Axboe
  17 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2023-06-01  9:44 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Implement a set of holder_ops that shut down the file system when the
block device used as log device is removed underneath the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/super.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a177a16c4d2fe5..9070ea9154d727 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1096,6 +1096,15 @@ void ext4_update_dynamic_rev(struct super_block *sb)
 	 */
 }
 
+static void ext4_bdev_mark_dead(struct block_device *bdev)
+{
+	ext4_force_shutdown(bdev->bd_holder, EXT4_GOING_FLAGS_NOLOGFLUSH);
+}
+
+static const struct blk_holder_ops ext4_holder_ops = {
+	.mark_dead		= ext4_bdev_mark_dead,
+};
+
 /*
  * Open the external journal device
  */
@@ -1104,7 +1113,7 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
 	struct block_device *bdev;
 
 	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
-				 NULL);
+				 &ext4_holder_ops);
 	if (IS_ERR(bdev))
 		goto fail;
 	return bdev;
-- 
2.39.2


* Re: [PATCH 14/16] ext4: split ext4_shutdown
  2023-06-01  9:44 ` [PATCH 14/16] ext4: split ext4_shutdown Christoph Hellwig
@ 2023-06-01 10:10   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2023-06-01 10:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 01-06-23 11:44:57, Christoph Hellwig wrote:
> Split ext4_shutdown into a low-level helper that will be reused for
> implementing the shutdown super operation and a wrapper for the ioctl
> handling.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/ext4.h  |  1 +
>  fs/ext4/ioctl.c | 24 +++++++++++++++---------
>  2 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 6948d673bba2e8..2d60bbe8d171d9 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2965,6 +2965,7 @@ int ext4_fileattr_set(struct mnt_idmap *idmap,
>  int ext4_fileattr_get(struct dentry *dentry, struct fileattr *fa);
>  extern void ext4_reset_inode_seed(struct inode *inode);
>  int ext4_update_overhead(struct super_block *sb, bool force);
> +int ext4_force_shutdown(struct super_block *sb, u32 flags);
>  
>  /* migrate.c */
>  extern int ext4_ext_migrate(struct inode *);
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index f9a43015206323..961284cc9b65cc 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -793,16 +793,9 @@ static int ext4_ioctl_setproject(struct inode *inode, __u32 projid)
>  }
>  #endif
>  
> -static int ext4_shutdown(struct super_block *sb, unsigned long arg)
> +int ext4_force_shutdown(struct super_block *sb, u32 flags)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> -	__u32 flags;
> -
> -	if (!capable(CAP_SYS_ADMIN))
> -		return -EPERM;
> -
> -	if (get_user(flags, (__u32 __user *)arg))
> -		return -EFAULT;
>  
>  	if (flags > EXT4_GOING_FLAGS_NOLOGFLUSH)
>  		return -EINVAL;
> @@ -838,6 +831,19 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
>  	return 0;
>  }
>  
> +static int ext4_ioctl_shutdown(struct super_block *sb, unsigned long arg)
> +{
> +	u32 flags;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (get_user(flags, (__u32 __user *)arg))
> +		return -EFAULT;
> +
> +	return ext4_force_shutdown(sb, flags);
> +}
> +
>  struct getfsmap_info {
>  	struct super_block	*gi_sb;
>  	struct fsmap_head __user *gi_data;
> @@ -1566,7 +1572,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  		return ext4_ioctl_get_es_cache(filp, arg);
>  
>  	case EXT4_IOC_SHUTDOWN:
> -		return ext4_shutdown(sb, arg);
> +		return ext4_ioctl_shutdown(sb, arg);
>  
>  	case FS_IOC_ENABLE_VERITY:
>  		if (!ext4_has_feature_verity(sb))
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 15/16] ext4: wire up sops->shutdown
  2023-06-01  9:44 ` [PATCH 15/16] ext4: wire up sops->shutdown Christoph Hellwig
@ 2023-06-01 10:10   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2023-06-01 10:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 01-06-23 11:44:58, Christoph Hellwig wrote:
> Wire up the shutdown method to shut down the file system when the
> underlying block device is marked dead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/super.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 865625089ecca3..a177a16c4d2fe5 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1450,6 +1450,11 @@ static void ext4_destroy_inode(struct inode *inode)
>  			 EXT4_I(inode)->i_reserved_data_blocks);
>  }
>  
> +static void ext4_shutdown(struct super_block *sb)
> +{
> +       ext4_force_shutdown(sb, EXT4_GOING_FLAGS_NOLOGFLUSH);
> +}
> +
>  static void init_once(void *foo)
>  {
>  	struct ext4_inode_info *ei = foo;
> @@ -1610,6 +1615,7 @@ static const struct super_operations ext4_sops = {
>  	.unfreeze_fs	= ext4_unfreeze,
>  	.statfs		= ext4_statfs,
>  	.show_options	= ext4_show_options,
> +	.shutdown	= ext4_shutdown,
>  #ifdef CONFIG_QUOTA
>  	.quota_read	= ext4_quota_read,
>  	.quota_write	= ext4_quota_write,
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH 16/16] ext4: wire up the ->mark_dead holder operation for log devices
  2023-06-01  9:44 ` [PATCH 16/16] ext4: wire up the ->mark_dead holder operation for log devices Christoph Hellwig
@ 2023-06-01 10:11   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2023-06-01 10:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 01-06-23 11:44:59, Christoph Hellwig wrote:
> Implement a set of holder_ops that shut down the file system when the
> block device used as log device is removed underneath the file system.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/super.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a177a16c4d2fe5..9070ea9154d727 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1096,6 +1096,15 @@ void ext4_update_dynamic_rev(struct super_block *sb)
>  	 */
>  }
>  
> +static void ext4_bdev_mark_dead(struct block_device *bdev)
> +{
> +	ext4_force_shutdown(bdev->bd_holder, EXT4_GOING_FLAGS_NOLOGFLUSH);
> +}
> +
> +static const struct blk_holder_ops ext4_holder_ops = {
> +	.mark_dead		= ext4_bdev_mark_dead,
> +};
> +
>  /*
>   * Open the external journal device
>   */
> @@ -1104,7 +1113,7 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
>  	struct block_device *bdev;
>  
>  	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
> -				 NULL);
> +				 &ext4_holder_ops);
>  	if (IS_ERR(bdev))
>  		goto fail;
>  	return bdev;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: introduce bdev holder ops and a file system shutdown method v3
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2023-06-01  9:44 ` [PATCH 16/16] ext4: wire up the ->mark_dead holder operation for log devices Christoph Hellwig
@ 2023-06-01 21:48 ` Dave Chinner
  2023-06-05 17:22 ` Jens Axboe
  17 siblings, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2023-06-01 21:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu, Jun 01, 2023 at 11:44:43AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series fixes the long standing problem that we never had a good way
> to communicate block device events to the user of the block device.
> 
> It fixes this by introducing a new set of holder ops registered at
> blkdev_get_by_* time for the exclusive holder, and then wire that up
> to a shutdown super operation to report the block device remove to the
> file systems.

Thanks for doing this, Christoph.

For the series:

Acked-by: Dave Chinner <dchinner@redhat.com>

For the XFS patches in the series:

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: introduce bdev holder ops and a file system shutdown method v3
  2023-06-01  9:44 introduce bdev holder ops and a file system shutdown method v3 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2023-06-01 21:48 ` introduce bdev holder ops and a file system shutdown method v3 Dave Chinner
@ 2023-06-05 17:22 ` Jens Axboe
  17 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2023-06-05 17:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs


On Thu, 01 Jun 2023 11:44:43 +0200, Christoph Hellwig wrote:
> this series fixes the long standing problem that we never had a good way
> to communicate block device events to the user of the block device.
> 
> It fixes this by introducing a new set of holder ops registered at
> blkdev_get_by_* time for the exclusive holder, and then wire that up
> to a shutdown super operation to report the block device remove to the
> file systems.
> 
> [...]

Applied, thanks!

[01/16] block: factor out a bd_end_claim helper from blkdev_put
        commit: 0783b1a7cbd9a02ddc35fe531b5966b674b304f0
[02/16] block: refactor bd_may_claim
        commit: ae5f855ead6b41422ca0c971ebda509c0414f8ec
[03/16] block: turn bdev_lock into a mutex
        commit: 74e6464a987b2572771ac19163e961777fd0252e
[04/16] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
        commit: 66fddc25fe182fd7d28b35f4173113f3eefc7fb5
[05/16] block: avoid repeated work in blk_mark_disk_dead
        commit: a4f75764d16bed317276b05a9fe2c179ef61680d
[06/16] block: unhash the inode earlier in delete_partition
        commit: 69f90b70bdb62e1a930239d33579e04884cd0b9a
[07/16] block: delete partitions later in del_gendisk
        commit: eec1be4c30df73238b936fa9f3653773a6f8b15c
[08/16] block: remove blk_drop_partitions
        commit: 00080f7fb7a599c26523037b202fb945f3141811
[09/16] block: introduce holder ops
        commit: 0718afd47f70cf46877c39c25d06b786e1a3f36c
[10/16] block: add a mark_dead holder operation
        commit: f55e017c642051ddc01d77a89ab18f5ee71d6276
[11/16] fs: add a method to shut down the file system
        commit: 87efb39075be6a288cd7f23858f15bd01c83028a
[12/16] xfs: wire up sops->shutdown
        commit: e7caa877e5ddac63886f4a8376cb3ffbd4dfe569
[13/16] xfs: wire up the ->mark_dead holder operation for log and RT devices
        commit: 8067ca1dcdfcc2a5e0a51bff3730ad3eef0623d6
[14/16] ext4: split ext4_shutdown
        commit: 97524b454bc562f4052751f0e635a61dad78f1b2
[15/16] ext4: wire up sops->shutdown
        commit: f5db130d4443ddf63b49e195782038ebaab0bec9
[16/16] ext4: wire up the ->mark_dead holder operation for log devices
        commit: dd2e31afba9e3a3107aa202726b6199c55075f59

Best regards,
-- 
Jens Axboe



