linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* introduce bdev holder ops and a file system shutdown method v2
@ 2023-05-18  4:23 Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Hi all,

this series fixes the long standing problem that we never had a good way
to communicate block device events to the user of the block device.

It fixes this by introducing a new set of holder ops registered at
blkdev_get_by_* time for the exclusive holder, and then wire that up
to a shutdown super operation to report the block device remove to the
file systems.

Changes since v1:
 - add a patch to refactor bd_may_claim
 - add a sanity check for mismatching holder ops in bd_may_claim
 - move partition removal later in del_gendisk so that partitions
   are still around for the shutdown notification
 - add SHUTDOWN_DEVICE_REMOVED to XFS_SHUTDOWN_STRINGS

Diffstat:
 block/bdev.c                        |  159 ++++++++++++++++++++----------------
 block/blk.h                         |    2 
 block/fops.c                        |    2 
 block/genhd.c                       |   78 +++++++++++++----
 block/ioctl.c                       |    3 
 block/partitions/core.c             |   31 +++----
 drivers/block/drbd/drbd_nl.c        |    3 
 drivers/block/loop.c                |    2 
 drivers/block/pktcdvd.c             |    5 -
 drivers/block/rnbd/rnbd-srv.c       |    2 
 drivers/block/xen-blkback/xenbus.c  |    2 
 drivers/block/zram/zram_drv.c       |    2 
 drivers/md/bcache/super.c           |    2 
 drivers/md/dm.c                     |    2 
 drivers/md/md.c                     |    2 
 drivers/mtd/devices/block2mtd.c     |    4 
 drivers/nvme/target/io-cmd-bdev.c   |    2 
 drivers/s390/block/dasd_genhd.c     |    2 
 drivers/target/target_core_iblock.c |    2 
 drivers/target/target_core_pscsi.c  |    3 
 fs/btrfs/dev-replace.c              |    2 
 fs/btrfs/volumes.c                  |    6 -
 fs/erofs/super.c                    |    2 
 fs/ext4/super.c                     |    3 
 fs/f2fs/super.c                     |    4 
 fs/jfs/jfs_logmgr.c                 |    2 
 fs/nfs/blocklayout/dev.c            |    5 -
 fs/nilfs2/super.c                   |    2 
 fs/ocfs2/cluster/heartbeat.c        |    2 
 fs/reiserfs/journal.c               |    5 -
 fs/super.c                          |   21 ++++
 fs/xfs/xfs_fsops.c                  |    3 
 fs/xfs/xfs_mount.h                  |    4 
 fs/xfs/xfs_super.c                  |   21 ++++
 include/linux/blk_types.h           |    2 
 include/linux/blkdev.h              |   12 ++
 include/linux/fs.h                  |    1 
 kernel/power/swap.c                 |    4 
 mm/swapfile.c                       |    3 
 39 files changed, 266 insertions(+), 148 deletions(-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Move all the logic to release an exclusive claim into a helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c | 63 +++++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef3237a..317bfd9cba40fa 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -589,6 +589,37 @@ void bd_abort_claiming(struct block_device *bdev, void *holder)
 }
 EXPORT_SYMBOL(bd_abort_claiming);
 
+static void bd_end_claim(struct block_device *bdev)
+{
+	struct block_device *whole = bdev_whole(bdev);
+	bool unblock = false;
+
+	/*
+	 * Release a claim on the device.  The holder fields are protected with
+	 * bdev_lock.  open_mutex is used to synchronize disk_holder unlinking.
+	 */
+	spin_lock(&bdev_lock);
+	WARN_ON_ONCE(--bdev->bd_holders < 0);
+	WARN_ON_ONCE(--whole->bd_holders < 0);
+	if (!bdev->bd_holders) {
+		bdev->bd_holder = NULL;
+		if (bdev->bd_write_holder)
+			unblock = true;
+	}
+	if (!whole->bd_holders)
+		whole->bd_holder = NULL;
+	spin_unlock(&bdev_lock);
+
+	/*
+	 * If this was the last claim, remove holder link and unblock evpoll if
+	 * it was a write holder.
+	 */
+	if (unblock) {
+		disk_unblock_events(bdev->bd_disk);
+		bdev->bd_write_holder = false;
+	}
+}
+
 static void blkdev_flush_mapping(struct block_device *bdev)
 {
 	WARN_ON_ONCE(bdev->bd_holders);
@@ -843,36 +874,8 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
 		sync_blockdev(bdev);
 
 	mutex_lock(&disk->open_mutex);
-	if (mode & FMODE_EXCL) {
-		struct block_device *whole = bdev_whole(bdev);
-		bool bdev_free;
-
-		/*
-		 * Release a claim on the device.  The holder fields
-		 * are protected with bdev_lock.  open_mutex is to
-		 * synchronize disk_holder unlinking.
-		 */
-		spin_lock(&bdev_lock);
-
-		WARN_ON_ONCE(--bdev->bd_holders < 0);
-		WARN_ON_ONCE(--whole->bd_holders < 0);
-
-		if ((bdev_free = !bdev->bd_holders))
-			bdev->bd_holder = NULL;
-		if (!whole->bd_holders)
-			whole->bd_holder = NULL;
-
-		spin_unlock(&bdev_lock);
-
-		/*
-		 * If this was the last claim, remove holder link and
-		 * unblock evpoll if it was a write holder.
-		 */
-		if (bdev_free && bdev->bd_write_holder) {
-			disk_unblock_events(disk);
-			bdev->bd_write_holder = false;
-		}
-	}
+	if (mode & FMODE_EXCL)
+		bd_end_claim(bdev);
 
 	/*
 	 * Trigger event checking and tell drivers to flush MEDIA_CHANGE
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/13] block: refactor bd_may_claim
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 11:41   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

The long if/else chain obsfucates the actual logic.  Tidy it up to be
more structured.  Also drop the whole argument, as it can be trivially
derived from bdev using bdev_whole, and having the bdev_whole in the
function makes it easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bdev.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 317bfd9cba40fa..080b5c83bfbc72 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -463,7 +463,6 @@ long nr_blockdev_pages(void)
 /**
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
- * @whole: whole block device containing @bdev, may equal @bdev
  * @holder: holder trying to claim @bdev
  *
  * Test whether @bdev can be claimed by @holder.
@@ -474,22 +473,27 @@ long nr_blockdev_pages(void)
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
-			 void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder)
 {
-	if (bdev->bd_holder == holder)
-		return true;	 /* already a holder */
-	else if (bdev->bd_holder != NULL)
-		return false; 	 /* held by someone else */
-	else if (whole == bdev)
-		return true;  	 /* is a whole device which isn't held */
-
-	else if (whole->bd_holder == bd_may_claim)
-		return true; 	 /* is a partition of a device that is being partitioned */
-	else if (whole->bd_holder != NULL)
-		return false;	 /* is a partition of a held device */
-	else
-		return true;	 /* is a partition of an un-held device */
+	struct block_device *whole = bdev_whole(bdev);
+
+	if (bdev->bd_holder) {
+		/*
+		 * The same holder can always re-claim.
+		 */
+		if (bdev->bd_holder == holder)
+			return true;
+		return false;
+	}
+
+	/*
+	 * If the whole devices holder is set to bd_may_claim, a partition on
+	 * the device is claimed, but not the whole device.
+	 */
+	if (whole != bdev &&
+	    whole->bd_holder && whole->bd_holder != bd_may_claim)
+		return false;
+	return true;
 }
 
 /**
@@ -513,7 +517,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 retry:
 	spin_lock(&bdev_lock);
 	/* if someone else claimed, fail */
-	if (!bd_may_claim(bdev, whole, holder)) {
+	if (!bd_may_claim(bdev, holder)) {
 		spin_unlock(&bdev_lock);
 		return -EBUSY;
 	}
@@ -559,7 +563,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	struct block_device *whole = bdev_whole(bdev);
 
 	spin_lock(&bdev_lock);
-	BUG_ON(!bd_may_claim(bdev, whole, holder));
+	BUG_ON(!bd_may_claim(bdev, holder));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
 	 * and bd_holder will be set to bd_may_claim before being set to holder
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/13] block: turn bdev_lock into a mutex
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

There is no reason for this lock to spin, and being able to sleep under
it will come in handy soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/bdev.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 080b5c83bfbc72..f5ffcac762e0cd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -308,7 +308,7 @@ EXPORT_SYMBOL(thaw_bdev);
  * pseudo-fs
  */
 
-static  __cacheline_aligned_in_smp DEFINE_SPINLOCK(bdev_lock);
+static  __cacheline_aligned_in_smp DEFINE_MUTEX(bdev_lock);
 static struct kmem_cache * bdev_cachep __read_mostly;
 
 static struct inode *bdev_alloc_inode(struct super_block *sb)
@@ -467,9 +467,6 @@ long nr_blockdev_pages(void)
  *
  * Test whether @bdev can be claimed by @holder.
  *
- * CONTEXT:
- * spin_lock(&bdev_lock).
- *
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
@@ -477,6 +474,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
+	lockdep_assert_held(&bdev_lock);
+
 	if (bdev->bd_holder) {
 		/*
 		 * The same holder can always re-claim.
@@ -515,10 +514,10 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 	if (WARN_ON_ONCE(!holder))
 		return -EINVAL;
 retry:
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	/* if someone else claimed, fail */
 	if (!bd_may_claim(bdev, holder)) {
-		spin_unlock(&bdev_lock);
+		mutex_unlock(&bdev_lock);
 		return -EBUSY;
 	}
 
@@ -528,7 +527,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 		DEFINE_WAIT(wait);
 
 		prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
-		spin_unlock(&bdev_lock);
+		mutex_unlock(&bdev_lock);
 		schedule();
 		finish_wait(wq, &wait);
 		goto retry;
@@ -536,7 +535,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 
 	/* yay, all mine */
 	whole->bd_claiming = holder;
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
@@ -562,7 +561,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	BUG_ON(!bd_may_claim(bdev, holder));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
@@ -573,7 +572,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	bdev->bd_holders++;
 	bdev->bd_holder = holder;
 	bd_clear_claiming(whole, holder);
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 }
 
 /**
@@ -587,9 +586,9 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
  */
 void bd_abort_claiming(struct block_device *bdev, void *holder)
 {
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	bd_clear_claiming(bdev_whole(bdev), holder);
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 }
 EXPORT_SYMBOL(bd_abort_claiming);
 
@@ -602,7 +601,7 @@ static void bd_end_claim(struct block_device *bdev)
 	 * Release a claim on the device.  The holder fields are protected with
 	 * bdev_lock.  open_mutex is used to synchronize disk_holder unlinking.
 	 */
-	spin_lock(&bdev_lock);
+	mutex_lock(&bdev_lock);
 	WARN_ON_ONCE(--bdev->bd_holders < 0);
 	WARN_ON_ONCE(--whole->bd_holders < 0);
 	if (!bdev->bd_holders) {
@@ -612,7 +611,7 @@ static void bd_end_claim(struct block_device *bdev)
 	}
 	if (!whole->bd_holders)
 		whole->bd_holder = NULL;
-	spin_unlock(&bdev_lock);
+	mutex_unlock(&bdev_lock);
 
 	/*
 	 * If this was the last claim, remove holder link and unblock evpoll if
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 11:58   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

blk_mark_disk_dead does very similar work a a section of del_gendisk:

 - set the GD_DEAD flag
 - set the capacity to zero
 - start a queue drain

but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
the disk, sets the capacity to zero before starting the drain, and both
with sending a uevent and kernel message for this fake capacity change.

Move the exact logic from the more heavily used del_gendisk into
blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/genhd.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 1cb489b927d50a..d8fe40c7d1f0a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -572,13 +572,22 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void blk_mark_disk_dead(struct gendisk *disk)
 {
+	/*
+	 * Fail any new I/O.
+	 */
 	set_bit(GD_DEAD, &disk->state);
-	blk_queue_start_drain(disk->queue);
+	if (test_bit(GD_OWNS_QUEUE, &disk->state))
+		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
 	/*
 	 * Stop buffered writers from dirtying pages that can't be written out.
 	 */
-	set_capacity_and_notify(disk, 0);
+	set_capacity(disk, 0);
+
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_queue_start_drain(disk->queue);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
@@ -620,18 +629,7 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
-	/*
-	 * Fail any new I/O.
-	 */
-	set_bit(GD_DEAD, &disk->state);
-	if (test_bit(GD_OWNS_QUEUE, &disk->state))
-		blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-	set_capacity(disk, 0);
-
-	/*
-	 * Prevent new I/O from crossing bio_queue_enter().
-	 */
-	blk_queue_start_drain(q);
+	blk_mark_disk_dead(disk);
 
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Check if GD_DEAD is already set in blk_mark_disk_dead, and don't
duplicate the work already done.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/genhd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index d8fe40c7d1f0a2..a744daeed55318 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -575,7 +575,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
 	/*
 	 * Fail any new I/O.
 	 */
-	set_bit(GD_DEAD, &disk->state);
+	if (test_and_set_bit(GD_DEAD, &disk->state))
+		return;
+
 	if (test_bit(GD_OWNS_QUEUE, &disk->state))
 		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/13] block: unhash the inode earlier in delete_partition
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 12:09   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Move the call to remove_inode_hash to the beginning of delete_partition,
as we want to prevent opening a block_device that is about to be removed
ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/partitions/core.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 49e0496ff23c1e..fa5c707fe0ad2f 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -267,6 +267,12 @@ static void delete_partition(struct block_device *part)
 {
 	lockdep_assert_held(&part->bd_disk->open_mutex);
 
+	/*
+	 * Remove the block device from the inode hash, so that it cannot be
+	 * looked up any more even when openers still hold references.
+	 */
+	remove_inode_hash(part->bd_inode);
+
 	fsync_bdev(part);
 	__invalidate_device(part, true);
 
@@ -274,12 +280,6 @@ static void delete_partition(struct block_device *part)
 	kobject_put(part->bd_holder_dir);
 	device_del(&part->bd_device);
 
-	/*
-	 * Remove the block device from the inode hash, so that it cannot be
-	 * looked up any more even when openers still hold references.
-	 */
-	remove_inode_hash(part->bd_inode);
-
 	put_device(&part->bd_device);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/13] block: delete partitions later in del_gendisk
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 12:55   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Delay dropping the block_devices for partitions in del_gendisk until
after the call to blk_mark_disk_dead, so that we can implementat
notification of removed devices in blk_mark_disk_dead.

This requires splitting a lower-level drop_partition helper out of
delete_partition and using that from del_gendisk, while having a
common loop for the whole device and partitions that calls
remove_inode_hash, fsync_bdev and __invalidate_device before the
call to blk_mark_disk_dead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk.h             |  2 +-
 block/genhd.c           | 24 +++++++++++++++++++-----
 block/partitions/core.c | 19 ++++++++++++-------
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 45547bcf111938..4363052f90416a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -409,7 +409,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
 int bdev_del_partition(struct gendisk *disk, int partno);
 int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
 		sector_t length);
-void blk_drop_partitions(struct gendisk *disk);
+void drop_partition(struct block_device *part);
 
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
 
diff --git a/block/genhd.c b/block/genhd.c
index a744daeed55318..bd4c4eca31363e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -615,6 +615,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 void del_gendisk(struct gendisk *disk)
 {
 	struct request_queue *q = disk->queue;
+	struct block_device *part;
+	unsigned long idx;
 
 	might_sleep();
 
@@ -623,16 +625,28 @@ void del_gendisk(struct gendisk *disk)
 
 	disk_del_events(disk);
 
+	/*
+	 * Prevent new openers by unlinked the bdev inode, and write out
+	 * dirty data before marking the disk dead and stopping all I/O.
+	 */
 	mutex_lock(&disk->open_mutex);
-	remove_inode_hash(disk->part0->bd_inode);
-	blk_drop_partitions(disk);
+	xa_for_each(&disk->part_tbl, idx, part) {
+		remove_inode_hash(part->bd_inode);
+		fsync_bdev(part);
+		__invalidate_device(part, true);
+	}
 	mutex_unlock(&disk->open_mutex);
 
-	fsync_bdev(disk->part0);
-	__invalidate_device(disk->part0, true);
-
 	blk_mark_disk_dead(disk);
 
+	/*
+	 * Drop all partitions now that the disk is marked dead.
+	 */
+	mutex_lock(&disk->open_mutex);
+	xa_for_each_start(&disk->part_tbl, idx, part, 1)
+		drop_partition(part);
+	mutex_unlock(&disk->open_mutex);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index fa5c707fe0ad2f..31ac815d77a83c 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -263,10 +263,19 @@ struct device_type part_type = {
 	.uevent		= part_uevent,
 };
 
-static void delete_partition(struct block_device *part)
+void drop_partition(struct block_device *part)
 {
 	lockdep_assert_held(&part->bd_disk->open_mutex);
 
+	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
+	kobject_put(part->bd_holder_dir);
+
+	device_del(&part->bd_device);
+	put_device(&part->bd_device);
+}
+
+static void delete_partition(struct block_device *part)
+{
 	/*
 	 * Remove the block device from the inode hash, so that it cannot be
 	 * looked up any more even when openers still hold references.
@@ -276,11 +285,7 @@ static void delete_partition(struct block_device *part)
 	fsync_bdev(part);
 	__invalidate_device(part, true);
 
-	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
-	kobject_put(part->bd_holder_dir);
-	device_del(&part->bd_device);
-
-	put_device(&part->bd_device);
+	drop_partition(part);
 }
 
 static ssize_t whole_disk_show(struct device *dev,
@@ -519,7 +524,7 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
 	return true;
 }
 
-void blk_drop_partitions(struct gendisk *disk)
+static void blk_drop_partitions(struct gendisk *disk)
 {
 	struct block_device *part;
 	unsigned long idx;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/13] block: remove blk_drop_partitions
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 12:56   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

There is only a single caller left, so fold the loop into that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/partitions/core.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 31ac815d77a83c..2559bb830273eb 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -524,17 +524,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
 	return true;
 }
 
-static void blk_drop_partitions(struct gendisk *disk)
-{
-	struct block_device *part;
-	unsigned long idx;
-
-	lockdep_assert_held(&disk->open_mutex);
-
-	xa_for_each_start(&disk->part_tbl, idx, part, 1)
-		delete_partition(part);
-}
-
 static bool blk_add_partition(struct gendisk *disk,
 		struct parsed_partitions *state, int p)
 {
@@ -651,6 +640,8 @@ static int blk_add_partitions(struct gendisk *disk)
 
 int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 {
+	struct block_device *part;
+	unsigned long idx;
 	int ret = 0;
 
 	lockdep_assert_held(&disk->open_mutex);
@@ -663,8 +654,9 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 		return -EBUSY;
 	sync_blockdev(disk->part0);
 	invalidate_bdev(disk->part0);
-	blk_drop_partitions(disk);
 
+	xa_for_each_start(&disk->part_tbl, idx, part, 1)
+		delete_partition(part);
 	clear_bit(GD_NEED_PART_SCAN, &disk->state);
 
 	/*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/13] block: introduce holder ops
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 13:03   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims.  It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/bdev.c                        | 41 ++++++++++++++++++++---------
 block/fops.c                        |  2 +-
 block/genhd.c                       |  6 +++--
 block/ioctl.c                       |  3 ++-
 drivers/block/drbd/drbd_nl.c        |  3 ++-
 drivers/block/loop.c                |  2 +-
 drivers/block/pktcdvd.c             |  5 ++--
 drivers/block/rnbd/rnbd-srv.c       |  2 +-
 drivers/block/xen-blkback/xenbus.c  |  2 +-
 drivers/block/zram/zram_drv.c       |  2 +-
 drivers/md/bcache/super.c           |  2 +-
 drivers/md/dm.c                     |  2 +-
 drivers/md/md.c                     |  2 +-
 drivers/mtd/devices/block2mtd.c     |  4 +--
 drivers/nvme/target/io-cmd-bdev.c   |  2 +-
 drivers/s390/block/dasd_genhd.c     |  2 +-
 drivers/target/target_core_iblock.c |  2 +-
 drivers/target/target_core_pscsi.c  |  3 ++-
 fs/btrfs/dev-replace.c              |  2 +-
 fs/btrfs/volumes.c                  |  6 ++---
 fs/erofs/super.c                    |  2 +-
 fs/ext4/super.c                     |  3 ++-
 fs/f2fs/super.c                     |  4 +--
 fs/jfs/jfs_logmgr.c                 |  2 +-
 fs/nfs/blocklayout/dev.c            |  5 ++--
 fs/nilfs2/super.c                   |  2 +-
 fs/ocfs2/cluster/heartbeat.c        |  2 +-
 fs/reiserfs/journal.c               |  5 ++--
 fs/super.c                          |  4 +--
 fs/xfs/xfs_super.c                  |  2 +-
 include/linux/blk_types.h           |  2 ++
 include/linux/blkdev.h              | 11 +++++---
 kernel/power/swap.c                 |  4 +--
 mm/swapfile.c                       |  3 ++-
 34 files changed, 90 insertions(+), 56 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f5ffcac762e0cd..5c46ff10770638 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -102,7 +102,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
 	 * under live filesystem.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
+		int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
 		if (err)
 			goto invalidate;
 	}
@@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	bdev = I_BDEV(inode);
 	mutex_init(&bdev->bd_fsfreeze_mutex);
 	spin_lock_init(&bdev->bd_size_lock);
+	mutex_init(&bdev->bd_holder_lock);
 	bdev->bd_partno = partno;
 	bdev->bd_inode = inode;
 	bdev->bd_queue = disk->queue;
@@ -464,13 +465,15 @@ long nr_blockdev_pages(void)
  * bd_may_claim - test whether a block device can be claimed
  * @bdev: block device of interest
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops
  *
  * Test whether @bdev can be claimed by @holder.
  *
  * RETURNS:
  * %true if @bdev can be claimed, %false otherwise.
  */
-static bool bd_may_claim(struct block_device *bdev, void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
@@ -480,8 +483,11 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
 		/*
 		 * The same holder can always re-claim.
 		 */
-		if (bdev->bd_holder == holder)
+		if (bdev->bd_holder == holder) {
+			if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
+				return false;
 			return true;
+		}
 		return false;
 	}
 
@@ -499,6 +505,7 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
  * bd_prepare_to_claim - claim a block device
  * @bdev: block device of interest
  * @holder: holder trying to claim @bdev
+ * @hops: holder ops.
  *
  * Claim @bdev.  This function fails if @bdev is already claimed by another
  * holder and waits if another claiming is in progress. return, the caller
@@ -507,7 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
  * RETURNS:
  * 0 if @bdev can be claimed, -EBUSY otherwise.
  */
-int bd_prepare_to_claim(struct block_device *bdev, void *holder)
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
@@ -516,7 +524,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
 retry:
 	mutex_lock(&bdev_lock);
 	/* if someone else claimed, fail */
-	if (!bd_may_claim(bdev, holder)) {
+	if (!bd_may_claim(bdev, holder, hops)) {
 		mutex_unlock(&bdev_lock);
 		return -EBUSY;
 	}
@@ -557,12 +565,13 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
  * Finish exclusive open of a block device. Mark the device as exlusively
  * open by the holder and wake up all waiters for exclusive open to finish.
  */
-static void bd_finish_claiming(struct block_device *bdev, void *holder)
+static void bd_finish_claiming(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	struct block_device *whole = bdev_whole(bdev);
 
 	mutex_lock(&bdev_lock);
-	BUG_ON(!bd_may_claim(bdev, holder));
+	BUG_ON(!bd_may_claim(bdev, holder, hops));
 	/*
 	 * Note that for a whole device bd_holders will be incremented twice,
 	 * and bd_holder will be set to bd_may_claim before being set to holder
@@ -570,7 +579,10 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
 	whole->bd_holders++;
 	whole->bd_holder = bd_may_claim;
 	bdev->bd_holders++;
+	mutex_lock(&bdev->bd_holder_lock);
 	bdev->bd_holder = holder;
+	bdev->bd_holder_ops = hops;
+	mutex_unlock(&bdev->bd_holder_lock);
 	bd_clear_claiming(whole, holder);
 	mutex_unlock(&bdev_lock);
 }
@@ -605,7 +617,10 @@ static void bd_end_claim(struct block_device *bdev)
 	WARN_ON_ONCE(--bdev->bd_holders < 0);
 	WARN_ON_ONCE(--whole->bd_holders < 0);
 	if (!bdev->bd_holders) {
+		mutex_lock(&bdev->bd_holder_lock);
 		bdev->bd_holder = NULL;
+		bdev->bd_holder_ops = NULL;
+		mutex_unlock(&bdev->bd_holder_lock);
 		if (bdev->bd_write_holder)
 			unblock = true;
 	}
@@ -735,6 +750,7 @@ void blkdev_put_no_open(struct block_device *bdev)
  * @dev: device number of block device to open
  * @mode: FMODE_* mask
  * @holder: exclusive holder identifier
+ * @hops: holder operations
  *
  * Open the block device described by device number @dev. If @mode includes
  * %FMODE_EXCL, the block device is opened with exclusive access.  Specifying
@@ -751,7 +767,8 @@ void blkdev_put_no_open(struct block_device *bdev)
  * RETURNS:
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+		const struct blk_holder_ops *hops)
 {
 	bool unblock_events = true;
 	struct block_device *bdev;
@@ -771,7 +788,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 	disk = bdev->bd_disk;
 
 	if (mode & FMODE_EXCL) {
-		ret = bd_prepare_to_claim(bdev, holder);
+		ret = bd_prepare_to_claim(bdev, holder, hops);
 		if (ret)
 			goto put_blkdev;
 	}
@@ -791,7 +808,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
 	if (ret)
 		goto put_module;
 	if (mode & FMODE_EXCL) {
-		bd_finish_claiming(bdev, holder);
+		bd_finish_claiming(bdev, holder, hops);
 
 		/*
 		 * Block event polling for write claims if requested.  Any write
@@ -842,7 +859,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
  * Reference to the block_device on success, ERR_PTR(-errno) on failure.
  */
 struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-					void *holder)
+		void *holder, const struct blk_holder_ops *hops)
 {
 	struct block_device *bdev;
 	dev_t dev;
@@ -852,7 +869,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
 	if (error)
 		return ERR_PTR(error);
 
-	bdev = blkdev_get_by_dev(dev, mode, holder);
+	bdev = blkdev_get_by_dev(dev, mode, holder, hops);
 	if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
 		blkdev_put(bdev, mode);
 		return ERR_PTR(-EACCES);
diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c7d..2ac5ea878fa4cc 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -490,7 +490,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
 	if ((filp->f_flags & O_ACCMODE) == 3)
 		filp->f_mode |= FMODE_WRITE_IOCTL;
 
-	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
+	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
diff --git a/block/genhd.c b/block/genhd.c
index bd4c4eca31363e..226ddb8329f751 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -370,13 +370,15 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
 	 * scanners.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
+		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
+					  NULL);
 		if (ret)
 			return ret;
 	}
 
 	set_bit(GD_NEED_PART_SCAN, &disk->state);
-	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
+	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL,
+				 NULL);
 	if (IS_ERR(bdev))
 		ret =  PTR_ERR(bdev);
 	else
diff --git a/block/ioctl.c b/block/ioctl.c
index 9c5f637ff153f8..c7d7d4345edb4f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -454,7 +454,8 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
 	if (mode & FMODE_EXCL)
 		return set_blocksize(bdev, n);
 
-	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
+	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev,
+			NULL)))
 		return -EBUSY;
 	ret = set_blocksize(bdev, n);
 	blkdev_put(bdev, mode | FMODE_EXCL);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1a5d3d72d91d27..cab59dab3410aa 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1641,7 +1641,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
 	int err = 0;
 
 	bdev = blkdev_get_by_path(bdev_path,
-				  FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
+				  FMODE_READ | FMODE_WRITE | FMODE_EXCL,
+				  claim_ptr, NULL);
 	if (IS_ERR(bdev)) {
 		drbd_err(device, "open(\"%s\") failed with %ld\n",
 				bdev_path, PTR_ERR(bdev));
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2cb..a73c857f5bfed0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1015,7 +1015,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	 * here to avoid changing device under exclusive owner.
 	 */
 	if (!(mode & FMODE_EXCL)) {
-		error = bd_prepare_to_claim(bdev, loop_configure);
+		error = bd_prepare_to_claim(bdev, loop_configure, NULL);
 		if (error)
 			goto out_putf;
 	}
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index d5d7884cedd477..377f8b34535294 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2125,7 +2125,8 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
 	 * to read/write from/to it. It is already opened in O_NONBLOCK mode
 	 * so open should not fail.
 	 */
-	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
+	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd,
+				 NULL);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		goto out;
@@ -2530,7 +2531,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 		}
 	}
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 	sdev = scsi_device_from_queue(bdev->bd_disk->queue);
diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 2cfed2e58d646f..cec22bbae2f9a5 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -719,7 +719,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
 		goto reject;
 	}
 
-	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
+	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE, NULL);
 	if (IS_ERR(bdev)) {
 		ret = PTR_ERR(bdev);
 		pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4807af1d580593..43b36da9b3544d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 	vbd->pdevice  = MKDEV(major, minor);
 
 	bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
-				 FMODE_READ : FMODE_WRITE, NULL);
+				 FMODE_READ : FMODE_WRITE, NULL, NULL);
 
 	if (IS_ERR(bdev)) {
 		pr_warn("xen_vbd_create: device %08x could not be opened\n",
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f6d90f1ba5cf7b..ef9dc4ef6796da 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -508,7 +508,7 @@ static ssize_t backing_dev_store(struct device *dev,
 	}
 
 	bdev = blkdev_get_by_dev(inode->i_rdev,
-			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
+			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram, NULL);
 	if (IS_ERR(bdev)) {
 		err = PTR_ERR(bdev);
 		bdev = NULL;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7e9d19fd21ddd5..d84c09a73af803 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2560,7 +2560,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	err = "failed to open device";
 	bdev = blkdev_get_by_path(strim(path),
 				  FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				  sb);
+				  sb, NULL);
 	if (IS_ERR(bdev)) {
 		if (bdev == ERR_PTR(-EBUSY)) {
 			dev_t dev;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106e6..d759f8bdb3df2f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
 		return ERR_PTR(-ENOMEM);
 	refcount_set(&td->count, 1);
 
-	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr, NULL);
 	if (IS_ERR(bdev)) {
 		r = PTR_ERR(bdev);
 		goto out_free_td;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b34446f..60ab5c4bee77c5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3642,7 +3642,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
 
 	rdev->bdev = blkdev_get_by_dev(newdev,
 			FMODE_READ | FMODE_WRITE | FMODE_EXCL,
-			super_format == -2 ? &claim_rdev : rdev);
+			super_format == -2 ? &claim_rdev : rdev, NULL);
 	if (IS_ERR(rdev->bdev)) {
 		pr_warn("md: could not open device unknown-block(%u,%u).\n",
 			MAJOR(newdev), MINOR(newdev));
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 4cd37ec45762b6..7ac82c6fe35024 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -235,7 +235,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		return NULL;
 
 	/* Get a handle on the device */
-	bdev = blkdev_get_by_path(devname, mode, dev);
+	bdev = blkdev_get_by_path(devname, mode, dev, NULL);
 
 #ifndef MODULE
 	/*
@@ -257,7 +257,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		devt = name_to_dev_t(devname);
 		if (!devt)
 			continue;
-		bdev = blkdev_get_by_dev(devt, mode, dev);
+		bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
 	}
 #endif
 
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c2d6cea0236b0a..9b6d6d85c72544 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 		return -ENOTBLK;
 
 	ns->bdev = blkdev_get_by_path(ns->device_path,
-			FMODE_READ | FMODE_WRITE, NULL);
+			FMODE_READ | FMODE_WRITE, NULL, NULL);
 	if (IS_ERR(ns->bdev)) {
 		ret = PTR_ERR(ns->bdev);
 		if (ret != -ENOTBLK) {
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 998a961e170417..f21198bc483e1a 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -130,7 +130,7 @@ int dasd_scan_partitions(struct dasd_block *block)
 	struct block_device *bdev;
 	int rc;
 
-	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
+	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL, NULL);
 	if (IS_ERR(bdev)) {
 		DBF_DEV_EVENT(DBF_ERR, block->base,
 			      "scan partitions error, blkdev_get returned %ld",
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index cc838ffd129472..a5cbbefa78ee4e 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -114,7 +114,7 @@ static int iblock_configure_device(struct se_device *dev)
 	else
 		dev->dev_flags |= DF_READ_ONLY;
 
-	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
+	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
 	if (IS_ERR(bd)) {
 		ret = PTR_ERR(bd);
 		goto out_free_bioset;
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index e7425549e39c73..e3494e036c6c85 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -367,7 +367,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
 	 * for TYPE_DISK and TYPE_ZBC using supplied udev_path
 	 */
 	bd = blkdev_get_by_path(dev->udev_path,
-				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
+				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv,
+				NULL);
 	if (IS_ERR(bd)) {
 		pr_err("pSCSI: blkdev_get_by_path() failed\n");
 		scsi_device_put(sd);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 78696d331639bd..4de4984fa99ba3 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -258,7 +258,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	}
 
 	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
+				  fs_info->bdev_holder, NULL);
 	if (IS_ERR(bdev)) {
 		btrfs_err(fs_info, "target device %s is invalid!", device_path);
 		return PTR_ERR(bdev);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 841e799dece51b..784ccc8f6c69c1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -496,7 +496,7 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
 {
 	int ret;
 
-	*bdev = blkdev_get_by_path(device_path, flags, holder);
+	*bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
 
 	if (IS_ERR(*bdev)) {
 		ret = PTR_ERR(*bdev);
@@ -1377,7 +1377,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
 	 * values temporarily, as the device paths of the fsid are the only
 	 * required information for assembling the volume.
 	 */
-	bdev = blkdev_get_by_path(path, flags, holder);
+	bdev = blkdev_get_by_path(path, flags, holder, NULL);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
@@ -2629,7 +2629,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 		return -EROFS;
 
 	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
-				  fs_info->bdev_holder);
+				  fs_info->bdev_holder, NULL);
 	if (IS_ERR(bdev))
 		return PTR_ERR(bdev);
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 811ab66d805ede..6c263e9cd38b2a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -254,7 +254,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 		dif->fscache = fscache;
 	} else if (!sbi->devs->flatdev) {
 		bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
-					  sb->s_type);
+					  sb->s_type, NULL);
 		if (IS_ERR(bdev))
 			return PTR_ERR(bdev);
 		dif->bdev = bdev;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9680fe753e599a..865625089ecca3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1103,7 +1103,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
 {
 	struct block_device *bdev;
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
+				 NULL);
 	if (IS_ERR(bdev))
 		goto fail;
 	return bdev;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9f15b03037dba9..7c34ab082f1382 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4025,7 +4025,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
 			/* Single zoned block device mount */
 			FDEV(0).bdev =
 				blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
-					sbi->sb->s_mode, sbi->sb->s_type);
+					sbi->sb->s_mode, sbi->sb->s_type, NULL);
 		} else {
 			/* Multi-device mount */
 			memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
@@ -4044,7 +4044,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
 					sbi->log_blocks_per_seg) - 1;
 			}
 			FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
-					sbi->sb->s_mode, sbi->sb->s_type);
+					sbi->sb->s_mode, sbi->sb->s_type, NULL);
 		}
 		if (IS_ERR(FDEV(i).bdev))
 			return PTR_ERR(FDEV(i).bdev);
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 695415cbfe985b..8c55030c57ed52 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1101,7 +1101,7 @@ int lmLogOpen(struct super_block *sb)
 	 */
 
 	bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				 log);
+				 log, NULL);
 	if (IS_ERR(bdev)) {
 		rc = PTR_ERR(bdev);
 		goto free;
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index fea5f8821da5ef..38b066ca699ed7 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -243,7 +243,7 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
 	if (!dev)
 		return -EIO;
 
-	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
+	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL, NULL);
 	if (IS_ERR(bdev)) {
 		printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
 			MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
@@ -312,7 +312,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
 	if (!devname)
 		return ERR_PTR(-ENOMEM);
 
-	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
+	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL,
+				  NULL);
 	if (IS_ERR(bdev)) {
 		pr_warn("pNFS: failed to open device %s (%ld)\n",
 			devname, PTR_ERR(bdev));
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 77f1e5778d1c84..91bfbd973d1d53 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1285,7 +1285,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
 	if (IS_ERR(sd.bdev))
 		return ERR_CAST(sd.bdev);
 
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 60b97c92e2b25e..6b13b8c3f2b8af 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1786,7 +1786,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 		goto out2;
 
 	reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
-					 FMODE_WRITE | FMODE_READ, NULL);
+					 FMODE_WRITE | FMODE_READ, NULL, NULL);
 	if (IS_ERR(reg->hr_bdev)) {
 		ret = PTR_ERR(reg->hr_bdev);
 		reg->hr_bdev = NULL;
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 4d11d60f493c14..5e4db9a0c8e5a3 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2616,7 +2616,7 @@ static int journal_init_dev(struct super_block *super,
 		if (jdev == super->s_dev)
 			blkdev_mode &= ~FMODE_EXCL;
 		journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
-						      journal);
+						      journal, NULL);
 		journal->j_dev_mode = blkdev_mode;
 		if (IS_ERR(journal->j_dev_bd)) {
 			result = PTR_ERR(journal->j_dev_bd);
@@ -2632,7 +2632,8 @@ static int journal_init_dev(struct super_block *super,
 	}
 
 	journal->j_dev_mode = blkdev_mode;
-	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
+	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal,
+					       NULL);
 	if (IS_ERR(journal->j_dev_bd)) {
 		result = PTR_ERR(journal->j_dev_bd);
 		journal->j_dev_bd = NULL;
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2bc3..012ce140080375 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1248,7 +1248,7 @@ int get_tree_bdev(struct fs_context *fc,
 	if (!fc->source)
 		return invalf(fc, "No source specified");
 
-	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
+	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
 	if (IS_ERR(bdev)) {
 		errorf(fc, "%s: Can't open blockdev", fc->source);
 		return PTR_ERR(bdev);
@@ -1333,7 +1333,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+	bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7e706255f16502..5684c538eb76dc 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -386,7 +386,7 @@ xfs_blkdev_get(
 	int			error = 0;
 
 	*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				    mp);
+				    mp, NULL);
 	if (IS_ERR(*bdevp)) {
 		error = PTR_ERR(*bdevp);
 		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 740afe80f29786..84a931caef514e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -55,6 +55,8 @@ struct block_device {
 	struct super_block *	bd_super;
 	void *			bd_claiming;
 	void *			bd_holder;
+	const struct blk_holder_ops *bd_holder_ops;
+	struct mutex		bd_holder_lock;
 	/* The counter of freeze processes */
 	int			bd_fsfreeze_count;
 	int			bd_holders;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b441e633f4dd49..c94f3b63c86422 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1465,10 +1465,15 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
 #define BLKDEV_MAJOR_MAX	0
 #endif
 
+struct blk_holder_ops {
+};
+
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+		const struct blk_holder_ops *hops);
 struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
-		void *holder);
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
-int bd_prepare_to_claim(struct block_device *bdev, void *holder);
+		void *holder, const struct blk_holder_ops *hops);
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+		const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
 void blkdev_put(struct block_device *bdev, fmode_t mode);
 
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 92e41ed292ada8..801c411530d11c 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -357,7 +357,7 @@ static int swsusp_swap_check(void)
 	root_swap = res;
 
 	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
-			NULL);
+			NULL, NULL);
 	if (IS_ERR(hib_resume_bdev))
 		return PTR_ERR(hib_resume_bdev);
 
@@ -1524,7 +1524,7 @@ int swsusp_check(void)
 		mode |= FMODE_EXCL;
 
 	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
-					    mode, &holder);
+					    mode, &holder, NULL);
 	if (!IS_ERR(hib_resume_bdev)) {
 		set_blocksize(hib_resume_bdev, PAGE_SIZE);
 		clear_page(swsusp_header);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 274bbf79748006..cfbcf7d5705f5f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2770,7 +2770,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 
 	if (S_ISBLK(inode->i_mode)) {
 		p->bdev = blkdev_get_by_dev(inode->i_rdev,
-				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
+				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p,
+				   NULL);
 		if (IS_ERR(p->bdev)) {
 			error = PTR_ERR(p->bdev);
 			p->bdev = NULL;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/13] block: add a mark_dead holder operation
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-30 13:05   ` Jan Kara
  2023-05-18  4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
to notify the holder that the block device it is using has been marked dead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 block/genhd.c          | 24 ++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 226ddb8329f751..42aebf0e1e2628 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -565,6 +565,28 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 }
 EXPORT_SYMBOL(device_add_disk);
 
+static void blk_report_disk_dead(struct gendisk *disk)
+{
+	struct block_device *bdev;
+	unsigned long idx;
+
+	rcu_read_lock();
+	xa_for_each(&disk->part_tbl, idx, bdev) {
+		if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
+			continue;
+		rcu_read_unlock();
+
+		mutex_lock(&bdev->bd_holder_lock);
+		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
+			bdev->bd_holder_ops->mark_dead(bdev);
+		mutex_unlock(&bdev->bd_holder_lock);
+
+		put_device(&bdev->bd_device);
+		rcu_read_lock();
+	}
+	rcu_read_unlock();
+}
+
 /**
  * blk_mark_disk_dead - mark a disk as dead
  * @disk: disk to mark as dead
@@ -592,6 +614,8 @@ void blk_mark_disk_dead(struct gendisk *disk)
 	 * Prevent new I/O from crossing bio_queue_enter().
 	 */
 	blk_queue_start_drain(disk->queue);
+
+	blk_report_disk_dead(disk);
 }
 EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c94f3b63c86422..41f894f6355f96 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1466,6 +1466,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
 #endif
 
 struct blk_holder_ops {
+	void (*mark_dead)(struct block_device *bdev);
 };
 
 struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/13] fs: add a method to shut down the file system
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Add a new ->shutdown super operation that can be used to tell the file
system to shut down, and call it from newly created holder ops when the
block device under a file system shuts down.

This only covers the main block device for "simple" file systems using
get_tree_bdev / mount_bdev.  File systems their own get_tree method
or opening additional devices will need to set up their own
blk_holder_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/super.c         | 21 +++++++++++++++++++--
 include/linux/fs.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 012ce140080375..f127589700ab25 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1206,6 +1206,22 @@ int get_tree_keyed(struct fs_context *fc,
 EXPORT_SYMBOL(get_tree_keyed);
 
 #ifdef CONFIG_BLOCK
+static void fs_mark_dead(struct block_device *bdev)
+{
+	struct super_block *sb;
+
+	sb = get_super(bdev);
+	if (!sb)
+		return;
+
+	if (sb->s_op->shutdown)
+		sb->s_op->shutdown(sb);
+	drop_super(sb);
+}
+
+static const struct blk_holder_ops fs_holder_ops = {
+	.mark_dead		= fs_mark_dead,
+};
 
 static int set_bdev_super(struct super_block *s, void *data)
 {
@@ -1248,7 +1264,8 @@ int get_tree_bdev(struct fs_context *fc,
 	if (!fc->source)
 		return invalf(fc, "No source specified");
 
-	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
+	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type,
+				  &fs_holder_ops);
 	if (IS_ERR(bdev)) {
 		errorf(fc, "%s: Can't open blockdev", fc->source);
 		return PTR_ERR(bdev);
@@ -1333,7 +1350,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 	if (!(flags & SB_RDONLY))
 		mode |= FMODE_WRITE;
 
-	bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
+	bdev = blkdev_get_by_path(dev_name, mode, fs_type, &fs_holder_ops);
 	if (IS_ERR(bdev))
 		return ERR_CAST(bdev);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a98168085641..cf3042641b9b30 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1932,6 +1932,7 @@ struct super_operations {
 				  struct shrink_control *);
 	long (*free_cached_objects)(struct super_block *,
 				    struct shrink_control *);
+	void (*shutdown)(struct super_block *sb);
 };
 
 /*
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/13] xfs: wire up sops->shutdown
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  5:03   ` Darrick J. Wong
  2023-05-18  4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
  2023-05-19  2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead.  Add a new message to
clearly distinguish this shutdown reason from other shutdowns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_fsops.c | 3 +++
 fs/xfs/xfs_mount.h | 4 +++-
 fs/xfs/xfs_super.c | 8 ++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 13851c0d640bc8..9ebb8333a30800 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -534,6 +534,9 @@ xfs_do_force_shutdown(
 	} else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
 		tag = XFS_PTAG_SHUTDOWN_CORRUPT;
 		why = "Corruption of on-disk metadata";
+	} else if (flags & SHUTDOWN_DEVICE_REMOVED) {
+		tag = XFS_PTAG_SHUTDOWN_IOERROR;
+		why = "Block device removal";
 	} else {
 		tag = XFS_PTAG_SHUTDOWN_IOERROR;
 		why = "Metadata I/O Error";
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index aaaf5ec13492d2..429a5e12c1036e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -457,12 +457,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
 #define SHUTDOWN_FORCE_UMOUNT	(1u << 2) /* shutdown from a forced unmount */
 #define SHUTDOWN_CORRUPT_INCORE	(1u << 3) /* corrupt in-memory structures */
 #define SHUTDOWN_CORRUPT_ONDISK	(1u << 4)  /* corrupt metadata on device */
+#define SHUTDOWN_DEVICE_REMOVED	(1u << 5) /* device removed underneath us */
 
 #define XFS_SHUTDOWN_STRINGS \
 	{ SHUTDOWN_META_IO_ERROR,	"metadata_io" }, \
 	{ SHUTDOWN_LOG_IO_ERROR,	"log_io" }, \
 	{ SHUTDOWN_FORCE_UMOUNT,	"force_umount" }, \
-	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }
+	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }, \
+	{ SHUTDOWN_DEVICE_REMOVED,	"device_removed" }
 
 /*
  * Flags for xfs_mountfs
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5684c538eb76dc..eb469b8f9a0497 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1159,6 +1159,13 @@ xfs_fs_free_cached_objects(
 	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
 }
 
+static void
+xfs_fs_shutdown(
+	struct super_block	*sb)
+{
+	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
+}
+
 static const struct super_operations xfs_super_operations = {
 	.alloc_inode		= xfs_fs_alloc_inode,
 	.destroy_inode		= xfs_fs_destroy_inode,
@@ -1172,6 +1179,7 @@ static const struct super_operations xfs_super_operations = {
 	.show_options		= xfs_fs_show_options,
 	.nr_cached_objects	= xfs_fs_nr_cached_objects,
 	.free_cached_objects	= xfs_fs_free_cached_objects,
+	.shutdown		= xfs_fs_shutdown,
 };
 
 static int
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
@ 2023-05-18  4:23 ` Christoph Hellwig
  2023-05-18  5:40   ` Dave Chinner
  2023-05-19  2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
  13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18  4:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
	linux-block, linux-fsdevel, linux-xfs

Implement a set of holder_ops that shut down the file system when the
block device used as log or RT device is removed undeneath the file
system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eb469b8f9a0497..75d37bbc5415fc 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -377,6 +377,17 @@ xfs_setup_dax_always(
 	return 0;
 }
 
+static void
+xfs_hop_mark_dead(
+	struct block_device	*bdev)
+{
+	xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
+}
+
+static const struct blk_holder_ops xfs_holder_ops = {
+	.mark_dead		= xfs_hop_mark_dead,
+};
+
 STATIC int
 xfs_blkdev_get(
 	xfs_mount_t		*mp,
@@ -386,7 +397,7 @@ xfs_blkdev_get(
 	int			error = 0;
 
 	*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
-				    mp, NULL);
+				    mp, &xfs_holder_ops);
 	if (IS_ERR(*bdevp)) {
 		error = PTR_ERR(*bdevp);
 		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 12/13] xfs: wire up sops->shutdown
  2023-05-18  4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
@ 2023-05-18  5:03   ` Darrick J. Wong
  0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2023-05-18  5:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Jan Kara, linux-block,
	linux-fsdevel, linux-xfs

On Thu, May 18, 2023 at 06:23:21AM +0200, Christoph Hellwig wrote:
> Wire up the shutdown method to shut down the file system when the
> underlying block device is marked dead.  Add a new message to
> clearly distinguish this shutdown reason from other shutdowns.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_fsops.c | 3 +++
>  fs/xfs/xfs_mount.h | 4 +++-
>  fs/xfs/xfs_super.c | 8 ++++++++
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 13851c0d640bc8..9ebb8333a30800 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -534,6 +534,9 @@ xfs_do_force_shutdown(
>  	} else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
>  		tag = XFS_PTAG_SHUTDOWN_CORRUPT;
>  		why = "Corruption of on-disk metadata";
> +	} else if (flags & SHUTDOWN_DEVICE_REMOVED) {
> +		tag = XFS_PTAG_SHUTDOWN_IOERROR;
> +		why = "Block device removal";
>  	} else {
>  		tag = XFS_PTAG_SHUTDOWN_IOERROR;
>  		why = "Metadata I/O Error";
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index aaaf5ec13492d2..429a5e12c1036e 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -457,12 +457,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
>  #define SHUTDOWN_FORCE_UMOUNT	(1u << 2) /* shutdown from a forced unmount */
>  #define SHUTDOWN_CORRUPT_INCORE	(1u << 3) /* corrupt in-memory structures */
>  #define SHUTDOWN_CORRUPT_ONDISK	(1u << 4)  /* corrupt metadata on device */
> +#define SHUTDOWN_DEVICE_REMOVED	(1u << 5) /* device removed underneath us */
>  
>  #define XFS_SHUTDOWN_STRINGS \
>  	{ SHUTDOWN_META_IO_ERROR,	"metadata_io" }, \
>  	{ SHUTDOWN_LOG_IO_ERROR,	"log_io" }, \
>  	{ SHUTDOWN_FORCE_UMOUNT,	"force_umount" }, \
> -	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }
> +	{ SHUTDOWN_CORRUPT_INCORE,	"corruption" }, \
> +	{ SHUTDOWN_DEVICE_REMOVED,	"device_removed" }
>  
>  /*
>   * Flags for xfs_mountfs
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 5684c538eb76dc..eb469b8f9a0497 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1159,6 +1159,13 @@ xfs_fs_free_cached_objects(
>  	return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
>  }
>  
> +static void
> +xfs_fs_shutdown(
> +	struct super_block	*sb)
> +{
> +	xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
> +}
> +
>  static const struct super_operations xfs_super_operations = {
>  	.alloc_inode		= xfs_fs_alloc_inode,
>  	.destroy_inode		= xfs_fs_destroy_inode,
> @@ -1172,6 +1179,7 @@ static const struct super_operations xfs_super_operations = {
>  	.show_options		= xfs_fs_show_options,
>  	.nr_cached_objects	= xfs_fs_nr_cached_objects,
>  	.free_cached_objects	= xfs_fs_free_cached_objects,
> +	.shutdown		= xfs_fs_shutdown,
>  };
>  
>  static int
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices
  2023-05-18  4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
@ 2023-05-18  5:40   ` Dave Chinner
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2023-05-18  5:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu, May 18, 2023 at 06:23:22AM +0200, Christoph Hellwig wrote:
> Implement a set of holder_ops that shut down the file system when the
> block device used as log or RT device is removed undeneath the file
> system.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_super.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index eb469b8f9a0497..75d37bbc5415fc 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -377,6 +377,17 @@ xfs_setup_dax_always(
>  	return 0;
>  }
>  
> +static void
> +xfs_hop_mark_dead(
> +	struct block_device	*bdev)

I'd prefer these ops to be named "xfs_bdev_...." to indicate the are
fs bdev methods similar to how the super ops use "xfs_fs_...."
to indicate they are fs superblock methods....

Other that this this is fine.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: introduce bdev holder ops and a file system shutdown method v2
  2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2023-05-18  4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
@ 2023-05-19  2:00 ` Theodore Ts'o
  2023-05-19  4:11   ` Christoph Hellwig
  13 siblings, 1 reply; 28+ messages in thread
From: Theodore Ts'o @ 2023-05-19  2:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series fixes the long standing problem that we never had a good way
> to communicate block device events to the user of the block device.
> 
> It fixes this by introducing a new set of holder ops registered at
> blkdev_get_by_* time for the exclusive holder, and then wire that up
> to a shutdown super operation to report the block device remove to the
> file systems.

Thanks for working on this!  Is there going to be an fstest which
simulates a device removal while we're running fsstress or some such,
so we can exercise full device removal path?

Thanks,

					- Ted

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: introduce bdev holder ops and a file system shutdown method v2
  2023-05-19  2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
@ 2023-05-19  4:11   ` Christoph Hellwig
  2023-05-23  0:58     ` Darrick J. Wong
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-19  4:11 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Jens Axboe, Al Viro, Christian Brauner,
	Darrick J. Wong, Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu, May 18, 2023 at 10:00:12PM -0400, Theodore Ts'o wrote:
> On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> > Hi all,
> > 
> > this series fixes the long standing problem that we never had a good way
> > to communicate block device events to the user of the block device.
> > 
> > It fixes this by introducing a new set of holder ops registered at
> > blkdev_get_by_* time for the exclusive holder, and then wire that up
> > to a shutdown super operation to report the block device remove to the
> > file systems.
> 
> Thanks for working on this!  Is there going to be an fstest which
> simulates a device removal while we're running fsstress or some such,
> so we can exercise full device removal path?


So the problem with xfstests is that there isn't really any generic
way to remove a block device, and even less so to put it back.

xfstests has some scsi_debug based tests, maybe I can cook something up
for that.  My testing has been with nvme, so another option would be
to add nvme-loop support to xfstests and use that.  I'll see what I can
do.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: introduce bdev holder ops and a file system shutdown method v2
  2023-05-19  4:11   ` Christoph Hellwig
@ 2023-05-23  0:58     ` Darrick J. Wong
  0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2023-05-23  0:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Ts'o, Jens Axboe, Al Viro, Christian Brauner,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Fri, May 19, 2023 at 06:11:36AM +0200, Christoph Hellwig wrote:
> On Thu, May 18, 2023 at 10:00:12PM -0400, Theodore Ts'o wrote:
> > On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> > > Hi all,
> > > 
> > > this series fixes the long standing problem that we never had a good way
> > > to communicate block device events to the user of the block device.
> > > 
> > > It fixes this by introducing a new set of holder ops registered at
> > > blkdev_get_by_* time for the exclusive holder, and then wire that up
> > > to a shutdown super operation to report the block device remove to the
> > > file systems.
> > 
> > Thanks for working on this!  Is there going to be an fstest which
> > simulates a device removal while we're running fsstress or some such,
> > so we can exercise full device removal path?
> 
> 
> So the problem with xfstests is that there isn't really any generic
> way to remove a block device, and even less so to put it back.
> 
> xfstests has some scsi_debug based tests, maybe I can cook something up
> for that.  My testing has been with nvme, so another option would be
> to add nvme-loop support to xfstests and use that.  I'll see what I can
> do.

Could you make dm-error accept a 'message' telling it to invoke all
these bdev removal actions?  There's already a bunch of helpers in
fstests to make that less awful for test authors.

--D

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] block: refactor bd_may_claim
  2023-05-18  4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
@ 2023-05-30 11:41   ` Jan Kara
  2023-06-01  8:11     ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2023-05-30 11:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:11, Christoph Hellwig wrote:
> The long if/else chain obsfucates the actual logic.  Tidy it up to be
> more structured.  Also drop the whole argument, as it can be trivially
> derived from bdev using bdev_whole, and having the bdev_whole in the
> function makes it easier to follow.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me. Just one nit below but regardless of how you decided feel
free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

> +static bool bd_may_claim(struct block_device *bdev, void *holder)
>  {
> -	if (bdev->bd_holder == holder)
> -		return true;	 /* already a holder */
> -	else if (bdev->bd_holder != NULL)
> -		return false; 	 /* held by someone else */
> -	else if (whole == bdev)
> -		return true;  	 /* is a whole device which isn't held */
> -
> -	else if (whole->bd_holder == bd_may_claim)
> -		return true; 	 /* is a partition of a device that is being partitioned */
> -	else if (whole->bd_holder != NULL)
> -		return false;	 /* is a partition of a held device */
> -	else
> -		return true;	 /* is a partition of an un-held device */
> +	struct block_device *whole = bdev_whole(bdev);
> +
> +	if (bdev->bd_holder) {
> +		/*
> +		 * The same holder can always re-claim.
> +		 */
> +		if (bdev->bd_holder == holder)
> +			return true;
> +		return false;

With this simple condition I'd just do:
		/* The same holder can always re-claim. */
		return bdev->bd_holder == holder;

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
  2023-05-18  4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
@ 2023-05-30 11:58   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 11:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:13, Christoph Hellwig wrote:
> blk_mark_disk_dead does very similar work a a section of del_gendisk:
> 
>  - set the GD_DEAD flag
>  - set the capacity to zero
>  - start a queue drain
> 
> but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
> the disk, sets the capacity to zero before starting the drain, and both
> with sending a uevent and kernel message for this fake capacity change.
> 
> Move the exact logic from the more heavily used del_gendisk into
> blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me. I've convinced myself the removed notification of device
capacity going to 0 before KOBJ_REMOVE event should not matter to userspace
;). Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/genhd.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 1cb489b927d50a..d8fe40c7d1f0a2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -572,13 +572,22 @@ EXPORT_SYMBOL(device_add_disk);
>   */
>  void blk_mark_disk_dead(struct gendisk *disk)
>  {
> +	/*
> +	 * Fail any new I/O.
> +	 */
>  	set_bit(GD_DEAD, &disk->state);
> -	blk_queue_start_drain(disk->queue);
> +	if (test_bit(GD_OWNS_QUEUE, &disk->state))
> +		blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>  
>  	/*
>  	 * Stop buffered writers from dirtying pages that can't be written out.
>  	 */
> -	set_capacity_and_notify(disk, 0);
> +	set_capacity(disk, 0);
> +
> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_queue_start_drain(disk->queue);
>  }
>  EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>  
> @@ -620,18 +629,7 @@ void del_gendisk(struct gendisk *disk)
>  	fsync_bdev(disk->part0);
>  	__invalidate_device(disk->part0, true);
>  
> -	/*
> -	 * Fail any new I/O.
> -	 */
> -	set_bit(GD_DEAD, &disk->state);
> -	if (test_bit(GD_OWNS_QUEUE, &disk->state))
> -		blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> -	set_capacity(disk, 0);
> -
> -	/*
> -	 * Prevent new I/O from crossing bio_queue_enter().
> -	 */
> -	blk_queue_start_drain(q);
> +	blk_mark_disk_dead(disk);
>  
>  	if (!(disk->flags & GENHD_FL_HIDDEN)) {
>  		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 06/13] block: unhash the inode earlier in delete_partition
  2023-05-18  4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
@ 2023-05-30 12:09   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:15, Christoph Hellwig wrote:
> Move the call to remove_inode_hash to the beginning of delete_partition,
> as we want to prevent opening a block_device that is about to be removed
> ASAP.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

The justification looks a bit bogus because we hold disk->open_mutex in
delete_partition() which serializes with any opens anyway. But it's a
harmless code move so if it helps later then sure... Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/partitions/core.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 49e0496ff23c1e..fa5c707fe0ad2f 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -267,6 +267,12 @@ static void delete_partition(struct block_device *part)
>  {
>  	lockdep_assert_held(&part->bd_disk->open_mutex);
>  
> +	/*
> +	 * Remove the block device from the inode hash, so that it cannot be
> +	 * looked up any more even when openers still hold references.
> +	 */
> +	remove_inode_hash(part->bd_inode);
> +
>  	fsync_bdev(part);
>  	__invalidate_device(part, true);
>  
> @@ -274,12 +280,6 @@ static void delete_partition(struct block_device *part)
>  	kobject_put(part->bd_holder_dir);
>  	device_del(&part->bd_device);
>  
> -	/*
> -	 * Remove the block device from the inode hash, so that it cannot be
> -	 * looked up any more even when openers still hold references.
> -	 */
> -	remove_inode_hash(part->bd_inode);
> -
>  	put_device(&part->bd_device);
>  }
>  
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 07/13] block: delete partitions later in del_gendisk
  2023-05-18  4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
@ 2023-05-30 12:55   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:16, Christoph Hellwig wrote:
> Delay dropping the block_devices for partitions in del_gendisk until
> after the call to blk_mark_disk_dead, so that we can implementat
> notification of removed devices in blk_mark_disk_dead.
> 
> This requires splitting a lower-level drop_partition helper out of
> delete_partition and using that from del_gendisk, while having a
> common loop for the whole device and partitions that calls
> remove_inode_hash, fsync_bdev and __invalidate_device before the
> call to blk_mark_disk_dead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/blk.h             |  2 +-
>  block/genhd.c           | 24 +++++++++++++++++++-----
>  block/partitions/core.c | 19 ++++++++++++-------
>  3 files changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk.h b/block/blk.h
> index 45547bcf111938..4363052f90416a 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -409,7 +409,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
>  int bdev_del_partition(struct gendisk *disk, int partno);
>  int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
>  		sector_t length);
> -void blk_drop_partitions(struct gendisk *disk);
> +void drop_partition(struct block_device *part);
>  
>  void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
>  
> diff --git a/block/genhd.c b/block/genhd.c
> index a744daeed55318..bd4c4eca31363e 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -615,6 +615,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>  void del_gendisk(struct gendisk *disk)
>  {
>  	struct request_queue *q = disk->queue;
> +	struct block_device *part;
> +	unsigned long idx;
>  
>  	might_sleep();
>  
> @@ -623,16 +625,28 @@ void del_gendisk(struct gendisk *disk)
>  
>  	disk_del_events(disk);
>  
> +	/*
> +	 * Prevent new openers by unlinked the bdev inode, and write out
> +	 * dirty data before marking the disk dead and stopping all I/O.
> +	 */
>  	mutex_lock(&disk->open_mutex);
> -	remove_inode_hash(disk->part0->bd_inode);
> -	blk_drop_partitions(disk);
> +	xa_for_each(&disk->part_tbl, idx, part) {
> +		remove_inode_hash(part->bd_inode);
> +		fsync_bdev(part);
> +		__invalidate_device(part, true);
> +	}
>  	mutex_unlock(&disk->open_mutex);
>  
> -	fsync_bdev(disk->part0);
> -	__invalidate_device(disk->part0, true);
> -
>  	blk_mark_disk_dead(disk);
>  
> +	/*
> +	 * Drop all partitions now that the disk is marked dead.
> +	 */
> +	mutex_lock(&disk->open_mutex);
> +	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> +		drop_partition(part);
> +	mutex_unlock(&disk->open_mutex);
> +
>  	if (!(disk->flags & GENHD_FL_HIDDEN)) {
>  		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
>  
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index fa5c707fe0ad2f..31ac815d77a83c 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -263,10 +263,19 @@ struct device_type part_type = {
>  	.uevent		= part_uevent,
>  };
>  
> -static void delete_partition(struct block_device *part)
> +void drop_partition(struct block_device *part)
>  {
>  	lockdep_assert_held(&part->bd_disk->open_mutex);
>  
> +	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
> +	kobject_put(part->bd_holder_dir);
> +
> +	device_del(&part->bd_device);
> +	put_device(&part->bd_device);
> +}
> +
> +static void delete_partition(struct block_device *part)
> +{
>  	/*
>  	 * Remove the block device from the inode hash, so that it cannot be
>  	 * looked up any more even when openers still hold references.
> @@ -276,11 +285,7 @@ static void delete_partition(struct block_device *part)
>  	fsync_bdev(part);
>  	__invalidate_device(part, true);
>  
> -	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
> -	kobject_put(part->bd_holder_dir);
> -	device_del(&part->bd_device);
> -
> -	put_device(&part->bd_device);
> +	drop_partition(part);
>  }
>  
>  static ssize_t whole_disk_show(struct device *dev,
> @@ -519,7 +524,7 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
>  	return true;
>  }
>  
> -void blk_drop_partitions(struct gendisk *disk)
> +static void blk_drop_partitions(struct gendisk *disk)
>  {
>  	struct block_device *part;
>  	unsigned long idx;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 08/13] block: remove blk_drop_partitions
  2023-05-18  4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
@ 2023-05-30 12:56   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:17, Christoph Hellwig wrote:
> There is only a single caller left, so fold the loop into that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Sure. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/partitions/core.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 31ac815d77a83c..2559bb830273eb 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -524,17 +524,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
>  	return true;
>  }
>  
> -static void blk_drop_partitions(struct gendisk *disk)
> -{
> -	struct block_device *part;
> -	unsigned long idx;
> -
> -	lockdep_assert_held(&disk->open_mutex);
> -
> -	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> -		delete_partition(part);
> -}
> -
>  static bool blk_add_partition(struct gendisk *disk,
>  		struct parsed_partitions *state, int p)
>  {
> @@ -651,6 +640,8 @@ static int blk_add_partitions(struct gendisk *disk)
>  
>  int bdev_disk_changed(struct gendisk *disk, bool invalidate)
>  {
> +	struct block_device *part;
> +	unsigned long idx;
>  	int ret = 0;
>  
>  	lockdep_assert_held(&disk->open_mutex);
> @@ -663,8 +654,9 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
>  		return -EBUSY;
>  	sync_blockdev(disk->part0);
>  	invalidate_bdev(disk->part0);
> -	blk_drop_partitions(disk);
>  
> +	xa_for_each_start(&disk->part_tbl, idx, part, 1)
> +		delete_partition(part);
>  	clear_bit(GD_NEED_PART_SCAN, &disk->state);
>  
>  	/*
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/13] block: introduce holder ops
  2023-05-18  4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
@ 2023-05-30 13:03   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 13:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:18, Christoph Hellwig wrote:
> Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
> installed in the block_device for exclusive claims.  It will be used to
> allow the block layer to call back into the user of the block device for
> thing like notification of a removed device or a device resize.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c                        | 41 ++++++++++++++++++++---------
>  block/fops.c                        |  2 +-
>  block/genhd.c                       |  6 +++--
>  block/ioctl.c                       |  3 ++-
>  drivers/block/drbd/drbd_nl.c        |  3 ++-
>  drivers/block/loop.c                |  2 +-
>  drivers/block/pktcdvd.c             |  5 ++--
>  drivers/block/rnbd/rnbd-srv.c       |  2 +-
>  drivers/block/xen-blkback/xenbus.c  |  2 +-
>  drivers/block/zram/zram_drv.c       |  2 +-
>  drivers/md/bcache/super.c           |  2 +-
>  drivers/md/dm.c                     |  2 +-
>  drivers/md/md.c                     |  2 +-
>  drivers/mtd/devices/block2mtd.c     |  4 +--
>  drivers/nvme/target/io-cmd-bdev.c   |  2 +-
>  drivers/s390/block/dasd_genhd.c     |  2 +-
>  drivers/target/target_core_iblock.c |  2 +-
>  drivers/target/target_core_pscsi.c  |  3 ++-
>  fs/btrfs/dev-replace.c              |  2 +-
>  fs/btrfs/volumes.c                  |  6 ++---
>  fs/erofs/super.c                    |  2 +-
>  fs/ext4/super.c                     |  3 ++-
>  fs/f2fs/super.c                     |  4 +--
>  fs/jfs/jfs_logmgr.c                 |  2 +-
>  fs/nfs/blocklayout/dev.c            |  5 ++--
>  fs/nilfs2/super.c                   |  2 +-
>  fs/ocfs2/cluster/heartbeat.c        |  2 +-
>  fs/reiserfs/journal.c               |  5 ++--
>  fs/super.c                          |  4 +--
>  fs/xfs/xfs_super.c                  |  2 +-
>  include/linux/blk_types.h           |  2 ++
>  include/linux/blkdev.h              | 11 +++++---
>  kernel/power/swap.c                 |  4 +--
>  mm/swapfile.c                       |  3 ++-
>  34 files changed, 90 insertions(+), 56 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index f5ffcac762e0cd..5c46ff10770638 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -102,7 +102,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
>  	 * under live filesystem.
>  	 */
>  	if (!(mode & FMODE_EXCL)) {
> -		int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
> +		int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
>  		if (err)
>  			goto invalidate;
>  	}
> @@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>  	bdev = I_BDEV(inode);
>  	mutex_init(&bdev->bd_fsfreeze_mutex);
>  	spin_lock_init(&bdev->bd_size_lock);
> +	mutex_init(&bdev->bd_holder_lock);
>  	bdev->bd_partno = partno;
>  	bdev->bd_inode = inode;
>  	bdev->bd_queue = disk->queue;
> @@ -464,13 +465,15 @@ long nr_blockdev_pages(void)
>   * bd_may_claim - test whether a block device can be claimed
>   * @bdev: block device of interest
>   * @holder: holder trying to claim @bdev
> + * @hops: holder ops
>   *
>   * Test whether @bdev can be claimed by @holder.
>   *
>   * RETURNS:
>   * %true if @bdev can be claimed, %false otherwise.
>   */
> -static bool bd_may_claim(struct block_device *bdev, void *holder)
> +static bool bd_may_claim(struct block_device *bdev, void *holder,
> +		const struct blk_holder_ops *hops)
>  {
>  	struct block_device *whole = bdev_whole(bdev);
>  
> @@ -480,8 +483,11 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
>  		/*
>  		 * The same holder can always re-claim.
>  		 */
> -		if (bdev->bd_holder == holder)
> +		if (bdev->bd_holder == holder) {
> +			if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
> +				return false;
>  			return true;
> +		}
>  		return false;
>  	}
>  
> @@ -499,6 +505,7 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
>   * bd_prepare_to_claim - claim a block device
>   * @bdev: block device of interest
>   * @holder: holder trying to claim @bdev
> + * @hops: holder ops.
>   *
>   * Claim @bdev.  This function fails if @bdev is already claimed by another
>   * holder and waits if another claiming is in progress. return, the caller
> @@ -507,7 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
>   * RETURNS:
>   * 0 if @bdev can be claimed, -EBUSY otherwise.
>   */
> -int bd_prepare_to_claim(struct block_device *bdev, void *holder)
> +int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> +		const struct blk_holder_ops *hops)
>  {
>  	struct block_device *whole = bdev_whole(bdev);
>  
> @@ -516,7 +524,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
>  retry:
>  	mutex_lock(&bdev_lock);
>  	/* if someone else claimed, fail */
> -	if (!bd_may_claim(bdev, holder)) {
> +	if (!bd_may_claim(bdev, holder, hops)) {
>  		mutex_unlock(&bdev_lock);
>  		return -EBUSY;
>  	}
> @@ -557,12 +565,13 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
>   * Finish exclusive open of a block device. Mark the device as exlusively
>   * open by the holder and wake up all waiters for exclusive open to finish.
>   */
> -static void bd_finish_claiming(struct block_device *bdev, void *holder)
> +static void bd_finish_claiming(struct block_device *bdev, void *holder,
> +		const struct blk_holder_ops *hops)
>  {
>  	struct block_device *whole = bdev_whole(bdev);
>  
>  	mutex_lock(&bdev_lock);
> -	BUG_ON(!bd_may_claim(bdev, holder));
> +	BUG_ON(!bd_may_claim(bdev, holder, hops));
>  	/*
>  	 * Note that for a whole device bd_holders will be incremented twice,
>  	 * and bd_holder will be set to bd_may_claim before being set to holder
> @@ -570,7 +579,10 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
>  	whole->bd_holders++;
>  	whole->bd_holder = bd_may_claim;
>  	bdev->bd_holders++;
> +	mutex_lock(&bdev->bd_holder_lock);
>  	bdev->bd_holder = holder;
> +	bdev->bd_holder_ops = hops;
> +	mutex_unlock(&bdev->bd_holder_lock);
>  	bd_clear_claiming(whole, holder);
>  	mutex_unlock(&bdev_lock);
>  }
> @@ -605,7 +617,10 @@ static void bd_end_claim(struct block_device *bdev)
>  	WARN_ON_ONCE(--bdev->bd_holders < 0);
>  	WARN_ON_ONCE(--whole->bd_holders < 0);
>  	if (!bdev->bd_holders) {
> +		mutex_lock(&bdev->bd_holder_lock);
>  		bdev->bd_holder = NULL;
> +		bdev->bd_holder_ops = NULL;
> +		mutex_unlock(&bdev->bd_holder_lock);
>  		if (bdev->bd_write_holder)
>  			unblock = true;
>  	}
> @@ -735,6 +750,7 @@ void blkdev_put_no_open(struct block_device *bdev)
>   * @dev: device number of block device to open
>   * @mode: FMODE_* mask
>   * @holder: exclusive holder identifier
> + * @hops: holder operations
>   *
>   * Open the block device described by device number @dev. If @mode includes
>   * %FMODE_EXCL, the block device is opened with exclusive access.  Specifying
> @@ -751,7 +767,8 @@ void blkdev_put_no_open(struct block_device *bdev)
>   * RETURNS:
>   * Reference to the block_device on success, ERR_PTR(-errno) on failure.
>   */
> -struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
> +struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> +		const struct blk_holder_ops *hops)
>  {
>  	bool unblock_events = true;
>  	struct block_device *bdev;
> @@ -771,7 +788,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
>  	disk = bdev->bd_disk;
>  
>  	if (mode & FMODE_EXCL) {
> -		ret = bd_prepare_to_claim(bdev, holder);
> +		ret = bd_prepare_to_claim(bdev, holder, hops);
>  		if (ret)
>  			goto put_blkdev;
>  	}
> @@ -791,7 +808,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
>  	if (ret)
>  		goto put_module;
>  	if (mode & FMODE_EXCL) {
> -		bd_finish_claiming(bdev, holder);
> +		bd_finish_claiming(bdev, holder, hops);
>  
>  		/*
>  		 * Block event polling for write claims if requested.  Any write
> @@ -842,7 +859,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
>   * Reference to the block_device on success, ERR_PTR(-errno) on failure.
>   */
>  struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
> -					void *holder)
> +		void *holder, const struct blk_holder_ops *hops)
>  {
>  	struct block_device *bdev;
>  	dev_t dev;
> @@ -852,7 +869,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
>  	if (error)
>  		return ERR_PTR(error);
>  
> -	bdev = blkdev_get_by_dev(dev, mode, holder);
> +	bdev = blkdev_get_by_dev(dev, mode, holder, hops);
>  	if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
>  		blkdev_put(bdev, mode);
>  		return ERR_PTR(-EACCES);
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c7d..2ac5ea878fa4cc 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -490,7 +490,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
>  	if ((filp->f_flags & O_ACCMODE) == 3)
>  		filp->f_mode |= FMODE_WRITE_IOCTL;
>  
> -	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
> +	bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp, NULL);
>  	if (IS_ERR(bdev))
>  		return PTR_ERR(bdev);
>  
> diff --git a/block/genhd.c b/block/genhd.c
> index bd4c4eca31363e..226ddb8329f751 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -370,13 +370,15 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
>  	 * scanners.
>  	 */
>  	if (!(mode & FMODE_EXCL)) {
> -		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
> +		ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
> +					  NULL);
>  		if (ret)
>  			return ret;
>  	}
>  
>  	set_bit(GD_NEED_PART_SCAN, &disk->state);
> -	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
> +	bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL,
> +				 NULL);
>  	if (IS_ERR(bdev))
>  		ret =  PTR_ERR(bdev);
>  	else
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 9c5f637ff153f8..c7d7d4345edb4f 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -454,7 +454,8 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
>  	if (mode & FMODE_EXCL)
>  		return set_blocksize(bdev, n);
>  
> -	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
> +	if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev,
> +			NULL)))
>  		return -EBUSY;
>  	ret = set_blocksize(bdev, n);
>  	blkdev_put(bdev, mode | FMODE_EXCL);
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index 1a5d3d72d91d27..cab59dab3410aa 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -1641,7 +1641,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
>  	int err = 0;
>  
>  	bdev = blkdev_get_by_path(bdev_path,
> -				  FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
> +				  FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> +				  claim_ptr, NULL);
>  	if (IS_ERR(bdev)) {
>  		drbd_err(device, "open(\"%s\") failed with %ld\n",
>  				bdev_path, PTR_ERR(bdev));
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bc31bb7072a2cb..a73c857f5bfed0 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1015,7 +1015,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
>  	 * here to avoid changing device under exclusive owner.
>  	 */
>  	if (!(mode & FMODE_EXCL)) {
> -		error = bd_prepare_to_claim(bdev, loop_configure);
> +		error = bd_prepare_to_claim(bdev, loop_configure, NULL);
>  		if (error)
>  			goto out_putf;
>  	}
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index d5d7884cedd477..377f8b34535294 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -2125,7 +2125,8 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
>  	 * to read/write from/to it. It is already opened in O_NONBLOCK mode
>  	 * so open should not fail.
>  	 */
> -	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
> +	bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd,
> +				 NULL);
>  	if (IS_ERR(bdev)) {
>  		ret = PTR_ERR(bdev);
>  		goto out;
> @@ -2530,7 +2531,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
>  		}
>  	}
>  
> -	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
> +	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL, NULL);
>  	if (IS_ERR(bdev))
>  		return PTR_ERR(bdev);
>  	sdev = scsi_device_from_queue(bdev->bd_disk->queue);
> diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> index 2cfed2e58d646f..cec22bbae2f9a5 100644
> --- a/drivers/block/rnbd/rnbd-srv.c
> +++ b/drivers/block/rnbd/rnbd-srv.c
> @@ -719,7 +719,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
>  		goto reject;
>  	}
>  
> -	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
> +	bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE, NULL);
>  	if (IS_ERR(bdev)) {
>  		ret = PTR_ERR(bdev);
>  		pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 4807af1d580593..43b36da9b3544d 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
>  	vbd->pdevice  = MKDEV(major, minor);
>  
>  	bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
> -				 FMODE_READ : FMODE_WRITE, NULL);
> +				 FMODE_READ : FMODE_WRITE, NULL, NULL);
>  
>  	if (IS_ERR(bdev)) {
>  		pr_warn("xen_vbd_create: device %08x could not be opened\n",
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f6d90f1ba5cf7b..ef9dc4ef6796da 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -508,7 +508,7 @@ static ssize_t backing_dev_store(struct device *dev,
>  	}
>  
>  	bdev = blkdev_get_by_dev(inode->i_rdev,
> -			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
> +			FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram, NULL);
>  	if (IS_ERR(bdev)) {
>  		err = PTR_ERR(bdev);
>  		bdev = NULL;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 7e9d19fd21ddd5..d84c09a73af803 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2560,7 +2560,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>  	err = "failed to open device";
>  	bdev = blkdev_get_by_path(strim(path),
>  				  FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> -				  sb);
> +				  sb, NULL);
>  	if (IS_ERR(bdev)) {
>  		if (bdev == ERR_PTR(-EBUSY)) {
>  			dev_t dev;
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3b694ba3a106e6..d759f8bdb3df2f 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
>  		return ERR_PTR(-ENOMEM);
>  	refcount_set(&td->count, 1);
>  
> -	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
> +	bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr, NULL);
>  	if (IS_ERR(bdev)) {
>  		r = PTR_ERR(bdev);
>  		goto out_free_td;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8e344b4b34446f..60ab5c4bee77c5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -3642,7 +3642,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
>  
>  	rdev->bdev = blkdev_get_by_dev(newdev,
>  			FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> -			super_format == -2 ? &claim_rdev : rdev);
> +			super_format == -2 ? &claim_rdev : rdev, NULL);
>  	if (IS_ERR(rdev->bdev)) {
>  		pr_warn("md: could not open device unknown-block(%u,%u).\n",
>  			MAJOR(newdev), MINOR(newdev));
> diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
> index 4cd37ec45762b6..7ac82c6fe35024 100644
> --- a/drivers/mtd/devices/block2mtd.c
> +++ b/drivers/mtd/devices/block2mtd.c
> @@ -235,7 +235,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
>  		return NULL;
>  
>  	/* Get a handle on the device */
> -	bdev = blkdev_get_by_path(devname, mode, dev);
> +	bdev = blkdev_get_by_path(devname, mode, dev, NULL);
>  
>  #ifndef MODULE
>  	/*
> @@ -257,7 +257,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
>  		devt = name_to_dev_t(devname);
>  		if (!devt)
>  			continue;
> -		bdev = blkdev_get_by_dev(devt, mode, dev);
> +		bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
>  	}
>  #endif
>  
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index c2d6cea0236b0a..9b6d6d85c72544 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>  		return -ENOTBLK;
>  
>  	ns->bdev = blkdev_get_by_path(ns->device_path,
> -			FMODE_READ | FMODE_WRITE, NULL);
> +			FMODE_READ | FMODE_WRITE, NULL, NULL);
>  	if (IS_ERR(ns->bdev)) {
>  		ret = PTR_ERR(ns->bdev);
>  		if (ret != -ENOTBLK) {
> diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
> index 998a961e170417..f21198bc483e1a 100644
> --- a/drivers/s390/block/dasd_genhd.c
> +++ b/drivers/s390/block/dasd_genhd.c
> @@ -130,7 +130,7 @@ int dasd_scan_partitions(struct dasd_block *block)
>  	struct block_device *bdev;
>  	int rc;
>  
> -	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
> +	bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL, NULL);
>  	if (IS_ERR(bdev)) {
>  		DBF_DEV_EVENT(DBF_ERR, block->base,
>  			      "scan partitions error, blkdev_get returned %ld",
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index cc838ffd129472..a5cbbefa78ee4e 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -114,7 +114,7 @@ static int iblock_configure_device(struct se_device *dev)
>  	else
>  		dev->dev_flags |= DF_READ_ONLY;
>  
> -	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
> +	bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
>  	if (IS_ERR(bd)) {
>  		ret = PTR_ERR(bd);
>  		goto out_free_bioset;
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index e7425549e39c73..e3494e036c6c85 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -367,7 +367,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
>  	 * for TYPE_DISK and TYPE_ZBC using supplied udev_path
>  	 */
>  	bd = blkdev_get_by_path(dev->udev_path,
> -				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
> +				FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv,
> +				NULL);
>  	if (IS_ERR(bd)) {
>  		pr_err("pSCSI: blkdev_get_by_path() failed\n");
>  		scsi_device_put(sd);
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 78696d331639bd..4de4984fa99ba3 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -258,7 +258,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
>  	}
>  
>  	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
> -				  fs_info->bdev_holder);
> +				  fs_info->bdev_holder, NULL);
>  	if (IS_ERR(bdev)) {
>  		btrfs_err(fs_info, "target device %s is invalid!", device_path);
>  		return PTR_ERR(bdev);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 841e799dece51b..784ccc8f6c69c1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -496,7 +496,7 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
>  {
>  	int ret;
>  
> -	*bdev = blkdev_get_by_path(device_path, flags, holder);
> +	*bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
>  
>  	if (IS_ERR(*bdev)) {
>  		ret = PTR_ERR(*bdev);
> @@ -1377,7 +1377,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
>  	 * values temporarily, as the device paths of the fsid are the only
>  	 * required information for assembling the volume.
>  	 */
> -	bdev = blkdev_get_by_path(path, flags, holder);
> +	bdev = blkdev_get_by_path(path, flags, holder, NULL);
>  	if (IS_ERR(bdev))
>  		return ERR_CAST(bdev);
>  
> @@ -2629,7 +2629,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  		return -EROFS;
>  
>  	bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
> -				  fs_info->bdev_holder);
> +				  fs_info->bdev_holder, NULL);
>  	if (IS_ERR(bdev))
>  		return PTR_ERR(bdev);
>  
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 811ab66d805ede..6c263e9cd38b2a 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -254,7 +254,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
>  		dif->fscache = fscache;
>  	} else if (!sbi->devs->flatdev) {
>  		bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
> -					  sb->s_type);
> +					  sb->s_type, NULL);
>  		if (IS_ERR(bdev))
>  			return PTR_ERR(bdev);
>  		dif->bdev = bdev;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9680fe753e599a..865625089ecca3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1103,7 +1103,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
>  {
>  	struct block_device *bdev;
>  
> -	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
> +	bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
> +				 NULL);
>  	if (IS_ERR(bdev))
>  		goto fail;
>  	return bdev;
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 9f15b03037dba9..7c34ab082f1382 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -4025,7 +4025,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
>  			/* Single zoned block device mount */
>  			FDEV(0).bdev =
>  				blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
> -					sbi->sb->s_mode, sbi->sb->s_type);
> +					sbi->sb->s_mode, sbi->sb->s_type, NULL);
>  		} else {
>  			/* Multi-device mount */
>  			memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
> @@ -4044,7 +4044,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
>  					sbi->log_blocks_per_seg) - 1;
>  			}
>  			FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
> -					sbi->sb->s_mode, sbi->sb->s_type);
> +					sbi->sb->s_mode, sbi->sb->s_type, NULL);
>  		}
>  		if (IS_ERR(FDEV(i).bdev))
>  			return PTR_ERR(FDEV(i).bdev);
> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
> index 695415cbfe985b..8c55030c57ed52 100644
> --- a/fs/jfs/jfs_logmgr.c
> +++ b/fs/jfs/jfs_logmgr.c
> @@ -1101,7 +1101,7 @@ int lmLogOpen(struct super_block *sb)
>  	 */
>  
>  	bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> -				 log);
> +				 log, NULL);
>  	if (IS_ERR(bdev)) {
>  		rc = PTR_ERR(bdev);
>  		goto free;
> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
> index fea5f8821da5ef..38b066ca699ed7 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -243,7 +243,7 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
>  	if (!dev)
>  		return -EIO;
>  
> -	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
> +	bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL, NULL);
>  	if (IS_ERR(bdev)) {
>  		printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
>  			MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
> @@ -312,7 +312,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
>  	if (!devname)
>  		return ERR_PTR(-ENOMEM);
>  
> -	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
> +	bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL,
> +				  NULL);
>  	if (IS_ERR(bdev)) {
>  		pr_warn("pNFS: failed to open device %s (%ld)\n",
>  			devname, PTR_ERR(bdev));
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 77f1e5778d1c84..91bfbd973d1d53 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -1285,7 +1285,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
>  	if (!(flags & SB_RDONLY))
>  		mode |= FMODE_WRITE;
>  
> -	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
> +	sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
>  	if (IS_ERR(sd.bdev))
>  		return ERR_CAST(sd.bdev);
>  
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index 60b97c92e2b25e..6b13b8c3f2b8af 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -1786,7 +1786,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
>  		goto out2;
>  
>  	reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
> -					 FMODE_WRITE | FMODE_READ, NULL);
> +					 FMODE_WRITE | FMODE_READ, NULL, NULL);
>  	if (IS_ERR(reg->hr_bdev)) {
>  		ret = PTR_ERR(reg->hr_bdev);
>  		reg->hr_bdev = NULL;
> diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
> index 4d11d60f493c14..5e4db9a0c8e5a3 100644
> --- a/fs/reiserfs/journal.c
> +++ b/fs/reiserfs/journal.c
> @@ -2616,7 +2616,7 @@ static int journal_init_dev(struct super_block *super,
>  		if (jdev == super->s_dev)
>  			blkdev_mode &= ~FMODE_EXCL;
>  		journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
> -						      journal);
> +						      journal, NULL);
>  		journal->j_dev_mode = blkdev_mode;
>  		if (IS_ERR(journal->j_dev_bd)) {
>  			result = PTR_ERR(journal->j_dev_bd);
> @@ -2632,7 +2632,8 @@ static int journal_init_dev(struct super_block *super,
>  	}
>  
>  	journal->j_dev_mode = blkdev_mode;
> -	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
> +	journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal,
> +					       NULL);
>  	if (IS_ERR(journal->j_dev_bd)) {
>  		result = PTR_ERR(journal->j_dev_bd);
>  		journal->j_dev_bd = NULL;
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2bc3..012ce140080375 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1248,7 +1248,7 @@ int get_tree_bdev(struct fs_context *fc,
>  	if (!fc->source)
>  		return invalf(fc, "No source specified");
>  
> -	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
> +	bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
>  	if (IS_ERR(bdev)) {
>  		errorf(fc, "%s: Can't open blockdev", fc->source);
>  		return PTR_ERR(bdev);
> @@ -1333,7 +1333,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
>  	if (!(flags & SB_RDONLY))
>  		mode |= FMODE_WRITE;
>  
> -	bdev = blkdev_get_by_path(dev_name, mode, fs_type);
> +	bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
>  	if (IS_ERR(bdev))
>  		return ERR_CAST(bdev);
>  
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 7e706255f16502..5684c538eb76dc 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -386,7 +386,7 @@ xfs_blkdev_get(
>  	int			error = 0;
>  
>  	*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> -				    mp);
> +				    mp, NULL);
>  	if (IS_ERR(*bdevp)) {
>  		error = PTR_ERR(*bdevp);
>  		xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 740afe80f29786..84a931caef514e 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -55,6 +55,8 @@ struct block_device {
>  	struct super_block *	bd_super;
>  	void *			bd_claiming;
>  	void *			bd_holder;
> +	const struct blk_holder_ops *bd_holder_ops;
> +	struct mutex		bd_holder_lock;
>  	/* The counter of freeze processes */
>  	int			bd_fsfreeze_count;
>  	int			bd_holders;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b441e633f4dd49..c94f3b63c86422 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1465,10 +1465,15 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
>  #define BLKDEV_MAJOR_MAX	0
>  #endif
>  
> +struct blk_holder_ops {
> +};
> +
> +struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> +		const struct blk_holder_ops *hops);
>  struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
> -		void *holder);
> -struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
> -int bd_prepare_to_claim(struct block_device *bdev, void *holder);
> +		void *holder, const struct blk_holder_ops *hops);
> +int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> +		const struct blk_holder_ops *hops);
>  void bd_abort_claiming(struct block_device *bdev, void *holder);
>  void blkdev_put(struct block_device *bdev, fmode_t mode);
>  
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index 92e41ed292ada8..801c411530d11c 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -357,7 +357,7 @@ static int swsusp_swap_check(void)
>  	root_swap = res;
>  
>  	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
> -			NULL);
> +			NULL, NULL);
>  	if (IS_ERR(hib_resume_bdev))
>  		return PTR_ERR(hib_resume_bdev);
>  
> @@ -1524,7 +1524,7 @@ int swsusp_check(void)
>  		mode |= FMODE_EXCL;
>  
>  	hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
> -					    mode, &holder);
> +					    mode, &holder, NULL);
>  	if (!IS_ERR(hib_resume_bdev)) {
>  		set_blocksize(hib_resume_bdev, PAGE_SIZE);
>  		clear_page(swsusp_header);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 274bbf79748006..cfbcf7d5705f5f 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2770,7 +2770,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
>  
>  	if (S_ISBLK(inode->i_mode)) {
>  		p->bdev = blkdev_get_by_dev(inode->i_rdev,
> -				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
> +				   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p,
> +				   NULL);
>  		if (IS_ERR(p->bdev)) {
>  			error = PTR_ERR(p->bdev);
>  			p->bdev = NULL;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 10/13] block: add a mark_dead holder operation
  2023-05-18  4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
@ 2023-05-30 13:05   ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
	Jan Kara, linux-block, linux-fsdevel, linux-xfs

On Thu 18-05-23 06:23:19, Christoph Hellwig wrote:
> Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
> to notify the holder that the block device it is using has been marked dead.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Christian Brauner <brauner@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  block/genhd.c          | 24 ++++++++++++++++++++++++
>  include/linux/blkdev.h |  1 +
>  2 files changed, 25 insertions(+)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 226ddb8329f751..42aebf0e1e2628 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -565,6 +565,28 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
>  }
>  EXPORT_SYMBOL(device_add_disk);
>  
> +static void blk_report_disk_dead(struct gendisk *disk)
> +{
> +	struct block_device *bdev;
> +	unsigned long idx;
> +
> +	rcu_read_lock();
> +	xa_for_each(&disk->part_tbl, idx, bdev) {
> +		if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
> +			continue;
> +		rcu_read_unlock();
> +
> +		mutex_lock(&bdev->bd_holder_lock);
> +		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
> +			bdev->bd_holder_ops->mark_dead(bdev);
> +		mutex_unlock(&bdev->bd_holder_lock);
> +
> +		put_device(&bdev->bd_device);
> +		rcu_read_lock();
> +	}
> +	rcu_read_unlock();
> +}
> +
>  /**
>   * blk_mark_disk_dead - mark a disk as dead
>   * @disk: disk to mark as dead
> @@ -592,6 +614,8 @@ void blk_mark_disk_dead(struct gendisk *disk)
>  	 * Prevent new I/O from crossing bio_queue_enter().
>  	 */
>  	blk_queue_start_drain(disk->queue);
> +
> +	blk_report_disk_dead(disk);
>  }
>  EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c94f3b63c86422..41f894f6355f96 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1466,6 +1466,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
>  #endif
>  
>  struct blk_holder_ops {
> +	void (*mark_dead)(struct block_device *bdev);
>  };
>  
>  struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] block: refactor bd_may_claim
  2023-05-30 11:41   ` Jan Kara
@ 2023-06-01  8:11     ` Christoph Hellwig
  2023-06-01  9:58       ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01  8:11 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Jens Axboe, Al Viro, Christian Brauner,
	Darrick J. Wong, linux-block, linux-fsdevel, linux-xfs

On Tue, May 30, 2023 at 01:41:48PM +0200, Jan Kara wrote:
> > +	if (bdev->bd_holder) {
> > +		/*
> > +		 * The same holder can always re-claim.
> > +		 */
> > +		if (bdev->bd_holder == holder)
> > +			return true;
> > +		return false;
> 
> With this simple condition I'd just do:
> 		/* The same holder can always re-claim. */
> 		return bdev->bd_holder == holder;

As of this patch this makes sense, and I did in fact did it that
way first.  But once we start checking the holder ops we need
the eplcicit conditional, so I decided to start out with this more
verbose option to avoid churn later.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] block: refactor bd_may_claim
  2023-06-01  8:11     ` Christoph Hellwig
@ 2023-06-01  9:58       ` Jan Kara
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-06-01  9:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Jens Axboe, Al Viro, Christian Brauner,
	Darrick J. Wong, linux-block, linux-fsdevel, linux-xfs

On Thu 01-06-23 10:11:05, Christoph Hellwig wrote:
> On Tue, May 30, 2023 at 01:41:48PM +0200, Jan Kara wrote:
> > > +	if (bdev->bd_holder) {
> > > +		/*
> > > +		 * The same holder can always re-claim.
> > > +		 */
> > > +		if (bdev->bd_holder == holder)
> > > +			return true;
> > > +		return false;
> > 
> > With this simple condition I'd just do:
> > 		/* The same holder can always re-claim. */
> > 		return bdev->bd_holder == holder;
> 
> As of this patch this makes sense, and I did in fact did it that
> way first.  But once we start checking the holder ops we need
> the eplcicit conditional, so I decided to start out with this more
> verbose option to avoid churn later.

Ah, OK.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-06-01  9:58 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-18  4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
2023-05-18  4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
2023-05-18  4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
2023-05-30 11:41   ` Jan Kara
2023-06-01  8:11     ` Christoph Hellwig
2023-06-01  9:58       ` Jan Kara
2023-05-18  4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
2023-05-18  4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
2023-05-30 11:58   ` Jan Kara
2023-05-18  4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
2023-05-18  4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
2023-05-30 12:09   ` Jan Kara
2023-05-18  4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
2023-05-30 12:55   ` Jan Kara
2023-05-18  4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
2023-05-30 12:56   ` Jan Kara
2023-05-18  4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
2023-05-30 13:03   ` Jan Kara
2023-05-18  4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
2023-05-30 13:05   ` Jan Kara
2023-05-18  4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
2023-05-18  4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
2023-05-18  5:03   ` Darrick J. Wong
2023-05-18  4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
2023-05-18  5:40   ` Dave Chinner
2023-05-19  2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
2023-05-19  4:11   ` Christoph Hellwig
2023-05-23  0:58     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).