* introduce bdev holder ops and a file system shutdown method v2
@ 2023-05-18 4:23 Christoph Hellwig
2023-05-18 4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
` (13 more replies)
0 siblings, 14 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Hi all,
this series fixes the long-standing problem that we never had a good way
to communicate block device events to the user of the block device.
It fixes this by introducing a new set of holder ops registered at
blkdev_get_by_* time for the exclusive holder, and then wiring that up
to a shutdown super operation to report the block device removal to the
file systems.
Changes since v1:
- add a patch to refactor bd_may_claim
- add a sanity check for mismatching holder ops in bd_may_claim
- move partition removal later in del_gendisk so that partitions
are still around for the shutdown notification
- add SHUTDOWN_DEVICE_REMOVED to XFS_SHUTDOWN_STRINGS
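
For readers new to the idea, the mechanism described above can be pictured
with a small userspace model. This is only a sketch under assumptions: the
model_* types, the callback signature and the helper names are made up for
illustration and are not the kernel definitions added by this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback table an exclusive holder registers at open time. */
struct model_holder_ops {
	void (*mark_dead)(void *holder);
};

/* Hypothetical block device: remembers who holds it and how to notify them. */
struct model_bdev {
	void *holder;
	const struct model_holder_ops *hops;
};

/* Hypothetical file system instance acting as the exclusive holder. */
struct model_fs {
	bool shut_down;
};

/* Stand-in for a "shutdown" super operation: stop using the device. */
static void model_fs_shutdown(struct model_fs *fs)
{
	fs->shut_down = true;
}

/* Holder op that forwards the device removal to the file system. */
static void model_fs_mark_dead(void *holder)
{
	model_fs_shutdown(holder);
}

static const struct model_holder_ops model_fs_holder_ops = {
	.mark_dead = model_fs_mark_dead,
};

/* Exclusive open: record the holder together with its ops. */
static void model_claim(struct model_bdev *bdev, void *holder,
			const struct model_holder_ops *hops)
{
	bdev->holder = holder;
	bdev->hops = hops;
}

/* Device removal: call back into the holder, if it registered any ops. */
static void model_device_removed(struct model_bdev *bdev)
{
	if (bdev->hops && bdev->hops->mark_dead)
		bdev->hops->mark_dead(bdev->holder);
}
```

A holder that passes no ops is simply not notified, which is how the model
treats callers that have nothing to shut down.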
Diffstat:
block/bdev.c | 159 ++++++++++++++++++++----------------
block/blk.h | 2
block/fops.c | 2
block/genhd.c | 78 +++++++++++++----
block/ioctl.c | 3
block/partitions/core.c | 31 +++----
drivers/block/drbd/drbd_nl.c | 3
drivers/block/loop.c | 2
drivers/block/pktcdvd.c | 5 -
drivers/block/rnbd/rnbd-srv.c | 2
drivers/block/xen-blkback/xenbus.c | 2
drivers/block/zram/zram_drv.c | 2
drivers/md/bcache/super.c | 2
drivers/md/dm.c | 2
drivers/md/md.c | 2
drivers/mtd/devices/block2mtd.c | 4
drivers/nvme/target/io-cmd-bdev.c | 2
drivers/s390/block/dasd_genhd.c | 2
drivers/target/target_core_iblock.c | 2
drivers/target/target_core_pscsi.c | 3
fs/btrfs/dev-replace.c | 2
fs/btrfs/volumes.c | 6 -
fs/erofs/super.c | 2
fs/ext4/super.c | 3
fs/f2fs/super.c | 4
fs/jfs/jfs_logmgr.c | 2
fs/nfs/blocklayout/dev.c | 5 -
fs/nilfs2/super.c | 2
fs/ocfs2/cluster/heartbeat.c | 2
fs/reiserfs/journal.c | 5 -
fs/super.c | 21 ++++
fs/xfs/xfs_fsops.c | 3
fs/xfs/xfs_mount.h | 4
fs/xfs/xfs_super.c | 21 ++++
include/linux/blk_types.h | 2
include/linux/blkdev.h | 12 ++
include/linux/fs.h | 1
kernel/power/swap.c | 4
mm/swapfile.c | 3
39 files changed, 266 insertions(+), 148 deletions(-)
* [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
` (12 subsequent siblings)
13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Move all the logic to release an exclusive claim into a helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
block/bdev.c | 63 +++++++++++++++++++++++++++-------------------------
1 file changed, 33 insertions(+), 30 deletions(-)
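
As a reading aid, the bookkeeping that the new helper performs can be
sketched in userspace roughly as follows. The model_bdev structure and
model_end_claim are illustrative names only, locking is left out, and the
return value stands in for the disk_unblock_events call in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct block_device; only the claim fields. */
struct model_bdev {
	struct model_bdev *whole;	/* containing device, may be itself */
	int holders;			/* number of exclusive claims */
	void *holder;			/* current exclusive holder, or NULL */
	bool write_holder;		/* a write claim blocked event polling */
};

/*
 * Drop one exclusive claim. Returns true when this was the last claim of a
 * write holder, i.e. when event polling should be unblocked again.
 */
static bool model_end_claim(struct model_bdev *bdev)
{
	struct model_bdev *whole = bdev->whole;
	bool unblock = false;

	/* For a whole device bdev == whole, so the count drops by two. */
	bdev->holders--;
	whole->holders--;
	assert(bdev->holders >= 0 && whole->holders >= 0);

	if (!bdev->holders) {
		bdev->holder = NULL;
		if (bdev->write_holder)
			unblock = true;
	}
	if (!whole->holders)
		whole->holder = NULL;

	if (unblock)
		bdev->write_holder = false;
	return unblock;
}
```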
diff --git a/block/bdev.c b/block/bdev.c
index 21c63bfef3237a..317bfd9cba40fa 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -589,6 +589,37 @@ void bd_abort_claiming(struct block_device *bdev, void *holder)
}
EXPORT_SYMBOL(bd_abort_claiming);
+static void bd_end_claim(struct block_device *bdev)
+{
+ struct block_device *whole = bdev_whole(bdev);
+ bool unblock = false;
+
+ /*
+ * Release a claim on the device. The holder fields are protected with
+ * bdev_lock. open_mutex is used to synchronize disk_holder unlinking.
+ */
+ spin_lock(&bdev_lock);
+ WARN_ON_ONCE(--bdev->bd_holders < 0);
+ WARN_ON_ONCE(--whole->bd_holders < 0);
+ if (!bdev->bd_holders) {
+ bdev->bd_holder = NULL;
+ if (bdev->bd_write_holder)
+ unblock = true;
+ }
+ if (!whole->bd_holders)
+ whole->bd_holder = NULL;
+ spin_unlock(&bdev_lock);
+
+ /*
+ * If this was the last claim, remove holder link and unblock evpoll if
+ * it was a write holder.
+ */
+ if (unblock) {
+ disk_unblock_events(bdev->bd_disk);
+ bdev->bd_write_holder = false;
+ }
+}
+
static void blkdev_flush_mapping(struct block_device *bdev)
{
WARN_ON_ONCE(bdev->bd_holders);
@@ -843,36 +874,8 @@ void blkdev_put(struct block_device *bdev, fmode_t mode)
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
- if (mode & FMODE_EXCL) {
- struct block_device *whole = bdev_whole(bdev);
- bool bdev_free;
-
- /*
- * Release a claim on the device. The holder fields
- * are protected with bdev_lock. open_mutex is to
- * synchronize disk_holder unlinking.
- */
- spin_lock(&bdev_lock);
-
- WARN_ON_ONCE(--bdev->bd_holders < 0);
- WARN_ON_ONCE(--whole->bd_holders < 0);
-
- if ((bdev_free = !bdev->bd_holders))
- bdev->bd_holder = NULL;
- if (!whole->bd_holders)
- whole->bd_holder = NULL;
-
- spin_unlock(&bdev_lock);
-
- /*
- * If this was the last claim, remove holder link and
- * unblock evpoll if it was a write holder.
- */
- if (bdev_free && bdev->bd_write_holder) {
- disk_unblock_events(disk);
- bdev->bd_write_holder = false;
- }
- }
+ if (mode & FMODE_EXCL)
+ bd_end_claim(bdev);
/*
* Trigger event checking and tell drivers to flush MEDIA_CHANGE
--
2.39.2
* [PATCH 02/13] block: refactor bd_may_claim
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
2023-05-18 4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 11:41 ` Jan Kara
2023-05-18 4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
` (11 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
The long if/else chain obfuscates the actual logic. Tidy it up to be
more structured. Also drop the whole argument, as it can be trivially
derived from bdev using bdev_whole, and having the bdev_whole in the
function makes it easier to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/bdev.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)
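
To make it easier to convince oneself that the restructured function keeps
the old behaviour, here is a userspace sketch that runs both shapes of the
logic over every combination of holder states. The model_bdev type and the
PART_CLAIMED marker (standing in for the bd_may_claim address stored in
whole->bd_holder) are assumptions of this sketch. The comparison is limited
to non-NULL holders, since bd_prepare_to_claim rejects a NULL holder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct block_device. */
struct model_bdev {
	struct model_bdev *whole;	/* containing device, may be itself */
	void *holder;			/* current holder, or NULL */
};

/* Stand-in for the bd_may_claim address stored in whole->bd_holder. */
static char part_claimed_marker;
#define PART_CLAIMED ((void *)&part_claimed_marker)

/* The if/else chain as it looked before the patch. */
static bool may_claim_old(struct model_bdev *bdev, struct model_bdev *whole,
			  void *holder)
{
	if (bdev->holder == holder)
		return true;	/* already a holder */
	else if (bdev->holder != NULL)
		return false;	/* held by someone else */
	else if (whole == bdev)
		return true;	/* whole device which isn't held */
	else if (whole->holder == PART_CLAIMED)
		return true;	/* only a partition of the device is claimed */
	else if (whole->holder != NULL)
		return false;	/* partition of a held device */
	else
		return true;	/* partition of an un-held device */
}

/* The restructured version. */
static bool may_claim_new(struct model_bdev *bdev, void *holder)
{
	struct model_bdev *whole = bdev->whole;

	if (bdev->holder)
		return bdev->holder == holder;
	if (whole != bdev && whole->holder && whole->holder != PART_CLAIMED)
		return false;
	return true;
}

/* Compare both versions for a partition and for the whole device. */
static bool versions_agree(void)
{
	static int a, b;	/* two distinct non-NULL holders */
	void *states[] = { NULL, &a, &b, PART_CLAIMED };
	void *claimants[] = { &a, &b };

	for (size_t i = 0; i < 4; i++) {
		for (size_t j = 0; j < 4; j++) {
			for (size_t k = 0; k < 2; k++) {
				struct model_bdev disk = { .holder = states[j] };
				struct model_bdev part = { .whole = &disk,
							   .holder = states[i] };
				void *h = claimants[k];

				disk.whole = &disk;
				if (may_claim_old(&part, &disk, h) !=
				    may_claim_new(&part, h))
					return false;
				if (may_claim_old(&disk, &disk, h) !=
				    may_claim_new(&disk, h))
					return false;
			}
		}
	}
	return true;
}
```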
diff --git a/block/bdev.c b/block/bdev.c
index 317bfd9cba40fa..080b5c83bfbc72 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -463,7 +463,6 @@ long nr_blockdev_pages(void)
/**
* bd_may_claim - test whether a block device can be claimed
* @bdev: block device of interest
- * @whole: whole block device containing @bdev, may equal @bdev
* @holder: holder trying to claim @bdev
*
* Test whether @bdev can be claimed by @holder.
@@ -474,22 +473,27 @@ long nr_blockdev_pages(void)
* RETURNS:
* %true if @bdev can be claimed, %false otherwise.
*/
-static bool bd_may_claim(struct block_device *bdev, struct block_device *whole,
- void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder)
{
- if (bdev->bd_holder == holder)
- return true; /* already a holder */
- else if (bdev->bd_holder != NULL)
- return false; /* held by someone else */
- else if (whole == bdev)
- return true; /* is a whole device which isn't held */
-
- else if (whole->bd_holder == bd_may_claim)
- return true; /* is a partition of a device that is being partitioned */
- else if (whole->bd_holder != NULL)
- return false; /* is a partition of a held device */
- else
- return true; /* is a partition of an un-held device */
+ struct block_device *whole = bdev_whole(bdev);
+
+ if (bdev->bd_holder) {
+ /*
+ * The same holder can always re-claim.
+ */
+ if (bdev->bd_holder == holder)
+ return true;
+ return false;
+ }
+
+ /*
+ * If the whole devices holder is set to bd_may_claim, a partition on
+ * the device is claimed, but not the whole device.
+ */
+ if (whole != bdev &&
+ whole->bd_holder && whole->bd_holder != bd_may_claim)
+ return false;
+ return true;
}
/**
@@ -513,7 +517,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
retry:
spin_lock(&bdev_lock);
/* if someone else claimed, fail */
- if (!bd_may_claim(bdev, whole, holder)) {
+ if (!bd_may_claim(bdev, holder)) {
spin_unlock(&bdev_lock);
return -EBUSY;
}
@@ -559,7 +563,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
struct block_device *whole = bdev_whole(bdev);
spin_lock(&bdev_lock);
- BUG_ON(!bd_may_claim(bdev, whole, holder));
+ BUG_ON(!bd_may_claim(bdev, holder));
/*
* Note that for a whole device bd_holders will be incremented twice,
* and bd_holder will be set to bd_may_claim before being set to holder
--
2.39.2
* [PATCH 03/13] block: turn bdev_lock into a mutex
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
2023-05-18 4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
2023-05-18 4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
` (10 subsequent siblings)
13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
There is no reason for this lock to spin, and being able to sleep under
it will come in handy soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
block/bdev.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
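
Besides switching the lock type, the patch replaces the "CONTEXT:" comment
on bd_may_claim with a lockdep_assert_held check. The idea of checking the
locking rule instead of documenting it can be sketched in userspace like
this. The toy model_lock below only tracks a flag and is not a real mutex,
and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy lock that only tracks whether it is held; not a real mutex. */
struct model_lock {
	bool held;
};

static void model_lock_acquire(struct model_lock *lock)
{
	assert(!lock->held);
	lock->held = true;
}

static void model_lock_release(struct model_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Rough analogue of lockdep_assert_held(): check, don't just document. */
static void model_assert_held(const struct model_lock *lock)
{
	assert(lock->held);
}

static struct model_lock model_bdev_lock;
static int model_holders;

/* Helper that must run under the lock, like bd_may_claim. */
static bool model_may_claim(void)
{
	model_assert_held(&model_bdev_lock);
	return model_holders == 0;
}

/* Caller that takes the lock around the helper and records the claim. */
static int model_prepare_to_claim(void)
{
	bool ok;

	model_lock_acquire(&model_bdev_lock);
	ok = model_may_claim();
	if (ok)
		model_holders++;
	model_lock_release(&model_bdev_lock);
	return ok ? 0 : -EBUSY;
}
```

Calling model_may_claim without taking the lock first trips the assertion,
which is the property the comment alone could not enforce.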
diff --git a/block/bdev.c b/block/bdev.c
index 080b5c83bfbc72..f5ffcac762e0cd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -308,7 +308,7 @@ EXPORT_SYMBOL(thaw_bdev);
* pseudo-fs
*/
-static __cacheline_aligned_in_smp DEFINE_SPINLOCK(bdev_lock);
+static __cacheline_aligned_in_smp DEFINE_MUTEX(bdev_lock);
static struct kmem_cache * bdev_cachep __read_mostly;
static struct inode *bdev_alloc_inode(struct super_block *sb)
@@ -467,9 +467,6 @@ long nr_blockdev_pages(void)
*
* Test whether @bdev can be claimed by @holder.
*
- * CONTEXT:
- * spin_lock(&bdev_lock).
- *
* RETURNS:
* %true if @bdev can be claimed, %false otherwise.
*/
@@ -477,6 +474,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
{
struct block_device *whole = bdev_whole(bdev);
+ lockdep_assert_held(&bdev_lock);
+
if (bdev->bd_holder) {
/*
* The same holder can always re-claim.
@@ -515,10 +514,10 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
if (WARN_ON_ONCE(!holder))
return -EINVAL;
retry:
- spin_lock(&bdev_lock);
+ mutex_lock(&bdev_lock);
/* if someone else claimed, fail */
if (!bd_may_claim(bdev, holder)) {
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
return -EBUSY;
}
@@ -528,7 +527,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
DEFINE_WAIT(wait);
prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
schedule();
finish_wait(wq, &wait);
goto retry;
@@ -536,7 +535,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
/* yay, all mine */
whole->bd_claiming = holder;
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
return 0;
}
EXPORT_SYMBOL_GPL(bd_prepare_to_claim); /* only for the loop driver */
@@ -562,7 +561,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
{
struct block_device *whole = bdev_whole(bdev);
- spin_lock(&bdev_lock);
+ mutex_lock(&bdev_lock);
BUG_ON(!bd_may_claim(bdev, holder));
/*
* Note that for a whole device bd_holders will be incremented twice,
@@ -573,7 +572,7 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
bdev->bd_holders++;
bdev->bd_holder = holder;
bd_clear_claiming(whole, holder);
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
}
/**
@@ -587,9 +586,9 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
*/
void bd_abort_claiming(struct block_device *bdev, void *holder)
{
- spin_lock(&bdev_lock);
+ mutex_lock(&bdev_lock);
bd_clear_claiming(bdev_whole(bdev), holder);
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
}
EXPORT_SYMBOL(bd_abort_claiming);
@@ -602,7 +601,7 @@ static void bd_end_claim(struct block_device *bdev)
* Release a claim on the device. The holder fields are protected with
* bdev_lock. open_mutex is used to synchronize disk_holder unlinking.
*/
- spin_lock(&bdev_lock);
+ mutex_lock(&bdev_lock);
WARN_ON_ONCE(--bdev->bd_holders < 0);
WARN_ON_ONCE(--whole->bd_holders < 0);
if (!bdev->bd_holders) {
@@ -612,7 +611,7 @@ static void bd_end_claim(struct block_device *bdev)
}
if (!whole->bd_holders)
whole->bd_holder = NULL;
- spin_unlock(&bdev_lock);
+ mutex_unlock(&bdev_lock);
/*
* If this was the last claim, remove holder link and unblock evpoll if
--
2.39.2
* [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (2 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 11:58 ` Jan Kara
2023-05-18 4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
` (9 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
blk_mark_disk_dead does very similar work as a section of del_gendisk:
- set the GD_DEAD flag
- set the capacity to zero
- start a queue drain
but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
the disk, sets the capacity to zero before starting the drain, and does
not send a uevent and kernel message for this fake capacity change.
Move the exact logic from the more heavily used del_gendisk into
blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/genhd.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
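
The resulting order of steps can be sketched with a small userspace model
that records each step in a trace string. The model_disk fields mirror the
flags named above, but the structure, the trace letters and the function
names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct model_disk {
	bool dead;		/* models GD_DEAD */
	bool owns_queue;	/* models GD_OWNS_QUEUE */
	bool queue_dying;	/* models QUEUE_FLAG_DYING */
	bool draining;		/* queue drain has been started */
	unsigned long long capacity;
	char trace[8];		/* order of steps, one letter each */
};

/* Append one step to the trace, keeping it NUL-terminated. */
static void trace_step(struct model_disk *disk, char step)
{
	size_t len = strlen(disk->trace);

	if (len + 1 < sizeof(disk->trace))
		disk->trace[len] = step;
}

/* Consolidated shutdown logic, in the order del_gendisk used to run it. */
static void model_mark_disk_dead(struct model_disk *disk)
{
	disk->dead = true;		/* fail any new I/O */
	trace_step(disk, 'D');
	if (disk->owns_queue) {
		disk->queue_dying = true;
		trace_step(disk, 'Q');
	}
	disk->capacity = 0;		/* no notification is sent */
	trace_step(disk, 'C');
	disk->draining = true;		/* keep new I/O out of the queue */
	trace_step(disk, 'R');
}

/* Removal path: reuses the shared helper instead of open-coding it. */
static void model_del_gendisk(struct model_disk *disk)
{
	/* ... syncing and invalidating the device would happen here ... */
	model_mark_disk_dead(disk);
	/* ... sysfs and queue teardown would follow here ... */
}
```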
diff --git a/block/genhd.c b/block/genhd.c
index 1cb489b927d50a..d8fe40c7d1f0a2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -572,13 +572,22 @@ EXPORT_SYMBOL(device_add_disk);
*/
void blk_mark_disk_dead(struct gendisk *disk)
{
+ /*
+ * Fail any new I/O.
+ */
set_bit(GD_DEAD, &disk->state);
- blk_queue_start_drain(disk->queue);
+ if (test_bit(GD_OWNS_QUEUE, &disk->state))
+ blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
/*
* Stop buffered writers from dirtying pages that can't be written out.
*/
- set_capacity_and_notify(disk, 0);
+ set_capacity(disk, 0);
+
+ /*
+ * Prevent new I/O from crossing bio_queue_enter().
+ */
+ blk_queue_start_drain(disk->queue);
}
EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
@@ -620,18 +629,7 @@ void del_gendisk(struct gendisk *disk)
fsync_bdev(disk->part0);
__invalidate_device(disk->part0, true);
- /*
- * Fail any new I/O.
- */
- set_bit(GD_DEAD, &disk->state);
- if (test_bit(GD_OWNS_QUEUE, &disk->state))
- blk_queue_flag_set(QUEUE_FLAG_DYING, q);
- set_capacity(disk, 0);
-
- /*
- * Prevent new I/O from crossing bio_queue_enter().
- */
- blk_queue_start_drain(q);
+ blk_mark_disk_dead(disk);
if (!(disk->flags & GENHD_FL_HIDDEN)) {
sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
--
2.39.2
* [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (3 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
` (8 subsequent siblings)
13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Check if GD_DEAD is already set in blk_mark_disk_dead, and don't
duplicate the work already done.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Christian Brauner <brauner@kernel.org>
---
block/genhd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
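
The test-and-set pattern used here can be illustrated in userspace with C11
atomics. This is a sketch only: model_test_and_set approximates what
test_and_set_bit provides, and the model_disk structure with its run counter
is a made-up stand-in for the real shutdown work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_GD_DEAD (1u << 0)

struct model_disk {
	atomic_uint state;
	int shutdown_runs;	/* how often the shutdown work actually ran */
};

static void model_disk_init(struct model_disk *disk)
{
	atomic_init(&disk->state, 0u);
	disk->shutdown_runs = 0;
}

/* Atomically set a flag and report whether it was already set. */
static bool model_test_and_set(atomic_uint *state, unsigned int flag)
{
	return (atomic_fetch_or(state, flag) & flag) != 0;
}

/* Only the first caller does the work; later calls return early. */
static void model_mark_disk_dead(struct model_disk *disk)
{
	if (model_test_and_set(&disk->state, MODEL_GD_DEAD))
		return;
	disk->shutdown_runs++;
}
```

Because the flag is set and tested in one atomic step, two racing callers
cannot both see it clear.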
diff --git a/block/genhd.c b/block/genhd.c
index d8fe40c7d1f0a2..a744daeed55318 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -575,7 +575,9 @@ void blk_mark_disk_dead(struct gendisk *disk)
/*
* Fail any new I/O.
*/
- set_bit(GD_DEAD, &disk->state);
+ if (test_and_set_bit(GD_DEAD, &disk->state))
+ return;
+
if (test_bit(GD_OWNS_QUEUE, &disk->state))
blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
--
2.39.2
* [PATCH 06/13] block: unhash the inode earlier in delete_partition
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (4 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 12:09 ` Jan Kara
2023-05-18 4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
` (7 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Move the call to remove_inode_hash to the beginning of delete_partition,
as we want to prevent opening a block_device that is about to be removed
ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/partitions/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 49e0496ff23c1e..fa5c707fe0ad2f 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -267,6 +267,12 @@ static void delete_partition(struct block_device *part)
{
lockdep_assert_held(&part->bd_disk->open_mutex);
+ /*
+ * Remove the block device from the inode hash, so that it cannot be
+ * looked up any more even when openers still hold references.
+ */
+ remove_inode_hash(part->bd_inode);
+
fsync_bdev(part);
__invalidate_device(part, true);
@@ -274,12 +280,6 @@ static void delete_partition(struct block_device *part)
kobject_put(part->bd_holder_dir);
device_del(&part->bd_device);
- /*
- * Remove the block device from the inode hash, so that it cannot be
- * looked up any more even when openers still hold references.
- */
- remove_inode_hash(part->bd_inode);
-
put_device(&part->bd_device);
}
--
2.39.2
* [PATCH 07/13] block: delete partitions later in del_gendisk
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (5 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 12:55 ` Jan Kara
2023-05-18 4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
` (6 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Delay dropping the block_devices for partitions in del_gendisk until
after the call to blk_mark_disk_dead, so that we can implement
notification of removed devices in blk_mark_disk_dead.
This requires splitting a lower-level drop_partition helper out of
delete_partition and using that from del_gendisk, while having a
common loop for the whole device and partitions that calls
remove_inode_hash, fsync_bdev and __invalidate_device before the
call to blk_mark_disk_dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk.h | 2 +-
block/genhd.c | 24 +++++++++++++++++++-----
block/partitions/core.c | 19 ++++++++++++-------
3 files changed, 32 insertions(+), 13 deletions(-)
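
The reordering can be summarised as three phases, sketched below as a
userspace model. The model_part and model_disk structures, the fixed-size
table and the visible_at_death counter are assumptions of this sketch. The
counter only exists to show that partitions are still present at the point
where a removal notification could be delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PARTS 4

struct model_part {
	bool present;	/* still in the partition table */
	bool hashed;	/* can still be looked up and opened */
	bool synced;	/* dirty data has been written back */
};

struct model_disk {
	struct model_part parts[MODEL_MAX_PARTS];	/* slot 0: whole device */
	bool dead;
	int visible_at_death;	/* partitions present when marked dead */
};

/* Lower-level helper: only removes the partition from the table. */
static void model_drop_partition(struct model_part *part)
{
	part->present = false;
}

/* Marks the disk dead and counts the partitions that could be notified. */
static void model_mark_disk_dead(struct model_disk *disk)
{
	disk->dead = true;
	for (size_t i = 1; i < MODEL_MAX_PARTS; i++)
		if (disk->parts[i].present)
			disk->visible_at_death++;
}

static void model_del_gendisk(struct model_disk *disk)
{
	/* Phase 1: one loop for whole device and partitions alike. */
	for (size_t i = 0; i < MODEL_MAX_PARTS; i++) {
		if (!disk->parts[i].present)
			continue;
		disk->parts[i].hashed = false;	/* no new openers */
		disk->parts[i].synced = true;	/* write out dirty data */
	}

	/* Phase 2: mark the disk dead while the partitions still exist. */
	model_mark_disk_dead(disk);

	/* Phase 3: drop the partitions, starting at index 1. */
	for (size_t i = 1; i < MODEL_MAX_PARTS; i++)
		if (disk->parts[i].present)
			model_drop_partition(&disk->parts[i]);
}
```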
diff --git a/block/blk.h b/block/blk.h
index 45547bcf111938..4363052f90416a 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -409,7 +409,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
int bdev_del_partition(struct gendisk *disk, int partno);
int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
sector_t length);
-void blk_drop_partitions(struct gendisk *disk);
+void drop_partition(struct block_device *part);
void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
diff --git a/block/genhd.c b/block/genhd.c
index a744daeed55318..bd4c4eca31363e 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -615,6 +615,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
void del_gendisk(struct gendisk *disk)
{
struct request_queue *q = disk->queue;
+ struct block_device *part;
+ unsigned long idx;
might_sleep();
@@ -623,16 +625,28 @@ void del_gendisk(struct gendisk *disk)
disk_del_events(disk);
+ /*
+ * Prevent new openers by unlinking the bdev inode, and write out
+ * dirty data before marking the disk dead and stopping all I/O.
+ */
mutex_lock(&disk->open_mutex);
- remove_inode_hash(disk->part0->bd_inode);
- blk_drop_partitions(disk);
+ xa_for_each(&disk->part_tbl, idx, part) {
+ remove_inode_hash(part->bd_inode);
+ fsync_bdev(part);
+ __invalidate_device(part, true);
+ }
mutex_unlock(&disk->open_mutex);
- fsync_bdev(disk->part0);
- __invalidate_device(disk->part0, true);
-
blk_mark_disk_dead(disk);
+ /*
+ * Drop all partitions now that the disk is marked dead.
+ */
+ mutex_lock(&disk->open_mutex);
+ xa_for_each_start(&disk->part_tbl, idx, part, 1)
+ drop_partition(part);
+ mutex_unlock(&disk->open_mutex);
+
if (!(disk->flags & GENHD_FL_HIDDEN)) {
sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
diff --git a/block/partitions/core.c b/block/partitions/core.c
index fa5c707fe0ad2f..31ac815d77a83c 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -263,10 +263,19 @@ struct device_type part_type = {
.uevent = part_uevent,
};
-static void delete_partition(struct block_device *part)
+void drop_partition(struct block_device *part)
{
lockdep_assert_held(&part->bd_disk->open_mutex);
+ xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
+ kobject_put(part->bd_holder_dir);
+
+ device_del(&part->bd_device);
+ put_device(&part->bd_device);
+}
+
+static void delete_partition(struct block_device *part)
+{
/*
* Remove the block device from the inode hash, so that it cannot be
* looked up any more even when openers still hold references.
@@ -276,11 +285,7 @@ static void delete_partition(struct block_device *part)
fsync_bdev(part);
__invalidate_device(part, true);
- xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
- kobject_put(part->bd_holder_dir);
- device_del(&part->bd_device);
-
- put_device(&part->bd_device);
+ drop_partition(part);
}
static ssize_t whole_disk_show(struct device *dev,
@@ -519,7 +524,7 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
return true;
}
-void blk_drop_partitions(struct gendisk *disk)
+static void blk_drop_partitions(struct gendisk *disk)
{
struct block_device *part;
unsigned long idx;
--
2.39.2
* [PATCH 08/13] block: remove blk_drop_partitions
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (6 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 12:56 ` Jan Kara
2023-05-18 4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
` (5 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
There is only a single caller left, so fold the loop into that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/partitions/core.c | 16 ++++------------
1 file changed, 4 insertions(+), 12 deletions(-)
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 31ac815d77a83c..2559bb830273eb 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -524,17 +524,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
return true;
}
-static void blk_drop_partitions(struct gendisk *disk)
-{
- struct block_device *part;
- unsigned long idx;
-
- lockdep_assert_held(&disk->open_mutex);
-
- xa_for_each_start(&disk->part_tbl, idx, part, 1)
- delete_partition(part);
-}
-
static bool blk_add_partition(struct gendisk *disk,
struct parsed_partitions *state, int p)
{
@@ -651,6 +640,8 @@ static int blk_add_partitions(struct gendisk *disk)
int bdev_disk_changed(struct gendisk *disk, bool invalidate)
{
+ struct block_device *part;
+ unsigned long idx;
int ret = 0;
lockdep_assert_held(&disk->open_mutex);
@@ -663,8 +654,9 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
return -EBUSY;
sync_blockdev(disk->part0);
invalidate_bdev(disk->part0);
- blk_drop_partitions(disk);
+ xa_for_each_start(&disk->part_tbl, idx, part, 1)
+ delete_partition(part);
clear_bit(GD_NEED_PART_SCAN, &disk->state);
/*
--
2.39.2
* [PATCH 09/13] block: introduce holder ops
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (7 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 13:03 ` Jan Kara
2023-05-18 4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
` (4 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims. It will be used to
allow the block layer to call back into the user of the block device for
things like notification of a removed device or a device resize.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/bdev.c | 41 ++++++++++++++++++++---------
block/fops.c | 2 +-
block/genhd.c | 6 +++--
block/ioctl.c | 3 ++-
drivers/block/drbd/drbd_nl.c | 3 ++-
drivers/block/loop.c | 2 +-
drivers/block/pktcdvd.c | 5 ++--
drivers/block/rnbd/rnbd-srv.c | 2 +-
drivers/block/xen-blkback/xenbus.c | 2 +-
drivers/block/zram/zram_drv.c | 2 +-
drivers/md/bcache/super.c | 2 +-
drivers/md/dm.c | 2 +-
drivers/md/md.c | 2 +-
drivers/mtd/devices/block2mtd.c | 4 +--
drivers/nvme/target/io-cmd-bdev.c | 2 +-
drivers/s390/block/dasd_genhd.c | 2 +-
drivers/target/target_core_iblock.c | 2 +-
drivers/target/target_core_pscsi.c | 3 ++-
fs/btrfs/dev-replace.c | 2 +-
fs/btrfs/volumes.c | 6 ++---
fs/erofs/super.c | 2 +-
fs/ext4/super.c | 3 ++-
fs/f2fs/super.c | 4 +--
fs/jfs/jfs_logmgr.c | 2 +-
fs/nfs/blocklayout/dev.c | 5 ++--
fs/nilfs2/super.c | 2 +-
fs/ocfs2/cluster/heartbeat.c | 2 +-
fs/reiserfs/journal.c | 5 ++--
fs/super.c | 4 +--
fs/xfs/xfs_super.c | 2 +-
include/linux/blk_types.h | 2 ++
include/linux/blkdev.h | 11 +++++---
kernel/power/swap.c | 4 +--
mm/swapfile.c | 3 ++-
34 files changed, 90 insertions(+), 56 deletions(-)
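
The rule that a holder and its ops are recorded and cleared together, and
that a re-claim by the same holder must pass the same ops, can be sketched in
userspace as follows. All model_* names are hypothetical, the ops table is
reduced to its identity, partition handling and locking are left out, and
the model returns false where the real code also warns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callbacks omitted; only the identity of the table matters here. */
struct model_holder_ops {
	int unused;
};

struct model_bdev {
	void *holder;
	const struct model_holder_ops *hops;
};

/* Can @holder claim @bdev with @hops? */
static bool model_may_claim(const struct model_bdev *bdev, void *holder,
			    const struct model_holder_ops *hops)
{
	if (bdev->holder) {
		/* The same holder may re-claim, but only with the same ops. */
		if (bdev->holder == holder)
			return bdev->hops == hops;
		return false;
	}
	return true;
}

/* Record holder and ops together so they cannot get out of sync. */
static bool model_claim(struct model_bdev *bdev, void *holder,
			const struct model_holder_ops *hops)
{
	if (!model_may_claim(bdev, holder, hops))
		return false;
	bdev->holder = holder;
	bdev->hops = hops;
	return true;
}

/* Releasing the claim clears both fields. */
static void model_release(struct model_bdev *bdev)
{
	bdev->holder = NULL;
	bdev->hops = NULL;
}
```

Passing NULL ops is allowed in the model, matching callers in the patch that
have no callbacks to register.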
diff --git a/block/bdev.c b/block/bdev.c
index f5ffcac762e0cd..5c46ff10770638 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -102,7 +102,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
* under live filesystem.
*/
if (!(mode & FMODE_EXCL)) {
- int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
+ int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
if (err)
goto invalidate;
}
@@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
bdev = I_BDEV(inode);
mutex_init(&bdev->bd_fsfreeze_mutex);
spin_lock_init(&bdev->bd_size_lock);
+ mutex_init(&bdev->bd_holder_lock);
bdev->bd_partno = partno;
bdev->bd_inode = inode;
bdev->bd_queue = disk->queue;
@@ -464,13 +465,15 @@ long nr_blockdev_pages(void)
* bd_may_claim - test whether a block device can be claimed
* @bdev: block device of interest
* @holder: holder trying to claim @bdev
+ * @hops: holder ops
*
* Test whether @bdev can be claimed by @holder.
*
* RETURNS:
* %true if @bdev can be claimed, %false otherwise.
*/
-static bool bd_may_claim(struct block_device *bdev, void *holder)
+static bool bd_may_claim(struct block_device *bdev, void *holder,
+ const struct blk_holder_ops *hops)
{
struct block_device *whole = bdev_whole(bdev);
@@ -480,8 +483,11 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
/*
* The same holder can always re-claim.
*/
- if (bdev->bd_holder == holder)
+ if (bdev->bd_holder == holder) {
+ if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
+ return false;
return true;
+ }
return false;
}
@@ -499,6 +505,7 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
* bd_prepare_to_claim - claim a block device
* @bdev: block device of interest
* @holder: holder trying to claim @bdev
+ * @hops: holder ops.
*
* Claim @bdev. This function fails if @bdev is already claimed by another
* holder and waits if another claiming is in progress. return, the caller
@@ -507,7 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
* RETURNS:
* 0 if @bdev can be claimed, -EBUSY otherwise.
*/
-int bd_prepare_to_claim(struct block_device *bdev, void *holder)
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+ const struct blk_holder_ops *hops)
{
struct block_device *whole = bdev_whole(bdev);
@@ -516,7 +524,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
retry:
mutex_lock(&bdev_lock);
/* if someone else claimed, fail */
- if (!bd_may_claim(bdev, holder)) {
+ if (!bd_may_claim(bdev, holder, hops)) {
mutex_unlock(&bdev_lock);
return -EBUSY;
}
@@ -557,12 +565,13 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
* Finish exclusive open of a block device. Mark the device as exlusively
* open by the holder and wake up all waiters for exclusive open to finish.
*/
-static void bd_finish_claiming(struct block_device *bdev, void *holder)
+static void bd_finish_claiming(struct block_device *bdev, void *holder,
+ const struct blk_holder_ops *hops)
{
struct block_device *whole = bdev_whole(bdev);
mutex_lock(&bdev_lock);
- BUG_ON(!bd_may_claim(bdev, holder));
+ BUG_ON(!bd_may_claim(bdev, holder, hops));
/*
* Note that for a whole device bd_holders will be incremented twice,
* and bd_holder will be set to bd_may_claim before being set to holder
@@ -570,7 +579,10 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
whole->bd_holders++;
whole->bd_holder = bd_may_claim;
bdev->bd_holders++;
+ mutex_lock(&bdev->bd_holder_lock);
bdev->bd_holder = holder;
+ bdev->bd_holder_ops = hops;
+ mutex_unlock(&bdev->bd_holder_lock);
bd_clear_claiming(whole, holder);
mutex_unlock(&bdev_lock);
}
@@ -605,7 +617,10 @@ static void bd_end_claim(struct block_device *bdev)
WARN_ON_ONCE(--bdev->bd_holders < 0);
WARN_ON_ONCE(--whole->bd_holders < 0);
if (!bdev->bd_holders) {
+ mutex_lock(&bdev->bd_holder_lock);
bdev->bd_holder = NULL;
+ bdev->bd_holder_ops = NULL;
+ mutex_unlock(&bdev->bd_holder_lock);
if (bdev->bd_write_holder)
unblock = true;
}
@@ -735,6 +750,7 @@ void blkdev_put_no_open(struct block_device *bdev)
* @dev: device number of block device to open
* @mode: FMODE_* mask
* @holder: exclusive holder identifier
+ * @hops: holder operations
*
* Open the block device described by device number @dev. If @mode includes
* %FMODE_EXCL, the block device is opened with exclusive access. Specifying
@@ -751,7 +767,8 @@ void blkdev_put_no_open(struct block_device *bdev)
* RETURNS:
* Reference to the block_device on success, ERR_PTR(-errno) on failure.
*/
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+ const struct blk_holder_ops *hops)
{
bool unblock_events = true;
struct block_device *bdev;
@@ -771,7 +788,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
disk = bdev->bd_disk;
if (mode & FMODE_EXCL) {
- ret = bd_prepare_to_claim(bdev, holder);
+ ret = bd_prepare_to_claim(bdev, holder, hops);
if (ret)
goto put_blkdev;
}
@@ -791,7 +808,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
if (ret)
goto put_module;
if (mode & FMODE_EXCL) {
- bd_finish_claiming(bdev, holder);
+ bd_finish_claiming(bdev, holder, hops);
/*
* Block event polling for write claims if requested. Any write
@@ -842,7 +859,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
* Reference to the block_device on success, ERR_PTR(-errno) on failure.
*/
struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
- void *holder)
+ void *holder, const struct blk_holder_ops *hops)
{
struct block_device *bdev;
dev_t dev;
@@ -852,7 +869,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
if (error)
return ERR_PTR(error);
- bdev = blkdev_get_by_dev(dev, mode, holder);
+ bdev = blkdev_get_by_dev(dev, mode, holder, hops);
if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
blkdev_put(bdev, mode);
return ERR_PTR(-EACCES);
diff --git a/block/fops.c b/block/fops.c
index d2e6be4e3d1c7d..2ac5ea878fa4cc 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -490,7 +490,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
if ((filp->f_flags & O_ACCMODE) == 3)
filp->f_mode |= FMODE_WRITE_IOCTL;
- bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
+ bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp, NULL);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
diff --git a/block/genhd.c b/block/genhd.c
index bd4c4eca31363e..226ddb8329f751 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -370,13 +370,15 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
* scanners.
*/
if (!(mode & FMODE_EXCL)) {
- ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
+ ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
+ NULL);
if (ret)
return ret;
}
set_bit(GD_NEED_PART_SCAN, &disk->state);
- bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
+ bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL,
+ NULL);
if (IS_ERR(bdev))
ret = PTR_ERR(bdev);
else
diff --git a/block/ioctl.c b/block/ioctl.c
index 9c5f637ff153f8..c7d7d4345edb4f 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -454,7 +454,8 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
if (mode & FMODE_EXCL)
return set_blocksize(bdev, n);
- if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
+ if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev,
+ NULL)))
return -EBUSY;
ret = set_blocksize(bdev, n);
blkdev_put(bdev, mode | FMODE_EXCL);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1a5d3d72d91d27..cab59dab3410aa 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1641,7 +1641,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
int err = 0;
bdev = blkdev_get_by_path(bdev_path,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
+ FMODE_READ | FMODE_WRITE | FMODE_EXCL,
+ claim_ptr, NULL);
if (IS_ERR(bdev)) {
drbd_err(device, "open(\"%s\") failed with %ld\n",
bdev_path, PTR_ERR(bdev));
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bc31bb7072a2cb..a73c857f5bfed0 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1015,7 +1015,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
* here to avoid changing device under exclusive owner.
*/
if (!(mode & FMODE_EXCL)) {
- error = bd_prepare_to_claim(bdev, loop_configure);
+ error = bd_prepare_to_claim(bdev, loop_configure, NULL);
if (error)
goto out_putf;
}
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index d5d7884cedd477..377f8b34535294 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2125,7 +2125,8 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
* to read/write from/to it. It is already opened in O_NONBLOCK mode
* so open should not fail.
*/
- bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
+ bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd,
+ NULL);
if (IS_ERR(bdev)) {
ret = PTR_ERR(bdev);
goto out;
@@ -2530,7 +2531,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
}
}
- bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
+ bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL, NULL);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
sdev = scsi_device_from_queue(bdev->bd_disk->queue);
diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
index 2cfed2e58d646f..cec22bbae2f9a5 100644
--- a/drivers/block/rnbd/rnbd-srv.c
+++ b/drivers/block/rnbd/rnbd-srv.c
@@ -719,7 +719,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
goto reject;
}
- bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
+ bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE, NULL);
if (IS_ERR(bdev)) {
ret = PTR_ERR(bdev);
pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4807af1d580593..43b36da9b3544d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
vbd->pdevice = MKDEV(major, minor);
bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
- FMODE_READ : FMODE_WRITE, NULL);
+ FMODE_READ : FMODE_WRITE, NULL, NULL);
if (IS_ERR(bdev)) {
pr_warn("xen_vbd_create: device %08x could not be opened\n",
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f6d90f1ba5cf7b..ef9dc4ef6796da 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -508,7 +508,7 @@ static ssize_t backing_dev_store(struct device *dev,
}
bdev = blkdev_get_by_dev(inode->i_rdev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
+ FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram, NULL);
if (IS_ERR(bdev)) {
err = PTR_ERR(bdev);
bdev = NULL;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 7e9d19fd21ddd5..d84c09a73af803 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2560,7 +2560,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
err = "failed to open device";
bdev = blkdev_get_by_path(strim(path),
FMODE_READ|FMODE_WRITE|FMODE_EXCL,
- sb);
+ sb, NULL);
if (IS_ERR(bdev)) {
if (bdev == ERR_PTR(-EBUSY)) {
dev_t dev;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3b694ba3a106e6..d759f8bdb3df2f 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
return ERR_PTR(-ENOMEM);
refcount_set(&td->count, 1);
- bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
+ bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr, NULL);
if (IS_ERR(bdev)) {
r = PTR_ERR(bdev);
goto out_free_td;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b34446f..60ab5c4bee77c5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3642,7 +3642,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
rdev->bdev = blkdev_get_by_dev(newdev,
FMODE_READ | FMODE_WRITE | FMODE_EXCL,
- super_format == -2 ? &claim_rdev : rdev);
+ super_format == -2 ? &claim_rdev : rdev, NULL);
if (IS_ERR(rdev->bdev)) {
pr_warn("md: could not open device unknown-block(%u,%u).\n",
MAJOR(newdev), MINOR(newdev));
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 4cd37ec45762b6..7ac82c6fe35024 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -235,7 +235,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
return NULL;
/* Get a handle on the device */
- bdev = blkdev_get_by_path(devname, mode, dev);
+ bdev = blkdev_get_by_path(devname, mode, dev, NULL);
#ifndef MODULE
/*
@@ -257,7 +257,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
devt = name_to_dev_t(devname);
if (!devt)
continue;
- bdev = blkdev_get_by_dev(devt, mode, dev);
+ bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
}
#endif
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index c2d6cea0236b0a..9b6d6d85c72544 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
return -ENOTBLK;
ns->bdev = blkdev_get_by_path(ns->device_path,
- FMODE_READ | FMODE_WRITE, NULL);
+ FMODE_READ | FMODE_WRITE, NULL, NULL);
if (IS_ERR(ns->bdev)) {
ret = PTR_ERR(ns->bdev);
if (ret != -ENOTBLK) {
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 998a961e170417..f21198bc483e1a 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -130,7 +130,7 @@ int dasd_scan_partitions(struct dasd_block *block)
struct block_device *bdev;
int rc;
- bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
+ bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL, NULL);
if (IS_ERR(bdev)) {
DBF_DEV_EVENT(DBF_ERR, block->base,
"scan partitions error, blkdev_get returned %ld",
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index cc838ffd129472..a5cbbefa78ee4e 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -114,7 +114,7 @@ static int iblock_configure_device(struct se_device *dev)
else
dev->dev_flags |= DF_READ_ONLY;
- bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
+ bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
if (IS_ERR(bd)) {
ret = PTR_ERR(bd);
goto out_free_bioset;
diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
index e7425549e39c73..e3494e036c6c85 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -367,7 +367,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
* for TYPE_DISK and TYPE_ZBC using supplied udev_path
*/
bd = blkdev_get_by_path(dev->udev_path,
- FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
+ FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv,
+ NULL);
if (IS_ERR(bd)) {
pr_err("pSCSI: blkdev_get_by_path() failed\n");
scsi_device_put(sd);
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 78696d331639bd..4de4984fa99ba3 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -258,7 +258,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
}
bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
- fs_info->bdev_holder);
+ fs_info->bdev_holder, NULL);
if (IS_ERR(bdev)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
return PTR_ERR(bdev);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 841e799dece51b..784ccc8f6c69c1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -496,7 +496,7 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
{
int ret;
- *bdev = blkdev_get_by_path(device_path, flags, holder);
+ *bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
if (IS_ERR(*bdev)) {
ret = PTR_ERR(*bdev);
@@ -1377,7 +1377,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
* values temporarily, as the device paths of the fsid are the only
* required information for assembling the volume.
*/
- bdev = blkdev_get_by_path(path, flags, holder);
+ bdev = blkdev_get_by_path(path, flags, holder, NULL);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
@@ -2629,7 +2629,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
return -EROFS;
bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
- fs_info->bdev_holder);
+ fs_info->bdev_holder, NULL);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 811ab66d805ede..6c263e9cd38b2a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -254,7 +254,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
dif->fscache = fscache;
} else if (!sbi->devs->flatdev) {
bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
- sb->s_type);
+ sb->s_type, NULL);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
dif->bdev = bdev;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 9680fe753e599a..865625089ecca3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1103,7 +1103,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
{
struct block_device *bdev;
- bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
+ bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
+ NULL);
if (IS_ERR(bdev))
goto fail;
return bdev;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 9f15b03037dba9..7c34ab082f1382 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4025,7 +4025,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
/* Single zoned block device mount */
FDEV(0).bdev =
blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
- sbi->sb->s_mode, sbi->sb->s_type);
+ sbi->sb->s_mode, sbi->sb->s_type, NULL);
} else {
/* Multi-device mount */
memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
@@ -4044,7 +4044,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
sbi->log_blocks_per_seg) - 1;
}
FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
- sbi->sb->s_mode, sbi->sb->s_type);
+ sbi->sb->s_mode, sbi->sb->s_type, NULL);
}
if (IS_ERR(FDEV(i).bdev))
return PTR_ERR(FDEV(i).bdev);
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 695415cbfe985b..8c55030c57ed52 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1101,7 +1101,7 @@ int lmLogOpen(struct super_block *sb)
*/
bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
- log);
+ log, NULL);
if (IS_ERR(bdev)) {
rc = PTR_ERR(bdev);
goto free;
diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
index fea5f8821da5ef..38b066ca699ed7 100644
--- a/fs/nfs/blocklayout/dev.c
+++ b/fs/nfs/blocklayout/dev.c
@@ -243,7 +243,7 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
if (!dev)
return -EIO;
- bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
+ bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL, NULL);
if (IS_ERR(bdev)) {
printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
@@ -312,7 +312,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
if (!devname)
return ERR_PTR(-ENOMEM);
- bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
+ bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL,
+ NULL);
if (IS_ERR(bdev)) {
pr_warn("pNFS: failed to open device %s (%ld)\n",
devname, PTR_ERR(bdev));
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 77f1e5778d1c84..91bfbd973d1d53 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1285,7 +1285,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
- sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+ sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
if (IS_ERR(sd.bdev))
return ERR_CAST(sd.bdev);
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 60b97c92e2b25e..6b13b8c3f2b8af 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1786,7 +1786,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
goto out2;
reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
- FMODE_WRITE | FMODE_READ, NULL);
+ FMODE_WRITE | FMODE_READ, NULL, NULL);
if (IS_ERR(reg->hr_bdev)) {
ret = PTR_ERR(reg->hr_bdev);
reg->hr_bdev = NULL;
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 4d11d60f493c14..5e4db9a0c8e5a3 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2616,7 +2616,7 @@ static int journal_init_dev(struct super_block *super,
if (jdev == super->s_dev)
blkdev_mode &= ~FMODE_EXCL;
journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
- journal);
+ journal, NULL);
journal->j_dev_mode = blkdev_mode;
if (IS_ERR(journal->j_dev_bd)) {
result = PTR_ERR(journal->j_dev_bd);
@@ -2632,7 +2632,8 @@ static int journal_init_dev(struct super_block *super,
}
journal->j_dev_mode = blkdev_mode;
- journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
+ journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal,
+ NULL);
if (IS_ERR(journal->j_dev_bd)) {
result = PTR_ERR(journal->j_dev_bd);
journal->j_dev_bd = NULL;
diff --git a/fs/super.c b/fs/super.c
index 34afe411cf2bc3..012ce140080375 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1248,7 +1248,7 @@ int get_tree_bdev(struct fs_context *fc,
if (!fc->source)
return invalf(fc, "No source specified");
- bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
+ bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
if (IS_ERR(bdev)) {
errorf(fc, "%s: Can't open blockdev", fc->source);
return PTR_ERR(bdev);
@@ -1333,7 +1333,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
- bdev = blkdev_get_by_path(dev_name, mode, fs_type);
+ bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 7e706255f16502..5684c538eb76dc 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -386,7 +386,7 @@ xfs_blkdev_get(
int error = 0;
*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
- mp);
+ mp, NULL);
if (IS_ERR(*bdevp)) {
error = PTR_ERR(*bdevp);
xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 740afe80f29786..84a931caef514e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -55,6 +55,8 @@ struct block_device {
struct super_block * bd_super;
void * bd_claiming;
void * bd_holder;
+ const struct blk_holder_ops *bd_holder_ops;
+ struct mutex bd_holder_lock;
/* The counter of freeze processes */
int bd_fsfreeze_count;
int bd_holders;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b441e633f4dd49..c94f3b63c86422 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1465,10 +1465,15 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
#define BLKDEV_MAJOR_MAX 0
#endif
+struct blk_holder_ops {
+};
+
+struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
+ const struct blk_holder_ops *hops);
struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
- void *holder);
-struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
-int bd_prepare_to_claim(struct block_device *bdev, void *holder);
+ void *holder, const struct blk_holder_ops *hops);
+int bd_prepare_to_claim(struct block_device *bdev, void *holder,
+ const struct blk_holder_ops *hops);
void bd_abort_claiming(struct block_device *bdev, void *holder);
void blkdev_put(struct block_device *bdev, fmode_t mode);
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 92e41ed292ada8..801c411530d11c 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -357,7 +357,7 @@ static int swsusp_swap_check(void)
root_swap = res;
hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
- NULL);
+ NULL, NULL);
if (IS_ERR(hib_resume_bdev))
return PTR_ERR(hib_resume_bdev);
@@ -1524,7 +1524,7 @@ int swsusp_check(void)
mode |= FMODE_EXCL;
hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
- mode, &holder);
+ mode, &holder, NULL);
if (!IS_ERR(hib_resume_bdev)) {
set_blocksize(hib_resume_bdev, PAGE_SIZE);
clear_page(swsusp_header);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 274bbf79748006..cfbcf7d5705f5f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2770,7 +2770,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
if (S_ISBLK(inode->i_mode)) {
p->bdev = blkdev_get_by_dev(inode->i_rdev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
+ FMODE_READ | FMODE_WRITE | FMODE_EXCL, p,
+ NULL);
if (IS_ERR(p->bdev)) {
error = PTR_ERR(p->bdev);
p->bdev = NULL;
--
2.39.2
* [PATCH 10/13] block: add a mark_dead holder operation
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (8 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-30 13:05 ` Jan Kara
2023-05-18 4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
` (3 subsequent siblings)
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
to notify the holder that the block device it is using has been marked dead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
---
block/genhd.c | 24 ++++++++++++++++++++++++
include/linux/blkdev.h | 1 +
2 files changed, 25 insertions(+)
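
For readers who want to see the notification flow in isolation, here is a
minimal userspace sketch of what blk_report_disk_dead does. It is a model under
assumptions, not kernel code: every model_* name is hypothetical, a plain array
stands in for disk->part_tbl, and a boolean stands in for bd_holder_lock. RCU
and the kobject reference are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bdev;

/* Same shape as struct blk_holder_ops after this patch (hypothetical copy). */
struct model_holder_ops {
	void (*mark_dead)(struct model_bdev *bdev);
};

struct model_bdev {
	void *holder;				/* opaque holder cookie */
	const struct model_holder_ops *holder_ops;	/* NULL if none registered */
	bool holder_lock_held;			/* stand-in for bd_holder_lock */
};

/*
 * Walk every device of a "disk" and notify holders that registered a
 * mark_dead callback.  Returns the number of callbacks invoked.
 */
static int model_report_disk_dead(struct model_bdev *parts, size_t nr)
{
	int notified = 0;

	for (size_t i = 0; i < nr; i++) {
		struct model_bdev *bdev = &parts[i];

		bdev->holder_lock_held = true;
		if (bdev->holder_ops && bdev->holder_ops->mark_dead) {
			bdev->holder_ops->mark_dead(bdev);
			notified++;
		}
		bdev->holder_lock_held = false;
	}
	return notified;
}

/* Example holder that only records that its device died. */
struct model_holder {
	bool dead;
};

static void model_holder_mark_dead(struct model_bdev *bdev)
{
	struct model_holder *h = bdev->holder;

	/* The callback runs with the (modelled) holder lock held. */
	assert(bdev->holder_lock_held);
	h->dead = true;
}

static const struct model_holder_ops model_ops = {
	.mark_dead = model_holder_mark_dead,
};
```

Devices without holder ops, or with ops that leave mark_dead unset, are
skipped, which is why all the callers converted to pass NULL in the previous
patch keep their current behaviour.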
diff --git a/block/genhd.c b/block/genhd.c
index 226ddb8329f751..42aebf0e1e2628 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -565,6 +565,28 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
}
EXPORT_SYMBOL(device_add_disk);
+static void blk_report_disk_dead(struct gendisk *disk)
+{
+ struct block_device *bdev;
+ unsigned long idx;
+
+ rcu_read_lock();
+ xa_for_each(&disk->part_tbl, idx, bdev) {
+ if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
+ continue;
+ rcu_read_unlock();
+
+ mutex_lock(&bdev->bd_holder_lock);
+ if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
+ bdev->bd_holder_ops->mark_dead(bdev);
+ mutex_unlock(&bdev->bd_holder_lock);
+
+ put_device(&bdev->bd_device);
+ rcu_read_lock();
+ }
+ rcu_read_unlock();
+}
+
/**
* blk_mark_disk_dead - mark a disk as dead
* @disk: disk to mark as dead
@@ -592,6 +614,8 @@ void blk_mark_disk_dead(struct gendisk *disk)
* Prevent new I/O from crossing bio_queue_enter().
*/
blk_queue_start_drain(disk->queue);
+
+ blk_report_disk_dead(disk);
}
EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c94f3b63c86422..41f894f6355f96 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1466,6 +1466,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
#endif
struct blk_holder_ops {
+ void (*mark_dead)(struct block_device *bdev);
};
struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
--
2.39.2
* [PATCH 11/13] fs: add a method to shut down the file system
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (9 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
` (2 subsequent siblings)
13 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Add a new ->shutdown super operation that can be used to tell the file
system to shut down, and call it from newly created holder ops when the
block device under a file system shuts down.
This only covers the main block device for "simple" file systems using
get_tree_bdev / mount_bdev. File systems with their own get_tree method
or that open additional devices will need to set up their own
blk_holder_ops.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/super.c | 21 +++++++++++++++++++--
include/linux/fs.h | 1 +
2 files changed, 20 insertions(+), 2 deletions(-)
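
The dispatch in fs_mark_dead can be pictured with the following userspace
sketch. It is only an illustration under assumptions: the toy_* types are
hypothetical, the sb pointer in toy_bdev stands in for the get_super() lookup,
and the locking done by get_super/drop_super is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sb;

struct toy_super_ops {
	void (*shutdown)(struct toy_sb *sb);	/* optional, may be NULL */
};

struct toy_sb {
	const struct toy_super_ops *s_op;
	int shutdown_calls;
};

struct toy_bdev {
	struct toy_sb *sb;	/* what get_super() would find, or NULL */
};

/*
 * Model of the mark_dead holder op installed for simple file systems:
 * find the super block for the device and forward to ->shutdown if the
 * file system implements it.  Returns true if ->shutdown was called.
 */
static bool toy_fs_mark_dead(struct toy_bdev *bdev)
{
	struct toy_sb *sb = bdev->sb;

	if (!sb)
		return false;		/* nothing mounted on this device */
	if (!sb->s_op->shutdown)
		return false;		/* file system did not opt in */
	sb->s_op->shutdown(sb);
	return true;
}

static void toy_shutdown(struct toy_sb *sb)
{
	sb->shutdown_calls++;
}

static const struct toy_super_ops toy_ops_with_shutdown = {
	.shutdown = toy_shutdown,
};

static const struct toy_super_ops toy_ops_without_shutdown = {
	.shutdown = NULL,
};
```

A file system that does not set ->shutdown sees no change in behaviour, so the
method is opt-in per file system.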
diff --git a/fs/super.c b/fs/super.c
index 012ce140080375..f127589700ab25 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1206,6 +1206,22 @@ int get_tree_keyed(struct fs_context *fc,
EXPORT_SYMBOL(get_tree_keyed);
#ifdef CONFIG_BLOCK
+static void fs_mark_dead(struct block_device *bdev)
+{
+ struct super_block *sb;
+
+ sb = get_super(bdev);
+ if (!sb)
+ return;
+
+ if (sb->s_op->shutdown)
+ sb->s_op->shutdown(sb);
+ drop_super(sb);
+}
+
+static const struct blk_holder_ops fs_holder_ops = {
+ .mark_dead = fs_mark_dead,
+};
static int set_bdev_super(struct super_block *s, void *data)
{
@@ -1248,7 +1264,8 @@ int get_tree_bdev(struct fs_context *fc,
if (!fc->source)
return invalf(fc, "No source specified");
- bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
+ bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type,
+ &fs_holder_ops);
if (IS_ERR(bdev)) {
errorf(fc, "%s: Can't open blockdev", fc->source);
return PTR_ERR(bdev);
@@ -1333,7 +1350,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
if (!(flags & SB_RDONLY))
mode |= FMODE_WRITE;
- bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
+ bdev = blkdev_get_by_path(dev_name, mode, fs_type, &fs_holder_ops);
if (IS_ERR(bdev))
return ERR_CAST(bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 21a98168085641..cf3042641b9b30 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1932,6 +1932,7 @@ struct super_operations {
struct shrink_control *);
long (*free_cached_objects)(struct super_block *,
struct shrink_control *);
+ void (*shutdown)(struct super_block *sb);
};
/*
--
2.39.2
* [PATCH 12/13] xfs: wire up sops->shutdown
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (10 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 5:03 ` Darrick J. Wong
2023-05-18 4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
2023-05-19 2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Wire up the shutdown method to shut down the file system when the
underlying block device is marked dead. Add a new message to
clearly distinguish this shutdown reason from other shutdowns.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/xfs/xfs_fsops.c | 3 +++
fs/xfs/xfs_mount.h | 4 +++-
fs/xfs/xfs_super.c | 8 ++++++++
3 files changed, 14 insertions(+), 1 deletion(-)
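
To show where the new reason lands in the message selection, here is a small
userspace sketch of the tail of the if/else chain in xfs_do_force_shutdown. The
flag values and strings are copied from the hunks in this patch; the
sketch_* names are hypothetical, and the earlier branches of the real chain are
omitted because they are not visible in the diff context.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values copied from the xfs_mount.h hunk in this patch. */
#define SKETCH_CORRUPT_ONDISK	(1u << 4)
#define SKETCH_DEVICE_REMOVED	(1u << 5)

/*
 * Pick the human-readable shutdown reason.  Only the branches visible in
 * the xfs_fsops.c hunk are modelled here.
 */
static const char *sketch_shutdown_reason(uint32_t flags)
{
	if (flags & SKETCH_CORRUPT_ONDISK)
		return "Corruption of on-disk metadata";
	if (flags & SKETCH_DEVICE_REMOVED)
		return "Block device removal";
	return "Metadata I/O Error";	/* fallback branch */
}
```

Because the new check sits after the corruption checks, a shutdown that
carries both a corruption flag and SHUTDOWN_DEVICE_REMOVED is still reported as
corruption.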
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 13851c0d640bc8..9ebb8333a30800 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -534,6 +534,9 @@ xfs_do_force_shutdown(
} else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
tag = XFS_PTAG_SHUTDOWN_CORRUPT;
why = "Corruption of on-disk metadata";
+ } else if (flags & SHUTDOWN_DEVICE_REMOVED) {
+ tag = XFS_PTAG_SHUTDOWN_IOERROR;
+ why = "Block device removal";
} else {
tag = XFS_PTAG_SHUTDOWN_IOERROR;
why = "Metadata I/O Error";
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index aaaf5ec13492d2..429a5e12c1036e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -457,12 +457,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
#define SHUTDOWN_FORCE_UMOUNT (1u << 2) /* shutdown from a forced unmount */
#define SHUTDOWN_CORRUPT_INCORE (1u << 3) /* corrupt in-memory structures */
#define SHUTDOWN_CORRUPT_ONDISK (1u << 4) /* corrupt metadata on device */
+#define SHUTDOWN_DEVICE_REMOVED (1u << 5) /* device removed underneath us */
#define XFS_SHUTDOWN_STRINGS \
{ SHUTDOWN_META_IO_ERROR, "metadata_io" }, \
{ SHUTDOWN_LOG_IO_ERROR, "log_io" }, \
{ SHUTDOWN_FORCE_UMOUNT, "force_umount" }, \
- { SHUTDOWN_CORRUPT_INCORE, "corruption" }
+ { SHUTDOWN_CORRUPT_INCORE, "corruption" }, \
+ { SHUTDOWN_DEVICE_REMOVED, "device_removed" }
/*
* Flags for xfs_mountfs
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5684c538eb76dc..eb469b8f9a0497 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1159,6 +1159,13 @@ xfs_fs_free_cached_objects(
return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
}
+static void
+xfs_fs_shutdown(
+ struct super_block *sb)
+{
+ xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
+}
+
static const struct super_operations xfs_super_operations = {
.alloc_inode = xfs_fs_alloc_inode,
.destroy_inode = xfs_fs_destroy_inode,
@@ -1172,6 +1179,7 @@ static const struct super_operations xfs_super_operations = {
.show_options = xfs_fs_show_options,
.nr_cached_objects = xfs_fs_nr_cached_objects,
.free_cached_objects = xfs_fs_free_cached_objects,
+ .shutdown = xfs_fs_shutdown,
};
static int
--
2.39.2
* [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (11 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
@ 2023-05-18 4:23 ` Christoph Hellwig
2023-05-18 5:40 ` Dave Chinner
2023-05-19 2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
13 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-18 4:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Al Viro, Christian Brauner, Darrick J. Wong, Jan Kara,
linux-block, linux-fsdevel, linux-xfs
Implement a set of holder_ops that shut down the file system when the
block device used as log or RT device is removed underneath the file
system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
fs/xfs/xfs_super.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
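
The key point of this patch is that the holder pointer passed at open time is
what the callback later gets back through bdev->bd_holder. A minimal userspace
sketch of that round trip, assuming hypothetical mini_* types in place of
struct block_device and struct xfs_mount, and with no locking:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINI_DEVICE_REMOVED	(1u << 5)	/* value of SHUTDOWN_DEVICE_REMOVED */

struct mini_bdev;

struct mini_holder_ops {
	void (*mark_dead)(struct mini_bdev *bdev);
};

struct mini_bdev {
	void *holder;
	const struct mini_holder_ops *holder_ops;
};

struct mini_mount {
	uint32_t shutdown_flags;
};

/* Stand-in for an exclusive open: record the holder and its ops. */
static void mini_claim(struct mini_bdev *bdev, void *holder,
		       const struct mini_holder_ops *hops)
{
	assert(!bdev->holder);	/* exclusive: must not already be claimed */
	bdev->holder = holder;
	bdev->holder_ops = hops;
}

/* The holder op: the holder cookie is the mount, so shut that down. */
static void mini_mark_dead(struct mini_bdev *bdev)
{
	struct mini_mount *mp = bdev->holder;

	mp->shutdown_flags |= MINI_DEVICE_REMOVED;
}

static const struct mini_holder_ops mini_ops = {
	.mark_dead = mini_mark_dead,
};

/* What the block layer side does when the device goes away. */
static void mini_device_removed(struct mini_bdev *bdev)
{
	if (bdev->holder_ops && bdev->holder_ops->mark_dead)
		bdev->holder_ops->mark_dead(bdev);
}
```

One ops table serves both the log and the RT device because both are claimed
with the same mount as holder, so losing either one shuts down the same file
system.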
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index eb469b8f9a0497..75d37bbc5415fc 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -377,6 +377,17 @@ xfs_setup_dax_always(
return 0;
}
+static void
+xfs_hop_mark_dead(
+ struct block_device *bdev)
+{
+ xfs_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
+}
+
+static const struct blk_holder_ops xfs_holder_ops = {
+ .mark_dead = xfs_hop_mark_dead,
+};
+
STATIC int
xfs_blkdev_get(
xfs_mount_t *mp,
@@ -386,7 +397,7 @@ xfs_blkdev_get(
int error = 0;
*bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
- mp, NULL);
+ mp, &xfs_holder_ops);
if (IS_ERR(*bdevp)) {
error = PTR_ERR(*bdevp);
xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
--
2.39.2
* Re: [PATCH 12/13] xfs: wire up sops->shutdown
2023-05-18 4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
@ 2023-05-18 5:03 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2023-05-18 5:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Jan Kara, linux-block,
linux-fsdevel, linux-xfs
On Thu, May 18, 2023 at 06:23:21AM +0200, Christoph Hellwig wrote:
> Wire up the shutdown method to shut down the file system when the
> underlying block device is marked dead. Add a new message to
> clearly distinguish this shutdown reason from other shutdowns.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_fsops.c | 3 +++
> fs/xfs/xfs_mount.h | 4 +++-
> fs/xfs/xfs_super.c | 8 ++++++++
> 3 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 13851c0d640bc8..9ebb8333a30800 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -534,6 +534,9 @@ xfs_do_force_shutdown(
> } else if (flags & SHUTDOWN_CORRUPT_ONDISK) {
> tag = XFS_PTAG_SHUTDOWN_CORRUPT;
> why = "Corruption of on-disk metadata";
> + } else if (flags & SHUTDOWN_DEVICE_REMOVED) {
> + tag = XFS_PTAG_SHUTDOWN_IOERROR;
> + why = "Block device removal";
> } else {
> tag = XFS_PTAG_SHUTDOWN_IOERROR;
> why = "Metadata I/O Error";
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index aaaf5ec13492d2..429a5e12c1036e 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -457,12 +457,14 @@ void xfs_do_force_shutdown(struct xfs_mount *mp, uint32_t flags, char *fname,
> #define SHUTDOWN_FORCE_UMOUNT (1u << 2) /* shutdown from a forced unmount */
> #define SHUTDOWN_CORRUPT_INCORE (1u << 3) /* corrupt in-memory structures */
> #define SHUTDOWN_CORRUPT_ONDISK (1u << 4) /* corrupt metadata on device */
> +#define SHUTDOWN_DEVICE_REMOVED (1u << 5) /* device removed underneath us */
>
> #define XFS_SHUTDOWN_STRINGS \
> { SHUTDOWN_META_IO_ERROR, "metadata_io" }, \
> { SHUTDOWN_LOG_IO_ERROR, "log_io" }, \
> { SHUTDOWN_FORCE_UMOUNT, "force_umount" }, \
> - { SHUTDOWN_CORRUPT_INCORE, "corruption" }
> + { SHUTDOWN_CORRUPT_INCORE, "corruption" }, \
> + { SHUTDOWN_DEVICE_REMOVED, "device_removed" }
>
> /*
> * Flags for xfs_mountfs
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 5684c538eb76dc..eb469b8f9a0497 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1159,6 +1159,13 @@ xfs_fs_free_cached_objects(
> return xfs_reclaim_inodes_nr(XFS_M(sb), sc->nr_to_scan);
> }
>
> +static void
> +xfs_fs_shutdown(
> + struct super_block *sb)
> +{
> + xfs_force_shutdown(XFS_M(sb), SHUTDOWN_DEVICE_REMOVED);
> +}
> +
> static const struct super_operations xfs_super_operations = {
> .alloc_inode = xfs_fs_alloc_inode,
> .destroy_inode = xfs_fs_destroy_inode,
> @@ -1172,6 +1179,7 @@ static const struct super_operations xfs_super_operations = {
> .show_options = xfs_fs_show_options,
> .nr_cached_objects = xfs_fs_nr_cached_objects,
> .free_cached_objects = xfs_fs_free_cached_objects,
> + .shutdown = xfs_fs_shutdown,
> };
>
> static int
> --
> 2.39.2
>
* Re: [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices
2023-05-18 4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
@ 2023-05-18 5:40 ` Dave Chinner
0 siblings, 0 replies; 28+ messages in thread
From: Dave Chinner @ 2023-05-18 5:40 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu, May 18, 2023 at 06:23:22AM +0200, Christoph Hellwig wrote:
> Implement a set of holder_ops that shut down the file system when the
> block device used as log or RT device is removed underneath the file
> system.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
> fs/xfs/xfs_super.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index eb469b8f9a0497..75d37bbc5415fc 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -377,6 +377,17 @@ xfs_setup_dax_always(
> return 0;
> }
>
> +static void
> +xfs_hop_mark_dead(
> + struct block_device *bdev)
I'd prefer these ops to be named "xfs_bdev_...." to indicate they are
fs bdev methods, similar to how the super ops use "xfs_fs_...."
to indicate they are fs superblock methods....
Other than this, this is fine.
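A minimal userspace sketch of the callback with the suggested "xfs_bdev_" prefix follows. It assumes stripped-down stand-ins for struct block_device and struct blk_holder_ops that carry only the fields this patch touches; fake_mount, fake_force_shutdown and fake_notify_removed are hypothetical helpers for illustration, not kernel functions, and the bd_holder_lock locking from patch 09 is left out.

```c
#include <stddef.h>

#define SHUTDOWN_DEVICE_REMOVED	(1u << 5)	/* value taken from patch 12 */

struct block_device;

/* Reduced stand-in: only the ->mark_dead callback used by this patch. */
struct blk_holder_ops {
	void (*mark_dead)(struct block_device *bdev);
};

/* Reduced stand-in: only the holder fields introduced by patch 09. */
struct block_device {
	void *bd_holder;
	const struct blk_holder_ops *bd_holder_ops;
};

/* Hypothetical replacement for struct xfs_mount. */
struct fake_mount {
	unsigned int shutdown_flags;
};

/* Hypothetical replacement for xfs_force_shutdown(): records the reason. */
static void fake_force_shutdown(struct fake_mount *mp, unsigned int flags)
{
	mp->shutdown_flags |= flags;
}

/* The callback from the patch, renamed as suggested in the review. */
static void xfs_bdev_mark_dead(struct block_device *bdev)
{
	fake_force_shutdown(bdev->bd_holder, SHUTDOWN_DEVICE_REMOVED);
}

static const struct blk_holder_ops xfs_holder_ops = {
	.mark_dead	= xfs_bdev_mark_dead,
};

/* Hypothetical model of the block layer notifying the exclusive holder. */
static void fake_notify_removed(struct block_device *bdev)
{
	if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
		bdev->bd_holder_ops->mark_dead(bdev);
}
```

The holder pointer registered at open time is what the callback gets back through bdev->bd_holder, which is why the patch passes mp as the holder together with &xfs_holder_ops.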
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: introduce bdev holder ops and a file system shutdown method v2
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
` (12 preceding siblings ...)
2023-05-18 4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
@ 2023-05-19 2:00 ` Theodore Ts'o
2023-05-19 4:11 ` Christoph Hellwig
13 siblings, 1 reply; 28+ messages in thread
From: Theodore Ts'o @ 2023-05-19 2:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> Hi all,
>
> this series fixes the long standing problem that we never had a good way
> to communicate block device events to the user of the block device.
>
> It fixes this by introducing a new set of holder ops registered at
> blkdev_get_by_* time for the exclusive holder, and then wire that up
> to a shutdown super operation to report the block device remove to the
> file systems.
Thanks for working on this! Is there going to be an fstest which
simulates a device removal while we're running fsstress or some such,
so we can exercise full device removal path?
Thanks,
- Ted
* Re: introduce bdev holder ops and a file system shutdown method v2
2023-05-19 2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
@ 2023-05-19 4:11 ` Christoph Hellwig
2023-05-23 0:58 ` Darrick J. Wong
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-05-19 4:11 UTC (permalink / raw)
To: Theodore Ts'o
Cc: Christoph Hellwig, Jens Axboe, Al Viro, Christian Brauner,
Darrick J. Wong, Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu, May 18, 2023 at 10:00:12PM -0400, Theodore Ts'o wrote:
> On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series fixes the long standing problem that we never had a good way
> > to communicate block device events to the user of the block device.
> >
> > It fixes this by introducing a new set of holder ops registered at
> > blkdev_get_by_* time for the exclusive holder, and then wire that up
> > to a shutdown super operation to report the block device remove to the
> > file systems.
>
> Thanks for working on this! Is there going to be an fstest which
> simulates a device removal while we're running fsstress or some such,
> so we can exercise full device removal path?
So the problem with xfstests is that there isn't really any generic
way to remove a block device, and even less so to put it back.
xfstests has some scsi_debug based tests, maybe I can cook something up
for that. My testing has been with nvme, so another option would be
to add nvme-loop support to xfstests and use that. I'll see what I can
do.
* Re: introduce bdev holder ops and a file system shutdown method v2
2023-05-19 4:11 ` Christoph Hellwig
@ 2023-05-23 0:58 ` Darrick J. Wong
0 siblings, 0 replies; 28+ messages in thread
From: Darrick J. Wong @ 2023-05-23 0:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Theodore Ts'o, Jens Axboe, Al Viro, Christian Brauner,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Fri, May 19, 2023 at 06:11:36AM +0200, Christoph Hellwig wrote:
> On Thu, May 18, 2023 at 10:00:12PM -0400, Theodore Ts'o wrote:
> > On Thu, May 18, 2023 at 06:23:09AM +0200, Christoph Hellwig wrote:
> > > Hi all,
> > >
> > > this series fixes the long standing problem that we never had a good way
> > > to communicate block device events to the user of the block device.
> > >
> > > It fixes this by introducing a new set of holder ops registered at
> > > blkdev_get_by_* time for the exclusive holder, and then wire that up
> > > to a shutdown super operation to report the block device remove to the
> > > file systems.
> >
> > Thanks for working on this! Is there going to be an fstest which
> > simulates a device removal while we're running fsstress or some such,
> > so we can exercise full device removal path?
>
>
> So the problem with xfstests is that there isn't really any generic
> way to remove a block device, and even less so to put it back.
>
> xfstests has some scsi_debug based tests, maybe I can cook something up
> for that. My testing has been with nvme, so another option would be
> to add nvme-loop support to xfstests and use that. I'll see what I can
> do.
Could you make dm-error accept a 'message' telling it to invoke all
these bdev removal actions? There's already a bunch of helpers in
fstests to make that less awful for test authors.
--D
* Re: [PATCH 02/13] block: refactor bd_may_claim
2023-05-18 4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
@ 2023-05-30 11:41 ` Jan Kara
2023-06-01 8:11 ` Christoph Hellwig
0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2023-05-30 11:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:11, Christoph Hellwig wrote:
> The long if/else chain obfuscates the actual logic. Tidy it up to be
> more structured. Also drop the whole argument, as it can be trivially
> derived from bdev using bdev_whole, and having the bdev_whole in the
> function makes it easier to follow.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good to me. Just one nit below, but regardless of how you decide, feel
free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
> +static bool bd_may_claim(struct block_device *bdev, void *holder)
> {
> - if (bdev->bd_holder == holder)
> - return true; /* already a holder */
> - else if (bdev->bd_holder != NULL)
> - return false; /* held by someone else */
> - else if (whole == bdev)
> - return true; /* is a whole device which isn't held */
> -
> - else if (whole->bd_holder == bd_may_claim)
> - return true; /* is a partition of a device that is being partitioned */
> - else if (whole->bd_holder != NULL)
> - return false; /* is a partition of a held device */
> - else
> - return true; /* is a partition of an un-held device */
> + struct block_device *whole = bdev_whole(bdev);
> +
> + if (bdev->bd_holder) {
> + /*
> + * The same holder can always re-claim.
> + */
> + if (bdev->bd_holder == holder)
> + return true;
> + return false;
With this simple condition I'd just do:
/* The same holder can always re-claim. */
return bdev->bd_holder == holder;
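
A small userspace sketch of why the two forms are interchangeable, assuming a reduced stand-in for struct block_device (fake_bdev and both helper names are hypothetical, introduced only for this comparison):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct block_device. */
struct fake_bdev {
	void *bd_holder;	/* current exclusive holder, NULL if unclaimed */
};

/* The "already held" branch as written in the patch. */
static bool held_branch_long(const struct fake_bdev *bdev, const void *holder)
{
	/* The same holder can always re-claim. */
	if (bdev->bd_holder == holder)
		return true;
	return false;
}

/* The same branch with the comparison returned directly. */
static bool held_branch_short(const struct fake_bdev *bdev, const void *holder)
{
	/* The same holder can always re-claim. */
	return bdev->bd_holder == holder;
}
```

Both helpers model only the branch taken when bd_holder is already set; the partition and whole-device checks that follow it in bd_may_claim are not modelled here.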
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk
2023-05-18 4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
@ 2023-05-30 11:58 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 11:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:13, Christoph Hellwig wrote:
> blk_mark_disk_dead does very similar work to a section of del_gendisk:
>
> - set the GD_DEAD flag
> - set the capacity to zero
> - start a queue drain
>
> but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by
> the disk, sets the capacity to zero before starting the drain, and does so
> without sending a uevent or kernel message for this fake capacity change.
>
> Move the exact logic from the more heavily used del_gendisk into
> blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good to me. I've convinced myself the removed notification of device
capacity going to 0 before KOBJ_REMOVE event should not matter to userspace
;). Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
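
For reference, the consolidated ordering described in the commit message can be modelled in userspace roughly as below. This is only a sketch: fake_disk, fake_record, fake_mark_disk_dead and the step labels are hypothetical, and the real function operates on struct gendisk and its request queue.

```c
#include <stdbool.h>

/* Hypothetical step labels, used only to record the order of operations. */
enum fake_step {
	STEP_SET_DEAD,
	STEP_SET_DYING,
	STEP_ZERO_CAPACITY,
	STEP_START_DRAIN,
};

/* Hypothetical stand-in for struct gendisk plus its queue state. */
struct fake_disk {
	bool owns_queue;		/* models GD_OWNS_QUEUE */
	bool dead;			/* models GD_DEAD */
	bool queue_dying;		/* models QUEUE_FLAG_DYING */
	bool draining;			/* models a started queue drain */
	unsigned long long capacity;
	enum fake_step steps[4];
	int nr_steps;
};

static void fake_record(struct fake_disk *disk, enum fake_step step)
{
	disk->steps[disk->nr_steps++] = step;
}

/* Mirrors the order of the patched blk_mark_disk_dead(). */
static void fake_mark_disk_dead(struct fake_disk *disk)
{
	/* Fail any new I/O. */
	disk->dead = true;
	fake_record(disk, STEP_SET_DEAD);
	if (disk->owns_queue) {
		disk->queue_dying = true;
		fake_record(disk, STEP_SET_DYING);
	}

	/* Zero the capacity without a notification, then start the drain. */
	disk->capacity = 0;
	fake_record(disk, STEP_ZERO_CAPACITY);
	disk->draining = true;
	fake_record(disk, STEP_START_DRAIN);
}
```

The point of the model is the ordering: the capacity is zeroed before the drain starts, and the dying flag is set only when the disk owns its queue.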
> ---
> block/genhd.c | 26 ++++++++++++--------------
> 1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 1cb489b927d50a..d8fe40c7d1f0a2 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -572,13 +572,22 @@ EXPORT_SYMBOL(device_add_disk);
> */
> void blk_mark_disk_dead(struct gendisk *disk)
> {
> + /*
> + * Fail any new I/O.
> + */
> set_bit(GD_DEAD, &disk->state);
> - blk_queue_start_drain(disk->queue);
> + if (test_bit(GD_OWNS_QUEUE, &disk->state))
> + blk_queue_flag_set(QUEUE_FLAG_DYING, disk->queue);
>
> /*
> * Stop buffered writers from dirtying pages that can't be written out.
> */
> - set_capacity_and_notify(disk, 0);
> + set_capacity(disk, 0);
> +
> + /*
> + * Prevent new I/O from crossing bio_queue_enter().
> + */
> + blk_queue_start_drain(disk->queue);
> }
> EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>
> @@ -620,18 +629,7 @@ void del_gendisk(struct gendisk *disk)
> fsync_bdev(disk->part0);
> __invalidate_device(disk->part0, true);
>
> - /*
> - * Fail any new I/O.
> - */
> - set_bit(GD_DEAD, &disk->state);
> - if (test_bit(GD_OWNS_QUEUE, &disk->state))
> - blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> - set_capacity(disk, 0);
> -
> - /*
> - * Prevent new I/O from crossing bio_queue_enter().
> - */
> - blk_queue_start_drain(q);
> + blk_mark_disk_dead(disk);
>
> if (!(disk->flags & GENHD_FL_HIDDEN)) {
> sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 06/13] block: unhash the inode earlier in delete_partition
2023-05-18 4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
@ 2023-05-30 12:09 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:15, Christoph Hellwig wrote:
> Move the call to remove_inode_hash to the beginning of delete_partition,
> as we want to prevent opening a block_device that is about to be removed
> ASAP.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
The justification looks a bit bogus because we hold disk->open_mutex in
delete_partition() which serializes with any opens anyway. But it's a
harmless code move so if it helps later then sure... Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/partitions/core.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 49e0496ff23c1e..fa5c707fe0ad2f 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -267,6 +267,12 @@ static void delete_partition(struct block_device *part)
> {
> lockdep_assert_held(&part->bd_disk->open_mutex);
>
> + /*
> + * Remove the block device from the inode hash, so that it cannot be
> + * looked up any more even when openers still hold references.
> + */
> + remove_inode_hash(part->bd_inode);
> +
> fsync_bdev(part);
> __invalidate_device(part, true);
>
> @@ -274,12 +280,6 @@ static void delete_partition(struct block_device *part)
> kobject_put(part->bd_holder_dir);
> device_del(&part->bd_device);
>
> - /*
> - * Remove the block device from the inode hash, so that it cannot be
> - * looked up any more even when openers still hold references.
> - */
> - remove_inode_hash(part->bd_inode);
> -
> put_device(&part->bd_device);
> }
>
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 07/13] block: delete partitions later in del_gendisk
2023-05-18 4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
@ 2023-05-30 12:55 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:16, Christoph Hellwig wrote:
> Delay dropping the block_devices for partitions in del_gendisk until
> after the call to blk_mark_disk_dead, so that we can implement
> notification of removed devices in blk_mark_disk_dead.
>
> This requires splitting a lower-level drop_partition helper out of
> delete_partition and using that from del_gendisk, while having a
> common loop for the whole device and partitions that calls
> remove_inode_hash, fsync_bdev and __invalidate_device before the
> call to blk_mark_disk_dead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/blk.h | 2 +-
> block/genhd.c | 24 +++++++++++++++++++-----
> block/partitions/core.c | 19 ++++++++++++-------
> 3 files changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/block/blk.h b/block/blk.h
> index 45547bcf111938..4363052f90416a 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -409,7 +409,7 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
> int bdev_del_partition(struct gendisk *disk, int partno);
> int bdev_resize_partition(struct gendisk *disk, int partno, sector_t start,
> sector_t length);
> -void blk_drop_partitions(struct gendisk *disk);
> +void drop_partition(struct block_device *part);
>
> void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors);
>
> diff --git a/block/genhd.c b/block/genhd.c
> index a744daeed55318..bd4c4eca31363e 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -615,6 +615,8 @@ EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
> void del_gendisk(struct gendisk *disk)
> {
> struct request_queue *q = disk->queue;
> + struct block_device *part;
> + unsigned long idx;
>
> might_sleep();
>
> @@ -623,16 +625,28 @@ void del_gendisk(struct gendisk *disk)
>
> disk_del_events(disk);
>
> + /*
> + * Prevent new openers by unlinking the bdev inode, and write out
> + * dirty data before marking the disk dead and stopping all I/O.
> + */
> mutex_lock(&disk->open_mutex);
> - remove_inode_hash(disk->part0->bd_inode);
> - blk_drop_partitions(disk);
> + xa_for_each(&disk->part_tbl, idx, part) {
> + remove_inode_hash(part->bd_inode);
> + fsync_bdev(part);
> + __invalidate_device(part, true);
> + }
> mutex_unlock(&disk->open_mutex);
>
> - fsync_bdev(disk->part0);
> - __invalidate_device(disk->part0, true);
> -
> blk_mark_disk_dead(disk);
>
> + /*
> + * Drop all partitions now that the disk is marked dead.
> + */
> + mutex_lock(&disk->open_mutex);
> + xa_for_each_start(&disk->part_tbl, idx, part, 1)
> + drop_partition(part);
> + mutex_unlock(&disk->open_mutex);
> +
> if (!(disk->flags & GENHD_FL_HIDDEN)) {
> sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index fa5c707fe0ad2f..31ac815d77a83c 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -263,10 +263,19 @@ struct device_type part_type = {
> .uevent = part_uevent,
> };
>
> -static void delete_partition(struct block_device *part)
> +void drop_partition(struct block_device *part)
> {
> lockdep_assert_held(&part->bd_disk->open_mutex);
>
> + xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
> + kobject_put(part->bd_holder_dir);
> +
> + device_del(&part->bd_device);
> + put_device(&part->bd_device);
> +}
> +
> +static void delete_partition(struct block_device *part)
> +{
> /*
> * Remove the block device from the inode hash, so that it cannot be
> * looked up any more even when openers still hold references.
> @@ -276,11 +285,7 @@ static void delete_partition(struct block_device *part)
> fsync_bdev(part);
> __invalidate_device(part, true);
>
> - xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
> - kobject_put(part->bd_holder_dir);
> - device_del(&part->bd_device);
> -
> - put_device(&part->bd_device);
> + drop_partition(part);
> }
>
> static ssize_t whole_disk_show(struct device *dev,
> @@ -519,7 +524,7 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
> return true;
> }
>
> -void blk_drop_partitions(struct gendisk *disk)
> +static void blk_drop_partitions(struct gendisk *disk)
> {
> struct block_device *part;
> unsigned long idx;
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 08/13] block: remove blk_drop_partitions
2023-05-18 4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
@ 2023-05-30 12:56 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 12:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:17, Christoph Hellwig wrote:
> There is only a single caller left, so fold the loop into that.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Sure. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/partitions/core.c | 16 ++++------------
> 1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 31ac815d77a83c..2559bb830273eb 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -524,17 +524,6 @@ static bool disk_unlock_native_capacity(struct gendisk *disk)
> return true;
> }
>
> -static void blk_drop_partitions(struct gendisk *disk)
> -{
> - struct block_device *part;
> - unsigned long idx;
> -
> - lockdep_assert_held(&disk->open_mutex);
> -
> - xa_for_each_start(&disk->part_tbl, idx, part, 1)
> - delete_partition(part);
> -}
> -
> static bool blk_add_partition(struct gendisk *disk,
> struct parsed_partitions *state, int p)
> {
> @@ -651,6 +640,8 @@ static int blk_add_partitions(struct gendisk *disk)
>
> int bdev_disk_changed(struct gendisk *disk, bool invalidate)
> {
> + struct block_device *part;
> + unsigned long idx;
> int ret = 0;
>
> lockdep_assert_held(&disk->open_mutex);
> @@ -663,8 +654,9 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
> return -EBUSY;
> sync_blockdev(disk->part0);
> invalidate_bdev(disk->part0);
> - blk_drop_partitions(disk);
>
> + xa_for_each_start(&disk->part_tbl, idx, part, 1)
> + delete_partition(part);
> clear_bit(GD_NEED_PART_SCAN, &disk->state);
>
> /*
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH 09/13] block: introduce holder ops
2023-05-18 4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
@ 2023-05-30 13:03 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 13:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:18, Christoph Hellwig wrote:
> Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
> installed in the block_device for exclusive claims. It will be used to
> allow the block layer to call back into the user of the block device for
> things like notification of a removed device or a device resize.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/bdev.c | 41 ++++++++++++++++++++---------
> block/fops.c | 2 +-
> block/genhd.c | 6 +++--
> block/ioctl.c | 3 ++-
> drivers/block/drbd/drbd_nl.c | 3 ++-
> drivers/block/loop.c | 2 +-
> drivers/block/pktcdvd.c | 5 ++--
> drivers/block/rnbd/rnbd-srv.c | 2 +-
> drivers/block/xen-blkback/xenbus.c | 2 +-
> drivers/block/zram/zram_drv.c | 2 +-
> drivers/md/bcache/super.c | 2 +-
> drivers/md/dm.c | 2 +-
> drivers/md/md.c | 2 +-
> drivers/mtd/devices/block2mtd.c | 4 +--
> drivers/nvme/target/io-cmd-bdev.c | 2 +-
> drivers/s390/block/dasd_genhd.c | 2 +-
> drivers/target/target_core_iblock.c | 2 +-
> drivers/target/target_core_pscsi.c | 3 ++-
> fs/btrfs/dev-replace.c | 2 +-
> fs/btrfs/volumes.c | 6 ++---
> fs/erofs/super.c | 2 +-
> fs/ext4/super.c | 3 ++-
> fs/f2fs/super.c | 4 +--
> fs/jfs/jfs_logmgr.c | 2 +-
> fs/nfs/blocklayout/dev.c | 5 ++--
> fs/nilfs2/super.c | 2 +-
> fs/ocfs2/cluster/heartbeat.c | 2 +-
> fs/reiserfs/journal.c | 5 ++--
> fs/super.c | 4 +--
> fs/xfs/xfs_super.c | 2 +-
> include/linux/blk_types.h | 2 ++
> include/linux/blkdev.h | 11 +++++---
> kernel/power/swap.c | 4 +--
> mm/swapfile.c | 3 ++-
> 34 files changed, 90 insertions(+), 56 deletions(-)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index f5ffcac762e0cd..5c46ff10770638 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -102,7 +102,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
> * under live filesystem.
> */
> if (!(mode & FMODE_EXCL)) {
> - int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
> + int err = bd_prepare_to_claim(bdev, truncate_bdev_range, NULL);
> if (err)
> goto invalidate;
> }
> @@ -415,6 +415,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
> bdev = I_BDEV(inode);
> mutex_init(&bdev->bd_fsfreeze_mutex);
> spin_lock_init(&bdev->bd_size_lock);
> + mutex_init(&bdev->bd_holder_lock);
> bdev->bd_partno = partno;
> bdev->bd_inode = inode;
> bdev->bd_queue = disk->queue;
> @@ -464,13 +465,15 @@ long nr_blockdev_pages(void)
> * bd_may_claim - test whether a block device can be claimed
> * @bdev: block device of interest
> * @holder: holder trying to claim @bdev
> + * @hops: holder ops
> *
> * Test whether @bdev can be claimed by @holder.
> *
> * RETURNS:
> * %true if @bdev can be claimed, %false otherwise.
> */
> -static bool bd_may_claim(struct block_device *bdev, void *holder)
> +static bool bd_may_claim(struct block_device *bdev, void *holder,
> + const struct blk_holder_ops *hops)
> {
> struct block_device *whole = bdev_whole(bdev);
>
> @@ -480,8 +483,11 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
> /*
> * The same holder can always re-claim.
> */
> - if (bdev->bd_holder == holder)
> + if (bdev->bd_holder == holder) {
> + if (WARN_ON_ONCE(bdev->bd_holder_ops != hops))
> + return false;
> return true;
> + }
> return false;
> }
>
> @@ -499,6 +505,7 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
> * bd_prepare_to_claim - claim a block device
> * @bdev: block device of interest
> * @holder: holder trying to claim @bdev
> + * @hops: holder ops.
> *
> * Claim @bdev. This function fails if @bdev is already claimed by another
> * holder and waits if another claiming is in progress. return, the caller
> @@ -507,7 +514,8 @@ static bool bd_may_claim(struct block_device *bdev, void *holder)
> * RETURNS:
> * 0 if @bdev can be claimed, -EBUSY otherwise.
> */
> -int bd_prepare_to_claim(struct block_device *bdev, void *holder)
> +int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> + const struct blk_holder_ops *hops)
> {
> struct block_device *whole = bdev_whole(bdev);
>
> @@ -516,7 +524,7 @@ int bd_prepare_to_claim(struct block_device *bdev, void *holder)
> retry:
> mutex_lock(&bdev_lock);
> /* if someone else claimed, fail */
> - if (!bd_may_claim(bdev, holder)) {
> + if (!bd_may_claim(bdev, holder, hops)) {
> mutex_unlock(&bdev_lock);
> return -EBUSY;
> }
> @@ -557,12 +565,13 @@ static void bd_clear_claiming(struct block_device *whole, void *holder)
> * Finish exclusive open of a block device. Mark the device as exlusively
> * open by the holder and wake up all waiters for exclusive open to finish.
> */
> -static void bd_finish_claiming(struct block_device *bdev, void *holder)
> +static void bd_finish_claiming(struct block_device *bdev, void *holder,
> + const struct blk_holder_ops *hops)
> {
> struct block_device *whole = bdev_whole(bdev);
>
> mutex_lock(&bdev_lock);
> - BUG_ON(!bd_may_claim(bdev, holder));
> + BUG_ON(!bd_may_claim(bdev, holder, hops));
> /*
> * Note that for a whole device bd_holders will be incremented twice,
> * and bd_holder will be set to bd_may_claim before being set to holder
> @@ -570,7 +579,10 @@ static void bd_finish_claiming(struct block_device *bdev, void *holder)
> whole->bd_holders++;
> whole->bd_holder = bd_may_claim;
> bdev->bd_holders++;
> + mutex_lock(&bdev->bd_holder_lock);
> bdev->bd_holder = holder;
> + bdev->bd_holder_ops = hops;
> + mutex_unlock(&bdev->bd_holder_lock);
> bd_clear_claiming(whole, holder);
> mutex_unlock(&bdev_lock);
> }
> @@ -605,7 +617,10 @@ static void bd_end_claim(struct block_device *bdev)
> WARN_ON_ONCE(--bdev->bd_holders < 0);
> WARN_ON_ONCE(--whole->bd_holders < 0);
> if (!bdev->bd_holders) {
> + mutex_lock(&bdev->bd_holder_lock);
> bdev->bd_holder = NULL;
> + bdev->bd_holder_ops = NULL;
> + mutex_unlock(&bdev->bd_holder_lock);
> if (bdev->bd_write_holder)
> unblock = true;
> }
> @@ -735,6 +750,7 @@ void blkdev_put_no_open(struct block_device *bdev)
> * @dev: device number of block device to open
> * @mode: FMODE_* mask
> * @holder: exclusive holder identifier
> + * @hops: holder operations
> *
> * Open the block device described by device number @dev. If @mode includes
> * %FMODE_EXCL, the block device is opened with exclusive access. Specifying
> @@ -751,7 +767,8 @@ void blkdev_put_no_open(struct block_device *bdev)
> * RETURNS:
> * Reference to the block_device on success, ERR_PTR(-errno) on failure.
> */
> -struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
> +struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> + const struct blk_holder_ops *hops)
> {
> bool unblock_events = true;
> struct block_device *bdev;
> @@ -771,7 +788,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
> disk = bdev->bd_disk;
>
> if (mode & FMODE_EXCL) {
> - ret = bd_prepare_to_claim(bdev, holder);
> + ret = bd_prepare_to_claim(bdev, holder, hops);
> if (ret)
> goto put_blkdev;
> }
> @@ -791,7 +808,7 @@ struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder)
> if (ret)
> goto put_module;
> if (mode & FMODE_EXCL) {
> - bd_finish_claiming(bdev, holder);
> + bd_finish_claiming(bdev, holder, hops);
>
> /*
> * Block event polling for write claims if requested. Any write
> @@ -842,7 +859,7 @@ EXPORT_SYMBOL(blkdev_get_by_dev);
> * Reference to the block_device on success, ERR_PTR(-errno) on failure.
> */
> struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
> - void *holder)
> + void *holder, const struct blk_holder_ops *hops)
> {
> struct block_device *bdev;
> dev_t dev;
> @@ -852,7 +869,7 @@ struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
> if (error)
> return ERR_PTR(error);
>
> - bdev = blkdev_get_by_dev(dev, mode, holder);
> + bdev = blkdev_get_by_dev(dev, mode, holder, hops);
> if (!IS_ERR(bdev) && (mode & FMODE_WRITE) && bdev_read_only(bdev)) {
> blkdev_put(bdev, mode);
> return ERR_PTR(-EACCES);
> diff --git a/block/fops.c b/block/fops.c
> index d2e6be4e3d1c7d..2ac5ea878fa4cc 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -490,7 +490,7 @@ static int blkdev_open(struct inode *inode, struct file *filp)
> if ((filp->f_flags & O_ACCMODE) == 3)
> filp->f_mode |= FMODE_WRITE_IOCTL;
>
> - bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp);
> + bdev = blkdev_get_by_dev(inode->i_rdev, filp->f_mode, filp, NULL);
> if (IS_ERR(bdev))
> return PTR_ERR(bdev);
>
> diff --git a/block/genhd.c b/block/genhd.c
> index bd4c4eca31363e..226ddb8329f751 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -370,13 +370,15 @@ int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
> * scanners.
> */
> if (!(mode & FMODE_EXCL)) {
> - ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
> + ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions,
> + NULL);
> if (ret)
> return ret;
> }
>
> set_bit(GD_NEED_PART_SCAN, &disk->state);
> - bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
> + bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL,
> + NULL);
> if (IS_ERR(bdev))
> ret = PTR_ERR(bdev);
> else
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 9c5f637ff153f8..c7d7d4345edb4f 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -454,7 +454,8 @@ static int blkdev_bszset(struct block_device *bdev, fmode_t mode,
> if (mode & FMODE_EXCL)
> return set_blocksize(bdev, n);
>
> - if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev)))
> + if (IS_ERR(blkdev_get_by_dev(bdev->bd_dev, mode | FMODE_EXCL, &bdev,
> + NULL)))
> return -EBUSY;
> ret = set_blocksize(bdev, n);
> blkdev_put(bdev, mode | FMODE_EXCL);
> diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
> index 1a5d3d72d91d27..cab59dab3410aa 100644
> --- a/drivers/block/drbd/drbd_nl.c
> +++ b/drivers/block/drbd/drbd_nl.c
> @@ -1641,7 +1641,8 @@ static struct block_device *open_backing_dev(struct drbd_device *device,
> int err = 0;
>
> bdev = blkdev_get_by_path(bdev_path,
> - FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
> + FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> + claim_ptr, NULL);
> if (IS_ERR(bdev)) {
> drbd_err(device, "open(\"%s\") failed with %ld\n",
> bdev_path, PTR_ERR(bdev));
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index bc31bb7072a2cb..a73c857f5bfed0 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1015,7 +1015,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
> * here to avoid changing device under exclusive owner.
> */
> if (!(mode & FMODE_EXCL)) {
> - error = bd_prepare_to_claim(bdev, loop_configure);
> + error = bd_prepare_to_claim(bdev, loop_configure, NULL);
> if (error)
> goto out_putf;
> }
> diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
> index d5d7884cedd477..377f8b34535294 100644
> --- a/drivers/block/pktcdvd.c
> +++ b/drivers/block/pktcdvd.c
> @@ -2125,7 +2125,8 @@ static int pkt_open_dev(struct pktcdvd_device *pd, fmode_t write)
> * to read/write from/to it. It is already opened in O_NONBLOCK mode
> * so open should not fail.
> */
> - bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd);
> + bdev = blkdev_get_by_dev(pd->bdev->bd_dev, FMODE_READ | FMODE_EXCL, pd,
> + NULL);
> if (IS_ERR(bdev)) {
> ret = PTR_ERR(bdev);
> goto out;
> @@ -2530,7 +2531,7 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
> }
> }
>
> - bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL);
> + bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_NDELAY, NULL, NULL);
> if (IS_ERR(bdev))
> return PTR_ERR(bdev);
> sdev = scsi_device_from_queue(bdev->bd_disk->queue);
> diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> index 2cfed2e58d646f..cec22bbae2f9a5 100644
> --- a/drivers/block/rnbd/rnbd-srv.c
> +++ b/drivers/block/rnbd/rnbd-srv.c
> @@ -719,7 +719,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
> goto reject;
> }
>
> - bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE);
> + bdev = blkdev_get_by_path(full_path, open_flags, THIS_MODULE, NULL);
> if (IS_ERR(bdev)) {
> ret = PTR_ERR(bdev);
> pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %d\n",
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 4807af1d580593..43b36da9b3544d 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -492,7 +492,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
> vbd->pdevice = MKDEV(major, minor);
>
> bdev = blkdev_get_by_dev(vbd->pdevice, vbd->readonly ?
> - FMODE_READ : FMODE_WRITE, NULL);
> + FMODE_READ : FMODE_WRITE, NULL, NULL);
>
> if (IS_ERR(bdev)) {
> pr_warn("xen_vbd_create: device %08x could not be opened\n",
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index f6d90f1ba5cf7b..ef9dc4ef6796da 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -508,7 +508,7 @@ static ssize_t backing_dev_store(struct device *dev,
> }
>
> bdev = blkdev_get_by_dev(inode->i_rdev,
> - FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram);
> + FMODE_READ | FMODE_WRITE | FMODE_EXCL, zram, NULL);
> if (IS_ERR(bdev)) {
> err = PTR_ERR(bdev);
> bdev = NULL;
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 7e9d19fd21ddd5..d84c09a73af803 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -2560,7 +2560,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
> err = "failed to open device";
> bdev = blkdev_get_by_path(strim(path),
> FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> - sb);
> + sb, NULL);
> if (IS_ERR(bdev)) {
> if (bdev == ERR_PTR(-EBUSY)) {
> dev_t dev;
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3b694ba3a106e6..d759f8bdb3df2f 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -746,7 +746,7 @@ static struct table_device *open_table_device(struct mapped_device *md,
> return ERR_PTR(-ENOMEM);
> refcount_set(&td->count, 1);
>
> - bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr);
> + bdev = blkdev_get_by_dev(dev, mode | FMODE_EXCL, _dm_claim_ptr, NULL);
> if (IS_ERR(bdev)) {
> r = PTR_ERR(bdev);
> goto out_free_td;
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8e344b4b34446f..60ab5c4bee77c5 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -3642,7 +3642,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
>
> rdev->bdev = blkdev_get_by_dev(newdev,
> FMODE_READ | FMODE_WRITE | FMODE_EXCL,
> - super_format == -2 ? &claim_rdev : rdev);
> + super_format == -2 ? &claim_rdev : rdev, NULL);
> if (IS_ERR(rdev->bdev)) {
> pr_warn("md: could not open device unknown-block(%u,%u).\n",
> MAJOR(newdev), MINOR(newdev));
> diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
> index 4cd37ec45762b6..7ac82c6fe35024 100644
> --- a/drivers/mtd/devices/block2mtd.c
> +++ b/drivers/mtd/devices/block2mtd.c
> @@ -235,7 +235,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
> return NULL;
>
> /* Get a handle on the device */
> - bdev = blkdev_get_by_path(devname, mode, dev);
> + bdev = blkdev_get_by_path(devname, mode, dev, NULL);
>
> #ifndef MODULE
> /*
> @@ -257,7 +257,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
> devt = name_to_dev_t(devname);
> if (!devt)
> continue;
> - bdev = blkdev_get_by_dev(devt, mode, dev);
> + bdev = blkdev_get_by_dev(devt, mode, dev, NULL);
> }
> #endif
>
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index c2d6cea0236b0a..9b6d6d85c72544 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -85,7 +85,7 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
> return -ENOTBLK;
>
> ns->bdev = blkdev_get_by_path(ns->device_path,
> - FMODE_READ | FMODE_WRITE, NULL);
> + FMODE_READ | FMODE_WRITE, NULL, NULL);
> if (IS_ERR(ns->bdev)) {
> ret = PTR_ERR(ns->bdev);
> if (ret != -ENOTBLK) {
> diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
> index 998a961e170417..f21198bc483e1a 100644
> --- a/drivers/s390/block/dasd_genhd.c
> +++ b/drivers/s390/block/dasd_genhd.c
> @@ -130,7 +130,7 @@ int dasd_scan_partitions(struct dasd_block *block)
> struct block_device *bdev;
> int rc;
>
> - bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL);
> + bdev = blkdev_get_by_dev(disk_devt(block->gdp), FMODE_READ, NULL, NULL);
> if (IS_ERR(bdev)) {
> DBF_DEV_EVENT(DBF_ERR, block->base,
> "scan partitions error, blkdev_get returned %ld",
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index cc838ffd129472..a5cbbefa78ee4e 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -114,7 +114,7 @@ static int iblock_configure_device(struct se_device *dev)
> else
> dev->dev_flags |= DF_READ_ONLY;
>
> - bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev);
> + bd = blkdev_get_by_path(ib_dev->ibd_udev_path, mode, ib_dev, NULL);
> if (IS_ERR(bd)) {
> ret = PTR_ERR(bd);
> goto out_free_bioset;
> diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c
> index e7425549e39c73..e3494e036c6c85 100644
> --- a/drivers/target/target_core_pscsi.c
> +++ b/drivers/target/target_core_pscsi.c
> @@ -367,7 +367,8 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
> * for TYPE_DISK and TYPE_ZBC using supplied udev_path
> */
> bd = blkdev_get_by_path(dev->udev_path,
> - FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv);
> + FMODE_WRITE|FMODE_READ|FMODE_EXCL, pdv,
> + NULL);
> if (IS_ERR(bd)) {
> pr_err("pSCSI: blkdev_get_by_path() failed\n");
> scsi_device_put(sd);
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 78696d331639bd..4de4984fa99ba3 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -258,7 +258,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
> }
>
> bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
> - fs_info->bdev_holder);
> + fs_info->bdev_holder, NULL);
> if (IS_ERR(bdev)) {
> btrfs_err(fs_info, "target device %s is invalid!", device_path);
> return PTR_ERR(bdev);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 841e799dece51b..784ccc8f6c69c1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -496,7 +496,7 @@ btrfs_get_bdev_and_sb(const char *device_path, fmode_t flags, void *holder,
> {
> int ret;
>
> - *bdev = blkdev_get_by_path(device_path, flags, holder);
> + *bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
>
> if (IS_ERR(*bdev)) {
> ret = PTR_ERR(*bdev);
> @@ -1377,7 +1377,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, fmode_t flags,
> * values temporarily, as the device paths of the fsid are the only
> * required information for assembling the volume.
> */
> - bdev = blkdev_get_by_path(path, flags, holder);
> + bdev = blkdev_get_by_path(path, flags, holder, NULL);
> if (IS_ERR(bdev))
> return ERR_CAST(bdev);
>
> @@ -2629,7 +2629,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> return -EROFS;
>
> bdev = blkdev_get_by_path(device_path, FMODE_WRITE | FMODE_EXCL,
> - fs_info->bdev_holder);
> + fs_info->bdev_holder, NULL);
> if (IS_ERR(bdev))
> return PTR_ERR(bdev);
>
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index 811ab66d805ede..6c263e9cd38b2a 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -254,7 +254,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
> dif->fscache = fscache;
> } else if (!sbi->devs->flatdev) {
> bdev = blkdev_get_by_path(dif->path, FMODE_READ | FMODE_EXCL,
> - sb->s_type);
> + sb->s_type, NULL);
> if (IS_ERR(bdev))
> return PTR_ERR(bdev);
> dif->bdev = bdev;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9680fe753e599a..865625089ecca3 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1103,7 +1103,8 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
> {
> struct block_device *bdev;
>
> - bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb);
> + bdev = blkdev_get_by_dev(dev, FMODE_READ|FMODE_WRITE|FMODE_EXCL, sb,
> + NULL);
> if (IS_ERR(bdev))
> goto fail;
> return bdev;
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 9f15b03037dba9..7c34ab082f1382 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -4025,7 +4025,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
> /* Single zoned block device mount */
> FDEV(0).bdev =
> blkdev_get_by_dev(sbi->sb->s_bdev->bd_dev,
> - sbi->sb->s_mode, sbi->sb->s_type);
> + sbi->sb->s_mode, sbi->sb->s_type, NULL);
> } else {
> /* Multi-device mount */
> memcpy(FDEV(i).path, RDEV(i).path, MAX_PATH_LEN);
> @@ -4044,7 +4044,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
> sbi->log_blocks_per_seg) - 1;
> }
> FDEV(i).bdev = blkdev_get_by_path(FDEV(i).path,
> - sbi->sb->s_mode, sbi->sb->s_type);
> + sbi->sb->s_mode, sbi->sb->s_type, NULL);
> }
> if (IS_ERR(FDEV(i).bdev))
> return PTR_ERR(FDEV(i).bdev);
> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
> index 695415cbfe985b..8c55030c57ed52 100644
> --- a/fs/jfs/jfs_logmgr.c
> +++ b/fs/jfs/jfs_logmgr.c
> @@ -1101,7 +1101,7 @@ int lmLogOpen(struct super_block *sb)
> */
>
> bdev = blkdev_get_by_dev(sbi->logdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> - log);
> + log, NULL);
> if (IS_ERR(bdev)) {
> rc = PTR_ERR(bdev);
> goto free;
> diff --git a/fs/nfs/blocklayout/dev.c b/fs/nfs/blocklayout/dev.c
> index fea5f8821da5ef..38b066ca699ed7 100644
> --- a/fs/nfs/blocklayout/dev.c
> +++ b/fs/nfs/blocklayout/dev.c
> @@ -243,7 +243,7 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
> if (!dev)
> return -EIO;
>
> - bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL);
> + bdev = blkdev_get_by_dev(dev, FMODE_READ | FMODE_WRITE, NULL, NULL);
> if (IS_ERR(bdev)) {
> printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
> MAJOR(dev), MINOR(dev), PTR_ERR(bdev));
> @@ -312,7 +312,8 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
> if (!devname)
> return ERR_PTR(-ENOMEM);
>
> - bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL);
> + bdev = blkdev_get_by_path(devname, FMODE_READ | FMODE_WRITE, NULL,
> + NULL);
> if (IS_ERR(bdev)) {
> pr_warn("pNFS: failed to open device %s (%ld)\n",
> devname, PTR_ERR(bdev));
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 77f1e5778d1c84..91bfbd973d1d53 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -1285,7 +1285,7 @@ nilfs_mount(struct file_system_type *fs_type, int flags,
> if (!(flags & SB_RDONLY))
> mode |= FMODE_WRITE;
>
> - sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type);
> + sd.bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
> if (IS_ERR(sd.bdev))
> return ERR_CAST(sd.bdev);
>
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index 60b97c92e2b25e..6b13b8c3f2b8af 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -1786,7 +1786,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
> goto out2;
>
> reg->hr_bdev = blkdev_get_by_dev(f.file->f_mapping->host->i_rdev,
> - FMODE_WRITE | FMODE_READ, NULL);
> + FMODE_WRITE | FMODE_READ, NULL, NULL);
> if (IS_ERR(reg->hr_bdev)) {
> ret = PTR_ERR(reg->hr_bdev);
> reg->hr_bdev = NULL;
> diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
> index 4d11d60f493c14..5e4db9a0c8e5a3 100644
> --- a/fs/reiserfs/journal.c
> +++ b/fs/reiserfs/journal.c
> @@ -2616,7 +2616,7 @@ static int journal_init_dev(struct super_block *super,
> if (jdev == super->s_dev)
> blkdev_mode &= ~FMODE_EXCL;
> journal->j_dev_bd = blkdev_get_by_dev(jdev, blkdev_mode,
> - journal);
> + journal, NULL);
> journal->j_dev_mode = blkdev_mode;
> if (IS_ERR(journal->j_dev_bd)) {
> result = PTR_ERR(journal->j_dev_bd);
> @@ -2632,7 +2632,8 @@ static int journal_init_dev(struct super_block *super,
> }
>
> journal->j_dev_mode = blkdev_mode;
> - journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal);
> + journal->j_dev_bd = blkdev_get_by_path(jdev_name, blkdev_mode, journal,
> + NULL);
> if (IS_ERR(journal->j_dev_bd)) {
> result = PTR_ERR(journal->j_dev_bd);
> journal->j_dev_bd = NULL;
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2bc3..012ce140080375 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1248,7 +1248,7 @@ int get_tree_bdev(struct fs_context *fc,
> if (!fc->source)
> return invalf(fc, "No source specified");
>
> - bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type);
> + bdev = blkdev_get_by_path(fc->source, mode, fc->fs_type, NULL);
> if (IS_ERR(bdev)) {
> errorf(fc, "%s: Can't open blockdev", fc->source);
> return PTR_ERR(bdev);
> @@ -1333,7 +1333,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
> if (!(flags & SB_RDONLY))
> mode |= FMODE_WRITE;
>
> - bdev = blkdev_get_by_path(dev_name, mode, fs_type);
> + bdev = blkdev_get_by_path(dev_name, mode, fs_type, NULL);
> if (IS_ERR(bdev))
> return ERR_CAST(bdev);
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 7e706255f16502..5684c538eb76dc 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -386,7 +386,7 @@ xfs_blkdev_get(
> int error = 0;
>
> *bdevp = blkdev_get_by_path(name, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
> - mp);
> + mp, NULL);
> if (IS_ERR(*bdevp)) {
> error = PTR_ERR(*bdevp);
> xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 740afe80f29786..84a931caef514e 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -55,6 +55,8 @@ struct block_device {
> struct super_block * bd_super;
> void * bd_claiming;
> void * bd_holder;
> + const struct blk_holder_ops *bd_holder_ops;
> + struct mutex bd_holder_lock;
> /* The counter of freeze processes */
> int bd_fsfreeze_count;
> int bd_holders;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index b441e633f4dd49..c94f3b63c86422 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1465,10 +1465,15 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
> #define BLKDEV_MAJOR_MAX 0
> #endif
>
> +struct blk_holder_ops {
> +};
> +
> +struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> + const struct blk_holder_ops *hops);
> struct block_device *blkdev_get_by_path(const char *path, fmode_t mode,
> - void *holder);
> -struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder);
> -int bd_prepare_to_claim(struct block_device *bdev, void *holder);
> + void *holder, const struct blk_holder_ops *hops);
> +int bd_prepare_to_claim(struct block_device *bdev, void *holder,
> + const struct blk_holder_ops *hops);
> void bd_abort_claiming(struct block_device *bdev, void *holder);
> void blkdev_put(struct block_device *bdev, fmode_t mode);
>
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index 92e41ed292ada8..801c411530d11c 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -357,7 +357,7 @@ static int swsusp_swap_check(void)
> root_swap = res;
>
> hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device, FMODE_WRITE,
> - NULL);
> + NULL, NULL);
> if (IS_ERR(hib_resume_bdev))
> return PTR_ERR(hib_resume_bdev);
>
> @@ -1524,7 +1524,7 @@ int swsusp_check(void)
> mode |= FMODE_EXCL;
>
> hib_resume_bdev = blkdev_get_by_dev(swsusp_resume_device,
> - mode, &holder);
> + mode, &holder, NULL);
> if (!IS_ERR(hib_resume_bdev)) {
> set_blocksize(hib_resume_bdev, PAGE_SIZE);
> clear_page(swsusp_header);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 274bbf79748006..cfbcf7d5705f5f 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2770,7 +2770,8 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
>
> if (S_ISBLK(inode->i_mode)) {
> p->bdev = blkdev_get_by_dev(inode->i_rdev,
> - FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
> + FMODE_READ | FMODE_WRITE | FMODE_EXCL, p,
> + NULL);
> if (IS_ERR(p->bdev)) {
> error = PTR_ERR(p->bdev);
> p->bdev = NULL;
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
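Editorial sketch of what the extra argument in the patch above does. It adds
a "hops" pointer next to the existing "holder" cookie, and every converted
caller passes NULL for now. The block below is a userspace model, not kernel
code: the FMODE_* values, the placeholder member and the model_blkdev_get()
helper are assumptions made for illustration. The real claim path (locking,
whole-disk claims, bd_claiming) is not shown in this part of the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Arbitrary flag values for this model; not the kernel's definitions. */
#define FMODE_READ	0x1u
#define FMODE_WRITE	0x2u
#define FMODE_EXCL	0x80u

struct block_device;

struct blk_holder_ops {
	/* The patch adds an empty struct; a placeholder keeps ISO C valid. */
	void (*placeholder)(struct block_device *bdev);
};

struct block_device {
	void *bd_holder;
	const struct blk_holder_ops *bd_holder_ops;
	int bd_holders;
};

/*
 * Hypothetical model of an open: only an exclusive open records the
 * holder cookie and its ops table; a shared open ignores both.
 */
int model_blkdev_get(struct block_device *bdev, unsigned int mode,
		     void *holder, const struct blk_holder_ops *hops)
{
	if (!(mode & FMODE_EXCL))
		return 0;
	if (bdev->bd_holder && bdev->bd_holder != holder)
		return -EBUSY;	/* someone else already holds the device */

	/* Simplification: a re-claim by the same holder just bumps the count. */
	bdev->bd_holder = holder;
	bdev->bd_holder_ops = hops;
	bdev->bd_holders++;
	return 0;
}
```

Under this model a caller that has no events to receive keeps passing NULL,
which is what all the conversions in the diff do.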
* Re: [PATCH 10/13] block: add a mark_dead holder operation
2023-05-18 4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
@ 2023-05-30 13:05 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-05-30 13:05 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Al Viro, Christian Brauner, Darrick J. Wong,
Jan Kara, linux-block, linux-fsdevel, linux-xfs
On Thu 18-05-23 06:23:19, Christoph Hellwig wrote:
> Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead
> to notify the holder that the block device it is using has been marked dead.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Acked-by: Christian Brauner <brauner@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> block/genhd.c | 24 ++++++++++++++++++++++++
> include/linux/blkdev.h | 1 +
> 2 files changed, 25 insertions(+)
>
> diff --git a/block/genhd.c b/block/genhd.c
> index 226ddb8329f751..42aebf0e1e2628 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -565,6 +565,28 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
> }
> EXPORT_SYMBOL(device_add_disk);
>
> +static void blk_report_disk_dead(struct gendisk *disk)
> +{
> +	struct block_device *bdev;
> +	unsigned long idx;
> +
> +	rcu_read_lock();
> +	xa_for_each(&disk->part_tbl, idx, bdev) {
> +		if (!kobject_get_unless_zero(&bdev->bd_device.kobj))
> +			continue;
> +		rcu_read_unlock();
> +
> +		mutex_lock(&bdev->bd_holder_lock);
> +		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead)
> +			bdev->bd_holder_ops->mark_dead(bdev);
> +		mutex_unlock(&bdev->bd_holder_lock);
> +
> +		put_device(&bdev->bd_device);
> +		rcu_read_lock();
> +	}
> +	rcu_read_unlock();
> +}
> +
> /**
> * blk_mark_disk_dead - mark a disk as dead
> * @disk: disk to mark as dead
> @@ -592,6 +614,8 @@ void blk_mark_disk_dead(struct gendisk *disk)
> * Prevent new I/O from crossing bio_queue_enter().
> */
> blk_queue_start_drain(disk->queue);
> +
> + blk_report_disk_dead(disk);
> }
> EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c94f3b63c86422..41f894f6355f96 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1466,6 +1466,7 @@ void blkdev_show(struct seq_file *seqf, off_t offset);
> #endif
>
> struct blk_holder_ops {
> + void (*mark_dead)(struct block_device *bdev);
> };
>
> struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder,
> --
> 2.39.2
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
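Editorial sketch of the notification walk in the patch above, seen from the
holder's side. This is a userspace model under stated assumptions: demo_fs,
demo_fs_mark_dead() and model_report_disk_dead() are hypothetical names, a
plain array stands in for disk->part_tbl, and the RCU section, the kobject
reference and bd_holder_lock appear only as comments.

```c
#include <assert.h>
#include <stddef.h>

struct block_device;

struct blk_holder_ops {
	void (*mark_dead)(struct block_device *bdev);
};

struct block_device {
	void *bd_holder;
	const struct blk_holder_ops *bd_holder_ops;
};

/* Hypothetical holder: a file system instance with a shutdown flag. */
struct demo_fs {
	int shut_down;
};

/* Hypothetical callback: recover the holder from the bdev and shut down. */
void demo_fs_mark_dead(struct block_device *bdev)
{
	struct demo_fs *fs = bdev->bd_holder;

	fs->shut_down = 1;
}

const struct blk_holder_ops demo_fs_holder_ops = {
	.mark_dead = demo_fs_mark_dead,
};

/*
 * Model of the walk: visit every partition slot and call ->mark_dead
 * where the holder registered one. Returns the number of holders told.
 */
int model_report_disk_dead(struct block_device **parts, size_t nr)
{
	int notified = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		struct block_device *bdev = parts[i];

		/* Stands in for kobject_get_unless_zero() failing. */
		if (!bdev)
			continue;

		/* The kernel code holds bd_holder_lock around this call. */
		if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead) {
			bdev->bd_holder_ops->mark_dead(bdev);
			notified++;
		}
	}
	return notified;
}
```

Devices opened without holder ops, which is every caller converted so far in
this series, are skipped by the NULL check.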
* Re: [PATCH 02/13] block: refactor bd_may_claim
2023-05-30 11:41 ` Jan Kara
@ 2023-06-01 8:11 ` Christoph Hellwig
2023-06-01 9:58 ` Jan Kara
0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-06-01 8:11 UTC (permalink / raw)
To: Jan Kara
Cc: Christoph Hellwig, Jens Axboe, Al Viro, Christian Brauner,
Darrick J. Wong, linux-block, linux-fsdevel, linux-xfs
On Tue, May 30, 2023 at 01:41:48PM +0200, Jan Kara wrote:
> > +	if (bdev->bd_holder) {
> > +		/*
> > +		 * The same holder can always re-claim.
> > +		 */
> > +		if (bdev->bd_holder == holder)
> > +			return true;
> > +		return false;
>
> With this simple condition I'd just do:
> /* The same holder can always re-claim. */
> return bdev->bd_holder == holder;
As of this patch this makes sense, and I did in fact do it that
way first. But once we start checking the holder ops we need
the explicit conditional, so I decided to start out with this more
verbose option to avoid churn later.
^ permalink raw reply [flat|nested] 28+ messages in thread
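Editorial sketch of why the verbose form pays off once holder ops are
checked. The cover letter lists "add a sanity check for mismatching holder
ops in bd_may_claim"; the exact form of that check is not quoted in this part
of the thread, so the comparison below is an assumption. The model is
userspace C, and model_may_claim() is a hypothetical name that leaves out the
whole-disk handling of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct blk_holder_ops {
	void (*mark_dead)(void *bdev);
};

struct block_device {
	void *bd_holder;
	const struct blk_holder_ops *bd_holder_ops;
};

/* Hypothetical model: may @holder claim @bdev using @hops? */
bool model_may_claim(const struct block_device *bdev, void *holder,
		     const struct blk_holder_ops *hops)
{
	if (bdev->bd_holder) {
		/*
		 * The same holder can always re-claim.
		 */
		if (bdev->bd_holder == holder) {
			/* Assumed check: a re-claim must use the same ops. */
			if (bdev->bd_holder_ops != hops)
				return false;
			return true;
		}
		return false;
	}

	/* Unclaimed device: simplified, the real code has further checks. */
	return true;
}
```

With the one-line "return bdev->bd_holder == holder;" form there would be no
branch in which to place the ops comparison, so the conditional would have to
be reintroduced by the later patch.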
* Re: [PATCH 02/13] block: refactor bd_may_claim
2023-06-01 8:11 ` Christoph Hellwig
@ 2023-06-01 9:58 ` Jan Kara
0 siblings, 0 replies; 28+ messages in thread
From: Jan Kara @ 2023-06-01 9:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jan Kara, Jens Axboe, Al Viro, Christian Brauner,
Darrick J. Wong, linux-block, linux-fsdevel, linux-xfs
On Thu 01-06-23 10:11:05, Christoph Hellwig wrote:
> On Tue, May 30, 2023 at 01:41:48PM +0200, Jan Kara wrote:
> > > +	if (bdev->bd_holder) {
> > > +		/*
> > > +		 * The same holder can always re-claim.
> > > +		 */
> > > +		if (bdev->bd_holder == holder)
> > > +			return true;
> > > +		return false;
> >
> > With this simple condition I'd just do:
> > /* The same holder can always re-claim. */
> > return bdev->bd_holder == holder;
>
> As of this patch this makes sense, and I did in fact do it that
> way first. But once we start checking the holder ops we need
> the explicit conditional, so I decided to start out with this more
> verbose option to avoid churn later.
Ah, OK.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2023-06-01 9:58 UTC | newest]
Thread overview: 28+ messages
2023-05-18 4:23 introduce bdev holder ops and a file system shutdown method v2 Christoph Hellwig
2023-05-18 4:23 ` [PATCH 01/13] block: factor out a bd_end_claim helper from blkdev_put Christoph Hellwig
2023-05-18 4:23 ` [PATCH 02/13] block: refactor bd_may_claim Christoph Hellwig
2023-05-30 11:41 ` Jan Kara
2023-06-01 8:11 ` Christoph Hellwig
2023-06-01 9:58 ` Jan Kara
2023-05-18 4:23 ` [PATCH 03/13] block: turn bdev_lock into a mutex Christoph Hellwig
2023-05-18 4:23 ` [PATCH 04/13] block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendisk Christoph Hellwig
2023-05-30 11:58 ` Jan Kara
2023-05-18 4:23 ` [PATCH 05/13] block: avoid repeated work in blk_mark_disk_dead Christoph Hellwig
2023-05-18 4:23 ` [PATCH 06/13] block: unhash the inode earlier in delete_partition Christoph Hellwig
2023-05-30 12:09 ` Jan Kara
2023-05-18 4:23 ` [PATCH 07/13] block: delete partitions later in del_gendisk Christoph Hellwig
2023-05-30 12:55 ` Jan Kara
2023-05-18 4:23 ` [PATCH 08/13] block: remove blk_drop_partitions Christoph Hellwig
2023-05-30 12:56 ` Jan Kara
2023-05-18 4:23 ` [PATCH 09/13] block: introduce holder ops Christoph Hellwig
2023-05-30 13:03 ` Jan Kara
2023-05-18 4:23 ` [PATCH 10/13] block: add a mark_dead holder operation Christoph Hellwig
2023-05-30 13:05 ` Jan Kara
2023-05-18 4:23 ` [PATCH 11/13] fs: add a method to shut down the file system Christoph Hellwig
2023-05-18 4:23 ` [PATCH 12/13] xfs: wire up sops->shutdown Christoph Hellwig
2023-05-18 5:03 ` Darrick J. Wong
2023-05-18 4:23 ` [PATCH 13/13] xfs: wire up the ->mark_dead holder operation for log and RT devices Christoph Hellwig
2023-05-18 5:40 ` Dave Chinner
2023-05-19 2:00 ` introduce bdev holder ops and a file system shutdown method v2 Theodore Ts'o
2023-05-19 4:11 ` Christoph Hellwig
2023-05-23 0:58 ` Darrick J. Wong