* [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
@ 2024-02-22 12:45 Yu Kuai
  2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
                   ` (20 more replies)
  0 siblings, 21 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v4:
 - respin on top of linux-next, based on Christian's patchset to open
 bdev as file. Most of the patches from v3 are dropped and changed to use
 file_inode(bdev_file) to get bd_inode, or bdev_file->f_mapping to get
 bd_inode->i_mapping.

Changes in v3:
 - remove bdev_associated_mapping() and patch 12 from v1;
 - add kernel-doc comments for the new bdev APIs;
 - rename __bdev_get_folio() to bdev_get_folio();
 - fix a problem in erofs where erofs_init_metabuf() is not always
 called;
 - add Reviewed-by tags for patches 15-17.

Changes in v2:
 - remove some bdev APIs that are not necessary;
 - pass in offset for bdev_read_folio() and __bdev_get_folio();
 - remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
 avoid accessing bd_inode directly from mapping_gfp_constraint() in
 ext4 (patches 15, 16);
 - remove block_device_ejected() from ext4.

Yu Kuai (19):
  block: move two helpers into bdev.c
  block: remove sync_blockdev_nowait()
  block: remove sync_blockdev_range()
  block: prevent direct access of bd_inode
  bcachefs: remove dead function bdev_sectors()
  cramfs: prevent direct access of bd_inode
  erofs: prevent direct access of bd_inode
  nilfs2: prevent direct access of bd_inode
  gfs2: prevent direct access of bd_inode
  s390/dasd: use bdev api in dasd_format()
  btrfs: prevent direct access of bd_inode
  ext4: remove block_device_ejected()
  ext4: prevent direct access of bd_inode
  jbd2: prevent direct access of bd_inode
  bcache: prevent direct access of bd_inode
  block2mtd: prevent direct access of bd_inode
  dm-vdo: prevent direct access of bd_inode
  scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
  fs & block: remove bdev->bd_inode

 block/bdev.c                              | 108 +++++++++++++++-------
 block/blk-zoned.c                         |   4 +-
 block/blk.h                               |   2 +
 block/fops.c                              |   4 +-
 block/genhd.c                             |   9 +-
 block/ioctl.c                             |   8 +-
 block/partitions/core.c                   |   8 +-
 drivers/md/bcache/super.c                 |   7 +-
 drivers/md/dm-vdo/dedupe.c                |   3 +-
 drivers/md/dm-vdo/dm-vdo-target.c         |   5 +-
 drivers/md/dm-vdo/indexer/config.c        |   1 +
 drivers/md/dm-vdo/indexer/config.h        |   3 +
 drivers/md/dm-vdo/indexer/index-layout.c  |   6 +-
 drivers/md/dm-vdo/indexer/index-layout.h  |   2 +-
 drivers/md/dm-vdo/indexer/index-session.c |  13 +--
 drivers/md/dm-vdo/indexer/index.c         |   4 +-
 drivers/md/dm-vdo/indexer/index.h         |   2 +-
 drivers/md/dm-vdo/indexer/indexer.h       |   4 +-
 drivers/md/dm-vdo/indexer/io-factory.c    |  13 ++-
 drivers/md/dm-vdo/indexer/io-factory.h    |   4 +-
 drivers/md/dm-vdo/indexer/volume.c        |   4 +-
 drivers/md/dm-vdo/indexer/volume.h        |   2 +-
 drivers/md/md-bitmap.c                    |   2 +-
 drivers/mtd/devices/block2mtd.c           |   6 +-
 drivers/s390/block/dasd_ioctl.c           |   5 +-
 drivers/scsi/scsicam.c                    |   3 +-
 fs/affs/file.c                            |   2 +-
 fs/bcachefs/util.h                        |   5 -
 fs/btrfs/dev-replace.c                    |   2 +-
 fs/btrfs/disk-io.c                        |  17 ++--
 fs/btrfs/disk-io.h                        |   4 +-
 fs/btrfs/inode.c                          |   2 +-
 fs/btrfs/super.c                          |   2 +-
 fs/btrfs/volumes.c                        |  32 ++++---
 fs/btrfs/volumes.h                        |   2 +-
 fs/btrfs/zoned.c                          |  20 ++--
 fs/btrfs/zoned.h                          |   4 +-
 fs/buffer.c                               | 103 ++++++++++++---------
 fs/cramfs/inode.c                         |   2 +-
 fs/direct-io.c                            |   4 +-
 fs/erofs/data.c                           |   5 +-
 fs/erofs/internal.h                       |   1 +
 fs/erofs/zmap.c                           |   2 +-
 fs/exfat/fatent.c                         |   2 +-
 fs/ext2/inode.c                           |   4 +-
 fs/ext2/xattr.c                           |   2 +-
 fs/ext4/dir.c                             |   2 +-
 fs/ext4/ext4_jbd2.c                       |   2 +-
 fs/ext4/inode.c                           |   8 +-
 fs/ext4/mmp.c                             |   2 +-
 fs/ext4/page-io.c                         |   5 +-
 fs/ext4/super.c                           |  30 ++----
 fs/ext4/xattr.c                           |   2 +-
 fs/f2fs/data.c                            |   7 +-
 fs/f2fs/f2fs.h                            |   1 +
 fs/fat/inode.c                            |   2 +-
 fs/fuse/dax.c                             |   2 +-
 fs/gfs2/aops.c                            |   2 +-
 fs/gfs2/bmap.c                            |   2 +-
 fs/gfs2/glock.c                           |   2 +-
 fs/gfs2/meta_io.c                         |   2 +-
 fs/gfs2/ops_fstype.c                      |   2 +-
 fs/hpfs/file.c                            |   2 +-
 fs/iomap/buffered-io.c                    |   8 +-
 fs/iomap/direct-io.c                      |  11 ++-
 fs/iomap/swapfile.c                       |   2 +-
 fs/iomap/trace.h                          |   2 +-
 fs/jbd2/commit.c                          |   2 +-
 fs/jbd2/journal.c                         |  34 ++++---
 fs/jbd2/recovery.c                        |   8 +-
 fs/jbd2/revoke.c                          |  13 +--
 fs/jbd2/transaction.c                     |   8 +-
 fs/mpage.c                                |  26 ++++--
 fs/nilfs2/btnode.c                        |   4 +-
 fs/nilfs2/gcinode.c                       |   2 +-
 fs/nilfs2/mdt.c                           |   2 +-
 fs/nilfs2/page.c                          |   4 +-
 fs/nilfs2/recovery.c                      |  27 ++++--
 fs/nilfs2/segment.c                       |   2 +-
 fs/ntfs3/fsntfs.c                         |   8 +-
 fs/ntfs3/inode.c                          |   4 +-
 fs/ntfs3/super.c                          |   2 +-
 fs/ocfs2/journal.c                        |   2 +-
 fs/reiserfs/fix_node.c                    |   2 +-
 fs/reiserfs/journal.c                     |  10 +-
 fs/reiserfs/prints.c                      |   4 +-
 fs/reiserfs/reiserfs.h                    |   6 +-
 fs/reiserfs/stree.c                       |   2 +-
 fs/reiserfs/tail_conversion.c             |   2 +-
 fs/sync.c                                 |   9 +-
 fs/xfs/xfs_iomap.c                        |   4 +-
 fs/zonefs/file.c                          |   4 +-
 include/linux/blk_types.h                 |   1 -
 include/linux/blkdev.h                    |  21 +----
 include/linux/buffer_head.h               |  73 ++++++++++-----
 include/linux/iomap.h                     |  14 ++-
 include/linux/jbd2.h                      |  18 +++-
 include/trace/events/block.h              |   2 +-
 98 files changed, 491 insertions(+), 376 deletions(-)

-- 
2.39.2



* [RFC v4 linux-next 01/19] block: move two helpers into bdev.c
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:31   ` Jan Kara
  2024-03-17 21:19   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait() Yu Kuai
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

disk_live() and block_size() access bd_inode directly. Move them into
bdev.c in preparation for removing the bd_inode field from block_device,
so that bd_inode is only accessed inside the block layer.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           | 12 ++++++++++++
 include/linux/blkdev.h | 12 ++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 140093c99bdc..726a2805a1ce 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1196,6 +1196,18 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 	blkdev_put_no_open(bdev);
 }
 
+bool disk_live(struct gendisk *disk)
+{
+	return !inode_unhashed(disk->part0->bd_inode);
+}
+EXPORT_SYMBOL_GPL(disk_live);
+
+unsigned int block_size(struct block_device *bdev)
+{
+	return 1 << bdev->bd_inode->i_blkbits;
+}
+EXPORT_SYMBOL_GPL(block_size);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 06e854186947..eb1f6eeaddc5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -211,11 +211,6 @@ struct gendisk {
 	struct blk_independent_access_ranges *ia_ranges;
 };
 
-static inline bool disk_live(struct gendisk *disk)
-{
-	return !inode_unhashed(disk->part0->bd_inode);
-}
-
 /**
  * disk_openers - returns how many openers are there for a disk
  * @disk: disk to check
@@ -1359,11 +1354,6 @@ static inline unsigned int blksize_bits(unsigned int size)
 	return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
 }
 
-static inline unsigned int block_size(struct block_device *bdev)
-{
-	return 1 << bdev->bd_inode->i_blkbits;
-}
-
 int kblockd_schedule_work(struct work_struct *work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
 
@@ -1531,6 +1521,8 @@ void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
 struct block_device *file_bdev(struct file *bdev_file);
+bool disk_live(struct gendisk *disk);
+unsigned int block_size(struct block_device *bdev);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
-- 
2.39.2
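
For readers following along, here is a minimal userspace sketch of what
block_size() computes after the move. The mock_* types and names are
hypothetical stand-ins, not the kernel definitions; only the expression
1 << i_blkbits is taken from the patch above.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structs. */
struct mock_inode {
	unsigned char i_blkbits;	/* log2 of the block size in bytes */
};

struct mock_bdev {
	struct mock_inode *bd_inode;	/* the field the series wants to hide */
};

/*
 * Same arithmetic as the out-of-line block_size() in the patch:
 * the block size is 1 shifted left by i_blkbits.
 */
unsigned int mock_block_size(const struct mock_bdev *bdev)
{
	/* Guard against a shift wider than unsigned int (sketch only). */
	assert(bdev->bd_inode->i_blkbits < 32);
	return 1U << bdev->bd_inode->i_blkbits;
}
```

Because callers only see the function prototype, the way the inode is
reached can change later without touching them, which appears to be the
point of moving the helper out of the header.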



* [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
  2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:34   ` Jan Kara
  2024-03-17 21:19   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 03/19] block: remove sync_blockdev_range() Yu Kuai
                   ` (18 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to flush the file
mapping directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           | 8 --------
 fs/fat/inode.c         | 2 +-
 fs/ntfs3/inode.c       | 2 +-
 fs/sync.c              | 9 ++++++---
 include/linux/blkdev.h | 5 -----
 5 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 726a2805a1ce..49dcff483289 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -188,14 +188,6 @@ int sb_min_blocksize(struct super_block *sb, int size)
 
 EXPORT_SYMBOL(sb_min_blocksize);
 
-int sync_blockdev_nowait(struct block_device *bdev)
-{
-	if (!bdev)
-		return 0;
-	return filemap_flush(bdev->bd_inode->i_mapping);
-}
-EXPORT_SYMBOL_GPL(sync_blockdev_nowait);
-
 /*
  * Write out and wait upon all the dirty data associated with a block
  * device via its mapping.  Does not take the superblock lock.
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 5c813696d1ff..8527aef51841 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -1945,7 +1945,7 @@ int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
 	if (!ret && i2)
 		ret = writeback_inode(i2);
 	if (!ret)
-		ret = sync_blockdev_nowait(sb->s_bdev);
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(fat_flush_inodes);
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index eb7a8c9fba01..3c4c878f6d77 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -1081,7 +1081,7 @@ int ntfs_flush_inodes(struct super_block *sb, struct inode *i1,
 	if (!ret && i2)
 		ret = writeback_inode(i2);
 	if (!ret)
-		ret = sync_blockdev_nowait(sb->s_bdev);
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
 	return ret;
 }
 
diff --git a/fs/sync.c b/fs/sync.c
index dc725914e1ed..3a43062790d9 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -57,9 +57,12 @@ int sync_filesystem(struct super_block *sb)
 		if (ret)
 			return ret;
 	}
-	ret = sync_blockdev_nowait(sb->s_bdev);
-	if (ret)
-		return ret;
+
+	if (sb->s_bdev_file) {
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
+		if (ret)
+			return ret;
+	}
 
 	sync_inodes_sb(sb);
 	if (sb->s_op->sync_fs) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eb1f6eeaddc5..9e96811c8915 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1528,7 +1528,6 @@ unsigned int block_size(struct block_device *bdev);
 void invalidate_bdev(struct block_device *bdev);
 int sync_blockdev(struct block_device *bdev);
 int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
-int sync_blockdev_nowait(struct block_device *bdev);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
@@ -1541,10 +1540,6 @@ static inline int sync_blockdev(struct block_device *bdev)
 {
 	return 0;
 }
-static inline int sync_blockdev_nowait(struct block_device *bdev)
-{
-	return 0;
-}
 static inline void sync_bdevs(bool wait)
 {
 }
-- 
2.39.2
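
A small userspace sketch of the guard added to sync_filesystem() above:
the removed sync_blockdev_nowait() returned 0 for a NULL bdev, and the
replacement keeps that behaviour by checking s_bdev_file before flushing
its mapping. All mock_* names are hypothetical, and the flush itself is
reduced to a counter; this only illustrates the control flow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for address_space, file and super_block. */
struct mock_mapping { int flushes; };
struct mock_file { struct mock_mapping *f_mapping; };
struct mock_super_block { struct mock_file *s_bdev_file; };

/* Stand-in for filemap_flush(): just counts how often it was called. */
static int mock_filemap_flush(struct mock_mapping *mapping)
{
	assert(mapping != NULL);
	mapping->flushes++;
	return 0;
}

/*
 * Mirrors the shape of the new code in sync_filesystem(): skip the
 * flush when the superblock has no bdev file, otherwise flush the
 * file's mapping directly.
 */
int mock_sync_bdev_nowait(struct mock_super_block *sb)
{
	if (!sb->s_bdev_file)
		return 0;
	return mock_filemap_flush(sb->s_bdev_file->f_mapping);
}
```

Note that the fat and ntfs3 hunks above dereference sb->s_bdev_file
without such a check; whether that is always safe there is not something
this sketch can answer.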



* [RFC v4 linux-next 03/19] block: remove sync_blockdev_range()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
  2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
  2024-02-22 12:45 ` [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:37   ` Jan Kara
  2024-03-17 21:21   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to flush the file
mapping directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           |  7 -------
 fs/btrfs/dev-replace.c |  2 +-
 fs/btrfs/volumes.c     | 19 +++++++++++--------
 fs/btrfs/volumes.h     |  2 +-
 fs/exfat/fatent.c      |  2 +-
 include/linux/blkdev.h |  1 -
 6 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 49dcff483289..e493d5c72edb 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -200,13 +200,6 @@ int sync_blockdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(sync_blockdev);
 
-int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend)
-{
-	return filemap_write_and_wait_range(bdev->bd_inode->i_mapping,
-			lstart, lend);
-}
-EXPORT_SYMBOL(sync_blockdev_range);
-
 /**
  * bdev_freeze - lock a filesystem and force it into a consistent state
  * @bdev:	blockdevice to lock
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 7057221a46c3..88d45118cc64 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -982,7 +982,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	btrfs_sysfs_remove_device(src_device);
 	btrfs_sysfs_update_devid(tgt_device);
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state))
-		btrfs_scratch_superblocks(fs_info, src_device->bdev,
+		btrfs_scratch_superblocks(fs_info, src_device->bdev_file,
 					  src_device->name->str);
 
 	/* write back the superblocks */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 493e33b4ae94..e12451ff911a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2033,14 +2033,14 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
 }
 
 static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
-				     struct block_device *bdev, int copy_num)
+				     struct file *bdev_file, int copy_num)
 {
 	struct btrfs_super_block *disk_super;
 	const size_t len = sizeof(disk_super->magic);
 	const u64 bytenr = btrfs_sb_offset(copy_num);
 	int ret;
 
-	disk_super = btrfs_read_disk_super(bdev, bytenr, bytenr);
+	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
 	if (IS_ERR(disk_super))
 		return;
 
@@ -2048,26 +2048,29 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
 	folio_mark_dirty(virt_to_folio(disk_super));
 	btrfs_release_disk_super(disk_super);
 
-	ret = sync_blockdev_range(bdev, bytenr, bytenr + len - 1);
+	ret = filemap_write_and_wait_range(bdev_file->f_mapping,
+					   bytenr, bytenr + len - 1);
 	if (ret)
 		btrfs_warn(fs_info, "error clearing superblock number %d (%d)",
 			copy_num, ret);
 }
 
 void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
-			       struct block_device *bdev,
+			       struct file *bdev_file,
 			       const char *device_path)
 {
+	struct block_device *bdev;
 	int copy_num;
 
-	if (!bdev)
+	if (!bdev_file)
 		return;
 
+	bdev = file_bdev(bdev_file);
 	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
 		if (bdev_is_zoned(bdev))
 			btrfs_reset_sb_log_zones(bdev, copy_num);
 		else
-			btrfs_scratch_superblock(fs_info, bdev, copy_num);
+			btrfs_scratch_superblock(fs_info, bdev_file, copy_num);
 	}
 
 	/* Notify udev that device has changed */
@@ -2209,7 +2212,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
 	 *  just flush the device and let the caller do the final bdev_release.
 	 */
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
-		btrfs_scratch_superblocks(fs_info, device->bdev,
+		btrfs_scratch_superblocks(fs_info, device->bdev_file,
 					  device->name->str);
 		if (device->bdev) {
 			sync_blockdev(device->bdev);
@@ -2323,7 +2326,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
 
 	mutex_unlock(&fs_devices->device_list_mutex);
 
-	btrfs_scratch_superblocks(tgtdev->fs_info, tgtdev->bdev,
+	btrfs_scratch_superblocks(tgtdev->fs_info, tgtdev->bdev_file,
 				  tgtdev->name->str);
 
 	btrfs_close_bdev(tgtdev);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2ef78d3cc4c3..1d566f40b83d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -818,7 +818,7 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void);
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 					struct btrfs_device *failing_dev);
 void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
-			       struct block_device *bdev,
+			       struct file *bdev_file,
 			       const char *device_path);
 
 enum btrfs_raid_types __attribute_const__ btrfs_bg_flags_to_raid_index(u64 flags);
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 56b870d9cc0d..1c86ec2465b7 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -296,7 +296,7 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
 	}
 
 	if (IS_DIRSYNC(dir))
-		return sync_blockdev_range(sb->s_bdev,
+		return filemap_write_and_wait_range(sb->s_bdev_file->f_mapping,
 				EXFAT_BLK_TO_B(blknr, sb),
 				EXFAT_BLK_TO_B(last_blknr, sb) - 1);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9e96811c8915..c510f334c84f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1527,7 +1527,6 @@ unsigned int block_size(struct block_device *bdev);
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
 int sync_blockdev(struct block_device *bdev);
-int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
-- 
2.39.2



* [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (2 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 03/19] block: remove sync_blockdev_range() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:44   ` Jan Kara
                     ` (2 more replies)
  2024-02-22 12:45 ` [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors() Yu Kuai
                   ` (16 subsequent siblings)
  20 siblings, 3 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Add helpers to access bd_inode, in preparation for removing the
'bd_inode' field once all accesses from filesystems and drivers are gone.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c            | 58 +++++++++++++++++++++++++++--------------
 block/blk-zoned.c       |  4 +--
 block/blk.h             |  2 ++
 block/fops.c            |  2 +-
 block/genhd.c           |  9 ++++---
 block/ioctl.c           |  8 +++---
 block/partitions/core.c |  8 +++---
 7 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index e493d5c72edb..60a1479eae83 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -43,6 +43,21 @@ static inline struct bdev_inode *BDEV_I(struct inode *inode)
 	return container_of(inode, struct bdev_inode, vfs_inode);
 }
 
+static inline struct bdev_inode *BDEV_B(struct block_device *bdev)
+{
+	return container_of(bdev, struct bdev_inode, bdev);
+}
+
+struct inode *bdev_inode(struct block_device *bdev)
+{
+	return &BDEV_B(bdev)->vfs_inode;
+}
+
+struct address_space *bdev_mapping(struct block_device *bdev)
+{
+	return BDEV_B(bdev)->vfs_inode.i_mapping;
+}
+
 struct block_device *I_BDEV(struct inode *inode)
 {
 	return &BDEV_I(inode)->bdev;
@@ -57,7 +72,7 @@ EXPORT_SYMBOL(file_bdev);
 
 static void bdev_write_inode(struct block_device *bdev)
 {
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int ret;
 
 	spin_lock(&inode->i_lock);
@@ -76,7 +91,7 @@ static void bdev_write_inode(struct block_device *bdev)
 /* Kill _all_ buffers and pagecache , dirty or not.. */
 static void kill_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(bdev);
 
 	if (mapping_empty(mapping))
 		return;
@@ -88,7 +103,7 @@ static void kill_bdev(struct block_device *bdev)
 /* Invalidate clean unused buffers and pagecache. */
 void invalidate_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(bdev);
 
 	if (mapping->nrpages) {
 		invalidate_bh_lrus();
@@ -116,7 +131,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 			goto invalidate;
 	}
 
-	truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
+	truncate_inode_pages_range(bdev_mapping(bdev), lstart, lend);
 	if (!(mode & BLK_OPEN_EXCL))
 		bd_abort_claiming(bdev, truncate_bdev_range);
 	return 0;
@@ -126,7 +141,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 	 * Someone else has handle exclusively open. Try invalidating instead.
 	 * The 'end' argument is inclusive so the rounding is safe.
 	 */
-	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
+	return invalidate_inode_pages2_range(bdev_mapping(bdev),
 					     lstart >> PAGE_SHIFT,
 					     lend >> PAGE_SHIFT);
 }
@@ -134,14 +149,14 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 static void set_init_blocksize(struct block_device *bdev)
 {
 	unsigned int bsize = bdev_logical_block_size(bdev);
-	loff_t size = i_size_read(bdev->bd_inode);
+	loff_t size = i_size_read(bdev_inode(bdev));
 
 	while (bsize < PAGE_SIZE) {
 		if (size & bsize)
 			break;
 		bsize <<= 1;
 	}
-	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
+	bdev_inode(bdev)->i_blkbits = blksize_bits(bsize);
 }
 
 int set_blocksize(struct block_device *bdev, int size)
@@ -155,9 +170,9 @@ int set_blocksize(struct block_device *bdev, int size)
 		return -EINVAL;
 
 	/* Don't change the size if it is same as current */
-	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
+	if (bdev_inode(bdev)->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
+		bdev_inode(bdev)->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
 	return 0;
@@ -196,7 +211,7 @@ int sync_blockdev(struct block_device *bdev)
 {
 	if (!bdev)
 		return 0;
-	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
+	return filemap_write_and_wait(bdev_mapping(bdev));
 }
 EXPORT_SYMBOL(sync_blockdev);
 
@@ -415,19 +430,22 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
 {
 	spin_lock(&bdev->bd_size_lock);
-	i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+	i_size_write(bdev_inode(bdev), (loff_t)sectors << SECTOR_SHIFT);
 	bdev->bd_nr_sectors = sectors;
 	spin_unlock(&bdev->bd_size_lock);
 }
 
 void bdev_add(struct block_device *bdev, dev_t dev)
 {
+	struct inode *inode;
+
 	if (bdev_stable_writes(bdev))
-		mapping_set_stable_writes(bdev->bd_inode->i_mapping);
+		mapping_set_stable_writes(bdev_mapping(bdev));
 	bdev->bd_dev = dev;
-	bdev->bd_inode->i_rdev = dev;
-	bdev->bd_inode->i_ino = dev;
-	insert_inode_hash(bdev->bd_inode);
+	inode = bdev_inode(bdev);
+	inode->i_rdev = dev;
+	inode->i_ino = dev;
+	insert_inode_hash(inode);
 }
 
 long nr_blockdev_pages(void)
@@ -885,7 +903,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
 	if (bdev_nowait(bdev))
 		bdev_file->f_mode |= FMODE_NOWAIT;
-	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
+	bdev_file->f_mapping = bdev_mapping(bdev);
 	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
 	bdev_file->private_data = holder;
 
@@ -947,13 +965,13 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		return ERR_PTR(-ENXIO);
 
 	flags = blk_to_file_flags(mode);
-	bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
+	bdev_file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
 			blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
 	if (IS_ERR(bdev_file)) {
 		blkdev_put_no_open(bdev);
 		return bdev_file;
 	}
-	ihold(bdev->bd_inode);
+	ihold(bdev_inode(bdev));
 
 	ret = bdev_open(bdev, mode, holder, hops, bdev_file);
 	if (ret) {
@@ -1183,13 +1201,13 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 bool disk_live(struct gendisk *disk)
 {
-	return !inode_unhashed(disk->part0->bd_inode);
+	return !inode_unhashed(bdev_inode(disk->part0));
 }
 EXPORT_SYMBOL_GPL(disk_live);
 
 unsigned int block_size(struct block_device *bdev)
 {
-	return 1 << bdev->bd_inode->i_blkbits;
+	return 1 << bdev_inode(bdev)->i_blkbits;
 }
 EXPORT_SYMBOL_GPL(block_size);
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index d4f4f8325eff..ab022d990703 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -399,7 +399,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 		op = REQ_OP_ZONE_RESET;
 
 		/* Invalidate the page cache, including dirty pages. */
-		filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_lock(bdev_mapping(bdev));
 		ret = blkdev_truncate_zone_range(bdev, mode, &zrange);
 		if (ret)
 			goto fail;
@@ -421,7 +421,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 
 fail:
 	if (cmd == BLKRESETZONE)
-		filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_unlock(bdev_mapping(bdev));
 
 	return ret;
 }
diff --git a/block/blk.h b/block/blk.h
index 72bc8d27cc70..b612538588cb 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -414,6 +414,8 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
 }
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+struct inode *bdev_inode(struct block_device *bdev);
+struct address_space *bdev_mapping(struct block_device *bdev);
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
 void bdev_add(struct block_device *bdev, dev_t dev);
 
diff --git a/block/fops.c b/block/fops.c
index f4dcb9dd148d..1fcbdb131a8f 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -666,7 +666,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct block_device *bdev = I_BDEV(file->f_mapping->host);
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = bdev_inode(bdev);
 	loff_t size = bdev_nr_bytes(bdev);
 	size_t shorted = 0;
 	ssize_t ret;
diff --git a/block/genhd.c b/block/genhd.c
index 2f9834bdd14b..4f0f66b4798f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -656,7 +656,7 @@ void del_gendisk(struct gendisk *disk)
 	 */
 	mutex_lock(&disk->open_mutex);
 	xa_for_each(&disk->part_tbl, idx, part)
-		remove_inode_hash(part->bd_inode);
+		remove_inode_hash(bdev_inode(part));
 	mutex_unlock(&disk->open_mutex);
 
 	/*
@@ -745,7 +745,7 @@ void invalidate_disk(struct gendisk *disk)
 	struct block_device *bdev = disk->part0;
 
 	invalidate_bdev(bdev);
-	bdev->bd_inode->i_mapping->wb_err = 0;
+	bdev_mapping(bdev)->wb_err = 0;
 	set_capacity(disk, 0);
 }
 EXPORT_SYMBOL(invalidate_disk);
@@ -1191,7 +1191,8 @@ static void disk_release(struct device *dev)
 	if (test_bit(GD_ADDED, &disk->state) && disk->fops->free_disk)
 		disk->fops->free_disk(disk);
 
-	iput(disk->part0->bd_inode);	/* frees the disk */
+	/* frees the disk */
+	iput(bdev_inode(disk->part0));
 }
 
 static int block_uevent(const struct device *dev, struct kobj_uevent_env *env)
@@ -1381,7 +1382,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 out_destroy_part_tbl:
 	xa_destroy(&disk->part_tbl);
 	disk->part0->bd_disk = NULL;
-	iput(disk->part0->bd_inode);
+	iput(bdev_inode(disk->part0));
 out_free_bdi:
 	bdi_put(disk->bdi);
 out_free_bioset:
diff --git a/block/ioctl.c b/block/ioctl.c
index 4c8aebee595f..cb5b378cff38 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -90,7 +90,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, len;
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
@@ -144,12 +144,12 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
 	if (start + len > bdev_nr_bytes(bdev))
 		return -EINVAL;
 
-	filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_lock(bdev_mapping(bdev));
 	err = truncate_bdev_range(bdev, mode, start, start + len - 1);
 	if (!err)
 		err = blkdev_issue_secure_erase(bdev, start >> 9, len >> 9,
 						GFP_KERNEL);
-	filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_unlock(bdev_mapping(bdev));
 	return err;
 }
 
@@ -159,7 +159,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, end, len;
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 5f5ed5c75f04..6e91a4660588 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -243,7 +243,7 @@ static const struct attribute_group *part_attr_groups[] = {
 static void part_release(struct device *dev)
 {
 	put_disk(dev_to_bdev(dev)->bd_disk);
-	iput(dev_to_bdev(dev)->bd_inode);
+	iput(bdev_inode(dev_to_bdev(dev)));
 }
 
 static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
@@ -480,7 +480,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
 	 * Just delete the partition and invalidate it.
 	 */
 
-	remove_inode_hash(part->bd_inode);
+	remove_inode_hash(bdev_inode(part));
 	invalidate_bdev(part);
 	drop_partition(part);
 	ret = 0;
@@ -666,7 +666,7 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 		 * it cannot be looked up any more even when openers
 		 * still hold references.
 		 */
-		remove_inode_hash(part->bd_inode);
+		remove_inode_hash(bdev_inode(part));
 
 		/*
 		 * If @disk->open_partitions isn't elevated but there's
@@ -715,7 +715,7 @@ EXPORT_SYMBOL_GPL(bdev_disk_changed);
 
 void *read_part_sector(struct parsed_partitions *state, sector_t n, Sector *p)
 {
-	struct address_space *mapping = state->disk->part0->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(state->disk->part0);
 	struct folio *folio;
 
 	if (n >= get_capacity(state->disk)) {
-- 
2.39.2


* [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (3 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:42   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode Yu Kuai
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

bdev_sectors() has no remaining callers, so remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/bcachefs/util.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 1b3aced8d83c..e2d7f22df618 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -443,11 +443,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
 void bch2_bio_map(struct bio *bio, void *base, size_t);
 int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
 
-static inline sector_t bdev_sectors(struct block_device *bdev)
-{
-	return bdev->bd_inode->i_size >> 9;
-}
-
 #define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-- 
2.39.2


* [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (4 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:44   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the bdev
mapping from the file directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/cramfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
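
Reader's note: the equivalence this conversion relies on can be sketched
in userspace. The structures below are simplified stand-ins, not the
kernel definitions, and model_open_bdev_file() is a hypothetical helper
that only mimics how an opened block device file ends up pointing at the
bdev inode and its page cache mapping.

```c
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel structures. */
struct inode;

struct address_space {
	struct inode *host;		/* inode that owns this mapping */
};

struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};

struct file {
	struct inode *f_inode;
	struct address_space *f_mapping;
};

/* Same idea as the kernel's file_inode(): the inode behind a file. */
static struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/*
 * Hypothetical helper (assumption for illustration): model what opening
 * a block device as a file sets up, so that the file carries both the
 * bdev inode and that inode's mapping.
 */
static void model_open_bdev_file(struct file *f, struct inode *bd_inode)
{
	bd_inode->i_data.host = bd_inode;
	bd_inode->i_mapping = &bd_inode->i_data;
	f->f_inode = bd_inode;
	f->f_mapping = bd_inode->i_mapping;
}

/* Old spelling: bdev->bd_inode->i_mapping, modelled via the inode. */
static struct address_space *old_style_mapping(struct inode *bd_inode)
{
	return bd_inode->i_mapping;
}

/* New spelling: bdev_file->f_mapping, no bd_inode dereference needed. */
static struct address_space *new_style_mapping(const struct file *bdev_file)
{
	return bdev_file->f_mapping;
}
```

Under this model both spellings return the same address_space, which is
why a caller holding sb->s_bdev_file does not need to go through
bd_inode at all.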

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 39e75131fd5a..1df4dd89350e 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -183,7 +183,7 @@ static int next_buffer;
 static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 				unsigned int len)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev_file->f_mapping;
 	struct file_ra_state ra = {};
 	struct page *pages[BLKS_PER_BUF];
 	unsigned i, blocknr, buffer;
-- 
2.39.2


* [RFC v4 linux-next 07/19] erofs: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (5 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:45   ` Jan Kara
                     ` (2 more replies)
  2024-02-22 12:45 ` [RFC v4 linux-next 08/19] nilfs2: " Yu Kuai
                   ` (13 subsequent siblings)
  20 siblings, 3 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/erofs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 433fc39ba423..dc2d43abe8c5 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -70,7 +70,7 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 	if (erofs_is_fscache_mode(sb))
 		buf->inode = EROFS_SB(sb)->s_fscache->inode;
 	else
-		buf->inode = sb->s_bdev->bd_inode;
+		buf->inode = file_inode(sb->s_bdev_file);
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
-- 
2.39.2


* [RFC v4 linux-next 08/19] nilfs2: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (6 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:49   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 09/19] gfs2: " Yu Kuai
                   ` (12 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/nilfs2/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index aa5290cb7467..2940e8ef88f4 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2790,7 +2790,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
 	if (!nilfs->ns_writer)
 		return -ENOMEM;
 
-	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
+	inode_attach_wb(file_inode(nilfs->ns_sb->s_bdev_file), NULL);
 
 	err = nilfs_segctor_start_thread(nilfs->ns_writer);
 	if (unlikely(err))
-- 
2.39.2


* [RFC v4 linux-next 09/19] gfs2: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (7 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 08/19] nilfs2: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:54   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format() Yu Kuai
                   ` (11 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/gfs2/glock.c      | 2 +-
 fs/gfs2/ops_fstype.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 34540f9d011c..95ade8979f6b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1227,7 +1227,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	mapping = gfs2_glock2aspace(gl);
 	if (mapping) {
                 mapping->a_ops = &gfs2_meta_aops;
-		mapping->host = s->s_bdev->bd_inode;
+		mapping->host = file_inode(s->s_bdev_file);
 		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_NOFS);
 		mapping->i_private_data = NULL;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 572d58e86296..4384cb39b06c 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -114,7 +114,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	address_space_init_once(mapping);
 	mapping->a_ops = &gfs2_rgrp_aops;
-	mapping->host = sb->s_bdev->bd_inode;
+	mapping->host = file_inode(sb->s_bdev_file);
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	mapping->i_private_data = NULL;
-- 
2.39.2


* [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (8 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 09/19] gfs2: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:55   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode Yu Kuai
                   ` (10 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/s390/block/dasd_ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
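
Reader's note: unlike the old direct store to i_blkbits, the call to
set_blocksize() can fail, and the patch returns that error to the caller.
The sketch below is a simplified userspace model of that difference. All
model_* names, the 4 KiB page size, and the exact set of checks are
assumptions for illustration; this is not a copy of the kernel function.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Stand-in for the device state touched by the patch. */
struct model_bdev {
	unsigned int logical_block_size;	/* smallest size the device accepts */
	unsigned int blkbits;			/* log2 of the current block size */
};

/* log2 of a power-of-two block size, e.g. 512 -> 9, 4096 -> 12. */
static unsigned int model_blksize_bits(unsigned int size)
{
	unsigned int bits = 0;

	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}

static bool model_is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Old behaviour: store the bits unconditionally, no validation. */
static void model_store_blkbits(struct model_bdev *bdev, unsigned int size)
{
	bdev->blkbits = model_blksize_bits(size);
}

/* New behaviour (modelled): validate first, leave state alone on error. */
static int model_set_blocksize(struct model_bdev *bdev, unsigned int size)
{
	if (size > MODEL_PAGE_SIZE || size < 512 || !model_is_power_of_2(size))
		return -EINVAL;
	if (size < bdev->logical_block_size)
		return -EINVAL;
	model_store_blkbits(bdev, size);
	return 0;
}
```

In this model an invalid fdata->blksize leaves blkbits untouched and
surfaces as -EINVAL, whereas the unconditional store would have recorded
a meaningless shift value.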

diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 7e0ed7032f76..c1201590f343 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
 	 * enabling the device later.
 	 */
 	if (fdata->start_unit == 0) {
-		block->gdp->part0->bd_inode->i_blkbits =
-			blksize_bits(fdata->blksize);
+		rc = set_blocksize(block->gdp->part0, fdata->blksize);
+		if (rc)
+			return rc;
 	}
 
 	rc = base->discipline->format_device(base, fdata, 1);
-- 
2.39.2


* [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (9 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 15:09   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 12/19] ext4: remove block_device_ejected() Yu Kuai
                   ` (9 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode or
mapping from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/btrfs/disk-io.c | 17 +++++++++--------
 fs/btrfs/disk-io.h |  4 ++--
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 15 +++++++--------
 fs/btrfs/zoned.c   | 20 +++++++++++---------
 fs/btrfs/zoned.h   |  4 ++--
 6 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bececdd63b4d..344955765f3e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3235,7 +3235,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	/*
 	 * Read super block and check the signature bytes only
 	 */
-	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
+	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev_file);
 	if (IS_ERR(disk_super)) {
 		ret = PTR_ERR(disk_super);
 		goto fail_alloc;
@@ -3656,17 +3656,18 @@ static void btrfs_end_super_write(struct bio *bio)
 	bio_put(bio);
 }
 
-struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
+struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
 						   int copy_num, bool drop_cache)
 {
 	struct btrfs_super_block *super;
 	struct page *page;
 	u64 bytenr, bytenr_orig;
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct block_device *bdev = file_bdev(bdev_file);
+	struct address_space *mapping = bdev_file->f_mapping;
 	int ret;
 
 	bytenr_orig = btrfs_sb_offset(copy_num);
-	ret = btrfs_sb_log_location_bdev(bdev, copy_num, READ, &bytenr);
+	ret = btrfs_sb_log_location_bdev(bdev_file, copy_num, READ, &bytenr);
 	if (ret == -ENOENT)
 		return ERR_PTR(-EINVAL);
 	else if (ret)
@@ -3707,7 +3708,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 }
 
 
-struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
+struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file)
 {
 	struct btrfs_super_block *super, *latest = NULL;
 	int i;
@@ -3719,7 +3720,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	for (i = 0; i < 1; i++) {
-		super = btrfs_read_dev_one_super(bdev, i, false);
+		super = btrfs_read_dev_one_super(bdev_file, i, false);
 		if (IS_ERR(super))
 			continue;
 
@@ -3749,7 +3750,7 @@ static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
+	struct address_space *mapping = device->bdev_file->f_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
 	int errors = 0;
@@ -3866,7 +3867,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		    device->commit_total_bytes)
 			break;
 
-		page = find_get_page(device->bdev->bd_inode->i_mapping,
+		page = find_get_page(device->bdev_file->f_mapping,
 				     bytenr >> PAGE_SHIFT);
 		if (!page) {
 			errors++;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 375f62ae3709..2c627885d8d1 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -60,8 +60,8 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 			 struct btrfs_super_block *sb, int mirror_num);
 int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount);
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
-struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev);
-struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
+struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file);
+struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
 						   int copy_num, bool drop_cache);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 40ae264fd3ed..9f50f20a1ba4 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2286,7 +2286,7 @@ static int check_dev_super(struct btrfs_device *dev)
 		return 0;
 
 	/* Only need to check the primary super block. */
-	sb = btrfs_read_dev_one_super(dev->bdev, 0, true);
+	sb = btrfs_read_dev_one_super(dev->bdev_file, 0, true);
 	if (IS_ERR(sb))
 		return PTR_ERR(sb);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e12451ff911a..9fccfb156bd2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -488,7 +488,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 		goto error;
 	}
 	invalidate_bdev(bdev);
-	*disk_super = btrfs_read_dev_super(bdev);
+	*disk_super = btrfs_read_dev_super(*bdev_file);
 	if (IS_ERR(*disk_super)) {
 		ret = PTR_ERR(*disk_super);
 		fput(*bdev_file);
@@ -1244,7 +1244,7 @@ void btrfs_release_disk_super(struct btrfs_super_block *super)
 	put_page(page);
 }
 
-static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
+static struct btrfs_super_block *btrfs_read_disk_super(struct file *bdev_file,
 						       u64 bytenr, u64 bytenr_orig)
 {
 	struct btrfs_super_block *disk_super;
@@ -1253,7 +1253,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 	pgoff_t index;
 
 	/* make sure our super fits in the device */
-	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(bdev))
+	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(file_bdev(bdev_file)))
 		return ERR_PTR(-EINVAL);
 
 	/* make sure our super fits in the page */
@@ -1266,7 +1266,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 		return ERR_PTR(-EINVAL);
 
 	/* pull in the page with our super */
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
+	page = read_cache_page_gfp(bdev_file->f_mapping, index, GFP_KERNEL);
 
 	if (IS_ERR(page))
 		return ERR_CAST(page);
@@ -1368,14 +1368,13 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 		return ERR_CAST(bdev_file);
 
 	bytenr_orig = btrfs_sb_offset(0);
-	ret = btrfs_sb_log_location_bdev(file_bdev(bdev_file), 0, READ, &bytenr);
+	ret = btrfs_sb_log_location_bdev(bdev_file, 0, READ, &bytenr);
 	if (ret) {
 		device = ERR_PTR(ret);
 		goto error_bdev_put;
 	}
 
-	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr,
-					   bytenr_orig);
+	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr_orig);
 	if (IS_ERR(disk_super)) {
 		device = ERR_CAST(disk_super);
 		goto error_bdev_put;
@@ -2040,7 +2039,7 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
 	const u64 bytenr = btrfs_sb_offset(copy_num);
 	int ret;
 
-	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
+	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr);
 	if (IS_ERR(disk_super))
 		return;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 12d77aba0148..9e4e2951cdf5 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -81,7 +81,7 @@ static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data
 	return 0;
 }
 
-static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
+static int sb_write_pointer(struct file *bdev_file, struct blk_zone *zones,
 			    u64 *wp_ret)
 {
 	bool empty[BTRFS_NR_SB_LOG_ZONES];
@@ -118,7 +118,7 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 		return -ENOENT;
 	} else if (full[0] && full[1]) {
 		/* Compare two super blocks */
-		struct address_space *mapping = bdev->bd_inode->i_mapping;
+		struct address_space *mapping = bdev_file->f_mapping;
 		struct page *page[BTRFS_NR_SB_LOG_ZONES];
 		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
 		int i;
@@ -562,7 +562,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
 		    BLK_ZONE_TYPE_CONVENTIONAL)
 			continue;
 
-		ret = sb_write_pointer(device->bdev,
+		ret = sb_write_pointer(device->bdev_file,
 				       &zone_info->sb_zones[sb_pos], &sb_wp);
 		if (ret != -ENOENT && ret) {
 			btrfs_err_in_rcu(device->fs_info,
@@ -798,7 +798,7 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount
 	return 0;
 }
 
-static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
+static int sb_log_location(struct file *bdev_file, struct blk_zone *zones,
 			   int rw, u64 *bytenr_ret)
 {
 	u64 wp;
@@ -809,7 +809,7 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 		return 0;
 	}
 
-	ret = sb_write_pointer(bdev, zones, &wp);
+	ret = sb_write_pointer(bdev_file, zones, &wp);
 	if (ret != -ENOENT && ret < 0)
 		return ret;
 
@@ -827,7 +827,8 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 			ASSERT(sb_zone_is_full(reset));
 
 			nofs_flags = memalloc_nofs_save();
-			ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
+			ret = blkdev_zone_mgmt(file_bdev(bdev_file),
+					       REQ_OP_ZONE_RESET,
 					       reset->start, reset->len);
 			memalloc_nofs_restore(nofs_flags);
 			if (ret)
@@ -859,10 +860,11 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 
 }
 
-int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
+int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
 			       u64 *bytenr_ret)
 {
 	struct blk_zone zones[BTRFS_NR_SB_LOG_ZONES];
+	struct block_device *bdev = file_bdev(bdev_file);
 	sector_t zone_sectors;
 	u32 sb_zone;
 	int ret;
@@ -896,7 +898,7 @@ int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
 	if (ret != BTRFS_NR_SB_LOG_ZONES)
 		return -EIO;
 
-	return sb_log_location(bdev, zones, rw, bytenr_ret);
+	return sb_log_location(bdev_file, zones, rw, bytenr_ret);
 }
 
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
@@ -920,7 +922,7 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 	if (zone_num + 1 >= zinfo->nr_zones)
 		return -ENOENT;
 
-	return sb_log_location(device->bdev,
+	return sb_log_location(device->bdev_file,
 			       &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror],
 			       rw, bytenr_ret);
 }
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 77c4321e331f..32680a04aa1f 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -61,7 +61,7 @@ void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 struct btrfs_zoned_device_info *btrfs_clone_dev_zone_info(struct btrfs_device *orig_dev);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
 int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount_opt);
-int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
+int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
 			       u64 *bytenr_ret);
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 			  u64 *bytenr_ret);
@@ -142,7 +142,7 @@ static inline int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info,
 	return 0;
 }
 
-static inline int btrfs_sb_log_location_bdev(struct block_device *bdev,
+static inline int btrfs_sb_log_location_bdev(struct file *bdev_file,
 					     int mirror, int rw, u64 *bytenr_ret)
 {
 	*bytenr_ret = btrfs_sb_offset(mirror);
-- 
2.39.2


* [RFC v4 linux-next 12/19] ext4: remove block_device_ejected()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (10 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-02-22 12:45 ` [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode Yu Kuai
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

block_device_ejected() was added by commit bdfe0cbd746a ("Revert
"ext4: remove block_device_ejected"") in 2015. At that time 'bdi->wb'
was destroyed synchronously from del_gendisk(), so if ext4 was still
mounted, a later mark_buffer_dirty() would reference the destroyed 'wb'.
However, this problem no longer exists:

- commit d03f6cdc1fc4 ("block: Dynamically allocate and refcount
backing_dev_info") switched bdi to use refcounting;
- commit 13eec2363ef0 ("fs: Get proper reference for s_bdi") grabs an
additional reference to the bdi while mounting, so that 'bdi->wb' will not
be destroyed until generic_shutdown_super().

Hence remove the dead function block_device_ejected().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/super.c | 18 ------------------
 1 file changed, 18 deletions(-)
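
Reader's note: the lifetime argument in the commit message can be
sketched as a small userspace refcount model. The model_* names are
hypothetical and the structure is a stand-in, not the kernel's
backing_dev_info; the point is only that a reference taken at mount time
keeps 'wb' usable after the disk drops its own reference.

```c
#include <stdbool.h>

/* Stand-in for a refcounted bdi; NOT the kernel structure. */
struct model_bdi {
	int refcount;
	bool wb_alive;		/* false once the last reference is gone */
};

static void model_bdi_get(struct model_bdi *bdi)
{
	bdi->refcount++;
}

static void model_bdi_put(struct model_bdi *bdi)
{
	if (--bdi->refcount == 0)
		bdi->wb_alive = false;	/* torn down only on the final put */
}

/* Mount takes its own reference (modelled after the s_bdi change). */
static void model_mount(struct model_bdi *bdi)
{
	model_bdi_get(bdi);
}

/* Disk removal drops only the disk's reference. */
static void model_disk_removed(struct model_bdi *bdi)
{
	model_bdi_put(bdi);
}

/* Superblock shutdown drops the reference taken at mount. */
static void model_shutdown_super(struct model_bdi *bdi)
{
	model_bdi_put(bdi);
}

/* What a mark_buffer_dirty()-style user cares about: is 'wb' still valid? */
static bool model_wb_usable(const struct model_bdi *bdi)
{
	return bdi->wb_alive;
}
```

With the mount-time reference in place, model_wb_usable() stays true
between disk removal and superblock shutdown, which is the window the
removed block_device_ejected() check used to guard.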

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e487623f9456..2d82b9d4b079 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -492,22 +492,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else.  Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
-	struct inode *bd_inode = sb->s_bdev->bd_inode;
-	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
-	return bdi->dev == NULL;
-}
-
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 {
 	struct super_block		*sb = journal->j_private;
@@ -6176,8 +6160,6 @@ static int ext4_commit_super(struct super_block *sb)
 
 	if (!sbh)
 		return -EINVAL;
-	if (block_device_ejected(sb))
-		return -ENODEV;
 
 	ext4_update_super(sb);
 
-- 
2.39.2


* [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (11 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 12/19] ext4: remove block_device_ejected() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 14:58   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 14/19] jbd2: " Yu Kuai
                   ` (7 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the mapping
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/ext4/dir.c       | 2 +-
 fs/ext4/ext4_jbd2.c | 2 +-
 fs/ext4/super.c     | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..0733bc1eec7a 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -192,7 +192,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 					(PAGE_SHIFT - inode->i_blkbits);
 			if (!ra_has_index(&file->f_ra, index))
 				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
+					sb->s_bdev_file->f_mapping,
 					&file->f_ra, file,
 					index, 1);
 			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 5d8055161acd..dbb9aff07ac1 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,7 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
 
 static void ext4_check_bdev_write_error(struct super_block *sb)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev_file->f_mapping;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int err;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2d82b9d4b079..55b3df71bf5e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -244,7 +244,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
 struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 				   blk_opf_t op_flags)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
 			~__GFP_FS) | __GFP_MOVABLE;
 
 	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
@@ -253,7 +253,7 @@ struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 					    sector_t block)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
 			~__GFP_FS);
 
 	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
@@ -5560,7 +5560,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+	errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err,
 				 &sbi->s_bdev_wb_err);
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
-- 
2.39.2


* [RFC v4 linux-next 14/19] jbd2: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (12 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 15:06   ` Jan Kara
  2024-03-17 21:26   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 15/19] bcache: " Yu Kuai
                   ` (6 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the mapping
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/ext4/super.c      |  2 +-
 fs/jbd2/journal.c    | 26 +++++++++++++++-----------
 include/linux/jbd2.h | 18 ++++++++++++++----
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 55b3df71bf5e..4df1a5cfe0a5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5918,7 +5918,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
 	if (IS_ERR(bdev_file))
 		return ERR_CAST(bdev_file);
 
-	journal = jbd2_journal_init_dev(file_bdev(bdev_file), sb->s_bdev, j_start,
+	journal = jbd2_journal_init_dev(bdev_file, sb->s_bdev_file, j_start,
 					j_len, sb->s_blocksize);
 	if (IS_ERR(journal)) {
 		ext4_msg(sb, KERN_ERR, "failed to create device journal");
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..abd42a6ccd0e 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1516,11 +1516,12 @@ static int journal_load_superblock(journal_t *journal)
  * very few fields yet: that has to wait until we have created the
  * journal structures from from scratch, or loaded them from disk. */
 
-static journal_t *journal_init_common(struct block_device *bdev,
-			struct block_device *fs_dev,
+static journal_t *journal_init_common(struct file *bdev_file,
+			struct file *fs_dev_file,
 			unsigned long long start, int len, int blocksize)
 {
 	static struct lock_class_key jbd2_trans_commit_key;
+	struct block_device *bdev = file_bdev(bdev_file);
 	journal_t *journal;
 	int err;
 	int n;
@@ -1531,7 +1532,9 @@ static journal_t *journal_init_common(struct block_device *bdev,
 
 	journal->j_blocksize = blocksize;
 	journal->j_dev = bdev;
-	journal->j_fs_dev = fs_dev;
+	journal->j_dev_file = bdev_file;
+	journal->j_fs_dev = file_bdev(fs_dev_file);
+	journal->j_fs_dev_file = fs_dev_file;
 	journal->j_blk_offset = start;
 	journal->j_total_len = len;
 	jbd2_init_fs_dev_write_error(journal);
@@ -1628,8 +1631,8 @@ static journal_t *journal_init_common(struct block_device *bdev,
 
 /**
  *  journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure
- *  @bdev: Block device on which to create the journal
- *  @fs_dev: Device which hold journalled filesystem for this journal.
+ *  @bdev_file: Opened block device on which to create the journal
+ *  @fs_dev_file: Opened device which holds the journalled filesystem for this journal.
  *  @start: Block nr Start of journal.
  *  @len:  Length of the journal in blocks.
  *  @blocksize: blocksize of journalling device
@@ -1640,13 +1643,13 @@ static journal_t *journal_init_common(struct block_device *bdev,
  *  range of blocks on an arbitrary block device.
  *
  */
-journal_t *jbd2_journal_init_dev(struct block_device *bdev,
-			struct block_device *fs_dev,
+journal_t *jbd2_journal_init_dev(struct file *bdev_file,
+			struct file *fs_dev_file,
 			unsigned long long start, int len, int blocksize)
 {
 	journal_t *journal;
 
-	journal = journal_init_common(bdev, fs_dev, start, len, blocksize);
+	journal = journal_init_common(bdev_file, fs_dev_file, start, len, blocksize);
 	if (IS_ERR(journal))
 		return ERR_CAST(journal);
 
@@ -1683,8 +1686,9 @@ journal_t *jbd2_journal_init_inode(struct inode *inode)
 		  inode->i_sb->s_id, inode->i_ino, (long long) inode->i_size,
 		  inode->i_sb->s_blocksize_bits, inode->i_sb->s_blocksize);
 
-	journal = journal_init_common(inode->i_sb->s_bdev, inode->i_sb->s_bdev,
-			blocknr, inode->i_size >> inode->i_sb->s_blocksize_bits,
+	journal = journal_init_common(inode->i_sb->s_bdev_file,
+			inode->i_sb->s_bdev_file, blocknr,
+			inode->i_size >> inode->i_sb->s_blocksize_bits,
 			inode->i_sb->s_blocksize);
 	if (IS_ERR(journal))
 		return ERR_CAST(journal);
@@ -2009,7 +2013,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		byte_count = (block_stop - block_start + 1) *
 				journal->j_blocksize;
 
-		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
+		truncate_inode_pages_range(journal->j_dev_file->f_mapping,
 				byte_start, byte_stop);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..fc26730ae8ef 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -968,6 +968,11 @@ struct journal_s
 	 */
 	struct block_device	*j_dev;
 
+	/**
+	 * @j_dev_file: Opened device @j_dev.
+	 */
+	struct file		*j_dev_file;
+
 	/**
 	 * @j_blocksize: Block size for the location where we store the journal.
 	 */
@@ -993,6 +998,11 @@ struct journal_s
 	 */
 	struct block_device	*j_fs_dev;
 
+	/**
+	 * @j_fs_dev_file: Opened device @j_fs_dev.
+	 */
+	struct file		*j_fs_dev_file;
+
 	/**
 	 * @j_fs_dev_wb_err:
 	 *
@@ -1533,8 +1543,8 @@ extern void	 jbd2_journal_unlock_updates (journal_t *);
 
 void jbd2_journal_wait_updates(journal_t *);
 
-extern journal_t * jbd2_journal_init_dev(struct block_device *bdev,
-				struct block_device *fs_dev,
+extern journal_t *jbd2_journal_init_dev(struct file *bdev_file,
+				struct file *fs_dev_file,
 				unsigned long long start, int len, int bsize);
 extern journal_t * jbd2_journal_init_inode (struct inode *);
 extern int	   jbd2_journal_update_format (journal_t *);
@@ -1696,7 +1706,7 @@ static inline void jbd2_journal_abort_handle(handle_t *handle)
 
 static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
 
 	/*
 	 * Save the original wb_err value of client fs's bdev mapping which
@@ -1707,7 +1717,7 @@ static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 
 static inline int jbd2_check_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
 
 	return errseq_check(&mapping->wb_err,
 			    READ_ONCE(journal->j_fs_dev_wb_err));
-- 
2.39.2



* [RFC v4 linux-next 15/19] bcache: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (13 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 14/19] jbd2: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 15:11   ` Jan Kara
  2024-03-17 21:34   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 16/19] block2mtd: " Yu Kuai
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that bcache stashes the file of the opened bdev, it's ok to get the
mapping from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/bcache/super.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4153c9ddbe0b..ec9efa79d5a8 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -163,15 +163,16 @@ static const char *read_super_common(struct cache_sb *sb,  struct block_device *
 }
 
 
-static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
+static const char *read_super(struct cache_sb *sb, struct file *bdev_file,
 			      struct cache_sb_disk **res)
 {
 	const char *err;
+	struct block_device *bdev = file_bdev(bdev_file);
 	struct cache_sb_disk *s;
 	struct page *page;
 	unsigned int i;
 
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+	page = read_cache_page_gfp(bdev_file->f_mapping,
 				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
 	if (IS_ERR(page))
 		return "IO error";
@@ -2564,7 +2565,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (set_blocksize(file_bdev(bdev_file), 4096))
 		goto out_blkdev_put;
 
-	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
+	err = read_super(sb, bdev_file, &sb_disk);
 	if (err)
 		goto out_blkdev_put;
 
-- 
2.39.2



* [RFC v4 linux-next 16/19] block2mtd: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (14 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 15/19] bcache: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-15 15:12   ` Jan Kara
  2024-03-17 21:36   ` Christoph Hellwig
  2024-02-22 12:45 ` [RFC v4 linux-next 17/19] dm-vdo: " Yu Kuai
                   ` (4 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that block2mtd stashes the file of the opened bdev, it's ok to get the
inode from the file.
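
As a rough illustration (not part of the patch), the two size computations
that add_device() performs on the value read from the inode can be modelled
in userspace as below. The 4 KiB page size and the names with a _MODEL suffix
or without a kernel counterpart are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages; the kernel takes PAGE_SIZE from the architecture. */
#define PAGE_SIZE_MODEL 4096LL
#define PAGE_MASK_MODEL (~(PAGE_SIZE_MODEL - 1))

/* True when the device size is a whole number of erase blocks. */
static bool erase_size_divides(long long size, int erase_size)
{
	assert(erase_size > 0);
	return size % erase_size == 0;
}

/* Usable MTD size: the device size rounded down to a page boundary. */
static long long mtd_usable_size(long long size)
{
	return size & PAGE_MASK_MODEL;
}
```

Reading the size once into a local variable, as the patch does, lets both
checks work from the same value.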

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/mtd/devices/block2mtd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 97a00ec9a4d4..e9ecb3286dcb 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -265,6 +265,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 	struct file *bdev_file;
 	struct block_device *bdev;
 	struct block2mtd_dev *dev;
+	loff_t size;
 	char *name;
 
 	if (!devname)
@@ -291,7 +292,8 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		goto err_free_block2mtd;
 	}
 
-	if ((long)bdev->bd_inode->i_size % erase_size) {
+	size = i_size_read(file_inode(bdev_file));
+	if ((long)size % erase_size) {
 		pr_err("erasesize must be a divisor of device size\n");
 		goto err_free_block2mtd;
 	}
@@ -309,7 +311,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 
 	dev->mtd.name = name;
 
-	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+	dev->mtd.size = size & PAGE_MASK;
 	dev->mtd.erasesize = erase_size;
 	dev->mtd.writesize = 1;
 	dev->mtd.writebufsize = PAGE_SIZE;
-- 
2.39.2



* [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (15 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 16/19] block2mtd: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-02-28 13:41   ` Christoph Hellwig
  2024-03-18  9:19   ` Jan Kara
  2024-02-22 12:45 ` [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable() Yu Kuai
                   ` (3 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Now that the dm upper layer already stashes the file of the opened device in
'dm_dev->bdev_file', it's ok to get the inode from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-vdo/dedupe.c                |  3 ++-
 drivers/md/dm-vdo/dm-vdo-target.c         |  5 +++--
 drivers/md/dm-vdo/indexer/config.c        |  1 +
 drivers/md/dm-vdo/indexer/config.h        |  3 +++
 drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
 drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
 drivers/md/dm-vdo/indexer/index-session.c | 13 +++++++------
 drivers/md/dm-vdo/indexer/index.c         |  4 ++--
 drivers/md/dm-vdo/indexer/index.h         |  2 +-
 drivers/md/dm-vdo/indexer/indexer.h       |  4 +++-
 drivers/md/dm-vdo/indexer/io-factory.c    | 13 ++++++++-----
 drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
 drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
 drivers/md/dm-vdo/indexer/volume.h        |  2 +-
 14 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
index a9b189395592..532294a15174 100644
--- a/drivers/md/dm-vdo/dedupe.c
+++ b/drivers/md/dm-vdo/dedupe.c
@@ -2592,7 +2592,8 @@ static void resume_index(void *context, struct vdo_completion *parent)
 	int result;
 
 	zones->parameters.bdev = config->owned_device->bdev;
-	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
+	zones->parameters.bdev_file = config->owned_device->bdev_file;
+	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev_file);
 	if (result != UDS_SUCCESS)
 		vdo_log_error_strerror(result, "Error resuming dedupe index");
 
diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
index 89d00be9f075..b2d7f68e70be 100644
--- a/drivers/md/dm-vdo/dm-vdo-target.c
+++ b/drivers/md/dm-vdo/dm-vdo-target.c
@@ -883,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
 	}
 
 	if (config->version == 0) {
-		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
+		u64 device_size = i_size_read(file_inode(config->owned_device->bdev_file));
 
 		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
 	}
@@ -1018,7 +1018,8 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
 
 static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
 {
-	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
+	return i_size_read(file_inode(vdo->device_config->owned_device->bdev_file)) /
+		VDO_BLOCK_SIZE;
 }
 
 static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
index 260993ce1944..f1f66e232b54 100644
--- a/drivers/md/dm-vdo/indexer/config.c
+++ b/drivers/md/dm-vdo/indexer/config.c
@@ -347,6 +347,7 @@ int uds_make_configuration(const struct uds_parameters *params,
 	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
 	config->nonce = params->nonce;
 	config->bdev = params->bdev;
+	config->bdev_file = params->bdev_file;
 	config->offset = params->offset;
 	config->size = params->size;
 
diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
index fe7958263ed6..688f7450183e 100644
--- a/drivers/md/dm-vdo/indexer/config.h
+++ b/drivers/md/dm-vdo/indexer/config.h
@@ -28,6 +28,9 @@ struct uds_configuration {
 	/* Storage device for the index */
 	struct block_device *bdev;
 
+	/* Opened device for the index */
+	struct file *bdev_file;
+
 	/* The maximum allowable size of the index */
 	size_t size;
 
diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
index 1453fddaa656..6dd80a432fe5 100644
--- a/drivers/md/dm-vdo/indexer/index-layout.c
+++ b/drivers/md/dm-vdo/indexer/index-layout.c
@@ -1672,7 +1672,7 @@ static int create_layout_factory(struct index_layout *layout,
 	size_t writable_size;
 	struct io_factory *factory = NULL;
 
-	result = uds_make_io_factory(config->bdev, &factory);
+	result = uds_make_io_factory(config->bdev_file, &factory);
 	if (result != UDS_SUCCESS)
 		return result;
 
@@ -1745,9 +1745,9 @@ void vdo_free_index_layout(struct index_layout *layout)
 }
 
 int uds_replace_index_layout_storage(struct index_layout *layout,
-				     struct block_device *bdev)
+				     struct file *bdev_file)
 {
-	return uds_replace_storage(layout->factory, bdev);
+	return uds_replace_storage(layout->factory, bdev_file);
 }
 
 /* Obtain a dm_bufio_client for the volume region. */
diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
index bd9b90c84a70..9b0c850fe9a7 100644
--- a/drivers/md/dm-vdo/indexer/index-layout.h
+++ b/drivers/md/dm-vdo/indexer/index-layout.h
@@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
 void vdo_free_index_layout(struct index_layout *layout);
 
 int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
-						  struct block_device *bdev);
+						  struct file *bdev_file);
 
 int __must_check uds_load_index_state(struct index_layout *layout,
 				      struct uds_index *index);
diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
index 1949a2598656..df8f8122a22d 100644
--- a/drivers/md/dm-vdo/indexer/index-session.c
+++ b/drivers/md/dm-vdo/indexer/index-session.c
@@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
 	return uds_status_to_errno(result);
 }
 
-static int replace_device(struct uds_index_session *session, struct block_device *bdev)
+static int replace_device(struct uds_index_session *session, struct file *bdev_file)
 {
 	int result;
 
-	result = uds_replace_index_storage(session->index, bdev);
+	result = uds_replace_index_storage(session->index, bdev_file);
 	if (result != UDS_SUCCESS)
 		return result;
 
-	session->parameters.bdev = bdev;
+	session->parameters.bdev = file_bdev(bdev_file);
+	session->parameters.bdev_file = bdev_file;
 	return UDS_SUCCESS;
 }
 
@@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
  * device differs from the current backing store, the index will start using the new backing store.
  */
 int uds_resume_index_session(struct uds_index_session *session,
-			     struct block_device *bdev)
+			     struct file *bdev_file)
 {
 	int result = UDS_SUCCESS;
 	bool no_work = false;
@@ -502,8 +503,8 @@ int uds_resume_index_session(struct uds_index_session *session,
 	if (no_work)
 		return result;
 
-	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
-		result = replace_device(session, bdev);
+	if ((session->index != NULL) && (bdev_file != session->parameters.bdev_file)) {
+		result = replace_device(session, bdev_file);
 		if (result != UDS_SUCCESS) {
 			mutex_lock(&session->request_mutex);
 			session->state &= ~IS_FLAG_WAITING;
diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
index bd2405738c50..3600a169ca98 100644
--- a/drivers/md/dm-vdo/indexer/index.c
+++ b/drivers/md/dm-vdo/indexer/index.c
@@ -1334,9 +1334,9 @@ int uds_save_index(struct uds_index *index)
 	return result;
 }
 
-int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
+int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
 {
-	return uds_replace_volume_storage(index->volume, index->layout, bdev);
+	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
 }
 
 /* Accessing statistics should be safe from any thread. */
diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
index 7fbc63db4131..9428ee025cda 100644
--- a/drivers/md/dm-vdo/indexer/index.h
+++ b/drivers/md/dm-vdo/indexer/index.h
@@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
 void vdo_free_index(struct uds_index *index);
 
 int __must_check uds_replace_index_storage(struct uds_index *index,
-					   struct block_device *bdev);
+					   struct file *bdev_file);
 
 void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
 
diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
index a832a34d9436..5dd2c93f12c2 100644
--- a/drivers/md/dm-vdo/indexer/indexer.h
+++ b/drivers/md/dm-vdo/indexer/indexer.h
@@ -130,6 +130,8 @@ struct uds_volume_record {
 struct uds_parameters {
 	/* The block_device used for storage */
 	struct block_device *bdev;
+	/* The opened block_device */
+	struct file *bdev_file;
 	/* The maximum allowable size of the index on storage */
 	size_t size;
 	/* The offset where the index should start */
@@ -314,7 +316,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
  * start using the new backing store instead.
  */
 int __must_check uds_resume_index_session(struct uds_index_session *session,
-					  struct block_device *bdev);
+					  struct file *bdev_file);
 
 /* Wait until all outstanding index operations are complete. */
 int __must_check uds_flush_index_session(struct uds_index_session *session);
diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
index 61104d5ccd61..a855c3ac73bc 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.c
+++ b/drivers/md/dm-vdo/indexer/io-factory.c
@@ -23,6 +23,7 @@
  */
 struct io_factory {
 	struct block_device *bdev;
+	struct file *bdev_file;
 	atomic_t ref_count;
 };
 
@@ -59,7 +60,7 @@ static void uds_get_io_factory(struct io_factory *factory)
 	atomic_inc(&factory->ref_count);
 }
 
-int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
+int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
 {
 	int result;
 	struct io_factory *factory;
@@ -68,16 +69,18 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
 	if (result != VDO_SUCCESS)
 		return result;
 
-	factory->bdev = bdev;
+	factory->bdev = file_bdev(bdev_file);
+	factory->bdev_file = bdev_file;
 	atomic_set_release(&factory->ref_count, 1);
 
 	*factory_ptr = factory;
 	return UDS_SUCCESS;
 }
 
-int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
+int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
 {
-	factory->bdev = bdev;
+	factory->bdev = file_bdev(bdev_file);
+	factory->bdev_file = bdev_file;
 	return UDS_SUCCESS;
 }
 
@@ -90,7 +93,7 @@ void uds_put_io_factory(struct io_factory *factory)
 
 size_t uds_get_writable_size(struct io_factory *factory)
 {
-	return i_size_read(factory->bdev->bd_inode);
+	return i_size_read(file_inode(factory->bdev_file));
 }
 
 /* Create a struct dm_bufio_client for an index region starting at offset. */
diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
index 60749a9ff756..e5100ab57754 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.h
+++ b/drivers/md/dm-vdo/indexer/io-factory.h
@@ -24,11 +24,11 @@ enum {
 	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
 };
 
-int __must_check uds_make_io_factory(struct block_device *bdev,
+int __must_check uds_make_io_factory(struct file *bdev_file,
 				     struct io_factory **factory_ptr);
 
 int __must_check uds_replace_storage(struct io_factory *factory,
-				     struct block_device *bdev);
+				     struct file *bdev_file);
 
 void uds_put_io_factory(struct io_factory *factory);
 
diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
index 8b21ec93f3bc..a292840a83e3 100644
--- a/drivers/md/dm-vdo/indexer/volume.c
+++ b/drivers/md/dm-vdo/indexer/volume.c
@@ -1467,12 +1467,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
 
 int __must_check uds_replace_volume_storage(struct volume *volume,
 					    struct index_layout *layout,
-					    struct block_device *bdev)
+					    struct file *bdev_file)
 {
 	int result;
 	u32 i;
 
-	result = uds_replace_index_layout_storage(layout, bdev);
+	result = uds_replace_index_layout_storage(layout, bdev_file);
 	if (result != UDS_SUCCESS)
 		return result;
 
diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
index 7fdd44464db2..5861654d837e 100644
--- a/drivers/md/dm-vdo/indexer/volume.h
+++ b/drivers/md/dm-vdo/indexer/volume.h
@@ -131,7 +131,7 @@ void vdo_free_volume(struct volume *volume);
 
 int __must_check uds_replace_volume_storage(struct volume *volume,
 					    struct index_layout *layout,
-					    struct block_device *bdev);
+					    struct file *bdev_file);
 
 int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
 						    u64 *lowest_vcn, u64 *highest_vcn,
-- 
2.39.2



* [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (16 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 17/19] dm-vdo: " Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-03-17 21:36   ` Christoph Hellwig
  2024-03-18  9:22   ` Jan Kara
  2024-02-22 12:45 ` [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (2 subsequent siblings)
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

scsi_bios_ptable() reads from the disk without opening it as a file. Factor
out a helper that reads into the block device page cache, so that scsi no
longer accesses bd_inode directly.
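
As a rough illustration (not part of the patch), the helper turns a byte
offset into a page cache index with pos >> PAGE_SHIFT, so the returned folio
covers @pos. The userspace sketch below models only that arithmetic; the
4 KiB page size and the function names are assumptions made for this sketch.

```c
#include <assert.h>

/* Assumption: 4 KiB pages (PAGE_SHIFT of 12). */
#define PAGE_SHIFT_MODEL 12
#define PAGE_SIZE_MODEL (1LL << PAGE_SHIFT_MODEL)

/* Page cache index of the page that contains byte offset pos. */
static unsigned long pos_to_index(long long pos)
{
	assert(pos >= 0);
	return (unsigned long)(pos >> PAGE_SHIFT_MODEL);
}

/* True when the page at index covers byte offset pos. */
static int page_contains(unsigned long index, long long pos)
{
	long long start = (long long)index << PAGE_SHIFT_MODEL;

	return pos >= start && pos < start + PAGE_SIZE_MODEL;
}
```

scsi_bios_ptable() passes offset 0, which maps to index 0, the first page of
the whole device.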

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           | 19 +++++++++++++++++++
 drivers/scsi/scsicam.c |  3 +--
 include/linux/blkdev.h |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 60a1479eae83..b7af04d34af2 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1211,6 +1211,25 @@ unsigned int block_size(struct block_device *bdev)
 }
 EXPORT_SYMBOL_GPL(block_size);
 
+/**
+ * bdev_read_folio - Read into block device page cache.
+ * @bdev: the block device which holds the cache to read.
+ * @pos: the offset that the returned folio will contain.
+ *
+ * Read one page into the block device page cache. If it succeeds, the folio
+ * returned will contain @pos.
+ *
+ * Only used by scsi_bios_ptable(), where the bdev is not opened as a file.
+ *
+ * Return: Uptodate folio on success, ERR_PTR() on failure.
+ */
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
+{
+	return mapping_read_folio_gfp(bdev_mapping(bdev),
+				      pos >> PAGE_SHIFT, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..1c99b964a0eb 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,11 +32,10 @@
  */
 unsigned char *scsi_bios_ptable(struct block_device *dev)
 {
-	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
 	unsigned char *res = NULL;
 	struct folio *folio;
 
-	folio = read_mapping_folio(mapping, 0, NULL);
+	folio = bdev_read_folio(bdev_whole(dev), 0);
 	if (IS_ERR(folio))
 		return NULL;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c510f334c84f..3fb02e3a527a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1514,6 +1514,7 @@ struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
 int bd_prepare_to_claim(struct block_device *bdev, void *holder,
 		const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos);
 
 /* just for blk-cgroup, don't use elsewhere */
 struct block_device *blkdev_get_no_open(dev_t dev);
-- 
2.39.2



* [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (17 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable() Yu Kuai
@ 2024-02-22 12:45 ` Yu Kuai
  2024-02-25  0:06   ` kernel test robot
  2024-03-17 21:38   ` Christoph Hellwig
  2024-02-28 13:42 ` [RFC v4 linux-next 00/19] " Christoph Hellwig
  2024-03-15 12:08 ` Yu Kuai
  20 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-02-22 12:45 UTC (permalink / raw)
  To: jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

In prior patches we introduced the ability to open block devices as
files and made all filesystems stash the opened block device files. With
this patch we remove bdev->bd_inode from struct block_device.

Using files allows us to stop passing struct block_device directly to
almost all buffer_head helpers. Whenever access to the inode of the
block device is needed bdev_file_inode(bdev_file) can be used instead of
bdev->bd_inode.

The only user that doesn't rely on files is the block layer itself in
block/fops.c, where we only have access to the block device, because the
bdev filesystem obviously doesn't open block devices as files.

This introduces a union into struct buffer_head and struct iomap. The
union encompasses both struct block_device and struct file. In both
cases a flag is used to differentiate whether a block device or a proper
file was stashed. Simple accessors bh_bdev() and iomap_bdev() are used
to return the block device in the really low-level functions where it's
needed. These are overall just a few callsites.
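
As a rough illustration (not part of the patch), the flag-plus-union scheme
can be modelled in userspace as below. The field and accessor names
(b_bdev, b_bdev_file, bh_bdev(), file_bdev()) follow the patch, but the
struct layouts and the is_bdev flag are hypothetical stand-ins for the real
buffer_head state bit, and the accessor body is an assumption about how such
a helper could work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel definitions. */
struct block_device { int id; };
struct file { struct block_device *private_bdev; };

/* Models file_bdev(): recover the block device from an opened bdev file. */
static struct block_device *file_bdev(const struct file *f)
{
	return f->private_bdev;
}

struct buffer_head {
	bool is_bdev;	/* models the state that set_buffer_bdev() records */
	union {
		struct block_device *b_bdev;	/* valid when is_bdev is true */
		struct file *b_bdev_file;	/* valid when is_bdev is false */
	};
};

/* Return the block device regardless of which union member was stashed. */
static struct block_device *bh_bdev(const struct buffer_head *bh)
{
	assert(bh != NULL);
	return bh->is_bdev ? bh->b_bdev : file_bdev(bh->b_bdev_file);
}
```

Callers that only need the block device go through the accessor and never
inspect the flag themselves, which is what keeps the number of affected
callsites small.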

Originally-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c                  |   8 ++-
 block/fops.c                  |   2 +
 drivers/md/md-bitmap.c        |   2 +-
 fs/affs/file.c                |   2 +-
 fs/btrfs/inode.c              |   2 +-
 fs/buffer.c                   | 103 +++++++++++++++++++---------------
 fs/direct-io.c                |   4 +-
 fs/erofs/data.c               |   3 +-
 fs/erofs/internal.h           |   1 +
 fs/erofs/zmap.c               |   2 +-
 fs/ext2/inode.c               |   4 +-
 fs/ext2/xattr.c               |   2 +-
 fs/ext4/inode.c               |   8 +--
 fs/ext4/mmp.c                 |   2 +-
 fs/ext4/page-io.c             |   5 +-
 fs/ext4/super.c               |   4 +-
 fs/ext4/xattr.c               |   2 +-
 fs/f2fs/data.c                |   7 ++-
 fs/f2fs/f2fs.h                |   1 +
 fs/fuse/dax.c                 |   2 +-
 fs/gfs2/aops.c                |   2 +-
 fs/gfs2/bmap.c                |   2 +-
 fs/gfs2/meta_io.c             |   2 +-
 fs/hpfs/file.c                |   2 +-
 fs/iomap/buffered-io.c        |   8 +--
 fs/iomap/direct-io.c          |  11 ++--
 fs/iomap/swapfile.c           |   2 +-
 fs/iomap/trace.h              |   2 +-
 fs/jbd2/commit.c              |   2 +-
 fs/jbd2/journal.c             |   8 +--
 fs/jbd2/recovery.c            |   8 +--
 fs/jbd2/revoke.c              |  13 +++--
 fs/jbd2/transaction.c         |   8 +--
 fs/mpage.c                    |  26 ++++++---
 fs/nilfs2/btnode.c            |   4 +-
 fs/nilfs2/gcinode.c           |   2 +-
 fs/nilfs2/mdt.c               |   2 +-
 fs/nilfs2/page.c              |   4 +-
 fs/nilfs2/recovery.c          |  27 +++++----
 fs/ntfs3/fsntfs.c             |   8 +--
 fs/ntfs3/inode.c              |   2 +-
 fs/ntfs3/super.c              |   2 +-
 fs/ocfs2/journal.c            |   2 +-
 fs/reiserfs/fix_node.c        |   2 +-
 fs/reiserfs/journal.c         |  10 ++--
 fs/reiserfs/prints.c          |   4 +-
 fs/reiserfs/reiserfs.h        |   6 +-
 fs/reiserfs/stree.c           |   2 +-
 fs/reiserfs/tail_conversion.c |   2 +-
 fs/xfs/xfs_iomap.c            |   4 +-
 fs/zonefs/file.c              |   4 +-
 include/linux/blk_types.h     |   1 -
 include/linux/blkdev.h        |   2 +
 include/linux/buffer_head.h   |  73 +++++++++++++++---------
 include/linux/iomap.h         |  14 ++++-
 include/trace/events/block.h  |   2 +-
 56 files changed, 259 insertions(+), 182 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index b7af04d34af2..98c192ff81ec 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -412,7 +412,6 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	spin_lock_init(&bdev->bd_size_lock);
 	mutex_init(&bdev->bd_holder_lock);
 	bdev->bd_partno = partno;
-	bdev->bd_inode = inode;
 	bdev->bd_queue = disk->queue;
 	if (partno)
 		bdev->bd_has_submit_bio = disk->part0->bd_has_submit_bio;
@@ -1230,6 +1229,13 @@ struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
 }
 EXPORT_SYMBOL_GPL(bdev_read_folio);
 
+void clean_bdev_aliases2(struct block_device *bdev, sector_t block,
+			 sector_t len)
+{
+	return __clean_bdev_aliases(bdev_inode(bdev), block, len);
+}
+EXPORT_SYMBOL_GPL(clean_bdev_aliases2);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/block/fops.c b/block/fops.c
index 1fcbdb131a8f..5550f8b53c21 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -386,6 +386,7 @@ static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	loff_t isize = i_size_read(inode);
 
 	iomap->bdev = bdev;
+	iomap->flags |= IOMAP_F_BDEV;
 	iomap->offset = ALIGN_DOWN(offset, bdev_logical_block_size(bdev));
 	if (iomap->offset >= isize)
 		return -EIO;
@@ -407,6 +408,7 @@ static int blkdev_get_block(struct inode *inode, sector_t iblock,
 	bh->b_bdev = I_BDEV(inode);
 	bh->b_blocknr = iblock;
 	set_buffer_mapped(bh);
+	set_buffer_bdev(bh);
 	return 0;
 }
 
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 9672f75c3050..689f5f543520 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -380,7 +380,7 @@ static int read_file_page(struct file *file, unsigned long index,
 			}
 
 			bh->b_blocknr = block;
-			bh->b_bdev = inode->i_sb->s_bdev;
+			bh->b_bdev_file = inode->i_sb->s_bdev_file;
 			if (count < blocksize)
 				count = 0;
 			else
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 04c018e19602..c0583831c58f 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -365,7 +365,7 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul
 err_alloc:
 	brelse(ext_bh);
 	clear_buffer_mapped(bh_result);
-	bh_result->b_bdev = NULL;
+	bh_result->b_bdev_file = NULL;
 	// unlock cache
 	affs_unlock_ext(inode);
 	return -ENOSPC;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index df55dd891137..b3b2e01093dd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7716,7 +7716,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 		iomap->type = IOMAP_MAPPED;
 	}
 	iomap->offset = start;
-	iomap->bdev = fs_info->fs_devices->latest_dev->bdev;
+	iomap->bdev_file = fs_info->fs_devices->latest_dev->bdev_file;
 	iomap->length = len;
 	free_extent_map(em);
 
diff --git a/fs/buffer.c b/fs/buffer.c
index b55dea034a5d..5753c068ec78 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -129,7 +129,7 @@ static void buffer_io_error(struct buffer_head *bh, char *msg)
 	if (!test_bit(BH_Quiet, &bh->b_state))
 		printk_ratelimited(KERN_ERR
 			"Buffer I/O error on dev %pg, logical block %llu%s\n",
-			bh->b_bdev, (unsigned long long)bh->b_blocknr, msg);
+			bh_bdev(bh), (unsigned long long)bh->b_blocknr, msg);
 }
 
 /*
@@ -187,9 +187,9 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  * succeeds, there is no need to take i_private_lock.
  */
 static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct file *bdev_file, sector_t block)
 {
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = file_inode(bdev_file);
 	struct address_space *bd_mapping = bd_inode->i_mapping;
 	struct buffer_head *ret = NULL;
 	pgoff_t index;
@@ -232,7 +232,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 		       "device %pg blocksize: %d\n",
 		       (unsigned long long)block,
 		       (unsigned long long)bh->b_blocknr,
-		       bh->b_state, bh->b_size, bdev,
+		       bh->b_state, bh->b_size, file_bdev(bdev_file),
 		       1 << bd_inode->i_blkbits);
 	}
 out_unlock:
@@ -473,7 +473,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
  * try_to_free_buffers() will be operating against the *blockdev* mapping
  * at the time, not against the S_ISREG file which depends on those buffers.
  * So the locking for i_private_list is via the i_private_lock in the address_space
- * which backs the buffers.  Which is different from the address_space 
+ * which backs the buffers.  Which is different from the address_space
  * against which the buffers are listed.  So for a particular address_space,
  * mapping->i_private_lock does *not* protect mapping->i_private_list!  In fact,
  * mapping->i_private_list will always be protected by the backing blockdev's
@@ -655,10 +655,12 @@ EXPORT_SYMBOL(generic_buffers_fsync);
  * `bblock + 1' is probably a dirty indirect block.  Hunt it down and, if it's
  * dirty, schedule it for IO.  So that indirects merge nicely with their data.
  */
-void write_boundary_block(struct block_device *bdev,
+void write_boundary_block(struct file *bdev_file,
 			sector_t bblock, unsigned blocksize)
 {
-	struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize);
+	struct buffer_head *bh =
+		__find_get_block(bdev_file, bblock + 1, blocksize);
+
 	if (bh) {
 		if (buffer_dirty(bh))
 			write_dirty_buffer(bh, 0);
@@ -994,8 +996,9 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
  * Initialise the state of a blockdev folio's buffers.
  */ 
 static sector_t folio_init_buffers(struct folio *folio,
-		struct block_device *bdev, unsigned size)
+		struct file *bdev_file, unsigned int size)
 {
+	struct block_device *bdev = file_bdev(bdev_file);
 	struct buffer_head *head = folio_buffers(folio);
 	struct buffer_head *bh = head;
 	bool uptodate = folio_test_uptodate(folio);
@@ -1006,7 +1009,7 @@ static sector_t folio_init_buffers(struct folio *folio,
 		if (!buffer_mapped(bh)) {
 			bh->b_end_io = NULL;
 			bh->b_private = NULL;
-			bh->b_bdev = bdev;
+			bh->b_bdev_file = bdev_file;
 			bh->b_blocknr = block;
 			if (uptodate)
 				set_buffer_uptodate(bh);
@@ -1031,10 +1034,10 @@ static sector_t folio_init_buffers(struct folio *folio,
  * Returns false if we have a failure which cannot be cured by retrying
  * without sleeping.  Returns true if we succeeded, or the caller should retry.
  */
-static bool grow_dev_folio(struct block_device *bdev, sector_t block,
+static bool grow_dev_folio(struct file *bdev_file, sector_t block,
 		pgoff_t index, unsigned size, gfp_t gfp)
 {
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = file_inode(bdev_file);
 	struct folio *folio;
 	struct buffer_head *bh;
 	sector_t end_block = 0;
@@ -1047,7 +1050,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 	bh = folio_buffers(folio);
 	if (bh) {
 		if (bh->b_size == size) {
-			end_block = folio_init_buffers(folio, bdev, size);
+			end_block = folio_init_buffers(folio, bdev_file, size);
 			goto unlock;
 		}
 
@@ -1075,7 +1078,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 	 */
 	spin_lock(&inode->i_mapping->i_private_lock);
 	link_dev_buffers(folio, bh);
-	end_block = folio_init_buffers(folio, bdev, size);
+	end_block = folio_init_buffers(folio, bdev_file, size);
 	spin_unlock(&inode->i_mapping->i_private_lock);
 unlock:
 	folio_unlock(folio);
@@ -1088,7 +1091,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
  * that folio was dirty, the buffers are set dirty also.  Returns false
  * if we've hit a permanent error.
  */
-static bool grow_buffers(struct block_device *bdev, sector_t block,
+static bool grow_buffers(struct file *bdev_file, sector_t block,
 		unsigned size, gfp_t gfp)
 {
 	loff_t pos;
@@ -1100,18 +1103,19 @@ static bool grow_buffers(struct block_device *bdev, sector_t block,
 	if (check_mul_overflow(block, (sector_t)size, &pos) || pos > MAX_LFS_FILESIZE) {
 		printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
 			__func__, (unsigned long long)block,
-			bdev);
+			file_bdev(bdev_file));
 		return false;
 	}
 
 	/* Create a folio with the proper size buffers */
-	return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
+	return grow_dev_folio(bdev_file, block, pos / PAGE_SIZE, size, gfp);
 }
 
 static struct buffer_head *
-__getblk_slow(struct block_device *bdev, sector_t block,
-	     unsigned size, gfp_t gfp)
+__getblk_slow(struct file *bdev_file, sector_t block, unsigned size, gfp_t gfp)
 {
+	struct block_device *bdev = file_bdev(bdev_file);
+
 	/* Size must be multiple of hard sectorsize */
 	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
 			(size < 512 || size > PAGE_SIZE))) {
@@ -1127,11 +1131,11 @@ __getblk_slow(struct block_device *bdev, sector_t block,
 	for (;;) {
 		struct buffer_head *bh;
 
-		bh = __find_get_block(bdev, block, size);
+		bh = __find_get_block(bdev_file, block, size);
 		if (bh)
 			return bh;
 
-		if (!grow_buffers(bdev, block, size, gfp))
+		if (!grow_buffers(bdev_file, block, size, gfp))
 			return NULL;
 	}
 }
@@ -1367,7 +1371,7 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 	for (i = 0; i < BH_LRU_SIZE; i++) {
 		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);
 
-		if (bh && bh->b_blocknr == block && bh->b_bdev == bdev &&
+		if (bh && bh->b_blocknr == block && bh_bdev(bh) == bdev &&
 		    bh->b_size == size) {
 			if (i) {
 				while (i) {
@@ -1392,13 +1396,14 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
  * NULL
  */
 struct buffer_head *
-__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+__find_get_block(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
+	struct buffer_head *bh = lookup_bh_lru(file_bdev(bdev_file), block,
+					       size);
 
 	if (bh == NULL) {
 		/* __find_get_block_slow will mark the page accessed */
-		bh = __find_get_block_slow(bdev, block);
+		bh = __find_get_block_slow(bdev_file, block);
 		if (bh)
 			bh_lru_install(bh);
 	} else
@@ -1410,32 +1415,32 @@ EXPORT_SYMBOL(__find_get_block);
 
 /**
  * bdev_getblk - Get a buffer_head in a block device's buffer cache.
- * @bdev: The block device.
+ * @bdev_file: The opened block device.
  * @block: The block number.
- * @size: The size of buffer_heads for this @bdev.
+ * @size: The size of buffer_heads for this block device.
  * @gfp: The memory allocation flags to use.
  *
  * Return: The buffer head, or NULL if memory could not be allocated.
  */
-struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
+struct buffer_head *bdev_getblk(struct file *bdev_file, sector_t block,
 		unsigned size, gfp_t gfp)
 {
-	struct buffer_head *bh = __find_get_block(bdev, block, size);
+	struct buffer_head *bh = __find_get_block(bdev_file, block, size);
 
 	might_alloc(gfp);
 	if (bh)
 		return bh;
 
-	return __getblk_slow(bdev, block, size, gfp);
+	return __getblk_slow(bdev_file, block, size, gfp);
 }
 EXPORT_SYMBOL(bdev_getblk);
 
 /*
  * Do async read-ahead on a buffer..
  */
-void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
+void __breadahead(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	struct buffer_head *bh = bdev_getblk(bdev, block, size,
+	struct buffer_head *bh = bdev_getblk(bdev_file, block, size,
 			GFP_NOWAIT | __GFP_MOVABLE);
 
 	if (likely(bh)) {
@@ -1447,7 +1452,7 @@ EXPORT_SYMBOL(__breadahead);
 
 /**
  *  __bread_gfp() - reads a specified block and returns the bh
- *  @bdev: the block_device to read from
+ *  @bdev_file: the opened block_device to read from
  *  @block: number of block
  *  @size: size (in bytes) to read
  *  @gfp: page allocation flag
@@ -1458,12 +1463,11 @@ EXPORT_SYMBOL(__breadahead);
  *  It returns NULL if the block was unreadable.
  */
 struct buffer_head *
-__bread_gfp(struct block_device *bdev, sector_t block,
-		   unsigned size, gfp_t gfp)
+__bread_gfp(struct file *bdev_file, sector_t block, unsigned int size, gfp_t gfp)
 {
 	struct buffer_head *bh;
 
-	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp |= mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 
 	/*
 	 * Prefer looping in the allocator rather than here, at least that
@@ -1471,7 +1475,7 @@ __bread_gfp(struct block_device *bdev, sector_t block,
 	 */
 	gfp |= __GFP_NOFAIL;
 
-	bh = bdev_getblk(bdev, block, size, gfp);
+	bh = bdev_getblk(bdev_file, block, size, gfp);
 
 	if (likely(bh) && !buffer_uptodate(bh))
 		bh = __bread_slow(bh);
@@ -1556,7 +1560,7 @@ EXPORT_SYMBOL(folio_set_bh);
 /* Bits that are cleared during an invalidate */
 #define BUFFER_FLAGS_DISCARD \
 	(1 << BH_Mapped | 1 << BH_New | 1 << BH_Req | \
-	 1 << BH_Delay | 1 << BH_Unwritten)
+	 1 << BH_Delay | 1 << BH_Unwritten | 1 << BH_Bdev)
 
 static void discard_buffer(struct buffer_head * bh)
 {
@@ -1564,7 +1568,7 @@ static void discard_buffer(struct buffer_head * bh)
 
 	lock_buffer(bh);
 	clear_buffer_dirty(bh);
-	bh->b_bdev = NULL;
+	bh->b_bdev_file = NULL;
 	b_state = READ_ONCE(bh->b_state);
 	do {
 	} while (!try_cmpxchg(&bh->b_state, &b_state,
@@ -1675,8 +1679,8 @@ struct buffer_head *create_empty_buffers(struct folio *folio,
 EXPORT_SYMBOL(create_empty_buffers);
 
 /**
- * clean_bdev_aliases: clean a range of buffers in block device
- * @bdev: Block device to clean buffers in
+ * __clean_bdev_aliases: clean a range of buffers in block device
+ * @bd_inode: Block device inode to clean buffers in
  * @block: Start of a range of blocks to clean
  * @len: Number of blocks to clean
  *
@@ -1694,9 +1698,8 @@ EXPORT_SYMBOL(create_empty_buffers);
  * I/O in bforget() - it's more efficient to wait on the I/O only if we really
  * need to.  That happens here.
  */
-void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
+void __clean_bdev_aliases(struct inode *bd_inode, sector_t block, sector_t len)
 {
-	struct inode *bd_inode = bdev->bd_inode;
 	struct address_space *bd_mapping = bd_inode->i_mapping;
 	struct folio_batch fbatch;
 	pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
@@ -1746,7 +1749,7 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
 			break;
 	}
 }
-EXPORT_SYMBOL(clean_bdev_aliases);
+EXPORT_SYMBOL(__clean_bdev_aliases);
 
 static struct buffer_head *folio_create_buffers(struct folio *folio,
 						struct inode *inode,
@@ -2003,7 +2006,17 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 {
 	loff_t offset = (loff_t)block << inode->i_blkbits;
 
-	bh->b_bdev = iomap->bdev;
+	if (iomap->flags & IOMAP_F_BDEV) {
+		/*
+		 * If this request originated directly from the block layer we
+		 * only have access to the plain block device. Mark the
+		 * buffer_head similarly.
+		 */
+		bh->b_bdev = iomap->bdev;
+		set_buffer_bdev(bh);
+	} else {
+		bh->b_bdev_file = iomap->bdev_file;
+	}
 
 	/*
 	 * Block points to offset in file we need to map, iomap contains
@@ -2778,7 +2791,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	if (buffer_prio(bh))
 		opf |= REQ_PRIO;
 
-	bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO);
+	bio = bio_alloc(bh_bdev(bh), 1, opf, GFP_NOIO);
 
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
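Note on the fs/buffer.c hunks above: a buffer_head is now filled in either
through b_bdev (with set_buffer_bdev() marking it) or through b_bdev_file,
and bh_bdev() is called wherever a plain block_device is needed. The header
that defines these is not part of this section of the diff, so the following
is only a userspace sketch of that dispatch, assuming the two pointers share
storage and the BH_Bdev bit says which one is valid. Every mock_* name is
hypothetical and is not kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct mock_bdev { int dev; };
struct mock_file { struct mock_bdev *bdev; };

struct mock_bh {
	bool is_bdev;	/* plays the role of the BH_Bdev state bit */
	union {		/* assumption: both pointers share storage */
		struct mock_bdev *b_bdev;
		struct mock_file *b_bdev_file;
	};
};

/* Stand-in for file_bdev(): resolve an opened device to its block device. */
static struct mock_bdev *mock_file_bdev(struct mock_file *file)
{
	return file->bdev;
}

/* Stand-in for bh_bdev(): pick the union member that the flag says is valid. */
static struct mock_bdev *mock_bh_bdev(const struct mock_bh *bh)
{
	if (bh->is_bdev)
		return bh->b_bdev;
	return bh->b_bdev_file ? mock_file_bdev(bh->b_bdev_file) : NULL;
}
```

In this sketch a discarded buffer (pointer reset to NULL, flag cleared)
resolves to NULL instead of dereferencing a stale pointer. Whether the real
helper tolerates a NULL b_bdev_file is not visible in this section.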
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 60456263a338..77691f2b2565 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -671,7 +671,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
 	sector = start_sector << (sdio->blkbits - 9);
 	nr_pages = bio_max_segs(sdio->pages_in_io);
 	BUG_ON(nr_pages <= 0);
-	dio_bio_alloc(dio, sdio, map_bh->b_bdev, sector, nr_pages);
+	dio_bio_alloc(dio, sdio, bh_bdev(map_bh), sector, nr_pages);
 	sdio->boundary = 0;
 out:
 	return ret;
@@ -946,7 +946,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 					map_bh->b_blocknr << sdio->blkfactor;
 				if (buffer_new(map_bh)) {
 					clean_bdev_aliases(
-						map_bh->b_bdev,
+						map_bh->b_bdev_file,
 						map_bh->b_blocknr,
 						map_bh->b_size >> i_blkbits);
 				}
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index dc2d43abe8c5..6127ff1ba453 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -204,6 +204,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 	int id;
 
 	map->m_bdev = sb->s_bdev;
+	map->m_bdev_file = sb->s_bdev_file;
 	map->m_daxdev = EROFS_SB(sb)->dax_dev;
 	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
 	map->m_fscache = EROFS_SB(sb)->s_fscache;
@@ -278,7 +279,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = mdev.m_daxdev;
 	else
-		iomap->bdev = mdev.m_bdev;
+		iomap->bdev_file = mdev.m_bdev_file;
 	iomap->length = map.m_llen;
 	iomap->flags = 0;
 	iomap->private = NULL;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 0f0706325b7b..50f8a7f161fd 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -377,6 +377,7 @@ enum {
 
 struct erofs_map_dev {
 	struct erofs_fscache *m_fscache;
+	struct file *m_bdev_file;
 	struct block_device *m_bdev;
 	struct dax_device *m_daxdev;
 	u64 m_dax_part_off;
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index e313c936351d..6da3083e8252 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -739,7 +739,7 @@ static int z_erofs_iomap_begin_report(struct inode *inode, loff_t offset,
 	if (ret < 0)
 		return ret;
 
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->bdev_file = inode->i_sb->s_bdev_file;
 	iomap->offset = map.m_la;
 	iomap->length = map.m_llen;
 	if (map.m_flags & EROFS_MAP_MAPPED) {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f3d570a9302b..32555734e727 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -744,7 +744,7 @@ static int ext2_get_blocks(struct inode *inode,
 		 * We must unmap blocks before zeroing so that writeback cannot
 		 * overwrite zeros with stale data from block device page cache.
 		 */
-		clean_bdev_aliases(inode->i_sb->s_bdev,
+		clean_bdev_aliases(inode->i_sb->s_bdev_file,
 				   le32_to_cpu(chain[depth-1].key),
 				   count);
 		/*
@@ -842,7 +842,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = sbi->s_daxdev;
 	else
-		iomap->bdev = inode->i_sb->s_bdev;
+		iomap->bdev_file = inode->i_sb->s_bdev_file;
 
 	if (ret == 0) {
 		/*
diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index c885dcc3bd0d..42e595e87a74 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -80,7 +80,7 @@
 	} while (0)
 # define ea_bdebug(bh, f...) do { \
 		printk(KERN_DEBUG "block %pg:%lu: ", \
-			bh->b_bdev, (unsigned long) bh->b_blocknr); \
+			bh_bdev(bh), (unsigned long) bh->b_blocknr); \
 		printk(f); \
 		printk("\n"); \
 	} while (0)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2ccf3b5e3a7c..eb861ca94e63 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1791,11 +1791,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
  * reserve space for a single block.
  *
  * For delayed buffer_head we have BH_Mapped, BH_New, BH_Delay set.
- * We also have b_blocknr = -1 and b_bdev initialized properly
+ * We also have b_blocknr = -1 and b_bdev_file initialized properly
  *
  * For unwritten buffer_head we have BH_Mapped, BH_New, BH_Unwritten set.
- * We also have b_blocknr = physicalblock mapping unwritten extent and b_bdev
- * initialized properly.
+ * We also have b_blocknr = physicalblock mapping unwritten extent and
+ * b_bdev_file initialized properly.
  */
 int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 			   struct buffer_head *bh, int create)
@@ -3235,7 +3235,7 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
 	else
-		iomap->bdev = inode->i_sb->s_bdev;
+		iomap->bdev_file = inode->i_sb->s_bdev_file;
 	iomap->offset = (u64) map->m_lblk << blkbits;
 	iomap->length = (u64) map->m_len << blkbits;
 
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index bd946d0c71b7..5641bd34d021 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -384,7 +384,7 @@ int ext4_multi_mount_protect(struct super_block *sb,
 
 	BUILD_BUG_ON(sizeof(mmp->mmp_bdevname) < BDEVNAME_SIZE);
 	snprintf(mmp->mmp_bdevname, sizeof(mmp->mmp_bdevname),
-		 "%pg", bh->b_bdev);
+		 "%pg", bh_bdev(bh));
 
 	/*
 	 * Start a kernel thread to update the MMP block periodically.
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 312bc6813357..b0c3de39daa1 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -93,8 +93,7 @@ struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end)
 static void buffer_io_error(struct buffer_head *bh)
 {
 	printk_ratelimited(KERN_ERR "Buffer I/O error on device %pg, logical block %llu\n",
-		       bh->b_bdev,
-			(unsigned long long)bh->b_blocknr);
+		       bh_bdev(bh), (unsigned long long)bh->b_blocknr);
 }
 
 static void ext4_finish_bio(struct bio *bio)
@@ -397,7 +396,7 @@ static void io_submit_init_bio(struct ext4_io_submit *io,
 	 * bio_alloc will _always_ be able to allocate a bio if
 	 * __GFP_DIRECT_RECLAIM is set, see comments for bio_alloc_bioset().
 	 */
-	bio = bio_alloc(bh->b_bdev, BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOIO);
+	bio = bio_alloc(bh_bdev(bh), BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOIO);
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio->bi_end_io = ext4_end_bio;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4df1a5cfe0a5..d2ca92bf5f7e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -261,7 +261,7 @@ struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 
 void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
 {
-	struct buffer_head *bh = bdev_getblk(sb->s_bdev, block,
+	struct buffer_head *bh = bdev_getblk(sb->s_bdev_file, block,
 			sb->s_blocksize, GFP_NOWAIT | __GFP_NOWARN);
 
 	if (likely(bh)) {
@@ -5862,7 +5862,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 	sb_block = EXT4_MIN_BLOCK_SIZE / blocksize;
 	offset = EXT4_MIN_BLOCK_SIZE % blocksize;
 	set_blocksize(bdev, blocksize);
-	bh = __bread(bdev, sb_block, blocksize);
+	bh = __bread(bdev_file, sb_block, blocksize);
 	if (!bh) {
 		ext4_msg(sb, KERN_ERR, "couldn't read superblock of "
 		       "external journal");
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 82dc5e673d5c..41128ccec2ec 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -68,7 +68,7 @@
 	       inode->i_sb->s_id, inode->i_ino, ##__VA_ARGS__)
 # define ea_bdebug(bh, fmt, ...)					\
 	printk(KERN_DEBUG "block %pg:%lu: " fmt "\n",			\
-	       bh->b_bdev, (unsigned long)bh->b_blocknr, ##__VA_ARGS__)
+	       bh_bdev(bh), (unsigned long)bh->b_blocknr, ##__VA_ARGS__)
 #else
 # define ea_idebug(inode, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
 # define ea_bdebug(bh, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 05158f89ef32..8ec12b3716bc 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1606,6 +1606,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		goto out;
 
 	map->m_bdev = inode->i_sb->s_bdev;
+	map->m_bdev_file = inode->i_sb->s_bdev_file;
 	map->m_multidev_dio =
 		f2fs_allow_multi_device_dio(F2FS_I_SB(inode), flag);
 
@@ -1724,8 +1725,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		map->m_pblk = blkaddr;
 		map->m_len = 1;
 
-		if (map->m_multidev_dio)
+		if (map->m_multidev_dio) {
 			map->m_bdev = FDEV(bidx).bdev;
+			map->m_bdev_file = FDEV(bidx).bdev_file;
+		}
 	} else if ((map->m_pblk != NEW_ADDR &&
 			blkaddr == (map->m_pblk + ofs)) ||
 			(map->m_pblk == NEW_ADDR && blkaddr == NEW_ADDR) ||
@@ -4250,7 +4253,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->length = blks_to_bytes(inode, map.m_len);
 		iomap->type = IOMAP_MAPPED;
 		iomap->flags |= IOMAP_F_MERGED;
-		iomap->bdev = map.m_bdev;
+		iomap->bdev_file = map.m_bdev_file;
 		iomap->addr = blks_to_bytes(inode, map.m_pblk);
 	} else {
 		if (flags & IOMAP_WRITE)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cc481d7b9287..ed36c11325cd 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -697,6 +697,7 @@ struct extent_tree_info {
 				F2FS_MAP_DELALLOC)
 
 struct f2fs_map_blocks {
+	struct file *m_bdev_file;	/* for multi-device dio */
 	struct block_device *m_bdev;	/* for multi-device dio */
 	block_t m_pblk;
 	block_t m_lblk;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 12ef91d170bb..24966e93a237 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -575,7 +575,7 @@ static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	iomap->offset = pos;
 	iomap->flags = 0;
-	iomap->bdev = NULL;
+	iomap->bdev_file = NULL;
 	iomap->dax_dev = fc->dax->dev;
 
 	/*
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 974aca9c8ea8..0e4e295ebf49 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -622,7 +622,7 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
 			spin_unlock(&sdp->sd_ail_lock);
 		}
 	}
-	bh->b_bdev = NULL;
+	bh->b_bdev_file = NULL;
 	clear_buffer_mapped(bh);
 	clear_buffer_req(bh);
 	clear_buffer_new(bh);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 789af5c8fade..ef4e7ad83d4c 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -926,7 +926,7 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		iomap->flags |= IOMAP_F_GFS2_BOUNDARY;
 
 out:
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->bdev_file = inode->i_sb->s_bdev_file;
 unlock:
 	up_read(&ip->i_rw_mutex);
 	return ret;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index f814054c8cd0..2052d3fc2c24 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -218,7 +218,7 @@ static void gfs2_submit_bhs(blk_opf_t opf, struct buffer_head *bhs[], int num)
 		struct buffer_head *bh = *bhs;
 		struct bio *bio;
 
-		bio = bio_alloc(bh->b_bdev, num, opf, GFP_NOIO);
+		bio = bio_alloc(bh_bdev(bh), num, opf, GFP_NOIO);
 		bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 		while (num > 0) {
 			bh = *bhs;
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index 1bb8d97cd9ae..7353d0e2f35a 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -128,7 +128,7 @@ static int hpfs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (WARN_ON_ONCE(flags & (IOMAP_WRITE | IOMAP_ZERO)))
 		return -EINVAL;
 
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->bdev_file = inode->i_sb->s_bdev_file;
 	iomap->offset = offset;
 
 	hpfs_lock(sb);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 2ad0e287c704..2fc8abd693da 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -415,7 +415,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+		ctx->bio = bio_alloc(iomap_bdev(iomap), bio_max_segs(nr_vecs),
 				     REQ_OP_READ, gfp);
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
@@ -423,7 +423,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		 * what do_mpage_read_folio does.
 		 */
 		if (!ctx->bio) {
-			ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
+			ctx->bio = bio_alloc(iomap_bdev(iomap), 1, REQ_OP_READ,
 					     orig_gfp);
 		}
 		if (ctx->rac)
@@ -662,7 +662,7 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
 	struct bio_vec bvec;
 	struct bio bio;
 
-	bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
+	bio_init(&bio, iomap_bdev(iomap), &bvec, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
 	bio_add_folio_nofail(&bio, folio, plen, poff);
 	return submit_bio_wait(&bio);
@@ -1684,7 +1684,7 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
 	struct iomap_ioend *ioend;
 	struct bio *bio;
 
-	bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
+	bio = bio_alloc_bioset(iomap_bdev(&wpc->iomap), BIO_MAX_VECS,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index bcd3f8cf5ea4..42518754c65d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -56,9 +56,9 @@ static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
 		struct iomap_dio *dio, unsigned short nr_vecs, blk_opf_t opf)
 {
 	if (dio->dops && dio->dops->bio_set)
-		return bio_alloc_bioset(iter->iomap.bdev, nr_vecs, opf,
+		return bio_alloc_bioset(iomap_bdev(&iter->iomap), nr_vecs, opf,
 					GFP_KERNEL, dio->dops->bio_set);
-	return bio_alloc(iter->iomap.bdev, nr_vecs, opf, GFP_KERNEL);
+	return bio_alloc(iomap_bdev(&iter->iomap), nr_vecs, opf, GFP_KERNEL);
 }
 
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
@@ -288,8 +288,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
-	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
+	if ((pos | length) & (bdev_logical_block_size(iomap_bdev(iomap)) - 1) ||
+	    !bdev_iter_is_aligned(iomap_bdev(iomap), dio->submit.iter))
 		return -EINVAL;
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
@@ -316,7 +316,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
+		    (bdev_fua(iomap_bdev(iomap)) ||
+		     !bdev_write_cache(iomap_bdev(iomap))))
 			use_fua = true;
 		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
 			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 5fc0ac36dee3..20bd67e85d15 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -116,7 +116,7 @@ static loff_t iomap_swapfile_iter(const struct iomap_iter *iter,
 		return iomap_swapfile_fail(isi, "has shared extents");
 
 	/* Only one bdev per swap file. */
-	if (iomap->bdev != isi->sis->bdev)
+	if (iomap_bdev(iomap) != isi->sis->bdev)
 		return iomap_swapfile_fail(isi, "outside the main device");
 
 	if (isi->iomap.length == 0) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index c16fd55f5595..43fb3ce21674 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -134,7 +134,7 @@ DECLARE_EVENT_CLASS(iomap_class,
 		__entry->length = iomap->length;
 		__entry->type = iomap->type;
 		__entry->flags = iomap->flags;
-		__entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+		__entry->bdev = iomap_bdev(iomap) ? iomap_bdev(iomap)->bd_dev : 0;
 	),
 	TP_printk("dev %d:%d ino 0x%llx bdev %d:%d addr 0x%llx offset 0x%llx "
 		  "length 0x%llx type %s flags %s",
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 5e122586e06e..fffb1b4e2068 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1014,7 +1014,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 				clear_buffer_mapped(bh);
 				clear_buffer_new(bh);
 				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
+				bh->b_bdev_file = NULL;
 			}
 		}
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index abd42a6ccd0e..bbe5d02801b6 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -434,7 +434,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 
 	folio_set_bh(new_bh, new_folio, new_offset);
 	new_bh->b_size = bh_in->b_size;
-	new_bh->b_bdev = journal->j_dev;
+	new_bh->b_bdev_file = journal->j_dev_file;
 	new_bh->b_blocknr = blocknr;
 	new_bh->b_private = bh_in;
 	set_buffer_mapped(new_bh);
@@ -880,7 +880,7 @@ int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
 	if (ret)
 		return ret;
 
-	bh = __getblk(journal->j_dev, pblock, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, pblock, journal->j_blocksize);
 	if (!bh)
 		return -ENOMEM;
 
@@ -1007,7 +1007,7 @@ jbd2_journal_get_descriptor_buffer(transaction_t *transaction, int type)
 	if (err)
 		return NULL;
 
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, blocknr, journal->j_blocksize);
 	if (!bh)
 		return NULL;
 	atomic_dec(&transaction->t_outstanding_credits);
@@ -1461,7 +1461,7 @@ static int journal_load_superblock(journal_t *journal)
 	struct buffer_head *bh;
 	journal_superblock_t *sb;
 
-	bh = getblk_unmovable(journal->j_dev, journal->j_blk_offset,
+	bh = getblk_unmovable(journal->j_dev_file, journal->j_blk_offset,
 			      journal->j_blocksize);
 	if (bh)
 		err = bh_read(bh, 0);
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 1f7664984d6e..7b561e2c6a7c 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -92,7 +92,7 @@ static int do_readahead(journal_t *journal, unsigned int start)
 			goto failed;
 		}
 
-		bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+		bh = __getblk(journal->j_dev_file, blocknr, journal->j_blocksize);
 		if (!bh) {
 			err = -ENOMEM;
 			goto failed;
@@ -148,7 +148,7 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
 		return err;
 	}
 
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, blocknr, journal->j_blocksize);
 	if (!bh)
 		return -ENOMEM;
 
@@ -370,7 +370,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
 		journal->j_head = journal->j_first;
 	} else {
 #ifdef CONFIG_JBD2_DEBUG
-		int dropped = info.end_transaction - 
+		int dropped = info.end_transaction -
 			be32_to_cpu(journal->j_superblock->s_sequence);
 		jbd2_debug(1,
 			  "JBD2: ignoring %d transaction%s from the journal.\n",
@@ -672,7 +672,7 @@ static int do_one_pass(journal_t *journal,
 
 					/* Find a buffer for the new
 					 * data being restored */
-					nbh = __getblk(journal->j_fs_dev,
+					nbh = __getblk(journal->j_fs_dev_file,
 							blocknr,
 							journal->j_blocksize);
 					if (nbh == NULL) {
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 4556e4689024..99c2758539a8 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -328,7 +328,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 {
 	struct buffer_head *bh = NULL;
 	journal_t *journal;
-	struct block_device *bdev;
+	struct file *file;
 	int err;
 
 	might_sleep();
@@ -341,11 +341,11 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 		return -EINVAL;
 	}
 
-	bdev = journal->j_fs_dev;
+	file = journal->j_fs_dev_file;
 	bh = bh_in;
 
 	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh = __find_get_block(file, blocknr, journal->j_blocksize);
 		if (bh)
 			BUFFER_TRACE(bh, "found on hash");
 	}
@@ -355,7 +355,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 
 		/* If there is a different buffer_head lying around in
 		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh2 = __find_get_block(file, blocknr, journal->j_blocksize);
 		if (bh2) {
 			/* ... and it has RevokeValid status... */
 			if (bh2 != bh && buffer_revokevalid(bh2))
@@ -466,7 +466,8 @@ int jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	 * state machine will get very upset later on. */
 	if (need_cancel) {
 		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
+		bh2 = __find_get_block(bh->b_bdev_file, bh->b_blocknr,
+				       bh->b_size);
 		if (bh2) {
 			if (bh2 != bh)
 				clear_buffer_revoked(bh2);
@@ -495,7 +496,7 @@ void jbd2_clear_buffer_revoked_flags(journal_t *journal)
 			struct jbd2_revoke_record_s *record;
 			struct buffer_head *bh;
 			record = (struct jbd2_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
+			bh = __find_get_block(journal->j_fs_dev_file,
 					      record->blocknr,
 					      journal->j_blocksize);
 			if (bh) {
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index cb0b8d6fc0c6..30ebc93dc430 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -929,7 +929,7 @@ static void warn_dirty_buffer(struct buffer_head *bh)
 	       "JBD2: Spotted dirty metadata buffer (dev = %pg, blocknr = %llu). "
 	       "There's a risk of filesystem corruption in case of system "
 	       "crash.\n",
-	       bh->b_bdev, (unsigned long long)bh->b_blocknr);
+	       bh_bdev(bh), (unsigned long long)bh->b_blocknr);
 }
 
 /* Call t_frozen trigger and copy buffer data into jh->b_frozen_data. */
@@ -990,7 +990,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	/* If it takes too long to lock the buffer, trace it */
 	time_lock = jbd2_time_diff(start_lock, jiffies);
 	if (time_lock > HZ/10)
-		trace_jbd2_lock_buffer_stall(bh->b_bdev->bd_dev,
+		trace_jbd2_lock_buffer_stall(bh_bdev(bh)->bd_dev,
 			jiffies_to_msecs(time_lock));
 
 	/* We now hold the buffer lock so it is safe to query the buffer
@@ -2374,7 +2374,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 			write_unlock(&journal->j_state_lock);
 			jbd2_journal_put_journal_head(jh);
 			/* Already zapped buffer? Nothing to do... */
-			if (!bh->b_bdev)
+			if (!bh_bdev(bh))
 				return 0;
 			return -EBUSY;
 		}
@@ -2428,7 +2428,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 	clear_buffer_new(bh);
 	clear_buffer_delay(bh);
 	clear_buffer_unwritten(bh);
-	bh->b_bdev = NULL;
+	bh->b_bdev_file = NULL;
 	return may_free;
 }
 
diff --git a/fs/mpage.c b/fs/mpage.c
index 738882e0766d..ef6e72eec312 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -126,7 +126,12 @@ static void map_buffer_to_folio(struct folio *folio, struct buffer_head *bh,
 	do {
 		if (block == page_block) {
 			page_bh->b_state = bh->b_state;
-			page_bh->b_bdev = bh->b_bdev;
+			if (buffer_bdev(bh)) {
+				page_bh->b_bdev = bh->b_bdev;
+				set_buffer_bdev(page_bh);
+			} else {
+				page_bh->b_bdev_file = bh->b_bdev_file;
+			}
 			page_bh->b_blocknr = bh->b_blocknr;
 			break;
 		}
@@ -216,7 +221,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			page_block++;
 			block_in_file++;
 		}
-		bdev = map_bh->b_bdev;
+		bdev = bh_bdev(map_bh);
 	}
 
 	/*
@@ -272,7 +277,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			page_block++;
 			block_in_file++;
 		}
-		bdev = map_bh->b_bdev;
+		bdev = bh_bdev(map_bh);
 	}
 
 	if (first_hole != blocks_per_page) {
@@ -472,7 +477,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 	struct block_device *bdev = NULL;
 	int boundary = 0;
 	sector_t boundary_block = 0;
-	struct block_device *boundary_bdev = NULL;
+	struct file *boundary_bdev_file = NULL;
 	size_t length;
 	struct buffer_head map_bh;
 	loff_t i_size = i_size_read(inode);
@@ -513,9 +518,9 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 			boundary = buffer_boundary(bh);
 			if (boundary) {
 				boundary_block = bh->b_blocknr;
-				boundary_bdev = bh->b_bdev;
+				boundary_bdev_file = bh->b_bdev_file;
 			}
-			bdev = bh->b_bdev;
+			bdev = bh_bdev(bh);
 		} while ((bh = bh->b_this_page) != head);
 
 		if (first_unmapped)
@@ -549,13 +554,16 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 		map_bh.b_size = 1 << blkbits;
 		if (mpd->get_block(inode, block_in_file, &map_bh, 1))
 			goto confused;
+		/* This helper cannot be used from the block layer directly. */
+		if (WARN_ON_ONCE(buffer_bdev(&map_bh)))
+			goto confused;
 		if (!buffer_mapped(&map_bh))
 			goto confused;
 		if (buffer_new(&map_bh))
 			clean_bdev_bh_alias(&map_bh);
 		if (buffer_boundary(&map_bh)) {
 			boundary_block = map_bh.b_blocknr;
-			boundary_bdev = map_bh.b_bdev;
+			boundary_bdev_file = map_bh.b_bdev_file;
 		}
 		if (page_block) {
 			if (map_bh.b_blocknr != first_block + page_block)
@@ -565,7 +573,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 		}
 		page_block++;
 		boundary = buffer_boundary(&map_bh);
-		bdev = map_bh.b_bdev;
+		bdev = bh_bdev(&map_bh);
 		if (block_in_file == last_block)
 			break;
 		block_in_file++;
@@ -627,7 +635,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 	if (boundary || (first_unmapped != blocks_per_page)) {
 		bio = mpage_bio_submit_write(bio);
 		if (boundary_block) {
-			write_boundary_block(boundary_bdev,
+			write_boundary_block(boundary_bdev_file,
 					boundary_block, 1 << blkbits);
 		}
 	} else {
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..0620bccbf6e0 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -59,7 +59,7 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
 		BUG();
 	}
 	memset(bh->b_data, 0, i_blocksize(inode));
-	bh->b_bdev = inode->i_sb->s_bdev;
+	bh->b_bdev_file = inode->i_sb->s_bdev_file;
 	bh->b_blocknr = blocknr;
 	set_buffer_mapped(bh);
 	set_buffer_uptodate(bh);
@@ -118,7 +118,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
 		goto found;
 	}
 	set_buffer_mapped(bh);
-	bh->b_bdev = inode->i_sb->s_bdev;
+	bh->b_bdev_file = inode->i_sb->s_bdev_file;
 	bh->b_blocknr = pblocknr; /* set block address for read */
 	bh->b_end_io = end_buffer_read_sync;
 	get_bh(bh);
diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index bf9a11d58817..77d4b9275b87 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -84,7 +84,7 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff,
 	}
 
 	if (!buffer_mapped(bh)) {
-		bh->b_bdev = inode->i_sb->s_bdev;
+		bh->b_bdev_file = inode->i_sb->s_bdev_file;
 		set_buffer_mapped(bh);
 	}
 	bh->b_blocknr = pbn;
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 4f792a0ad0f0..99cf302ce116 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -89,7 +89,7 @@ static int nilfs_mdt_create_block(struct inode *inode, unsigned long block,
 	if (buffer_uptodate(bh))
 		goto failed_bh;
 
-	bh->b_bdev = sb->s_bdev;
+	bh->b_bdev_file = sb->s_bdev_file;
 	err = nilfs_mdt_insert_new_block(inode, block, bh, init_block);
 	if (likely(!err)) {
 		get_bh(bh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 14e470fb8870..f893d7e2e472 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -111,7 +111,7 @@ void nilfs_copy_buffer(struct buffer_head *dbh, struct buffer_head *sbh)
 
 	dbh->b_state = sbh->b_state & NILFS_BUFFER_INHERENT_BITS;
 	dbh->b_blocknr = sbh->b_blocknr;
-	dbh->b_bdev = sbh->b_bdev;
+	dbh->b_bdev_file = sbh->b_bdev_file;
 
 	bh = dbh;
 	bits = sbh->b_state & (BIT(BH_Uptodate) | BIT(BH_Mapped));
@@ -216,7 +216,7 @@ static void nilfs_copy_folio(struct folio *dst, struct folio *src,
 		lock_buffer(dbh);
 		dbh->b_state = sbh->b_state & mask;
 		dbh->b_blocknr = sbh->b_blocknr;
-		dbh->b_bdev = sbh->b_bdev;
+		dbh->b_bdev_file = sbh->b_bdev_file;
 		sbh = sbh->b_this_page;
 		dbh = dbh->b_this_page;
 	} while (dbh != dbufs);
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index a9b8d77c8c1d..e2f5dcc923c7 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -107,7 +107,8 @@ static int nilfs_compute_checksum(struct the_nilfs *nilfs,
 		do {
 			struct buffer_head *bh;
 
-			bh = __bread(nilfs->ns_bdev, ++start, blocksize);
+			bh = __bread(nilfs->ns_sb->s_bdev_file, ++start,
+				     blocksize);
 			if (!bh)
 				return -EIO;
 			check_bytes -= size;
@@ -136,7 +137,8 @@ int nilfs_read_super_root_block(struct the_nilfs *nilfs, sector_t sr_block,
 	int ret;
 
 	*pbh = NULL;
-	bh_sr = __bread(nilfs->ns_bdev, sr_block, nilfs->ns_blocksize);
+	bh_sr = __bread(nilfs->ns_sb->s_bdev_file, sr_block,
+			nilfs->ns_blocksize);
 	if (unlikely(!bh_sr)) {
 		ret = NILFS_SEG_FAIL_IO;
 		goto failed;
@@ -183,7 +185,8 @@ nilfs_read_log_header(struct the_nilfs *nilfs, sector_t start_blocknr,
 {
 	struct buffer_head *bh_sum;
 
-	bh_sum = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
+	bh_sum = __bread(nilfs->ns_sb->s_bdev_file, start_blocknr,
+			 nilfs->ns_blocksize);
 	if (bh_sum)
 		*sum = (struct nilfs_segment_summary *)bh_sum->b_data;
 	return bh_sum;
@@ -250,7 +253,7 @@ static void *nilfs_read_summary_info(struct the_nilfs *nilfs,
 	if (bytes > (*pbh)->b_size - *offset) {
 		blocknr = (*pbh)->b_blocknr;
 		brelse(*pbh);
-		*pbh = __bread(nilfs->ns_bdev, blocknr + 1,
+		*pbh = __bread(nilfs->ns_sb->s_bdev_file, blocknr + 1,
 			       nilfs->ns_blocksize);
 		if (unlikely(!*pbh))
 			return NULL;
@@ -289,7 +292,7 @@ static void nilfs_skip_summary_info(struct the_nilfs *nilfs,
 		*offset = bytes * (count - (bcnt - 1) * nitem_per_block);
 
 		brelse(*pbh);
-		*pbh = __bread(nilfs->ns_bdev, blocknr + bcnt,
+		*pbh = __bread(nilfs->ns_sb->s_bdev_file, blocknr + bcnt,
 			       nilfs->ns_blocksize);
 	}
 }
@@ -318,7 +321,8 @@ static int nilfs_scan_dsync_log(struct the_nilfs *nilfs, sector_t start_blocknr,
 
 	sumbytes = le32_to_cpu(sum->ss_sumbytes);
 	blocknr = start_blocknr + DIV_ROUND_UP(sumbytes, nilfs->ns_blocksize);
-	bh = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
+	bh = __bread(nilfs->ns_sb->s_bdev_file, start_blocknr,
+		     nilfs->ns_blocksize);
 	if (unlikely(!bh))
 		goto out;
 
@@ -478,7 +482,8 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
 	size_t from = pos & ~PAGE_MASK;
 	void *kaddr;
 
-	bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
+	bh_org = __bread(nilfs->ns_sb->s_bdev_file, rb->blocknr,
+			 nilfs->ns_blocksize);
 	if (unlikely(!bh_org))
 		return -EIO;
 
@@ -697,7 +702,8 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
 	    nilfs_get_segnum_of_block(nilfs, ri->ri_super_root))
 		return;
 
-	bh = __getblk(nilfs->ns_bdev, ri->ri_lsegs_start, nilfs->ns_blocksize);
+	bh = __getblk(nilfs->ns_sb->s_bdev_file, ri->ri_lsegs_start,
+		      nilfs->ns_blocksize);
 	BUG_ON(!bh);
 	memset(bh->b_data, 0, bh->b_size);
 	set_buffer_dirty(bh);
@@ -823,7 +829,8 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 	/* Read ahead segment */
 	b = seg_start;
 	while (b <= seg_end)
-		__breadahead(nilfs->ns_bdev, b++, nilfs->ns_blocksize);
+		__breadahead(nilfs->ns_sb->s_bdev_file, b++,
+			     nilfs->ns_blocksize);
 
 	for (;;) {
 		brelse(bh_sum);
@@ -869,7 +876,7 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		if (pseg_start == seg_start) {
 			nilfs_get_segment_range(nilfs, nextnum, &b, &end);
 			while (b <= end)
-				__breadahead(nilfs->ns_bdev, b++,
+				__breadahead(nilfs->ns_sb->s_bdev_file, b++,
 					     nilfs->ns_blocksize);
 		}
 		if (!(flags & NILFS_SS_SR)) {
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
index ae2ef5c11868..def075a25b2c 100644
--- a/fs/ntfs3/fsntfs.c
+++ b/fs/ntfs3/fsntfs.c
@@ -1033,14 +1033,13 @@ struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block)
 
 int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer)
 {
-	struct block_device *bdev = sb->s_bdev;
 	u32 blocksize = sb->s_blocksize;
 	u64 block = lbo >> sb->s_blocksize_bits;
 	u32 off = lbo & (blocksize - 1);
 	u32 op = blocksize - off;
 
 	for (; bytes; block += 1, off = 0, op = blocksize) {
-		struct buffer_head *bh = __bread(bdev, block, blocksize);
+		struct buffer_head *bh = __bread(sb->s_bdev_file, block, blocksize);
 
 		if (!bh)
 			return -EIO;
@@ -1063,7 +1062,6 @@ int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes,
 		  const void *buf, int wait)
 {
 	u32 blocksize = sb->s_blocksize;
-	struct block_device *bdev = sb->s_bdev;
 	sector_t block = lbo >> sb->s_blocksize_bits;
 	u32 off = lbo & (blocksize - 1);
 	u32 op = blocksize - off;
@@ -1077,14 +1075,14 @@ int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes,
 			op = bytes;
 
 		if (op < blocksize) {
-			bh = __bread(bdev, block, blocksize);
+			bh = __bread(sb->s_bdev_file, block, blocksize);
 			if (!bh) {
 				ntfs_err(sb, "failed to read block %llx",
 					 (u64)block);
 				return -EIO;
 			}
 		} else {
-			bh = __getblk(bdev, block, blocksize);
+			bh = __getblk(sb->s_bdev_file, block, blocksize);
 			if (!bh)
 				return -ENOMEM;
 		}
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 3c4c878f6d77..a97eedc5130f 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -609,7 +609,7 @@ static noinline int ntfs_get_block_vbo(struct inode *inode, u64 vbo,
 	lbo = ((u64)lcn << cluster_bits) + off;
 
 	set_buffer_mapped(bh);
-	bh->b_bdev = sb->s_bdev;
+	bh->b_bdev_file = sb->s_bdev_file;
 	bh->b_blocknr = lbo >> sb->s_blocksize_bits;
 
 	valid = ni->i_valid;
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index cef5467fd928..aa7c6a8b04de 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1642,7 +1642,7 @@ void ntfs_unmap_meta(struct super_block *sb, CLST lcn, CLST len)
 		limit >>= 1;
 
 	while (blocks--) {
-		clean_bdev_aliases(bdev, devblock++, 1);
+		clean_bdev_aliases(sb->s_bdev_file, devblock++, 1);
 		if (cnt++ >= limit) {
 			sync_blockdev(bdev);
 			cnt = 0;
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 604fea3a26ff..4ad64997f3c7 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1209,7 +1209,7 @@ static int ocfs2_force_read_journal(struct inode *inode)
 		}
 
 		for (i = 0; i < p_blocks; i++, p_blkno++) {
-			bh = __find_get_block(osb->sb->s_bdev, p_blkno,
+			bh = __find_get_block(osb->sb->s_bdev_file, p_blkno,
 					osb->sb->s_blocksize);
 			/* block not cached. */
 			if (!bh)
diff --git a/fs/reiserfs/fix_node.c b/fs/reiserfs/fix_node.c
index 6c13a8d9a73c..2b288b1539d9 100644
--- a/fs/reiserfs/fix_node.c
+++ b/fs/reiserfs/fix_node.c
@@ -2332,7 +2332,7 @@ static void tb_buffer_sanity_check(struct super_block *sb,
 				       "in tree %s[%d] (%b)",
 				       descr, level, bh);
 
-		if (bh->b_bdev != sb->s_bdev)
+		if (bh_bdev(bh) != sb->s_bdev)
 			reiserfs_panic(sb, "jmacd-4", "buffer has wrong "
 				       "device %s[%d] (%b)",
 				       descr, level, bh);
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 6474529c4253..4d07d2f26317 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -618,7 +618,7 @@ static void reiserfs_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
 	if (buffer_journaled(bh)) {
 		reiserfs_warning(NULL, "clm-2084",
 				 "pinned buffer %lu:%pg sent to disk",
-				 bh->b_blocknr, bh->b_bdev);
+				 bh->b_blocknr, bh_bdev(bh));
 	}
 	if (uptodate)
 		set_buffer_uptodate(bh);
@@ -2315,7 +2315,7 @@ static int journal_read_transaction(struct super_block *sb,
  * from other places.
  * Note: Do not use journal_getblk/sb_getblk functions here!
  */
-static struct buffer_head *reiserfs_breada(struct block_device *dev,
+static struct buffer_head *reiserfs_breada(struct file *bdev_file,
 					   b_blocknr_t block, int bufsize,
 					   b_blocknr_t max_block)
 {
@@ -2324,7 +2324,7 @@ static struct buffer_head *reiserfs_breada(struct block_device *dev,
 	struct buffer_head *bh;
 	int i, j;
 
-	bh = __getblk(dev, block, bufsize);
+	bh = __getblk(bdev_file, block, bufsize);
 	if (!bh || buffer_uptodate(bh))
 		return (bh);
 
@@ -2334,7 +2334,7 @@ static struct buffer_head *reiserfs_breada(struct block_device *dev,
 	bhlist[0] = bh;
 	j = 1;
 	for (i = 1; i < blocks; i++) {
-		bh = __getblk(dev, block + i, bufsize);
+		bh = __getblk(bdev_file, block + i, bufsize);
 		if (!bh)
 			break;
 		if (buffer_uptodate(bh)) {
@@ -2447,7 +2447,7 @@ static int journal_read(struct super_block *sb)
 		 * device and journal device to be the same
 		 */
 		d_bh =
-		    reiserfs_breada(file_bdev(journal->j_bdev_file), cur_dblock,
+		    reiserfs_breada(journal->j_bdev_file, cur_dblock,
 				    sb->s_blocksize,
 				    SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
 				    SB_ONDISK_JOURNAL_SIZE(sb));
diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c
index 84a194b77f19..249a458b6e28 100644
--- a/fs/reiserfs/prints.c
+++ b/fs/reiserfs/prints.c
@@ -156,7 +156,7 @@ static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh)
 {
 	return scnprintf(buf, size,
 			 "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)",
-			 bh->b_bdev, bh->b_size,
+			 bh_bdev(bh), bh->b_size,
 			 (unsigned long long)bh->b_blocknr,
 			 atomic_read(&(bh->b_count)),
 			 bh->b_state, bh->b_page,
@@ -561,7 +561,7 @@ static int print_super_block(struct buffer_head *bh)
 		return 1;
 	}
 
-	printk("%pg\'s super block is in block %llu\n", bh->b_bdev,
+	printk("%pg\'s super block is in block %llu\n", bh_bdev(bh),
 	       (unsigned long long)bh->b_blocknr);
 	printk("Reiserfs version %s\n", version);
 	printk("Block count %u\n", sb_block_count(rs));
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index f0e1f29f20ee..49caa7c42fb7 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -2810,10 +2810,10 @@ struct reiserfs_journal_header {
 
 /* We need these to make journal.c code more readable */
 #define journal_find_get_block(s, block) __find_get_block(\
-		file_bdev(SB_JOURNAL(s)->j_bdev_file), block, s->s_blocksize)
-#define journal_getblk(s, block) __getblk(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
+		SB_JOURNAL(s)->j_bdev_file, block, s->s_blocksize)
+#define journal_getblk(s, block) __getblk(SB_JOURNAL(s)->j_bdev_file,\
 		block, s->s_blocksize)
-#define journal_bread(s, block) __bread(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
+#define journal_bread(s, block) __bread(SB_JOURNAL(s)->j_bdev_file,\
 		block, s->s_blocksize)
 
 enum reiserfs_bh_state_bits {
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 5faf702f8d15..23998f071d9c 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -331,7 +331,7 @@ static inline int key_in_buffer(
 	       || chk_path->path_length > MAX_HEIGHT,
 	       "PAP-5050: pointer to the key(%p) is NULL or invalid path length(%d)",
 	       key, chk_path->path_length);
-	RFALSE(!PATH_PLAST_BUFFER(chk_path)->b_bdev,
+	RFALSE(!bh_bdev(PATH_PLAST_BUFFER(chk_path)),
 	       "PAP-5060: device must not be NODEV");
 
 	if (comp_keys(get_lkey(chk_path, sb), key) == 1)
diff --git a/fs/reiserfs/tail_conversion.c b/fs/reiserfs/tail_conversion.c
index 2cec61af2a9e..f38dfae74e32 100644
--- a/fs/reiserfs/tail_conversion.c
+++ b/fs/reiserfs/tail_conversion.c
@@ -187,7 +187,7 @@ void reiserfs_unmap_buffer(struct buffer_head *bh)
 	clear_buffer_mapped(bh);
 	clear_buffer_req(bh);
 	clear_buffer_new(bh);
-	bh->b_bdev = NULL;
+	bh->b_bdev_file = NULL;
 	unlock_buffer(bh);
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 18c8f168b153..c06d41bbb919 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -125,7 +125,7 @@ xfs_bmbt_to_iomap(
 	if (mapping_flags & IOMAP_DAX)
 		iomap->dax_dev = target->bt_daxdev;
 	else
-		iomap->bdev = target->bt_bdev;
+		iomap->bdev_file = target->bt_bdev_file;
 	iomap->flags = iomap_flags;
 
 	if (xfs_ipincount(ip) &&
@@ -150,7 +150,7 @@ xfs_hole_to_iomap(
 	iomap->type = IOMAP_HOLE;
 	iomap->offset = XFS_FSB_TO_B(ip->i_mount, offset_fsb);
 	iomap->length = XFS_FSB_TO_B(ip->i_mount, end_fsb - offset_fsb);
-	iomap->bdev = target->bt_bdev;
+	iomap->bdev_file = target->bt_bdev_file;
 	iomap->dax_dev = target->bt_daxdev;
 }
 
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 8dab4c2ad300..e454d08ad7d0 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -38,7 +38,7 @@ static int zonefs_read_iomap_begin(struct inode *inode, loff_t offset,
 	 * act as if there is a hole up to the file maximum size.
 	 */
 	mutex_lock(&zi->i_truncate_mutex);
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->bdev_file = inode->i_sb->s_bdev_file;
 	iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize);
 	isize = i_size_read(inode);
 	if (iomap->offset >= isize) {
@@ -88,7 +88,7 @@ static int zonefs_write_iomap_begin(struct inode *inode, loff_t offset,
 	 * write pointer) and unwriten beyond.
 	 */
 	mutex_lock(&zi->i_truncate_mutex);
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap->bdev_file = inode->i_sb->s_bdev_file;
 	iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize);
 	iomap->addr = (z->z_sector << SECTOR_SHIFT) + iomap->offset;
 	isize = i_size_read(inode);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 1c07848dea7e..79c652f42e57 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -49,7 +49,6 @@ struct block_device {
 	bool			bd_write_holder;
 	bool			bd_has_submit_bio;
 	dev_t			bd_dev;
-	struct inode		*bd_inode;	/* will die */
 
 	atomic_t		bd_openers;
 	spinlock_t		bd_size_lock; /* for bd_inode->i_size updates */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3fb02e3a527a..f3bc2e77999a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1524,6 +1524,8 @@ struct block_device *I_BDEV(struct inode *inode);
 struct block_device *file_bdev(struct file *bdev_file);
 bool disk_live(struct gendisk *disk);
 unsigned int block_size(struct block_device *bdev);
+void clean_bdev_aliases2(struct block_device *bdev, sector_t block,
+			 sector_t len);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index d78454a4dd1f..863af22f24c4 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/blk_types.h>
+#include <linux/blkdev.h>
 #include <linux/fs.h>
 #include <linux/linkage.h>
 #include <linux/pagemap.h>
@@ -34,6 +35,7 @@ enum bh_state_bits {
 	BH_Meta,	/* Buffer contains metadata */
 	BH_Prio,	/* Buffer should be submitted with REQ_PRIO */
 	BH_Defer_Completion, /* Defer AIO completion to workqueue */
+	BH_Bdev,
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities
@@ -68,7 +70,10 @@ struct buffer_head {
 	size_t b_size;			/* size of mapping */
 	char *b_data;			/* pointer to data within the page */
 
-	struct block_device *b_bdev;
+	union {
+		struct file *b_bdev_file;
+		struct block_device *b_bdev;
+	};
 	bh_end_io_t *b_end_io;		/* I/O completion */
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
@@ -135,6 +140,14 @@ BUFFER_FNS(Unwritten, unwritten)
 BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
+BUFFER_FNS(Bdev, bdev)
+
+static __always_inline struct block_device *bh_bdev(struct buffer_head *bh)
+{
+	if (buffer_bdev(bh))
+		return bh->b_bdev;
+	return file_bdev(bh->b_bdev_file);
+}
 
 static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
 {
@@ -212,24 +225,33 @@ int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end,
 				  bool datasync);
 int generic_buffers_fsync(struct file *file, loff_t start, loff_t end,
 			  bool datasync);
-void clean_bdev_aliases(struct block_device *bdev, sector_t block,
-			sector_t len);
+void __clean_bdev_aliases(struct inode *inode, sector_t block, sector_t len);
+
+static inline void clean_bdev_aliases(struct file *bdev_file, sector_t block,
+				      sector_t len)
+{
+	return __clean_bdev_aliases(file_inode(bdev_file), block, len);
+}
+
 static inline void clean_bdev_bh_alias(struct buffer_head *bh)
 {
-	clean_bdev_aliases(bh->b_bdev, bh->b_blocknr, 1);
+	if (buffer_bdev(bh))
+		clean_bdev_aliases2(bh->b_bdev, bh->b_blocknr, 1);
+	else
+		clean_bdev_aliases(bh->b_bdev_file, bh->b_blocknr, 1);
 }
 
 void mark_buffer_async_write(struct buffer_head *bh);
 void __wait_on_buffer(struct buffer_head *);
 wait_queue_head_t *bh_waitq_head(struct buffer_head *bh);
-struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
+struct buffer_head *__find_get_block(struct file *bdev_file, sector_t block,
 			unsigned size);
-struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
+struct buffer_head *bdev_getblk(struct file *bdev_file, sector_t block,
 		unsigned size, gfp_t gfp);
 void __brelse(struct buffer_head *);
 void __bforget(struct buffer_head *);
-void __breadahead(struct block_device *, sector_t block, unsigned int size);
-struct buffer_head *__bread_gfp(struct block_device *,
+void __breadahead(struct file *bdev_file, sector_t block, unsigned int size);
+struct buffer_head *__bread_gfp(struct file *bdev_file,
 				sector_t block, unsigned size, gfp_t gfp);
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
@@ -239,7 +261,7 @@ int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void submit_bh(blk_opf_t, struct buffer_head *);
-void write_boundary_block(struct block_device *bdev,
+void write_boundary_block(struct file *bdev_file,
 			sector_t bblock, unsigned blocksize);
 int bh_uptodate_or_lock(struct buffer_head *bh);
 int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait);
@@ -318,66 +340,67 @@ static inline void bforget(struct buffer_head *bh)
 static inline struct buffer_head *
 sb_bread(struct super_block *sb, sector_t block)
 {
-	return __bread_gfp(sb->s_bdev, block, sb->s_blocksize, __GFP_MOVABLE);
+	return __bread_gfp(sb->s_bdev_file, block, sb->s_blocksize,
+			   __GFP_MOVABLE);
 }
 
 static inline struct buffer_head *
 sb_bread_unmovable(struct super_block *sb, sector_t block)
 {
-	return __bread_gfp(sb->s_bdev, block, sb->s_blocksize, 0);
+	return __bread_gfp(sb->s_bdev_file, block, sb->s_blocksize, 0);
 }
 
 static inline void
 sb_breadahead(struct super_block *sb, sector_t block)
 {
-	__breadahead(sb->s_bdev, block, sb->s_blocksize);
+	__breadahead(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
-static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
+static inline struct buffer_head *getblk_unmovable(struct file *bdev_file,
 		sector_t block, unsigned size)
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 	gfp |= __GFP_NOFAIL;
 
-	return bdev_getblk(bdev, block, size, gfp);
+	return bdev_getblk(bdev_file, block, size, gfp);
 }
 
-static inline struct buffer_head *__getblk(struct block_device *bdev,
+static inline struct buffer_head *__getblk(struct file *bdev_file,
 		sector_t block, unsigned size)
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 	gfp |= __GFP_MOVABLE | __GFP_NOFAIL;
 
-	return bdev_getblk(bdev, block, size, gfp);
+	return bdev_getblk(bdev_file, block, size, gfp);
 }
 
 static inline struct buffer_head *sb_getblk(struct super_block *sb,
 		sector_t block)
 {
-	return __getblk(sb->s_bdev, block, sb->s_blocksize);
+	return __getblk(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
 static inline struct buffer_head *sb_getblk_gfp(struct super_block *sb,
 		sector_t block, gfp_t gfp)
 {
-	return bdev_getblk(sb->s_bdev, block, sb->s_blocksize, gfp);
+	return bdev_getblk(sb->s_bdev_file, block, sb->s_blocksize, gfp);
 }
 
 static inline struct buffer_head *
 sb_find_get_block(struct super_block *sb, sector_t block)
 {
-	return __find_get_block(sb->s_bdev, block, sb->s_blocksize);
+	return __find_get_block(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
 static inline void
 map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block)
 {
 	set_buffer_mapped(bh);
-	bh->b_bdev = sb->s_bdev;
+	bh->b_bdev_file = sb->s_bdev_file;
 	bh->b_blocknr = block;
 	bh->b_size = sb->s_blocksize;
 }
@@ -438,7 +461,7 @@ static inline void bh_readahead_batch(int nr, struct buffer_head *bhs[],
 
 /**
  *  __bread() - reads a specified block and returns the bh
- *  @bdev: the block_device to read from
+ *  @bdev_file: the opened block_device to read from
  *  @block: number of block
  *  @size: size (in bytes) to read
  *
@@ -447,9 +470,9 @@ static inline void bh_readahead_batch(int nr, struct buffer_head *bhs[],
  *  It returns NULL if the block was unreadable.
  */
 static inline struct buffer_head *
-__bread(struct block_device *bdev, sector_t block, unsigned size)
+__bread(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
+	return __bread_gfp(bdev_file, block, size, __GFP_MOVABLE);
 }
 
 /**
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..176b202a2c7d 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -77,6 +77,7 @@ struct vm_fault;
  */
 #define IOMAP_F_SIZE_CHANGED	(1U << 8)
 #define IOMAP_F_STALE		(1U << 9)
+#define IOMAP_F_BDEV		(1U << 10)
 
 /*
  * Flags from 0x1000 up are for file system specific usage:
@@ -97,7 +98,11 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
-	struct block_device	*bdev;	/* block device for I/O */
+	union {
+		/* block device for I/O */
+		struct block_device	*bdev;
+		struct file		*bdev_file;
+	};
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 	void			*inline_data;
 	void			*private; /* filesystem private */
@@ -105,6 +110,13 @@ struct iomap {
 	u64			validity_cookie; /* used with .iomap_valid() */
 };
 
+static inline struct block_device *iomap_bdev(const struct iomap *iomap)
+{
+	if (iomap->flags & IOMAP_F_BDEV)
+		return iomap->bdev;
+	return file_bdev(iomap->bdev_file);
+}
+
 static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
 {
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 0e128ad51460..95d3ed978864 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -26,7 +26,7 @@ DECLARE_EVENT_CLASS(block_buffer,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= bh->b_bdev->bd_dev;
+		__entry->dev		= bh_bdev(bh)->bd_dev;
 		__entry->sector		= bh->b_blocknr;
 		__entry->size		= bh->b_size;
 	),
-- 
2.39.2
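
For readers following the iomap hunk above: the new union means an iomap carries either a bare block_device pointer or an opened bdev file, and IOMAP_F_BDEV says which one is valid. Below is a minimal userspace sketch of that tagged-union lookup. It assumes stand-in definitions of struct block_device, struct file and file_bdev(); these are simplified models for illustration only, not the kernel's definitions. Only the flag value and the body of iomap_bdev() are taken from the patch.

```c
#include <stddef.h>

/* Stand-in types: simplified userspace models, NOT the kernel structures. */
struct block_device { int id; };
struct file { struct block_device *bdev; };

#define IOMAP_F_BDEV	(1U << 10)	/* same bit as in the patch */

/* Models file_bdev(): map an opened bdev file to its block_device. */
static struct block_device *file_bdev(struct file *bdev_file)
{
	return bdev_file->bdev;
}

/* Reduced struct iomap: only the fields the lookup needs. */
struct iomap {
	unsigned short flags;
	union {
		struct block_device *bdev;	/* valid if IOMAP_F_BDEV is set */
		struct file *bdev_file;		/* valid otherwise */
	};
};

/* Same logic as the helper added in include/linux/iomap.h. */
static struct block_device *iomap_bdev(const struct iomap *iomap)
{
	if (iomap->flags & IOMAP_F_BDEV)
		return iomap->bdev;
	return file_bdev(iomap->bdev_file);
}
```

In this model, a caller that fills in ->bdev without setting IOMAP_F_BDEV would have the pointer reinterpreted as a struct file, so the flag and the union member have to be set together.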


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode Yu Kuai
@ 2024-02-25  0:06   ` kernel test robot
  2024-03-17 21:38   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: kernel test robot @ 2024-02-25  0:06 UTC (permalink / raw)
  To: Yu Kuai; +Cc: oe-kbuild-all

Hi Yu,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on next-20240221]

url:    https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/block-move-two-helpers-into-bdev-c/20240222-205510
base:   next-20240221
patch link:    https://lore.kernel.org/r/20240222124555.2049140-20-yukuai1%40huaweicloud.com
patch subject: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
config: i386-buildonly-randconfig-002-20240225 (https://download.01.org/0day-ci/archive/20240225/202402250747.57bivGGZ-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240225/202402250747.57bivGGZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202402250747.57bivGGZ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> fs/buffer.c:1702: warning: Function parameter or struct member 'bd_inode' not described in '__clean_bdev_aliases'
>> fs/buffer.c:1702: warning: Excess function parameter 'inode' description in '__clean_bdev_aliases'


vim +1702 fs/buffer.c

^1da177e4c3f41 Linus Torvalds          2005-04-16  1680  
29f3ad7d838036 Jan Kara                2016-11-04  1681  /**
666c7b98061c32 Yu Kuai                 2024-02-22  1682   * __clean_bdev_aliases: clean a range of buffers in block device
666c7b98061c32 Yu Kuai                 2024-02-22  1683   * @inode: Block device inode to clean buffers in
29f3ad7d838036 Jan Kara                2016-11-04  1684   * @block: Start of a range of blocks to clean
29f3ad7d838036 Jan Kara                2016-11-04  1685   * @len: Number of blocks to clean
29f3ad7d838036 Jan Kara                2016-11-04  1686   *
29f3ad7d838036 Jan Kara                2016-11-04  1687   * We are taking a range of blocks for data and we don't want writeback of any
29f3ad7d838036 Jan Kara                2016-11-04  1688   * buffer-cache aliases starting from return from this function and until the
29f3ad7d838036 Jan Kara                2016-11-04  1689   * moment when something will explicitly mark the buffer dirty (hopefully that
29f3ad7d838036 Jan Kara                2016-11-04  1690   * will not happen until we will free that block ;-) We don't even need to mark
29f3ad7d838036 Jan Kara                2016-11-04  1691   * it not-uptodate - nobody can expect anything from a newly allocated buffer
29f3ad7d838036 Jan Kara                2016-11-04  1692   * anyway. We used to use unmap_buffer() for such invalidation, but that was
29f3ad7d838036 Jan Kara                2016-11-04  1693   * wrong. We definitely don't want to mark the alias unmapped, for example - it
29f3ad7d838036 Jan Kara                2016-11-04  1694   * would confuse anyone who might pick it with bread() afterwards...
29f3ad7d838036 Jan Kara                2016-11-04  1695   *
29f3ad7d838036 Jan Kara                2016-11-04  1696   * Also..  Note that bforget() doesn't lock the buffer.  So there can be
29f3ad7d838036 Jan Kara                2016-11-04  1697   * writeout I/O going on against recently-freed buffers.  We don't wait on that
29f3ad7d838036 Jan Kara                2016-11-04  1698   * I/O in bforget() - it's more efficient to wait on the I/O only if we really
29f3ad7d838036 Jan Kara                2016-11-04  1699   * need to.  That happens here.
29f3ad7d838036 Jan Kara                2016-11-04  1700   */
666c7b98061c32 Yu Kuai                 2024-02-22  1701  void __clean_bdev_aliases(struct inode *bd_inode, sector_t block, sector_t len)
^1da177e4c3f41 Linus Torvalds          2005-04-16 @1702  {
29f3ad7d838036 Jan Kara                2016-11-04  1703  	struct address_space *bd_mapping = bd_inode->i_mapping;
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1704) 	struct folio_batch fbatch;
4b04646caed544 Matthew Wilcox (Oracle  2023-11-09  1705) 	pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
29f3ad7d838036 Jan Kara                2016-11-04  1706  	pgoff_t end;
c10f778ddfc161 Jan Kara                2017-09-06  1707  	int i, count;
29f3ad7d838036 Jan Kara                2016-11-04  1708  	struct buffer_head *bh;
29f3ad7d838036 Jan Kara                2016-11-04  1709  	struct buffer_head *head;
^1da177e4c3f41 Linus Torvalds          2005-04-16  1710  
4b04646caed544 Matthew Wilcox (Oracle  2023-11-09  1711) 	end = ((loff_t)(block + len - 1) << bd_inode->i_blkbits) / PAGE_SIZE;
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1712) 	folio_batch_init(&fbatch);
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1713) 	while (filemap_get_folios(bd_mapping, &index, end, &fbatch)) {
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1714) 		count = folio_batch_count(&fbatch);
c10f778ddfc161 Jan Kara                2017-09-06  1715  		for (i = 0; i < count; i++) {
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1716) 			struct folio *folio = fbatch.folios[i];
^1da177e4c3f41 Linus Torvalds          2005-04-16  1717  
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1718) 			if (!folio_buffers(folio))
29f3ad7d838036 Jan Kara                2016-11-04  1719  				continue;
29f3ad7d838036 Jan Kara                2016-11-04  1720  			/*
600f111ef51dc2 Matthew Wilcox (Oracle  2023-11-17  1721) 			 * We use folio lock instead of bd_mapping->i_private_lock
29f3ad7d838036 Jan Kara                2016-11-04  1722  			 * to pin buffers here since we can afford to sleep and
29f3ad7d838036 Jan Kara                2016-11-04  1723  			 * it scales better than a global spinlock lock.
29f3ad7d838036 Jan Kara                2016-11-04  1724  			 */
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1725) 			folio_lock(folio);
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1726) 			/* Recheck when the folio is locked which pins bhs */
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1727) 			head = folio_buffers(folio);
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1728) 			if (!head)
29f3ad7d838036 Jan Kara                2016-11-04  1729  				goto unlock_page;
29f3ad7d838036 Jan Kara                2016-11-04  1730  			bh = head;
29f3ad7d838036 Jan Kara                2016-11-04  1731  			do {
6c006a9d94bfb5 Chandan Rajendra        2016-12-25  1732  				if (!buffer_mapped(bh) || (bh->b_blocknr < block))
29f3ad7d838036 Jan Kara                2016-11-04  1733  					goto next;
29f3ad7d838036 Jan Kara                2016-11-04  1734  				if (bh->b_blocknr >= block + len)
29f3ad7d838036 Jan Kara                2016-11-04  1735  					break;
29f3ad7d838036 Jan Kara                2016-11-04  1736  				clear_buffer_dirty(bh);
29f3ad7d838036 Jan Kara                2016-11-04  1737  				wait_on_buffer(bh);
29f3ad7d838036 Jan Kara                2016-11-04  1738  				clear_buffer_req(bh);
29f3ad7d838036 Jan Kara                2016-11-04  1739  next:
29f3ad7d838036 Jan Kara                2016-11-04  1740  				bh = bh->b_this_page;
29f3ad7d838036 Jan Kara                2016-11-04  1741  			} while (bh != head);
29f3ad7d838036 Jan Kara                2016-11-04  1742  unlock_page:
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1743) 			folio_unlock(folio);
29f3ad7d838036 Jan Kara                2016-11-04  1744  		}
9e0b6f31bae664 Matthew Wilcox (Oracle  2022-06-04  1745) 		folio_batch_release(&fbatch);
29f3ad7d838036 Jan Kara                2016-11-04  1746  		cond_resched();
c10f778ddfc161 Jan Kara                2017-09-06  1747  		/* End of range already reached? */
c10f778ddfc161 Jan Kara                2017-09-06  1748  		if (index > end || !index)
c10f778ddfc161 Jan Kara                2017-09-06  1749  			break;
^1da177e4c3f41 Linus Torvalds          2005-04-16  1750  	}
^1da177e4c3f41 Linus Torvalds          2005-04-16  1751  }
666c7b98061c32 Yu Kuai                 2024-02-22  1752  EXPORT_SYMBOL(__clean_bdev_aliases);
^1da177e4c3f41 Linus Torvalds          2005-04-16  1753  
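
Both warnings above point at the same mismatch: the kerneldoc at line 1683 still documents @inode, while the parameter at line 1701 is named bd_inode. A minimal sketch of one possible fix follows, assuming only the kerneldoc tag needs renaming. The sector_t typedef and the forward declaration of struct inode are stand-ins so the prototype compiles outside the kernel tree, and the long description is omitted here for brevity.

```c
#include <stdint.h>

/* Stand-ins so this compiles in userspace; not the kernel definitions. */
typedef uint64_t sector_t;
struct inode;

/**
 * __clean_bdev_aliases: clean a range of buffers in block device
 * @bd_inode: Block device inode to clean buffers in
 * @block: Start of a range of blocks to clean
 * @len: Number of blocks to clean
 */
void __clean_bdev_aliases(struct inode *bd_inode, sector_t block, sector_t len);
```

Renaming the parameter back to inode would also silence the warnings, but that would go against the naming the patch introduces.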

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 17/19] dm-vdo: " Yu Kuai
@ 2024-02-28 13:41   ` Christoph Hellwig
  2024-03-18  9:11     ` Jan Kara
  2024-03-18  9:19   ` Jan Kara
  1 sibling, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-02-28 13:41 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:53PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that the dm upper layer already stashes the file of the opened device
> in 'dm_dev->bdev_file', it's ok to get the inode from the file.

Where did this code get in?


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (18 preceding siblings ...)
  2024-02-22 12:45 ` [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode Yu Kuai
@ 2024-02-28 13:42 ` Christoph Hellwig
  2024-03-15 12:08 ` Yu Kuai
  20 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-02-28 13:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

The series looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (19 preceding siblings ...)
  2024-02-28 13:42 ` [RFC v4 linux-next 00/19] " Christoph Hellwig
@ 2024-03-15 12:08 ` Yu Kuai
  2024-03-15 13:54   ` Christian Brauner
  20 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-15 12:08 UTC (permalink / raw)
  To: Yu Kuai, jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun

Hi, Christian
Hi, Christoph
Hi, Jan

Perhaps now is a good time to send a formal version of this set.
However, I'm not sure yet which branch I should rebase this set onto before sending it.
Should I send to the vfs tree?

Thanks,
Kuai

On 2024/02/22 20:45, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Changes in v4:
>   - respin on the top of linux-next, based on Christian's patchset to
>   open bdev as file. Most of the patches from v3 are dropped and changed to use
>   file_inode(bdev_file) to get bd_inode or bdev_file->f_mapping to get
>   bd_inode->i_mapping.
> 
> Changes in v3:
>   - remove bdev_associated_mapping() and patch 12 from v1;
>   - add kerneldoc comments for new bdev apis;
>   - rename __bdev_get_folio() to bdev_get_folio;
>   - fix a problem in erofs that erofs_init_metabuf() is not always
>   called.
>   - add reviewed-by tag for patch 15-17;
> 
> Changes in v2:
>   - remove some bdev apis that are not necessary;
>   - pass in offset for bdev_read_folio() and __bdev_get_folio();
>   - remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
>   prevent accessing bd_inode directly from mapping_gfp_constraint() in
>   ext4 (patch 15, 16);
>   - remove block_device_ejected() from ext4.
> 
> Yu Kuai (19):
>    block: move two helpers into bdev.c
>    block: remove sync_blockdev_nowait()
>    block: remove sync_blockdev_range()
>    block: prevent direct access of bd_inode
>    bcachefs: remove dead function bdev_sectors()
>    cramfs: prevent direct access of bd_inode
>    erofs: prevent direct access of bd_inode
>    nilfs2: prevent direct access of bd_inode
>    gfs2: prevent direct access of bd_inode
>    s390/dasd: use bdev api in dasd_format()
>    btrfs: prevent direct access of bd_inode
>    ext4: remove block_device_ejected()
>    ext4: prevent direct access of bd_inode
>    jbd2: prevent direct access of bd_inode
>    bcache: prevent direct access of bd_inode
>    block2mtd: prevent direct access of bd_inode
>    dm-vdo: prevent direct access of bd_inode
>    scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
>    fs & block: remove bdev->bd_inode
> 
>   block/bdev.c                              | 108 +++++++++++++++-------
>   block/blk-zoned.c                         |   4 +-
>   block/blk.h                               |   2 +
>   block/fops.c                              |   4 +-
>   block/genhd.c                             |   9 +-
>   block/ioctl.c                             |   8 +-
>   block/partitions/core.c                   |   8 +-
>   drivers/md/bcache/super.c                 |   7 +-
>   drivers/md/dm-vdo/dedupe.c                |   3 +-
>   drivers/md/dm-vdo/dm-vdo-target.c         |   5 +-
>   drivers/md/dm-vdo/indexer/config.c        |   1 +
>   drivers/md/dm-vdo/indexer/config.h        |   3 +
>   drivers/md/dm-vdo/indexer/index-layout.c  |   6 +-
>   drivers/md/dm-vdo/indexer/index-layout.h  |   2 +-
>   drivers/md/dm-vdo/indexer/index-session.c |  13 +--
>   drivers/md/dm-vdo/indexer/index.c         |   4 +-
>   drivers/md/dm-vdo/indexer/index.h         |   2 +-
>   drivers/md/dm-vdo/indexer/indexer.h       |   4 +-
>   drivers/md/dm-vdo/indexer/io-factory.c    |  13 ++-
>   drivers/md/dm-vdo/indexer/io-factory.h    |   4 +-
>   drivers/md/dm-vdo/indexer/volume.c        |   4 +-
>   drivers/md/dm-vdo/indexer/volume.h        |   2 +-
>   drivers/md/md-bitmap.c                    |   2 +-
>   drivers/mtd/devices/block2mtd.c           |   6 +-
>   drivers/s390/block/dasd_ioctl.c           |   5 +-
>   drivers/scsi/scsicam.c                    |   3 +-
>   fs/affs/file.c                            |   2 +-
>   fs/bcachefs/util.h                        |   5 -
>   fs/btrfs/dev-replace.c                    |   2 +-
>   fs/btrfs/disk-io.c                        |  17 ++--
>   fs/btrfs/disk-io.h                        |   4 +-
>   fs/btrfs/inode.c                          |   2 +-
>   fs/btrfs/super.c                          |   2 +-
>   fs/btrfs/volumes.c                        |  32 ++++---
>   fs/btrfs/volumes.h                        |   2 +-
>   fs/btrfs/zoned.c                          |  20 ++--
>   fs/btrfs/zoned.h                          |   4 +-
>   fs/buffer.c                               | 103 ++++++++++++---------
>   fs/cramfs/inode.c                         |   2 +-
>   fs/direct-io.c                            |   4 +-
>   fs/erofs/data.c                           |   5 +-
>   fs/erofs/internal.h                       |   1 +
>   fs/erofs/zmap.c                           |   2 +-
>   fs/exfat/fatent.c                         |   2 +-
>   fs/ext2/inode.c                           |   4 +-
>   fs/ext2/xattr.c                           |   2 +-
>   fs/ext4/dir.c                             |   2 +-
>   fs/ext4/ext4_jbd2.c                       |   2 +-
>   fs/ext4/inode.c                           |   8 +-
>   fs/ext4/mmp.c                             |   2 +-
>   fs/ext4/page-io.c                         |   5 +-
>   fs/ext4/super.c                           |  30 ++----
>   fs/ext4/xattr.c                           |   2 +-
>   fs/f2fs/data.c                            |   7 +-
>   fs/f2fs/f2fs.h                            |   1 +
>   fs/fat/inode.c                            |   2 +-
>   fs/fuse/dax.c                             |   2 +-
>   fs/gfs2/aops.c                            |   2 +-
>   fs/gfs2/bmap.c                            |   2 +-
>   fs/gfs2/glock.c                           |   2 +-
>   fs/gfs2/meta_io.c                         |   2 +-
>   fs/gfs2/ops_fstype.c                      |   2 +-
>   fs/hpfs/file.c                            |   2 +-
>   fs/iomap/buffered-io.c                    |   8 +-
>   fs/iomap/direct-io.c                      |  11 ++-
>   fs/iomap/swapfile.c                       |   2 +-
>   fs/iomap/trace.h                          |   2 +-
>   fs/jbd2/commit.c                          |   2 +-
>   fs/jbd2/journal.c                         |  34 ++++---
>   fs/jbd2/recovery.c                        |   8 +-
>   fs/jbd2/revoke.c                          |  13 +--
>   fs/jbd2/transaction.c                     |   8 +-
>   fs/mpage.c                                |  26 ++++--
>   fs/nilfs2/btnode.c                        |   4 +-
>   fs/nilfs2/gcinode.c                       |   2 +-
>   fs/nilfs2/mdt.c                           |   2 +-
>   fs/nilfs2/page.c                          |   4 +-
>   fs/nilfs2/recovery.c                      |  27 ++++--
>   fs/nilfs2/segment.c                       |   2 +-
>   fs/ntfs3/fsntfs.c                         |   8 +-
>   fs/ntfs3/inode.c                          |   4 +-
>   fs/ntfs3/super.c                          |   2 +-
>   fs/ocfs2/journal.c                        |   2 +-
>   fs/reiserfs/fix_node.c                    |   2 +-
>   fs/reiserfs/journal.c                     |  10 +-
>   fs/reiserfs/prints.c                      |   4 +-
>   fs/reiserfs/reiserfs.h                    |   6 +-
>   fs/reiserfs/stree.c                       |   2 +-
>   fs/reiserfs/tail_conversion.c             |   2 +-
>   fs/sync.c                                 |   9 +-
>   fs/xfs/xfs_iomap.c                        |   4 +-
>   fs/zonefs/file.c                          |   4 +-
>   include/linux/blk_types.h                 |   1 -
>   include/linux/blkdev.h                    |  21 +----
>   include/linux/buffer_head.h               |  73 ++++++++++-----
>   include/linux/iomap.h                     |  14 ++-
>   include/linux/jbd2.h                      |  18 +++-
>   include/trace/events/block.h              |   2 +-
>   98 files changed, 491 insertions(+), 376 deletions(-)
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-15 12:08 ` Yu Kuai
@ 2024-03-15 13:54   ` Christian Brauner
  2024-03-16  2:49     ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Christian Brauner @ 2024-03-15 13:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Yu Kuai, jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun

On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
> Hi, Christian
> Hi, Christoph
> Hi, Jan
> 
> Perhaps now is a good time to send a formal version of this set.
> However, I'm not sure yet which branch I should rebase this set onto before sending it.
> Should I send to the vfs tree?

Nearly all of it is in fs/ so I'd say yes.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 01/19] block: move two helpers into bdev.c
  2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
@ 2024-03-15 14:31   ` Jan Kara
  2024-03-17 21:19   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:31 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:37, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> disk_live() and block_size() access bd_inode directly. Move them into
> bdev.c to prepare for removing the bd_inode field from block_device, so
> that bd_inode is only accessed in the block layer.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c           | 12 ++++++++++++
>  include/linux/blkdev.h | 12 ++----------
>  2 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 140093c99bdc..726a2805a1ce 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -1196,6 +1196,18 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>  	blkdev_put_no_open(bdev);
>  }
>  
> +bool disk_live(struct gendisk *disk)
> +{
> +	return !inode_unhashed(disk->part0->bd_inode);
> +}
> +EXPORT_SYMBOL_GPL(disk_live);
> +
> +unsigned int block_size(struct block_device *bdev)
> +{
> +	return 1 << bdev->bd_inode->i_blkbits;
> +}
> +EXPORT_SYMBOL_GPL(block_size);
> +
>  static int __init setup_bdev_allow_write_mounted(char *str)
>  {
>  	if (kstrtobool(str, &bdev_allow_write_mounted))
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 06e854186947..eb1f6eeaddc5 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -211,11 +211,6 @@ struct gendisk {
>  	struct blk_independent_access_ranges *ia_ranges;
>  };
>  
> -static inline bool disk_live(struct gendisk *disk)
> -{
> -	return !inode_unhashed(disk->part0->bd_inode);
> -}
> -
>  /**
>   * disk_openers - returns how many openers are there for a disk
>   * @disk: disk to check
> @@ -1359,11 +1354,6 @@ static inline unsigned int blksize_bits(unsigned int size)
>  	return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
>  }
>  
> -static inline unsigned int block_size(struct block_device *bdev)
> -{
> -	return 1 << bdev->bd_inode->i_blkbits;
> -}
> -
>  int kblockd_schedule_work(struct work_struct *work);
>  int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
>  
> @@ -1531,6 +1521,8 @@ void blkdev_put_no_open(struct block_device *bdev);
>  
>  struct block_device *I_BDEV(struct inode *inode);
>  struct block_device *file_bdev(struct file *bdev_file);
> +bool disk_live(struct gendisk *disk);
> +unsigned int block_size(struct block_device *bdev);
>  
>  #ifdef CONFIG_BLOCK
>  void invalidate_bdev(struct block_device *bdev);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait()
  2024-02-22 12:45 ` [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait() Yu Kuai
@ 2024-03-15 14:34   ` Jan Kara
  2024-03-17 21:19   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:34 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:38, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to flush the file
> mapping directly.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c           | 8 --------
>  fs/fat/inode.c         | 2 +-
>  fs/ntfs3/inode.c       | 2 +-
>  fs/sync.c              | 9 ++++++---
>  include/linux/blkdev.h | 5 -----
>  5 files changed, 8 insertions(+), 18 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 726a2805a1ce..49dcff483289 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -188,14 +188,6 @@ int sb_min_blocksize(struct super_block *sb, int size)
>  
>  EXPORT_SYMBOL(sb_min_blocksize);
>  
> -int sync_blockdev_nowait(struct block_device *bdev)
> -{
> -	if (!bdev)
> -		return 0;
> -	return filemap_flush(bdev->bd_inode->i_mapping);
> -}
> -EXPORT_SYMBOL_GPL(sync_blockdev_nowait);
> -
>  /*
>   * Write out and wait upon all the dirty data associated with a block
>   * device via its mapping.  Does not take the superblock lock.
> diff --git a/fs/fat/inode.c b/fs/fat/inode.c
> index 5c813696d1ff..8527aef51841 100644
> --- a/fs/fat/inode.c
> +++ b/fs/fat/inode.c
> @@ -1945,7 +1945,7 @@ int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
>  	if (!ret && i2)
>  		ret = writeback_inode(i2);
>  	if (!ret)
> -		ret = sync_blockdev_nowait(sb->s_bdev);
> +		ret = filemap_flush(sb->s_bdev_file->f_mapping);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(fat_flush_inodes);
> diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
> index eb7a8c9fba01..3c4c878f6d77 100644
> --- a/fs/ntfs3/inode.c
> +++ b/fs/ntfs3/inode.c
> @@ -1081,7 +1081,7 @@ int ntfs_flush_inodes(struct super_block *sb, struct inode *i1,
>  	if (!ret && i2)
>  		ret = writeback_inode(i2);
>  	if (!ret)
> -		ret = sync_blockdev_nowait(sb->s_bdev);
> +		ret = filemap_flush(sb->s_bdev_file->f_mapping);
>  	return ret;
>  }
>  
> diff --git a/fs/sync.c b/fs/sync.c
> index dc725914e1ed..3a43062790d9 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -57,9 +57,12 @@ int sync_filesystem(struct super_block *sb)
>  		if (ret)
>  			return ret;
>  	}
> -	ret = sync_blockdev_nowait(sb->s_bdev);
> -	if (ret)
> -		return ret;
> +
> +	if (sb->s_bdev_file) {
> +		ret = filemap_flush(sb->s_bdev_file->f_mapping);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	sync_inodes_sb(sb);
>  	if (sb->s_op->sync_fs) {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index eb1f6eeaddc5..9e96811c8915 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1528,7 +1528,6 @@ unsigned int block_size(struct block_device *bdev);
>  void invalidate_bdev(struct block_device *bdev);
>  int sync_blockdev(struct block_device *bdev);
>  int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
> -int sync_blockdev_nowait(struct block_device *bdev);
>  void sync_bdevs(bool wait);
>  void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
>  void printk_all_partitions(void);
> @@ -1541,10 +1540,6 @@ static inline int sync_blockdev(struct block_device *bdev)
>  {
>  	return 0;
>  }
> -static inline int sync_blockdev_nowait(struct block_device *bdev)
> -{
> -	return 0;
> -}
>  static inline void sync_bdevs(bool wait)
>  {
>  }
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 03/19] block: remove sync_blockdev_range()
  2024-02-22 12:45 ` [RFC v4 linux-next 03/19] block: remove sync_blockdev_range() Yu Kuai
@ 2024-03-15 14:37   ` Jan Kara
  2024-03-17 21:21   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:37 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:39, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to flush the file
> mapping directly.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c           |  7 -------
>  fs/btrfs/dev-replace.c |  2 +-
>  fs/btrfs/volumes.c     | 19 +++++++++++--------
>  fs/btrfs/volumes.h     |  2 +-
>  fs/exfat/fatent.c      |  2 +-
>  include/linux/blkdev.h |  1 -
>  6 files changed, 14 insertions(+), 19 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 49dcff483289..e493d5c72edb 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -200,13 +200,6 @@ int sync_blockdev(struct block_device *bdev)
>  }
>  EXPORT_SYMBOL(sync_blockdev);
>  
> -int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend)
> -{
> -	return filemap_write_and_wait_range(bdev->bd_inode->i_mapping,
> -			lstart, lend);
> -}
> -EXPORT_SYMBOL(sync_blockdev_range);
> -
>  /**
>   * bdev_freeze - lock a filesystem and force it into a consistent state
>   * @bdev:	blockdevice to lock
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 7057221a46c3..88d45118cc64 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -982,7 +982,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
>  	btrfs_sysfs_remove_device(src_device);
>  	btrfs_sysfs_update_devid(tgt_device);
>  	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &src_device->dev_state))
> -		btrfs_scratch_superblocks(fs_info, src_device->bdev,
> +		btrfs_scratch_superblocks(fs_info, src_device->bdev_file,
>  					  src_device->name->str);
>  
>  	/* write back the superblocks */
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 493e33b4ae94..e12451ff911a 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2033,14 +2033,14 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
>  }
>  
>  static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
> -				     struct block_device *bdev, int copy_num)
> +				     struct file *bdev_file, int copy_num)
>  {
>  	struct btrfs_super_block *disk_super;
>  	const size_t len = sizeof(disk_super->magic);
>  	const u64 bytenr = btrfs_sb_offset(copy_num);
>  	int ret;
>  
> -	disk_super = btrfs_read_disk_super(bdev, bytenr, bytenr);
> +	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
>  	if (IS_ERR(disk_super))
>  		return;
>  
> @@ -2048,26 +2048,29 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
>  	folio_mark_dirty(virt_to_folio(disk_super));
>  	btrfs_release_disk_super(disk_super);
>  
> -	ret = sync_blockdev_range(bdev, bytenr, bytenr + len - 1);
> +	ret = filemap_write_and_wait_range(bdev_file->f_mapping,
> +					   bytenr, bytenr + len - 1);
>  	if (ret)
>  		btrfs_warn(fs_info, "error clearing superblock number %d (%d)",
>  			copy_num, ret);
>  }
>  
>  void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
> -			       struct block_device *bdev,
> +			       struct file *bdev_file,
>  			       const char *device_path)
>  {
> +	struct block_device *bdev;
>  	int copy_num;
>  
> -	if (!bdev)
> +	if (!bdev_file)
>  		return;
>  
> +	bdev = file_bdev(bdev_file);
>  	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
>  		if (bdev_is_zoned(bdev))
>  			btrfs_reset_sb_log_zones(bdev, copy_num);
>  		else
> -			btrfs_scratch_superblock(fs_info, bdev, copy_num);
> +			btrfs_scratch_superblock(fs_info, bdev_file, copy_num);
>  	}
>  
>  	/* Notify udev that device has changed */
> @@ -2209,7 +2212,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
>  	 *  just flush the device and let the caller do the final bdev_release.
>  	 */
>  	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
> -		btrfs_scratch_superblocks(fs_info, device->bdev,
> +		btrfs_scratch_superblocks(fs_info, device->bdev_file,
>  					  device->name->str);
>  		if (device->bdev) {
>  			sync_blockdev(device->bdev);
> @@ -2323,7 +2326,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_device *tgtdev)
>  
>  	mutex_unlock(&fs_devices->device_list_mutex);
>  
> -	btrfs_scratch_superblocks(tgtdev->fs_info, tgtdev->bdev,
> +	btrfs_scratch_superblocks(tgtdev->fs_info, tgtdev->bdev_file,
>  				  tgtdev->name->str);
>  
>  	btrfs_close_bdev(tgtdev);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 2ef78d3cc4c3..1d566f40b83d 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -818,7 +818,7 @@ struct list_head * __attribute_const__ btrfs_get_fs_uuids(void);
>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>  					struct btrfs_device *failing_dev);
>  void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
> -			       struct block_device *bdev,
> +			       struct file *bdev_file,
>  			       const char *device_path);
>  
>  enum btrfs_raid_types __attribute_const__ btrfs_bg_flags_to_raid_index(u64 flags);
> diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
> index 56b870d9cc0d..1c86ec2465b7 100644
> --- a/fs/exfat/fatent.c
> +++ b/fs/exfat/fatent.c
> @@ -296,7 +296,7 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
>  	}
>  
>  	if (IS_DIRSYNC(dir))
> -		return sync_blockdev_range(sb->s_bdev,
> +		return filemap_write_and_wait_range(sb->s_bdev_file->f_mapping,
>  				EXFAT_BLK_TO_B(blknr, sb),
>  				EXFAT_BLK_TO_B(last_blknr, sb) - 1);
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 9e96811c8915..c510f334c84f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1527,7 +1527,6 @@ unsigned int block_size(struct block_device *bdev);
>  #ifdef CONFIG_BLOCK
>  void invalidate_bdev(struct block_device *bdev);
>  int sync_blockdev(struct block_device *bdev);
> -int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
>  void sync_bdevs(bool wait);
>  void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
>  void printk_all_partitions(void);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors()
  2024-02-22 12:45 ` [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2024-03-15 14:42   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:41, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> bdev_sectors() is not used hence remove it.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/bcachefs/util.h | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
> index 1b3aced8d83c..e2d7f22df618 100644
> --- a/fs/bcachefs/util.h
> +++ b/fs/bcachefs/util.h
> @@ -443,11 +443,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
>  void bch2_bio_map(struct bio *bio, void *base, size_t);
>  int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
>  
> -static inline sector_t bdev_sectors(struct block_device *bdev)
> -{
> -	return bdev->bd_inode->i_size >> 9;
> -}
> -
>  #define closure_bio_submit(bio, cl)					\
>  do {									\
>  	closure_get(cl);						\
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode Yu Kuai
@ 2024-03-15 14:44   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:42, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get bdev mapping
> from the file directly.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
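
The replacement in this patch works because an opened block device file
carries the same page-cache mapping as the block device inode. A minimal
userspace sketch of that relationship follows. The structures are
simplified stand-ins, and mock_bdev_open() and sb_bdev_mapping() are
hypothetical names used only for illustration; only file_inode(),
f_mapping and s_bdev_file come from the patches.

```c
#include <assert.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct address_space { unsigned long nrpages; };
struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};
struct file {
	struct inode *f_inode;
	struct address_space *f_mapping;
};
struct super_block { struct file *s_bdev_file; };

/* Same idea as the kernel's file_inode(): return the inode behind a file. */
static struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/*
 * Hypothetical helper: models the one step of opening a bdev file that
 * matters here, pointing f_mapping at the bdev inode's mapping.
 */
static void mock_bdev_open(struct file *bdev_file, struct inode *bd_inode)
{
	bd_inode->i_mapping = &bd_inode->i_data;
	bdev_file->f_inode = bd_inode;
	bdev_file->f_mapping = bd_inode->i_mapping;
}

/* Hypothetical helper: the access pattern the converted filesystems use. */
static struct address_space *sb_bdev_mapping(const struct super_block *sb)
{
	return sb->s_bdev_file->f_mapping;
}
```

With a super_block whose s_bdev_file went through mock_bdev_open(),
sb_bdev_mapping(sb) and file_inode(sb->s_bdev_file)->i_mapping yield the
same pointer, which is why the filesystem conversions can use either
form without going through bd_inode.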

> ---
>  fs/cramfs/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 39e75131fd5a..1df4dd89350e 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -183,7 +183,7 @@ static int next_buffer;
>  static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
>  				unsigned int len)
>  {
> -	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
> +	struct address_space *mapping = sb->s_bdev_file->f_mapping;
>  	struct file_ra_state ra = {};
>  	struct page *pages[BLKS_PER_BUF];
>  	unsigned i, blocknr, buffer;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
@ 2024-03-15 14:44   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
  2024-03-22  5:44   ` Al Viro
  2 siblings, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:40, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Add helpers to access bd_inode, in preparation for removing the
> 'bd_inode' field once all direct accesses from filesystems and drivers
> are gone.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
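
The new helpers recover the inode from a block_device pointer through
container_of(), which works because both objects are embedded in one
struct bdev_inode. A self-contained userspace sketch of that pattern is
below. The structures are trimmed-down stand-ins for the kernel ones,
and demo_init() is a hypothetical setup helper that is not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct address_space { unsigned long nrpages; };
struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};
struct block_device { unsigned int bd_dev; };

/* The block device and its inode live side by side in one allocation. */
struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

/* Step back from the embedded bdev to the enclosing bdev_inode. */
static struct bdev_inode *BDEV_B(struct block_device *bdev)
{
	return container_of(bdev, struct bdev_inode, bdev);
}

/* The inode is the sibling member, so no stored back-pointer is needed. */
static struct inode *bdev_inode(struct block_device *bdev)
{
	return &BDEV_B(bdev)->vfs_inode;
}

static struct address_space *bdev_mapping(struct block_device *bdev)
{
	return BDEV_B(bdev)->vfs_inode.i_mapping;
}

/* Hypothetical setup helper: point i_mapping at the inode's own i_data. */
static void demo_init(struct bdev_inode *bi)
{
	bi->vfs_inode.i_mapping = &bi->vfs_inode.i_data;
}
```

Since the inode address is computed from the bdev address, a bd_inode
pointer stored inside struct block_device is redundant, which is what
makes its eventual removal possible.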

> ---
>  block/bdev.c            | 58 +++++++++++++++++++++++++++--------------
>  block/blk-zoned.c       |  4 +--
>  block/blk.h             |  2 ++
>  block/fops.c            |  2 +-
>  block/genhd.c           |  9 ++++---
>  block/ioctl.c           |  8 +++---
>  block/partitions/core.c |  8 +++---
>  7 files changed, 56 insertions(+), 35 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index e493d5c72edb..60a1479eae83 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -43,6 +43,21 @@ static inline struct bdev_inode *BDEV_I(struct inode *inode)
>  	return container_of(inode, struct bdev_inode, vfs_inode);
>  }
>  
> +static inline struct bdev_inode *BDEV_B(struct block_device *bdev)
> +{
> +	return container_of(bdev, struct bdev_inode, bdev);
> +}
> +
> +struct inode *bdev_inode(struct block_device *bdev)
> +{
> +	return &BDEV_B(bdev)->vfs_inode;
> +}
> +
> +struct address_space *bdev_mapping(struct block_device *bdev)
> +{
> +	return BDEV_B(bdev)->vfs_inode.i_mapping;
> +}
> +
>  struct block_device *I_BDEV(struct inode *inode)
>  {
>  	return &BDEV_I(inode)->bdev;
> @@ -57,7 +72,7 @@ EXPORT_SYMBOL(file_bdev);
>  
>  static void bdev_write_inode(struct block_device *bdev)
>  {
> -	struct inode *inode = bdev->bd_inode;
> +	struct inode *inode = bdev_inode(bdev);
>  	int ret;
>  
>  	spin_lock(&inode->i_lock);
> @@ -76,7 +91,7 @@ static void bdev_write_inode(struct block_device *bdev)
>  /* Kill _all_ buffers and pagecache , dirty or not.. */
>  static void kill_bdev(struct block_device *bdev)
>  {
> -	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	struct address_space *mapping = bdev_mapping(bdev);
>  
>  	if (mapping_empty(mapping))
>  		return;
> @@ -88,7 +103,7 @@ static void kill_bdev(struct block_device *bdev)
>  /* Invalidate clean unused buffers and pagecache. */
>  void invalidate_bdev(struct block_device *bdev)
>  {
> -	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	struct address_space *mapping = bdev_mapping(bdev);
>  
>  	if (mapping->nrpages) {
>  		invalidate_bh_lrus();
> @@ -116,7 +131,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
>  			goto invalidate;
>  	}
>  
> -	truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
> +	truncate_inode_pages_range(bdev_mapping(bdev), lstart, lend);
>  	if (!(mode & BLK_OPEN_EXCL))
>  		bd_abort_claiming(bdev, truncate_bdev_range);
>  	return 0;
> @@ -126,7 +141,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
>  	 * Someone else has handle exclusively open. Try invalidating instead.
>  	 * The 'end' argument is inclusive so the rounding is safe.
>  	 */
> -	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
> +	return invalidate_inode_pages2_range(bdev_mapping(bdev),
>  					     lstart >> PAGE_SHIFT,
>  					     lend >> PAGE_SHIFT);
>  }
> @@ -134,14 +149,14 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
>  static void set_init_blocksize(struct block_device *bdev)
>  {
>  	unsigned int bsize = bdev_logical_block_size(bdev);
> -	loff_t size = i_size_read(bdev->bd_inode);
> +	loff_t size = i_size_read(bdev_inode(bdev));
>  
>  	while (bsize < PAGE_SIZE) {
>  		if (size & bsize)
>  			break;
>  		bsize <<= 1;
>  	}
> -	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
> +	bdev_inode(bdev)->i_blkbits = blksize_bits(bsize);
>  }
>  
>  int set_blocksize(struct block_device *bdev, int size)
> @@ -155,9 +170,9 @@ int set_blocksize(struct block_device *bdev, int size)
>  		return -EINVAL;
>  
>  	/* Don't change the size if it is same as current */
> -	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
> +	if (bdev_inode(bdev)->i_blkbits != blksize_bits(size)) {
>  		sync_blockdev(bdev);
> -		bdev->bd_inode->i_blkbits = blksize_bits(size);
> +		bdev_inode(bdev)->i_blkbits = blksize_bits(size);
>  		kill_bdev(bdev);
>  	}
>  	return 0;
> @@ -196,7 +211,7 @@ int sync_blockdev(struct block_device *bdev)
>  {
>  	if (!bdev)
>  		return 0;
> -	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
> +	return filemap_write_and_wait(bdev_mapping(bdev));
>  }
>  EXPORT_SYMBOL(sync_blockdev);
>  
> @@ -415,19 +430,22 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>  void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
>  {
>  	spin_lock(&bdev->bd_size_lock);
> -	i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
> +	i_size_write(bdev_inode(bdev), (loff_t)sectors << SECTOR_SHIFT);
>  	bdev->bd_nr_sectors = sectors;
>  	spin_unlock(&bdev->bd_size_lock);
>  }
>  
>  void bdev_add(struct block_device *bdev, dev_t dev)
>  {
> +	struct inode *inode;
> +
>  	if (bdev_stable_writes(bdev))
> -		mapping_set_stable_writes(bdev->bd_inode->i_mapping);
> +		mapping_set_stable_writes(bdev_mapping(bdev));
>  	bdev->bd_dev = dev;
> -	bdev->bd_inode->i_rdev = dev;
> -	bdev->bd_inode->i_ino = dev;
> -	insert_inode_hash(bdev->bd_inode);
> +	inode = bdev_inode(bdev);
> +	inode->i_rdev = dev;
> +	inode->i_ino = dev;
> +	insert_inode_hash(inode);
>  }
>  
>  long nr_blockdev_pages(void)
> @@ -885,7 +903,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
>  	bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
>  	if (bdev_nowait(bdev))
>  		bdev_file->f_mode |= FMODE_NOWAIT;
> -	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
> +	bdev_file->f_mapping = bdev_mapping(bdev);
>  	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
>  	bdev_file->private_data = holder;
>  
> @@ -947,13 +965,13 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>  		return ERR_PTR(-ENXIO);
>  
>  	flags = blk_to_file_flags(mode);
> -	bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
> +	bdev_file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
>  			blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
>  	if (IS_ERR(bdev_file)) {
>  		blkdev_put_no_open(bdev);
>  		return bdev_file;
>  	}
> -	ihold(bdev->bd_inode);
> +	ihold(bdev_inode(bdev));
>  
>  	ret = bdev_open(bdev, mode, holder, hops, bdev_file);
>  	if (ret) {
> @@ -1183,13 +1201,13 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>  
>  bool disk_live(struct gendisk *disk)
>  {
> -	return !inode_unhashed(disk->part0->bd_inode);
> +	return !inode_unhashed(bdev_inode(disk->part0));
>  }
>  EXPORT_SYMBOL_GPL(disk_live);
>  
>  unsigned int block_size(struct block_device *bdev)
>  {
> -	return 1 << bdev->bd_inode->i_blkbits;
> +	return 1 << bdev_inode(bdev)->i_blkbits;
>  }
>  EXPORT_SYMBOL_GPL(block_size);
>  
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index d4f4f8325eff..ab022d990703 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -399,7 +399,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
>  		op = REQ_OP_ZONE_RESET;
>  
>  		/* Invalidate the page cache, including dirty pages. */
> -		filemap_invalidate_lock(bdev->bd_inode->i_mapping);
> +		filemap_invalidate_lock(bdev_mapping(bdev));
>  		ret = blkdev_truncate_zone_range(bdev, mode, &zrange);
>  		if (ret)
>  			goto fail;
> @@ -421,7 +421,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
>  
>  fail:
>  	if (cmd == BLKRESETZONE)
> -		filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
> +		filemap_invalidate_unlock(bdev_mapping(bdev));
>  
>  	return ret;
>  }
> diff --git a/block/blk.h b/block/blk.h
> index 72bc8d27cc70..b612538588cb 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -414,6 +414,8 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
>  }
>  #endif /* CONFIG_BLK_DEV_ZONED */
>  
> +struct inode *bdev_inode(struct block_device *bdev);
> +struct address_space *bdev_mapping(struct block_device *bdev);
>  struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
>  void bdev_add(struct block_device *bdev, dev_t dev);
>  
> diff --git a/block/fops.c b/block/fops.c
> index f4dcb9dd148d..1fcbdb131a8f 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -666,7 +666,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct block_device *bdev = I_BDEV(file->f_mapping->host);
> -	struct inode *bd_inode = bdev->bd_inode;
> +	struct inode *bd_inode = bdev_inode(bdev);
>  	loff_t size = bdev_nr_bytes(bdev);
>  	size_t shorted = 0;
>  	ssize_t ret;
> diff --git a/block/genhd.c b/block/genhd.c
> index 2f9834bdd14b..4f0f66b4798f 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -656,7 +656,7 @@ void del_gendisk(struct gendisk *disk)
>  	 */
>  	mutex_lock(&disk->open_mutex);
>  	xa_for_each(&disk->part_tbl, idx, part)
> -		remove_inode_hash(part->bd_inode);
> +		remove_inode_hash(bdev_inode(part));
>  	mutex_unlock(&disk->open_mutex);
>  
>  	/*
> @@ -745,7 +745,7 @@ void invalidate_disk(struct gendisk *disk)
>  	struct block_device *bdev = disk->part0;
>  
>  	invalidate_bdev(bdev);
> -	bdev->bd_inode->i_mapping->wb_err = 0;
> +	bdev_mapping(bdev)->wb_err = 0;
>  	set_capacity(disk, 0);
>  }
>  EXPORT_SYMBOL(invalidate_disk);
> @@ -1191,7 +1191,8 @@ static void disk_release(struct device *dev)
>  	if (test_bit(GD_ADDED, &disk->state) && disk->fops->free_disk)
>  		disk->fops->free_disk(disk);
>  
> -	iput(disk->part0->bd_inode);	/* frees the disk */
> +	/* frees the disk */
> +	iput(bdev_inode(disk->part0));
>  }
>  
>  static int block_uevent(const struct device *dev, struct kobj_uevent_env *env)
> @@ -1381,7 +1382,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
>  out_destroy_part_tbl:
>  	xa_destroy(&disk->part_tbl);
>  	disk->part0->bd_disk = NULL;
> -	iput(disk->part0->bd_inode);
> +	iput(bdev_inode(disk->part0));
>  out_free_bdi:
>  	bdi_put(disk->bdi);
>  out_free_bioset:
> diff --git a/block/ioctl.c b/block/ioctl.c
> index 4c8aebee595f..cb5b378cff38 100644
> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -90,7 +90,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
>  {
>  	uint64_t range[2];
>  	uint64_t start, len;
> -	struct inode *inode = bdev->bd_inode;
> +	struct inode *inode = bdev_inode(bdev);
>  	int err;
>  
>  	if (!(mode & BLK_OPEN_WRITE))
> @@ -144,12 +144,12 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
>  	if (start + len > bdev_nr_bytes(bdev))
>  		return -EINVAL;
>  
> -	filemap_invalidate_lock(bdev->bd_inode->i_mapping);
> +	filemap_invalidate_lock(bdev_mapping(bdev));
>  	err = truncate_bdev_range(bdev, mode, start, start + len - 1);
>  	if (!err)
>  		err = blkdev_issue_secure_erase(bdev, start >> 9, len >> 9,
>  						GFP_KERNEL);
> -	filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
> +	filemap_invalidate_unlock(bdev_mapping(bdev));
>  	return err;
>  }
>  
> @@ -159,7 +159,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
>  {
>  	uint64_t range[2];
>  	uint64_t start, end, len;
> -	struct inode *inode = bdev->bd_inode;
> +	struct inode *inode = bdev_inode(bdev);
>  	int err;
>  
>  	if (!(mode & BLK_OPEN_WRITE))
> diff --git a/block/partitions/core.c b/block/partitions/core.c
> index 5f5ed5c75f04..6e91a4660588 100644
> --- a/block/partitions/core.c
> +++ b/block/partitions/core.c
> @@ -243,7 +243,7 @@ static const struct attribute_group *part_attr_groups[] = {
>  static void part_release(struct device *dev)
>  {
>  	put_disk(dev_to_bdev(dev)->bd_disk);
> -	iput(dev_to_bdev(dev)->bd_inode);
> +	iput(bdev_inode(dev_to_bdev(dev)));
>  }
>  
>  static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
> @@ -480,7 +480,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
>  	 * Just delete the partition and invalidate it.
>  	 */
>  
> -	remove_inode_hash(part->bd_inode);
> +	remove_inode_hash(bdev_inode(part));
>  	invalidate_bdev(part);
>  	drop_partition(part);
>  	ret = 0;
> @@ -666,7 +666,7 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
>  		 * it cannot be looked up any more even when openers
>  		 * still hold references.
>  		 */
> -		remove_inode_hash(part->bd_inode);
> +		remove_inode_hash(bdev_inode(part));
>  
>  		/*
>  		 * If @disk->open_partitions isn't elevated but there's
> @@ -715,7 +715,7 @@ EXPORT_SYMBOL_GPL(bdev_disk_changed);
>  
>  void *read_part_sector(struct parsed_partitions *state, sector_t n, Sector *p)
>  {
> -	struct address_space *mapping = state->disk->part0->bd_inode->i_mapping;
> +	struct address_space *mapping = bdev_mapping(state->disk->part0);
>  	struct folio *folio;
>  
>  	if (n >= get_capacity(state->disk)) {
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 07/19] erofs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
@ 2024-03-15 14:45   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
  2024-03-18  2:39   ` Gao Xiang
  2 siblings, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:45 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:43, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get the inode
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/erofs/data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 433fc39ba423..dc2d43abe8c5 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -70,7 +70,7 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
>  	if (erofs_is_fscache_mode(sb))
>  		buf->inode = EROFS_SB(sb)->s_fscache->inode;
>  	else
> -		buf->inode = sb->s_bdev->bd_inode;
> +		buf->inode = file_inode(sb->s_bdev_file);
>  }
>  
>  void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 08/19] nilfs2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 08/19] nilfs2: " Yu Kuai
@ 2024-03-15 14:49   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:49 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:44, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get inode
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/nilfs2/segment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index aa5290cb7467..2940e8ef88f4 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -2790,7 +2790,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
>  	if (!nilfs->ns_writer)
>  		return -ENOMEM;
>  
> -	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
> +	inode_attach_wb(file_inode(nilfs->ns_sb->s_bdev_file), NULL);
>  
>  	err = nilfs_segctor_start_thread(nilfs->ns_writer);
>  	if (unlikely(err))
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 09/19] gfs2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 09/19] gfs2: " Yu Kuai
@ 2024-03-15 14:54   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:45, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get inode
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/gfs2/glock.c      | 2 +-
>  fs/gfs2/ops_fstype.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 34540f9d011c..95ade8979f6b 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1227,7 +1227,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
>  	mapping = gfs2_glock2aspace(gl);
>  	if (mapping) {
>                  mapping->a_ops = &gfs2_meta_aops;
> -		mapping->host = s->s_bdev->bd_inode;
> +		mapping->host = file_inode(s->s_bdev_file);
>  		mapping->flags = 0;
>  		mapping_set_gfp_mask(mapping, GFP_NOFS);
>  		mapping->i_private_data = NULL;
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 572d58e86296..4384cb39b06c 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -114,7 +114,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>  
>  	address_space_init_once(mapping);
>  	mapping->a_ops = &gfs2_rgrp_aops;
> -	mapping->host = sb->s_bdev->bd_inode;
> +	mapping->host = file_inode(sb->s_bdev_file);
>  	mapping->flags = 0;
>  	mapping_set_gfp_mask(mapping, GFP_NOFS);
>  	mapping->i_private_data = NULL;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format()
  2024-02-22 12:45 ` [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2024-03-15 14:55   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:55 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:46, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Avoid accessing bd_inode directly, in preparation for removing bd_inode
> from block_device.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Makes sense. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
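
Note that switching from a direct i_blkbits store to set_blocksize()
adds validation and an error path that the old code did not have. The
userspace sketch below approximates those checks as I understand them.
MOCK_PAGE_SIZE, struct mock_bdev and mock_set_blocksize() are
hypothetical stand-ins, and the real function also syncs the device and
drops its page cache when the size changes.

```c
#include <assert.h>
#include <errno.h>

#define MOCK_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the bits of block_device + inode used here. */
struct mock_bdev {
	unsigned int logical_block_size;
	unsigned char i_blkbits;
};

static int is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Stand-in for blksize_bits(): log2 of a power-of-two block size. */
static unsigned int blksize_bits(unsigned int size)
{
	unsigned int bits = 0;

	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}

/* Approximation of the checks set_blocksize() performs before the store. */
static int mock_set_blocksize(struct mock_bdev *bdev, int size)
{
	/* Size must be a power of two between 512 and the page size. */
	if (size > MOCK_PAGE_SIZE || size < 512 ||
	    !is_power_of_2((unsigned int)size))
		return -EINVAL;

	/* Size cannot be smaller than what the device supports. */
	if ((unsigned int)size < bdev->logical_block_size)
		return -EINVAL;

	/* The real code syncs and kills the page cache before this store. */
	if (bdev->i_blkbits != blksize_bits((unsigned int)size))
		bdev->i_blkbits = (unsigned char)blksize_bits((unsigned int)size);
	return 0;
}
```

Under these assumptions a blksize that the old assignment would have
accepted silently, such as a value that is not a power of two, now makes
dasd_format() return early with -EINVAL.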

> ---
>  drivers/s390/block/dasd_ioctl.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
> index 7e0ed7032f76..c1201590f343 100644
> --- a/drivers/s390/block/dasd_ioctl.c
> +++ b/drivers/s390/block/dasd_ioctl.c
> @@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
>  	 * enabling the device later.
>  	 */
>  	if (fdata->start_unit == 0) {
> -		block->gdp->part0->bd_inode->i_blkbits =
> -			blksize_bits(fdata->blksize);
> +		rc = set_blocksize(block->gdp->part0, fdata->blksize);
> +		if (rc)
> +			return rc;
>  	}
>  
>  	rc = base->discipline->format_device(base, fdata, 1);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode Yu Kuai
@ 2024-03-15 14:58   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 14:58 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:49, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get mapping
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/dir.c       | 2 +-
>  fs/ext4/ext4_jbd2.c | 2 +-
>  fs/ext4/super.c     | 6 +++---
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 3985f8c33f95..0733bc1eec7a 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -192,7 +192,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
>  					(PAGE_SHIFT - inode->i_blkbits);
>  			if (!ra_has_index(&file->f_ra, index))
>  				page_cache_sync_readahead(
> -					sb->s_bdev->bd_inode->i_mapping,
> +					sb->s_bdev_file->f_mapping,
>  					&file->f_ra, file,
>  					index, 1);
>  			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
> diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
> index 5d8055161acd..dbb9aff07ac1 100644
> --- a/fs/ext4/ext4_jbd2.c
> +++ b/fs/ext4/ext4_jbd2.c
> @@ -206,7 +206,7 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
>  
>  static void ext4_check_bdev_write_error(struct super_block *sb)
>  {
> -	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
> +	struct address_space *mapping = sb->s_bdev_file->f_mapping;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
>  	int err;
>  
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2d82b9d4b079..55b3df71bf5e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -244,7 +244,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
>  struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
>  				   blk_opf_t op_flags)
>  {
> -	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
> +	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
>  			~__GFP_FS) | __GFP_MOVABLE;
>  
>  	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
> @@ -253,7 +253,7 @@ struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
>  struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
>  					    sector_t block)
>  {
> -	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
> +	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
>  			~__GFP_FS);
>  
>  	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
> @@ -5560,7 +5560,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
>  	 * used to detect the metadata async write error.
>  	 */
>  	spin_lock_init(&sbi->s_bdev_wb_lock);
> -	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
> +	errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err,
>  				 &sbi->s_bdev_wb_err);
>  	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
>  	ext4_orphan_cleanup(sb, es);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [RFC v4 linux-next 14/19] jbd2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 14/19] jbd2: " Yu Kuai
@ 2024-03-15 15:06   ` Jan Kara
  2024-03-17 21:26   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 15:06 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:50, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get mapping
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
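
In this patch the journal keeps both the opened file and the
block_device derived from it, so existing j_dev and j_fs_dev users keep
working while mapping accesses move to the file. A userspace sketch of
that bookkeeping is below. The structures are simplified stand-ins,
file_bdev() is modelled on the I_BDEV(file->f_mapping->host) pattern
visible elsewhere in the series, and mock_journal, mock_journal_init()
and mock_open() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct inode;
struct address_space { struct inode *host; };
struct inode { struct address_space i_data; };
struct block_device { unsigned int bd_dev; };
struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};
struct file { struct address_space *f_mapping; };

/* Model of file_bdev(): file -> mapping -> host inode -> embedded bdev. */
static struct block_device *file_bdev(struct file *bdev_file)
{
	struct inode *inode = bdev_file->f_mapping->host;

	return &container_of(inode, struct bdev_inode, vfs_inode)->bdev;
}

/* Hypothetical subset of journal_t: each device is tracked twice. */
struct mock_journal {
	struct block_device *j_dev;
	struct file *j_dev_file;
	struct block_device *j_fs_dev;
	struct file *j_fs_dev_file;
};

/* Mirrors the assignments added to journal_init_common(). */
static void mock_journal_init(struct mock_journal *journal,
			      struct file *bdev_file,
			      struct file *fs_dev_file)
{
	journal->j_dev = file_bdev(bdev_file);
	journal->j_dev_file = bdev_file;
	journal->j_fs_dev = file_bdev(fs_dev_file);
	journal->j_fs_dev_file = fs_dev_file;
}

/* Hypothetical setup helper: wire a file to a bdev inode's mapping. */
static void mock_open(struct bdev_inode *bi, struct file *f)
{
	bi->vfs_inode.i_data.host = &bi->vfs_inode;
	f->f_mapping = &bi->vfs_inode.i_data;
}
```

After mock_journal_init(), j_dev_file->f_mapping is the mapping that
__jbd2_journal_erase() would truncate, and j_dev still points at the
matching block_device for the callers that have not been converted.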

> ---
>  fs/ext4/super.c      |  2 +-
>  fs/jbd2/journal.c    | 26 +++++++++++++++-----------
>  include/linux/jbd2.h | 18 ++++++++++++++----
>  3 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 55b3df71bf5e..4df1a5cfe0a5 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5918,7 +5918,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
>  	if (IS_ERR(bdev_file))
>  		return ERR_CAST(bdev_file);
>  
> -	journal = jbd2_journal_init_dev(file_bdev(bdev_file), sb->s_bdev, j_start,
> +	journal = jbd2_journal_init_dev(bdev_file, sb->s_bdev_file, j_start,
>  					j_len, sb->s_blocksize);
>  	if (IS_ERR(journal)) {
>  		ext4_msg(sb, KERN_ERR, "failed to create device journal");
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index b6c114c11b97..abd42a6ccd0e 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -1516,11 +1516,12 @@ static int journal_load_superblock(journal_t *journal)
>   * very few fields yet: that has to wait until we have created the
>   * journal structures from from scratch, or loaded them from disk. */
>  
> -static journal_t *journal_init_common(struct block_device *bdev,
> -			struct block_device *fs_dev,
> +static journal_t *journal_init_common(struct file *bdev_file,
> +			struct file *fs_dev_file,
>  			unsigned long long start, int len, int blocksize)
>  {
>  	static struct lock_class_key jbd2_trans_commit_key;
> +	struct block_device *bdev = file_bdev(bdev_file);
>  	journal_t *journal;
>  	int err;
>  	int n;
> @@ -1531,7 +1532,9 @@ static journal_t *journal_init_common(struct block_device *bdev,
>  
>  	journal->j_blocksize = blocksize;
>  	journal->j_dev = bdev;
> -	journal->j_fs_dev = fs_dev;
> +	journal->j_dev_file = bdev_file;
> +	journal->j_fs_dev = file_bdev(fs_dev_file);
> +	journal->j_fs_dev_file = fs_dev_file;
>  	journal->j_blk_offset = start;
>  	journal->j_total_len = len;
>  	jbd2_init_fs_dev_write_error(journal);
> @@ -1628,8 +1631,8 @@ static journal_t *journal_init_common(struct block_device *bdev,
>  
>  /**
>   *  journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure
> - *  @bdev: Block device on which to create the journal
> - *  @fs_dev: Device which hold journalled filesystem for this journal.
> + *  @bdev_file: Opened block device on which to create the journal
> + *  @fs_dev_file: Opened device which hold journalled filesystem for this journal.
>   *  @start: Block nr Start of journal.
>   *  @len:  Length of the journal in blocks.
>   *  @blocksize: blocksize of journalling device
> @@ -1640,13 +1643,13 @@ static journal_t *journal_init_common(struct block_device *bdev,
>   *  range of blocks on an arbitrary block device.
>   *
>   */
> -journal_t *jbd2_journal_init_dev(struct block_device *bdev,
> -			struct block_device *fs_dev,
> +journal_t *jbd2_journal_init_dev(struct file *bdev_file,
> +			struct file *fs_dev_file,
>  			unsigned long long start, int len, int blocksize)
>  {
>  	journal_t *journal;
>  
> -	journal = journal_init_common(bdev, fs_dev, start, len, blocksize);
> +	journal = journal_init_common(bdev_file, fs_dev_file, start, len, blocksize);
>  	if (IS_ERR(journal))
>  		return ERR_CAST(journal);
>  
> @@ -1683,8 +1686,9 @@ journal_t *jbd2_journal_init_inode(struct inode *inode)
>  		  inode->i_sb->s_id, inode->i_ino, (long long) inode->i_size,
>  		  inode->i_sb->s_blocksize_bits, inode->i_sb->s_blocksize);
>  
> -	journal = journal_init_common(inode->i_sb->s_bdev, inode->i_sb->s_bdev,
> -			blocknr, inode->i_size >> inode->i_sb->s_blocksize_bits,
> +	journal = journal_init_common(inode->i_sb->s_bdev_file,
> +			inode->i_sb->s_bdev_file, blocknr,
> +			inode->i_size >> inode->i_sb->s_blocksize_bits,
>  			inode->i_sb->s_blocksize);
>  	if (IS_ERR(journal))
>  		return ERR_CAST(journal);
> @@ -2009,7 +2013,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
>  		byte_count = (block_stop - block_start + 1) *
>  				journal->j_blocksize;
>  
> -		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> +		truncate_inode_pages_range(journal->j_dev_file->f_mapping,
>  				byte_start, byte_stop);
>  
>  		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 971f3e826e15..fc26730ae8ef 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -968,6 +968,11 @@ struct journal_s
>  	 */
>  	struct block_device	*j_dev;
>  
> +	/**
> +	 * @j_dev_file: Opened device @j_dev.
> +	 */
> +	struct file		*j_dev_file;
> +
>  	/**
>  	 * @j_blocksize: Block size for the location where we store the journal.
>  	 */
> @@ -993,6 +998,11 @@ struct journal_s
>  	 */
>  	struct block_device	*j_fs_dev;
>  
> +	/**
> +	 * @j_fs_dev_file: Opened device @j_fs_dev.
> +	 */
> +	struct file		*j_fs_dev_file;
> +
>  	/**
>  	 * @j_fs_dev_wb_err:
>  	 *
> @@ -1533,8 +1543,8 @@ extern void	 jbd2_journal_unlock_updates (journal_t *);
>  
>  void jbd2_journal_wait_updates(journal_t *);
>  
> -extern journal_t * jbd2_journal_init_dev(struct block_device *bdev,
> -				struct block_device *fs_dev,
> +extern journal_t *jbd2_journal_init_dev(struct file *bdev_file,
> +				struct file *fs_dev_file,
>  				unsigned long long start, int len, int bsize);
>  extern journal_t * jbd2_journal_init_inode (struct inode *);
>  extern int	   jbd2_journal_update_format (journal_t *);
> @@ -1696,7 +1706,7 @@ static inline void jbd2_journal_abort_handle(handle_t *handle)
>  
>  static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
>  {
> -	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
> +	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
>  
>  	/*
>  	 * Save the original wb_err value of client fs's bdev mapping which
> @@ -1707,7 +1717,7 @@ static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
>  
>  static inline int jbd2_check_fs_dev_write_error(journal_t *journal)
>  {
> -	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
> +	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
>  
>  	return errseq_check(&mapping->wb_err,
>  			    READ_ONCE(journal->j_fs_dev_wb_err));
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread
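
For readers following the jbd2 change above: the journal now keeps the opened
file next to the block device and reaches the page-cache mapping through
file->f_mapping instead of bdev->bd_inode->i_mapping. The following is only a
userspace sketch of that pattern. The struct layouts, the bdev member of
struct file, and the plain integer comparison standing in for errseq_check()
are all assumptions made for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for illustration only; NOT the kernel structures. */
struct address_space { unsigned int wb_err; };
struct block_device { int id; };
struct file {
	struct address_space *f_mapping;
	struct block_device *bdev;	/* hypothetical member for this sketch */
};

struct journal {
	struct block_device *j_fs_dev;
	struct file *j_fs_dev_file;
	unsigned int j_fs_dev_wb_err;
};

/* Stand-in for the kernel's file_bdev() helper. */
static struct block_device *file_bdev(struct file *f)
{
	return f->bdev;
}

/* Stash both the file and the bdev, then sample the error state via f_mapping. */
static void journal_init(struct journal *j, struct file *fs_dev_file)
{
	assert(fs_dev_file != NULL);
	j->j_fs_dev = file_bdev(fs_dev_file);
	j->j_fs_dev_file = fs_dev_file;
	j->j_fs_dev_wb_err = fs_dev_file->f_mapping->wb_err;
}

/*
 * Simplified check: nonzero if the mapping's error state changed since init.
 * The real code uses errseq_check(); a plain comparison is assumed here.
 */
static int journal_check_fs_dev_write_error(const struct journal *j)
{
	return j->j_fs_dev_file->f_mapping->wb_err != j->j_fs_dev_wb_err;
}
```

The point of the sketch is that neither helper dereferences the block device
to find the mapping; the stashed file is the only path to it.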

* Re: [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode Yu Kuai
@ 2024-03-15 15:09   ` Jan Kara
  2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 15:09 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:47, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get inode or
> mapping from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/btrfs/disk-io.c | 17 +++++++++--------
>  fs/btrfs/disk-io.h |  4 ++--
>  fs/btrfs/super.c   |  2 +-
>  fs/btrfs/volumes.c | 15 +++++++--------
>  fs/btrfs/zoned.c   | 20 +++++++++++---------
>  fs/btrfs/zoned.h   |  4 ++--
>  6 files changed, 32 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index bececdd63b4d..344955765f3e 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3235,7 +3235,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
>  	/*
>  	 * Read super block and check the signature bytes only
>  	 */
> -	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
> +	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev_file);
>  	if (IS_ERR(disk_super)) {
>  		ret = PTR_ERR(disk_super);
>  		goto fail_alloc;
> @@ -3656,17 +3656,18 @@ static void btrfs_end_super_write(struct bio *bio)
>  	bio_put(bio);
>  }
>  
> -struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> +struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
>  						   int copy_num, bool drop_cache)
>  {
>  	struct btrfs_super_block *super;
>  	struct page *page;
>  	u64 bytenr, bytenr_orig;
> -	struct address_space *mapping = bdev->bd_inode->i_mapping;
> +	struct block_device *bdev = file_bdev(bdev_file);
> +	struct address_space *mapping = bdev_file->f_mapping;
>  	int ret;
>  
>  	bytenr_orig = btrfs_sb_offset(copy_num);
> -	ret = btrfs_sb_log_location_bdev(bdev, copy_num, READ, &bytenr);
> +	ret = btrfs_sb_log_location_bdev(bdev_file, copy_num, READ, &bytenr);
>  	if (ret == -ENOENT)
>  		return ERR_PTR(-EINVAL);
>  	else if (ret)
> @@ -3707,7 +3708,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
>  }
>  
>  
> -struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
> +struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file)
>  {
>  	struct btrfs_super_block *super, *latest = NULL;
>  	int i;
> @@ -3719,7 +3720,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
>  	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
>  	 */
>  	for (i = 0; i < 1; i++) {
> -		super = btrfs_read_dev_one_super(bdev, i, false);
> +		super = btrfs_read_dev_one_super(bdev_file, i, false);
>  		if (IS_ERR(super))
>  			continue;
>  
> @@ -3749,7 +3750,7 @@ static int write_dev_supers(struct btrfs_device *device,
>  			    struct btrfs_super_block *sb, int max_mirrors)
>  {
>  	struct btrfs_fs_info *fs_info = device->fs_info;
> -	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
> +	struct address_space *mapping = device->bdev_file->f_mapping;
>  	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
>  	int i;
>  	int errors = 0;
> @@ -3866,7 +3867,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
>  		    device->commit_total_bytes)
>  			break;
>  
> -		page = find_get_page(device->bdev->bd_inode->i_mapping,
> +		page = find_get_page(device->bdev_file->f_mapping,
>  				     bytenr >> PAGE_SHIFT);
>  		if (!page) {
>  			errors++;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 375f62ae3709..2c627885d8d1 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -60,8 +60,8 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
>  			 struct btrfs_super_block *sb, int mirror_num);
>  int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount);
>  int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
> -struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev);
> -struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> +struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file);
> +struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
>  						   int copy_num, bool drop_cache);
>  int btrfs_commit_super(struct btrfs_fs_info *fs_info);
>  struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 40ae264fd3ed..9f50f20a1ba4 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2286,7 +2286,7 @@ static int check_dev_super(struct btrfs_device *dev)
>  		return 0;
>  
>  	/* Only need to check the primary super block. */
> -	sb = btrfs_read_dev_one_super(dev->bdev, 0, true);
> +	sb = btrfs_read_dev_one_super(dev->bdev_file, 0, true);
>  	if (IS_ERR(sb))
>  		return PTR_ERR(sb);
>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e12451ff911a..9fccfb156bd2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -488,7 +488,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
>  		goto error;
>  	}
>  	invalidate_bdev(bdev);
> -	*disk_super = btrfs_read_dev_super(bdev);
> +	*disk_super = btrfs_read_dev_super(*bdev_file);
>  	if (IS_ERR(*disk_super)) {
>  		ret = PTR_ERR(*disk_super);
>  		fput(*bdev_file);
> @@ -1244,7 +1244,7 @@ void btrfs_release_disk_super(struct btrfs_super_block *super)
>  	put_page(page);
>  }
>  
> -static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
> +static struct btrfs_super_block *btrfs_read_disk_super(struct file *bdev_file,
>  						       u64 bytenr, u64 bytenr_orig)
>  {
>  	struct btrfs_super_block *disk_super;
> @@ -1253,7 +1253,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
>  	pgoff_t index;
>  
>  	/* make sure our super fits in the device */
> -	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(bdev))
> +	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(file_bdev(bdev_file)))
>  		return ERR_PTR(-EINVAL);
>  
>  	/* make sure our super fits in the page */
> @@ -1266,7 +1266,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
>  		return ERR_PTR(-EINVAL);
>  
>  	/* pull in the page with our super */
> -	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
> +	page = read_cache_page_gfp(bdev_file->f_mapping, index, GFP_KERNEL);
>  
>  	if (IS_ERR(page))
>  		return ERR_CAST(page);
> @@ -1368,14 +1368,13 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>  		return ERR_CAST(bdev_file);
>  
>  	bytenr_orig = btrfs_sb_offset(0);
> -	ret = btrfs_sb_log_location_bdev(file_bdev(bdev_file), 0, READ, &bytenr);
> +	ret = btrfs_sb_log_location_bdev(bdev_file, 0, READ, &bytenr);
>  	if (ret) {
>  		device = ERR_PTR(ret);
>  		goto error_bdev_put;
>  	}
>  
> -	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr,
> -					   bytenr_orig);
> +	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr_orig);
>  	if (IS_ERR(disk_super)) {
>  		device = ERR_CAST(disk_super);
>  		goto error_bdev_put;
> @@ -2040,7 +2039,7 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
>  	const u64 bytenr = btrfs_sb_offset(copy_num);
>  	int ret;
>  
> -	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
> +	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr);
>  	if (IS_ERR(disk_super))
>  		return;
>  
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index 12d77aba0148..9e4e2951cdf5 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -81,7 +81,7 @@ static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data
>  	return 0;
>  }
>  
> -static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
> +static int sb_write_pointer(struct file *bdev_file, struct blk_zone *zones,
>  			    u64 *wp_ret)
>  {
>  	bool empty[BTRFS_NR_SB_LOG_ZONES];
> @@ -118,7 +118,7 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
>  		return -ENOENT;
>  	} else if (full[0] && full[1]) {
>  		/* Compare two super blocks */
> -		struct address_space *mapping = bdev->bd_inode->i_mapping;
> +		struct address_space *mapping = bdev_file->f_mapping;
>  		struct page *page[BTRFS_NR_SB_LOG_ZONES];
>  		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
>  		int i;
> @@ -562,7 +562,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
>  		    BLK_ZONE_TYPE_CONVENTIONAL)
>  			continue;
>  
> -		ret = sb_write_pointer(device->bdev,
> +		ret = sb_write_pointer(device->bdev_file,
>  				       &zone_info->sb_zones[sb_pos], &sb_wp);
>  		if (ret != -ENOENT && ret) {
>  			btrfs_err_in_rcu(device->fs_info,
> @@ -798,7 +798,7 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount
>  	return 0;
>  }
>  
> -static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
> +static int sb_log_location(struct file *bdev_file, struct blk_zone *zones,
>  			   int rw, u64 *bytenr_ret)
>  {
>  	u64 wp;
> @@ -809,7 +809,7 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
>  		return 0;
>  	}
>  
> -	ret = sb_write_pointer(bdev, zones, &wp);
> +	ret = sb_write_pointer(bdev_file, zones, &wp);
>  	if (ret != -ENOENT && ret < 0)
>  		return ret;
>  
> @@ -827,7 +827,8 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
>  			ASSERT(sb_zone_is_full(reset));
>  
>  			nofs_flags = memalloc_nofs_save();
> -			ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
> +			ret = blkdev_zone_mgmt(file_bdev(bdev_file),
> +					       REQ_OP_ZONE_RESET,
>  					       reset->start, reset->len);
>  			memalloc_nofs_restore(nofs_flags);
>  			if (ret)
> @@ -859,10 +860,11 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
>  
>  }
>  
> -int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
> +int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
>  			       u64 *bytenr_ret)
>  {
>  	struct blk_zone zones[BTRFS_NR_SB_LOG_ZONES];
> +	struct block_device *bdev = file_bdev(bdev_file);
>  	sector_t zone_sectors;
>  	u32 sb_zone;
>  	int ret;
> @@ -896,7 +898,7 @@ int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
>  	if (ret != BTRFS_NR_SB_LOG_ZONES)
>  		return -EIO;
>  
> -	return sb_log_location(bdev, zones, rw, bytenr_ret);
> +	return sb_log_location(bdev_file, zones, rw, bytenr_ret);
>  }
>  
>  int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
> @@ -920,7 +922,7 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
>  	if (zone_num + 1 >= zinfo->nr_zones)
>  		return -ENOENT;
>  
> -	return sb_log_location(device->bdev,
> +	return sb_log_location(device->bdev_file,
>  			       &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror],
>  			       rw, bytenr_ret);
>  }
> diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
> index 77c4321e331f..32680a04aa1f 100644
> --- a/fs/btrfs/zoned.h
> +++ b/fs/btrfs/zoned.h
> @@ -61,7 +61,7 @@ void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
>  struct btrfs_zoned_device_info *btrfs_clone_dev_zone_info(struct btrfs_device *orig_dev);
>  int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
>  int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount_opt);
> -int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
> +int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
>  			       u64 *bytenr_ret);
>  int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
>  			  u64 *bytenr_ret);
> @@ -142,7 +142,7 @@ static inline int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info,
>  	return 0;
>  }
>  
> -static inline int btrfs_sb_log_location_bdev(struct block_device *bdev,
> +static inline int btrfs_sb_log_location_bdev(struct file *bdev_file,
>  					     int mirror, int rw, u64 *bytenr_ret)
>  {
>  	*bytenr_ret = btrfs_sb_offset(mirror);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread
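
The btrfs conversion above follows one rule throughout: helpers take the
opened file, use file->f_mapping for page-cache reads, and call file_bdev()
only where a block_device is really needed (for example the size check in
btrfs_read_disk_super()). A rough userspace sketch of that shape follows. The
structs, the bdev and nr_bytes members, SKETCH_PAGE_SIZE and super_mapping()
are hypothetical names for illustration and do not exist in btrfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the sketch */

/* Userspace stand-ins for illustration only; NOT the kernel structures. */
struct address_space { int id; };
struct block_device { uint64_t nr_bytes; };
struct file {
	struct address_space *f_mapping;
	struct block_device *bdev;	/* hypothetical member for this sketch */
};

static struct block_device *file_bdev(const struct file *f)
{
	return f->bdev;
}

static uint64_t bdev_nr_bytes(const struct block_device *bdev)
{
	return bdev->nr_bytes;
}

/*
 * Return the mapping to read the super block from, or NULL when a page at
 * bytenr would not fit in the device. Mirrors the comparison in the patch.
 */
static struct address_space *super_mapping(const struct file *bdev_file,
					   uint64_t bytenr)
{
	if (bytenr + SKETCH_PAGE_SIZE >= bdev_nr_bytes(file_bdev(bdev_file)))
		return NULL;
	return bdev_file->f_mapping;
}
```

Callers then only carry the file around, which is what lets the series drop
the file_bdev(bdev_file) conversions at the call sites in volumes.c.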

* Re: [RFC v4 linux-next 15/19] bcache: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 15/19] bcache: " Yu Kuai
@ 2024-03-15 15:11   ` Jan Kara
  2024-03-17 21:34   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 15:11 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:51, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all bcache stash the file of opened bdev, it's ok to get
> mapping from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/md/bcache/super.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 4153c9ddbe0b..ec9efa79d5a8 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -163,15 +163,16 @@ static const char *read_super_common(struct cache_sb *sb,  struct block_device *
>  }
>  
>  
> -static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
> +static const char *read_super(struct cache_sb *sb, struct file *bdev_file,
>  			      struct cache_sb_disk **res)
>  {
>  	const char *err;
> +	struct block_device *bdev = file_bdev(bdev_file);
>  	struct cache_sb_disk *s;
>  	struct page *page;
>  	unsigned int i;
>  
> -	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
> +	page = read_cache_page_gfp(bdev_file->f_mapping,
>  				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
>  	if (IS_ERR(page))
>  		return "IO error";
> @@ -2564,7 +2565,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
>  	if (set_blocksize(file_bdev(bdev_file), 4096))
>  		goto out_blkdev_put;
>  
> -	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
> +	err = read_super(sb, bdev_file, &sb_disk);
>  	if (err)
>  		goto out_blkdev_put;
>  
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 16/19] block2mtd: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 16/19] block2mtd: " Yu Kuai
@ 2024-03-15 15:12   ` Jan Kara
  2024-03-17 21:36   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-15 15:12 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:52, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that block2mtd stash the file of opened bdev, it's ok to get inode
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/mtd/devices/block2mtd.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
> index 97a00ec9a4d4..e9ecb3286dcb 100644
> --- a/drivers/mtd/devices/block2mtd.c
> +++ b/drivers/mtd/devices/block2mtd.c
> @@ -265,6 +265,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
>  	struct file *bdev_file;
>  	struct block_device *bdev;
>  	struct block2mtd_dev *dev;
> +	loff_t size;
>  	char *name;
>  
>  	if (!devname)
> @@ -291,7 +292,8 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
>  		goto err_free_block2mtd;
>  	}
>  
> -	if ((long)bdev->bd_inode->i_size % erase_size) {
> +	size = i_size_read(file_inode(bdev_file));
> +	if ((long)size % erase_size) {
>  		pr_err("erasesize must be a divisor of device size\n");
>  		goto err_free_block2mtd;
>  	}
> @@ -309,7 +311,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
>  
>  	dev->mtd.name = name;
>  
> -	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
> +	dev->mtd.size = size & PAGE_MASK;
>  	dev->mtd.erasesize = erase_size;
>  	dev->mtd.writesize = 1;
>  	dev->mtd.writebufsize = PAGE_SIZE;
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread
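
The two size computations in the block2mtd hunk above are plain arithmetic:
the device size must be a multiple of the erase size, and the exported MTD
size is the device size rounded down to a page boundary. A small userspace
sketch, assuming a 4096-byte page; usable_mtd_size() and the SKETCH_ macros
are made-up names, and the sketch takes the modulo on the 64-bit value
directly rather than reproducing the patch's (long) cast.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096LL				/* assumed page size */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))	/* clears the low 12 bits */

/*
 * Return the device size rounded down to a page boundary, or -1 when
 * erase_size does not divide the device size.
 */
static int64_t usable_mtd_size(int64_t dev_size, int erase_size)
{
	assert(erase_size > 0);
	if (dev_size % erase_size)
		return -1;
	return dev_size & SKETCH_PAGE_MASK;
}
```

For instance, a device of 1 MiB plus one 512-byte sector passes the divisor
check with erase_size 512 but is exported as exactly 1 MiB.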

* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-15 13:54   ` Christian Brauner
@ 2024-03-16  2:49     ` Yu Kuai
  2024-03-18  9:39       ` Christian Brauner
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-16  2:49 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Yu Kuai, jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi, Christian

On 2024/03/15 21:54, Christian Brauner wrote:
> On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
>> Hi, Christian
>> Hi, Christoph
>> Hi, Jan
>>
>> Perhaps now is a good time to send a formal version of this set.
>> However, I'm not sure yet what branch should I rebase and send this set.
>> Should I send to the vfs tree?
> 
> Nearly all of it is in fs/ so I'd say yes.
> .

I see that you just created a new branch, vfs.fixes. Could I rebase
this set against that branch?

Thanks,
Kuai

> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 01/19] block: move two helpers into bdev.c
  2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
  2024-03-15 14:31   ` Jan Kara
@ 2024-03-17 21:19   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:19 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:37PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> disk_live() and block_size() access bd_inode directly, prepare to remove
> the field bd_inode from block_device, and only access bd_inode in block
> layer.

This looks good in general.

Reviewed-by: Christoph Hellwig <hch@lst.de>

(I wish we could eventually retire block_size() and the whole concept
of a soft "block size" that could be different from the LBA size, but
that's a totally different adventure)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait()
  2024-02-22 12:45 ` [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait() Yu Kuai
  2024-03-15 14:34   ` Jan Kara
@ 2024-03-17 21:19   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:19 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 03/19] block: remove sync_blockdev_range()
  2024-02-22 12:45 ` [RFC v4 linux-next 03/19] block: remove sync_blockdev_range() Yu Kuai
  2024-03-15 14:37   ` Jan Kara
@ 2024-03-17 21:21   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:21 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
  2024-03-15 14:44   ` Jan Kara
@ 2024-03-17 21:23   ` Christoph Hellwig
  2024-03-22  5:44   ` Al Viro
  2 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:40PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Add helpers to access bd_inode, prepare to remove the field 'bd_inode'
> after removing all the access from filesystems and drivers.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

I suspect a few things like the partition parsing should pass down
the bdev file further, but no need to do that now.


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors()
  2024-02-22 12:45 ` [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors() Yu Kuai
  2024-03-15 14:42   ` Jan Kara
@ 2024-03-17 21:23   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:41PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> bdev_sectors() is not used hence remove it.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode Yu Kuai
  2024-03-15 14:44   ` Jan Kara
@ 2024-03-17 21:23   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 07/19] erofs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
  2024-03-15 14:45   ` Jan Kara
@ 2024-03-17 21:24   ` Christoph Hellwig
  2024-03-18  2:39   ` Gao Xiang
  2 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:24 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 08/19] nilfs2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 08/19] nilfs2: " Yu Kuai
  2024-03-15 14:49   ` Jan Kara
@ 2024-03-17 21:24   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:24 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 09/19] gfs2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 09/19] gfs2: " Yu Kuai
  2024-03-15 14:54   ` Jan Kara
@ 2024-03-17 21:24   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:24 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format()
  2024-02-22 12:45 ` [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format() Yu Kuai
  2024-03-15 14:55   ` Jan Kara
@ 2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:25 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode Yu Kuai
  2024-03-15 15:09   ` Jan Kara
@ 2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:25 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode Yu Kuai
  2024-03-15 14:58   ` Jan Kara
@ 2024-03-17 21:25   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:25 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 14/19] jbd2: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 14/19] jbd2: " Yu Kuai
  2024-03-15 15:06   ` Jan Kara
@ 2024-03-17 21:26   ` Christoph Hellwig
  2024-03-18  1:10     ` Yu Kuai
  1 sibling, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:26 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

> +extern journal_t *jbd2_journal_init_dev(struct file *bdev_file,
> +				struct file *fs_dev_file,

Maybe drop the pointless extern while you're at it?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread
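
On the "pointless extern" remark above: a function declaration at file scope
already has external linkage in C unless it is declared static, so the extern
keyword on a prototype changes nothing. A tiny sketch showing that both
spellings declare the same function; sketch_journal_blocks() is a made-up
name, not a jbd2 function.

```c
#include <assert.h>

/* Both prototypes declare the same function with external linkage. */
extern int sketch_journal_blocks(int len, int blocksize);
int sketch_journal_blocks(int len, int blocksize);

/* One definition satisfies both declarations. */
int sketch_journal_blocks(int len, int blocksize)
{
	assert(blocksize > 0);
	return len / blocksize;
}
```

The compiler accepts the repeated declaration because the two prototypes are
compatible, which is why dropping extern from the header is purely cosmetic.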

* Re: [RFC v4 linux-next 15/19] bcache: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 15/19] bcache: " Yu Kuai
  2024-03-15 15:11   ` Jan Kara
@ 2024-03-17 21:34   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:34 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 16/19] block2mtd: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 16/19] block2mtd: " Yu Kuai
  2024-03-15 15:12   ` Jan Kara
@ 2024-03-17 21:36   ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:36 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:52PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that block2mtd stash the file of opened bdev, it's ok to get inode
> from the file.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

(note that block2mtd would also significantly benefit from using the
normal file based APIs for accessing the underlying block device.  No
need to do that in this series, though)


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
  2024-02-22 12:45 ` [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable() Yu Kuai
@ 2024-03-17 21:36   ` Christoph Hellwig
  2024-03-18  1:12     ` Yu Kuai
  2024-03-18  9:22   ` Jan Kara
  1 sibling, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:36 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

Can you split this into a block layer patch adding the helper and a scsi
one using it?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode Yu Kuai
  2024-02-25  0:06   ` kernel test robot
@ 2024-03-17 21:38   ` Christoph Hellwig
  2024-03-18  1:26     ` Yu Kuai
  1 sibling, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-17 21:38 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:55PM +0800, Yu Kuai wrote:
> The only user that doesn't rely on files is the block layer itself in
> block/fops.c where we only have access to the block device. As the bdev
> filesystem doesn't open block devices as files obviously.

Why is that obvious?  Maybe I'm just thick but this seems odd to me.



* Re: [RFC v4 linux-next 14/19] jbd2: prevent direct access of bd_inode
  2024-03-17 21:26   ` Christoph Hellwig
@ 2024-03-18  1:10     ` Yu Kuai
  0 siblings, 0 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-18  1:10 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 5:26, Christoph Hellwig wrote:
>> +extern journal_t *jbd2_journal_init_dev(struct file *bdev_file,
>> +				struct file *fs_dev_file,
> 
> Maybe drop the pointless extern while you're at it?

Will do this in the formal version.

Thanks for the review!
Kuai
> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> .
> 



* Re: [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
  2024-03-17 21:36   ` Christoph Hellwig
@ 2024-03-18  1:12     ` Yu Kuai
  0 siblings, 0 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-18  1:12 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 5:36, Christoph Hellwig wrote:
> Can you split this in a block layer patch adding the helper and scsi
> one using it?

Of course. If you don't mind, I'll add your Reviewed-by tag as well, since
there will be no functional change.

Thanks,
Kuai

> 
> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> .
> 



* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-17 21:38   ` Christoph Hellwig
@ 2024-03-18  1:26     ` Yu Kuai
  2024-03-18  1:32       ` Christoph Hellwig
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-18  1:26 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 5:38, Christoph Hellwig wrote:
> On Thu, Feb 22, 2024 at 08:45:55PM +0800, Yu Kuai wrote:
>> The only user that doesn't rely on files is the block layer itself in
>> block/fops.c where we only have access to the block device. As the bdev
>> filesystem doesn't open block devices as files obviously.
> 
> Why is that obvious?  Maybe I'm just thick but this seems odd to me.

Because there is a real filesystem (devtmpfs) used for raw block device
file operations; an open syscall on a devtmpfs node goes through:

blkdev_open
  bdev = blkdev_get_no_open
  bdev_open -> pass in file is from devtmpfs
  -> in this case, file inode is from devtmpfs,

Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
no access to the devtmpfs file, so we can't use s_bdev_file() here as
other filesystems do.

Thanks,
Kuai

> 
> .
> 



* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18  1:26     ` Yu Kuai
@ 2024-03-18  1:32       ` Christoph Hellwig
  2024-03-18  1:51         ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-18  1:32 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> Because there is a real filesystem (devtmpfs) used for raw block device
> file operations; an open syscall on a devtmpfs node goes through:
>
> blkdev_open
>  bdev = blkdev_get_no_open
>  bdev_open -> pass in file is from devtmpfs
>  -> in this case, file inode is from devtmpfs,

But file->f_mapping->host should still point to the bdevfs inode,
and file->f_mapping->host is what everything in the I/O path should
be using.
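
The distinction can be illustrated with a minimal userspace sketch. It assumes heavily simplified stand-ins for struct file, struct inode and struct address_space; mock_blkdev_open(), file_io_inode() and the fs_name field are hypothetical names used only for illustration, not kernel APIs:

```c
#include <assert.h>

/* Simplified userspace stand-ins; the real kernel structures are far larger. */
struct inode;

struct address_space {
	struct inode *host;	/* inode that owns this page cache */
};

struct inode {
	const char *fs_name;	/* hypothetical tag naming the owning filesystem */
	struct address_space i_data;
};

struct file {
	struct inode *f_inode;			/* inode the path lookup resolved to */
	struct address_space *f_mapping;	/* page cache used for I/O */
};

/* Same idea as the kernel's file_inode(): the inode the file was opened on. */
static struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

/* Hypothetical helper: the inode the I/O path should look at. */
static struct inode *file_io_inode(const struct file *f)
{
	return f->f_mapping->host;
}

/*
 * Model of opening a device node: the file keeps the devtmpfs inode, but its
 * mapping is redirected to the page cache owned by the bdevfs inode.
 */
static void mock_blkdev_open(struct file *f, struct inode *node,
			     struct inode *bdev_inode)
{
	bdev_inode->i_data.host = bdev_inode;
	f->f_inode = node;
	f->f_mapping = &bdev_inode->i_data;
}

/* Walk through one open and check which inode each accessor yields. */
static void example_open_device_node(void)
{
	struct inode node = { .fs_name = "devtmpfs" };
	struct inode bdev_inode = { .fs_name = "bdev" };
	struct file f;

	mock_blkdev_open(&f, &node, &bdev_inode);
	assert(file_inode(&f) == &node);
	assert(file_io_inode(&f) == &bdev_inode);
}
```

In this model the two accessors deliberately disagree for a device node, which is why the I/O path has to go through the mapping's host rather than the inode the file was opened on.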

> Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> no access to the devtmpfs file, we can't use s_bdev_file() as other
> filesystems here.

We can just pass the file down in iomap_iter.private


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18  1:32       ` Christoph Hellwig
@ 2024-03-18  1:51         ` Yu Kuai
  2024-03-18  7:19           ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-18  1:51 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 9:32, Christoph Hellwig wrote:
> On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
>> Because there is a real filesystem (devtmpfs) used for raw block device
>> file operations; an open syscall on a devtmpfs node goes through:
>>
>> blkdev_open
>>   bdev = blkdev_get_no_open
>>   bdev_open -> pass in file is from devtmpfs
>>   -> in this case, file inode is from devtmpfs,
> 
> But file->f_mapping->host should still point to the bdevfs inode,
> and file->f_mapping->host is what everything in the I/O path should
> be using.
> 
>> Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
>> no access to the devtmpfs file, we can't use s_bdev_file() as other
>> filesystems here.
> 
> We can just pass the file down in iomap_iter.private

I can do this for blkdev_read_folio(), however, for other ops like
blkdev_writepages(), I can't find a way to pass the file to
iomap_iter.private yet.

Any suggestions?

Thanks,
Kuai
> .
> 



* Re: [RFC v4 linux-next 07/19] erofs: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
  2024-03-15 14:45   ` Jan Kara
  2024-03-17 21:24   ` Christoph Hellwig
@ 2024-03-18  2:39   ` Gao Xiang
  2 siblings, 0 replies; 98+ messages in thread
From: Gao Xiang @ 2024-03-18  2:39 UTC (permalink / raw)
  To: Yu Kuai, jack, hch, brauner, axboe
  Cc: linux-fsdevel, linux-block, yukuai3, yi.zhang, yangerkun



On 2024/2/22 20:45, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get the inode
> from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

(BTW, it'd be better to +Cc EROFS mailing list for this patch.)

Thanks,
Gao Xiang

> ---
>   fs/erofs/data.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 433fc39ba423..dc2d43abe8c5 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -70,7 +70,7 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
>   	if (erofs_is_fscache_mode(sb))
>   		buf->inode = EROFS_SB(sb)->s_fscache->inode;
>   	else
> -		buf->inode = sb->s_bdev->bd_inode;
> +		buf->inode = file_inode(sb->s_bdev_file);
>   }
>   
>   void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18  1:51         ` Yu Kuai
@ 2024-03-18  7:19           ` Yu Kuai
  2024-03-18 10:07             ` Christian Brauner
  2024-03-18 23:22             ` Christoph Hellwig
  0 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-18  7:19 UTC (permalink / raw)
  To: Yu Kuai, Christoph Hellwig
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi, Christoph!

On 2024/03/18 9:51, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/18 9:32, Christoph Hellwig wrote:
>> On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
>>> Because there is a real filesystem (devtmpfs) used for raw block device
>>> file operations; an open syscall on a devtmpfs node goes through:
>>>
>>> blkdev_open
>>>   bdev = blkdev_get_no_open
>>>   bdev_open -> pass in file is from devtmpfs
>>>   -> in this case, file inode is from devtmpfs,
>>
>> But file->f_mapping->host should still point to the bdevfs inode,
>> and file->f_mapping->host is what everything in the I/O path should
>> be using.
>>
>>> Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
>>> no access to the devtmpfs file, we can't use s_bdev_file() as other
>>> filesystems here.
>>
>> We can just pass the file down in iomap_iter.private
> 
> I can do this for blkdev_read_folio(), however, for other ops like
> blkdev_writepages(), I can't find a way to pass the file to
> iomap_iter.private yet.
> 
> Any suggestions?

I came up with an idea:

When the block_device is opened for the first time, store the newly
generated file in "bd_inode->i_private", and release it after the last
opener closes the block_device.

The advantages are:
  - multiple openers can share the same bdev_file;
  - raw block device ops can use the bdev_file as well, and there is no
need to distinguish iomap/buffer_head for raw block_device;

Please let me know what you think.
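
A rough userspace sketch of this lifetime rule follows. It is only a model under stated assumptions: the structs are simplified stand-ins, the openers counter and the mock_bdev_open()/mock_bdev_release() helpers are hypothetical, and locking and exclusive-open handling are left out entirely:

```c
#include <assert.h>
#include <stdlib.h>

struct file {
	int placeholder;	/* the real struct file carries much more state */
};

struct inode {
	void *i_private;	/* shared bdev_file, valid while openers > 0 */
	int openers;		/* hypothetical opener count for this sketch */
};

/* First opener allocates the shared file; later openers reuse it. */
static struct file *mock_bdev_open(struct inode *bd_inode)
{
	if (bd_inode->openers == 0) {
		struct file *bdev_file = calloc(1, sizeof(*bdev_file));

		if (!bdev_file)
			return NULL;
		bd_inode->i_private = bdev_file;
	}
	bd_inode->openers++;
	return bd_inode->i_private;
}

/* Last opener to close releases the shared file. */
static void mock_bdev_release(struct inode *bd_inode)
{
	if (bd_inode->openers == 0)
		return;
	if (--bd_inode->openers == 0) {
		free(bd_inode->i_private);
		bd_inode->i_private = NULL;
	}
}

/* Two openers share one file; it goes away only after the second close. */
static void example_shared_bdev_file(void)
{
	struct inode bd_inode = { 0 };
	struct file *first = mock_bdev_open(&bd_inode);
	struct file *second = mock_bdev_open(&bd_inode);

	assert(first != NULL && first == second);
	mock_bdev_release(&bd_inode);
	assert(bd_inode.i_private == first);
	mock_bdev_release(&bd_inode);
	assert(bd_inode.i_private == NULL);
}
```

A real implementation would need to serialize the first-open and last-close transitions and decide how the shared file interacts with exclusive openers; the sketch ignores both.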

Thanks,
Kuai
> 
> Thanks,
> Kuai
>> .
>>
> 
> .
> 



* Re: [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-02-28 13:41   ` Christoph Hellwig
@ 2024-03-18  9:11     ` Jan Kara
  0 siblings, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-18  9:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yukuai3, yi.zhang, yangerkun

On Wed 28-02-24 05:41:54, Christoph Hellwig wrote:
> On Thu, Feb 22, 2024 at 08:45:53PM +0800, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> > 
> > Now that dm upper layer already stashes the file of opened device in
> > 'dm_dev->bdev_file', it's ok to get inode from the file.
> 
> Where did this code get in?

I was surprised as well but apparently 61387b8dcf1dc0 ("Merge tag
'for-6.9/dm-vdo' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm")
during this merge window...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 17/19] dm-vdo: " Yu Kuai
  2024-02-28 13:41   ` Christoph Hellwig
@ 2024-03-18  9:19   ` Jan Kara
  2024-03-18 13:38     ` Yu Kuai
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Kara @ 2024-03-18  9:19 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:53, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that dm upper layer already stashes the file of opened device in
> 'dm_dev->bdev_file', it's ok to get inode from the file.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Given there are like three real uses of ->bdev in dm-vdo, I suspect it
might be better to just replace bdev with bdev_file in struct io_factory
and in struct uds_parameters.
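
As a rough illustration of that shape, here is a userspace sketch in which the factory keeps only the file and derives everything else on demand. All types are simplified stand-ins, the i_bdev back-pointer and the mock_* helpers are hypothetical, and file_bdev(), file_inode() and i_size_read() are reduced to trivial mocks of the kernel helpers of the same names:

```c
#include <assert.h>
#include <stddef.h>

struct block_device {
	int id;
};

struct inode {
	long long i_size;
	struct block_device *i_bdev;	/* hypothetical back-pointer for the mock */
};

struct file {
	struct inode *f_inode;
};

/* Trivial mocks standing in for the kernel helpers of the same names. */
static struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

static struct block_device *file_bdev(const struct file *f)
{
	return f->f_inode->i_bdev;
}

static long long i_size_read(const struct inode *inode)
{
	return inode->i_size;
}

/* The factory stores only the opened file, not a separate bdev pointer. */
struct io_factory {
	struct file *bdev_file;
	int ref_count;
};

static void mock_make_io_factory(struct io_factory *factory,
				 struct file *bdev_file)
{
	factory->bdev_file = bdev_file;
	factory->ref_count = 1;
}

static void mock_replace_storage(struct io_factory *factory,
				 struct file *bdev_file)
{
	factory->bdev_file = bdev_file;
}

/* Size comes from the file's inode, so no bd_inode access is needed. */
static size_t mock_get_writable_size(const struct io_factory *factory)
{
	return (size_t)i_size_read(file_inode(factory->bdev_file));
}

/* The few places that need the block_device derive it from the file. */
static struct block_device *mock_factory_bdev(const struct io_factory *factory)
{
	return file_bdev(factory->bdev_file);
}

static void example_factory_uses_file_only(void)
{
	struct block_device bdev = { .id = 1 };
	struct inode inode = { .i_size = 4096, .i_bdev = &bdev };
	struct file bdev_file = { .f_inode = &inode };
	struct io_factory factory;

	mock_make_io_factory(&factory, &bdev_file);
	assert(mock_get_writable_size(&factory) == 4096);
	assert(mock_factory_bdev(&factory) == &bdev);
}
```

Keeping a single field avoids having two pointers that must be updated together whenever the storage is replaced.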

								Honza

> ---
>  drivers/md/dm-vdo/dedupe.c                |  3 ++-
>  drivers/md/dm-vdo/dm-vdo-target.c         |  5 +++--
>  drivers/md/dm-vdo/indexer/config.c        |  1 +
>  drivers/md/dm-vdo/indexer/config.h        |  3 +++
>  drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
>  drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
>  drivers/md/dm-vdo/indexer/index-session.c | 13 +++++++------
>  drivers/md/dm-vdo/indexer/index.c         |  4 ++--
>  drivers/md/dm-vdo/indexer/index.h         |  2 +-
>  drivers/md/dm-vdo/indexer/indexer.h       |  4 +++-
>  drivers/md/dm-vdo/indexer/io-factory.c    | 13 ++++++++-----
>  drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
>  drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
>  drivers/md/dm-vdo/indexer/volume.h        |  2 +-
>  14 files changed, 39 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
> index a9b189395592..532294a15174 100644
> --- a/drivers/md/dm-vdo/dedupe.c
> +++ b/drivers/md/dm-vdo/dedupe.c
> @@ -2592,7 +2592,8 @@ static void resume_index(void *context, struct vdo_completion *parent)
>  	int result;
>  
>  	zones->parameters.bdev = config->owned_device->bdev;
> -	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
> +	zones->parameters.bdev_file = config->owned_device->bdev_file;
> +	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev_file);
>  	if (result != UDS_SUCCESS)
>  		vdo_log_error_strerror(result, "Error resuming dedupe index");
>  
> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
> index 89d00be9f075..b2d7f68e70be 100644
> --- a/drivers/md/dm-vdo/dm-vdo-target.c
> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
> @@ -883,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
>  	}
>  
>  	if (config->version == 0) {
> -		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
> +		u64 device_size = i_size_read(file_inode(config->owned_device->bdev_file));
>  
>  		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>  	}
> @@ -1018,7 +1018,8 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
>  
>  static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
>  {
> -	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
> +	return i_size_read(file_inode(vdo->device_config->owned_device->bdev_file)) /
> +		VDO_BLOCK_SIZE;
>  }
>  
>  static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
> diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
> index 260993ce1944..f1f66e232b54 100644
> --- a/drivers/md/dm-vdo/indexer/config.c
> +++ b/drivers/md/dm-vdo/indexer/config.c
> @@ -347,6 +347,7 @@ int uds_make_configuration(const struct uds_parameters *params,
>  	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
>  	config->nonce = params->nonce;
>  	config->bdev = params->bdev;
> +	config->bdev_file = params->bdev_file;
>  	config->offset = params->offset;
>  	config->size = params->size;
>  
> diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
> index fe7958263ed6..688f7450183e 100644
> --- a/drivers/md/dm-vdo/indexer/config.h
> +++ b/drivers/md/dm-vdo/indexer/config.h
> @@ -28,6 +28,9 @@ struct uds_configuration {
>  	/* Storage device for the index */
>  	struct block_device *bdev;
>  
> +	/* Opened device for the index */
> +	struct file *bdev_file;
> +
>  	/* The maximum allowable size of the index */
>  	size_t size;
>  
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
> index 1453fddaa656..6dd80a432fe5 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.c
> +++ b/drivers/md/dm-vdo/indexer/index-layout.c
> @@ -1672,7 +1672,7 @@ static int create_layout_factory(struct index_layout *layout,
>  	size_t writable_size;
>  	struct io_factory *factory = NULL;
>  
> -	result = uds_make_io_factory(config->bdev, &factory);
> +	result = uds_make_io_factory(config->bdev_file, &factory);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> @@ -1745,9 +1745,9 @@ void vdo_free_index_layout(struct index_layout *layout)
>  }
>  
>  int uds_replace_index_layout_storage(struct index_layout *layout,
> -				     struct block_device *bdev)
> +				     struct file *bdev_file)
>  {
> -	return uds_replace_storage(layout->factory, bdev);
> +	return uds_replace_storage(layout->factory, bdev_file);
>  }
>  
>  /* Obtain a dm_bufio_client for the volume region. */
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
> index bd9b90c84a70..9b0c850fe9a7 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.h
> +++ b/drivers/md/dm-vdo/indexer/index-layout.h
> @@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
>  void vdo_free_index_layout(struct index_layout *layout);
>  
>  int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
> -						  struct block_device *bdev);
> +						  struct file *bdev_file);
>  
>  int __must_check uds_load_index_state(struct index_layout *layout,
>  				      struct uds_index *index);
> diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
> index 1949a2598656..df8f8122a22d 100644
> --- a/drivers/md/dm-vdo/indexer/index-session.c
> +++ b/drivers/md/dm-vdo/indexer/index-session.c
> @@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
>  	return uds_status_to_errno(result);
>  }
>  
> -static int replace_device(struct uds_index_session *session, struct block_device *bdev)
> +static int replace_device(struct uds_index_session *session, struct file *bdev_file)
>  {
>  	int result;
>  
> -	result = uds_replace_index_storage(session->index, bdev);
> +	result = uds_replace_index_storage(session->index, bdev_file);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> -	session->parameters.bdev = bdev;
> +	session->parameters.bdev = file_bdev(bdev_file);
> +	session->parameters.bdev_file = bdev_file;
>  	return UDS_SUCCESS;
>  }
>  
> @@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
>   * device differs from the current backing store, the index will start using the new backing store.
>   */
>  int uds_resume_index_session(struct uds_index_session *session,
> -			     struct block_device *bdev)
> +			     struct file *bdev_file)
>  {
>  	int result = UDS_SUCCESS;
>  	bool no_work = false;
> @@ -502,8 +503,8 @@ int uds_resume_index_session(struct uds_index_session *session,
>  	if (no_work)
>  		return result;
>  
> -	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
> -		result = replace_device(session, bdev);
> +	if ((session->index != NULL) && (bdev_file != session->parameters.bdev_file)) {
> +		result = replace_device(session, bdev_file);
>  		if (result != UDS_SUCCESS) {
>  			mutex_lock(&session->request_mutex);
>  			session->state &= ~IS_FLAG_WAITING;
> diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
> index bd2405738c50..3600a169ca98 100644
> --- a/drivers/md/dm-vdo/indexer/index.c
> +++ b/drivers/md/dm-vdo/indexer/index.c
> @@ -1334,9 +1334,9 @@ int uds_save_index(struct uds_index *index)
>  	return result;
>  }
>  
> -int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
> +int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
>  {
> -	return uds_replace_volume_storage(index->volume, index->layout, bdev);
> +	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
>  }
>  
>  /* Accessing statistics should be safe from any thread. */
> diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
> index 7fbc63db4131..9428ee025cda 100644
> --- a/drivers/md/dm-vdo/indexer/index.h
> +++ b/drivers/md/dm-vdo/indexer/index.h
> @@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
>  void vdo_free_index(struct uds_index *index);
>  
>  int __must_check uds_replace_index_storage(struct uds_index *index,
> -					   struct block_device *bdev);
> +					   struct file *bdev_file);
>  
>  void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
>  
> diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
> index a832a34d9436..5dd2c93f12c2 100644
> --- a/drivers/md/dm-vdo/indexer/indexer.h
> +++ b/drivers/md/dm-vdo/indexer/indexer.h
> @@ -130,6 +130,8 @@ struct uds_volume_record {
>  struct uds_parameters {
>  	/* The block_device used for storage */
>  	struct block_device *bdev;
> +	/* The opened block_device */
> +	struct file *bdev_file;
>  	/* The maximum allowable size of the index on storage */
>  	size_t size;
>  	/* The offset where the index should start */
> @@ -314,7 +316,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
>   * start using the new backing store instead.
>   */
>  int __must_check uds_resume_index_session(struct uds_index_session *session,
> -					  struct block_device *bdev);
> +					  struct file *bdev_file);
>  
>  /* Wait until all outstanding index operations are complete. */
>  int __must_check uds_flush_index_session(struct uds_index_session *session);
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
> index 61104d5ccd61..a855c3ac73bc 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.c
> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
> @@ -23,6 +23,7 @@
>   */
>  struct io_factory {
>  	struct block_device *bdev;
> +	struct file *bdev_file;
>  	atomic_t ref_count;
>  };
>  
> @@ -59,7 +60,7 @@ static void uds_get_io_factory(struct io_factory *factory)
>  	atomic_inc(&factory->ref_count);
>  }
>  
> -int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
> +int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
>  {
>  	int result;
>  	struct io_factory *factory;
> @@ -68,16 +69,18 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
>  	if (result != VDO_SUCCESS)
>  		return result;
>  
> -	factory->bdev = bdev;
> +	factory->bdev = file_bdev(bdev_file);
> +	factory->bdev_file = bdev_file;
>  	atomic_set_release(&factory->ref_count, 1);
>  
>  	*factory_ptr = factory;
>  	return UDS_SUCCESS;
>  }
>  
> -int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
> +int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
>  {
> -	factory->bdev = bdev;
> +	factory->bdev = file_bdev(bdev_file);
> +	factory->bdev_file = bdev_file;
>  	return UDS_SUCCESS;
>  }
>  
> @@ -90,7 +93,7 @@ void uds_put_io_factory(struct io_factory *factory)
>  
>  size_t uds_get_writable_size(struct io_factory *factory)
>  {
> -	return i_size_read(factory->bdev->bd_inode);
> +	return i_size_read(file_inode(factory->bdev_file));
>  }
>  
>  /* Create a struct dm_bufio_client for an index region starting at offset. */
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
> index 60749a9ff756..e5100ab57754 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.h
> +++ b/drivers/md/dm-vdo/indexer/io-factory.h
> @@ -24,11 +24,11 @@ enum {
>  	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
>  };
>  
> -int __must_check uds_make_io_factory(struct block_device *bdev,
> +int __must_check uds_make_io_factory(struct file *bdev_file,
>  				     struct io_factory **factory_ptr);
>  
>  int __must_check uds_replace_storage(struct io_factory *factory,
> -				     struct block_device *bdev);
> +				     struct file *bdev_file);
>  
>  void uds_put_io_factory(struct io_factory *factory);
>  
> diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
> index 8b21ec93f3bc..a292840a83e3 100644
> --- a/drivers/md/dm-vdo/indexer/volume.c
> +++ b/drivers/md/dm-vdo/indexer/volume.c
> @@ -1467,12 +1467,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
>  
>  int __must_check uds_replace_volume_storage(struct volume *volume,
>  					    struct index_layout *layout,
> -					    struct block_device *bdev)
> +					    struct file *bdev_file)
>  {
>  	int result;
>  	u32 i;
>  
> -	result = uds_replace_index_layout_storage(layout, bdev);
> +	result = uds_replace_index_layout_storage(layout, bdev_file);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
> index 7fdd44464db2..5861654d837e 100644
> --- a/drivers/md/dm-vdo/indexer/volume.h
> +++ b/drivers/md/dm-vdo/indexer/volume.h
> @@ -131,7 +131,7 @@ void vdo_free_volume(struct volume *volume);
>  
>  int __must_check uds_replace_volume_storage(struct volume *volume,
>  					    struct index_layout *layout,
> -					    struct block_device *bdev);
> +					    struct file *bdev_file);
>  
>  int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
>  						    u64 *lowest_vcn, u64 *highest_vcn,
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable()
  2024-02-22 12:45 ` [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable() Yu Kuai
  2024-03-17 21:36   ` Christoph Hellwig
@ 2024-03-18  9:22   ` Jan Kara
  1 sibling, 0 replies; 98+ messages in thread
From: Jan Kara @ 2024-03-18  9:22 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu 22-02-24 20:45:54, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> scsi_bios_ptable() reads without opening the disk as a file; factor out
> a helper to read into the block device page cache to prevent accessing
> bd_inode directly from scsi.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me. Either before or after the split, feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/bdev.c           | 19 +++++++++++++++++++
>  drivers/scsi/scsicam.c |  3 +--
>  include/linux/blkdev.h |  1 +
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/block/bdev.c b/block/bdev.c
> index 60a1479eae83..b7af04d34af2 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -1211,6 +1211,25 @@ unsigned int block_size(struct block_device *bdev)
>  }
>  EXPORT_SYMBOL_GPL(block_size);
>  
> +/**
> + * bdev_read_folio - Read into block device page cache.
> + * @bdev: the block device which holds the cache to read.
> + * @pos: the offset that allocated folio will contain.
> + *
> + * Read one page into the block device page cache. If it succeeds, the folio
> + * returned will contain @pos;
> + *
> + * This is only used for scsi_bios_ptable(), the bdev is not opened as files.
> + *
> + * Return: Uptodate folio on success, ERR_PTR() on failure.
> + */
> +struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
> +{
> +	return mapping_read_folio_gfp(bdev_mapping(bdev),
> +				      pos >> PAGE_SHIFT, GFP_KERNEL);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio);
> +
>  static int __init setup_bdev_allow_write_mounted(char *str)
>  {
>  	if (kstrtobool(str, &bdev_allow_write_mounted))
> diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
> index e2c7d8ef205f..1c99b964a0eb 100644
> --- a/drivers/scsi/scsicam.c
> +++ b/drivers/scsi/scsicam.c
> @@ -32,11 +32,10 @@
>   */
>  unsigned char *scsi_bios_ptable(struct block_device *dev)
>  {
> -	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
>  	unsigned char *res = NULL;
>  	struct folio *folio;
>  
> -	folio = read_mapping_folio(mapping, 0, NULL);
> +	folio = bdev_read_folio(bdev_whole(dev), 0);
>  	if (IS_ERR(folio))
>  		return NULL;
>  
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c510f334c84f..3fb02e3a527a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1514,6 +1514,7 @@ struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
>  int bd_prepare_to_claim(struct block_device *bdev, void *holder,
>  		const struct blk_holder_ops *hops);
>  void bd_abort_claiming(struct block_device *bdev, void *holder);
> +struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos);
>  
>  /* just for blk-cgroup, don't use elsewhere */
>  struct block_device *blkdev_get_no_open(dev_t dev);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-16  2:49     ` Yu Kuai
@ 2024-03-18  9:39       ` Christian Brauner
  2024-03-19  1:18         ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Christian Brauner @ 2024-03-18  9:39 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sat, Mar 16, 2024 at 10:49:33AM +0800, Yu Kuai wrote:
> Hi, Christian
> 
> On 2024/03/15 21:54, Christian Brauner wrote:
> > On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
> > > Hi, Christian
> > > Hi, Christoph
> > > Hi, Jan
> > > 
> > > Perhaps now is a good time to send a formal version of this set.
> > > However, I'm not sure yet which branch I should rebase this set on.
> > > Should I send to the vfs tree?
> > 
> > Nearly all of it is in fs/ so I'd say yes.
> > .
> 
> I see that you just created a new branch, vfs.fixes; perhaps I can rebase
> this set against this branch?

Please base it on vfs.super. I'll rebase it to v6.9-rc1 on Sunday.


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18  7:19           ` Yu Kuai
@ 2024-03-18 10:07             ` Christian Brauner
  2024-03-18 10:29               ` Christian Brauner
  2024-03-18 23:22             ` Christoph Hellwig
  1 sibling, 1 reply; 98+ messages in thread
From: Christian Brauner @ 2024-03-18 10:07 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> Hi, Christoph!
> 
> On 2024/03/18 9:51, Yu Kuai wrote:
> > Hi,
> > 
> > On 2024/03/18 9:32, Christoph Hellwig wrote:
> > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> > > > Because there is a real filesystem (devtmpfs) used for raw block device
> > > > file operations; an open syscall on a devtmpfs node goes through:
> > > > 
> > > > blkdev_open
> > > >   bdev = blkdev_get_no_open
> > > >   bdev_open -> pass in file is from devtmpfs
> > > >   -> in this case, file inode is from devtmpfs,
> > > 
> > > But file->f_mapping->host should still point to the bdevfs inode,
> > > and file->f_mapping->host is what everything in the I/O path should
> > > be using.

I mentioned this in
https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner

"[...] if we want to have all code pass a file and we have code in
fs/buffer.c like iomap_to_bh():

iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
        loff_t offset = block << inode->i_blkbits;

        bh->b_bdev = iomap->bdev;
+       bh->f_b_bdev = iomap->f_bdev;

While that works for every single filesystem that uses block devices
because they stash them somewhere (like s_bdev_file) it doesn't work for
the bdev filesystem itself. So if the bdev filesystem calls into helpers
that expect e.g., buffer_head->f_b_bdev to have been initialized from
iomap->f_bdev this wouldn't work.

So if we want to remove b_bdev from struct buffer_head and fully rely on
f_b_bdev - and similar in iomap - then we need a story for the bdev fs
itself. And I wasn't clear on what that would be."

> > > 
> > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> > > > no access to the devtmpfs file, we can't use s_bdev_file() as other
> > > > filesystems here.
> > > 
> > > We can just pass the file down in iomap_iter.private
> > 
> > I can do this for blkdev_read_folio(), however, for other ops like
> > blkdev_writepages(), I can't find a way to pass the file to
> > iomap_iter.private yet.
> > 
> > Any suggestions?
> 
> I came up with an idea:
> 
> While opening the block_device the first time, store the newly generated
> file in "bd_inode->i_private", and release it after the last opener
> closes the block_device.
> 
> The advantages are:
>  - multiple openers can share the same bdev_file;

You mean use the file stashed in bdev_inode->i_private only to retrieve
the inode/mapping in the block layer ops.

>  - raw block device ops can use the bdev_file as well, and there is no
> need to distinguish iomap/buffer_head for raw block_device;
> 
> Please let me know what you think.

It's equally ugly but probably slightly less error prone than the union
approach. But please make that separate patches on top of the series.

This is somewhat reminiscent of the approach that Dave suggested in the
thread that I linked above. I only wonder whether we run into issues with
multiple block device openers when the original opener opened the block
device exclusively. So there might be some corner-cases.
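
The asymmetry discussed above (a file can be handed down through a private
pointer on the read side, but the writeback entry point has no file to hand
down) can be sketched roughly as follows. This is a userspace model, not
kernel code: every model_* name is hypothetical, and the structures are
reduced to the one or two fields needed to show the point.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins only; none of these are the kernel's definitions. */
struct model_mapping { int unused; };
struct model_file { struct model_mapping *f_mapping; };

/* Models an iterator that carries an opaque, caller-owned pointer. */
struct model_iter {
	struct model_mapping *mapping;
	void *private;
};

/* Models an ->iomap_begin() style callback: it only sees the iterator. */
static struct model_file *model_iomap_begin(const struct model_iter *iter)
{
	return iter->private;
}

/* Read path: the entry point is handed a file, so it can pass it down. */
static struct model_file *model_read_folio(struct model_file *file)
{
	assert(file != NULL && file->f_mapping != NULL);

	struct model_iter iter = {
		.mapping = file->f_mapping,
		.private = file,
	};

	return model_iomap_begin(&iter);
}

/* Writeback path: the entry point is handed only a mapping, no file. */
static struct model_file *model_writepages(struct model_mapping *mapping)
{
	struct model_iter iter = {
		.mapping = mapping,
		.private = NULL,	/* nothing available to pass down */
	};

	return model_iomap_begin(&iter);
}
```

In this model the callback gets a usable file from model_read_folio() and
NULL from model_writepages(), which is the gap the i_private proposal tries
to close.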


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18 10:07             ` Christian Brauner
@ 2024-03-18 10:29               ` Christian Brauner
  2024-03-18 10:46                 ` Christian Brauner
  2024-03-18 23:35                 ` Christoph Hellwig
  0 siblings, 2 replies; 98+ messages in thread
From: Christian Brauner @ 2024-03-18 10:29 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 11:07:49AM +0100, Christian Brauner wrote:
> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> > Hi, Christoph!
> > 
> > On 2024/03/18 9:51, Yu Kuai wrote:
> > > Hi,
> > > 
> > > > On 2024/03/18 9:32, Christoph Hellwig wrote:
> > > > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> > > > > > Because there is a real filesystem (devtmpfs) used for raw block device
> > > > > file operations, open syscall to devtmpfs:

Don't forget:

mknod /my/xfs/file/system b 8 0

which means you're not opening it via devtmpfs but via xfs. IOW, the
inode for that file is from xfs.

> > > > > 
> > > > > blkdev_open
> > > > >   bdev = blkdev_get_no_open
> > > > >   bdev_open -> pass in file is from devtmpfs
> > > > >   -> in this case, file inode is from devtmpfs,
> > > > 
> > > > But file->f_mapping->host should still point to the bdevfs inode,
> > > > and file->f_mapping->host is what everything in the I/O path should
> > > > be using.
> 
> I mentioned this in
> https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner
> 
> "[...] if we want to have all code pass a file and we have code in
> fs/buffer.c like iomap_to_bh():
> 
> iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>         loff_t offset = block << inode->i_blkbits;
> 
>         bh->b_bdev = iomap->bdev;
> +       bh->f_b_bdev = iomap->f_bdev;
> 
> While that works for every single filesystem that uses block devices
> because they stash them somewhere (like s_bdev_file) it doesn't work for
> the bdev filesystem itself. So if the bdev filesystem calls into helpers
> that expect e.g., buffer_head->s_f_bdev to have been initialized from
> iomap->f_bdev this wouldn't work.
> 
> So if we want to remove b_bdev from struct buffer_head and fully rely on
> f_b_bdev - and similar in iomap - then we need a story for the bdev fs
> itself. And I wasn't clear on what that would be."
> 
> > > > 
> > > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> > > > > no access to the devtmpfs file, we can't use s_bdev_file() as other
> > > > > filesystems here.
> > > > 
> > > > We can just pass the file down in iomap_iter.private
> > > 
> > > I can do this for blkdev_read_folio(), however, for other ops like
> > > blkdev_writepages(), I can't find a way to pass the file to
> > > iomap_iter.private yet.
> > > 
> > > Any suggestions?
> > 
> > I came up with an idea:
> > 
> > While opening the block_device the first time, store the newly generated
> > file in "bd_inode->i_private", and release it after the last opener
> > closes the block_device.
> > 
> > The advantages are:
> >  - multiple openers can share the same bdev_file;
> 
> You mean use the file stashed in bdev_inode->i_private only to retrieve
> the inode/mapping in the block layer ops.
> 
> >  - raw block device ops can use the bdev_file as well, and there is no
> > need to distinguish iomap/buffer_head for raw block_device;
> > 
> > Please let me know what you think.
> 
> It's equally ugly but probably slightly less error prone than the union
> approach. But please make that separate patches on top of the series.
> 
> This is somewhat reminiscent of the approach that Dave suggested in the
> > thread that I linked above. I only wonder whether we run into issues with
> multiple block device openers when the original opener opened the block
> device exclusively. So there might be some corner-cases.


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18 10:29               ` Christian Brauner
@ 2024-03-18 10:46                 ` Christian Brauner
  2024-03-18 11:57                   ` Yu Kuai
  2024-03-18 23:35                 ` Christoph Hellwig
  1 sibling, 1 reply; 98+ messages in thread
From: Christian Brauner @ 2024-03-18 10:46 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 11:29:22AM +0100, Christian Brauner wrote:
> On Mon, Mar 18, 2024 at 11:07:49AM +0100, Christian Brauner wrote:
> > On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> > > Hi, Christoph!
> > > 
> > > On 2024/03/18 9:51, Yu Kuai wrote:
> > > > Hi,
> > > > 
> > > > > On 2024/03/18 9:32, Christoph Hellwig wrote:
> > > > > > On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
> > > > > > > Because there is a real filesystem (devtmpfs) used for raw block device
> > > > > > file operations, open syscall to devtmpfs:
> 
> Don't forget:
> 
> mknod /my/xfs/file/system b 8 0
> 
> which means you're not opening it via devtmpfs but via xfs. IOW, the
> inode for that file is from xfs.
> 
> > > > > > 
> > > > > > blkdev_open
> > > > > >   bdev = blkdev_get_no_open
> > > > > >   bdev_open -> pass in file is from devtmpfs
> > > > > >   -> in this case, file inode is from devtmpfs,
> > > > > 
> > > > > But file->f_mapping->host should still point to the bdevfs inode,
> > > > > and file->f_mapping->host is what everything in the I/O path should
> > > > > be using.
> > 
> > I mentioned this in
> > https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner
> > 
> > "[...] if we want to have all code pass a file and we have code in
> > fs/buffer.c like iomap_to_bh():
> > 
> > iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> >         loff_t offset = block << inode->i_blkbits;
> > 
> >         bh->b_bdev = iomap->bdev;
> > +       bh->f_b_bdev = iomap->f_bdev;
> > 
> > While that works for every single filesystem that uses block devices
> > because they stash them somewhere (like s_bdev_file) it doesn't work for
> > the bdev filesystem itself. So if the bdev filesystem calls into helpers
> > that expect e.g., buffer_head->s_f_bdev to have been initialized from
> > iomap->f_bdev this wouldn't work.
> > 
> > So if we want to remove b_bdev from struct buffer_head and fully rely on
> > f_b_bdev - and similar in iomap - then we need a story for the bdev fs
> > itself. And I wasn't clear on what that would be."
> > 
> > > > > 
> > > > > > Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
> > > > > > no access to the devtmpfs file, we can't use s_bdev_file() as other
> > > > > > filesystems here.
> > > > > 
> > > > > We can just pass the file down in iomap_iter.private
> > > > 
> > > > I can do this for blkdev_read_folio(), however, for other ops like
> > > > blkdev_writepages(), I can't find a way to pass the file to
> > > > iomap_iter.private yet.
> > > > 
> > > > Any suggestions?
> > > 
> > > I came up with an idea:
> > > 
> > > While opening the block_device the first time, store the newly generated
> > > file in "bd_inode->i_private", and release it after the last opener
> > > closes the block_device.
> > > 
> > > The advantages are:
> > >  - multiple openers can share the same bdev_file;
> > 
> > You mean use the file stashed in bdev_inode->i_private only to retrieve
> > the inode/mapping in the block layer ops.
> > 
> > >  - raw block device ops can use the bdev_file as well, and there is no
> > > need to distinguish iomap/buffer_head for raw block_device;
> > > 
> > > Please let me know what you think.
> > 
> > It's equally ugly but probably slightly less error prone than the union
> > approach. But please make that separate patches on top of the series.

The other issue with this on-demand inode->i_private allocation will be
lifetime management. If you're doing some sort of writeback initiated
from the filesystem then you're guaranteed that the file stashed in
sb->bdev_file is aligned with the lifetime of the filesystem. All
writeback related stuff that relies on inodes can rely on the
superblock being valid while it is doing stuff.

In your approach that guarantee can't be given easily. If someone opens
a block device /dev/sda, does some buffered writes, and then closes it,
the file might be cleaned up while there are still operations ongoing
that rely on the file stashed in inode->i_private being valid.

If on the other hand you allocate a stub file on-demand during
bdev_open() and stash it in inode->i_private you need to make sure to
avoid creating reference count cycles that keep the inode alive.


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18 10:46                 ` Christian Brauner
@ 2024-03-18 11:57                   ` Yu Kuai
  0 siblings, 0 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-18 11:57 UTC (permalink / raw)
  To: Christian Brauner, Yu Kuai
  Cc: Christoph Hellwig, jack, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Hi,

On 2024/03/18 18:46, Christian Brauner wrote:
> On Mon, Mar 18, 2024 at 11:29:22AM +0100, Christian Brauner wrote:
>> On Mon, Mar 18, 2024 at 11:07:49AM +0100, Christian Brauner wrote:
>>> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
>>>> Hi, Christoph!
>>>>
>>>> On 2024/03/18 9:51, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2024/03/18 9:32, Christoph Hellwig wrote:
>>>>>> On Mon, Mar 18, 2024 at 09:26:48AM +0800, Yu Kuai wrote:
>>>>>>> Because there is a real filesystem (devtmpfs) used for raw block device
>>>>>>> file operations, open syscall to devtmpfs:
>>
>> Don't forget:
>>
>> mknod /my/xfs/file/system b 8 0
>>
>> which means you're not opening it via devtmpfs but via xfs. IOW, the
>> inode for that file is from xfs.

I think this is no different from devtmpfs: no matter which file is
passed in from blkdev_open(), we'll find the one bd_inode and stash the
new bdev_file there.
>>
>>>>>>>
>>>>>>> blkdev_open
>>>>>>>    bdev = blkdev_get_no_open
>>>>>>>    bdev_open -> pass in file is from devtmpfs
>>>>>>>    -> in this case, file inode is from devtmpfs,
>>>>>>
>>>>>> But file->f_mapping->host should still point to the bdevfs inode,
>>>>>> and file->f_mapping->host is what everything in the I/O path should
>>>>>> be using.
>>>
>>> I mentioned this in
>>> https://lore.kernel.org/r/20240118-gemustert-aalen-ee71d0c69826@brauner
>>>
>>> "[...] if we want to have all code pass a file and we have code in
>>> fs/buffer.c like iomap_to_bh():
>>>
>>> iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>          loff_t offset = block << inode->i_blkbits;
>>>
>>>          bh->b_bdev = iomap->bdev;
>>> +       bh->f_b_bdev = iomap->f_bdev;
>>>
>>> While that works for every single filesystem that uses block devices
>>> because they stash them somewhere (like s_bdev_file) it doesn't work for
>>> the bdev filesystem itself. So if the bdev filesystem calls into helpers
>>> that expect e.g., buffer_head->s_f_bdev to have been initialized from
>>> iomap->f_bdev this wouldn't work.
>>>
>>> So if we want to remove b_bdev from struct buffer_head and fully rely on
>>> f_b_bdev - and similar in iomap - then we need a story for the bdev fs
>>> itself. And I wasn't clear on what that would be."
>>>
>>>>>>
>>>>>>> Then later, in blkdev_iomap_begin(), bd_inode is passed in and there is
>>>>>>> no access to the devtmpfs file, we can't use s_bdev_file() as other
>>>>>>> filesystems here.
>>>>>>
>>>>>> We can just pass the file down in iomap_iter.private
>>>>>
>>>>> I can do this for blkdev_read_folio(), however, for other ops like
>>>>> blkdev_writepages(), I can't find a way to pass the file to
>>>>> iomap_iter.private yet.
>>>>>
>>>>> Any suggestions?
>>>>
>>>> I came up with an idea:
>>>>
>>>> While opening the block_device the first time, store the newly generated
>>>> file in "bd_inode->i_private", and release it after the last opener
>>>> closes the block_device.
>>>>
>>>> The advantages are:
>>>>   - multiple openers can share the same bdev_file;
>>>
>>> You mean use the file stashed in bdev_inode->i_private only to retrieve
>>> the inode/mapping in the block layer ops.

Yes. I mean: allocate a bdev_file and stash it in the first bdev_open(),
and free it in the last bdev_release().
>>>
>>>>   - raw block device ops can use the bdev_file as well, and there is no
>>>> need to distinguish iomap/buffer_head for raw block_device;
>>>>
>>>> Please let me know what you think.
>>>
>>> It's equally ugly but probably slightly less error prone than the union
>>> approach. But please make that separate patches on top of the series.
> 
> The other issue with this on-demand inode->i_private allocation will be
> lifetime management. If you're doing some sort of writeback initiated
> from the filesystem then you're guaranteed that the file stashed in
> sb->bdev_file is aligned with the lifetime of the filesystem. All
> writeback related stuff that relies on inodes can rely on the
> superblock being valid while it is doing stuff.

For a raw block device, blkdev_flush_mapping() is called before
bdev_release() is called for the last opener (specifically, when
bd_openers drops to zero); hence raw block_device writeback should
always see a valid 'bdev_file', which will be released in the last
bdev_release().

And 'blockdev_superblock' will always be there and is always valid.
>
> In your approach that guarantee can't be given easily. If someone opens
> a block device /dev/sda, does some buffered writes, and then closes it,
> the file might be cleaned up while there are still operations ongoing
> that rely on the file stashed in inode->i_private being valid.
> 
> If on the other hand you allocate a stub file on-demand during
> bdev_open() and stash it in inode->i_private you need to make sure to
> avoid creating reference count cycles that keep the inode alive.

I'm thinking about using 'bd_openers' to guarantee the lifetime. I can't
think of any problems for now; however, I could be wrong.

Thanks,
Kuai
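
The first-open/last-release scheme described in this message can be
sketched as a small userspace model. Everything here is an assumption made
for illustration: the model_* names are hypothetical, 'stashed' stands in
for bd_inode->i_private, 'openers' for bd_openers, and clearing 'dirty'
stands in for blkdev_flush_mapping(). It does not model in-flight
operations or exclusive openers, which are the corner cases raised
earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins only; not the kernel's structures. */
struct model_file { int unused; };

struct model_bdev {
	int openers;			/* models bd_openers */
	int dirty;			/* models unwritten buffered data */
	struct model_file *stashed;	/* models bd_inode->i_private */
};

/* First opener allocates the shared file; later openers reuse it. */
static int model_bdev_open(struct model_bdev *bdev)
{
	if (bdev->openers == 0) {
		bdev->stashed = calloc(1, sizeof(*bdev->stashed));
		if (!bdev->stashed)
			return -1;
	}
	bdev->openers++;
	return 0;
}

/* A buffered write needs the stashed file to be valid. */
static void model_bdev_write(struct model_bdev *bdev)
{
	assert(bdev->stashed != NULL);
	bdev->dirty = 1;
}

/* Last opener flushes first, then drops the shared file. */
static void model_bdev_release(struct model_bdev *bdev)
{
	assert(bdev->openers > 0);
	if (--bdev->openers > 0)
		return;

	bdev->dirty = 0;		/* flush while the file is still valid */
	free(bdev->stashed);
	bdev->stashed = NULL;
}
```

The ordering inside model_bdev_release() is the point of the sketch: the
flush happens before the stashed file is freed, so writeback in this model
never observes a stale pointer.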

> .
> 



* Re: [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-03-18  9:19   ` Jan Kara
@ 2024-03-18 13:38     ` Yu Kuai
  2024-03-19  2:00       ` Matthew Sakai
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-18 13:38 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 17:19, Jan Kara wrote:
> On Thu 22-02-24 20:45:53, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Now that the dm upper layer already stashes the file of the opened device in
>> 'dm_dev->bdev_file', it's ok to get the inode from the file.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> Given there are like three real uses of ->bdev in dm-vdo, I suspect it
> might be better to just replace bdev with bdev_file in struct io_factory
> and in struct uds_parameters.

Yes, this makes sense.

Thanks for the review!
Kuai

> 
> 								Honza
> 
>> ---
>>   drivers/md/dm-vdo/dedupe.c                |  3 ++-
>>   drivers/md/dm-vdo/dm-vdo-target.c         |  5 +++--
>>   drivers/md/dm-vdo/indexer/config.c        |  1 +
>>   drivers/md/dm-vdo/indexer/config.h        |  3 +++
>>   drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
>>   drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
>>   drivers/md/dm-vdo/indexer/index-session.c | 13 +++++++------
>>   drivers/md/dm-vdo/indexer/index.c         |  4 ++--
>>   drivers/md/dm-vdo/indexer/index.h         |  2 +-
>>   drivers/md/dm-vdo/indexer/indexer.h       |  4 +++-
>>   drivers/md/dm-vdo/indexer/io-factory.c    | 13 ++++++++-----
>>   drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
>>   drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
>>   drivers/md/dm-vdo/indexer/volume.h        |  2 +-
>>   14 files changed, 39 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
>> index a9b189395592..532294a15174 100644
>> --- a/drivers/md/dm-vdo/dedupe.c
>> +++ b/drivers/md/dm-vdo/dedupe.c
>> @@ -2592,7 +2592,8 @@ static void resume_index(void *context, struct vdo_completion *parent)
>>   	int result;
>>   
>>   	zones->parameters.bdev = config->owned_device->bdev;
>> -	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
>> +	zones->parameters.bdev_file = config->owned_device->bdev_file;
>> +	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev_file);
>>   	if (result != UDS_SUCCESS)
>>   		vdo_log_error_strerror(result, "Error resuming dedupe index");
>>   
>> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
>> index 89d00be9f075..b2d7f68e70be 100644
>> --- a/drivers/md/dm-vdo/dm-vdo-target.c
>> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
>> @@ -883,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
>>   	}
>>   
>>   	if (config->version == 0) {
>> -		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
>> +		u64 device_size = i_size_read(file_inode(config->owned_device->bdev_file));
>>   
>>   		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>>   	}
>> @@ -1018,7 +1018,8 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
>>   
>>   static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
>>   {
>> -	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
>> +	return i_size_read(file_inode(vdo->device_config->owned_device->bdev_file)) /
>> +		VDO_BLOCK_SIZE;
>>   }
>>   
>>   static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
>> diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
>> index 260993ce1944..f1f66e232b54 100644
>> --- a/drivers/md/dm-vdo/indexer/config.c
>> +++ b/drivers/md/dm-vdo/indexer/config.c
>> @@ -347,6 +347,7 @@ int uds_make_configuration(const struct uds_parameters *params,
>>   	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
>>   	config->nonce = params->nonce;
>>   	config->bdev = params->bdev;
>> +	config->bdev_file = params->bdev_file;
>>   	config->offset = params->offset;
>>   	config->size = params->size;
>>   
>> diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
>> index fe7958263ed6..688f7450183e 100644
>> --- a/drivers/md/dm-vdo/indexer/config.h
>> +++ b/drivers/md/dm-vdo/indexer/config.h
>> @@ -28,6 +28,9 @@ struct uds_configuration {
>>   	/* Storage device for the index */
>>   	struct block_device *bdev;
>>   
>> +	/* Opened device for the index */
>> +	struct file *bdev_file;
>> +
>>   	/* The maximum allowable size of the index */
>>   	size_t size;
>>   
>> diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
>> index 1453fddaa656..6dd80a432fe5 100644
>> --- a/drivers/md/dm-vdo/indexer/index-layout.c
>> +++ b/drivers/md/dm-vdo/indexer/index-layout.c
>> @@ -1672,7 +1672,7 @@ static int create_layout_factory(struct index_layout *layout,
>>   	size_t writable_size;
>>   	struct io_factory *factory = NULL;
>>   
>> -	result = uds_make_io_factory(config->bdev, &factory);
>> +	result = uds_make_io_factory(config->bdev_file, &factory);
>>   	if (result != UDS_SUCCESS)
>>   		return result;
>>   
>> @@ -1745,9 +1745,9 @@ void vdo_free_index_layout(struct index_layout *layout)
>>   }
>>   
>>   int uds_replace_index_layout_storage(struct index_layout *layout,
>> -				     struct block_device *bdev)
>> +				     struct file *bdev_file)
>>   {
>> -	return uds_replace_storage(layout->factory, bdev);
>> +	return uds_replace_storage(layout->factory, bdev_file);
>>   }
>>   
>>   /* Obtain a dm_bufio_client for the volume region. */
>> diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
>> index bd9b90c84a70..9b0c850fe9a7 100644
>> --- a/drivers/md/dm-vdo/indexer/index-layout.h
>> +++ b/drivers/md/dm-vdo/indexer/index-layout.h
>> @@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
>>   void vdo_free_index_layout(struct index_layout *layout);
>>   
>>   int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
>> -						  struct block_device *bdev);
>> +						  struct file *bdev_file);
>>   
>>   int __must_check uds_load_index_state(struct index_layout *layout,
>>   				      struct uds_index *index);
>> diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
>> index 1949a2598656..df8f8122a22d 100644
>> --- a/drivers/md/dm-vdo/indexer/index-session.c
>> +++ b/drivers/md/dm-vdo/indexer/index-session.c
>> @@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
>>   	return uds_status_to_errno(result);
>>   }
>>   
>> -static int replace_device(struct uds_index_session *session, struct block_device *bdev)
>> +static int replace_device(struct uds_index_session *session, struct file *bdev_file)
>>   {
>>   	int result;
>>   
>> -	result = uds_replace_index_storage(session->index, bdev);
>> +	result = uds_replace_index_storage(session->index, bdev_file);
>>   	if (result != UDS_SUCCESS)
>>   		return result;
>>   
>> -	session->parameters.bdev = bdev;
>> +	session->parameters.bdev = file_bdev(bdev_file);
>> +	session->parameters.bdev_file = bdev_file;
>>   	return UDS_SUCCESS;
>>   }
>>   
>> @@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
>>    * device differs from the current backing store, the index will start using the new backing store.
>>    */
>>   int uds_resume_index_session(struct uds_index_session *session,
>> -			     struct block_device *bdev)
>> +			     struct file *bdev_file)
>>   {
>>   	int result = UDS_SUCCESS;
>>   	bool no_work = false;
>> @@ -502,8 +503,8 @@ int uds_resume_index_session(struct uds_index_session *session,
>>   	if (no_work)
>>   		return result;
>>   
>> -	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
>> -		result = replace_device(session, bdev);
>> +	if ((session->index != NULL) && (bdev_file != session->parameters.bdev_file)) {
>> +		result = replace_device(session, bdev_file);
>>   		if (result != UDS_SUCCESS) {
>>   			mutex_lock(&session->request_mutex);
>>   			session->state &= ~IS_FLAG_WAITING;
>> diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
>> index bd2405738c50..3600a169ca98 100644
>> --- a/drivers/md/dm-vdo/indexer/index.c
>> +++ b/drivers/md/dm-vdo/indexer/index.c
>> @@ -1334,9 +1334,9 @@ int uds_save_index(struct uds_index *index)
>>   	return result;
>>   }
>>   
>> -int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
>> +int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
>>   {
>> -	return uds_replace_volume_storage(index->volume, index->layout, bdev);
>> +	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
>>   }
>>   
>>   /* Accessing statistics should be safe from any thread. */
>> diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
>> index 7fbc63db4131..9428ee025cda 100644
>> --- a/drivers/md/dm-vdo/indexer/index.h
>> +++ b/drivers/md/dm-vdo/indexer/index.h
>> @@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
>>   void vdo_free_index(struct uds_index *index);
>>   
>>   int __must_check uds_replace_index_storage(struct uds_index *index,
>> -					   struct block_device *bdev);
>> +					   struct file *bdev_file);
>>   
>>   void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
>>   
>> diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
>> index a832a34d9436..5dd2c93f12c2 100644
>> --- a/drivers/md/dm-vdo/indexer/indexer.h
>> +++ b/drivers/md/dm-vdo/indexer/indexer.h
>> @@ -130,6 +130,8 @@ struct uds_volume_record {
>>   struct uds_parameters {
>>   	/* The block_device used for storage */
>>   	struct block_device *bdev;
>> +	/* The opened block_device */
>> +	struct file *bdev_file;
>>   	/* The maximum allowable size of the index on storage */
>>   	size_t size;
>>   	/* The offset where the index should start */
>> @@ -314,7 +316,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
>>    * start using the new backing store instead.
>>    */
>>   int __must_check uds_resume_index_session(struct uds_index_session *session,
>> -					  struct block_device *bdev);
>> +					  struct file *bdev_file);
>>   
>>   /* Wait until all outstanding index operations are complete. */
>>   int __must_check uds_flush_index_session(struct uds_index_session *session);
>> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
>> index 61104d5ccd61..a855c3ac73bc 100644
>> --- a/drivers/md/dm-vdo/indexer/io-factory.c
>> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
>> @@ -23,6 +23,7 @@
>>    */
>>   struct io_factory {
>>   	struct block_device *bdev;
>> +	struct file *bdev_file;
>>   	atomic_t ref_count;
>>   };
>>   
>> @@ -59,7 +60,7 @@ static void uds_get_io_factory(struct io_factory *factory)
>>   	atomic_inc(&factory->ref_count);
>>   }
>>   
>> -int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
>> +int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
>>   {
>>   	int result;
>>   	struct io_factory *factory;
>> @@ -68,16 +69,18 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
>>   	if (result != VDO_SUCCESS)
>>   		return result;
>>   
>> -	factory->bdev = bdev;
>> +	factory->bdev = file_bdev(bdev_file);
>> +	factory->bdev_file = bdev_file;
>>   	atomic_set_release(&factory->ref_count, 1);
>>   
>>   	*factory_ptr = factory;
>>   	return UDS_SUCCESS;
>>   }
>>   
>> -int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
>> +int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
>>   {
>> -	factory->bdev = bdev;
>> +	factory->bdev = file_bdev(bdev_file);
>> +	factory->bdev_file = bdev_file;
>>   	return UDS_SUCCESS;
>>   }
>>   
>> @@ -90,7 +93,7 @@ void uds_put_io_factory(struct io_factory *factory)
>>   
>>   size_t uds_get_writable_size(struct io_factory *factory)
>>   {
>> -	return i_size_read(factory->bdev->bd_inode);
>> +	return i_size_read(file_inode(factory->bdev_file));
>>   }
>>   
>>   /* Create a struct dm_bufio_client for an index region starting at offset. */
>> diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
>> index 60749a9ff756..e5100ab57754 100644
>> --- a/drivers/md/dm-vdo/indexer/io-factory.h
>> +++ b/drivers/md/dm-vdo/indexer/io-factory.h
>> @@ -24,11 +24,11 @@ enum {
>>   	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
>>   };
>>   
>> -int __must_check uds_make_io_factory(struct block_device *bdev,
>> +int __must_check uds_make_io_factory(struct file *bdev_file,
>>   				     struct io_factory **factory_ptr);
>>   
>>   int __must_check uds_replace_storage(struct io_factory *factory,
>> -				     struct block_device *bdev);
>> +				     struct file *bdev_file);
>>   
>>   void uds_put_io_factory(struct io_factory *factory);
>>   
>> diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
>> index 8b21ec93f3bc..a292840a83e3 100644
>> --- a/drivers/md/dm-vdo/indexer/volume.c
>> +++ b/drivers/md/dm-vdo/indexer/volume.c
>> @@ -1467,12 +1467,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
>>   
>>   int __must_check uds_replace_volume_storage(struct volume *volume,
>>   					    struct index_layout *layout,
>> -					    struct block_device *bdev)
>> +					    struct file *bdev_file)
>>   {
>>   	int result;
>>   	u32 i;
>>   
>> -	result = uds_replace_index_layout_storage(layout, bdev);
>> +	result = uds_replace_index_layout_storage(layout, bdev_file);
>>   	if (result != UDS_SUCCESS)
>>   		return result;
>>   
>> diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
>> index 7fdd44464db2..5861654d837e 100644
>> --- a/drivers/md/dm-vdo/indexer/volume.h
>> +++ b/drivers/md/dm-vdo/indexer/volume.h
>> @@ -131,7 +131,7 @@ void vdo_free_volume(struct volume *volume);
>>   
>>   int __must_check uds_replace_volume_storage(struct volume *volume,
>>   					    struct index_layout *layout,
>> -					    struct block_device *bdev);
>> +					    struct file *bdev_file);
>>   
>>   int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
>>   						    u64 *lowest_vcn, u64 *highest_vcn,
>> -- 
>> 2.39.2
>>



* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18  7:19           ` Yu Kuai
  2024-03-18 10:07             ` Christian Brauner
@ 2024-03-18 23:22             ` Christoph Hellwig
  2024-03-19  8:26               ` Yu Kuai
  1 sibling, 1 reply; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-18 23:22 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> I came up with an idea:
>
> While opening the block_device the first time, store the newly generated
> file in "bd_inode->i_private", and release it after the last opener
> closes the block_device.
>
> The advantages are:
>  - multiple openers can share the same bdev_file;
>  - raw block device ops can use the bdev_file as well, and there is no
> need to distinguish iomap/buffer_head for raw block_device;
>
> Please let me know what you think.

That does sound very reasonable to me.



* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18 10:29               ` Christian Brauner
  2024-03-18 10:46                 ` Christian Brauner
@ 2024-03-18 23:35                 ` Christoph Hellwig
  1 sibling, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-18 23:35 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Yu Kuai, Christoph Hellwig, jack, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Mon, Mar 18, 2024 at 11:29:17AM +0100, Christian Brauner wrote:
> Don't forget:
> 
> mknod /my/xfs/file/system b 8 0
> 
> which means you're not opening it via devtmpfs but via xfs. IOW, the
> inode for that file is from xfs.

Yes.  file_inode() for block devices is always the "upper" fs, which can
be any file system supporting device nodes.  file->f_mapping->host will
always be the bdevfs inode, and nothing in the I/O path should ever be
using file_inode().
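
The distinction above can be sketched with a small userspace model. All names here (`model_inode`, `model_mapping`, `model_file`, `model_open_blkdev()`) are hypothetical stand-ins for illustration; the only thing taken from the thread is the relationship itself: the file keeps the inode of the device node it was opened through, while its mapping is owned by the bdevfs inode.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode;

struct model_mapping {
	struct model_inode *host; /* inode that owns the page cache */
};

struct model_inode {
	const char *fs;                  /* which filesystem owns this inode */
	struct model_mapping *i_mapping; /* mapping used for I/O */
	struct model_mapping i_data;     /* the inode's own mapping */
};

struct model_file {
	struct model_inode *f_inode;     /* what file_inode() would return */
	struct model_mapping *f_mapping; /* what the I/O path should use */
};

static void model_inode_init(struct model_inode *inode, const char *fs)
{
	inode->fs = fs;
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
}

/*
 * Models opening a device node: the file keeps the node's inode (from the
 * "upper" filesystem) but its mapping is redirected to the bdevfs inode.
 */
static void model_open_blkdev(struct model_file *file,
			      struct model_inode *node,
			      struct model_inode *bdev_inode)
{
	assert(file && node && bdev_inode);
	file->f_inode = node;
	file->f_mapping = bdev_inode->i_mapping;
}

static struct model_inode *model_file_inode(const struct model_file *file)
{
	return file->f_inode;
}
```

With a node created on xfs, `model_file_inode()` yields the xfs inode while `f_mapping->host` yields the bdevfs one, which is why size or page-cache lookups in the I/O path should go through the mapping.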


* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-18  9:39       ` Christian Brauner
@ 2024-03-19  1:18         ` Yu Kuai
  2024-03-19  1:43           ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-19  1:18 UTC (permalink / raw)
  To: Christian Brauner, Yu Kuai
  Cc: jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/18 17:39, Christian Brauner wrote:
> On Sat, Mar 16, 2024 at 10:49:33AM +0800, Yu Kuai wrote:
>> Hi, Christian
>>
>> On 2024/03/15 21:54, Christian Brauner wrote:
>>> On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
>>>> Hi, Christian
>>>> Hi, Christoph
>>>> Hi, Jan
>>>>
>>>> Perhaps now is a good time to send a formal version of this set.
>>>> However, I'm not sure yet what branch should I rebase and send this set.
>>>> Should I send to the vfs tree?
>>>
>>> Nearly all of it is in fs/ so I'd say yes.
>>> .
>>
>> I see that you just create a new branch vfs.fixes, perhaps can I rebase
>> this set against this branch?
> 
> Please base it on vfs.super. I'll rebase it to v6.9-rc1 on Sunday.

Okay, I just noticed that vfs.super doesn't contain commit
1cdeac6da33f ("btrfs: pass btrfs_device to btrfs_scratch_superblocks()"),
so you might need to fix the conflict at some point.

Thanks,
Kuai




* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-19  1:18         ` Yu Kuai
@ 2024-03-19  1:43           ` Yu Kuai
  2024-03-19  2:13             ` Matthew Sakai
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-19  1:43 UTC (permalink / raw)
  To: Yu Kuai, Christian Brauner
  Cc: jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/19 9:18, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/18 17:39, Christian Brauner wrote:
>> On Sat, Mar 16, 2024 at 10:49:33AM +0800, Yu Kuai wrote:
>>> Hi, Christian
>>>
>>> On 2024/03/15 21:54, Christian Brauner wrote:
>>>> On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
>>>>> Hi, Christian
>>>>> Hi, Christoph
>>>>> Hi, Jan
>>>>>
>>>>> Perhaps now is a good time to send a formal version of this set.
>>>>> However, I'm not sure yet what branch should I rebase and send this 
>>>>> set.
>>>>> Should I send to the vfs tree?
>>>>
>>>> Nearly all of it is in fs/ so I'd say yes.
>>>> .
>>>
>>> I see that you just create a new branch vfs.fixes, perhaps can I rebase
>>> this set against this branch?
>>
>> Please base it on vfs.super. I'll rebase it to v6.9-rc1 on Sunday.
> 
> Okay, I just see that vfs.super doesn't contain commit
> 1cdeac6da33f("btrfs: pass btrfs_device to btrfs_scratch_superblocks()"),
> and you might need to fix the conflict at some point.

There is another problem: dm-vdo doesn't exist in vfs.super yet. Do
you still want me to rebase onto it?

Thanks,
Kuai




* Re: [RFC v4 linux-next 17/19] dm-vdo: prevent direct access of bd_inode
  2024-03-18 13:38     ` Yu Kuai
@ 2024-03-19  2:00       ` Matthew Sakai
  0 siblings, 0 replies; 98+ messages in thread
From: Matthew Sakai @ 2024-03-19  2:00 UTC (permalink / raw)
  To: Yu Kuai, Jan Kara
  Cc: hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)


On 3/18/24 09:38, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/18 17:19, Jan Kara wrote:
>> On Thu 22-02-24 20:45:53, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Now that the dm upper layer already stashes the file of the opened device
>>> in 'dm_dev->bdev_file', it's ok to get the inode from the file.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>
>> Given there are like three real uses of ->bdev in dm-vdo, I suspect it
>> might be better to just replace bdev with bdev_file in struct io_factory
>> and in struct uds_parameters.
> 
> Yes, this makes sense.
> 
> Thanks for the review!
> Kuai
>

At a glance this looks completely reasonable to me. However, can you be 
sure to CC: dm-devel@lists.linux.dev for dm-vdo patches? I almost missed 
seeing this patch. I will try to give it a proper review tomorrow.

Matt

>>
>>                                 Honza
>>
>>> ---
>>>   drivers/md/dm-vdo/dedupe.c                |  3 ++-
>>>   drivers/md/dm-vdo/dm-vdo-target.c         |  5 +++--
>>>   drivers/md/dm-vdo/indexer/config.c        |  1 +
>>>   drivers/md/dm-vdo/indexer/config.h        |  3 +++
>>>   drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
>>>   drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
>>>   drivers/md/dm-vdo/indexer/index-session.c | 13 +++++++------
>>>   drivers/md/dm-vdo/indexer/index.c         |  4 ++--
>>>   drivers/md/dm-vdo/indexer/index.h         |  2 +-
>>>   drivers/md/dm-vdo/indexer/indexer.h       |  4 +++-
>>>   drivers/md/dm-vdo/indexer/io-factory.c    | 13 ++++++++-----
>>>   drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
>>>   drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
>>>   drivers/md/dm-vdo/indexer/volume.h        |  2 +-
>>>   14 files changed, 39 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
>>> index a9b189395592..532294a15174 100644
>>> --- a/drivers/md/dm-vdo/dedupe.c
>>> +++ b/drivers/md/dm-vdo/dedupe.c
>>> @@ -2592,7 +2592,8 @@ static void resume_index(void *context, struct 
>>> vdo_completion *parent)
>>>       int result;
>>>       zones->parameters.bdev = config->owned_device->bdev;
>>> -    result = uds_resume_index_session(zones->index_session, 
>>> zones->parameters.bdev);
>>> +    zones->parameters.bdev_file = config->owned_device->bdev_file;
>>> +    result = uds_resume_index_session(zones->index_session, 
>>> zones->parameters.bdev_file);
>>>       if (result != UDS_SUCCESS)
>>>           vdo_log_error_strerror(result, "Error resuming dedupe index");
>>> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c 
>>> b/drivers/md/dm-vdo/dm-vdo-target.c
>>> index 89d00be9f075..b2d7f68e70be 100644
>>> --- a/drivers/md/dm-vdo/dm-vdo-target.c
>>> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
>>> @@ -883,7 +883,7 @@ static int parse_device_config(int argc, char 
>>> **argv, struct dm_target *ti,
>>>       }
>>>       if (config->version == 0) {
>>> -        u64 device_size = 
>>> i_size_read(config->owned_device->bdev->bd_inode);
>>> +        u64 device_size = 
>>> i_size_read(file_inode(config->owned_device->bdev_file));
>>>           config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>>>       }
>>> @@ -1018,7 +1018,8 @@ static void vdo_status(struct dm_target *ti, 
>>> status_type_t status_type,
>>>   static block_count_t __must_check 
>>> get_underlying_device_block_count(const struct vdo *vdo)
>>>   {
>>> -    return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / 
>>> VDO_BLOCK_SIZE;
>>> +    return 
>>> i_size_read(file_inode(vdo->device_config->owned_device->bdev_file)) /
>>> +        VDO_BLOCK_SIZE;
>>>   }
>>>   static int __must_check process_vdo_message_locked(struct vdo *vdo, 
>>> unsigned int argc,
>>> diff --git a/drivers/md/dm-vdo/indexer/config.c 
>>> b/drivers/md/dm-vdo/indexer/config.c
>>> index 260993ce1944..f1f66e232b54 100644
>>> --- a/drivers/md/dm-vdo/indexer/config.c
>>> +++ b/drivers/md/dm-vdo/indexer/config.c
>>> @@ -347,6 +347,7 @@ int uds_make_configuration(const struct 
>>> uds_parameters *params,
>>>       config->sparse_sample_rate = (params->sparse ? 
>>> DEFAULT_SPARSE_SAMPLE_RATE : 0);
>>>       config->nonce = params->nonce;
>>>       config->bdev = params->bdev;
>>> +    config->bdev_file = params->bdev_file;
>>>       config->offset = params->offset;
>>>       config->size = params->size;
>>> diff --git a/drivers/md/dm-vdo/indexer/config.h 
>>> b/drivers/md/dm-vdo/indexer/config.h
>>> index fe7958263ed6..688f7450183e 100644
>>> --- a/drivers/md/dm-vdo/indexer/config.h
>>> +++ b/drivers/md/dm-vdo/indexer/config.h
>>> @@ -28,6 +28,9 @@ struct uds_configuration {
>>>       /* Storage device for the index */
>>>       struct block_device *bdev;
>>> +    /* Opened device for the index */
>>> +    struct file *bdev_file;
>>> +
>>>       /* The maximum allowable size of the index */
>>>       size_t size;
>>> diff --git a/drivers/md/dm-vdo/indexer/index-layout.c 
>>> b/drivers/md/dm-vdo/indexer/index-layout.c
>>> index 1453fddaa656..6dd80a432fe5 100644
>>> --- a/drivers/md/dm-vdo/indexer/index-layout.c
>>> +++ b/drivers/md/dm-vdo/indexer/index-layout.c
>>> @@ -1672,7 +1672,7 @@ static int create_layout_factory(struct 
>>> index_layout *layout,
>>>       size_t writable_size;
>>>       struct io_factory *factory = NULL;
>>> -    result = uds_make_io_factory(config->bdev, &factory);
>>> +    result = uds_make_io_factory(config->bdev_file, &factory);
>>>       if (result != UDS_SUCCESS)
>>>           return result;
>>> @@ -1745,9 +1745,9 @@ void vdo_free_index_layout(struct index_layout 
>>> *layout)
>>>   }
>>>   int uds_replace_index_layout_storage(struct index_layout *layout,
>>> -                     struct block_device *bdev)
>>> +                     struct file *bdev_file)
>>>   {
>>> -    return uds_replace_storage(layout->factory, bdev);
>>> +    return uds_replace_storage(layout->factory, bdev_file);
>>>   }
>>>   /* Obtain a dm_bufio_client for the volume region. */
>>> diff --git a/drivers/md/dm-vdo/indexer/index-layout.h 
>>> b/drivers/md/dm-vdo/indexer/index-layout.h
>>> index bd9b90c84a70..9b0c850fe9a7 100644
>>> --- a/drivers/md/dm-vdo/indexer/index-layout.h
>>> +++ b/drivers/md/dm-vdo/indexer/index-layout.h
>>> @@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct 
>>> uds_configuration *config, bool ne
>>>   void vdo_free_index_layout(struct index_layout *layout);
>>>   int __must_check uds_replace_index_layout_storage(struct 
>>> index_layout *layout,
>>> -                          struct block_device *bdev);
>>> +                          struct file *bdev_file);
>>>   int __must_check uds_load_index_state(struct index_layout *layout,
>>>                         struct uds_index *index);
>>> diff --git a/drivers/md/dm-vdo/indexer/index-session.c 
>>> b/drivers/md/dm-vdo/indexer/index-session.c
>>> index 1949a2598656..df8f8122a22d 100644
>>> --- a/drivers/md/dm-vdo/indexer/index-session.c
>>> +++ b/drivers/md/dm-vdo/indexer/index-session.c
>>> @@ -460,15 +460,16 @@ int uds_suspend_index_session(struct 
>>> uds_index_session *session, bool save)
>>>       return uds_status_to_errno(result);
>>>   }
>>> -static int replace_device(struct uds_index_session *session, struct 
>>> block_device *bdev)
>>> +static int replace_device(struct uds_index_session *session, struct 
>>> file *bdev_file)
>>>   {
>>>       int result;
>>> -    result = uds_replace_index_storage(session->index, bdev);
>>> +    result = uds_replace_index_storage(session->index, bdev_file);
>>>       if (result != UDS_SUCCESS)
>>>           return result;
>>> -    session->parameters.bdev = bdev;
>>> +    session->parameters.bdev = file_bdev(bdev_file);
>>> +    session->parameters.bdev_file = bdev_file;
>>>       return UDS_SUCCESS;
>>>   }
>>> @@ -477,7 +478,7 @@ static int replace_device(struct 
>>> uds_index_session *session, struct block_device
>>>    * device differs from the current backing store, the index will 
>>> start using the new backing store.
>>>    */
>>>   int uds_resume_index_session(struct uds_index_session *session,
>>> -                 struct block_device *bdev)
>>> +                 struct file *bdev_file)
>>>   {
>>>       int result = UDS_SUCCESS;
>>>       bool no_work = false;
>>> @@ -502,8 +503,8 @@ int uds_resume_index_session(struct 
>>> uds_index_session *session,
>>>       if (no_work)
>>>           return result;
>>> -    if ((session->index != NULL) && (bdev != 
>>> session->parameters.bdev)) {
>>> -        result = replace_device(session, bdev);
>>> +    if ((session->index != NULL) && (bdev_file != 
>>> session->parameters.bdev_file)) {
>>> +        result = replace_device(session, bdev_file);
>>>           if (result != UDS_SUCCESS) {
>>>               mutex_lock(&session->request_mutex);
>>>               session->state &= ~IS_FLAG_WAITING;
>>> diff --git a/drivers/md/dm-vdo/indexer/index.c 
>>> b/drivers/md/dm-vdo/indexer/index.c
>>> index bd2405738c50..3600a169ca98 100644
>>> --- a/drivers/md/dm-vdo/indexer/index.c
>>> +++ b/drivers/md/dm-vdo/indexer/index.c
>>> @@ -1334,9 +1334,9 @@ int uds_save_index(struct uds_index *index)
>>>       return result;
>>>   }
>>> -int uds_replace_index_storage(struct uds_index *index, struct 
>>> block_device *bdev)
>>> +int uds_replace_index_storage(struct uds_index *index, struct file 
>>> *bdev_file)
>>>   {
>>> -    return uds_replace_volume_storage(index->volume, index->layout, 
>>> bdev);
>>> +    return uds_replace_volume_storage(index->volume, index->layout, 
>>> bdev_file);
>>>   }
>>>   /* Accessing statistics should be safe from any thread. */
>>> diff --git a/drivers/md/dm-vdo/indexer/index.h 
>>> b/drivers/md/dm-vdo/indexer/index.h
>>> index 7fbc63db4131..9428ee025cda 100644
>>> --- a/drivers/md/dm-vdo/indexer/index.h
>>> +++ b/drivers/md/dm-vdo/indexer/index.h
>>> @@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index 
>>> *index);
>>>   void vdo_free_index(struct uds_index *index);
>>>   int __must_check uds_replace_index_storage(struct uds_index *index,
>>> -                       struct block_device *bdev);
>>> +                       struct file *bdev_file);
>>>   void uds_get_index_stats(struct uds_index *index, struct 
>>> uds_index_stats *counters);
>>> diff --git a/drivers/md/dm-vdo/indexer/indexer.h 
>>> b/drivers/md/dm-vdo/indexer/indexer.h
>>> index a832a34d9436..5dd2c93f12c2 100644
>>> --- a/drivers/md/dm-vdo/indexer/indexer.h
>>> +++ b/drivers/md/dm-vdo/indexer/indexer.h
>>> @@ -130,6 +130,8 @@ struct uds_volume_record {
>>>   struct uds_parameters {
>>>       /* The block_device used for storage */
>>>       struct block_device *bdev;
>>> +    /* The opened block_device */
>>> +    struct file *bdev_file;
>>>       /* The maximum allowable size of the index on storage */
>>>       size_t size;
>>>       /* The offset where the index should start */
>>> @@ -314,7 +316,7 @@ int __must_check uds_suspend_index_session(struct 
>>> uds_index_session *session, bo
>>>    * start using the new backing store instead.
>>>    */
>>>   int __must_check uds_resume_index_session(struct uds_index_session 
>>> *session,
>>> -                      struct block_device *bdev);
>>> +                      struct file *bdev_file);
>>>   /* Wait until all outstanding index operations are complete. */
>>>   int __must_check uds_flush_index_session(struct uds_index_session 
>>> *session);
>>> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c 
>>> b/drivers/md/dm-vdo/indexer/io-factory.c
>>> index 61104d5ccd61..a855c3ac73bc 100644
>>> --- a/drivers/md/dm-vdo/indexer/io-factory.c
>>> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
>>> @@ -23,6 +23,7 @@
>>>    */
>>>   struct io_factory {
>>>       struct block_device *bdev;
>>> +    struct file *bdev_file;
>>>       atomic_t ref_count;
>>>   };
>>> @@ -59,7 +60,7 @@ static void uds_get_io_factory(struct io_factory 
>>> *factory)
>>>       atomic_inc(&factory->ref_count);
>>>   }
>>> -int uds_make_io_factory(struct block_device *bdev, struct io_factory 
>>> **factory_ptr)
>>> +int uds_make_io_factory(struct file *bdev_file, struct io_factory 
>>> **factory_ptr)
>>>   {
>>>       int result;
>>>       struct io_factory *factory;
>>> @@ -68,16 +69,18 @@ int uds_make_io_factory(struct block_device 
>>> *bdev, struct io_factory **factory_p
>>>       if (result != VDO_SUCCESS)
>>>           return result;
>>> -    factory->bdev = bdev;
>>> +    factory->bdev = file_bdev(bdev_file);
>>> +    factory->bdev_file = bdev_file;
>>>       atomic_set_release(&factory->ref_count, 1);
>>>       *factory_ptr = factory;
>>>       return UDS_SUCCESS;
>>>   }
>>> -int uds_replace_storage(struct io_factory *factory, struct 
>>> block_device *bdev)
>>> +int uds_replace_storage(struct io_factory *factory, struct file 
>>> *bdev_file)
>>>   {
>>> -    factory->bdev = bdev;
>>> +    factory->bdev = file_bdev(bdev_file);
>>> +    factory->bdev_file = bdev_file;
>>>       return UDS_SUCCESS;
>>>   }
>>> @@ -90,7 +93,7 @@ void uds_put_io_factory(struct io_factory *factory)
>>>   size_t uds_get_writable_size(struct io_factory *factory)
>>>   {
>>> -    return i_size_read(factory->bdev->bd_inode);
>>> +    return i_size_read(file_inode(factory->bdev_file));
>>>   }
>>>   /* Create a struct dm_bufio_client for an index region starting at 
>>> offset. */
>>> diff --git a/drivers/md/dm-vdo/indexer/io-factory.h 
>>> b/drivers/md/dm-vdo/indexer/io-factory.h
>>> index 60749a9ff756..e5100ab57754 100644
>>> --- a/drivers/md/dm-vdo/indexer/io-factory.h
>>> +++ b/drivers/md/dm-vdo/indexer/io-factory.h
>>> @@ -24,11 +24,11 @@ enum {
>>>       SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
>>>   };
>>> -int __must_check uds_make_io_factory(struct block_device *bdev,
>>> +int __must_check uds_make_io_factory(struct file *bdev_file,
>>>                        struct io_factory **factory_ptr);
>>>   int __must_check uds_replace_storage(struct io_factory *factory,
>>> -                     struct block_device *bdev);
>>> +                     struct file *bdev_file);
>>>   void uds_put_io_factory(struct io_factory *factory);
>>> diff --git a/drivers/md/dm-vdo/indexer/volume.c 
>>> b/drivers/md/dm-vdo/indexer/volume.c
>>> index 8b21ec93f3bc..a292840a83e3 100644
>>> --- a/drivers/md/dm-vdo/indexer/volume.c
>>> +++ b/drivers/md/dm-vdo/indexer/volume.c
>>> @@ -1467,12 +1467,12 @@ int uds_find_volume_chapter_boundaries(struct 
>>> volume *volume, u64 *lowest_vcn,
>>>   int __must_check uds_replace_volume_storage(struct volume *volume,
>>>                           struct index_layout *layout,
>>> -                        struct block_device *bdev)
>>> +                        struct file *bdev_file)
>>>   {
>>>       int result;
>>>       u32 i;
>>> -    result = uds_replace_index_layout_storage(layout, bdev);
>>> +    result = uds_replace_index_layout_storage(layout, bdev_file);
>>>       if (result != UDS_SUCCESS)
>>>           return result;
>>> diff --git a/drivers/md/dm-vdo/indexer/volume.h 
>>> b/drivers/md/dm-vdo/indexer/volume.h
>>> index 7fdd44464db2..5861654d837e 100644
>>> --- a/drivers/md/dm-vdo/indexer/volume.h
>>> +++ b/drivers/md/dm-vdo/indexer/volume.h
>>> @@ -131,7 +131,7 @@ void vdo_free_volume(struct volume *volume);
>>>   int __must_check uds_replace_volume_storage(struct volume *volume,
>>>                           struct index_layout *layout,
>>> -                        struct block_device *bdev);
>>> +                        struct file *bdev_file);
>>>   int __must_check uds_find_volume_chapter_boundaries(struct volume 
>>> *volume,
>>>                               u64 *lowest_vcn, u64 *highest_vcn,
>>> -- 
>>> 2.39.2
>>>
> 
> 



* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-19  1:43           ` Yu Kuai
@ 2024-03-19  2:13             ` Matthew Sakai
  2024-03-19  2:27               ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Matthew Sakai @ 2024-03-19  2:13 UTC (permalink / raw)
  To: Yu Kuai, Christian Brauner
  Cc: jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C),
	dm-devel


On 3/18/24 21:43, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/19 9:18, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/03/18 17:39, Christian Brauner wrote:
>>> On Sat, Mar 16, 2024 at 10:49:33AM +0800, Yu Kuai wrote:
>>>> Hi, Christian
>>>>
>>>> On 2024/03/15 21:54, Christian Brauner wrote:
>>>>> On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
>>>>>> Hi, Christian
>>>>>> Hi, Christoph
>>>>>> Hi, Jan
>>>>>>
>>>>>> Perhaps now is a good time to send a formal version of this set.
>>>>>> However, I'm not sure yet what branch should I rebase and send 
>>>>>> this set.
>>>>>> Should I send to the vfs tree?
>>>>>
>>>>> Nearly all of it is in fs/ so I'd say yes.
>>>>> .
>>>>
>>>> I see that you just create a new branch vfs.fixes, perhaps can I rebase
>>>> this set against this branch?
>>>
>>> Please base it on vfs.super. I'll rebase it to v6.9-rc1 on Sunday.
>>
>> Okay, I just see that vfs.super doesn't contain commit
>> 1cdeac6da33f("btrfs: pass btrfs_device to btrfs_scratch_superblocks()"),
>> and you might need to fix the conflict at some point.
> 
> And there is another problem, dm-vdo doesn't exist in vfs.super yet. Do
> you still want me to rebase here?
> 

The dm-vdo changes don't appear to rely on earlier patches in the 
series, so I think dm-vdo could incorporate the dm-vdo patch 
independently from the rest of the series, if that would be helpful. (I 
don't want to confuse things too much.) In that case it would go through 
the dm tree with the rest of dm-vdo.

Matt



* Re: [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode
  2024-03-19  2:13             ` Matthew Sakai
@ 2024-03-19  2:27               ` Yu Kuai
  0 siblings, 0 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-19  2:27 UTC (permalink / raw)
  To: Matthew Sakai, Yu Kuai, Christian Brauner
  Cc: jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, dm-devel, yukuai (C)

Hi,

On 2024/03/19 10:13, Matthew Sakai wrote:
> 
> On 3/18/24 21:43, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/03/19 9:18, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2024/03/18 17:39, Christian Brauner wrote:
>>>> On Sat, Mar 16, 2024 at 10:49:33AM +0800, Yu Kuai wrote:
>>>>> Hi, Christian
>>>>>
>>>>> On 2024/03/15 21:54, Christian Brauner wrote:
>>>>>> On Fri, Mar 15, 2024 at 08:08:49PM +0800, Yu Kuai wrote:
>>>>>>> Hi, Christian
>>>>>>> Hi, Christoph
>>>>>>> Hi, Jan
>>>>>>>
>>>>>>> Perhaps now is a good time to send a formal version of this set.
>>>>>>> However, I'm not sure yet what branch should I rebase and send 
>>>>>>> this set.
>>>>>>> Should I send to the vfs tree?
>>>>>>
>>>>>> Nearly all of it is in fs/ so I'd say yes.
>>>>>> .
>>>>>
>>>>> I see that you just create a new branch vfs.fixes, perhaps can I 
>>>>> rebase
>>>>> this set against this branch?
>>>>
>>>> Please base it on vfs.super. I'll rebase it to v6.9-rc1 on Sunday.
>>>
>>> Okay, I just see that vfs.super doesn't contain commit
>>> 1cdeac6da33f("btrfs: pass btrfs_device to btrfs_scratch_superblocks()"),
>>> and you might need to fix the conflict at some point.
>>
>> And there is another problem, dm-vdo doesn't exist in vfs.super yet. Do
>> you still want me to rebase here?
>>
> 
> The dm-vdo changes don't appear to rely on earlier patches in the 
> series, so I think dm-vdo could incorporate the dm-vdo patch 
> independently from the rest of the series, if that would be helpful. (I 
> don't want to confuse things too much.) In that case it would go through 
> the dm tree with the rest of dm-vdo.

This set removes the 'bd_inode' field, so if the dm-vdo changes go
through the dm tree instead, we must keep the field for now.

I don't have a preference; Christian will make the decision. 😉

Thanks,
Kuai




* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-18 23:22             ` Christoph Hellwig
@ 2024-03-19  8:26               ` Yu Kuai
  2024-03-21 11:27                 ` Jan Kara
  2024-03-22  6:33                 ` Al Viro
  0 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-19  8:26 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: jack, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/03/19 7:22, Christoph Hellwig wrote:
> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
>> I came up with an idea:
>>
>> While opening the block_device the first time, store the newly generated
>> file in "bd_inode->i_private", and release it after the last opener
>> closes the block_device.
>>
>> The advantages are:
>>   - multiple openers can share the same bdev_file;
>>   - raw block device ops can use the bdev_file as well, and there is no
>> need to distinguish iomap/buffer_head for raw block_device;
>>
>> Please let me know what you think.
> 
> That does sound very reasonable to me.
> 
I just implemented the idea with the following patch (not fully tested,
just boot and some blktests).

Please let me know what you think.
Thanks!
Kuai

diff --git a/block/bdev.c b/block/bdev.c
index d42a6bc73474..8bc8962c59a5 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -899,14 +899,6 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
         if (unblock_events)
                 disk_unblock_events(disk);

-       bdev_file->f_flags |= O_LARGEFILE;
-       bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
-       if (bdev_nowait(bdev))
-               bdev_file->f_mode |= FMODE_NOWAIT;
-       bdev_file->f_mapping = bdev_mapping(bdev);
-       bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
-       bdev_file->private_data = holder;
-
         return 0;
  put_module:
         module_put(disk->fops->owner);
@@ -948,12 +940,66 @@ static unsigned blk_to_file_flags(blk_mode_t mode)
         return flags;
  }

+struct file *alloc_and_init_bdev_file(struct block_device *bdev,
+                                     blk_mode_t mode, void *holder)
+{
+       struct file *bdev_file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
+                       blockdev_mnt, "", blk_to_file_flags(mode) | O_LARGEFILE,
+                       &def_blk_fops);
+
+       if (IS_ERR(bdev_file))
+               return bdev_file;
+
+       bdev_file->f_flags |= O_LARGEFILE;
+       bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+       if (bdev_nowait(bdev))
+               bdev_file->f_mode |= FMODE_NOWAIT;
+       bdev_file->f_mapping = bdev_mapping(bdev);
+       bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+       bdev_file->private_data = holder;
+
+       return bdev_file;
+}
+
+void get_bdev_file(struct block_device *bdev, struct file *bdev_file)
+{
+       struct inode *bd_inode = bdev_inode(bdev);
+       struct file *file;
+
+       mutex_lock(&bdev->bd_disk->open_mutex);
+       file = bd_inode->i_private;
+
+       if (!file) {
+               get_file(bdev_file);
+               bd_inode->i_private = bdev_file;
+       } else {
+               get_file(file);
+       }
+
+       mutex_unlock(&bdev->bd_disk->open_mutex);
+}
+
+void put_bdev_file(struct block_device *bdev)
+{
+       struct file *file = NULL;
+       struct inode *bd_inode = bdev_inode(bdev);
+
+       mutex_lock(&bdev->bd_disk->open_mutex);
+       file = bd_inode->i_private;
+
+       if (!atomic_read(&bdev->bd_openers))
+               bd_inode->i_private = NULL;
+
+       mutex_unlock(&bdev->bd_disk->open_mutex);
+
+       fput(file);
+}
+
  struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
                                    const struct blk_holder_ops *hops)
  {
         struct file *bdev_file;
         struct block_device *bdev;
-       unsigned int flags;
         int ret;

         ret = bdev_permission(dev, mode, holder);
@@ -964,20 +1010,20 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
         if (!bdev)
                 return ERR_PTR(-ENXIO);

-       flags = blk_to_file_flags(mode);
-       bdev_file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
-                       blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
+       bdev_file = alloc_and_init_bdev_file(bdev, mode, holder);
         if (IS_ERR(bdev_file)) {
                 blkdev_put_no_open(bdev);
                 return bdev_file;
         }
         ihold(bdev_inode(bdev));
+       get_bdev_file(bdev, bdev_file);

         ret = bdev_open(bdev, mode, holder, hops, bdev_file);
         if (ret) {
                 /* We failed to open the block device. Let ->release() know. */
                 bdev_file->private_data = ERR_PTR(ret);
                 fput(bdev_file);
+               put_bdev_file(bdev);
                 return ERR_PTR(ret);
         }
         return bdev_file;
@@ -1049,6 +1095,7 @@ void bdev_release(struct file *bdev_file)

         module_put(disk->fops->owner);
  put_no_open:
+       put_bdev_file(bdev);
         blkdev_put_no_open(bdev);
  }

diff --git a/block/blk.h b/block/blk.h
index 5ac293179bfb..ebe99dc9cff5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -518,6 +518,10 @@ static inline int req_ref_read(struct request *req)
         return atomic_read(&req->ref);
  }

+struct file *alloc_and_init_bdev_file(struct block_device *bdev,
+                                     blk_mode_t mode, void *holder);
+void get_bdev_file(struct block_device *bdev, struct file *bdev_file);
+void put_bdev_file(struct block_device *bdev);
  void bdev_release(struct file *bdev_file);
  int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
               const struct blk_holder_ops *hops, struct file *bdev_file);
diff --git a/block/fops.c b/block/fops.c
index 4037ae72a919..059f6c7d3c09 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -382,7 +382,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
  static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
                 unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
  {
-       struct block_device *bdev = I_BDEV(inode);
+       struct block_device *bdev = file_bdev(inode->i_private);
         loff_t isize = i_size_read(inode);

         iomap->bdev = bdev;
@@ -404,7 +404,7 @@ static const struct iomap_ops blkdev_iomap_ops = {
  static int blkdev_get_block(struct inode *inode, sector_t iblock,
                 struct buffer_head *bh, int create)
  {
-       bh->b_bdev = I_BDEV(inode);
+       bh->b_bdev = file_bdev(inode->i_private);
         bh->b_blocknr = iblock;
         set_buffer_mapped(bh);
         return 0;
@@ -598,6 +598,7 @@ blk_mode_t file_to_blk_mode(struct file *file)

  static int blkdev_open(struct inode *inode, struct file *filp)
  {
+       struct file *bdev_file;
         struct block_device *bdev;
         blk_mode_t mode;
         int ret;
@@ -614,9 +615,28 @@ static int blkdev_open(struct inode *inode, struct file *filp)
         if (!bdev)
                 return -ENXIO;

+       bdev_file = alloc_and_init_bdev_file(bdev,
+                       BLK_OPEN_READ | BLK_OPEN_WRITE, NULL);
+       if (IS_ERR(bdev_file)) {
+               blkdev_put_no_open(bdev);
+               return PTR_ERR(bdev_file);
+       }
+
+       bdev_file->private_data = ERR_PTR(-EINVAL);
+       get_bdev_file(bdev, bdev_file);
         ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
-       if (ret)
+       if (ret) {
+               put_bdev_file(bdev);
                 blkdev_put_no_open(bdev);
+       } else {
+               filp->f_flags |= O_LARGEFILE;
+               filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+               if (bdev_nowait(bdev))
+                       filp->f_mode |= FMODE_NOWAIT;
+               filp->f_mapping = bdev_mapping(bdev);
+               filp->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+       }
+
         return ret;
  }




* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-19  8:26               ` Yu Kuai
@ 2024-03-21 11:27                 ` Jan Kara
  2024-03-21 12:15                   ` Yu Kuai
  2024-03-22  6:33                 ` Al Viro
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Kara @ 2024-03-21 11:27 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

Hello!

On Tue 19-03-24 16:26:19, Yu Kuai wrote:
> On 2024/03/19 7:22, Christoph Hellwig wrote:
> > On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
> > > I came up with an idea:
> > > 
> > > While opening the block_device the first time, store the generated new
> > > file in "bd_inode->i_private". And release it after the last opener
> > > close the block_device.
> > > 
> > > The advantages are:
> > >   - multiple openers can share the same bdev_file;
> > >   - raw block device ops can use the bdev_file as well, and there is no
> > > need to distinguish iomap/buffer_head for raw block_device;
> > > 
> > > Please let me know what do you think?
> > 
> > That does sound very reasonable to me.
> > 
> I just implemented the idea with the following patch (not fully tested, just
> boot and some blktests)

So I was looking into this and I'm not sure I 100% understand the problem.
I understand that the inode you get e.g. in blkdev_get_block(),
blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
block device instead of your file_bdev(inode->i_private)? I don't see any
advantage in stashing away that special bdev_file into inode->i_private but
perhaps I'm missing something...

								Honza

> diff --git a/block/fops.c b/block/fops.c
> index 4037ae72a919..059f6c7d3c09 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -382,7 +382,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>  static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>                 unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
>  {
> -       struct block_device *bdev = I_BDEV(inode);
> +       struct block_device *bdev = file_bdev(inode->i_private);
>         loff_t isize = i_size_read(inode);
> 
>         iomap->bdev = bdev;
> @@ -404,7 +404,7 @@ static const struct iomap_ops blkdev_iomap_ops = {
>  static int blkdev_get_block(struct inode *inode, sector_t iblock,
>                 struct buffer_head *bh, int create)
>  {
> -       bh->b_bdev = I_BDEV(inode);
> +       bh->b_bdev = file_bdev(inode->i_private);
>         bh->b_blocknr = iblock;
>         set_buffer_mapped(bh);
>         return 0;
> @@ -598,6 +598,7 @@ blk_mode_t file_to_blk_mode(struct file *file)
> 
>  static int blkdev_open(struct inode *inode, struct file *filp)
>  {
> +       struct file *bdev_file;
>         struct block_device *bdev;
>         blk_mode_t mode;
>         int ret;
> @@ -614,9 +615,28 @@ static int blkdev_open(struct inode *inode, struct file *filp)
>         if (!bdev)
>                 return -ENXIO;
> 
> +       bdev_file = alloc_and_init_bdev_file(bdev,
> +                       BLK_OPEN_READ | BLK_OPEN_WRITE, NULL);
> +       if (IS_ERR(bdev_file)) {
> +               blkdev_put_no_open(bdev);
> +               return PTR_ERR(bdev_file);
> +       }
> +
> +       bdev_file->private_data = ERR_PTR(-EINVAL);
> +       get_bdev_file(bdev, bdev_file);
>         ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
> -       if (ret)
> +       if (ret) {
> +               put_bdev_file(bdev);
>                 blkdev_put_no_open(bdev);
> +       } else {
> +               filp->f_flags |= O_LARGEFILE;
> +               filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
> +               if (bdev_nowait(bdev))
> +                       filp->f_mode |= FMODE_NOWAIT;
> +               filp->f_mapping = bdev_mapping(bdev);
> +               filp->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
> +       }
> +
>         return ret;
>  }
> 
> > .
> > 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-21 11:27                 ` Jan Kara
@ 2024-03-21 12:15                   ` Yu Kuai
  2024-03-22  6:37                     ` Al Viro
  0 siblings, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-21 12:15 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: Christoph Hellwig, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Hi, Jan!

On 2024/03/21 19:27, Jan Kara wrote:
> Hello!
> 
> On Tue 19-03-24 16:26:19, Yu Kuai wrote:
>> On 2024/03/19 7:22, Christoph Hellwig wrote:
>>> On Mon, Mar 18, 2024 at 03:19:03PM +0800, Yu Kuai wrote:
>>>> I came up with an idea:
>>>>
>>>> While opening the block_device the first time, store the generated new
>>>> file in "bd_inode->i_private". And release it after the last opener
>>>> close the block_device.
>>>>
>>>> The advantages are:
>>>>    - multiple openers can share the same bdev_file;
>>>>    - raw block device ops can use the bdev_file as well, and there is no
>>>> need to distinguish iomap/buffer_head for raw block_device;
>>>>
>>>> Please let me know what do you think?
>>>
>>> That does sound very reasonable to me.
>>>
>> I just implemented the idea with the following patch (not fully tested, just
>> boot and some blktests)
> 
> So I was looking into this and I'm not sure I 100% understand the problem.
> I understand that the inode you get e.g. in blkdev_get_block(),
> blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> block device instead of your file_bdev(inode->i_private)? I don't see any
> advantage in stashing away that special bdev_file into inode->i_private but
> perhaps I'm missing something...
> 

Because we're going to remove the 'block_device' from iomap and
buffer_head, and replace it with a 'bdev_file'.

Patch 19 from this set uses a union of block_device and bdev_file;
that can work as well.

Thanks,
Kuai

> 								Honza
> 
>> diff --git a/block/fops.c b/block/fops.c
>> index 4037ae72a919..059f6c7d3c09 100644
>> --- a/block/fops.c
>> +++ b/block/fops.c
>> @@ -382,7 +382,7 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>   static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>                  unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
>>   {
>> -       struct block_device *bdev = I_BDEV(inode);
>> +       struct block_device *bdev = file_bdev(inode->i_private);
>>          loff_t isize = i_size_read(inode);
>>
>>          iomap->bdev = bdev;
>> @@ -404,7 +404,7 @@ static const struct iomap_ops blkdev_iomap_ops = {
>>   static int blkdev_get_block(struct inode *inode, sector_t iblock,
>>                  struct buffer_head *bh, int create)
>>   {
>> -       bh->b_bdev = I_BDEV(inode);
>> +       bh->b_bdev = file_bdev(inode->i_private);
>>          bh->b_blocknr = iblock;
>>          set_buffer_mapped(bh);
>>          return 0;
>> @@ -598,6 +598,7 @@ blk_mode_t file_to_blk_mode(struct file *file)
>>
>>   static int blkdev_open(struct inode *inode, struct file *filp)
>>   {
>> +       struct file *bdev_file;
>>          struct block_device *bdev;
>>          blk_mode_t mode;
>>          int ret;
>> @@ -614,9 +615,28 @@ static int blkdev_open(struct inode *inode, struct file *filp)
>>          if (!bdev)
>>                  return -ENXIO;
>>
>> +       bdev_file = alloc_and_init_bdev_file(bdev,
>> +                       BLK_OPEN_READ | BLK_OPEN_WRITE, NULL);
>> +       if (IS_ERR(bdev_file)) {
>> +               blkdev_put_no_open(bdev);
>> +               return PTR_ERR(bdev_file);
>> +       }
>> +
>> +       bdev_file->private_data = ERR_PTR(-EINVAL);
>> +       get_bdev_file(bdev, bdev_file);
>>          ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
>> -       if (ret)
>> +       if (ret) {
>> +               put_bdev_file(bdev);
>>                  blkdev_put_no_open(bdev);
>> +       } else {
>> +               filp->f_flags |= O_LARGEFILE;
>> +               filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
>> +               if (bdev_nowait(bdev))
>> +                       filp->f_mode |= FMODE_NOWAIT;
>> +               filp->f_mapping = bdev_mapping(bdev);
>> +               filp->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
>> +       }
>> +
>>          return ret;
>>   }
>>
>>> .
>>>
>>


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode
  2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
  2024-03-15 14:44   ` Jan Kara
  2024-03-17 21:23   ` Christoph Hellwig
@ 2024-03-22  5:44   ` Al Viro
  2 siblings, 0 replies; 98+ messages in thread
From: Al Viro @ 2024-03-22  5:44 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yukuai3,
	yi.zhang, yangerkun

On Thu, Feb 22, 2024 at 08:45:40PM +0800, Yu Kuai wrote:

> +static inline struct bdev_inode *BDEV_B(struct block_device *bdev)
> +{
> +	return container_of(bdev, struct bdev_inode, bdev);
> +}
> +
> +struct inode *bdev_inode(struct block_device *bdev)
> +{
> +	return &BDEV_B(bdev)->vfs_inode;
> +}
> +
> +struct address_space *bdev_mapping(struct block_device *bdev)
> +{
> +	return BDEV_B(bdev)->vfs_inode.i_mapping;
> +}

Nit: that might as well have been &BDEV_B(bdev)->vfs_inode.i_data
These inodes always have ->i_mapping pointing to their own ->i_data.
If we ever change that, we would have enough bdev.c work on our hands
anyway.
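
For illustration only, a userspace sketch of the two variants being
compared. The structs are cut-down mock-ups rather than the kernel
definitions, and bdev_mapping_ptr(), bdev_mapping_data() and
mock_bdev_inode_init() are hypothetical names. It assumes, as stated
above, that setup leaves ->i_mapping pointing at the inode's own ->i_data.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel types (mock-ups only). */
struct address_space { int dummy; };

struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};

struct block_device { int bd_dev; };

struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct bdev_inode *BDEV_B(struct block_device *bdev)
{
	return container_of(bdev, struct bdev_inode, bdev);
}

/* Variant from the patch: follow the ->i_mapping pointer. */
static struct address_space *bdev_mapping_ptr(struct block_device *bdev)
{
	return BDEV_B(bdev)->vfs_inode.i_mapping;
}

/* Variant from the nit: take the address of the embedded ->i_data. */
static struct address_space *bdev_mapping_data(struct block_device *bdev)
{
	return &BDEV_B(bdev)->vfs_inode.i_data;
}

/* Assumed setup: a bdev inode's ->i_mapping points at its own ->i_data. */
static void mock_bdev_inode_init(struct bdev_inode *bi)
{
	bi->vfs_inode.i_mapping = &bi->vfs_inode.i_data;
	/* Under that assumption both variants return the same pointer. */
	assert(bdev_mapping_ptr(&bi->bdev) == bdev_mapping_data(&bi->bdev));
}
```

The second variant saves one pointer load, at the price of baking in
the assumption that ->i_mapping is never redirected for these inodes.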

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-19  8:26               ` Yu Kuai
  2024-03-21 11:27                 ` Jan Kara
@ 2024-03-22  6:33                 ` Al Viro
  2024-03-22  7:09                   ` Yu Kuai
  2024-03-22 13:10                   ` Jan Kara
  1 sibling, 2 replies; 98+ messages in thread
From: Al Viro @ 2024-03-22  6:33 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Tue, Mar 19, 2024 at 04:26:19PM +0800, Yu Kuai wrote:

> +void put_bdev_file(struct block_device *bdev)
> +{
> +       struct file *file = NULL;
> +       struct inode *bd_inode = bdev_inode(bdev);
> +
> +       mutex_lock(&bdev->bd_disk->open_mutex);
> +       file = bd_inode->i_private;
> +
> +       if (!atomic_read(&bdev->bd_openers))
> +               bd_inode->i_private = NULL;
> +
> +       mutex_unlock(&bdev->bd_disk->open_mutex);
> +
> +       fput(file);
> +}

Locking is completely wrong here.  The only thing that protects
->bd_openers is ->open_mutex.  atomic_read() is obviously a red
herring.

Suppose another thread has already opened the same sucker
with bdev_file_open_by_dev().

Now you are doing the same thing, just as the other guy is
getting to bdev_release() call.

The thing is, between your get_bdev_file() and increment of ->bd_openers
(in bdev_open()) there's a window when bdev_release() of the old file
could've gotten all the way through the decrement of ->bd_openers
(to 0, since our increment has not happened yet) and through the
call of put_bdev_file(), which ends up clearing ->i_private.

End result:

* old ->i_private leaked (already grabbed by your get_bdev_file())
* ->bd_openers at 1 (after your bdev_open() gets through)
* ->i_private left NULL.

Christoph, could we please get rid of that atomic_t nonsense?
It only confuses people into brainos like that.  It really
needs ->open_mutex for any kind of atomicity.
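
The interleaving above can be replayed in a single-threaded userspace
toy model. Everything below is a sketch under assumptions: the mock_*
names are hypothetical, each step is taken to be internally serialized
(as if under ->open_mutex), and mock_get_bdev_file() guesses at what
get_bdev_file() does (reuse a stashed file if there is one, otherwise
stash the new one), since its body is not quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; every name is a stand-in, not kernel code. */
struct mock_file { int refcount; };

struct mock_bdev {
	int openers;               /* models ->bd_openers */
	struct mock_file *private; /* models bd_inode->i_private */
};

/* Assumed behaviour of get_bdev_file(): reuse the stashed file or stash ours. */
static struct mock_file *mock_get_bdev_file(struct mock_bdev *b,
					    struct mock_file *f)
{
	if (!b->private)
		b->private = f;
	assert(b->private != NULL);
	b->private->refcount++;
	return b->private;
}

/* Mirrors the quoted put_bdev_file(): clear the stash once openers hits 0. */
static void mock_put_bdev_file(struct mock_bdev *b)
{
	struct mock_file *f = b->private;

	if (!b->openers)
		b->private = NULL;
	if (f)
		f->refcount--;
}

static void mock_bdev_open(struct mock_bdev *b)
{
	b->openers++;
}

static void mock_bdev_release(struct mock_bdev *b)
{
	b->openers--;
	mock_put_bdev_file(b);
}

/* Replays the problematic ordering of two openers and one close. */
static void replay_race(struct mock_bdev *b, struct mock_file *f1,
			struct mock_file *f2)
{
	mock_get_bdev_file(b, f1); /* opener 1 stashes f1 */
	mock_bdev_open(b);         /* openers: 0 -> 1 */
	mock_get_bdev_file(b, f2); /* opener 2 grabs f1, has not opened yet */
	mock_bdev_release(b);      /* close 1: openers 1 -> 0, stash cleared */
	mock_bdev_open(b);         /* opener 2: openers 0 -> 1 */
}
```

After replay_race() the model is left with one opener, a NULL stash
and a reference to f1 that nothing can find to drop, which matches the
three-point end result listed above.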

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-21 12:15                   ` Yu Kuai
@ 2024-03-22  6:37                     ` Al Viro
  2024-03-22  6:39                       ` Al Viro
  0 siblings, 1 reply; 98+ messages in thread
From: Al Viro @ 2024-03-22  6:37 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:

> > blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> > inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> > block device instead of your file_bdev(inode->i_private)? I don't see any
> > advantage in stashing away that special bdev_file into inode->i_private but
> > perhaps I'm missing something...
> > 
> 
> Because we're going to remove the 'block_device' from iomap and
> buffer_head, and replace it with a 'bdev_file'.

What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
just fine...

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:37                     ` Al Viro
@ 2024-03-22  6:39                       ` Al Viro
  2024-03-22  6:52                         ` Yu Kuai
  0 siblings, 1 reply; 98+ messages in thread
From: Al Viro @ 2024-03-22  6:39 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 06:37:18AM +0000, Al Viro wrote:
> On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:
> 
> > > blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> > > inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> > > block device instead of your file_bdev(inode->i_private)? I don't see any
> > > advantage in stashing away that special bdev_file into inode->i_private but
> > > perhaps I'm missing something...
> > > 
> > 
> > Because we're going to remove the 'block_device' from iomap and
> > buffer_head, and replace it with a 'bdev_file'.
> 
> What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
> just fine...

file->f_mapping->host, obviously - sorry.
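
A userspace sketch of why file->f_mapping->host is the right chain.
The structs are cut-down mock-ups, and mock_inode_init(),
mock_blkdev_open() and mock_file_to_bdev() are hypothetical names. The
one thing taken from the patch is that blkdev_open() points
filp->f_mapping at the bdev's mapping.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down userspace stand-ins for the kernel types (mock-ups only). */
struct inode;
struct address_space { struct inode *host; };

struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};

struct block_device { int bd_dev; };

struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

struct file {
	struct inode *f_inode;           /* device node on some filesystem */
	struct address_space *f_mapping; /* page cache actually used for I/O */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Only valid for an inode embedded in a struct bdev_inode. */
static struct block_device *I_BDEV(struct inode *inode)
{
	return &container_of(inode, struct bdev_inode, vfs_inode)->bdev;
}

static struct inode *file_inode(const struct file *f)
{
	return f->f_inode;
}

static void mock_inode_init(struct inode *inode)
{
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
}

/* Models the f_mapping assignment done in blkdev_open(). */
static void mock_blkdev_open(struct file *filp, struct inode *node,
			     struct bdev_inode *bi)
{
	filp->f_inode = node;
	filp->f_mapping = bi->vfs_inode.i_mapping;
}

/* file->f_mapping->host is the bdevfs inode, whatever fs the node is on. */
static struct block_device *mock_file_to_bdev(struct file *filp)
{
	assert(filp->f_mapping != NULL);
	return I_BDEV(filp->f_mapping->host);
}
```

In this model file_inode() still returns the device node's inode, so
calling I_BDEV() on it would be wrong; going through f_mapping->host
lands on the inode embedded in struct bdev_inode.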

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:39                       ` Al Viro
@ 2024-03-22  6:52                         ` Yu Kuai
  2024-03-22 12:57                           ` Jan Kara
  2024-03-22 15:43                           ` Al Viro
  0 siblings, 2 replies; 98+ messages in thread
From: Yu Kuai @ 2024-03-22  6:52 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: Jan Kara, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

Hi,

On 2024/03/22 14:39, Al Viro wrote:
> On Fri, Mar 22, 2024 at 06:37:18AM +0000, Al Viro wrote:
>> On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:
>>
>>>> blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
>>>> inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
>>>> block device instead of your file_bdev(inode->i_private)? I don't see any
>>>> advantage in stashing away that special bdev_file into inode->i_private but
>>>> perhaps I'm missing something...
>>>>
>>>
>>> Because we're going to remove the 'block_device' from iomap and
>>> buffer_head, and replace it with a 'bdev_file'.
>>
>> What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
>> just fine...
> 
> file->f_mapping->host, obviously - sorry.
> .

Yes, we already get bdev_inode this way, and use it in
blkdev_iomap_begin() and blkdev_get_block(). The problem is that if we
want iomap and buffer_head to use bdev_file for raw block fops as
well, we need a 'bdev_file' somehow.

Thanks,
Kuai

> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:33                 ` Al Viro
@ 2024-03-22  7:09                   ` Yu Kuai
  2024-03-22 16:01                     ` Al Viro
  2024-03-22 13:10                   ` Jan Kara
  1 sibling, 1 reply; 98+ messages in thread
From: Yu Kuai @ 2024-03-22  7:09 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

Hi,

On 2024/03/22 14:33, Al Viro wrote:
> On Tue, Mar 19, 2024 at 04:26:19PM +0800, Yu Kuai wrote:
> 
>> +void put_bdev_file(struct block_device *bdev)
>> +{
>> +       struct file *file = NULL;
>> +       struct inode *bd_inode = bdev_inode(bdev);
>> +
>> +       mutex_lock(&bdev->bd_disk->open_mutex);
>> +       file = bd_inode->i_private;
>> +
>> +       if (!atomic_read(&bdev->bd_openers))
>> +               bd_inode->i_private = NULL;
>> +
>> +       mutex_unlock(&bdev->bd_disk->open_mutex);
>> +
>> +       fput(file);
>> +}
> 
> Locking is completely wrong here.  The only thing that protects
> ->bd_openers is ->open_mutex.  atomic_read() is obviously a red
> herring.

I'm lost here: in get_bdev_file() and put_bdev_file() I grab
'open_mutex' to protect reading 'bd_openers' and reading and setting
'bd_inode->i_private'.
> 
> Suppose another thread has already opened the same sucker
> with bdev_file_open_by_dev().
> 
> Now you are doing the same thing, just as the other guy is
> getting to bdev_release() call.
> 
> The thing is, between your get_bdev_file() and increment of ->bd_openers
> (in bdev_open()) there's a window when bdev_release() of the old file
> could've gotten all the way through the decrement of ->bd_openers
> (to 0, since our increment has not happened yet) and through the
> call of put_bdev_file(), which ends up clearing ->i_private.
> 
> End result:
> 
> * old ->i_private leaked (already grabbed by your get_bdev_file())
> * ->bd_openers at 1 (after your bdev_open() gets through)
> * ->i_private left NULL.
> 
Yes, I got you now. The problem in this patch is that:

1) opener 1 sets bdev_file, bd_openers is 1;
2) opener 2, before bdev_open(), gets bdev_file;
3) close 1, bd_openers is 0, bdev_file is cleared;
4) opener 2, after bdev_open(), finds bdev_file unexpectedly cleared.

> Christoph, could we please get rid of that atomic_t nonsense?
> It only confuses people into brainos like that.  It really
> needs ->open_mutex for any kind of atomicity.

While we're here, which way should we move forward?
1. keep the current behavior of using bdev for iomap/buffer_head for raw
block ops;
2. record the new 'bdev_file' in 'bd_inode->i_private', and find a new way
to handle the concurrent scenario;
3. other possible solution?

Thanks,
Kuai

> 
> .
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:52                         ` Yu Kuai
@ 2024-03-22 12:57                           ` Jan Kara
  2024-03-22 13:57                             ` Christian Brauner
  2024-03-22 15:43                           ` Al Viro
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Kara @ 2024-03-22 12:57 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Al Viro, Jan Kara, Christoph Hellwig, brauner, axboe,
	linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri 22-03-24 14:52:16, Yu Kuai wrote:
> On 2024/03/22 14:39, Al Viro wrote:
> > On Fri, Mar 22, 2024 at 06:37:18AM +0000, Al Viro wrote:
> > > On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:
> > > 
> > > > > blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> > > > > inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> > > > > block device instead of your file_bdev(inode->i_private)? I don't see any
> > > > > advantage in stashing away that special bdev_file into inode->i_private but
> > > > > perhaps I'm missing something...
> > > > > 
> > > > 
> > > > Because we're going to remove the 'block_device' from iomap and
> > > > buffer_head, and replace it with a 'bdev_file'.
> > > 
> > > What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
> > > just fine...
> > 
> > file->f_mapping->host, obviously - sorry.
> > .
> 
> Yes, we already get bdev_inode this way, and use it in
> blkdev_iomap_begin() and blkdev_get_block(), the problem is that if we
> want to let iomap and buffer_head to use bdev_file for raw block fops as
> well, we need a 'bdev_file' somehow.

You mean for operations like bread(), getblk(), or similar, don't you?
Frankly I don't find a huge value in this and seeing how clumsy it is
getting I'm not convinced it is worth it at this point.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:33                 ` Al Viro
  2024-03-22  7:09                   ` Yu Kuai
@ 2024-03-22 13:10                   ` Jan Kara
  2024-03-22 14:57                     ` Al Viro
  1 sibling, 1 reply; 98+ messages in thread
From: Jan Kara @ 2024-03-22 13:10 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri 22-03-24 06:33:46, Al Viro wrote:
> On Tue, Mar 19, 2024 at 04:26:19PM +0800, Yu Kuai wrote:
> 
> > +void put_bdev_file(struct block_device *bdev)
> > +{
> > +       struct file *file = NULL;
> > +       struct inode *bd_inode = bdev_inode(bdev);
> > +
> > +       mutex_lock(&bdev->bd_disk->open_mutex);
> > +       file = bd_inode->i_private;
> > +
> > +       if (!atomic_read(&bdev->bd_openers))
> > +               bd_inode->i_private = NULL;
> > +
> > +       mutex_unlock(&bdev->bd_disk->open_mutex);
> > +
> > +       fput(file);
> > +}
> 
> Locking is completely wrong here.  The only thing that protects
> ->bd_openers is ->open_mutex.  atomic_read() is obviously a red
> herring.
> 
> Suppose another thread has already opened the same sucker
> with bdev_file_open_by_dev().
> 
> Now you are doing the same thing, just as the other guy is
> getting to bdev_release() call.
> 
> The thing is, between your get_bdev_file() and increment of ->bd_openers
> (in bdev_open()) there's a window when bdev_release() of the old file
> could've gotten all the way through the decrement of ->bd_openers
> (to 0, since our increment has not happened yet) and through the
> call of put_bdev_file(), which ends up clearing ->i_private.
> 
> End result:
> 
> * old ->i_private leaked (already grabbed by your get_bdev_file())
> * ->bd_openers at 1 (after your bdev_open() gets through)
> * ->i_private left NULL.
> 
> Christoph, could we please get rid of that atomic_t nonsense?
> It only confuses people into brainos like that.  It really
> needs ->open_mutex for any kind of atomicity.

Well, there are a couple of places where we end up reading bd_openers
without ->open_mutex. Sure, these places are racy wrt other opens / closes
so they need to be careful, but we want to make sure we read back at least
some sane value, which is not guaranteed with a normal int and the compiler
possibly playing weird tricks when updating it. But yes, we could convert
the atomic_t to using READ_ONCE + WRITE_ONCE in appropriate places to avoid
these issues and make it more obvious that bd_openers is not really handled
in an atomic way.
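
A minimal userspace sketch of what such a conversion might look like,
not a proposed patch. The READ_ONCE()/WRITE_ONCE() macros below are
simplified volatile-cast approximations relying on the GCC/Clang
__typeof__ extension, the mock_* names are hypothetical, and the
open_mutex_held flag merely stands in for holding ->open_mutex.

```c
#include <assert.h>

/* Simplified approximations of the kernel macros (GCC/Clang __typeof__). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct mock_bdev {
	int bd_openers;      /* plain int instead of atomic_t */
	int open_mutex_held; /* stand-in for holding ->open_mutex */
};

/* Update side: only legal with the mutex held, so a plain read is fine. */
static void mock_get(struct mock_bdev *b)
{
	assert(b->open_mutex_held);
	WRITE_ONCE(b->bd_openers, b->bd_openers + 1);
}

/* Returns 1 when the last opener went away. */
static int mock_put(struct mock_bdev *b)
{
	int left;

	assert(b->open_mutex_held);
	left = b->bd_openers - 1;
	WRITE_ONCE(b->bd_openers, left);
	return left == 0;
}

/* Lockless reader: the value may be stale but is never torn. */
static int mock_disk_openers(struct mock_bdev *b)
{
	return READ_ONCE(b->bd_openers);
}
```

The point of the plain int is that nothing in the type suggests the
counter can be updated safely without the mutex.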

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22 12:57                           ` Jan Kara
@ 2024-03-22 13:57                             ` Christian Brauner
  0 siblings, 0 replies; 98+ messages in thread
From: Christian Brauner @ 2024-03-22 13:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, Al Viro, Christoph Hellwig, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

> Do you mean for operations like bread(), getblk(), or similar, don't you?
> Frankly I don't find a huge value in this and seeing how clumsy it is
> getting I'm not convinced it is worth it at this point.

Yes, I agree.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22 13:10                   ` Jan Kara
@ 2024-03-22 14:57                     ` Al Viro
  2024-03-25  1:06                       ` Christoph Hellwig
  0 siblings, 1 reply; 98+ messages in thread
From: Al Viro @ 2024-03-22 14:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 02:10:30PM +0100, Jan Kara wrote:
> > End result:
> > 
> > * old ->i_private leaked (already grabbed by your get_bdev_file())
> > * ->bd_openers at 1 (after your bdev_open() gets through)
> > * ->i_private left NULL.
> > 
> > Christoph, could we please get rid of that atomic_t nonsense?
> > It only confuses people into brainos like that.  It really
> > needs ->open_mutex for any kind of atomicity.
> 
> Well, there are a couple of places where we end up reading bd_openers
> without ->open_mutex. Sure these places are racy wrt other opens / closes
> so they need to be careful but we want to make sure we read back at least
> some sane value which is not guaranteed with normal int and compiler
> possibly playing weird tricks when updating it. But yes, we could convert
> the atomic_t to using READ_ONCE + WRITE_ONCE in appropriate places to avoid
> these issues and make it more obvious bd_openers are not really handled in
> an atomic way.

What WRITE_ONCE()?  We really shouldn't modify it without ->open_mutex; do
we ever do that?  In current mainline:

in blkdev_get_whole(), both callers under ->open_mutex:
block/bdev.c:671:       if (!atomic_read(&bdev->bd_openers))
block/bdev.c:675:       atomic_inc(&bdev->bd_openers);

in blkdev_put_whole(), the sole caller under ->open_mutex:
block/bdev.c:681:       if (atomic_dec_and_test(&bdev->bd_openers))

in blkdev_get_part(), both callers under ->open_mutex:
block/bdev.c:700:       if (!atomic_read(&part->bd_openers)) {
block/bdev.c:704:       atomic_inc(&part->bd_openers);

in blkdev_put_part(), the sole caller under ->open_mutex:
block/bdev.c:741:       if (atomic_dec_and_test(&part->bd_openers)) {

in bdev_release(), a deliberately racy reader, commented as such:
block/bdev.c:1032:      if (atomic_read(&bdev->bd_openers) == 1)

in sync_bdevs(), under ->open_mutex:
block/bdev.c:1163:              if (!atomic_read(&bdev->bd_openers)) {

in bdev_del_partition(), under ->open_mutex:
block/partitions/core.c:460:    if (atomic_read(&part->bd_openers))

and finally, in disk_openers(), a racy reader:
include/linux/blkdev.h:231:     return atomic_read(&disk->part0->bd_openers);

So that's two READ_ONCE() and a bunch of reads and writes under ->open_mutex.
Callers of disk_openers() need to be careful and looking through those...
Some of them are under ->open_mutex (either explicitly, or as e.g. lo_release()
called only via bdev ->release(), which comes only under ->open_mutex), but
four of them are not:

arch/um/drivers/ubd_kern.c:1023:                if (disk_openers(ubd_dev->disk))
in ubd_remove().  Racy, possibly a bug.  AFAICS, it's accessible through UML
console and there's nothing to stop it from racing with open().

drivers/block/loop.c:1245:      if (disk_openers(lo->lo_disk) > 1) {
in loop_clr_fd().  Under loop's private lock, but that's likely to
be a race - ->bd_openers updates are not under that.  Note that
there's no ->open() for /dev/loop, BTW...

drivers/block/loop.c:2161:      if (lo->lo_state != Lo_unbound || disk_openers(lo->lo_disk) > 0) {
in loop_control_remove().  Similar to the previous one, except that
it's done out of band *and* it doesn't have the "autoclean" logics
to work around udev, lovingly described in the comment before the
call in loop_clr_fd().

drivers/block/nbd.c:1279:       if (disk_openers(nbd->disk) > 1)
in nbd_bdev_reset().  Under nbd private mutex (->config_lock),
so there's some exclusion with nbd_open(), but ->bd_openers change
comes outside of that.  Might or might not be a bug - I need to wake
up properly to look through that.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  6:52                         ` Yu Kuai
  2024-03-22 12:57                           ` Jan Kara
@ 2024-03-22 15:43                           ` Al Viro
  2024-03-22 16:16                             ` Al Viro
  1 sibling, 1 reply; 98+ messages in thread
From: Al Viro @ 2024-03-22 15:43 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 02:52:16PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2024/03/22 14:39, Al Viro wrote:
> > On Fri, Mar 22, 2024 at 06:37:18AM +0000, Al Viro wrote:
> > > On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:
> > > 
> > > > > blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> > > > > inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> > > > > block device instead of your file_bdev(inode->i_private)? I don't see any
> > > > > advantage in stashing away that special bdev_file into inode->i_private but
> > > > > perhaps I'm missing something...
> > > > > 
> > > > 
> > > > Because we're going to remove the 'block_device' from iomap and
> > > > buffer_head, and replace it with a 'bdev_file'.
> > > 
> > > What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
> > > just fine...
> > 
> > file->f_mapping->host, obviously - sorry.
> > .
> 
> Yes, we already get bdev_inode this way, and use it in
> blkdev_iomap_begin() and blkdev_get_block(), the problem is that if we
> want to let iomap and buffer_head to use bdev_file for raw block fops as
> well, we need a 'bdev_file' somehow.

Explain, please.  Why would anything care whether the file is bdevfs
one or coming from devtmpfs/xfs/ext2/whatnot?

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22  7:09                   ` Yu Kuai
@ 2024-03-22 16:01                     ` Al Viro
  0 siblings, 0 replies; 98+ messages in thread
From: Al Viro @ 2024-03-22 16:01 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, jack, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 03:09:30PM +0800, Yu Kuai wrote:

> > End result:
> > 
> > * old ->i_private leaked (already grabbed by your get_bdev_file())
> > * ->bd_openers at 1 (after your bdev_open() gets through)
> > * ->i_private left NULL.
> > 
> Yes, I got you now. The problem is this patch is that:
> 
> 1) opener 1, set bdev_file, bd_openers is 1
> 2) opener 2, before bdev_open(), get bdev_file,
> 3) close 1, bd_openers is 0, clear bdev_file
> 4) opener 2, after bdev_open(), bdev_file is cleared unexpected.
> 
> > Christoph, could we please get rid of that atomic_t nonsense?
> > It only confuses people into brainos like that.  It really
> > needs ->open_mutex for any kind of atomicity.
> 
> While we're here, which way should we move forward?
> 1. keep the behavior to use bdev for iomap/buffer_head for raw block
> ops;
> 2. record new 'bdev_file' in 'bd_inode->i_private', and use a new way
> to handle the concurrent scenario.
> 3. other possible solution?

OK, what lifetime rules do you intend for your objects?  It's really
hard to tell from that patch (and the last one in the main series).

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22 15:43                           ` Al Viro
@ 2024-03-22 16:16                             ` Al Viro
  0 siblings, 0 replies; 98+ messages in thread
From: Al Viro @ 2024-03-22 16:16 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, Christoph Hellwig, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 03:43:47PM +0000, Al Viro wrote:
> On Fri, Mar 22, 2024 at 02:52:16PM +0800, Yu Kuai wrote:
> > Hi,
> > 
> > On 2024/03/22 14:39, Al Viro wrote:
> > > On Fri, Mar 22, 2024 at 06:37:18AM +0000, Al Viro wrote:
> > > > On Thu, Mar 21, 2024 at 08:15:06PM +0800, Yu Kuai wrote:
> > > > 
> > > > > > blkdev_iomap_begin() etc. may be an arbitrary filesystem block device
> > > > > > inode. But why can't you use I_BDEV(inode->i_mapping->host) to get to the
> > > > > > block device instead of your file_bdev(inode->i_private)? I don't see any
> > > > > > advantage in stashing away that special bdev_file into inode->i_private but
> > > > > > perhaps I'm missing something...
> > > > > > 
> > > > > 
> > > > > Because we're going to remove the 'block_device' from iomap and
> > > > > buffer_head, and replace it with a 'bdev_file'.
> > > > 
> > > > What of that?  file_inode(file)->f_mapping->host will give you bdevfs inode
> > > > just fine...
> > > 
> > > file->f_mapping->host, obviously - sorry.
> > > .
> > 
> > Yes, we already get bdev_inode this way, and use it in
> > blkdev_iomap_begin() and blkdev_get_block(), the problem is that if we
> > want to let iomap and buffer_head use bdev_file for raw block fops as
> > well, we need a 'bdev_file' somehow.
> 
> Explain, please.  Why would anything care whether the file is bdevfs
> one or coming from devtmpfs/xfs/ext2/whatnot?

Yecchhh...  I see one possible reason, unfortunately, but I really doubt
that your approach is workable.  iomap is not a problem; nothing in
there will persist past the destruction of struct file you've used;
buffer_head, OTOH, is a problem.  They are, by their nature,
shared between various openers and we can't really withdraw them.

Why do we want ->b_bdev replaced with struct file * in the first place?
AFAICS, your patch tries to make it unique per opened bdev; that
makes the lifetime rules really convoluted, but that aside, what's
in that struct file that is not in struct block_device?

I don't see any point trying to shove that down into buffer_head, or,
Cthulhu forbid, bio.  Details, please...
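
The distinction drawn in the quoted exchange, between the inode a file was opened through and the bdevfs inode reached via file->f_mapping->host, can be sketched with a userspace model. The struct layouts below are heavily simplified stand-ins, and every `model_*` function is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct inode;

struct address_space {
	struct inode *host;		/* inode that owns this page cache */
};

struct inode {
	const char *fs_name;		/* filesystem the inode lives on */
	struct address_space i_data;	/* the inode's own mapping */
};

struct file {
	struct inode *f_inode;		/* what file_inode(file) returns */
	struct address_space *f_mapping;
};

/*
 * Model of opening a device node: the file keeps pointing at the node's
 * inode, but its mapping is switched to the block device's mapping.
 */
static void model_open_bdev_node(struct file *f, struct inode *node,
				 struct inode *bdevfs_inode)
{
	bdevfs_inode->i_data.host = bdevfs_inode;
	f->f_inode = node;
	f->f_mapping = &bdevfs_inode->i_data;
}

/* Inode on devtmpfs/xfs/ext2/whatever the node lives on. */
static struct inode *model_file_inode(const struct file *f)
{
	return f->f_inode;
}

/* bdevfs inode, independent of where the device node lives. */
static struct inode *model_bdev_inode(const struct file *f)
{
	assert(f->f_mapping != NULL);
	return f->f_mapping->host;
}
```

In this model, two files opened through nodes on different filesystems would return different inodes from `model_file_inode()` but the same one from `model_bdev_inode()`, which is why the second form suffices to reach the block device.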


* Re: [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode
  2024-03-22 14:57                     ` Al Viro
@ 2024-03-25  1:06                       ` Christoph Hellwig
  0 siblings, 0 replies; 98+ messages in thread
From: Christoph Hellwig @ 2024-03-25  1:06 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Yu Kuai, Christoph Hellwig, brauner, axboe,
	linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Mar 22, 2024 at 02:57:28PM +0000, Al Viro wrote:
> What WRITE_ONCE()?  We really shouldn't modify it without ->open_mutex; do
> we ever do that?  In current mainline:

READ_ONCE must be paired with WRITE_ONCE.  All updates are under a lock,
and if you want some other scheme than the atomic_t go ahead.  I originally
did READ_ONCE/WRITE_ONCE and this was changed based on review feedback.
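
A rough userspace illustration of that scheme, with every update made under a lock via WRITE_ONCE and lockless readers using READ_ONCE, might look like the sketch below. The macros are approximations that assume GCC or Clang (`__typeof__`), not the kernel definitions, and `model_bdev` with its helpers is a hypothetical stand-in for the real open/close paths.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace approximations of the kernel macros; GCC/Clang assumed. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical model; not the kernel's struct block_device. */
struct model_bdev {
	pthread_mutex_t open_mutex;	/* serializes all updates */
	int openers;			/* plain int instead of atomic_t */
};

static void model_open(struct model_bdev *b)
{
	pthread_mutex_lock(&b->open_mutex);
	WRITE_ONCE(b->openers, b->openers + 1);
	pthread_mutex_unlock(&b->open_mutex);
}

static void model_close(struct model_bdev *b)
{
	pthread_mutex_lock(&b->open_mutex);
	assert(b->openers > 0);
	WRITE_ONCE(b->openers, b->openers - 1);
	pthread_mutex_unlock(&b->open_mutex);
}

/* Lockless peek: a snapshot that may be stale by the time it is used. */
static int model_is_open(struct model_bdev *b)
{
	return READ_ONCE(b->openers) > 0;
}
```

The pairing only keeps the lockless load and the locked store from being torn or refetched by the compiler. Any decision that must stay true after the check still has to be made under the mutex.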



Thread overview: 98+ messages
2024-02-22 12:45 [RFC v4 linux-next 00/19] fs & block: remove bdev->bd_inode Yu Kuai
2024-02-22 12:45 ` [RFC v4 linux-next 01/19] block: move two helpers into bdev.c Yu Kuai
2024-03-15 14:31   ` Jan Kara
2024-03-17 21:19   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 02/19] block: remove sync_blockdev_nowait() Yu Kuai
2024-03-15 14:34   ` Jan Kara
2024-03-17 21:19   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 03/19] block: remove sync_blockdev_range() Yu Kuai
2024-03-15 14:37   ` Jan Kara
2024-03-17 21:21   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 04/19] block: prevent direct access of bd_inode Yu Kuai
2024-03-15 14:44   ` Jan Kara
2024-03-17 21:23   ` Christoph Hellwig
2024-03-22  5:44   ` Al Viro
2024-02-22 12:45 ` [RFC v4 linux-next 05/19] bcachefs: remove dead function bdev_sectors() Yu Kuai
2024-03-15 14:42   ` Jan Kara
2024-03-17 21:23   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 06/19] cramfs: prevent direct access of bd_inode Yu Kuai
2024-03-15 14:44   ` Jan Kara
2024-03-17 21:23   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 07/19] erofs: " Yu Kuai
2024-03-15 14:45   ` Jan Kara
2024-03-17 21:24   ` Christoph Hellwig
2024-03-18  2:39   ` Gao Xiang
2024-02-22 12:45 ` [RFC v4 linux-next 08/19] nilfs2: " Yu Kuai
2024-03-15 14:49   ` Jan Kara
2024-03-17 21:24   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 09/19] gfs2: " Yu Kuai
2024-03-15 14:54   ` Jan Kara
2024-03-17 21:24   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 10/19] s390/dasd: use bdev api in dasd_format() Yu Kuai
2024-03-15 14:55   ` Jan Kara
2024-03-17 21:25   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 11/19] btrfs: prevent direct access of bd_inode Yu Kuai
2024-03-15 15:09   ` Jan Kara
2024-03-17 21:25   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 12/19] ext4: remove block_device_ejected() Yu Kuai
2024-02-22 12:45 ` [RFC v4 linux-next 13/19] ext4: prevent direct access of bd_inode Yu Kuai
2024-03-15 14:58   ` Jan Kara
2024-03-17 21:25   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 14/19] jbd2: " Yu Kuai
2024-03-15 15:06   ` Jan Kara
2024-03-17 21:26   ` Christoph Hellwig
2024-03-18  1:10     ` Yu Kuai
2024-02-22 12:45 ` [RFC v4 linux-next 15/19] bcache: " Yu Kuai
2024-03-15 15:11   ` Jan Kara
2024-03-17 21:34   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 16/19] block2mtd: " Yu Kuai
2024-03-15 15:12   ` Jan Kara
2024-03-17 21:36   ` Christoph Hellwig
2024-02-22 12:45 ` [RFC v4 linux-next 17/19] dm-vdo: " Yu Kuai
2024-02-28 13:41   ` Christoph Hellwig
2024-03-18  9:11     ` Jan Kara
2024-03-18  9:19   ` Jan Kara
2024-03-18 13:38     ` Yu Kuai
2024-03-19  2:00       ` Matthew Sakai
2024-02-22 12:45 ` [RFC v4 linux-next 18/19] scsi: factor out a helper bdev_read_folio() from scsi_bios_ptable() Yu Kuai
2024-03-17 21:36   ` Christoph Hellwig
2024-03-18  1:12     ` Yu Kuai
2024-03-18  9:22   ` Jan Kara
2024-02-22 12:45 ` [RFC v4 linux-next 19/19] fs & block: remove bdev->bd_inode Yu Kuai
2024-02-25  0:06   ` kernel test robot
2024-03-17 21:38   ` Christoph Hellwig
2024-03-18  1:26     ` Yu Kuai
2024-03-18  1:32       ` Christoph Hellwig
2024-03-18  1:51         ` Yu Kuai
2024-03-18  7:19           ` Yu Kuai
2024-03-18 10:07             ` Christian Brauner
2024-03-18 10:29               ` Christian Brauner
2024-03-18 10:46                 ` Christian Brauner
2024-03-18 11:57                   ` Yu Kuai
2024-03-18 23:35                 ` Christoph Hellwig
2024-03-18 23:22             ` Christoph Hellwig
2024-03-19  8:26               ` Yu Kuai
2024-03-21 11:27                 ` Jan Kara
2024-03-21 12:15                   ` Yu Kuai
2024-03-22  6:37                     ` Al Viro
2024-03-22  6:39                       ` Al Viro
2024-03-22  6:52                         ` Yu Kuai
2024-03-22 12:57                           ` Jan Kara
2024-03-22 13:57                             ` Christian Brauner
2024-03-22 15:43                           ` Al Viro
2024-03-22 16:16                             ` Al Viro
2024-03-22  6:33                 ` Al Viro
2024-03-22  7:09                   ` Yu Kuai
2024-03-22 16:01                     ` Al Viro
2024-03-22 13:10                   ` Jan Kara
2024-03-22 14:57                     ` Al Viro
2024-03-25  1:06                       ` Christoph Hellwig
2024-02-28 13:42 ` [RFC v4 linux-next 00/19] " Christoph Hellwig
2024-03-15 12:08 ` Yu Kuai
2024-03-15 13:54   ` Christian Brauner
2024-03-16  2:49     ` Yu Kuai
2024-03-18  9:39       ` Christian Brauner
2024-03-19  1:18         ` Yu Kuai
2024-03-19  1:43           ` Yu Kuai
2024-03-19  2:13             ` Matthew Sakai
2024-03-19  2:27               ` Yu Kuai
