linux-fsdevel.vger.kernel.org archive mirror
* [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode
@ 2024-04-06  9:09 Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 01/26] block: move two helpers into bdev.c Yu Kuai
                   ` (26 more replies)
  0 siblings, 27 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Hi, Jens!
Hi, Jan!
Hi, Christoph!
Hi, Christian!
Hi, Al!

Sorry for the delay (I was overwhelmed with other work). The main change
from the last version is patch 22 (modified based on [1]). The idea is to
stash a 'bdev_file' in 'bd_inode->i_private' when the bdev is opened for
the first time, and to release it when the last opener closes the bdev.
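
The first-open/last-close lifetime described above can be modelled in
userspace as a toy. This is only a sketch under loud assumptions: the
names toy_bdev, toy_bdev_open() and toy_bdev_release() are hypothetical,
the structs are stand-ins for the kernel ones, and there is no locking.
It is not the code in patch 22.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the kernel structures; only what the sketch needs. */
struct file { int unused; };
struct inode { void *i_private; };

/* Hypothetical model of a block device with an opener count. */
struct toy_bdev {
	struct inode inode;	/* plays the role of bd_inode */
	int openers;
};

/* The first opener allocates the file and stashes it in i_private;
 * later openers find it already there. Returns 0 on success, -1 on
 * allocation failure. */
static int toy_bdev_open(struct toy_bdev *bdev)
{
	if (bdev->openers == 0) {
		struct file *f = calloc(1, sizeof(*f));

		if (!f)
			return -1;
		bdev->inode.i_private = f;
	}
	bdev->openers++;
	return 0;
}

/* The last opener to close drops the stashed file. */
static void toy_bdev_release(struct toy_bdev *bdev)
{
	assert(bdev->openers > 0);
	if (--bdev->openers == 0) {
		free(bdev->inode.i_private);
		bdev->inode.i_private = NULL;
	}
}
```

In the kernel the counter and the stash would have to be protected
against concurrent open and close; the toy leaves that out on purpose.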

The patch to use bdev and bdev_file as a union for iomap/buffer_head is
dropped, and the changes for iomap/buffer are split into patches 23-26.

I tested this set in my VM with blktests for virtio-scsi and xfstests
for ext4/xfs for one round; no regressions have been found so far.

Please let me know what you think!

[1] https://lore.kernel.org/all/c62dac0e-666f-9cc9-cffe-f3d985029d6a@huaweicloud.com/

Changes from RFC v4:
 - respin on top of the vfs.all branch from the vfs tree;
 - add review tags; patches that are not yet reviewed: 19-26;
 - add patch 21 to fix a module reference problem;
 - instead of using a union of bdev (for raw block devices) and
 bdev_file (for filesystems), add patch 22 to stash a bdev_file in
 bd_inode->i_private, so that iomap and buffer_head for raw block devices
 can be converted to use bdev_file as well;
 - split the huge patch for iomap/buffer into 4 patches, 23-26;

Changes from RFC v3:
 - respin on top of linux-next, based on Christian's patchset to
 open bdev as file. Most of the patches from v3 are dropped and changed
 to use file_inode(bdev_file) to get bd_inode, or bdev_file->f_mapping to
 get bd_inode->i_mapping.

Changes from RFC v2:
 - remove bdev_associated_mapping() and patch 12 from v1;
 - add kerneldoc comments for the new bdev APIs;
 - rename __bdev_get_folio() to bdev_get_folio();
 - fix a problem in erofs where erofs_init_metabuf() is not always
 called;
 - add Reviewed-by tags for patches 15-17;

Changes from RFC v1:
 - remove some bdev APIs that are not necessary;
 - pass in an offset for bdev_read_folio() and __bdev_get_folio();
 - remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
 prevent accessing bd_inode directly from mapping_gfp_constraint() in
 ext4 (patches 15, 16);
 - remove block_device_ejected() from ext4.

Yu Kuai (26):
  block: move two helpers into bdev.c
  block: remove sync_blockdev_nowait()
  block: remove sync_blockdev_range()
  block: prevent direct access of bd_inode
  block: add a helper bdev_read_folio()
  bcachefs: remove dead function bdev_sectors()
  cramfs: prevent direct access of bd_inode
  erofs: prevent direct access of bd_inode
  nilfs2: prevent direct access of bd_inode
  gfs2: prevent direct access of bd_inode
  btrfs: prevent direct access of bd_inode
  ext4: remove block_device_ejected()
  ext4: prevent direct access of bd_inode
  jbd2: prevent direct access of bd_inode
  s390/dasd: use bdev api in dasd_format()
  bcache: prevent direct access of bd_inode
  block2mtd: prevent direct access of bd_inode
  scsi: use bdev helper in scsi_bios_ptable()
  dm-vdo: convert to use bdev_file
  block: factor out a helper init_bdev_file()
  block: fix module reference leakage from bdev_open_by_dev error path
  block: stash a bdev_file to read/write raw block_device
  iomap: add helpers to get and set bdev
  iomap: convert to use bdev_file
  buffer: add helpers to get and set bdev
  buffer: convert to use bdev_file

 block/bdev.c                              | 262 ++++++++++++++++------
 block/blk-zoned.c                         |   4 +-
 block/blk.h                               |   2 +
 block/fops.c                              |   6 +-
 block/genhd.c                             |   9 +-
 block/ioctl.c                             |   8 +-
 block/partitions/core.c                   |   8 +-
 drivers/md/bcache/super.c                 |   7 +-
 drivers/md/dm-vdo/dedupe.c                |   7 +-
 drivers/md/dm-vdo/dm-vdo-target.c         |   9 +-
 drivers/md/dm-vdo/indexer/config.c        |   2 +-
 drivers/md/dm-vdo/indexer/config.h        |   4 +-
 drivers/md/dm-vdo/indexer/index-layout.c  |   6 +-
 drivers/md/dm-vdo/indexer/index-layout.h  |   2 +-
 drivers/md/dm-vdo/indexer/index-session.c |  18 +-
 drivers/md/dm-vdo/indexer/index.c         |   4 +-
 drivers/md/dm-vdo/indexer/index.h         |   2 +-
 drivers/md/dm-vdo/indexer/indexer.h       |   6 +-
 drivers/md/dm-vdo/indexer/io-factory.c    |  17 +-
 drivers/md/dm-vdo/indexer/io-factory.h    |   4 +-
 drivers/md/dm-vdo/indexer/volume.c        |   4 +-
 drivers/md/dm-vdo/indexer/volume.h        |   2 +-
 drivers/md/dm-vdo/vdo.c                   |   2 +-
 drivers/md/md-bitmap.c                    |   2 +-
 drivers/mtd/devices/block2mtd.c           |   6 +-
 drivers/s390/block/dasd_ioctl.c           |   5 +-
 drivers/scsi/scsicam.c                    |   3 +-
 fs/affs/file.c                            |   2 +-
 fs/bcachefs/util.h                        |   5 -
 fs/btrfs/disk-io.c                        |  17 +-
 fs/btrfs/disk-io.h                        |   4 +-
 fs/btrfs/inode.c                          |   2 +-
 fs/btrfs/super.c                          |   2 +-
 fs/btrfs/volumes.c                        |  25 ++-
 fs/btrfs/zoned.c                          |  20 +-
 fs/btrfs/zoned.h                          |   4 +-
 fs/buffer.c                               | 104 ++++-----
 fs/cramfs/inode.c                         |   2 +-
 fs/direct-io.c                            |   4 +-
 fs/erofs/data.c                           |  22 +-
 fs/erofs/internal.h                       |   1 +
 fs/erofs/zmap.c                           |   2 +-
 fs/exfat/fatent.c                         |   2 +-
 fs/ext2/inode.c                           |   4 +-
 fs/ext2/xattr.c                           |   2 +-
 fs/ext4/dir.c                             |   2 +-
 fs/ext4/ext4_jbd2.c                       |   2 +-
 fs/ext4/inode.c                           |   2 +-
 fs/ext4/mmp.c                             |   2 +-
 fs/ext4/page-io.c                         |   5 +-
 fs/ext4/super.c                           |  30 +--
 fs/ext4/xattr.c                           |   2 +-
 fs/f2fs/data.c                            |  10 +-
 fs/f2fs/f2fs.h                            |   1 +
 fs/fat/inode.c                            |   2 +-
 fs/fuse/dax.c                             |   2 +-
 fs/gfs2/aops.c                            |   2 +-
 fs/gfs2/bmap.c                            |   2 +-
 fs/gfs2/glock.c                           |   2 +-
 fs/gfs2/meta_io.c                         |   2 +-
 fs/gfs2/ops_fstype.c                      |   2 +-
 fs/hpfs/file.c                            |   2 +-
 fs/iomap/buffered-io.c                    |   8 +-
 fs/iomap/direct-io.c                      |  11 +-
 fs/iomap/swapfile.c                       |   2 +-
 fs/iomap/trace.h                          |   6 +-
 fs/jbd2/commit.c                          |   2 +-
 fs/jbd2/journal.c                         |  34 +--
 fs/jbd2/recovery.c                        |   9 +-
 fs/jbd2/revoke.c                          |  14 +-
 fs/jbd2/transaction.c                     |   8 +-
 fs/mpage.c                                |  18 +-
 fs/nilfs2/btnode.c                        |   4 +-
 fs/nilfs2/gcinode.c                       |   2 +-
 fs/nilfs2/mdt.c                           |   2 +-
 fs/nilfs2/page.c                          |   4 +-
 fs/nilfs2/recovery.c                      |  27 ++-
 fs/nilfs2/segment.c                       |   2 +-
 fs/ntfs3/fsntfs.c                         |  10 +-
 fs/ntfs3/inode.c                          |   4 +-
 fs/ntfs3/super.c                          |   6 +-
 fs/ocfs2/journal.c                        |   2 +-
 fs/reiserfs/fix_node.c                    |   2 +-
 fs/reiserfs/journal.c                     |  10 +-
 fs/reiserfs/prints.c                      |   4 +-
 fs/reiserfs/reiserfs.h                    |   6 +-
 fs/reiserfs/stree.c                       |   2 +-
 fs/reiserfs/tail_conversion.c             |   2 +-
 fs/sync.c                                 |   9 +-
 fs/xfs/xfs_iomap.c                        |   4 +-
 fs/zonefs/file.c                          |   4 +-
 include/linux/blk_types.h                 |   2 +-
 include/linux/blkdev.h                    |  19 +-
 include/linux/buffer_head.h               |  81 ++++---
 include/linux/iomap.h                     |  13 +-
 include/linux/jbd2.h                      |  18 +-
 include/trace/events/block.h              |   2 +-
 97 files changed, 620 insertions(+), 440 deletions(-)

-- 
2.39.2



* [PATCH vfs.all 01/26] block: move two helpers into bdev.c
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 02/26] block: remove sync_blockdev_nowait() Yu Kuai
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

disk_live() and block_size() access bd_inode directly. Move them into
bdev.c in preparation for removing the bd_inode field from block_device,
so that bd_inode is only accessed within the block layer.
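
For readers unfamiliar with i_blkbits, here is a small userspace sketch
of the arithmetic behind block_size() and blksize_bits(). The toy_inode
struct and toy_block_size() are hypothetical stand-ins, and
order_base_2() is reimplemented as a plain loop rather than the kernel
macro.

```c
#include <assert.h>

#define SECTOR_SHIFT 9

/* Smallest order such that (1 << order) >= n; loop-based stand-in for
 * the kernel's order_base_2(). */
static unsigned int order_base_2(unsigned int n)
{
	unsigned int order = 0;

	while ((1u << order) < n)
		order++;
	return order;
}

/* Same formula as blksize_bits() in include/linux/blkdev.h. */
static unsigned int blksize_bits(unsigned int size)
{
	return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
}

/* Hypothetical stand-in for the one inode field this sketch needs. */
struct toy_inode { unsigned char i_blkbits; };

/* Mirrors block_size(): the block size is 2^i_blkbits bytes. */
static unsigned int toy_block_size(const struct toy_inode *inode)
{
	assert(inode->i_blkbits < 32);
	return 1u << inode->i_blkbits;
}
```

Storing only the shift in the inode is why block_size() has to reach
bd_inode at all, and hence why it moves next to the other bd_inode
users in bdev.c.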

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c           | 12 ++++++++++++
 include/linux/blkdev.h | 12 ++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index dd26d37356aa..621b9163c0f6 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1251,6 +1251,18 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 	blkdev_put_no_open(bdev);
 }
 
+bool disk_live(struct gendisk *disk)
+{
+	return !inode_unhashed(disk->part0->bd_inode);
+}
+EXPORT_SYMBOL_GPL(disk_live);
+
+unsigned int block_size(struct block_device *bdev)
+{
+	return 1 << bdev->bd_inode->i_blkbits;
+}
+EXPORT_SYMBOL_GPL(block_size);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 172c91879999..2c0d3a89002c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -211,11 +211,6 @@ struct gendisk {
 	struct blk_independent_access_ranges *ia_ranges;
 };
 
-static inline bool disk_live(struct gendisk *disk)
-{
-	return !inode_unhashed(disk->part0->bd_inode);
-}
-
 /**
  * disk_openers - returns how many openers are there for a disk
  * @disk: disk to check
@@ -1364,11 +1359,6 @@ static inline unsigned int blksize_bits(unsigned int size)
 	return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
 }
 
-static inline unsigned int block_size(struct block_device *bdev)
-{
-	return 1 << bdev->bd_inode->i_blkbits;
-}
-
 int kblockd_schedule_work(struct work_struct *work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
 
@@ -1536,6 +1526,8 @@ void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
 struct block_device *file_bdev(struct file *bdev_file);
+bool disk_live(struct gendisk *disk);
+unsigned int block_size(struct block_device *bdev);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
-- 
2.39.2



* [PATCH vfs.all 02/26] block: remove sync_blockdev_nowait()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 01/26] block: move two helpers into bdev.c Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 03/26] block: remove sync_blockdev_range() Yu Kuai
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to flush the file
mapping directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c           | 8 --------
 fs/fat/inode.c         | 2 +-
 fs/ntfs3/inode.c       | 2 +-
 fs/sync.c              | 9 ++++++---
 include/linux/blkdev.h | 5 -----
 5 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 621b9163c0f6..c9b056782c96 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -188,14 +188,6 @@ int sb_min_blocksize(struct super_block *sb, int size)
 
 EXPORT_SYMBOL(sb_min_blocksize);
 
-int sync_blockdev_nowait(struct block_device *bdev)
-{
-	if (!bdev)
-		return 0;
-	return filemap_flush(bdev->bd_inode->i_mapping);
-}
-EXPORT_SYMBOL_GPL(sync_blockdev_nowait);
-
 /*
  * Write out and wait upon all the dirty data associated with a block
  * device via its mapping.  Does not take the superblock lock.
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d9e6fbb6f246..ef2ac3e7c3a8 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -1945,7 +1945,7 @@ int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
 	if (!ret && i2)
 		ret = writeback_inode(i2);
 	if (!ret)
-		ret = sync_blockdev_nowait(sb->s_bdev);
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(fat_flush_inodes);
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index eb7a8c9fba01..3c4c878f6d77 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -1081,7 +1081,7 @@ int ntfs_flush_inodes(struct super_block *sb, struct inode *i1,
 	if (!ret && i2)
 		ret = writeback_inode(i2);
 	if (!ret)
-		ret = sync_blockdev_nowait(sb->s_bdev);
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
 	return ret;
 }
 
diff --git a/fs/sync.c b/fs/sync.c
index dc725914e1ed..3a43062790d9 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -57,9 +57,12 @@ int sync_filesystem(struct super_block *sb)
 		if (ret)
 			return ret;
 	}
-	ret = sync_blockdev_nowait(sb->s_bdev);
-	if (ret)
-		return ret;
+
+	if (sb->s_bdev_file) {
+		ret = filemap_flush(sb->s_bdev_file->f_mapping);
+		if (ret)
+			return ret;
+	}
 
 	sync_inodes_sb(sb);
 	if (sb->s_op->sync_fs) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2c0d3a89002c..433c880299a6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1533,7 +1533,6 @@ unsigned int block_size(struct block_device *bdev);
 void invalidate_bdev(struct block_device *bdev);
 int sync_blockdev(struct block_device *bdev);
 int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
-int sync_blockdev_nowait(struct block_device *bdev);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
@@ -1546,10 +1545,6 @@ static inline int sync_blockdev(struct block_device *bdev)
 {
 	return 0;
 }
-static inline int sync_blockdev_nowait(struct block_device *bdev)
-{
-	return 0;
-}
 static inline void sync_bdevs(bool wait)
 {
 }
-- 
2.39.2



* [PATCH vfs.all 03/26] block: remove sync_blockdev_range()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 01/26] block: move two helpers into bdev.c Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 02/26] block: remove sync_blockdev_nowait() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 04/26] block: prevent direct access of bd_inode Yu Kuai
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to flush the file
mapping directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c           |  7 -------
 fs/btrfs/volumes.c     | 12 +++++++-----
 fs/exfat/fatent.c      |  2 +-
 include/linux/blkdev.h |  1 -
 4 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index c9b056782c96..d53bf2f46b43 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -200,13 +200,6 @@ int sync_blockdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(sync_blockdev);
 
-int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend)
-{
-	return filemap_write_and_wait_range(bdev->bd_inode->i_mapping,
-			lstart, lend);
-}
-EXPORT_SYMBOL(sync_blockdev_range);
-
 /**
  * bdev_freeze - lock a filesystem and force it into a consistent state
  * @bdev:	blockdevice to lock
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1dc1f1946ae0..6f130c749dbc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2050,14 +2050,14 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
 }
 
 static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
-				     struct block_device *bdev, int copy_num)
+				     struct file *bdev_file, int copy_num)
 {
 	struct btrfs_super_block *disk_super;
 	const size_t len = sizeof(disk_super->magic);
 	const u64 bytenr = btrfs_sb_offset(copy_num);
 	int ret;
 
-	disk_super = btrfs_read_disk_super(bdev, bytenr, bytenr);
+	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
 	if (IS_ERR(disk_super))
 		return;
 
@@ -2065,7 +2065,8 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
 	folio_mark_dirty(virt_to_folio(disk_super));
 	btrfs_release_disk_super(disk_super);
 
-	ret = sync_blockdev_range(bdev, bytenr, bytenr + len - 1);
+	ret = filemap_write_and_wait_range(bdev_file->f_mapping,
+					   bytenr, bytenr + len - 1);
 	if (ret)
 		btrfs_warn(fs_info, "error clearing superblock number %d (%d)",
 			copy_num, ret);
@@ -2075,15 +2076,16 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, struct btrfs_devic
 {
 	int copy_num;
 	struct block_device *bdev = device->bdev;
+	struct file *bdev_file = device->bdev_file;
 
-	if (!bdev)
+	if (!bdev || !bdev_file)
 		return;
 
 	for (copy_num = 0; copy_num < BTRFS_SUPER_MIRROR_MAX; copy_num++) {
 		if (bdev_is_zoned(bdev))
 			btrfs_reset_sb_log_zones(bdev, copy_num);
 		else
-			btrfs_scratch_superblock(fs_info, bdev, copy_num);
+			btrfs_scratch_superblock(fs_info, bdev_file, copy_num);
 	}
 
 	/* Notify udev that device has changed */
diff --git a/fs/exfat/fatent.c b/fs/exfat/fatent.c
index 56b870d9cc0d..1c86ec2465b7 100644
--- a/fs/exfat/fatent.c
+++ b/fs/exfat/fatent.c
@@ -296,7 +296,7 @@ int exfat_zeroed_cluster(struct inode *dir, unsigned int clu)
 	}
 
 	if (IS_DIRSYNC(dir))
-		return sync_blockdev_range(sb->s_bdev,
+		return filemap_write_and_wait_range(sb->s_bdev_file->f_mapping,
 				EXFAT_BLK_TO_B(blknr, sb),
 				EXFAT_BLK_TO_B(last_blknr, sb) - 1);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 433c880299a6..08d4e6a0940c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1532,7 +1532,6 @@ unsigned int block_size(struct block_device *bdev);
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
 int sync_blockdev(struct block_device *bdev);
-int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
-- 
2.39.2



* [PATCH vfs.all 04/26] block: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (2 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 03/26] block: remove sync_blockdev_range() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-07  2:22   ` Al Viro
  2024-04-06  9:09 ` [PATCH vfs.all 05/26] block: add a helper bdev_read_folio() Yu Kuai
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Add helpers to access bd_inode, in preparation for removing the
'bd_inode' field once all accesses from filesystems and drivers are gone.
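
The helpers rely on block_device and its inode being embedded in the
same struct bdev_inode, so one can be recovered from the other by
pointer arithmetic. A userspace sketch of that pattern follows; the
structs are heavily simplified stand-ins and container_of() is a
minimal reimplementation without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real structs have many more fields. */
struct address_space { unsigned long nrpages; };
struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};
struct block_device { unsigned int bd_dev; };

/* Both objects live in one allocation, as in block/bdev.c. */
struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

static struct bdev_inode *BDEV_B(struct block_device *bdev)
{
	assert(bdev != NULL);
	return container_of(bdev, struct bdev_inode, bdev);
}

/* bdev -> inode, without a bd_inode back pointer. */
static struct inode *bdev_inode(struct block_device *bdev)
{
	return &BDEV_B(bdev)->vfs_inode;
}

/* bdev -> page cache mapping. */
static struct address_space *bdev_mapping(struct block_device *bdev)
{
	return BDEV_B(bdev)->vfs_inode.i_mapping;
}

/* inode -> bdev, the inverse direction. */
static struct block_device *I_BDEV(struct inode *inode)
{
	return &container_of(inode, struct bdev_inode, vfs_inode)->bdev;
}
```

Because the offset is fixed at compile time, the stored bd_inode
pointer is redundant inside the block layer, which is what makes its
eventual removal possible.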

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 block/bdev.c            | 58 +++++++++++++++++++++++++++--------------
 block/blk-zoned.c       |  4 +--
 block/blk.h             |  2 ++
 block/fops.c            |  2 +-
 block/genhd.c           |  9 ++++---
 block/ioctl.c           |  8 +++---
 block/partitions/core.c |  8 +++---
 7 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index d53bf2f46b43..c0b30392563a 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -43,6 +43,21 @@ static inline struct bdev_inode *BDEV_I(struct inode *inode)
 	return container_of(inode, struct bdev_inode, vfs_inode);
 }
 
+static inline struct bdev_inode *BDEV_B(struct block_device *bdev)
+{
+	return container_of(bdev, struct bdev_inode, bdev);
+}
+
+struct inode *bdev_inode(struct block_device *bdev)
+{
+	return &BDEV_B(bdev)->vfs_inode;
+}
+
+struct address_space *bdev_mapping(struct block_device *bdev)
+{
+	return BDEV_B(bdev)->vfs_inode.i_mapping;
+}
+
 struct block_device *I_BDEV(struct inode *inode)
 {
 	return &BDEV_I(inode)->bdev;
@@ -57,7 +72,7 @@ EXPORT_SYMBOL(file_bdev);
 
 static void bdev_write_inode(struct block_device *bdev)
 {
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int ret;
 
 	spin_lock(&inode->i_lock);
@@ -76,7 +91,7 @@ static void bdev_write_inode(struct block_device *bdev)
 /* Kill _all_ buffers and pagecache , dirty or not.. */
 static void kill_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(bdev);
 
 	if (mapping_empty(mapping))
 		return;
@@ -88,7 +103,7 @@ static void kill_bdev(struct block_device *bdev)
 /* Invalidate clean unused buffers and pagecache. */
 void invalidate_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(bdev);
 
 	if (mapping->nrpages) {
 		invalidate_bh_lrus();
@@ -116,7 +131,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 			goto invalidate;
 	}
 
-	truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
+	truncate_inode_pages_range(bdev_mapping(bdev), lstart, lend);
 	if (!(mode & BLK_OPEN_EXCL))
 		bd_abort_claiming(bdev, truncate_bdev_range);
 	return 0;
@@ -126,7 +141,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 	 * Someone else has handle exclusively open. Try invalidating instead.
 	 * The 'end' argument is inclusive so the rounding is safe.
 	 */
-	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
+	return invalidate_inode_pages2_range(bdev_mapping(bdev),
 					     lstart >> PAGE_SHIFT,
 					     lend >> PAGE_SHIFT);
 }
@@ -134,14 +149,14 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 static void set_init_blocksize(struct block_device *bdev)
 {
 	unsigned int bsize = bdev_logical_block_size(bdev);
-	loff_t size = i_size_read(bdev->bd_inode);
+	loff_t size = i_size_read(bdev_inode(bdev));
 
 	while (bsize < PAGE_SIZE) {
 		if (size & bsize)
 			break;
 		bsize <<= 1;
 	}
-	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
+	bdev_inode(bdev)->i_blkbits = blksize_bits(bsize);
 }
 
 int set_blocksize(struct block_device *bdev, int size)
@@ -155,9 +170,9 @@ int set_blocksize(struct block_device *bdev, int size)
 		return -EINVAL;
 
 	/* Don't change the size if it is same as current */
-	if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
+	if (bdev_inode(bdev)->i_blkbits != blksize_bits(size)) {
 		sync_blockdev(bdev);
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
+		bdev_inode(bdev)->i_blkbits = blksize_bits(size);
 		kill_bdev(bdev);
 	}
 	return 0;
@@ -196,7 +211,7 @@ int sync_blockdev(struct block_device *bdev)
 {
 	if (!bdev)
 		return 0;
-	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
+	return filemap_write_and_wait(bdev_mapping(bdev));
 }
 EXPORT_SYMBOL(sync_blockdev);
 
@@ -415,19 +430,22 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
 {
 	spin_lock(&bdev->bd_size_lock);
-	i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
+	i_size_write(bdev_inode(bdev), (loff_t)sectors << SECTOR_SHIFT);
 	bdev->bd_nr_sectors = sectors;
 	spin_unlock(&bdev->bd_size_lock);
 }
 
 void bdev_add(struct block_device *bdev, dev_t dev)
 {
+	struct inode *inode;
+
 	if (bdev_stable_writes(bdev))
-		mapping_set_stable_writes(bdev->bd_inode->i_mapping);
+		mapping_set_stable_writes(bdev_mapping(bdev));
 	bdev->bd_dev = dev;
-	bdev->bd_inode->i_rdev = dev;
-	bdev->bd_inode->i_ino = dev;
-	insert_inode_hash(bdev->bd_inode);
+	inode = bdev_inode(bdev);
+	inode->i_rdev = dev;
+	inode->i_ino = dev;
+	insert_inode_hash(inode);
 }
 
 long nr_blockdev_pages(void)
@@ -893,7 +911,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 		bdev_file->f_mode |= FMODE_NOWAIT;
 	if (mode & BLK_OPEN_RESTRICT_WRITES)
 		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
-	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
+	bdev_file->f_mapping = bdev_mapping(bdev);
 	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
 	bdev_file->private_data = holder;
 
@@ -955,13 +973,13 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 		return ERR_PTR(-ENXIO);
 
 	flags = blk_to_file_flags(mode);
-	bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
+	bdev_file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
 			blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
 	if (IS_ERR(bdev_file)) {
 		blkdev_put_no_open(bdev);
 		return bdev_file;
 	}
-	ihold(bdev->bd_inode);
+	ihold(bdev_inode(bdev));
 
 	ret = bdev_open(bdev, mode, holder, hops, bdev_file);
 	if (ret) {
@@ -1238,13 +1256,13 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 bool disk_live(struct gendisk *disk)
 {
-	return !inode_unhashed(disk->part0->bd_inode);
+	return !inode_unhashed(bdev_inode(disk->part0));
 }
 EXPORT_SYMBOL_GPL(disk_live);
 
 unsigned int block_size(struct block_device *bdev)
 {
-	return 1 << bdev->bd_inode->i_blkbits;
+	return 1 << bdev_inode(bdev)->i_blkbits;
 }
 EXPORT_SYMBOL_GPL(block_size);
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index da0f4b2a8fa0..7e6805250317 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -398,7 +398,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 		op = REQ_OP_ZONE_RESET;
 
 		/* Invalidate the page cache, including dirty pages. */
-		filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_lock(bdev_mapping(bdev));
 		ret = blkdev_truncate_zone_range(bdev, mode, &zrange);
 		if (ret)
 			goto fail;
@@ -420,7 +420,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 
 fail:
 	if (cmd == BLKRESETZONE)
-		filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_unlock(bdev_mapping(bdev));
 
 	return ret;
 }
diff --git a/block/blk.h b/block/blk.h
index 5cac4e29ae17..a34bb590cce6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -427,6 +427,8 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
 }
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+struct inode *bdev_inode(struct block_device *bdev);
+struct address_space *bdev_mapping(struct block_device *bdev);
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
 void bdev_add(struct block_device *bdev, dev_t dev);
 
diff --git a/block/fops.c b/block/fops.c
index af6c244314af..58b427051c0d 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -669,7 +669,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
 	struct block_device *bdev = I_BDEV(file->f_mapping->host);
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = bdev_inode(bdev);
 	loff_t size = bdev_nr_bytes(bdev);
 	size_t shorted = 0;
 	ssize_t ret;
diff --git a/block/genhd.c b/block/genhd.c
index bb29a68e1d67..9a7d1b7e9e95 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -656,7 +656,7 @@ void del_gendisk(struct gendisk *disk)
 	 */
 	mutex_lock(&disk->open_mutex);
 	xa_for_each(&disk->part_tbl, idx, part)
-		remove_inode_hash(part->bd_inode);
+		remove_inode_hash(bdev_inode(part));
 	mutex_unlock(&disk->open_mutex);
 
 	/*
@@ -745,7 +745,7 @@ void invalidate_disk(struct gendisk *disk)
 	struct block_device *bdev = disk->part0;
 
 	invalidate_bdev(bdev);
-	bdev->bd_inode->i_mapping->wb_err = 0;
+	bdev_mapping(bdev)->wb_err = 0;
 	set_capacity(disk, 0);
 }
 EXPORT_SYMBOL(invalidate_disk);
@@ -1191,7 +1191,8 @@ static void disk_release(struct device *dev)
 	if (test_bit(GD_ADDED, &disk->state) && disk->fops->free_disk)
 		disk->fops->free_disk(disk);
 
-	iput(disk->part0->bd_inode);	/* frees the disk */
+	/* frees the disk */
+	iput(bdev_inode(disk->part0));
 }
 
 static int block_uevent(const struct device *dev, struct kobj_uevent_env *env)
@@ -1381,7 +1382,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id,
 out_destroy_part_tbl:
 	xa_destroy(&disk->part_tbl);
 	disk->part0->bd_disk = NULL;
-	iput(disk->part0->bd_inode);
+	iput(bdev_inode(disk->part0));
 out_free_bdi:
 	bdi_put(disk->bdi);
 out_free_bioset:
diff --git a/block/ioctl.c b/block/ioctl.c
index 0c76137adcaa..0f78806abb62 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -97,7 +97,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, len;
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
@@ -151,12 +151,12 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
 	if (start + len > bdev_nr_bytes(bdev))
 		return -EINVAL;
 
-	filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_lock(bdev_mapping(bdev));
 	err = truncate_bdev_range(bdev, mode, start, start + len - 1);
 	if (!err)
 		err = blkdev_issue_secure_erase(bdev, start >> 9, len >> 9,
 						GFP_KERNEL);
-	filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_unlock(bdev_mapping(bdev));
 	return err;
 }
 
@@ -166,7 +166,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, end, len;
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = bdev_inode(bdev);
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
diff --git a/block/partitions/core.c b/block/partitions/core.c
index b11e88c82c8c..ddd418758fa4 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -243,7 +243,7 @@ static const struct attribute_group *part_attr_groups[] = {
 static void part_release(struct device *dev)
 {
 	put_disk(dev_to_bdev(dev)->bd_disk);
-	iput(dev_to_bdev(dev)->bd_inode);
+	iput(bdev_inode(dev_to_bdev(dev)));
 }
 
 static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
@@ -469,7 +469,7 @@ int bdev_del_partition(struct gendisk *disk, int partno)
 	 * Just delete the partition and invalidate it.
 	 */
 
-	remove_inode_hash(part->bd_inode);
+	remove_inode_hash(bdev_inode(part));
 	invalidate_bdev(part);
 	drop_partition(part);
 	ret = 0;
@@ -655,7 +655,7 @@ int bdev_disk_changed(struct gendisk *disk, bool invalidate)
 		 * it cannot be looked up any more even when openers
 		 * still hold references.
 		 */
-		remove_inode_hash(part->bd_inode);
+		remove_inode_hash(bdev_inode(part));
 
 		/*
 		 * If @disk->open_partitions isn't elevated but there's
@@ -704,7 +704,7 @@ EXPORT_SYMBOL_GPL(bdev_disk_changed);
 
 void *read_part_sector(struct parsed_partitions *state, sector_t n, Sector *p)
 {
-	struct address_space *mapping = state->disk->part0->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_mapping(state->disk->part0);
 	struct folio *folio;
 
 	if (n >= get_capacity(state->disk)) {
-- 
2.39.2


* [PATCH vfs.all 05/26] block: add a helper bdev_read_folio()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (3 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 04/26] block: prevent direct access of bd_inode Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 06/26] bcachefs: remove dead function bdev_sectors() Yu Kuai
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Currently the scsi driver reads from the disk without opening it as a
file (scsi_bios_ptable()). Factor out a helper that reads into the block
device page cache, so that scsi no longer accesses bd_inode directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
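A minimal user-space sketch of the offset arithmetic the new helper relies
on (pos >> PAGE_SHIFT selects the page-cache index that contains @pos). It
assumes 4 KiB pages; all demo_* names are hypothetical and are not kernel
APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the arch. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)

/* Page-cache index of the page that contains byte offset pos. */
static uint64_t demo_pos_to_index(uint64_t pos)
{
	return pos >> DEMO_PAGE_SHIFT;
}

/* Offset of pos inside that page. */
static uint64_t demo_offset_in_page(uint64_t pos)
{
	return pos & (DEMO_PAGE_SIZE - 1);
}

/* Non-zero if the page at index covers byte offset pos. */
static int demo_folio_contains(uint64_t index, uint64_t pos)
{
	return demo_pos_to_index(pos) == index;
}

/* Sanity check: the page selected for pos really covers pos. */
static void demo_check_pos(uint64_t pos)
{
	uint64_t start = demo_pos_to_index(pos) << DEMO_PAGE_SHIFT;

	assert(start <= pos && pos - start < DEMO_PAGE_SIZE);
}
```

A caller that wants the byte at @pos would then look at
demo_offset_in_page(pos) inside the returned page.
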
 block/bdev.c           | 19 +++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 20 insertions(+)

diff --git a/block/bdev.c b/block/bdev.c
index c0b30392563a..4335df6d1266 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1266,6 +1266,25 @@ unsigned int block_size(struct block_device *bdev)
 }
 EXPORT_SYMBOL_GPL(block_size);
 
+/**
+ * bdev_read_folio - Read into block device page cache.
+ * @bdev: the block device which holds the cache to read.
+ * @pos: the byte offset that the returned folio will contain.
+ *
+ * Read one page into the block device page cache. On success, the returned
+ * folio contains @pos.
+ *
+ * Only used by scsi_bios_ptable(), where the bdev is not opened as a file.
+ *
+ * Return: Uptodate folio on success, ERR_PTR() on failure.
+ */
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
+{
+	return mapping_read_folio_gfp(bdev_mapping(bdev),
+				      pos >> PAGE_SHIFT, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 08d4e6a0940c..bc840e0fb6e5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1519,6 +1519,7 @@ struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
 int bd_prepare_to_claim(struct block_device *bdev, void *holder,
 		const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos);
 
 /* just for blk-cgroup, don't use elsewhere */
 struct block_device *blkdev_get_no_open(dev_t dev);
-- 
2.39.2


* [PATCH vfs.all 06/26] bcachefs: remove dead function bdev_sectors()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (4 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 05/26] block: add a helper bdev_read_folio() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 07/26] cramfs: prevent direct access of bd_inode Yu Kuai
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

bdev_sectors() is not used, hence remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/bcachefs/util.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 175aee3074c7..960827eddff1 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -445,11 +445,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
 void bch2_bio_map(struct bio *bio, void *base, size_t);
 int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
 
-static inline sector_t bdev_sectors(struct block_device *bdev)
-{
-	return bdev->bd_inode->i_size >> 9;
-}
-
 #define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-- 
2.39.2


* [PATCH vfs.all 07/26] cramfs: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (5 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 06/26] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 08/26] erofs: " Yu Kuai
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the bdev
mapping from the file directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
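A simplified user-space model of why the conversion is equivalent: once the
bdev is opened as a file, the file's f_mapping points at the same mapping
that used to be reached through bd_inode->i_mapping. The demo_* structures
are hypothetical stand-ins, not the real kernel layouts:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct demo_address_space {
	unsigned long nrpages;
};

struct demo_inode {
	struct demo_address_space *i_mapping;
	struct demo_address_space i_data;
};

struct demo_file {
	struct demo_inode *f_inode;
	struct demo_address_space *f_mapping;
};

struct demo_super_block {
	struct demo_file *s_bdev_file;
};

/* Model of opening a bdev as a file: the file inherits the inode's mapping. */
static void demo_open_bdev(struct demo_file *file, struct demo_inode *bd_inode)
{
	assert(file != NULL && bd_inode != NULL);
	bd_inode->i_mapping = &bd_inode->i_data;
	file->f_inode = bd_inode;
	file->f_mapping = bd_inode->i_mapping;
}

/* Counterpart of file_inode(): inode behind the opened file. */
static struct demo_inode *demo_file_inode(const struct demo_file *file)
{
	return file->f_inode;
}

/* What a filesystem does after the conversion: go through the file. */
static struct demo_address_space *
demo_sb_mapping(const struct demo_super_block *sb)
{
	return sb->s_bdev_file->f_mapping;
}
```

In this model demo_sb_mapping(sb) and demo_file_inode(file)->i_mapping
return the same pointer, so callers do not need bd_inode at all.
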
 fs/cramfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 9901057a15ba..38416c245ced 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -183,7 +183,7 @@ static int next_buffer;
 static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 				unsigned int len)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev_file->f_mapping;
 	struct file_ra_state ra = {};
 	struct page *pages[BLKS_PER_BUF];
 	unsigned i, blocknr, buffer;
-- 
2.39.2


* [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (6 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 07/26] cramfs: prevent direct access of bd_inode Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-07  4:05   ` Al Viro
  2024-04-06  9:09 ` [PATCH vfs.all 09/26] nilfs2: " Yu Kuai
                   ` (18 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
 fs/erofs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 52524bd9698b..b0a55b4d8c30 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -70,7 +70,7 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 	if (erofs_is_fscache_mode(sb))
 		buf->inode = EROFS_SB(sb)->s_fscache->inode;
 	else
-		buf->inode = sb->s_bdev->bd_inode;
+		buf->inode = file_inode(sb->s_bdev_file);
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
-- 
2.39.2


* [PATCH vfs.all 09/26] nilfs2: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (7 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 08/26] erofs: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 10/26] gfs2: " Yu Kuai
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/nilfs2/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index aa5290cb7467..2940e8ef88f4 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2790,7 +2790,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
 	if (!nilfs->ns_writer)
 		return -ENOMEM;
 
-	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
+	inode_attach_wb(file_inode(nilfs->ns_sb->s_bdev_file), NULL);
 
 	err = nilfs_segctor_start_thread(nilfs->ns_writer);
 	if (unlikely(err))
-- 
2.39.2


* [PATCH vfs.all 10/26] gfs2: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (8 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 09/26] nilfs2: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 11/26] btrfs: " Yu Kuai
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/gfs2/glock.c      | 2 +-
 fs/gfs2/ops_fstype.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 34540f9d011c..95ade8979f6b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1227,7 +1227,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	mapping = gfs2_glock2aspace(gl);
 	if (mapping) {
                 mapping->a_ops = &gfs2_meta_aops;
-		mapping->host = s->s_bdev->bd_inode;
+		mapping->host = file_inode(s->s_bdev_file);
 		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_NOFS);
 		mapping->i_private_data = NULL;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 572d58e86296..4384cb39b06c 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -114,7 +114,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	address_space_init_once(mapping);
 	mapping->a_ops = &gfs2_rgrp_aops;
-	mapping->host = sb->s_bdev->bd_inode;
+	mapping->host = file_inode(sb->s_bdev_file);
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	mapping->i_private_data = NULL;
-- 
2.39.2


* [PATCH vfs.all 11/26] btrfs: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (9 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 10/26] gfs2: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 12/26] ext4: remove block_device_ejected() Yu Kuai
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the inode or
the mapping from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/btrfs/disk-io.c | 17 +++++++++--------
 fs/btrfs/disk-io.h |  4 ++--
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 15 +++++++--------
 fs/btrfs/zoned.c   | 20 +++++++++++---------
 fs/btrfs/zoned.h   |  4 ++--
 6 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3df5477d48a8..b565eef527a4 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3233,7 +3233,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	/*
 	 * Read super block and check the signature bytes only
 	 */
-	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
+	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev_file);
 	if (IS_ERR(disk_super)) {
 		ret = PTR_ERR(disk_super);
 		goto fail_alloc;
@@ -3645,17 +3645,18 @@ static void btrfs_end_super_write(struct bio *bio)
 	bio_put(bio);
 }
 
-struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
+struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
 						   int copy_num, bool drop_cache)
 {
 	struct btrfs_super_block *super;
 	struct page *page;
 	u64 bytenr, bytenr_orig;
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct block_device *bdev = file_bdev(bdev_file);
+	struct address_space *mapping = bdev_file->f_mapping;
 	int ret;
 
 	bytenr_orig = btrfs_sb_offset(copy_num);
-	ret = btrfs_sb_log_location_bdev(bdev, copy_num, READ, &bytenr);
+	ret = btrfs_sb_log_location_bdev(bdev_file, copy_num, READ, &bytenr);
 	if (ret == -ENOENT)
 		return ERR_PTR(-EINVAL);
 	else if (ret)
@@ -3696,7 +3697,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 }
 
 
-struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
+struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file)
 {
 	struct btrfs_super_block *super, *latest = NULL;
 	int i;
@@ -3708,7 +3709,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev)
 	 * later supers, using BTRFS_SUPER_MIRROR_MAX instead
 	 */
 	for (i = 0; i < 1; i++) {
-		super = btrfs_read_dev_one_super(bdev, i, false);
+		super = btrfs_read_dev_one_super(bdev_file, i, false);
 		if (IS_ERR(super))
 			continue;
 
@@ -3738,7 +3739,7 @@ static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
+	struct address_space *mapping = device->bdev_file->f_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
 	int errors = 0;
@@ -3855,7 +3856,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		    device->commit_total_bytes)
 			break;
 
-		page = find_get_page(device->bdev->bd_inode->i_mapping,
+		page = find_get_page(device->bdev_file->f_mapping,
 				     bytenr >> PAGE_SHIFT);
 		if (!page) {
 			errors++;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 76eb53fe7a11..8470426cd8e8 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -60,8 +60,8 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 			 struct btrfs_super_block *sb, int mirror_num);
 int btrfs_check_features(struct btrfs_fs_info *fs_info, bool is_rw_mount);
 int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors);
-struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev);
-struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
+struct btrfs_super_block *btrfs_read_dev_super(struct file *bdev_file);
+struct btrfs_super_block *btrfs_read_dev_one_super(struct file *bdev_file,
 						   int copy_num, bool drop_cache);
 int btrfs_commit_super(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root,
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 7e44ccaf348f..cbe64e9e22a8 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2289,7 +2289,7 @@ static int check_dev_super(struct btrfs_device *dev)
 		return 0;
 
 	/* Only need to check the primary super block. */
-	sb = btrfs_read_dev_one_super(dev->bdev, 0, true);
+	sb = btrfs_read_dev_one_super(dev->bdev_file, 0, true);
 	if (IS_ERR(sb))
 		return PTR_ERR(sb);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6f130c749dbc..e6c8b3a40ef7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -488,7 +488,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 		goto error;
 	}
 	invalidate_bdev(bdev);
-	*disk_super = btrfs_read_dev_super(bdev);
+	*disk_super = btrfs_read_dev_super(*bdev_file);
 	if (IS_ERR(*disk_super)) {
 		ret = PTR_ERR(*disk_super);
 		fput(*bdev_file);
@@ -1248,7 +1248,7 @@ void btrfs_release_disk_super(struct btrfs_super_block *super)
 	put_page(page);
 }
 
-static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
+static struct btrfs_super_block *btrfs_read_disk_super(struct file *bdev_file,
 						       u64 bytenr, u64 bytenr_orig)
 {
 	struct btrfs_super_block *disk_super;
@@ -1257,7 +1257,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 	pgoff_t index;
 
 	/* make sure our super fits in the device */
-	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(bdev))
+	if (bytenr + PAGE_SIZE >= bdev_nr_bytes(file_bdev(bdev_file)))
 		return ERR_PTR(-EINVAL);
 
 	/* make sure our super fits in the page */
@@ -1270,7 +1270,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 		return ERR_PTR(-EINVAL);
 
 	/* pull in the page with our super */
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
+	page = read_cache_page_gfp(bdev_file->f_mapping, index, GFP_KERNEL);
 
 	if (IS_ERR(page))
 		return ERR_CAST(page);
@@ -1388,14 +1388,13 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 		return ERR_CAST(bdev_file);
 
 	bytenr_orig = btrfs_sb_offset(0);
-	ret = btrfs_sb_log_location_bdev(file_bdev(bdev_file), 0, READ, &bytenr);
+	ret = btrfs_sb_log_location_bdev(bdev_file, 0, READ, &bytenr);
 	if (ret) {
 		device = ERR_PTR(ret);
 		goto error_bdev_put;
 	}
 
-	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr,
-					   bytenr_orig);
+	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr_orig);
 	if (IS_ERR(disk_super)) {
 		device = ERR_CAST(disk_super);
 		goto error_bdev_put;
@@ -2057,7 +2056,7 @@ static void btrfs_scratch_superblock(struct btrfs_fs_info *fs_info,
 	const u64 bytenr = btrfs_sb_offset(copy_num);
 	int ret;
 
-	disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr, bytenr);
+	disk_super = btrfs_read_disk_super(bdev_file, bytenr, bytenr);
 	if (IS_ERR(disk_super))
 		return;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5a3d5ec75c5a..b9b78374a612 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -81,7 +81,7 @@ static int copy_zone_info_cb(struct blk_zone *zone, unsigned int idx, void *data
 	return 0;
 }
 
-static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
+static int sb_write_pointer(struct file *bdev_file, struct blk_zone *zones,
 			    u64 *wp_ret)
 {
 	bool empty[BTRFS_NR_SB_LOG_ZONES];
@@ -118,7 +118,7 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 		return -ENOENT;
 	} else if (full[0] && full[1]) {
 		/* Compare two super blocks */
-		struct address_space *mapping = bdev->bd_inode->i_mapping;
+		struct address_space *mapping = bdev_file->f_mapping;
 		struct page *page[BTRFS_NR_SB_LOG_ZONES];
 		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
 		int i;
@@ -562,7 +562,7 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
 		    BLK_ZONE_TYPE_CONVENTIONAL)
 			continue;
 
-		ret = sb_write_pointer(device->bdev,
+		ret = sb_write_pointer(device->bdev_file,
 				       &zone_info->sb_zones[sb_pos], &sb_wp);
 		if (ret != -ENOENT && ret) {
 			btrfs_err_in_rcu(device->fs_info,
@@ -798,7 +798,7 @@ int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount
 	return 0;
 }
 
-static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
+static int sb_log_location(struct file *bdev_file, struct blk_zone *zones,
 			   int rw, u64 *bytenr_ret)
 {
 	u64 wp;
@@ -809,7 +809,7 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 		return 0;
 	}
 
-	ret = sb_write_pointer(bdev, zones, &wp);
+	ret = sb_write_pointer(bdev_file, zones, &wp);
 	if (ret != -ENOENT && ret < 0)
 		return ret;
 
@@ -827,7 +827,8 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 			ASSERT(sb_zone_is_full(reset));
 
 			nofs_flags = memalloc_nofs_save();
-			ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
+			ret = blkdev_zone_mgmt(file_bdev(bdev_file),
+					       REQ_OP_ZONE_RESET,
 					       reset->start, reset->len);
 			memalloc_nofs_restore(nofs_flags);
 			if (ret)
@@ -859,10 +860,11 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
 
 }
 
-int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
+int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
 			       u64 *bytenr_ret)
 {
 	struct blk_zone zones[BTRFS_NR_SB_LOG_ZONES];
+	struct block_device *bdev = file_bdev(bdev_file);
 	sector_t zone_sectors;
 	u32 sb_zone;
 	int ret;
@@ -896,7 +898,7 @@ int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
 	if (ret != BTRFS_NR_SB_LOG_ZONES)
 		return -EIO;
 
-	return sb_log_location(bdev, zones, rw, bytenr_ret);
+	return sb_log_location(bdev_file, zones, rw, bytenr_ret);
 }
 
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
@@ -920,7 +922,7 @@ int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 	if (zone_num + 1 >= zinfo->nr_zones)
 		return -ENOENT;
 
-	return sb_log_location(device->bdev,
+	return sb_log_location(device->bdev_file,
 			       &zinfo->sb_zones[BTRFS_NR_SB_LOG_ZONES * mirror],
 			       rw, bytenr_ret);
 }
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index 77c4321e331f..32680a04aa1f 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -61,7 +61,7 @@ void btrfs_destroy_dev_zone_info(struct btrfs_device *device);
 struct btrfs_zoned_device_info *btrfs_clone_dev_zone_info(struct btrfs_device *orig_dev);
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info);
 int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info, unsigned long *mount_opt);
-int btrfs_sb_log_location_bdev(struct block_device *bdev, int mirror, int rw,
+int btrfs_sb_log_location_bdev(struct file *bdev_file, int mirror, int rw,
 			       u64 *bytenr_ret);
 int btrfs_sb_log_location(struct btrfs_device *device, int mirror, int rw,
 			  u64 *bytenr_ret);
@@ -142,7 +142,7 @@ static inline int btrfs_check_mountopts_zoned(struct btrfs_fs_info *info,
 	return 0;
 }
 
-static inline int btrfs_sb_log_location_bdev(struct block_device *bdev,
+static inline int btrfs_sb_log_location_bdev(struct file *bdev_file,
 					     int mirror, int rw, u64 *bytenr_ret)
 {
 	*bytenr_ret = btrfs_sb_offset(mirror);
-- 
2.39.2


* [PATCH vfs.all 12/26] ext4: remove block_device_ejected()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (10 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 11/26] btrfs: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 13/26] ext4: prevent direct access of bd_inode Yu Kuai
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

block_device_ejected() was added by commit bdfe0cbd746a ("Revert
"ext4: remove block_device_ejected"") in 2015. At that time 'bdi->wb'
was destroyed synchronously from del_gendisk(), hence if ext4 was still
mounted, mark_buffer_dirty() would reference the destroyed 'wb'.
However, this problem doesn't exist anymore:

- commit d03f6cdc1fc4 ("block: Dynamically allocate and refcount
backing_dev_info") switched bdi to use refcounting;
- commit 13eec2363ef0 ("fs: Get proper reference for s_bdi") grabs an
additional reference to the bdi while mounting, so that 'bdi->wb' will not
be destroyed until generic_shutdown_super().

Hence remove block_device_ejected(), which is no longer needed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
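A minimal user-space sketch of the refcounting argument above, assuming a
plain integer counter instead of the kernel's reference counting. The
demo_* names are hypothetical; the point is only that the object outlives
disk removal for as long as the mount holds its own reference:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted backing_dev_info. */
struct demo_bdi {
	int refcount;
	bool wb_alive;
};

/* The disk owns the initial reference. */
static void demo_bdi_init(struct demo_bdi *bdi)
{
	bdi->refcount = 1;
	bdi->wb_alive = true;
}

/* Take an extra reference, e.g. at mount time. */
static void demo_bdi_get(struct demo_bdi *bdi)
{
	assert(bdi->refcount > 0);
	bdi->refcount++;
}

/* Drop a reference; tear down only when the last one goes away. */
static void demo_bdi_put(struct demo_bdi *bdi)
{
	assert(bdi->refcount > 0);
	if (--bdi->refcount == 0)
		bdi->wb_alive = false;
}
```

With init, get (mount), put (disk removal), the object is still alive; only
the second put (superblock shutdown) tears it down, which is why the check
for an ejected device is not needed in this model.
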
 fs/ext4/super.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3fce1b80c419..cf4666b04d2e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -492,22 +492,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else.  Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
-	struct inode *bd_inode = sb->s_bdev->bd_inode;
-	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
-	return bdi->dev == NULL;
-}
-
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 {
 	struct super_block		*sb = journal->j_private;
@@ -6168,8 +6152,6 @@ static int ext4_commit_super(struct super_block *sb)
 
 	if (!sbh)
 		return -EINVAL;
-	if (block_device_ejected(sb))
-		return -ENODEV;
 
 	ext4_update_super(sb);
 
-- 
2.39.2


* [PATCH vfs.all 13/26] ext4: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (11 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 12/26] ext4: remove block_device_ejected() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 14/26] jbd2: " Yu Kuai
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the mapping
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/dir.c       | 2 +-
 fs/ext4/ext4_jbd2.c | 2 +-
 fs/ext4/super.c     | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..0733bc1eec7a 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -192,7 +192,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 					(PAGE_SHIFT - inode->i_blkbits);
 			if (!ra_has_index(&file->f_ra, index))
 				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
+					sb->s_bdev_file->f_mapping,
 					&file->f_ra, file,
 					index, 1);
 			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 5d8055161acd..dbb9aff07ac1 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,7 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
 
 static void ext4_check_bdev_write_error(struct super_block *sb)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev_file->f_mapping;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int err;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index cf4666b04d2e..2a1afe6c77f2 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -244,7 +244,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
 struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 				   blk_opf_t op_flags)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
 			~__GFP_FS) | __GFP_MOVABLE;
 
 	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
@@ -253,7 +253,7 @@ struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 					    sector_t block)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev_file->f_mapping,
 			~__GFP_FS);
 
 	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
@@ -5552,7 +5552,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+	errseq_check_and_advance(&sb->s_bdev_file->f_mapping->wb_err,
 				 &sbi->s_bdev_wb_err);
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
-- 
2.39.2


* [PATCH vfs.all 14/26] jbd2: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (12 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 13/26] ext4: prevent direct access of bd_inode Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format() Yu Kuai
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that all filesystems stash the bdev file, it's ok to get the mapping
from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/super.c      |  2 +-
 fs/jbd2/journal.c    | 26 +++++++++++++++-----------
 include/linux/jbd2.h | 18 ++++++++++++++----
 3 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2a1afe6c77f2..d47c1e7e8798 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5910,7 +5910,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
 	if (IS_ERR(bdev_file))
 		return ERR_CAST(bdev_file);
 
-	journal = jbd2_journal_init_dev(file_bdev(bdev_file), sb->s_bdev, j_start,
+	journal = jbd2_journal_init_dev(bdev_file, sb->s_bdev_file, j_start,
 					j_len, sb->s_blocksize);
 	if (IS_ERR(journal)) {
 		ext4_msg(sb, KERN_ERR, "failed to create device journal");
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..abd42a6ccd0e 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1516,11 +1516,12 @@ static int journal_load_superblock(journal_t *journal)
  * very few fields yet: that has to wait until we have created the
  * journal structures from from scratch, or loaded them from disk. */
 
-static journal_t *journal_init_common(struct block_device *bdev,
-			struct block_device *fs_dev,
+static journal_t *journal_init_common(struct file *bdev_file,
+			struct file *fs_dev_file,
 			unsigned long long start, int len, int blocksize)
 {
 	static struct lock_class_key jbd2_trans_commit_key;
+	struct block_device *bdev = file_bdev(bdev_file);
 	journal_t *journal;
 	int err;
 	int n;
@@ -1531,7 +1532,9 @@ static journal_t *journal_init_common(struct block_device *bdev,
 
 	journal->j_blocksize = blocksize;
 	journal->j_dev = bdev;
-	journal->j_fs_dev = fs_dev;
+	journal->j_dev_file = bdev_file;
+	journal->j_fs_dev = file_bdev(fs_dev_file);
+	journal->j_fs_dev_file = fs_dev_file;
 	journal->j_blk_offset = start;
 	journal->j_total_len = len;
 	jbd2_init_fs_dev_write_error(journal);
@@ -1628,8 +1631,8 @@ static journal_t *journal_init_common(struct block_device *bdev,
 
 /**
  *  journal_t * jbd2_journal_init_dev() - creates and initialises a journal structure
- *  @bdev: Block device on which to create the journal
- *  @fs_dev: Device which hold journalled filesystem for this journal.
+ *  @bdev_file: Opened block device on which to create the journal
+ *  @fs_dev_file: Opened device which hold journalled filesystem for this journal.
  *  @start: Block nr Start of journal.
  *  @len:  Length of the journal in blocks.
  *  @blocksize: blocksize of journalling device
@@ -1640,13 +1643,13 @@ static journal_t *journal_init_common(struct block_device *bdev,
  *  range of blocks on an arbitrary block device.
  *
  */
-journal_t *jbd2_journal_init_dev(struct block_device *bdev,
-			struct block_device *fs_dev,
+journal_t *jbd2_journal_init_dev(struct file *bdev_file,
+			struct file *fs_dev_file,
 			unsigned long long start, int len, int blocksize)
 {
 	journal_t *journal;
 
-	journal = journal_init_common(bdev, fs_dev, start, len, blocksize);
+	journal = journal_init_common(bdev_file, fs_dev_file, start, len, blocksize);
 	if (IS_ERR(journal))
 		return ERR_CAST(journal);
 
@@ -1683,8 +1686,9 @@ journal_t *jbd2_journal_init_inode(struct inode *inode)
 		  inode->i_sb->s_id, inode->i_ino, (long long) inode->i_size,
 		  inode->i_sb->s_blocksize_bits, inode->i_sb->s_blocksize);
 
-	journal = journal_init_common(inode->i_sb->s_bdev, inode->i_sb->s_bdev,
-			blocknr, inode->i_size >> inode->i_sb->s_blocksize_bits,
+	journal = journal_init_common(inode->i_sb->s_bdev_file,
+			inode->i_sb->s_bdev_file, blocknr,
+			inode->i_size >> inode->i_sb->s_blocksize_bits,
 			inode->i_sb->s_blocksize);
 	if (IS_ERR(journal))
 		return ERR_CAST(journal);
@@ -2009,7 +2013,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		byte_count = (block_stop - block_start + 1) *
 				journal->j_blocksize;
 
-		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
+		truncate_inode_pages_range(journal->j_dev_file->f_mapping,
 				byte_start, byte_stop);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..fc26730ae8ef 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -968,6 +968,11 @@ struct journal_s
 	 */
 	struct block_device	*j_dev;
 
+	/**
+	 * @j_dev_file: Opened device @j_dev.
+	 */
+	struct file		*j_dev_file;
+
 	/**
 	 * @j_blocksize: Block size for the location where we store the journal.
 	 */
@@ -993,6 +998,11 @@ struct journal_s
 	 */
 	struct block_device	*j_fs_dev;
 
+	/**
+	 * @j_fs_dev_file: Opened device @j_fs_dev.
+	 */
+	struct file		*j_fs_dev_file;
+
 	/**
 	 * @j_fs_dev_wb_err:
 	 *
@@ -1533,8 +1543,8 @@ extern void	 jbd2_journal_unlock_updates (journal_t *);
 
 void jbd2_journal_wait_updates(journal_t *);
 
-extern journal_t * jbd2_journal_init_dev(struct block_device *bdev,
-				struct block_device *fs_dev,
+extern journal_t *jbd2_journal_init_dev(struct file *bdev_file,
+				struct file *fs_dev_file,
 				unsigned long long start, int len, int bsize);
 extern journal_t * jbd2_journal_init_inode (struct inode *);
 extern int	   jbd2_journal_update_format (journal_t *);
@@ -1696,7 +1706,7 @@ static inline void jbd2_journal_abort_handle(handle_t *handle)
 
 static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
 
 	/*
 	 * Save the original wb_err value of client fs's bdev mapping which
@@ -1707,7 +1717,7 @@ static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 
 static inline int jbd2_check_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev_file->f_mapping;
 
 	return errseq_check(&mapping->wb_err,
 			    READ_ONCE(journal->j_fs_dev_wb_err));
-- 
2.39.2



* [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (13 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 14/26] jbd2: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-16  1:35   ` Al Viro
  2024-04-06  9:09 ` [PATCH vfs.all 16/26] bcache: prevent direct access of bd_inode Yu Kuai
                   ` (11 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/s390/block/dasd_ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 7e0ed7032f76..c1201590f343 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
 	 * enabling the device later.
 	 */
 	if (fdata->start_unit == 0) {
-		block->gdp->part0->bd_inode->i_blkbits =
-			blksize_bits(fdata->blksize);
+		rc = set_blocksize(block->gdp->part0, fdata->blksize);
+		if (rc)
+			return rc;
 	}
 
 	rc = base->discipline->format_device(base, fdata, 1);
-- 
2.39.2



* [PATCH vfs.all 16/26] bcache: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (14 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 17/26] block2mtd: " Yu Kuai
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that bcache stashes the file of the opened bdev, it's ok to get the
mapping from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/md/bcache/super.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 330bcd9ea4a9..be8565691abe 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -163,15 +163,16 @@ static const char *read_super_common(struct cache_sb *sb,  struct block_device *
 }
 
 
-static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
+static const char *read_super(struct cache_sb *sb, struct file *bdev_file,
 			      struct cache_sb_disk **res)
 {
 	const char *err;
+	struct block_device *bdev = file_bdev(bdev_file);
 	struct cache_sb_disk *s;
 	struct page *page;
 	unsigned int i;
 
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+	page = read_cache_page_gfp(bdev_file->f_mapping,
 				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
 	if (IS_ERR(page))
 		return "IO error";
@@ -2558,7 +2559,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 	if (set_blocksize(file_bdev(bdev_file), 4096))
 		goto out_blkdev_put;
 
-	err = read_super(sb, file_bdev(bdev_file), &sb_disk);
+	err = read_super(sb, bdev_file, &sb_disk);
 	if (err)
 		goto out_blkdev_put;
 
-- 
2.39.2



* [PATCH vfs.all 17/26] block2mtd: prevent direct access of bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (15 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 16/26] bcache: prevent direct access of bd_inode Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 18/26] scsi: use bdev helper in scsi_bios_ptable() Yu Kuai
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that block2mtd stashes the file of the opened bdev, it's ok to get the
inode from the file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/mtd/devices/block2mtd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index caacdc0a3819..1834692790a6 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -265,6 +265,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 	struct file *bdev_file;
 	struct block_device *bdev;
 	struct block2mtd_dev *dev;
+	loff_t size;
 	char *name;
 
 	if (!devname)
@@ -291,7 +292,8 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		goto err_free_block2mtd;
 	}
 
-	if ((long)bdev->bd_inode->i_size % erase_size) {
+	size = i_size_read(file_inode(bdev_file));
+	if ((long)size % erase_size) {
 		pr_err("erasesize must be a divisor of device size\n");
 		goto err_free_block2mtd;
 	}
@@ -309,7 +311,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 
 	dev->mtd.name = name;
 
-	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+	dev->mtd.size = size & PAGE_MASK;
 	dev->mtd.erasesize = erase_size;
 	dev->mtd.writesize = 1;
 	dev->mtd.writebufsize = PAGE_SIZE;
-- 
2.39.2



* [PATCH vfs.all 18/26] scsi: use bdev helper in scsi_bios_ptable()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (16 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 17/26] block2mtd: " Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file Yu Kuai
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

scsi_bios_ptable() reads from the disk without opening it as a file. Use
the new helper to read into the block device page cache, so that scsi no
longer accesses bd_inode directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/scsi/scsicam.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..1c99b964a0eb 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,11 +32,10 @@
  */
 unsigned char *scsi_bios_ptable(struct block_device *dev)
 {
-	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
 	unsigned char *res = NULL;
 	struct folio *folio;
 
-	folio = read_mapping_folio(mapping, 0, NULL);
+	folio = bdev_read_folio(bdev_whole(dev), 0);
 	if (IS_ERR(folio))
 		return NULL;
 
-- 
2.39.2



* [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (17 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 18/26] scsi: use bdev helper in scsi_bios_ptable() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-10 10:56   ` Jan Kara
  2024-04-10 17:26   ` Matthew Sakai
  2024-04-06  9:09 ` [PATCH vfs.all 20/26] block: factor out a helper init_bdev_file() Yu Kuai
                   ` (7 subsequent siblings)
  26 siblings, 2 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Now that the dm upper layer already stashes the file of the opened device
in 'dm_dev->bdev_file', it's ok to get the inode from the file.

There are no functional changes; this prepares for removing 'bd_inode'
from block_device.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/dm-vdo/dedupe.c                |  7 ++++---
 drivers/md/dm-vdo/dm-vdo-target.c         |  9 +++++++--
 drivers/md/dm-vdo/indexer/config.c        |  2 +-
 drivers/md/dm-vdo/indexer/config.h        |  4 ++--
 drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
 drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
 drivers/md/dm-vdo/indexer/index-session.c | 18 ++++++++++--------
 drivers/md/dm-vdo/indexer/index.c         |  4 ++--
 drivers/md/dm-vdo/indexer/index.h         |  2 +-
 drivers/md/dm-vdo/indexer/indexer.h       |  6 +++---
 drivers/md/dm-vdo/indexer/io-factory.c    | 17 +++++++++--------
 drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
 drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
 drivers/md/dm-vdo/indexer/volume.h        |  2 +-
 drivers/md/dm-vdo/vdo.c                   |  2 +-
 15 files changed, 49 insertions(+), 40 deletions(-)

diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
index 117266e1b3ae..0e311989247e 100644
--- a/drivers/md/dm-vdo/dedupe.c
+++ b/drivers/md/dm-vdo/dedupe.c
@@ -2191,7 +2191,7 @@ static int initialize_index(struct vdo *vdo, struct hash_zones *zones)
 	uds_offset = ((vdo_get_index_region_start(geometry) -
 		       geometry.bio_offset) * VDO_BLOCK_SIZE);
 	zones->parameters = (struct uds_parameters) {
-		.bdev = vdo->device_config->owned_device->bdev,
+		.bdev_file = vdo->device_config->owned_device->bdev_file,
 		.offset = uds_offset,
 		.size = (vdo_get_index_region_size(geometry) * VDO_BLOCK_SIZE),
 		.memory_size = geometry.index_config.mem,
@@ -2582,8 +2582,9 @@ static void resume_index(void *context, struct vdo_completion *parent)
 	struct device_config *config = parent->vdo->device_config;
 	int result;
 
-	zones->parameters.bdev = config->owned_device->bdev;
-	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
+	zones->parameters.bdev_file = config->owned_device->bdev_file;
+	result = uds_resume_index_session(zones->index_session,
+					  zones->parameters.bdev_file);
 	if (result != UDS_SUCCESS)
 		vdo_log_error_strerror(result, "Error resuming dedupe index");
 
diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
index 5a4b0a927f56..79e861c2887c 100644
--- a/drivers/md/dm-vdo/dm-vdo-target.c
+++ b/drivers/md/dm-vdo/dm-vdo-target.c
@@ -696,6 +696,11 @@ static void handle_parse_error(struct device_config *config, char **error_ptr,
 	*error_ptr = error_str;
 }
 
+static loff_t vdo_get_device_size(const struct device_config *config)
+{
+	return i_size_read(file_inode(config->owned_device->bdev_file));
+}
+
 /**
  * parse_device_config() - Convert the dmsetup table into a struct device_config.
  * @argc: The number of table values.
@@ -878,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
 	}
 
 	if (config->version == 0) {
-		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
+		u64 device_size = vdo_get_device_size(config);
 
 		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
 	}
@@ -1011,7 +1016,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
 
 static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
 {
-	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
+	return vdo_get_device_size(vdo->device_config) / VDO_BLOCK_SIZE;
 }
 
 static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
index 5532371b952f..dcf0742a6145 100644
--- a/drivers/md/dm-vdo/indexer/config.c
+++ b/drivers/md/dm-vdo/indexer/config.c
@@ -344,7 +344,7 @@ int uds_make_configuration(const struct uds_parameters *params,
 	config->volume_index_mean_delta = DEFAULT_VOLUME_INDEX_MEAN_DELTA;
 	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
 	config->nonce = params->nonce;
-	config->bdev = params->bdev;
+	config->bdev_file = params->bdev_file;
 	config->offset = params->offset;
 	config->size = params->size;
 
diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
index 08507dc2f7a1..8ba0cf72dec9 100644
--- a/drivers/md/dm-vdo/indexer/config.h
+++ b/drivers/md/dm-vdo/indexer/config.h
@@ -25,8 +25,8 @@ enum {
 
 /* A set of configuration parameters for the indexer. */
 struct uds_configuration {
-	/* Storage device for the index */
-	struct block_device *bdev;
+	/* File of opened storage device for the index */
+	struct file *bdev_file;
 
 	/* The maximum allowable size of the index */
 	size_t size;
diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
index 627adc24af3b..32eee76bc246 100644
--- a/drivers/md/dm-vdo/indexer/index-layout.c
+++ b/drivers/md/dm-vdo/indexer/index-layout.c
@@ -1668,7 +1668,7 @@ static int create_layout_factory(struct index_layout *layout,
 	size_t writable_size;
 	struct io_factory *factory = NULL;
 
-	result = uds_make_io_factory(config->bdev, &factory);
+	result = uds_make_io_factory(config->bdev_file, &factory);
 	if (result != UDS_SUCCESS)
 		return result;
 
@@ -1741,9 +1741,9 @@ void uds_free_index_layout(struct index_layout *layout)
 }
 
 int uds_replace_index_layout_storage(struct index_layout *layout,
-				     struct block_device *bdev)
+				     struct file *bdev_file)
 {
-	return uds_replace_storage(layout->factory, bdev);
+	return uds_replace_storage(layout->factory, bdev_file);
 }
 
 /* Obtain a dm_bufio_client for the volume region. */
diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
index e9ac6f4302d6..28f9be577631 100644
--- a/drivers/md/dm-vdo/indexer/index-layout.h
+++ b/drivers/md/dm-vdo/indexer/index-layout.h
@@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
 void uds_free_index_layout(struct index_layout *layout);
 
 int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
-						  struct block_device *bdev);
+						  struct file *bdev_file);
 
 int __must_check uds_load_index_state(struct index_layout *layout,
 				      struct uds_index *index);
diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
index aee0914d604a..914abf5e006b 100644
--- a/drivers/md/dm-vdo/indexer/index-session.c
+++ b/drivers/md/dm-vdo/indexer/index-session.c
@@ -335,7 +335,7 @@ int uds_open_index(enum uds_open_index_type open_type,
 		vdo_log_error("missing required parameters");
 		return -EINVAL;
 	}
-	if (parameters->bdev == NULL) {
+	if (parameters->bdev_file == NULL) {
 		vdo_log_error("missing required block device");
 		return -EINVAL;
 	}
@@ -349,7 +349,7 @@ int uds_open_index(enum uds_open_index_type open_type,
 		return uds_status_to_errno(result);
 
 	session->parameters = *parameters;
-	format_dev_t(name, parameters->bdev->bd_dev);
+	format_dev_t(name, file_bdev(parameters->bdev_file)->bd_dev);
 	vdo_log_info("%s: %s", get_open_type_string(open_type), name);
 
 	result = initialize_index_session(session, open_type);
@@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
 	return uds_status_to_errno(result);
 }
 
-static int replace_device(struct uds_index_session *session, struct block_device *bdev)
+static int replace_device(struct uds_index_session *session,
+			  struct file *bdev_file)
 {
 	int result;
 
-	result = uds_replace_index_storage(session->index, bdev);
+	result = uds_replace_index_storage(session->index, bdev_file);
 	if (result != UDS_SUCCESS)
 		return result;
 
-	session->parameters.bdev = bdev;
+	session->parameters.bdev_file = bdev_file;
 	return UDS_SUCCESS;
 }
 
@@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
  * device differs from the current backing store, the index will start using the new backing store.
  */
 int uds_resume_index_session(struct uds_index_session *session,
-			     struct block_device *bdev)
+			     struct file *bdev_file)
 {
 	int result = UDS_SUCCESS;
 	bool no_work = false;
@@ -502,8 +503,9 @@ int uds_resume_index_session(struct uds_index_session *session,
 	if (no_work)
 		return result;
 
-	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
-		result = replace_device(session, bdev);
+	if (session->index != NULL &&
+	    bdev_file != session->parameters.bdev_file) {
+		result = replace_device(session, bdev_file);
 		if (result != UDS_SUCCESS) {
 			mutex_lock(&session->request_mutex);
 			session->state &= ~IS_FLAG_WAITING;
diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
index 1ba767144426..48b16275a067 100644
--- a/drivers/md/dm-vdo/indexer/index.c
+++ b/drivers/md/dm-vdo/indexer/index.c
@@ -1336,9 +1336,9 @@ int uds_save_index(struct uds_index *index)
 	return result;
 }
 
-int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
+int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
 {
-	return uds_replace_volume_storage(index->volume, index->layout, bdev);
+	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
 }
 
 /* Accessing statistics should be safe from any thread. */
diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
index edabb239548e..6e2e203f43f7 100644
--- a/drivers/md/dm-vdo/indexer/index.h
+++ b/drivers/md/dm-vdo/indexer/index.h
@@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
 void uds_free_index(struct uds_index *index);
 
 int __must_check uds_replace_index_storage(struct uds_index *index,
-					   struct block_device *bdev);
+					   struct file *bdev_file);
 
 void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
 
diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
index 3744aaf625b0..246ff2810e01 100644
--- a/drivers/md/dm-vdo/indexer/indexer.h
+++ b/drivers/md/dm-vdo/indexer/indexer.h
@@ -128,8 +128,8 @@ struct uds_volume_record {
 };
 
 struct uds_parameters {
-	/* The block_device used for storage */
-	struct block_device *bdev;
+	/* The bdev_file used for storage */
+	struct file *bdev_file;
 	/* The maximum allowable size of the index on storage */
 	size_t size;
 	/* The offset where the index should start */
@@ -314,7 +314,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
  * start using the new backing store instead.
  */
 int __must_check uds_resume_index_session(struct uds_index_session *session,
-					  struct block_device *bdev);
+					  struct file *bdev_file);
 
 /* Wait until all outstanding index operations are complete. */
 int __must_check uds_flush_index_session(struct uds_index_session *session);
diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
index 515765d35794..f4dedb7b7f40 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.c
+++ b/drivers/md/dm-vdo/indexer/io-factory.c
@@ -22,7 +22,7 @@
  * make helper structures that can be used to access sections of the index.
  */
 struct io_factory {
-	struct block_device *bdev;
+	struct file *bdev_file;
 	atomic_t ref_count;
 };
 
@@ -59,7 +59,7 @@ static void uds_get_io_factory(struct io_factory *factory)
 	atomic_inc(&factory->ref_count);
 }
 
-int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
+int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
 {
 	int result;
 	struct io_factory *factory;
@@ -68,16 +68,16 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
 	if (result != VDO_SUCCESS)
 		return result;
 
-	factory->bdev = bdev;
+	factory->bdev_file = bdev_file;
 	atomic_set_release(&factory->ref_count, 1);
 
 	*factory_ptr = factory;
 	return UDS_SUCCESS;
 }
 
-int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
+int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
 {
-	factory->bdev = bdev;
+	factory->bdev_file = bdev_file;
 	return UDS_SUCCESS;
 }
 
@@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
 
 size_t uds_get_writable_size(struct io_factory *factory)
 {
-	return i_size_read(factory->bdev->bd_inode);
+	return i_size_read(file_inode(factory->bdev_file));
 }
 
 /* Create a struct dm_bufio_client for an index region starting at offset. */
@@ -99,8 +99,9 @@ int uds_make_bufio(struct io_factory *factory, off_t block_offset, size_t block_
 {
 	struct dm_bufio_client *client;
 
-	client = dm_bufio_client_create(factory->bdev, block_size, reserved_buffers, 0,
-					NULL, NULL, 0);
+	client = dm_bufio_client_create(file_bdev(factory->bdev_file),
+					block_size, reserved_buffers,
+					0, NULL, NULL, 0);
 	if (IS_ERR(client))
 		return -PTR_ERR(client);
 
diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
index 7fb5a0616a79..a3ca84d62f2d 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.h
+++ b/drivers/md/dm-vdo/indexer/io-factory.h
@@ -24,11 +24,11 @@ enum {
 	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
 };
 
-int __must_check uds_make_io_factory(struct block_device *bdev,
+int __must_check uds_make_io_factory(struct file *bdev_file,
 				     struct io_factory **factory_ptr);
 
 int __must_check uds_replace_storage(struct io_factory *factory,
-				     struct block_device *bdev);
+				     struct file *bdev_file);
 
 void uds_put_io_factory(struct io_factory *factory);
 
diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
index 655453bb276b..edbe46252657 100644
--- a/drivers/md/dm-vdo/indexer/volume.c
+++ b/drivers/md/dm-vdo/indexer/volume.c
@@ -1465,12 +1465,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
 
 int __must_check uds_replace_volume_storage(struct volume *volume,
 					    struct index_layout *layout,
-					    struct block_device *bdev)
+					    struct file *bdev_file)
 {
 	int result;
 	u32 i;
 
-	result = uds_replace_index_layout_storage(layout, bdev);
+	result = uds_replace_index_layout_storage(layout, bdev_file);
 	if (result != UDS_SUCCESS)
 		return result;
 
diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
index 8679a5e55347..1dc3561b8b43 100644
--- a/drivers/md/dm-vdo/indexer/volume.h
+++ b/drivers/md/dm-vdo/indexer/volume.h
@@ -130,7 +130,7 @@ void uds_free_volume(struct volume *volume);
 
 int __must_check uds_replace_volume_storage(struct volume *volume,
 					    struct index_layout *layout,
-					    struct block_device *bdev);
+					    struct file *bdev_file);
 
 int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
 						    u64 *lowest_vcn, u64 *highest_vcn,
diff --git a/drivers/md/dm-vdo/vdo.c b/drivers/md/dm-vdo/vdo.c
index fff847767755..eca9f8b51535 100644
--- a/drivers/md/dm-vdo/vdo.c
+++ b/drivers/md/dm-vdo/vdo.c
@@ -809,7 +809,7 @@ void vdo_load_super_block(struct vdo *vdo, struct vdo_completion *parent)
  */
 struct block_device *vdo_get_backing_device(const struct vdo *vdo)
 {
-	return vdo->device_config->owned_device->bdev;
+	return file_bdev(vdo->device_config->owned_device->bdev_file);
 }
 
 /**
-- 
2.39.2



* [PATCH vfs.all 20/26] block: factor out a helper init_bdev_file()
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (18 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path Yu Kuai
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

There are no functional changes. The helper will be used in later
patches to initialize the stashed bdev_file as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 4335df6d1266..82fb1688f4c9 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -832,6 +832,20 @@ static void bdev_yield_write_access(struct file *bdev_file)
 		bdev->bd_writers--;
 }
 
+static void init_bdev_file(struct file *bdev_file, struct block_device *bdev,
+			   blk_mode_t mode, void *holder)
+{
+	bdev_file->f_flags |= O_LARGEFILE;
+	bdev_file->f_mode |= FMODE_CAN_ODIRECT;
+	if (bdev_nowait(bdev))
+		bdev_file->f_mode |= FMODE_NOWAIT;
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
+	bdev_file->f_mapping = bdev_mapping(bdev);
+	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+	bdev_file->private_data = holder;
+}
+
 /**
  * bdev_open - open a block device
  * @bdev: block device to open
@@ -905,15 +919,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	if (unblock_events)
 		disk_unblock_events(disk);
 
-	bdev_file->f_flags |= O_LARGEFILE;
-	bdev_file->f_mode |= FMODE_CAN_ODIRECT;
-	if (bdev_nowait(bdev))
-		bdev_file->f_mode |= FMODE_NOWAIT;
-	if (mode & BLK_OPEN_RESTRICT_WRITES)
-		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
-	bdev_file->f_mapping = bdev_mapping(bdev);
-	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
-	bdev_file->private_data = holder;
+	init_bdev_file(bdev_file, bdev, mode, holder);
 
 	return 0;
 put_module:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (19 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 20/26] block: factor out a helper init_bdev_file() Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-11  9:16   ` (subset) " Christian Brauner
  2024-04-06  9:09 ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw block_device Yu Kuai
                   ` (5 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

By the time bdev_may_open() is called, the module reference has already
been grabbed, so it must be released if bdev_may_open() fails.

This problem was found by code review.

Fixes: ed5cc702d311 ("block: Add config option to not allow writing to mounted devices")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bdev.c b/block/bdev.c
index 82fb1688f4c9..86db97b0709e 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -890,7 +890,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 		goto abort_claiming;
 	ret = -EBUSY;
 	if (!bdev_may_open(bdev, mode))
-		goto abort_claiming;
+		goto put_module;
 	if (bdev_is_partition(bdev))
 		ret = blkdev_get_part(bdev, mode);
 	else
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread
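
The one-line fix in patch 21 relies on the usual kernel convention that
cleanup labels are stacked in reverse order of acquisition, so a failure
must jump to the label matching the last resource it holds. Below is a
minimal user-space sketch of that idea, not kernel code: module_refs,
claims and model_open() are hypothetical stand-ins, and the model assumes
the claim is taken before the module reference, as the label order in the
diff suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel APIs. */
static int module_refs;	/* models the module reference count */
static int claims;	/* models an outstanding claim on the device */

/*
 * Model of the acquisition order assumed for bdev_open(): claim first,
 * then the module reference, then the bdev_may_open() check.
 * 'fixed' selects which label the failing check jumps to.
 */
static int model_open(bool may_open, bool fixed)
{
	int ret;

	claims++;		/* claim taken */
	module_refs++;		/* module reference taken */

	ret = -16;		/* -EBUSY */
	if (!may_open) {
		if (fixed)
			goto put_module;	/* behaviour after the patch */
		goto abort_claiming;		/* old behaviour: skips the put */
	}
	return 0;		/* success: both resources stay held */

put_module:
	module_refs--;
abort_claiming:
	claims--;
	return ret;
}
```

With the old jump target the claim is dropped but the module reference
stays elevated after a failed open; with the new target both counters
return to zero.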

* [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw block_device
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (20 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06 19:42   ` Al Viro
  2024-04-09 10:23   ` Christian Brauner
  2024-04-06  9:09 ` [PATCH vfs.all 23/26] iomap: add helpers to get and set bdev Yu Kuai
                   ` (4 subsequent siblings)
  26 siblings, 2 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

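Stash a bdev_file in bd_inode->i_private when the bdev is first opened,
and release it when the last opener closes the bdev, so that iomap and
buffer_head can be converted to use bdev_file in the following patches.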

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c              | 137 +++++++++++++++++++++++++++++---------
 include/linux/blk_types.h |   1 +
 2 files changed, 107 insertions(+), 31 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 86db97b0709e..3d300823da6b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -846,6 +846,101 @@ static void init_bdev_file(struct file *bdev_file, struct block_device *bdev,
 	bdev_file->private_data = holder;
 }
 
+/*
+ * If BLK_OPEN_WRITE_IOCTL is set then this is a historical quirk
+ * associated with the floppy driver where it has allowed ioctls if the
+ * file was opened for writing, but does not allow reads or writes.
+ * Make sure that this quirk is reflected in @f_flags.
+ *
+ * It can also happen if a block device is opened as O_RDWR | O_WRONLY.
+ */
+static unsigned blk_to_file_flags(blk_mode_t mode)
+{
+	unsigned int flags = 0;
+
+	if ((mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) ==
+	    (BLK_OPEN_READ | BLK_OPEN_WRITE))
+		flags |= O_RDWR;
+	else if (mode & BLK_OPEN_WRITE_IOCTL)
+		flags |= O_RDWR | O_WRONLY;
+	else if (mode & BLK_OPEN_WRITE)
+		flags |= O_WRONLY;
+	else if (mode & BLK_OPEN_READ)
+		flags |= O_RDONLY; /* homeopathic, because O_RDONLY is 0 */
+	else
+		WARN_ON_ONCE(true);
+
+	if (mode & BLK_OPEN_NDELAY)
+		flags |= O_NDELAY;
+
+	return flags;
+}
+
+static int __stash_bdev_file(struct block_device *bdev)
+{
+	struct inode *inode = bdev_inode(bdev);
+	unsigned int flags = blk_to_file_flags(BLK_OPEN_READ | BLK_OPEN_WRITE);
+	struct file *file;
+	static struct file_operations stash_fops;
+
+	file = inode->i_private;
+	if (!file) {
+		/*
+		 * This file is used for iomap/buffer_head for raw block_device
+		 * read/write operations to access block_device.
+		 */
+		file = alloc_file_pseudo_noaccount(bdev_inode(bdev),
+				blockdev_mnt, "", flags | O_LARGEFILE,
+				&stash_fops);
+
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		ihold(inode);
+		init_bdev_file(file, bdev, 0, NULL);
+
+		inode->i_private = file;
+		WARN_ON_ONCE(bdev->stash_count != 0);
+	}
+
+	bdev->stash_count++;
+	return 0;
+}
+
+static void __unstash_bdev_file(struct block_device *bdev)
+{
+
+	WARN_ON_ONCE(bdev->stash_count <= 0);
+	if (--bdev->stash_count == 0) {
+		struct inode *inode = bdev_inode(bdev);
+		struct file *file = inode->i_private;
+
+		inode->i_private = NULL;
+		fput(file);
+	}
+}
+
+static int stash_bdev_file(struct block_device *bdev)
+{
+	int ret = __stash_bdev_file(bdev);
+
+	if (ret || !bdev_is_partition(bdev))
+		return ret;
+
+	ret = __stash_bdev_file(bdev_whole(bdev));
+	if (ret)
+		__unstash_bdev_file(bdev);
+
+	return ret;
+}
+
+static void unstash_bdev_file(struct block_device *bdev)
+{
+	__unstash_bdev_file(bdev);
+	if (bdev_is_partition(bdev))
+		__unstash_bdev_file(bdev_whole(bdev));
+}
+
 /**
  * bdev_open - open a block device
  * @bdev: block device to open
@@ -891,12 +986,17 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	ret = -EBUSY;
 	if (!bdev_may_open(bdev, mode))
 		goto put_module;
+
+	ret = stash_bdev_file(bdev);
+	if (ret)
+		goto put_module;
+
 	if (bdev_is_partition(bdev))
 		ret = blkdev_get_part(bdev, mode);
 	else
 		ret = blkdev_get_whole(bdev, mode);
 	if (ret)
-		goto put_module;
+		goto unstash_bdev_file;
 	bdev_claim_write_access(bdev, mode);
 	if (holder) {
 		bd_finish_claiming(bdev, holder, hops);
@@ -922,6 +1022,9 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	init_bdev_file(bdev_file, bdev, mode, holder);
 
 	return 0;
+
+unstash_bdev_file:
+	unstash_bdev_file(bdev);
 put_module:
 	module_put(disk->fops->owner);
 abort_claiming:
@@ -932,36 +1035,6 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 	return ret;
 }
 
-/*
- * If BLK_OPEN_WRITE_IOCTL is set then this is a historical quirk
- * associated with the floppy driver where it has allowed ioctls if the
- * file was opened for writing, but does not allow reads or writes.
- * Make sure that this quirk is reflected in @f_flags.
- *
- * It can also happen if a block device is opened as O_RDWR | O_WRONLY.
- */
-static unsigned blk_to_file_flags(blk_mode_t mode)
-{
-	unsigned int flags = 0;
-
-	if ((mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) ==
-	    (BLK_OPEN_READ | BLK_OPEN_WRITE))
-		flags |= O_RDWR;
-	else if (mode & BLK_OPEN_WRITE_IOCTL)
-		flags |= O_RDWR | O_WRONLY;
-	else if (mode & BLK_OPEN_WRITE)
-		flags |= O_WRONLY;
-	else if (mode & BLK_OPEN_READ)
-		flags |= O_RDONLY; /* homeopathic, because O_RDONLY is 0 */
-	else
-		WARN_ON_ONCE(true);
-
-	if (mode & BLK_OPEN_NDELAY)
-		flags |= O_NDELAY;
-
-	return flags;
-}
-
 struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 				   const struct blk_holder_ops *hops)
 {
@@ -1073,6 +1146,8 @@ void bdev_release(struct file *bdev_file)
 		blkdev_put_part(bdev);
 	else
 		blkdev_put_whole(bdev);
+
+	unstash_bdev_file(bdev);
 	mutex_unlock(&disk->open_mutex);
 
 	module_put(disk->fops->owner);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cb1526ec44b5..22f736908cbe 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -70,6 +70,7 @@ struct block_device {
 #endif
 	bool			bd_ro_warned;
 	int			bd_writers;
+	int			stash_count;
 	/*
 	 * keep this out-of-line as it's both big and not needed in the fast
 	 * path
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread
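
The stash/unstash pairing in patch 22 is a counted lazy allocation: the
first opener creates the stashed file, later openers only bump
stash_count, and the last closer drops the file. A partition also takes a
count on its whole device and rolls back its own count if that second
step fails. The following is a rough user-space model of that logic under
stated assumptions: struct model_bdev and the model_* functions are
hypothetical, malloc()/free() stand in for the file allocation and fput(),
and locking is omitted entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; names mirror the patch but are not kernel code. */
struct model_bdev {
	void *stashed_file;		/* models bd_inode->i_private */
	int stash_count;		/* models bdev->stash_count */
	struct model_bdev *whole;	/* NULL for a whole device */
};

/* First caller allocates the stashed object, every caller takes a count. */
static int model_stash_one(struct model_bdev *bdev)
{
	if (!bdev->stashed_file) {
		bdev->stashed_file = malloc(1);	/* stands in for file allocation */
		if (!bdev->stashed_file)
			return -12;		/* -ENOMEM */
	}
	bdev->stash_count++;
	return 0;
}

/* Last caller to drop its count frees the stashed object. */
static void model_unstash_one(struct model_bdev *bdev)
{
	if (--bdev->stash_count == 0) {
		free(bdev->stashed_file);	/* stands in for fput() */
		bdev->stashed_file = NULL;
	}
}

/* A partition also pins its whole device; undo the first step on failure. */
static int model_stash(struct model_bdev *bdev)
{
	int ret = model_stash_one(bdev);

	if (ret || !bdev->whole)
		return ret;

	ret = model_stash_one(bdev->whole);
	if (ret)
		model_unstash_one(bdev);
	return ret;
}

static void model_unstash(struct model_bdev *bdev)
{
	model_unstash_one(bdev);
	if (bdev->whole)
		model_unstash_one(bdev->whole);
}
```

In this model, opening a partition and then its disk leaves the disk with
a count of two, so closing the partition alone does not free the disk's
stashed file.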

* [PATCH vfs.all 23/26] iomap: add helpers to get and set bdev
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (21 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw block_device Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 24/26] iomap: convert to use bdev_file Yu Kuai
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Add helpers so that callers have a unified API for getting and setting
the block device of an iomap. There are no functional changes; this
prepares for converting iomap to use bdev_file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/fops.c           |  2 +-
 fs/btrfs/inode.c       |  2 +-
 fs/buffer.c            |  2 +-
 fs/erofs/data.c        | 20 ++++++++++++++++----
 fs/erofs/internal.h    |  1 +
 fs/erofs/zmap.c        |  2 +-
 fs/ext2/inode.c        |  2 +-
 fs/ext4/inode.c        |  2 +-
 fs/f2fs/data.c         | 10 ++++++++--
 fs/f2fs/f2fs.h         |  1 +
 fs/fuse/dax.c          |  2 +-
 fs/gfs2/bmap.c         |  2 +-
 fs/hpfs/file.c         |  2 +-
 fs/iomap/buffered-io.c |  8 ++++----
 fs/iomap/direct-io.c   | 11 ++++++-----
 fs/iomap/swapfile.c    |  2 +-
 fs/iomap/trace.h       |  6 ++++--
 fs/xfs/xfs_iomap.c     |  4 ++--
 fs/zonefs/file.c       |  4 ++--
 include/linux/iomap.h  | 11 +++++++++++
 20 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 58b427051c0d..7d177be788cd 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -388,7 +388,7 @@ static int blkdev_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	struct block_device *bdev = I_BDEV(inode);
 	loff_t isize = i_size_read(inode);
 
-	iomap->bdev = bdev;
+	iomap_set_bdev_file(iomap, inode->i_private);
 	iomap->offset = ALIGN_DOWN(offset, bdev_logical_block_size(bdev));
 	if (iomap->offset >= isize)
 		return -EIO;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8cf692c708d7..e7495581bc58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7709,7 +7709,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 		iomap->type = IOMAP_MAPPED;
 	}
 	iomap->offset = start;
-	iomap->bdev = fs_info->fs_devices->latest_dev->bdev;
+	iomap_set_bdev_file(iomap, fs_info->fs_devices->latest_dev->bdev_file);
 	iomap->length = len;
 	free_extent_map(em);
 
diff --git a/fs/buffer.c b/fs/buffer.c
index 4f73d23c2c46..7900720fc54b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2005,7 +2005,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 {
 	loff_t offset = (loff_t)block << inode->i_blkbits;
 
-	bh->b_bdev = iomap->bdev;
+	bh->b_bdev = iomap_bdev(iomap);
 
 	/*
 	 * Block points to offset in file we need to map, iomap contains
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index b0a55b4d8c30..ea149cfef88e 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -204,6 +204,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 	int id;
 
 	map->m_bdev = sb->s_bdev;
+	map->m_bdev_file = sb->s_bdev_file;
 	map->m_daxdev = EROFS_SB(sb)->dax_dev;
 	map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
 	map->m_fscache = EROFS_SB(sb)->s_fscache;
@@ -220,7 +221,13 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 			up_read(&devs->rwsem);
 			return 0;
 		}
-		map->m_bdev = dif->bdev_file ? file_bdev(dif->bdev_file) : NULL;
+		if (dif->bdev_file) {
+			map->m_bdev = file_bdev(dif->bdev_file);
+			map->m_bdev_file = dif->bdev_file;
+		} else {
+			map->m_bdev = NULL;
+			map->m_bdev_file = NULL;
+		}
 		map->m_daxdev = dif->dax_dev;
 		map->m_dax_part_off = dif->dax_part_off;
 		map->m_fscache = dif->fscache;
@@ -238,8 +245,13 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
 			if (map->m_pa >= startoff &&
 			    map->m_pa < startoff + length) {
 				map->m_pa -= startoff;
-				map->m_bdev = dif->bdev_file ?
-					      file_bdev(dif->bdev_file) : NULL;
+				if (dif->bdev_file) {
+					map->m_bdev = file_bdev(dif->bdev_file);
+					map->m_bdev_file = dif->bdev_file;
+				} else {
+					map->m_bdev = NULL;
+					map->m_bdev_file = NULL;
+				}
 				map->m_daxdev = dif->dax_dev;
 				map->m_dax_part_off = dif->dax_part_off;
 				map->m_fscache = dif->fscache;
@@ -278,7 +290,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = mdev.m_daxdev;
 	else
-		iomap->bdev = mdev.m_bdev;
+		iomap_set_bdev_file(iomap, mdev.m_bdev_file);
 	iomap->length = map.m_llen;
 	iomap->flags = 0;
 	iomap->private = NULL;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 39c67119f43b..a91481178876 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -378,6 +378,7 @@ enum {
 struct erofs_map_dev {
 	struct erofs_fscache *m_fscache;
 	struct block_device *m_bdev;
+	struct file *m_bdev_file;
 	struct dax_device *m_daxdev;
 	u64 m_dax_part_off;
 
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index e313c936351d..71e6c5342d72 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -739,7 +739,7 @@ static int z_erofs_iomap_begin_report(struct inode *inode, loff_t offset,
 	if (ret < 0)
 		return ret;
 
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 	iomap->offset = map.m_la;
 	iomap->length = map.m_llen;
 	if (map.m_flags & EROFS_MAP_MAPPED) {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f3d570a9302b..6286d1578426 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -842,7 +842,7 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = sbi->s_daxdev;
 	else
-		iomap->bdev = inode->i_sb->s_bdev;
+		iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 
 	if (ret == 0) {
 		/*
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..588af2604bb8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3235,7 +3235,7 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	if (flags & IOMAP_DAX)
 		iomap->dax_dev = EXT4_SB(inode->i_sb)->s_daxdev;
 	else
-		iomap->bdev = inode->i_sb->s_bdev;
+		iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 	iomap->offset = (u64) map->m_lblk << blkbits;
 	iomap->length = (u64) map->m_len << blkbits;
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d9494b5fc7c1..8002a5b511d9 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1499,10 +1499,12 @@ static bool f2fs_map_blocks_cached(struct inode *inode,
 		struct f2fs_dev_info *dev = &sbi->devs[bidx];
 
 		map->m_bdev = dev->bdev;
+		map->m_bdev_file = dev->bdev_file;
 		map->m_pblk -= dev->start_blk;
 		map->m_len = min(map->m_len, dev->end_blk + 1 - map->m_pblk);
 	} else {
 		map->m_bdev = inode->i_sb->s_bdev;
+		map->m_bdev_file = inode->i_sb->s_bdev_file;
 	}
 	return true;
 }
@@ -1534,6 +1536,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		goto out;
 
 	map->m_bdev = inode->i_sb->s_bdev;
+	map->m_bdev_file = inode->i_sb->s_bdev_file;
 	map->m_multidev_dio =
 		f2fs_allow_multi_device_dio(F2FS_I_SB(inode), flag);
 
@@ -1651,8 +1654,10 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 		map->m_pblk = blkaddr;
 		map->m_len = 1;
 
-		if (map->m_multidev_dio)
+		if (map->m_multidev_dio) {
 			map->m_bdev = FDEV(bidx).bdev;
+			map->m_bdev_file = FDEV(bidx).bdev_file;
+		}
 	} else if ((map->m_pblk != NEW_ADDR &&
 			blkaddr == (map->m_pblk + ofs)) ||
 			(map->m_pblk == NEW_ADDR && blkaddr == NEW_ADDR) ||
@@ -1725,6 +1730,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 			bidx = f2fs_target_device_index(sbi, map->m_pblk);
 
 			map->m_bdev = FDEV(bidx).bdev;
+			map->m_bdev_file = FDEV(bidx).bdev_file;
 			map->m_pblk -= FDEV(bidx).start_blk;
 
 			if (map->m_may_create)
@@ -4189,7 +4195,7 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->length = blks_to_bytes(inode, map.m_len);
 		iomap->type = IOMAP_MAPPED;
 		iomap->flags |= IOMAP_F_MERGED;
-		iomap->bdev = map.m_bdev;
+		iomap_set_bdev_file(iomap, map.m_bdev_file);
 		iomap->addr = blks_to_bytes(inode, map.m_pblk);
 	} else {
 		if (flags & IOMAP_WRITE)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index fced2b7652f4..49894ac4f7ff 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -699,6 +699,7 @@ struct extent_tree_info {
 
 struct f2fs_map_blocks {
 	struct block_device *m_bdev;	/* for multi-device dio */
+	struct file *m_bdev_file;	/* for multi-device dio */
 	block_t m_pblk;
 	block_t m_lblk;
 	unsigned int m_len;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 12ef91d170bb..1fbc1c5688ca 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -575,7 +575,7 @@ static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
 
 	iomap->offset = pos;
 	iomap->flags = 0;
-	iomap->bdev = NULL;
+	iomap_set_bdev_file(iomap, NULL);
 	iomap->dax_dev = fc->dax->dev;
 
 	/*
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 789af5c8fade..20eb2db774b0 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -926,7 +926,7 @@ static int __gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
 		iomap->flags |= IOMAP_F_GFS2_BOUNDARY;
 
 out:
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 unlock:
 	up_read(&ip->i_rw_mutex);
 	return ret;
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index 1bb8d97cd9ae..77c01a9252c7 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -128,7 +128,7 @@ static int hpfs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (WARN_ON_ONCE(flags & (IOMAP_WRITE | IOMAP_ZERO)))
 		return -EINVAL;
 
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 	iomap->offset = offset;
 
 	hpfs_lock(sb);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..66a83c84d11d 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -415,7 +415,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 
 		if (ctx->rac) /* same as readahead_gfp_mask */
 			gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		ctx->bio = bio_alloc(iomap->bdev, bio_max_segs(nr_vecs),
+		ctx->bio = bio_alloc(iomap_bdev(iomap), bio_max_segs(nr_vecs),
 				     REQ_OP_READ, gfp);
 		/*
 		 * If the bio_alloc fails, try it again for a single page to
@@ -423,7 +423,7 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
 		 * what do_mpage_read_folio does.
 		 */
 		if (!ctx->bio) {
-			ctx->bio = bio_alloc(iomap->bdev, 1, REQ_OP_READ,
+			ctx->bio = bio_alloc(iomap_bdev(iomap), 1, REQ_OP_READ,
 					     orig_gfp);
 		}
 		if (ctx->rac)
@@ -662,7 +662,7 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
 	struct bio_vec bvec;
 	struct bio bio;
 
-	bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
+	bio_init(&bio, iomap_bdev(iomap), &bvec, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
 	bio_add_folio_nofail(&bio, folio, plen, poff);
 	return submit_bio_wait(&bio);
@@ -1684,7 +1684,7 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
 	struct iomap_ioend *ioend;
 	struct bio *bio;
 
-	bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
+	bio = bio_alloc_bioset(iomap_bdev(&wpc->iomap), BIO_MAX_VECS,
 			       REQ_OP_WRITE | wbc_to_write_flags(wbc),
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f3b43d223a46..3e9f54727326 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -56,9 +56,9 @@ static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
 		struct iomap_dio *dio, unsigned short nr_vecs, blk_opf_t opf)
 {
 	if (dio->dops && dio->dops->bio_set)
-		return bio_alloc_bioset(iter->iomap.bdev, nr_vecs, opf,
+		return bio_alloc_bioset(iomap_bdev(&iter->iomap), nr_vecs, opf,
 					GFP_KERNEL, dio->dops->bio_set);
-	return bio_alloc(iter->iomap.bdev, nr_vecs, opf, GFP_KERNEL);
+	return bio_alloc(iomap_bdev(&iter->iomap), nr_vecs, opf, GFP_KERNEL);
 }
 
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
@@ -288,8 +288,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 	size_t copied = 0;
 	size_t orig_count;
 
-	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
-	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
+	if ((pos | length) & (bdev_logical_block_size(iomap_bdev(iomap)) - 1) ||
+	    !bdev_iter_is_aligned(iomap_bdev(iomap), dio->submit.iter))
 		return -EINVAL;
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
@@ -316,7 +316,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
+		    (bdev_fua(iomap_bdev(iomap)) ||
+		     !bdev_write_cache(iomap_bdev(iomap))))
 			use_fua = true;
 		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
 			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c
index 5fc0ac36dee3..20bd67e85d15 100644
--- a/fs/iomap/swapfile.c
+++ b/fs/iomap/swapfile.c
@@ -116,7 +116,7 @@ static loff_t iomap_swapfile_iter(const struct iomap_iter *iter,
 		return iomap_swapfile_fail(isi, "has shared extents");
 
 	/* Only one bdev per swap file. */
-	if (iomap->bdev != isi->sis->bdev)
+	if (iomap_bdev(iomap) != isi->sis->bdev)
 		return iomap_swapfile_fail(isi, "outside the main device");
 
 	if (isi->iomap.length == 0) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 0a991c4ce87d..39ac91fd4a50 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -134,7 +134,8 @@ DECLARE_EVENT_CLASS(iomap_class,
 		__entry->length = iomap->length;
 		__entry->type = iomap->type;
 		__entry->flags = iomap->flags;
-		__entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+		__entry->bdev = iomap_bdev(iomap) ?
+				iomap_bdev(iomap)->bd_dev : 0;
 	),
 	TP_printk("dev %d:%d ino 0x%llx bdev %d:%d addr 0x%llx offset 0x%llx "
 		  "length 0x%llx type %s flags %s",
@@ -181,7 +182,8 @@ TRACE_EVENT(iomap_writepage_map,
 		__entry->length = iomap->length;
 		__entry->type = iomap->type;
 		__entry->flags = iomap->flags;
-		__entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+		__entry->bdev = iomap_bdev(iomap) ?
+				iomap_bdev(iomap)->bd_dev : 0;
 	),
 	TP_printk("dev %d:%d ino 0x%llx bdev %d:%d pos 0x%llx dirty len 0x%llx "
 		  "addr 0x%llx offset 0x%llx length 0x%llx type %s flags %s",
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4087af7f3c9f..cb4ac7129bce 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -129,7 +129,7 @@ xfs_bmbt_to_iomap(
 	if (mapping_flags & IOMAP_DAX)
 		iomap->dax_dev = target->bt_daxdev;
 	else
-		iomap->bdev = target->bt_bdev;
+		iomap_set_bdev_file(iomap, target->bt_bdev_file);
 	iomap->flags = iomap_flags;
 
 	if (xfs_ipincount(ip) &&
@@ -154,7 +154,7 @@ xfs_hole_to_iomap(
 	iomap->type = IOMAP_HOLE;
 	iomap->offset = XFS_FSB_TO_B(ip->i_mount, offset_fsb);
 	iomap->length = XFS_FSB_TO_B(ip->i_mount, end_fsb - offset_fsb);
-	iomap->bdev = target->bt_bdev;
+	iomap_set_bdev_file(iomap, target->bt_bdev_file);
 	iomap->dax_dev = target->bt_daxdev;
 }
 
diff --git a/fs/zonefs/file.c b/fs/zonefs/file.c
index 3b103715acc9..34100c6e008d 100644
--- a/fs/zonefs/file.c
+++ b/fs/zonefs/file.c
@@ -38,7 +38,7 @@ static int zonefs_read_iomap_begin(struct inode *inode, loff_t offset,
 	 * act as if there is a hole up to the file maximum size.
 	 */
 	mutex_lock(&zi->i_truncate_mutex);
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 	iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize);
 	isize = i_size_read(inode);
 	if (iomap->offset >= isize) {
@@ -88,7 +88,7 @@ static int zonefs_write_iomap_begin(struct inode *inode, loff_t offset,
 	 * write pointer) and unwriten beyond.
 	 */
 	mutex_lock(&zi->i_truncate_mutex);
-	iomap->bdev = inode->i_sb->s_bdev;
+	iomap_set_bdev_file(iomap, inode->i_sb->s_bdev_file);
 	iomap->offset = ALIGN_DOWN(offset, sb->s_blocksize);
 	iomap->addr = (z->z_sector << SECTOR_SHIFT) + iomap->offset;
 	isize = i_size_read(inode);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..8ae384f0eeb1 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -105,6 +105,17 @@ struct iomap {
 	u64			validity_cookie; /* used with .iomap_valid() */
 };
 
+static inline struct block_device *iomap_bdev(const struct iomap *iomap)
+{
+	return iomap->bdev;
+}
+
+static inline void iomap_set_bdev_file(struct iomap *iomap,
+				       struct file *bdev_file)
+{
+	iomap->bdev = bdev_file ? file_bdev(bdev_file) : NULL;
+}
+
 static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
 {
 	return (iomap->addr + pos - iomap->offset) >> SECTOR_SHIFT;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH vfs.all 24/26] iomap: convert to use bdev_file
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (22 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 23/26] iomap: add helpers to get and set bdev Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 25/26] buffer: add helpers to get and set bdev Yu Kuai
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

With the previous commit, both filesystems and raw block devices provide
a bdev_file when initializing an iomap, so it is safe to convert iomap to
use bdev_file. This prepares for removing bd_inode from block_device once
buffer_head is converted to use bdev_file as well.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 include/linux/iomap.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 8ae384f0eeb1..1386f3a618fe 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -97,7 +97,7 @@ struct iomap {
 	u64			length;	/* length of mapping, bytes */
 	u16			type;	/* type of mapping */
 	u16			flags;	/* flags for mapping */
-	struct block_device	*bdev;	/* block device for I/O */
+	struct file		*bdev_file; /* block device for I/O */
 	struct dax_device	*dax_dev; /* dax_dev for dax operations */
 	void			*inline_data;
 	void			*private; /* filesystem private */
@@ -107,13 +107,13 @@ struct iomap {
 
 static inline struct block_device *iomap_bdev(const struct iomap *iomap)
 {
-	return iomap->bdev;
+	return iomap->bdev_file ? file_bdev(iomap->bdev_file) : NULL;
 }
 
 static inline void iomap_set_bdev_file(struct iomap *iomap,
 				       struct file *bdev_file)
 {
-	iomap->bdev = bdev_file ? file_bdev(bdev_file) : NULL;
+	iomap->bdev_file = bdev_file;
 }
 
 static inline sector_t iomap_sector(const struct iomap *iomap, loff_t pos)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread
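
Patches 23 and 24 together follow a two-step pattern: first route every
access through a getter and a setter, then change the stored field behind
those helpers without touching the callers again. A small user-space
sketch of the end state is shown below. All model_* types and functions
are hypothetical stand-ins for struct iomap, struct file, file_bdev(),
iomap_bdev() and iomap_set_bdev_file(); they only illustrate the shape of
the helpers, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct model_bdev { int dev; };
struct model_file { struct model_bdev *bdev; };

struct model_iomap {
	struct model_file *bdev_file;	/* previously a block device pointer */
};

/* Stands in for file_bdev(): resolve an opened file to its device. */
static struct model_bdev *model_file_bdev(struct model_file *file)
{
	return file->bdev;
}

/* Getter: callers keep receiving a block device pointer (or NULL). */
static struct model_bdev *model_iomap_bdev(const struct model_iomap *iomap)
{
	return iomap->bdev_file ? model_file_bdev(iomap->bdev_file) : NULL;
}

/* Setter: callers hand over the opened file, or NULL for no device. */
static void model_iomap_set_bdev_file(struct model_iomap *iomap,
				      struct model_file *bdev_file)
{
	iomap->bdev_file = bdev_file;
}
```

Because callers only see the getter's return type, the switch from storing
a device pointer to storing a file pointer is confined to the two helpers
and the struct definition, which is why patch 24 touches a single header.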

* [PATCH vfs.all 25/26] buffer: add helpers to get and set bdev
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (23 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 24/26] iomap: convert to use bdev_file Yu Kuai
@ 2024-04-06  9:09 ` Yu Kuai
  2024-04-06  9:09 ` [PATCH vfs.all 26/26] buffer: convert to use bdev_file Yu Kuai
  2024-04-07  2:20 ` [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
  26 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-06  9:09 UTC (permalink / raw)
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

Add helpers so that callers have a unified API for getting and setting
the block device of a buffer_head. There are no functional changes; this
prepares for converting buffer_head to use bdev_file.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/fops.c                  |  2 +-
 drivers/md/md-bitmap.c        |  2 +-
 fs/affs/file.c                |  2 +-
 fs/buffer.c                   | 10 +++++-----
 fs/direct-io.c                |  4 ++--
 fs/ext2/xattr.c               |  2 +-
 fs/ext4/mmp.c                 |  2 +-
 fs/ext4/page-io.c             |  5 ++---
 fs/ext4/xattr.c               |  2 +-
 fs/gfs2/aops.c                |  2 +-
 fs/gfs2/meta_io.c             |  2 +-
 fs/jbd2/commit.c              |  2 +-
 fs/jbd2/journal.c             |  2 +-
 fs/jbd2/transaction.c         |  8 ++++----
 fs/mpage.c                    | 10 +++++-----
 fs/nilfs2/btnode.c            |  4 ++--
 fs/nilfs2/gcinode.c           |  2 +-
 fs/nilfs2/mdt.c               |  2 +-
 fs/nilfs2/page.c              |  4 ++--
 fs/ntfs3/inode.c              |  2 +-
 fs/reiserfs/fix_node.c        |  2 +-
 fs/reiserfs/journal.c         |  2 +-
 fs/reiserfs/prints.c          |  4 ++--
 fs/reiserfs/stree.c           |  2 +-
 fs/reiserfs/tail_conversion.c |  2 +-
 include/linux/buffer_head.h   | 20 +++++++++++++++++++-
 include/trace/events/block.h  |  2 +-
 27 files changed, 61 insertions(+), 44 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 7d177be788cd..edae216e31dd 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -407,7 +407,7 @@ static const struct iomap_ops blkdev_iomap_ops = {
 static int blkdev_get_block(struct inode *inode, sector_t iblock,
 		struct buffer_head *bh, int create)
 {
-	bh->b_bdev = I_BDEV(inode);
+	bh_set_bdev_file(bh, inode->i_private);
 	bh->b_blocknr = iblock;
 	set_buffer_mapped(bh);
 	return 0;
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 059afc24c08b..fd6c95e0c625 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -381,7 +381,7 @@ static int read_file_page(struct file *file, unsigned long index,
 			}
 
 			bh->b_blocknr = block;
-			bh->b_bdev = inode->i_sb->s_bdev;
+			bh_set_bdev_file(bh, inode->i_sb->s_bdev_file);
 			if (count < blocksize)
 				count = 0;
 			else
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 04c018e19602..f15b24202aab 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -365,7 +365,7 @@ affs_get_block(struct inode *inode, sector_t block, struct buffer_head *bh_resul
 err_alloc:
 	brelse(ext_bh);
 	clear_buffer_mapped(bh_result);
-	bh_result->b_bdev = NULL;
+	bh_set_bdev_file(bh_result, NULL);
 	// unlock cache
 	affs_unlock_ext(inode);
 	return -ENOSPC;
diff --git a/fs/buffer.c b/fs/buffer.c
index 7900720fc54b..e4d74eb63265 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -129,7 +129,7 @@ static void buffer_io_error(struct buffer_head *bh, char *msg)
 	if (!test_bit(BH_Quiet, &bh->b_state))
 		printk_ratelimited(KERN_ERR
 			"Buffer I/O error on dev %pg, logical block %llu%s\n",
-			bh->b_bdev, (unsigned long long)bh->b_blocknr, msg);
+			bh_bdev(bh), (unsigned long long)bh->b_blocknr, msg);
 }
 
 /*
@@ -1367,7 +1367,7 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 	for (i = 0; i < BH_LRU_SIZE; i++) {
 		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);
 
-		if (bh && bh->b_blocknr == block && bh->b_bdev == bdev &&
+		if (bh && bh->b_blocknr == block && bh_bdev(bh) == bdev &&
 		    bh->b_size == size) {
 			if (i) {
 				while (i) {
@@ -1564,7 +1564,7 @@ static void discard_buffer(struct buffer_head * bh)
 
 	lock_buffer(bh);
 	clear_buffer_dirty(bh);
-	bh->b_bdev = NULL;
+	bh_set_bdev_file(bh, NULL);
 	b_state = READ_ONCE(bh->b_state);
 	do {
 	} while (!try_cmpxchg(&bh->b_state, &b_state,
@@ -2005,7 +2005,7 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 {
 	loff_t offset = (loff_t)block << inode->i_blkbits;
 
-	bh->b_bdev = iomap_bdev(iomap);
+	bh_set_bdev_file(bh, iomap->bdev_file);
 
 	/*
 	 * Block points to offset in file we need to map, iomap contains
@@ -2781,7 +2781,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 	if (buffer_prio(bh))
 		opf |= REQ_PRIO;
 
-	bio = bio_alloc(bh->b_bdev, 1, opf, GFP_NOIO);
+	bio = bio_alloc(bh_bdev(bh), 1, opf, GFP_NOIO);
 
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 62c97ff9e852..49475f530e0f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -673,7 +673,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
 	sector = start_sector << (sdio->blkbits - 9);
 	nr_pages = bio_max_segs(sdio->pages_in_io);
 	BUG_ON(nr_pages <= 0);
-	dio_bio_alloc(dio, sdio, map_bh->b_bdev, sector, nr_pages);
+	dio_bio_alloc(dio, sdio, bh_bdev(map_bh), sector, nr_pages);
 	sdio->boundary = 0;
 out:
 	return ret;
@@ -948,7 +948,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 					map_bh->b_blocknr << sdio->blkfactor;
 				if (buffer_new(map_bh)) {
 					clean_bdev_aliases(
-						map_bh->b_bdev,
+						bh_bdev(map_bh),
 						map_bh->b_blocknr,
 						map_bh->b_size >> i_blkbits);
 				}
diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c
index c885dcc3bd0d..42e595e87a74 100644
--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -80,7 +80,7 @@
 	} while (0)
 # define ea_bdebug(bh, f...) do { \
 		printk(KERN_DEBUG "block %pg:%lu: ", \
-			bh->b_bdev, (unsigned long) bh->b_blocknr); \
+			bh_bdev(bh), (unsigned long) bh->b_blocknr); \
 		printk(f); \
 		printk("\n"); \
 	} while (0)
diff --git a/fs/ext4/mmp.c b/fs/ext4/mmp.c
index bd946d0c71b7..5641bd34d021 100644
--- a/fs/ext4/mmp.c
+++ b/fs/ext4/mmp.c
@@ -384,7 +384,7 @@ int ext4_multi_mount_protect(struct super_block *sb,
 
 	BUILD_BUG_ON(sizeof(mmp->mmp_bdevname) < BDEVNAME_SIZE);
 	snprintf(mmp->mmp_bdevname, sizeof(mmp->mmp_bdevname),
-		 "%pg", bh->b_bdev);
+		 "%pg", bh_bdev(bh));
 
 	/*
 	 * Start a kernel thread to update the MMP block periodically.
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 312bc6813357..1b02b6a28eca 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -93,8 +93,7 @@ struct ext4_io_end_vec *ext4_last_io_end_vec(ext4_io_end_t *io_end)
 static void buffer_io_error(struct buffer_head *bh)
 {
 	printk_ratelimited(KERN_ERR "Buffer I/O error on device %pg, logical block %llu\n",
-		       bh->b_bdev,
-			(unsigned long long)bh->b_blocknr);
+			   bh_bdev(bh), (unsigned long long)bh->b_blocknr);
 }
 
 static void ext4_finish_bio(struct bio *bio)
@@ -397,7 +396,7 @@ static void io_submit_init_bio(struct ext4_io_submit *io,
 	 * bio_alloc will _always_ be able to allocate a bio if
 	 * __GFP_DIRECT_RECLAIM is set, see comments for bio_alloc_bioset().
 	 */
-	bio = bio_alloc(bh->b_bdev, BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOIO);
+	bio = bio_alloc(bh_bdev(bh), BIO_MAX_VECS, REQ_OP_WRITE, GFP_NOIO);
 	fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio->bi_end_io = ext4_end_bio;
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index b67a176bfcf9..005af215e24a 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -68,7 +68,7 @@
 	       inode->i_sb->s_id, inode->i_ino, ##__VA_ARGS__)
 # define ea_bdebug(bh, fmt, ...)					\
 	printk(KERN_DEBUG "block %pg:%lu: " fmt "\n",			\
-	       bh->b_bdev, (unsigned long)bh->b_blocknr, ##__VA_ARGS__)
+	       bh_bdev(bh), (unsigned long)bh->b_blocknr, ##__VA_ARGS__)
 #else
 # define ea_idebug(inode, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
 # define ea_bdebug(bh, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 974aca9c8ea8..24b6cf9021ca 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -622,7 +622,7 @@ static void gfs2_discard(struct gfs2_sbd *sdp, struct buffer_head *bh)
 			spin_unlock(&sdp->sd_ail_lock);
 		}
 	}
-	bh->b_bdev = NULL;
+	bh_set_bdev_file(bh, NULL);
 	clear_buffer_mapped(bh);
 	clear_buffer_req(bh);
 	clear_buffer_new(bh);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index f814054c8cd0..2052d3fc2c24 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -218,7 +218,7 @@ static void gfs2_submit_bhs(blk_opf_t opf, struct buffer_head *bhs[], int num)
 		struct buffer_head *bh = *bhs;
 		struct bio *bio;
 
-		bio = bio_alloc(bh->b_bdev, num, opf, GFP_NOIO);
+		bio = bio_alloc(bh_bdev(bh), num, opf, GFP_NOIO);
 		bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 		while (num > 0) {
 			bh = *bhs;
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 5e122586e06e..413f32b2f308 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -1014,7 +1014,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 				clear_buffer_mapped(bh);
 				clear_buffer_new(bh);
 				clear_buffer_req(bh);
-				bh->b_bdev = NULL;
+				bh_set_bdev_file(bh, NULL);
 			}
 		}
 
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index abd42a6ccd0e..c1ce32d99267 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -434,7 +434,7 @@ int jbd2_journal_write_metadata_buffer(transaction_t *transaction,
 
 	folio_set_bh(new_bh, new_folio, new_offset);
 	new_bh->b_size = bh_in->b_size;
-	new_bh->b_bdev = journal->j_dev;
+	bh_set_bdev_file(new_bh, journal->j_dev_file);
 	new_bh->b_blocknr = blocknr;
 	new_bh->b_private = bh_in;
 	set_buffer_mapped(new_bh);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index cb0b8d6fc0c6..04021f54ca97 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -929,7 +929,7 @@ static void warn_dirty_buffer(struct buffer_head *bh)
 	       "JBD2: Spotted dirty metadata buffer (dev = %pg, blocknr = %llu). "
 	       "There's a risk of filesystem corruption in case of system "
 	       "crash.\n",
-	       bh->b_bdev, (unsigned long long)bh->b_blocknr);
+	       bh_bdev(bh), (unsigned long long)bh->b_blocknr);
 }
 
 /* Call t_frozen trigger and copy buffer data into jh->b_frozen_data. */
@@ -990,7 +990,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	/* If it takes too long to lock the buffer, trace it */
 	time_lock = jbd2_time_diff(start_lock, jiffies);
 	if (time_lock > HZ/10)
-		trace_jbd2_lock_buffer_stall(bh->b_bdev->bd_dev,
+		trace_jbd2_lock_buffer_stall(bh_bdev(bh)->bd_dev,
 			jiffies_to_msecs(time_lock));
 
 	/* We now hold the buffer lock so it is safe to query the buffer
@@ -2374,7 +2374,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 			write_unlock(&journal->j_state_lock);
 			jbd2_journal_put_journal_head(jh);
 			/* Already zapped buffer? Nothing to do... */
-			if (!bh->b_bdev)
+			if (!bh_bdev(bh))
 				return 0;
 			return -EBUSY;
 		}
@@ -2428,7 +2428,7 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh,
 	clear_buffer_new(bh);
 	clear_buffer_delay(bh);
 	clear_buffer_unwritten(bh);
-	bh->b_bdev = NULL;
+	bh_set_bdev_file(bh, NULL);
 	return may_free;
 }
 
diff --git a/fs/mpage.c b/fs/mpage.c
index fa8b99a199fa..40594afa63cb 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -126,7 +126,7 @@ static void map_buffer_to_folio(struct folio *folio, struct buffer_head *bh,
 	do {
 		if (block == page_block) {
 			page_bh->b_state = bh->b_state;
-			page_bh->b_bdev = bh->b_bdev;
+			bh_copy_bdev_file(page_bh, bh);
 			page_bh->b_blocknr = bh->b_blocknr;
 			break;
 		}
@@ -216,7 +216,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			page_block++;
 			block_in_file++;
 		}
-		bdev = map_bh->b_bdev;
+		bdev = bh_bdev(map_bh);
 	}
 
 	/*
@@ -272,7 +272,7 @@ static struct bio *do_mpage_readpage(struct mpage_readpage_args *args)
 			page_block++;
 			block_in_file++;
 		}
-		bdev = map_bh->b_bdev;
+		bdev = bh_bdev(map_bh);
 	}
 
 	if (first_hole != blocks_per_page) {
@@ -515,7 +515,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 				boundary_block = bh->b_blocknr;
 				boundary_bdev = bh->b_bdev;
 			}
-			bdev = bh->b_bdev;
+			bdev = bh_bdev(bh);
 		} while ((bh = bh->b_this_page) != head);
 
 		if (first_unmapped)
@@ -565,7 +565,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 		}
 		page_block++;
 		boundary = buffer_boundary(&map_bh);
-		bdev = map_bh.b_bdev;
+		bdev = bh_bdev(&map_bh);
 		if (block_in_file == last_block)
 			break;
 		block_in_file++;
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..3f81d00fc031 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -59,7 +59,7 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
 		BUG();
 	}
 	memset(bh->b_data, 0, i_blocksize(inode));
-	bh->b_bdev = inode->i_sb->s_bdev;
+	bh_set_bdev_file(bh, inode->i_sb->s_bdev_file);
 	bh->b_blocknr = blocknr;
 	set_buffer_mapped(bh);
 	set_buffer_uptodate(bh);
@@ -118,7 +118,7 @@ int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
 		goto found;
 	}
 	set_buffer_mapped(bh);
-	bh->b_bdev = inode->i_sb->s_bdev;
+	bh_set_bdev_file(bh, inode->i_sb->s_bdev_file);
 	bh->b_blocknr = pblocknr; /* set block address for read */
 	bh->b_end_io = end_buffer_read_sync;
 	get_bh(bh);
diff --git a/fs/nilfs2/gcinode.c b/fs/nilfs2/gcinode.c
index bf9a11d58817..83d2b5e034ad 100644
--- a/fs/nilfs2/gcinode.c
+++ b/fs/nilfs2/gcinode.c
@@ -84,7 +84,7 @@ int nilfs_gccache_submit_read_data(struct inode *inode, sector_t blkoff,
 	}
 
 	if (!buffer_mapped(bh)) {
-		bh->b_bdev = inode->i_sb->s_bdev;
+		bh_set_bdev_file(bh, inode->i_sb->s_bdev_file);
 		set_buffer_mapped(bh);
 	}
 	bh->b_blocknr = pbn;
diff --git a/fs/nilfs2/mdt.c b/fs/nilfs2/mdt.c
index 4f792a0ad0f0..10f33017a1c9 100644
--- a/fs/nilfs2/mdt.c
+++ b/fs/nilfs2/mdt.c
@@ -89,7 +89,7 @@ static int nilfs_mdt_create_block(struct inode *inode, unsigned long block,
 	if (buffer_uptodate(bh))
 		goto failed_bh;
 
-	bh->b_bdev = sb->s_bdev;
+	bh_set_bdev_file(bh, sb->s_bdev_file);
 	err = nilfs_mdt_insert_new_block(inode, block, bh, init_block);
 	if (likely(!err)) {
 		get_bh(bh);
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 14e470fb8870..b6cc95dd13c0 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -111,7 +111,7 @@ void nilfs_copy_buffer(struct buffer_head *dbh, struct buffer_head *sbh)
 
 	dbh->b_state = sbh->b_state & NILFS_BUFFER_INHERENT_BITS;
 	dbh->b_blocknr = sbh->b_blocknr;
-	dbh->b_bdev = sbh->b_bdev;
+	bh_copy_bdev_file(dbh, sbh);
 
 	bh = dbh;
 	bits = sbh->b_state & (BIT(BH_Uptodate) | BIT(BH_Mapped));
@@ -216,7 +216,7 @@ static void nilfs_copy_folio(struct folio *dst, struct folio *src,
 		lock_buffer(dbh);
 		dbh->b_state = sbh->b_state & mask;
 		dbh->b_blocknr = sbh->b_blocknr;
-		dbh->b_bdev = sbh->b_bdev;
+		bh_copy_bdev_file(dbh, sbh);
 		sbh = sbh->b_this_page;
 		dbh = dbh->b_this_page;
 	} while (dbh != dbufs);
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 3c4c878f6d77..c795fd2000ee 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -609,7 +609,7 @@ static noinline int ntfs_get_block_vbo(struct inode *inode, u64 vbo,
 	lbo = ((u64)lcn << cluster_bits) + off;
 
 	set_buffer_mapped(bh);
-	bh->b_bdev = sb->s_bdev;
+	bh_set_bdev_file(bh, sb->s_bdev_file);
 	bh->b_blocknr = lbo >> sb->s_blocksize_bits;
 
 	valid = ni->i_valid;
diff --git a/fs/reiserfs/fix_node.c b/fs/reiserfs/fix_node.c
index 6c13a8d9a73c..2b288b1539d9 100644
--- a/fs/reiserfs/fix_node.c
+++ b/fs/reiserfs/fix_node.c
@@ -2332,7 +2332,7 @@ static void tb_buffer_sanity_check(struct super_block *sb,
 				       "in tree %s[%d] (%b)",
 				       descr, level, bh);
 
-		if (bh->b_bdev != sb->s_bdev)
+		if (bh_bdev(bh) != sb->s_bdev)
 			reiserfs_panic(sb, "jmacd-4", "buffer has wrong "
 				       "device %s[%d] (%b)",
 				       descr, level, bh);
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index e539ccd39e1e..724113cb79d3 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -618,7 +618,7 @@ static void reiserfs_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
 	if (buffer_journaled(bh)) {
 		reiserfs_warning(NULL, "clm-2084",
 				 "pinned buffer %lu:%pg sent to disk",
-				 bh->b_blocknr, bh->b_bdev);
+				 bh->b_blocknr, bh_bdev(bh));
 	}
 	if (uptodate)
 		set_buffer_uptodate(bh);
diff --git a/fs/reiserfs/prints.c b/fs/reiserfs/prints.c
index 84a194b77f19..249a458b6e28 100644
--- a/fs/reiserfs/prints.c
+++ b/fs/reiserfs/prints.c
@@ -156,7 +156,7 @@ static int scnprintf_buffer_head(char *buf, size_t size, struct buffer_head *bh)
 {
 	return scnprintf(buf, size,
 			 "dev %pg, size %zd, blocknr %llu, count %d, state 0x%lx, page %p, (%s, %s, %s)",
-			 bh->b_bdev, bh->b_size,
+			 bh_bdev(bh), bh->b_size,
 			 (unsigned long long)bh->b_blocknr,
 			 atomic_read(&(bh->b_count)),
 			 bh->b_state, bh->b_page,
@@ -561,7 +561,7 @@ static int print_super_block(struct buffer_head *bh)
 		return 1;
 	}
 
-	printk("%pg\'s super block is in block %llu\n", bh->b_bdev,
+	printk("%pg\'s super block is in block %llu\n", bh_bdev(bh),
 	       (unsigned long long)bh->b_blocknr);
 	printk("Reiserfs version %s\n", version);
 	printk("Block count %u\n", sb_block_count(rs));
diff --git a/fs/reiserfs/stree.c b/fs/reiserfs/stree.c
index 5faf702f8d15..23998f071d9c 100644
--- a/fs/reiserfs/stree.c
+++ b/fs/reiserfs/stree.c
@@ -331,7 +331,7 @@ static inline int key_in_buffer(
 	       || chk_path->path_length > MAX_HEIGHT,
 	       "PAP-5050: pointer to the key(%p) is NULL or invalid path length(%d)",
 	       key, chk_path->path_length);
-	RFALSE(!PATH_PLAST_BUFFER(chk_path)->b_bdev,
+	RFALSE(!bh_bdev(PATH_PLAST_BUFFER(chk_path)),
 	       "PAP-5060: device must not be NODEV");
 
 	if (comp_keys(get_lkey(chk_path, sb), key) == 1)
diff --git a/fs/reiserfs/tail_conversion.c b/fs/reiserfs/tail_conversion.c
index 2cec61af2a9e..300e6737a0db 100644
--- a/fs/reiserfs/tail_conversion.c
+++ b/fs/reiserfs/tail_conversion.c
@@ -187,7 +187,7 @@ void reiserfs_unmap_buffer(struct buffer_head *bh)
 	clear_buffer_mapped(bh);
 	clear_buffer_req(bh);
 	clear_buffer_new(bh);
-	bh->b_bdev = NULL;
+	bh_set_bdev_file(bh, NULL);
 	unlock_buffer(bh);
 }
 
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index d78454a4dd1f..4c6f0d0332c8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/blk_types.h>
+#include <linux/blkdev.h>
 #include <linux/fs.h>
 #include <linux/linkage.h>
 #include <linux/pagemap.h>
@@ -136,6 +137,23 @@ BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 
+static __always_inline void bh_set_bdev_file(struct buffer_head *bh,
+					     struct file *bdev_file)
+{
+	bh->b_bdev = bdev_file ? file_bdev(bdev_file) : NULL;
+}
+
+static __always_inline void bh_copy_bdev_file(struct buffer_head *dbh,
+					      struct buffer_head *sbh)
+{
+	dbh->b_bdev = sbh->b_bdev;
+}
+
+static __always_inline struct block_device *bh_bdev(struct buffer_head *bh)
+{
+	return bh->b_bdev;
+}
+
 static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
 {
 	/*
@@ -377,7 +395,7 @@ static inline void
 map_bh(struct buffer_head *bh, struct super_block *sb, sector_t block)
 {
 	set_buffer_mapped(bh);
-	bh->b_bdev = sb->s_bdev;
+	bh_set_bdev_file(bh, sb->s_bdev_file);
 	bh->b_blocknr = block;
 	bh->b_size = sb->s_blocksize;
 }
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 0e128ad51460..95d3ed978864 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -26,7 +26,7 @@ DECLARE_EVENT_CLASS(block_buffer,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= bh->b_bdev->bd_dev;
+		__entry->dev		= bh_bdev(bh)->bd_dev;
 		__entry->sector		= bh->b_blocknr;
 		__entry->size		= bh->b_size;
 	),
-- 
2.39.2



* [PATCH vfs.all 26/26] buffer: convert to use bdev_file
From: Yu Kuai @ 2024-04-06  9:09 UTC
  To: jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3

From: Yu Kuai <yukuai3@huawei.com>

With the previous commit, both filesystems and raw block devices provide
a bdev_file, so it is safe to convert the buffer code to use bdev_file.
Now that bd_inode has no remaining users, remove it from block_device as
well.
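
For readers following the conversion, here is a rough userspace sketch of
how the buffer_head helpers might look once b_bdev is replaced by
b_bdev_file. It is not the kernel code: the structs are simplified mocks,
the helper bodies are an assumption (the include/linux/buffer_head.h hunk
of this patch is authoritative), and bh_sketch_demo() is a hypothetical
name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not real). */
struct block_device { int bd_dev; };
struct file { struct block_device *bdev; };

struct buffer_head {
	struct file *b_bdev_file;	/* takes the place of the old b_bdev */
	unsigned long long b_blocknr;
};

/* Mock: the real file_bdev() derives the bdev from an opened bdev file. */
static inline struct block_device *file_bdev(struct file *bdev_file)
{
	return bdev_file->bdev;
}

/* Assumed shape: store the opened file instead of the bdev pointer. */
static inline void bh_set_bdev_file(struct buffer_head *bh,
				    struct file *bdev_file)
{
	bh->b_bdev_file = bdev_file;
}

static inline void bh_copy_bdev_file(struct buffer_head *dbh,
				     struct buffer_head *sbh)
{
	dbh->b_bdev_file = sbh->b_bdev_file;
}

/* Assumed shape: resolve the bdev on demand, NULL for an unmapped buffer. */
static inline struct block_device *bh_bdev(struct buffer_head *bh)
{
	return bh->b_bdev_file ? file_bdev(bh->b_bdev_file) : NULL;
}

/* Hypothetical walk-through of set, copy and clear; returns 0 on success. */
static inline int bh_sketch_demo(void)
{
	struct block_device bdev = { .bd_dev = 1 };
	struct file bdev_file = { .bdev = &bdev };
	struct buffer_head src = { 0 }, dst = { 0 };

	bh_set_bdev_file(&src, &bdev_file);
	bh_copy_bdev_file(&dst, &src);
	assert(bh_bdev(&dst) == &bdev);

	bh_set_bdev_file(&src, NULL);	/* e.g. discard_buffer() */
	assert(bh_bdev(&src) == NULL);
	return 0;
}
```

Under these assumptions, callers that only need the device keep using
bh_bdev(), while callers that reach the page cache pass the file itself.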

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c                |  1 -
 fs/buffer.c                 | 96 +++++++++++++++++++------------------
 fs/direct-io.c              |  2 +-
 fs/ext2/inode.c             |  2 +-
 fs/ext4/super.c             |  4 +-
 fs/jbd2/journal.c           |  6 +--
 fs/jbd2/recovery.c          |  9 ++--
 fs/jbd2/revoke.c            | 14 +++---
 fs/mpage.c                  |  8 ++--
 fs/nilfs2/recovery.c        | 27 +++++++----
 fs/ntfs3/fsntfs.c           | 10 ++--
 fs/ntfs3/super.c            |  6 +--
 fs/ocfs2/journal.c          |  2 +-
 fs/reiserfs/journal.c       |  8 ++--
 fs/reiserfs/reiserfs.h      |  6 +--
 include/linux/blk_types.h   |  1 -
 include/linux/buffer_head.h | 67 +++++++++++++-------------
 17 files changed, 141 insertions(+), 128 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 3d300823da6b..31972a7bd358 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -412,7 +412,6 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	spin_lock_init(&bdev->bd_size_lock);
 	mutex_init(&bdev->bd_holder_lock);
 	bdev->bd_partno = partno;
-	bdev->bd_inode = inode;
 	bdev->bd_queue = disk->queue;
 	if (partno)
 		bdev->bd_has_submit_bio = disk->part0->bd_has_submit_bio;
diff --git a/fs/buffer.c b/fs/buffer.c
index e4d74eb63265..a84e9878b52f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -187,9 +187,9 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  * succeeds, there is no need to take i_private_lock.
  */
 static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct file *bdev_file, sector_t block)
 {
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = file_inode(bdev_file);
 	struct address_space *bd_mapping = bd_inode->i_mapping;
 	struct buffer_head *ret = NULL;
 	pgoff_t index;
@@ -232,7 +232,7 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)
 		       "device %pg blocksize: %d\n",
 		       (unsigned long long)block,
 		       (unsigned long long)bh->b_blocknr,
-		       bh->b_state, bh->b_size, bdev,
+		       bh->b_state, bh->b_size, file_bdev(bdev_file),
 		       1 << bd_inode->i_blkbits);
 	}
 out_unlock:
@@ -655,10 +655,12 @@ EXPORT_SYMBOL(generic_buffers_fsync);
  * `bblock + 1' is probably a dirty indirect block.  Hunt it down and, if it's
  * dirty, schedule it for IO.  So that indirects merge nicely with their data.
  */
-void write_boundary_block(struct block_device *bdev,
-			sector_t bblock, unsigned blocksize)
+void write_boundary_block(struct file *bdev_file, sector_t bblock,
+			  unsigned int blocksize)
 {
-	struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize);
+	struct buffer_head *bh = __find_get_block(bdev_file, bblock + 1,
+						  blocksize);
+
 	if (bh) {
 		if (buffer_dirty(bh))
 			write_dirty_buffer(bh, 0);
@@ -992,21 +994,21 @@ static sector_t blkdev_max_block(struct block_device *bdev, unsigned int size)
 
 /*
  * Initialise the state of a blockdev folio's buffers.
- */ 
-static sector_t folio_init_buffers(struct folio *folio,
-		struct block_device *bdev, unsigned size)
+ */
+static sector_t folio_init_buffers(struct folio *folio, struct file *bdev_file,
+				   unsigned int size)
 {
 	struct buffer_head *head = folio_buffers(folio);
 	struct buffer_head *bh = head;
 	bool uptodate = folio_test_uptodate(folio);
 	sector_t block = div_u64(folio_pos(folio), size);
-	sector_t end_block = blkdev_max_block(bdev, size);
+	sector_t end_block = blkdev_max_block(file_bdev(bdev_file), size);
 
 	do {
 		if (!buffer_mapped(bh)) {
 			bh->b_end_io = NULL;
 			bh->b_private = NULL;
-			bh->b_bdev = bdev;
+			bh->b_bdev_file = bdev_file;
 			bh->b_blocknr = block;
 			if (uptodate)
 				set_buffer_uptodate(bh);
@@ -1031,10 +1033,10 @@ static sector_t folio_init_buffers(struct folio *folio,
  * Returns false if we have a failure which cannot be cured by retrying
  * without sleeping.  Returns true if we succeeded, or the caller should retry.
  */
-static bool grow_dev_folio(struct block_device *bdev, sector_t block,
-		pgoff_t index, unsigned size, gfp_t gfp)
+static bool grow_dev_folio(struct file *bdev_file, sector_t block,
+			   pgoff_t index, unsigned int size, gfp_t gfp)
 {
-	struct inode *inode = bdev->bd_inode;
+	struct inode *inode = file_inode(bdev_file);
 	struct folio *folio;
 	struct buffer_head *bh;
 	sector_t end_block = 0;
@@ -1047,7 +1049,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 	bh = folio_buffers(folio);
 	if (bh) {
 		if (bh->b_size == size) {
-			end_block = folio_init_buffers(folio, bdev, size);
+			end_block = folio_init_buffers(folio, bdev_file, size);
 			goto unlock;
 		}
 
@@ -1075,7 +1077,7 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 	 */
 	spin_lock(&inode->i_mapping->i_private_lock);
 	link_dev_buffers(folio, bh);
-	end_block = folio_init_buffers(folio, bdev, size);
+	end_block = folio_init_buffers(folio, bdev_file, size);
 	spin_unlock(&inode->i_mapping->i_private_lock);
 unlock:
 	folio_unlock(folio);
@@ -1088,8 +1090,8 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
  * that folio was dirty, the buffers are set dirty also.  Returns false
  * if we've hit a permanent error.
  */
-static bool grow_buffers(struct block_device *bdev, sector_t block,
-		unsigned size, gfp_t gfp)
+static bool grow_buffers(struct file *bdev_file, sector_t block,
+			 unsigned int size, gfp_t gfp)
 {
 	loff_t pos;
 
@@ -1100,18 +1102,20 @@ static bool grow_buffers(struct block_device *bdev, sector_t block,
 	if (check_mul_overflow(block, (sector_t)size, &pos) || pos > MAX_LFS_FILESIZE) {
 		printk(KERN_ERR "%s: requested out-of-range block %llu for device %pg\n",
 			__func__, (unsigned long long)block,
-			bdev);
+			file_bdev(bdev_file));
 		return false;
 	}
 
 	/* Create a folio with the proper size buffers */
-	return grow_dev_folio(bdev, block, pos / PAGE_SIZE, size, gfp);
+	return grow_dev_folio(bdev_file, block, pos / PAGE_SIZE, size, gfp);
 }
 
 static struct buffer_head *
-__getblk_slow(struct block_device *bdev, sector_t block,
-	     unsigned size, gfp_t gfp)
+__getblk_slow(struct file *bdev_file, sector_t block, unsigned int size,
+	      gfp_t gfp)
 {
+	struct block_device *bdev = file_bdev(bdev_file);
+
 	/* Size must be multiple of hard sectorsize */
 	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
 			(size < 512 || size > PAGE_SIZE))) {
@@ -1127,11 +1131,11 @@ __getblk_slow(struct block_device *bdev, sector_t block,
 	for (;;) {
 		struct buffer_head *bh;
 
-		bh = __find_get_block(bdev, block, size);
+		bh = __find_get_block(bdev_file, block, size);
 		if (bh)
 			return bh;
 
-		if (!grow_buffers(bdev, block, size, gfp))
+		if (!grow_buffers(bdev_file, block, size, gfp))
 			return NULL;
 	}
 }
@@ -1353,7 +1357,7 @@ static void bh_lru_install(struct buffer_head *bh)
  * Look up the bh in this cpu's LRU.  If it's there, move it to the head.
  */
 static struct buffer_head *
-lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
+lookup_bh_lru(struct file *bdev_file, sector_t block, unsigned int size)
 {
 	struct buffer_head *ret = NULL;
 	unsigned int i;
@@ -1367,8 +1371,8 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
 	for (i = 0; i < BH_LRU_SIZE; i++) {
 		struct buffer_head *bh = __this_cpu_read(bh_lrus.bhs[i]);
 
-		if (bh && bh->b_blocknr == block && bh_bdev(bh) == bdev &&
-		    bh->b_size == size) {
+		if (bh && bh->b_blocknr == block &&
+		    bh_bdev(bh) == file_bdev(bdev_file) && bh->b_size == size) {
 			if (i) {
 				while (i) {
 					__this_cpu_write(bh_lrus.bhs[i],
@@ -1392,13 +1396,13 @@ lookup_bh_lru(struct block_device *bdev, sector_t block, unsigned size)
  * NULL
  */
 struct buffer_head *
-__find_get_block(struct block_device *bdev, sector_t block, unsigned size)
+__find_get_block(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	struct buffer_head *bh = lookup_bh_lru(bdev, block, size);
+	struct buffer_head *bh = lookup_bh_lru(bdev_file, block, size);
 
 	if (bh == NULL) {
 		/* __find_get_block_slow will mark the page accessed */
-		bh = __find_get_block_slow(bdev, block);
+		bh = __find_get_block_slow(bdev_file, block);
 		if (bh)
 			bh_lru_install(bh);
 	} else
@@ -1410,32 +1414,32 @@ EXPORT_SYMBOL(__find_get_block);
 
 /**
  * bdev_getblk - Get a buffer_head in a block device's buffer cache.
- * @bdev: The block device.
+ * @bdev_file: The opened block device.
  * @block: The block number.
- * @size: The size of buffer_heads for this @bdev.
+ * @size: The size of buffer_heads for this @bdev_file.
  * @gfp: The memory allocation flags to use.
  *
  * Return: The buffer head, or NULL if memory could not be allocated.
  */
-struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
-		unsigned size, gfp_t gfp)
+struct buffer_head *bdev_getblk(struct file *bdev_file, sector_t block,
+				unsigned int size, gfp_t gfp)
 {
-	struct buffer_head *bh = __find_get_block(bdev, block, size);
+	struct buffer_head *bh = __find_get_block(bdev_file, block, size);
 
 	might_alloc(gfp);
 	if (bh)
 		return bh;
 
-	return __getblk_slow(bdev, block, size, gfp);
+	return __getblk_slow(bdev_file, block, size, gfp);
 }
 EXPORT_SYMBOL(bdev_getblk);
 
 /*
  * Do async read-ahead on a buffer..
  */
-void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
+void __breadahead(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	struct buffer_head *bh = bdev_getblk(bdev, block, size,
+	struct buffer_head *bh = bdev_getblk(bdev_file, block, size,
 			GFP_NOWAIT | __GFP_MOVABLE);
 
 	if (likely(bh)) {
@@ -1447,7 +1451,7 @@ EXPORT_SYMBOL(__breadahead);
 
 /**
  *  __bread_gfp() - reads a specified block and returns the bh
- *  @bdev: the block_device to read from
+ *  @bdev_file: the opened block_device to read from
  *  @block: number of block
  *  @size: size (in bytes) to read
  *  @gfp: page allocation flag
@@ -1458,12 +1462,12 @@ EXPORT_SYMBOL(__breadahead);
  *  It returns NULL if the block was unreadable.
  */
 struct buffer_head *
-__bread_gfp(struct block_device *bdev, sector_t block,
-		   unsigned size, gfp_t gfp)
+__bread_gfp(struct file *bdev_file, sector_t block, unsigned int size,
+	    gfp_t gfp)
 {
 	struct buffer_head *bh;
 
-	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp |= mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 
 	/*
 	 * Prefer looping in the allocator rather than here, at least that
@@ -1471,7 +1475,7 @@ __bread_gfp(struct block_device *bdev, sector_t block,
 	 */
 	gfp |= __GFP_NOFAIL;
 
-	bh = bdev_getblk(bdev, block, size, gfp);
+	bh = bdev_getblk(bdev_file, block, size, gfp);
 
 	if (likely(bh) && !buffer_uptodate(bh))
 		bh = __bread_slow(bh);
@@ -1676,7 +1680,7 @@ EXPORT_SYMBOL(create_empty_buffers);
 
 /**
  * clean_bdev_aliases: clean a range of buffers in block device
- * @bdev: Block device to clean buffers in
+ * @bdev_file: Opened block device to clean buffers in
  * @block: Start of a range of blocks to clean
  * @len: Number of blocks to clean
  *
@@ -1694,9 +1698,9 @@ EXPORT_SYMBOL(create_empty_buffers);
  * I/O in bforget() - it's more efficient to wait on the I/O only if we really
  * need to.  That happens here.
  */
-void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
+void clean_bdev_aliases(struct file *bdev_file, sector_t block, sector_t len)
 {
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = file_inode(bdev_file);
 	struct address_space *bd_mapping = bd_inode->i_mapping;
 	struct folio_batch fbatch;
 	pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 49475f530e0f..dade4cea754b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -948,7 +948,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 					map_bh->b_blocknr << sdio->blkfactor;
 				if (buffer_new(map_bh)) {
 					clean_bdev_aliases(
-						bh_bdev(map_bh),
+						map_bh->b_bdev_file,
 						map_bh->b_blocknr,
 						map_bh->b_size >> i_blkbits);
 				}
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 6286d1578426..c8f59a61c95b 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -744,7 +744,7 @@ static int ext2_get_blocks(struct inode *inode,
 		 * We must unmap blocks before zeroing so that writeback cannot
 		 * overwrite zeros with stale data from block device page cache.
 		 */
-		clean_bdev_aliases(inode->i_sb->s_bdev,
+		clean_bdev_aliases(inode->i_sb->s_bdev_file,
 				   le32_to_cpu(chain[depth-1].key),
 				   count);
 		/*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d47c1e7e8798..1516d58a16ec 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -261,7 +261,7 @@ struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 
 void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
 {
-	struct buffer_head *bh = bdev_getblk(sb->s_bdev, block,
+	struct buffer_head *bh = bdev_getblk(sb->s_bdev_file, block,
 			sb->s_blocksize, GFP_NOWAIT | __GFP_NOWARN);
 
 	if (likely(bh)) {
@@ -5854,7 +5854,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
 	sb_block = EXT4_MIN_BLOCK_SIZE / blocksize;
 	offset = EXT4_MIN_BLOCK_SIZE % blocksize;
 	set_blocksize(bdev, blocksize);
-	bh = __bread(bdev, sb_block, blocksize);
+	bh = __bread(bdev_file, sb_block, blocksize);
 	if (!bh) {
 		ext4_msg(sb, KERN_ERR, "couldn't read superblock of "
 		       "external journal");
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index c1ce32d99267..6157496deec2 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -880,7 +880,7 @@ int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
 	if (ret)
 		return ret;
 
-	bh = __getblk(journal->j_dev, pblock, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, pblock, journal->j_blocksize);
 	if (!bh)
 		return -ENOMEM;
 
@@ -1007,7 +1007,7 @@ jbd2_journal_get_descriptor_buffer(transaction_t *transaction, int type)
 	if (err)
 		return NULL;
 
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, blocknr, journal->j_blocksize);
 	if (!bh)
 		return NULL;
 	atomic_dec(&transaction->t_outstanding_credits);
@@ -1461,7 +1461,7 @@ static int journal_load_superblock(journal_t *journal)
 	struct buffer_head *bh;
 	journal_superblock_t *sb;
 
-	bh = getblk_unmovable(journal->j_dev, journal->j_blk_offset,
+	bh = getblk_unmovable(journal->j_dev_file, journal->j_blk_offset,
 			      journal->j_blocksize);
 	if (bh)
 		err = bh_read(bh, 0);
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 1f7664984d6e..1685a139467a 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -92,7 +92,8 @@ static int do_readahead(journal_t *journal, unsigned int start)
 			goto failed;
 		}
 
-		bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+		bh = __getblk(journal->j_dev_file, blocknr,
+			      journal->j_blocksize);
 		if (!bh) {
 			err = -ENOMEM;
 			goto failed;
@@ -148,7 +149,7 @@ static int jread(struct buffer_head **bhp, journal_t *journal,
 		return err;
 	}
 
-	bh = __getblk(journal->j_dev, blocknr, journal->j_blocksize);
+	bh = __getblk(journal->j_dev_file, blocknr, journal->j_blocksize);
 	if (!bh)
 		return -ENOMEM;
 
@@ -370,7 +371,7 @@ int jbd2_journal_skip_recovery(journal_t *journal)
 		journal->j_head = journal->j_first;
 	} else {
 #ifdef CONFIG_JBD2_DEBUG
-		int dropped = info.end_transaction - 
+		int dropped = info.end_transaction -
 			be32_to_cpu(journal->j_superblock->s_sequence);
 		jbd2_debug(1,
 			  "JBD2: ignoring %d transaction%s from the journal.\n",
@@ -672,7 +673,7 @@ static int do_one_pass(journal_t *journal,
 
 					/* Find a buffer for the new
 					 * data being restored */
-					nbh = __getblk(journal->j_fs_dev,
+					nbh = __getblk(journal->j_fs_dev_file,
 							blocknr,
 							journal->j_blocksize);
 					if (nbh == NULL) {
diff --git a/fs/jbd2/revoke.c b/fs/jbd2/revoke.c
index 4556e4689024..f464f84d08e6 100644
--- a/fs/jbd2/revoke.c
+++ b/fs/jbd2/revoke.c
@@ -328,7 +328,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 {
 	struct buffer_head *bh = NULL;
 	journal_t *journal;
-	struct block_device *bdev;
+	struct file *bdev_file;
 	int err;
 
 	might_sleep();
@@ -341,11 +341,11 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 		return -EINVAL;
 	}
 
-	bdev = journal->j_fs_dev;
+	bdev_file = journal->j_fs_dev_file;
 	bh = bh_in;
 
 	if (!bh) {
-		bh = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh = __find_get_block(bdev_file, blocknr, journal->j_blocksize);
 		if (bh)
 			BUFFER_TRACE(bh, "found on hash");
 	}
@@ -355,7 +355,7 @@ int jbd2_journal_revoke(handle_t *handle, unsigned long long blocknr,
 
 		/* If there is a different buffer_head lying around in
 		 * memory anywhere... */
-		bh2 = __find_get_block(bdev, blocknr, journal->j_blocksize);
+		bh2 = __find_get_block(bdev_file, blocknr, journal->j_blocksize);
 		if (bh2) {
 			/* ... and it has RevokeValid status... */
 			if (bh2 != bh && buffer_revokevalid(bh2))
@@ -466,7 +466,9 @@ int jbd2_journal_cancel_revoke(handle_t *handle, struct journal_head *jh)
 	 * state machine will get very upset later on. */
 	if (need_cancel) {
 		struct buffer_head *bh2;
-		bh2 = __find_get_block(bh->b_bdev, bh->b_blocknr, bh->b_size);
+
+		bh2 = __find_get_block(bh->b_bdev_file, bh->b_blocknr,
+				       bh->b_size);
 		if (bh2) {
 			if (bh2 != bh)
 				clear_buffer_revoked(bh2);
@@ -495,7 +497,7 @@ void jbd2_clear_buffer_revoked_flags(journal_t *journal)
 			struct jbd2_revoke_record_s *record;
 			struct buffer_head *bh;
 			record = (struct jbd2_revoke_record_s *)list_entry;
-			bh = __find_get_block(journal->j_fs_dev,
+			bh = __find_get_block(journal->j_fs_dev_file,
 					      record->blocknr,
 					      journal->j_blocksize);
 			if (bh) {
diff --git a/fs/mpage.c b/fs/mpage.c
index 40594afa63cb..f01f06f20585 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -472,7 +472,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 	struct block_device *bdev = NULL;
 	int boundary = 0;
 	sector_t boundary_block = 0;
-	struct block_device *boundary_bdev = NULL;
+	struct file *boundary_bdev_file = NULL;
 	size_t length;
 	struct buffer_head map_bh;
 	loff_t i_size = i_size_read(inode);
@@ -513,7 +513,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 			boundary = buffer_boundary(bh);
 			if (boundary) {
 				boundary_block = bh->b_blocknr;
-				boundary_bdev = bh->b_bdev;
+				boundary_bdev_file = bh->b_bdev_file;
 			}
 			bdev = bh_bdev(bh);
 		} while ((bh = bh->b_this_page) != head);
@@ -555,7 +555,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 			clean_bdev_bh_alias(&map_bh);
 		if (buffer_boundary(&map_bh)) {
 			boundary_block = map_bh.b_blocknr;
-			boundary_bdev = map_bh.b_bdev;
+			boundary_bdev_file = map_bh.b_bdev_file;
 		}
 		if (page_block) {
 			if (map_bh.b_blocknr != first_block + page_block)
@@ -628,7 +628,7 @@ static int __mpage_writepage(struct folio *folio, struct writeback_control *wbc,
 	if (boundary || (first_unmapped != blocks_per_page)) {
 		bio = mpage_bio_submit_write(bio);
 		if (boundary_block) {
-			write_boundary_block(boundary_bdev,
+			write_boundary_block(boundary_bdev_file,
 					boundary_block, 1 << blkbits);
 		}
 	} else {
diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index 49a70c68bf3c..88e4f130c932 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -107,7 +107,8 @@ static int nilfs_compute_checksum(struct the_nilfs *nilfs,
 		do {
 			struct buffer_head *bh;
 
-			bh = __bread(nilfs->ns_bdev, ++start, blocksize);
+			bh = __bread(nilfs->ns_sb->s_bdev_file, ++start,
+				     blocksize);
 			if (!bh)
 				return -EIO;
 			check_bytes -= size;
@@ -136,7 +137,8 @@ int nilfs_read_super_root_block(struct the_nilfs *nilfs, sector_t sr_block,
 	int ret;
 
 	*pbh = NULL;
-	bh_sr = __bread(nilfs->ns_bdev, sr_block, nilfs->ns_blocksize);
+	bh_sr = __bread(nilfs->ns_sb->s_bdev_file, sr_block,
+			nilfs->ns_blocksize);
 	if (unlikely(!bh_sr)) {
 		ret = NILFS_SEG_FAIL_IO;
 		goto failed;
@@ -183,7 +185,8 @@ nilfs_read_log_header(struct the_nilfs *nilfs, sector_t start_blocknr,
 {
 	struct buffer_head *bh_sum;
 
-	bh_sum = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
+	bh_sum = __bread(nilfs->ns_sb->s_bdev_file, start_blocknr,
+			 nilfs->ns_blocksize);
 	if (bh_sum)
 		*sum = (struct nilfs_segment_summary *)bh_sum->b_data;
 	return bh_sum;
@@ -250,7 +253,7 @@ static void *nilfs_read_summary_info(struct the_nilfs *nilfs,
 	if (bytes > (*pbh)->b_size - *offset) {
 		blocknr = (*pbh)->b_blocknr;
 		brelse(*pbh);
-		*pbh = __bread(nilfs->ns_bdev, blocknr + 1,
+		*pbh = __bread(nilfs->ns_sb->s_bdev_file, blocknr + 1,
 			       nilfs->ns_blocksize);
 		if (unlikely(!*pbh))
 			return NULL;
@@ -289,7 +292,7 @@ static void nilfs_skip_summary_info(struct the_nilfs *nilfs,
 		*offset = bytes * (count - (bcnt - 1) * nitem_per_block);
 
 		brelse(*pbh);
-		*pbh = __bread(nilfs->ns_bdev, blocknr + bcnt,
+		*pbh = __bread(nilfs->ns_sb->s_bdev_file, blocknr + bcnt,
 			       nilfs->ns_blocksize);
 	}
 }
@@ -318,7 +321,8 @@ static int nilfs_scan_dsync_log(struct the_nilfs *nilfs, sector_t start_blocknr,
 
 	sumbytes = le32_to_cpu(sum->ss_sumbytes);
 	blocknr = start_blocknr + DIV_ROUND_UP(sumbytes, nilfs->ns_blocksize);
-	bh = __bread(nilfs->ns_bdev, start_blocknr, nilfs->ns_blocksize);
+	bh = __bread(nilfs->ns_sb->s_bdev_file, start_blocknr,
+		     nilfs->ns_blocksize);
 	if (unlikely(!bh))
 		goto out;
 
@@ -478,7 +482,8 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
 	size_t from = pos & ~PAGE_MASK;
 	void *kaddr;
 
-	bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
+	bh_org = __bread(nilfs->ns_sb->s_bdev_file, rb->blocknr,
+			 nilfs->ns_blocksize);
 	if (unlikely(!bh_org))
 		return -EIO;
 
@@ -697,7 +702,8 @@ static void nilfs_finish_roll_forward(struct the_nilfs *nilfs,
 	    nilfs_get_segnum_of_block(nilfs, ri->ri_super_root))
 		return;
 
-	bh = __getblk(nilfs->ns_bdev, ri->ri_lsegs_start, nilfs->ns_blocksize);
+	bh = __getblk(nilfs->ns_sb->s_bdev_file, ri->ri_lsegs_start,
+		      nilfs->ns_blocksize);
 	BUG_ON(!bh);
 	memset(bh->b_data, 0, bh->b_size);
 	set_buffer_dirty(bh);
@@ -823,7 +829,8 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 	/* Read ahead segment */
 	b = seg_start;
 	while (b <= seg_end)
-		__breadahead(nilfs->ns_bdev, b++, nilfs->ns_blocksize);
+		__breadahead(nilfs->ns_sb->s_bdev_file, b++,
+			     nilfs->ns_blocksize);
 
 	for (;;) {
 		brelse(bh_sum);
@@ -869,7 +876,7 @@ int nilfs_search_super_root(struct the_nilfs *nilfs,
 		if (pseg_start == seg_start) {
 			nilfs_get_segment_range(nilfs, nextnum, &b, &end);
 			while (b <= end)
-				__breadahead(nilfs->ns_bdev, b++,
+				__breadahead(nilfs->ns_sb->s_bdev_file, b++,
 					     nilfs->ns_blocksize);
 		}
 		if (!(flags & NILFS_SS_SR)) {
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c
index ae2ef5c11868..32085ede15ea 100644
--- a/fs/ntfs3/fsntfs.c
+++ b/fs/ntfs3/fsntfs.c
@@ -1033,14 +1033,14 @@ struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block)
 
 int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer)
 {
-	struct block_device *bdev = sb->s_bdev;
 	u32 blocksize = sb->s_blocksize;
 	u64 block = lbo >> sb->s_blocksize_bits;
 	u32 off = lbo & (blocksize - 1);
 	u32 op = blocksize - off;
 
 	for (; bytes; block += 1, off = 0, op = blocksize) {
-		struct buffer_head *bh = __bread(bdev, block, blocksize);
+		struct buffer_head *bh = __bread(sb->s_bdev_file, block,
+						 blocksize);
 
 		if (!bh)
 			return -EIO;
@@ -1063,7 +1063,7 @@ int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes,
 		  const void *buf, int wait)
 {
 	u32 blocksize = sb->s_blocksize;
-	struct block_device *bdev = sb->s_bdev;
+	struct file *bdev_file = sb->s_bdev_file;
 	sector_t block = lbo >> sb->s_blocksize_bits;
 	u32 off = lbo & (blocksize - 1);
 	u32 op = blocksize - off;
@@ -1077,14 +1077,14 @@ int ntfs_sb_write(struct super_block *sb, u64 lbo, size_t bytes,
 			op = bytes;
 
 		if (op < blocksize) {
-			bh = __bread(bdev, block, blocksize);
+			bh = __bread(bdev_file, block, blocksize);
 			if (!bh) {
 				ntfs_err(sb, "failed to read block %llx",
 					 (u64)block);
 				return -EIO;
 			}
 		} else {
-			bh = __getblk(bdev, block, blocksize);
+			bh = __getblk(bdev_file, block, blocksize);
 			if (!bh)
 				return -ENOMEM;
 		}
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 9df7c20d066f..d67becf7302e 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1627,7 +1627,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
 void ntfs_unmap_meta(struct super_block *sb, CLST lcn, CLST len)
 {
 	struct ntfs_sb_info *sbi = sb->s_fs_info;
-	struct block_device *bdev = sb->s_bdev;
+	struct file *bdev_file = sb->s_bdev_file;
 	sector_t devblock = (u64)lcn * sbi->blocks_per_cluster;
 	unsigned long blocks = (u64)len * sbi->blocks_per_cluster;
 	unsigned long cnt = 0;
@@ -1642,9 +1642,9 @@ void ntfs_unmap_meta(struct super_block *sb, CLST lcn, CLST len)
 		limit >>= 1;
 
 	while (blocks--) {
-		clean_bdev_aliases(bdev, devblock++, 1);
+		clean_bdev_aliases(bdev_file, devblock++, 1);
 		if (cnt++ >= limit) {
-			sync_blockdev(bdev);
+			filemap_write_and_wait(bdev_file->f_mapping);
 			cnt = 0;
 		}
 	}
diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 604fea3a26ff..4ad64997f3c7 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -1209,7 +1209,7 @@ static int ocfs2_force_read_journal(struct inode *inode)
 		}
 
 		for (i = 0; i < p_blocks; i++, p_blkno++) {
-			bh = __find_get_block(osb->sb->s_bdev, p_blkno,
+			bh = __find_get_block(osb->sb->s_bdev_file, p_blkno,
 					osb->sb->s_blocksize);
 			/* block not cached. */
 			if (!bh)
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c
index 724113cb79d3..3961f406ee7e 100644
--- a/fs/reiserfs/journal.c
+++ b/fs/reiserfs/journal.c
@@ -2315,7 +2315,7 @@ static int journal_read_transaction(struct super_block *sb,
  * from other places.
  * Note: Do not use journal_getblk/sb_getblk functions here!
  */
-static struct buffer_head *reiserfs_breada(struct block_device *dev,
+static struct buffer_head *reiserfs_breada(struct file *bdev_file,
 					   b_blocknr_t block, int bufsize,
 					   b_blocknr_t max_block)
 {
@@ -2324,7 +2324,7 @@ static struct buffer_head *reiserfs_breada(struct block_device *dev,
 	struct buffer_head *bh;
 	int i, j;
 
-	bh = __getblk(dev, block, bufsize);
+	bh = __getblk(bdev_file, block, bufsize);
 	if (!bh || buffer_uptodate(bh))
 		return (bh);
 
@@ -2334,7 +2334,7 @@ static struct buffer_head *reiserfs_breada(struct block_device *dev,
 	bhlist[0] = bh;
 	j = 1;
 	for (i = 1; i < blocks; i++) {
-		bh = __getblk(dev, block + i, bufsize);
+		bh = __getblk(bdev_file, block + i, bufsize);
 		if (!bh)
 			break;
 		if (buffer_uptodate(bh)) {
@@ -2447,7 +2447,7 @@ static int journal_read(struct super_block *sb)
 		 * device and journal device to be the same
 		 */
 		d_bh =
-		    reiserfs_breada(file_bdev(journal->j_bdev_file), cur_dblock,
+		    reiserfs_breada(journal->j_bdev_file, cur_dblock,
 				    sb->s_blocksize,
 				    SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
 				    SB_ONDISK_JOURNAL_SIZE(sb));
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index f0e1f29f20ee..49caa7c42fb7 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -2810,10 +2810,10 @@ struct reiserfs_journal_header {
 
 /* We need these to make journal.c code more readable */
 #define journal_find_get_block(s, block) __find_get_block(\
-		file_bdev(SB_JOURNAL(s)->j_bdev_file), block, s->s_blocksize)
-#define journal_getblk(s, block) __getblk(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
+		SB_JOURNAL(s)->j_bdev_file, block, s->s_blocksize)
+#define journal_getblk(s, block) __getblk(SB_JOURNAL(s)->j_bdev_file,\
 		block, s->s_blocksize)
-#define journal_bread(s, block) __bread(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
+#define journal_bread(s, block) __bread(SB_JOURNAL(s)->j_bdev_file,\
 		block, s->s_blocksize)
 
 enum reiserfs_bh_state_bits {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 22f736908cbe..d0907c079779 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -50,7 +50,6 @@ struct block_device {
 	bool			bd_write_holder;
 	bool			bd_has_submit_bio;
 	dev_t			bd_dev;
-	struct inode		*bd_inode;	/* will die */
 
 	atomic_t		bd_openers;
 	spinlock_t		bd_size_lock; /* for bd_inode->i_size updates */
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 4c6f0d0332c8..cebff2645d59 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -69,7 +69,7 @@ struct buffer_head {
 	size_t b_size;			/* size of mapping */
 	char *b_data;			/* pointer to data within the page */
 
-	struct block_device *b_bdev;
+	struct file *b_bdev_file;
 	bh_end_io_t *b_end_io;		/* I/O completion */
  	void *b_private;		/* reserved for b_end_io */
 	struct list_head b_assoc_buffers; /* associated with another mapping */
@@ -140,18 +140,18 @@ BUFFER_FNS(Defer_Completion, defer_completion)
 static __always_inline void bh_set_bdev_file(struct buffer_head *bh,
 					     struct file *bdev_file)
 {
-	bh->b_bdev = bdev_file ? file_bdev(bdev_file) : NULL;
+	bh->b_bdev_file = bdev_file;
 }
 
 static __always_inline void bh_copy_bdev_file(struct buffer_head *dbh,
 					      struct buffer_head *sbh)
 {
-	dbh->b_bdev = sbh->b_bdev;
+	dbh->b_bdev_file = sbh->b_bdev_file;
 }
 
 static __always_inline struct block_device *bh_bdev(struct buffer_head *bh)
 {
-	return bh->b_bdev;
+	return bh->b_bdev_file ? file_bdev(bh->b_bdev_file) : NULL;
 }
 
 static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
@@ -230,25 +230,24 @@ int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end,
 				  bool datasync);
 int generic_buffers_fsync(struct file *file, loff_t start, loff_t end,
 			  bool datasync);
-void clean_bdev_aliases(struct block_device *bdev, sector_t block,
-			sector_t len);
+void clean_bdev_aliases(struct file *bdev_file, sector_t block, sector_t len);
 static inline void clean_bdev_bh_alias(struct buffer_head *bh)
 {
-	clean_bdev_aliases(bh->b_bdev, bh->b_blocknr, 1);
+	clean_bdev_aliases(bh->b_bdev_file, bh->b_blocknr, 1);
 }
 
 void mark_buffer_async_write(struct buffer_head *bh);
 void __wait_on_buffer(struct buffer_head *);
 wait_queue_head_t *bh_waitq_head(struct buffer_head *bh);
-struct buffer_head *__find_get_block(struct block_device *bdev, sector_t block,
-			unsigned size);
-struct buffer_head *bdev_getblk(struct block_device *bdev, sector_t block,
-		unsigned size, gfp_t gfp);
+struct buffer_head *__find_get_block(struct file *bdev_file, sector_t block,
+				     unsigned int size);
+struct buffer_head *bdev_getblk(struct file *bdev_file, sector_t block,
+				unsigned int size, gfp_t gfp);
 void __brelse(struct buffer_head *);
 void __bforget(struct buffer_head *);
-void __breadahead(struct block_device *, sector_t block, unsigned int size);
-struct buffer_head *__bread_gfp(struct block_device *,
-				sector_t block, unsigned size, gfp_t gfp);
+void __breadahead(struct file *bdev_file, sector_t block, unsigned int size);
+struct buffer_head *__bread_gfp(struct file *bdev_file, sector_t block,
+				unsigned int size, gfp_t gfp);
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void unlock_buffer(struct buffer_head *bh);
@@ -257,8 +256,8 @@ int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags);
 void submit_bh(blk_opf_t, struct buffer_head *);
-void write_boundary_block(struct block_device *bdev,
-			sector_t bblock, unsigned blocksize);
+void write_boundary_block(struct file *bdev_file, sector_t bblock,
+			  unsigned int blocksize);
 int bh_uptodate_or_lock(struct buffer_head *bh);
 int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait);
 void __bh_read_batch(int nr, struct buffer_head *bhs[],
@@ -336,59 +335,61 @@ static inline void bforget(struct buffer_head *bh)
 static inline struct buffer_head *
 sb_bread(struct super_block *sb, sector_t block)
 {
-	return __bread_gfp(sb->s_bdev, block, sb->s_blocksize, __GFP_MOVABLE);
+	return __bread_gfp(sb->s_bdev_file, block, sb->s_blocksize,
+			   __GFP_MOVABLE);
 }
 
 static inline struct buffer_head *
 sb_bread_unmovable(struct super_block *sb, sector_t block)
 {
-	return __bread_gfp(sb->s_bdev, block, sb->s_blocksize, 0);
+	return __bread_gfp(sb->s_bdev_file, block, sb->s_blocksize, 0);
 }
 
 static inline void
 sb_breadahead(struct super_block *sb, sector_t block)
 {
-	__breadahead(sb->s_bdev, block, sb->s_blocksize);
+	__breadahead(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
-static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
-		sector_t block, unsigned size)
+static inline struct buffer_head *getblk_unmovable(struct file *bdev_file,
+						   sector_t block,
+						   unsigned int size)
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 	gfp |= __GFP_NOFAIL;
 
-	return bdev_getblk(bdev, block, size, gfp);
+	return bdev_getblk(bdev_file, block, size, gfp);
 }
 
-static inline struct buffer_head *__getblk(struct block_device *bdev,
-		sector_t block, unsigned size)
+static inline struct buffer_head *__getblk(struct file *bdev_file,
+					   sector_t block, unsigned int size)
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev_file->f_mapping, ~__GFP_FS);
 	gfp |= __GFP_MOVABLE | __GFP_NOFAIL;
 
-	return bdev_getblk(bdev, block, size, gfp);
+	return bdev_getblk(bdev_file, block, size, gfp);
 }
 
 static inline struct buffer_head *sb_getblk(struct super_block *sb,
 		sector_t block)
 {
-	return __getblk(sb->s_bdev, block, sb->s_blocksize);
+	return __getblk(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
 static inline struct buffer_head *sb_getblk_gfp(struct super_block *sb,
 		sector_t block, gfp_t gfp)
 {
-	return bdev_getblk(sb->s_bdev, block, sb->s_blocksize, gfp);
+	return bdev_getblk(sb->s_bdev_file, block, sb->s_blocksize, gfp);
 }
 
 static inline struct buffer_head *
 sb_find_get_block(struct super_block *sb, sector_t block)
 {
-	return __find_get_block(sb->s_bdev, block, sb->s_blocksize);
+	return __find_get_block(sb->s_bdev_file, block, sb->s_blocksize);
 }
 
 static inline void
@@ -456,7 +457,7 @@ static inline void bh_readahead_batch(int nr, struct buffer_head *bhs[],
 
 /**
  *  __bread() - reads a specified block and returns the bh
- *  @bdev: the block_device to read from
+ *  @bdev_file: the opened block_device to read from
  *  @block: number of block
  *  @size: size (in bytes) to read
  *
@@ -465,9 +466,9 @@ static inline void bh_readahead_batch(int nr, struct buffer_head *bhs[],
  *  It returns NULL if the block was unreadable.
  */
 static inline struct buffer_head *
-__bread(struct block_device *bdev, sector_t block, unsigned size)
+__bread(struct file *bdev_file, sector_t block, unsigned int size)
 {
-	return __bread_gfp(bdev, block, size, __GFP_MOVABLE);
+	return __bread_gfp(bdev_file, block, size, __GFP_MOVABLE);
 }
 
 /**
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-06  9:09 ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
@ 2024-04-06 19:42   ` Al Viro
  2024-04-06 20:29     ` Al Viro
  2024-04-09 10:23   ` Christian Brauner
  1 sibling, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-06 19:42 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

On Sat, Apr 06, 2024 at 05:09:26PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> So that iomap and buffer_head can convert to use bdev_file in following
> patches.

Let me see if I got it straight.  You introduce dummy struct file instances
(no methods, nothing).  The *ONLY* purpose they serve is to correspond to
opened instances of struct bdev.  No other use is possible.

You shove them into ->i_private of bdevfs inodes.  Lifetime rules are...
odd.

In bdev_open() you arrange for such beast to be present.  You never
return it anywhere, they only get accessed via ->i_private, exposing
it at least to fs/buffer.c.  Reference to those suckers get stored
(without grabbing refcount) into buffer_head instances.

And all of that is for... what, exactly?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-06 19:42   ` Al Viro
@ 2024-04-06 20:29     ` Al Viro
  2024-04-07  1:18       ` Yu Kuai
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-06 20:29 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

On Sat, Apr 06, 2024 at 08:42:06PM +0100, Al Viro wrote:
> On Sat, Apr 06, 2024 at 05:09:26PM +0800, Yu Kuai wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> > 
> > So that iomap and buffer_head can convert to use bdev_file in following
> > patches.
> 
> Let me see if I got it straight.  You introduce dummy struct file instances
> (no methods, nothing).  The *ONLY* purpose they serve is to correspond to
> opened instances of struct bdev.  No other use is possible.
> 
> You shove them into ->i_private of bdevfs inodes.  Lifetime rules are...
> odd.
> 
> In bdev_open() you arrange for such beast to be present.  You never
> return it anywhere, they only get accessed via ->i_private, exposing
> it at least to fs/buffer.c.  Reference to those suckers get stored
> (without grabbing refcount) into buffer_head instances.
> 
> And all of that is for... what, exactly?

Put another way, what's the endgame here?  Are you going to try and
propagate those beasts down into bio_alloc()?  Because if you do not,
you need to keep struct block_device * around anyway.

We use ->b_bdev for several things:
	* passing to bio_alloc() (quite a few places)
	* %pg in debugging printks
	* (rare) passing to write_boundary_block().
	* (twice) passing to clean_bdev_aliases().
	* (once) passing to __find_get_block().
	* one irregular use as a key in lookup_bh_lru()

IDGI...

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-06 20:29     ` Al Viro
@ 2024-04-07  1:18       ` Yu Kuai
  2024-04-07  1:51         ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-07  1:18 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/07 4:29, Al Viro wrote:
> On Sat, Apr 06, 2024 at 08:42:06PM +0100, Al Viro wrote:
>> On Sat, Apr 06, 2024 at 05:09:26PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> So that iomap and buffer_head can convert to use bdev_file in following
>>> patches.
>>
>> Let me see if I got it straight.  You introduce dummy struct file instances
>> (no methods, nothing).  The *ONLY* purpose they serve is to correspond to
>> opened instances of struct bdev.  No other use is possible.

Yes, this is the only purpose.
>>
>> You shove them into ->i_private of bdevfs inodes.  Lifetime rules are...
>> odd.
>>
>> In bdev_open() you arrange for such beast to be present.  You never
>> return it anywhere, they only get accessed via ->i_private, exposing
>> it at least to fs/buffer.c.  Reference to those suckers get stored
>> (without grabbing refcount) into buffer_head instances.
>>
>> And all of that is for... what, exactly?
> 
> Put another way, what's the endgame here?  Are you going to try and
> propagate those beasts down into bio_alloc()?  Because if you do not,
> you need to keep struct block_device * around anyway.

Yes, patches 23-26 already do the work of removing the block_device field
and converting iomap and buffer_head to use bdev_file.

Or maybe you prefer the idea from the last version: keep the block_device
field in iomap/buffer_head, and use it for raw block_device fops?

Thanks,
Kuai

> 
> We use ->b_bdev for several things:
> 	* passing to bio_alloc() (quite a few places)
> 	* %pg in debugging printks
> 	* (rare) passing to write_boundary_block().
> 	* (twice) passing to clean_bdev_aliases().
> 	* (once) passing to __find_get_block().
> 	* one irregular use as a key in lookup_bh_lru()
> 
> IDGI...
> .
> 


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  1:18       ` Yu Kuai
@ 2024-04-07  1:51         ` Al Viro
  2024-04-07  2:34           ` Yu Kuai
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-07  1:51 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 09:18:20AM +0800, Yu Kuai wrote:

> Yes, patches 23-26 already do the work of removing the block_device field
> and converting iomap and buffer_head to use bdev_file.

What for?  I mean, what makes that dummy struct file * any better than
struct block_device *?  What's the point?

I agree that keeping an opened struct file for a block device is
a good idea - certainly better than weird crap used to carry the
"how had it been opened" along with bdev.  But that does *not*
mean not keeping ->s_bdev around; we might or might not find that
convenient, but it's not "struct block_device is Evil(tm), let's
exorcise".

Why do we care to do anything to struct buffer_head?  Or to
struct bio, for that matter...

I'm not saying that parts of the patchset do not make sense on
their own, but I don't understand what the last part is all
about.

Al, still going through that series...

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode
  2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
                   ` (25 preceding siblings ...)
  2024-04-06  9:09 ` [PATCH vfs.all 26/26] buffer: convert to use bdev_file Yu Kuai
@ 2024-04-07  2:20 ` Yu Kuai
  2024-04-08 14:05   ` Jan Kara
  26 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-07  2:20 UTC (permalink / raw)
  To: Yu Kuai, jack, hch, brauner, viro, axboe, gustavoars
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai (C)

Hi, Christian!
Hi, Jan!
+CC Gustavo

While testing this set, I found that the vfs.all branch seems to be broken.
xfstests reports success, yet dmesg shows lots of BUG reports:

[22709.079704] =============================================================================
[22709.082404] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
[22709.084148] -----------------------------------------------------------------------------
[22709.084148]
[22709.086784] 0xffff88817d52e7a0-0xffff88817d52e7a7 @offset=1952. First byte 0x0 instead of 0xcc
[22709.089169] Allocated in do_handle_open+0x97/0x440 age=10 cpu=13 pid=814795
[22709.091158]  __kmalloc+0x41d/0x5e0
[22709.092153]  do_handle_open+0x97/0x440
[22709.093240]  __x64_sys_open_by_handle_at+0x23/0x30
[22709.094482]  do_syscall_64+0xb1/0x210
[22709.095316]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
[22709.096414] Freed in kvfree+0x4c/0x60 age=43560 cpu=15 pid=813506
[22709.097719]  kfree+0x31c/0x530
[22709.098396]  kvfree+0x4c/0x60
[22709.099048]  ext4_mb_release+0x29c/0x570
[22709.099901]  ext4_put_super+0x17f/0x590
[22709.100735]  generic_shutdown_super+0xba/0x240
[22709.101698]  kill_block_super+0x22/0x70
[22709.102525]  ext4_kill_sb+0x2a/0x70
[22709.103297]  deactivate_locked_super+0x4f/0xe0
[22709.104261]  deactivate_super+0x81/0x90
[22709.104876]  cleanup_mnt+0xe0/0x1b0
[22709.105419]  __cleanup_mnt+0x1a/0x30
[22709.105964]  task_work_run+0x88/0x100
[22709.106531]  syscall_exit_to_user_mode+0x3cc/0x3e0
[22709.107263]  do_syscall_64+0xc5/0x210
[22709.107820]  entry_SYSCALL_64_after_hwframe+0x6c/0x74

While digging into this problem, I found that commit 1b43c4629756 ("fs:
Annotate struct file_handle with __counted_by() and use struct_size()")
might have made a mistake, and I verified that the following patch fixes
the problem.

Thanks,
Kuai

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 53ed54711cd2..bcfecac2dc54 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -201,8 +201,7 @@ static int handle_to_path(int mountdirfd, struct file_handle __user *ufh,
         /* copy the full handle */
         *handle = f_handle;
         if (copy_from_user(&handle->f_handle,
-                          &ufh->f_handle,
-                          struct_size(ufh, f_handle, f_handle.handle_bytes))) {
+                          &ufh->f_handle, f_handle.handle_bytes)) {
                 retval = -EFAULT;
                 goto out_handle;
         }

On 2024/04/06 17:09, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Hi, Jens!
> Hi, Jan!
> Hi, Christoph!
> Hi, Christian!
> Hi, AL!
> 
> Sorry for the delay (I was overwhelmed with other work). The main change
> from the last version is patch 22 (modified based on [1]): the idea is to
> stash a 'bdev_file' in 'bd_inode->i_private' when the bdev is first opened,
> and release it when the last opener closes the bdev.
> 
> The patch to use bdev and bdev_file as a union for iomap/buffer_head is
> dropped, and the changes for iomap/buffer are split into patches 23-26.
> 
> I tested this set in my VM with blktests for virtio-scsi and xfstests
> for ext4/xfs for one round now; no regressions have been found yet.
> 
> Please let me know what you think!
> 
> [1] https://lore.kernel.org/all/c62dac0e-666f-9cc9-cffe-f3d985029d6a@huaweicloud.com/
> 
> Changes from RFC v4:
>   - respin on the top of vfs.all branch from vfs tree;
>   - add review tag, patches that are not reviewed: patch 19-26;
>   - add patch 21, fix a module reference problem;
>   - instead of using a union of bdev(for raw block device) and
>   bdev_file(for filesystems), add patch 22 to stash a bdev_file to
>   bd_inode->i_private, so that iomap and buffer_head for raw block device
>   can convert to use bdev_file as well;
>   - split the huge path for iomap/buffer into 4 patches, 21-24;
> 
> Changes from RFC v3:
>   - respin on the top of linux-next, based on Christian's patchset to
>   open bdev as file. Most of patches from v3 is dropped and change to use
>   file_inode(bdev_file) to get bd_inode or bdev_file->f_mapping to get
>   bd_inode->i_mapping.
> 
> Changes from RFC v2:
>   - remove bdev_associated_mapping() and patch 12 from v1;
>   - add kerneldoc comments for new bdev apis;
>   - rename __bdev_get_folio() to bdev_get_folio;
>   - fix a problem in erofs that erofs_init_metabuf() is not always
>   called.
>   - add reviewed-by tag for patch 15-17;
> 
> Changes from RFC v1:
>   - remove some bdev apis that are not necessary;
>   - pass in offset for bdev_read_folio() and __bdev_get_folio();
>   - remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
>   prevent accessing bd_inode directly from mapping_gfp_constraint() in
>   ext4.(patch 15, 16);
>   - remove block_device_ejected() from ext4.
> 
> Yu Kuai (26):
>    block: move two helpers into bdev.c
>    block: remove sync_blockdev_nowait()
>    block: remove sync_blockdev_range()
>    block: prevent direct access of bd_inode
>    block: add a helper bdev_read_folio()
>    bcachefs: remove dead function bdev_sectors()
>    cramfs: prevent direct access of bd_inode
>    erofs: prevent direct access of bd_inode
>    nilfs2: prevent direct access of bd_inode
>    gfs2: prevent direct access of bd_inode
>    btrfs: prevent direct access of bd_inode
>    ext4: remove block_device_ejected()
>    ext4: prevent direct access of bd_inode
>    jbd2: prevent direct access of bd_inode
>    s390/dasd: use bdev api in dasd_format()
>    bcache: prevent direct access of bd_inode
>    block2mtd: prevent direct access of bd_inode
>    scsi: use bdev helper in scsi_bios_ptable()
>    dm-vdo: convert to use bdev_file
>    block: factor out a helper init_bdev_file()
>    block: fix module reference leakage from bdev_open_by_dev error path
>    block: stash a bdev_file to read/write raw blcok_device
>    iomap: add helpers helpers to get and set bdev
>    iomap: convert to use bdev_file
>    buffer: add helpers to get and set bdev
>    buffer: convert to use bdev_file
> 
>   block/bdev.c                              | 262 ++++++++++++++++------
>   block/blk-zoned.c                         |   4 +-
>   block/blk.h                               |   2 +
>   block/fops.c                              |   6 +-
>   block/genhd.c                             |   9 +-
>   block/ioctl.c                             |   8 +-
>   block/partitions/core.c                   |   8 +-
>   drivers/md/bcache/super.c                 |   7 +-
>   drivers/md/dm-vdo/dedupe.c                |   7 +-
>   drivers/md/dm-vdo/dm-vdo-target.c         |   9 +-
>   drivers/md/dm-vdo/indexer/config.c        |   2 +-
>   drivers/md/dm-vdo/indexer/config.h        |   4 +-
>   drivers/md/dm-vdo/indexer/index-layout.c  |   6 +-
>   drivers/md/dm-vdo/indexer/index-layout.h  |   2 +-
>   drivers/md/dm-vdo/indexer/index-session.c |  18 +-
>   drivers/md/dm-vdo/indexer/index.c         |   4 +-
>   drivers/md/dm-vdo/indexer/index.h         |   2 +-
>   drivers/md/dm-vdo/indexer/indexer.h       |   6 +-
>   drivers/md/dm-vdo/indexer/io-factory.c    |  17 +-
>   drivers/md/dm-vdo/indexer/io-factory.h    |   4 +-
>   drivers/md/dm-vdo/indexer/volume.c        |   4 +-
>   drivers/md/dm-vdo/indexer/volume.h        |   2 +-
>   drivers/md/dm-vdo/vdo.c                   |   2 +-
>   drivers/md/md-bitmap.c                    |   2 +-
>   drivers/mtd/devices/block2mtd.c           |   6 +-
>   drivers/s390/block/dasd_ioctl.c           |   5 +-
>   drivers/scsi/scsicam.c                    |   3 +-
>   fs/affs/file.c                            |   2 +-
>   fs/bcachefs/util.h                        |   5 -
>   fs/btrfs/disk-io.c                        |  17 +-
>   fs/btrfs/disk-io.h                        |   4 +-
>   fs/btrfs/inode.c                          |   2 +-
>   fs/btrfs/super.c                          |   2 +-
>   fs/btrfs/volumes.c                        |  25 ++-
>   fs/btrfs/zoned.c                          |  20 +-
>   fs/btrfs/zoned.h                          |   4 +-
>   fs/buffer.c                               | 104 ++++-----
>   fs/cramfs/inode.c                         |   2 +-
>   fs/direct-io.c                            |   4 +-
>   fs/erofs/data.c                           |  22 +-
>   fs/erofs/internal.h                       |   1 +
>   fs/erofs/zmap.c                           |   2 +-
>   fs/exfat/fatent.c                         |   2 +-
>   fs/ext2/inode.c                           |   4 +-
>   fs/ext2/xattr.c                           |   2 +-
>   fs/ext4/dir.c                             |   2 +-
>   fs/ext4/ext4_jbd2.c                       |   2 +-
>   fs/ext4/inode.c                           |   2 +-
>   fs/ext4/mmp.c                             |   2 +-
>   fs/ext4/page-io.c                         |   5 +-
>   fs/ext4/super.c                           |  30 +--
>   fs/ext4/xattr.c                           |   2 +-
>   fs/f2fs/data.c                            |  10 +-
>   fs/f2fs/f2fs.h                            |   1 +
>   fs/fat/inode.c                            |   2 +-
>   fs/fuse/dax.c                             |   2 +-
>   fs/gfs2/aops.c                            |   2 +-
>   fs/gfs2/bmap.c                            |   2 +-
>   fs/gfs2/glock.c                           |   2 +-
>   fs/gfs2/meta_io.c                         |   2 +-
>   fs/gfs2/ops_fstype.c                      |   2 +-
>   fs/hpfs/file.c                            |   2 +-
>   fs/iomap/buffered-io.c                    |   8 +-
>   fs/iomap/direct-io.c                      |  11 +-
>   fs/iomap/swapfile.c                       |   2 +-
>   fs/iomap/trace.h                          |   6 +-
>   fs/jbd2/commit.c                          |   2 +-
>   fs/jbd2/journal.c                         |  34 +--
>   fs/jbd2/recovery.c                        |   9 +-
>   fs/jbd2/revoke.c                          |  14 +-
>   fs/jbd2/transaction.c                     |   8 +-
>   fs/mpage.c                                |  18 +-
>   fs/nilfs2/btnode.c                        |   4 +-
>   fs/nilfs2/gcinode.c                       |   2 +-
>   fs/nilfs2/mdt.c                           |   2 +-
>   fs/nilfs2/page.c                          |   4 +-
>   fs/nilfs2/recovery.c                      |  27 ++-
>   fs/nilfs2/segment.c                       |   2 +-
>   fs/ntfs3/fsntfs.c                         |  10 +-
>   fs/ntfs3/inode.c                          |   4 +-
>   fs/ntfs3/super.c                          |   6 +-
>   fs/ocfs2/journal.c                        |   2 +-
>   fs/reiserfs/fix_node.c                    |   2 +-
>   fs/reiserfs/journal.c                     |  10 +-
>   fs/reiserfs/prints.c                      |   4 +-
>   fs/reiserfs/reiserfs.h                    |   6 +-
>   fs/reiserfs/stree.c                       |   2 +-
>   fs/reiserfs/tail_conversion.c             |   2 +-
>   fs/sync.c                                 |   9 +-
>   fs/xfs/xfs_iomap.c                        |   4 +-
>   fs/zonefs/file.c                          |   4 +-
>   include/linux/blk_types.h                 |   2 +-
>   include/linux/blkdev.h                    |  19 +-
>   include/linux/buffer_head.h               |  81 ++++---
>   include/linux/iomap.h                     |  13 +-
>   include/linux/jbd2.h                      |  18 +-
>   include/trace/events/block.h              |   2 +-
>   97 files changed, 620 insertions(+), 440 deletions(-)
> 


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 04/26] block: prevent direct access of bd_inode
  2024-04-06  9:09 ` [PATCH vfs.all 04/26] block: prevent direct access of bd_inode Yu Kuai
@ 2024-04-07  2:22   ` Al Viro
  2024-04-07  2:37     ` Yu Kuai
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-07  2:22 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

On Sat, Apr 06, 2024 at 05:09:08PM +0800, Yu Kuai wrote:
> @@ -669,7 +669,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>  	struct file *file = iocb->ki_filp;
>  	struct block_device *bdev = I_BDEV(file->f_mapping->host);
> -	struct inode *bd_inode = bdev->bd_inode;
> +	struct inode *bd_inode = bdev_inode(bdev);

What you want here is this:

	struct inode *bd_inode = file->f_mapping->host;
	struct block_device *bdev = I_BDEV(bd_inode);
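
To illustrate the ordering suggested above, here is a simplified userspace model of the coallocated bdev/inode pair. All types and the I_BDEV_model() helper are stand-ins assumed for this sketch; they only mimic the shape of the kernel structures and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct inode_model;

struct address_space_model {
	struct inode_model *host;	/* inode owning this page cache */
};

struct inode_model {
	struct address_space_model *i_mapping;
};

struct block_device_model {
	int bd_dev;
};

/* The block device and its inode share one allocation. */
struct bdev_inode_model {
	struct block_device_model bdev;
	struct inode_model vfs_inode;
};

struct file_model {
	struct address_space_model *f_mapping;
};

/* container_of()-style step from the embedded inode to its block device. */
static struct block_device_model *I_BDEV_model(struct inode_model *inode)
{
	struct bdev_inode_model *bi;

	assert(inode != NULL);
	bi = (struct bdev_inode_model *)
		((char *)inode - offsetof(struct bdev_inode_model, vfs_inode));
	return &bi->bdev;
}
```

In this model the inode is taken from file->f_mapping->host first and the block device is then derived from the inode, so nothing needs to go from the bdev back to its inode.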


> --- a/block/ioctl.c
> +++ b/block/ioctl.c
> @@ -97,7 +97,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
>  {
>  	uint64_t range[2];
>  	uint64_t start, len;
> -	struct inode *inode = bdev->bd_inode;
> +	struct inode *inode = bdev_inode(bdev);
>  	int err;

The uses of 'inode' in this function are
        filemap_invalidate_lock(inode->i_mapping);
and
        filemap_invalidate_unlock(inode->i_mapping);

IOW, you want bdev_mapping(bdev), not bdev_inode(bdev).

> @@ -166,7 +166,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
>  {
>  	uint64_t range[2];
>  	uint64_t start, end, len;
> -	struct inode *inode = bdev->bd_inode;
> +	struct inode *inode = bdev_inode(bdev);

Same story.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  1:51         ` Al Viro
@ 2024-04-07  2:34           ` Yu Kuai
  2024-04-07  3:06             ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-07  2:34 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/07 9:51, Al Viro wrote:
> On Sun, Apr 07, 2024 at 09:18:20AM +0800, Yu Kuai wrote:
> 
>> Yes, patch 23-26 already do the work to remove the field block_device
>> and convert to use bdev_file for iomap and buffer_head.
> 
> What for?  I mean, what makes that dummy struct file * any better than
> struct block_device *?  What's the point?
> 
> I agree that keeping an opened struct file for a block device is
> a good idea - certainly better than weird crap used to carry the
> "how had it been opened" along with bdev.  But that does *not*
> mean not keeping ->s_bdev around; we might or might not find that
> convenient, but it's not "struct block_device is Evil(tm), let's
> exorcise".
> 
> Why do we care to do anything to struct buffer_head?  Or to
> struct bio, for that matter...

Other than raw block_device fops, other filesystems can use the opened
bdev_file directly for iomap and buffer_head, and they actually don't
need to reference block_device anymore. The point here is that whether
we want to keep special handling for block_device fops or not. There
are two proposals now:

- one is from Christian, to keep using block_device for block_device
fops; in order to do that, a new flag and some special handling are added
to iomap and buffer_head. See the patch from the last version [1].

- one is from this patchset: allocate a *dummy* bdev_file just for iomap
and buffer_head to access bdev and bd_inode.

I personally prefer the latter one, that's why there is a new version;
however, what do I know? That will depend on what people think.

[1] https://lore.kernel.org/all/20240222124555.2049140-20-yukuai1@huaweicloud.com/

Thanks,
Kuai

> 
> I'm not saying that parts of the patchset do not make sense on
> their own, but I don't understand what the last part is all
> about.
> 
> Al, still going through that series...
> .
> 


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 04/26] block: prevent direct access of bd_inode
  2024-04-07  2:22   ` Al Viro
@ 2024-04-07  2:37     ` Yu Kuai
  2024-04-11 11:12       ` Christian Brauner
  0 siblings, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-07  2:37 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/07 10:22, Al Viro wrote:
> On Sat, Apr 06, 2024 at 05:09:08PM +0800, Yu Kuai wrote:
>> @@ -669,7 +669,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>   {
>>   	struct file *file = iocb->ki_filp;
>>   	struct block_device *bdev = I_BDEV(file->f_mapping->host);
>> -	struct inode *bd_inode = bdev->bd_inode;
>> +	struct inode *bd_inode = bdev_inode(bdev);
> 
> What you want here is this:
> 
> 	struct inode *bd_inode = file->f_mapping->host;
> 	struct block_device *bdev = I_BDEV(bd_inode);

Yes, this way is better, logically.
> 
> 
>> --- a/block/ioctl.c
>> +++ b/block/ioctl.c
>> @@ -97,7 +97,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
>>   {
>>   	uint64_t range[2];
>>   	uint64_t start, len;
>> -	struct inode *inode = bdev->bd_inode;
>> +	struct inode *inode = bdev_inode(bdev);
>>   	int err;
> 
> The uses of 'inode' in this function are
>          filemap_invalidate_lock(inode->i_mapping);
> and
>          filemap_invalidate_unlock(inode->i_mapping);
> 
> IOW, you want bdev_mapping(bdev), not bdev_inode(bdev).
> 
>> @@ -166,7 +166,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
>>   {
>>   	uint64_t range[2];
>>   	uint64_t start, end, len;
>> -	struct inode *inode = bdev->bd_inode;
>> +	struct inode *inode = bdev_inode(bdev);
> 
> Same story.

Yes.

Thanks for the suggestions!
Kuai

> .
> 


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  2:34           ` Yu Kuai
@ 2024-04-07  3:06             ` Al Viro
  2024-04-07  3:21               ` Yu Kuai
  2024-04-09  9:00               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Christian Brauner
  0 siblings, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-07  3:06 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:

> Other than raw block_device fops, other filesystems can use the opened
> bdev_file directly for iomap and buffer_head, and they actually don't
> need to reference block_device anymore. The point here is that whether

What do you mean, "reference"?  The counting reference is to opened
file; ->s_bdev is a cached pointer to associated struct block_device,
and neither it nor pointers in buffer_head are valid past the moment
when you close the file.  Storing (non-counting) pointers to struct
file in struct buffer_head is not different in that respect - they
are *still* only valid while the "master" reference is held.

Again, what's the point of storing struct file * in struct buffer_head
or struct iomap?  In any instances of those structures?

There is a good reason to have it in places that keep a reference to
opened block device - the kind that _keeps_ the device opened.  Namely,
there's state that needs to be carried from the place where we'd opened
the sucker to the place where we close it, and that state is better
carried by opened file.

But neither iomap nor buffer_head contain anything of that sort -
the lifetime management of the opened device is not in their
competence.  As a matter of fact, the logic around closing
those opened devices (bdev_release()) makes sure that no
instances of buffer_head (or iomap) will outlive them.
And they don't care about any extra state - everything
they use is in block_device and coallocated inode.

I could've easily missed something in one of the threads around
the earlier iterations of the patchset; if that's the case,
could somebody restate the rationale for that part and/or
post relevant lore.kernel.org links?  Christian?  hch?
What am I missing here?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  3:06             ` Al Viro
@ 2024-04-07  3:21               ` Yu Kuai
  2024-04-07  4:57                 ` Al Viro
  2024-04-09  4:26                 ` Al Viro
  2024-04-09  9:00               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Christian Brauner
  1 sibling, 2 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-07  3:21 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/07 11:06, Al Viro wrote:
> On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
> 
>> Other than raw block_device fops, other filesystems can use the opened
>> bdev_file directly for iomap and buffer_head, and they actually don't
>> need to reference block_device anymore. The point here is that whether
> 
> What do you mean, "reference"?  The counting reference is to opened
> file; ->s_bdev is a cached pointer to associated struct block_device,
> and neither it nor pointers in buffer_head are valid past the moment
> when you close the file.  Storing (non-counting) pointers to struct
> file in struct buffer_head is not different in that respect - they
> are *still* only valid while the "master" reference is held.
> 
> Again, what's the point of storing struct file * in struct buffer_head
> or struct iomap?  In any instances of those structures?

Perhaps this is what you missed, like the title of this set, in order to
remove direct access of bdev->bd_inode from fs/buffer, we must store
bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is replaced
with 'file_inode(bdev_file)' now.

Some history of previous discussions:

[1] https://lore.kernel.org/all/ZWRDeQ4K8BiYnV+X@infradead.org/
[2] https://lore.kernel.org/all/28237ec3-c3c1-1f0c-5250-04a88845d4a6@huaweicloud.com/
[3] https://lore.kernel.org/all/20240129-vfs-bdev-file-bd_inode-v1-0-42eb9eea96cf@kernel.org/

Thanks,
Kuai

> 
> There is a good reason to have it in places that keep a reference to
> opened block device - the kind that _keeps_ the device opened.  Namely,
> there's state that needs to be carried from the place where we'd opened
> the sucker to the place where we close it, and that state is better
> carried by opened file.
> 
> But neither iomap nor buffer_head contain anything of that sort -
> the lifetime management of the opened device is not in their
> competence.  As a matter of fact, the logic around closing
> those opened devices (bdev_release()) makes sure that no
> instances of buffer_head (or iomap) will outlive them.
> And they don't care about any extra state - everything
> they use is in block_device and coallocated inode.
> 
> I could've easily missed something in one of the threads around
> the earlier iterations of the patchset; if that's the case,
> could somebody restate the rationale for that part and/or
> post relevant lore.kernel.org links?  Christian?  hch?
> What am I missing here?
> .
> 


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-06  9:09 ` [PATCH vfs.all 08/26] erofs: " Yu Kuai
@ 2024-04-07  4:05   ` Al Viro
  2024-04-07  4:08     ` Al Viro
  2024-04-11 16:13     ` Gao Xiang
  0 siblings, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-07  4:05 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that all filesystems stash the bdev file, it's ok to get inode
> for the file.

Looking at the only user of erofs_buf->inode (erofs_bread())...  We
use the inode for two things there - block size calculation (to get
from block number to position in bytes) and access to page cache.
We read in full pages anyway.  And frankly, looking at the callers,
we really would be better off if we passed position in bytes instead
of block number.  IOW, it smells like erofs_bread() having wrong type.

Look at the callers.  With 3 exceptions it's
fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);

and all of them actually want the return value + erofs_offset(...).  IOW,
we take a linear position (in bytes).  Divide it by block size (from sb).
Pass the quotient to erofs_bread(), where we multiply that by block size
(from inode), see which page will that be in, get that page and return a
pointer *into* that page.  Then we again divide the same position
by block size (from sb) and add the remainder to the pointer returned
by erofs_bread().

IOW, it would be much easier to pass the position directly and to hell
with block size logics.  Three exceptions to that pattern:

fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);

Those could bloody well multiply the argument by block size;
the first one (erofs_read_metabuf()) is also interesting - its
callers themselves follow the similar pattern.  So it might be
worth passing it a position in bytes as well...

In any case, all 3 have superblock reference, so they can convert
from blocks to bytes conveniently.  Which means that erofs_bread()
doesn't need to mess with block size considerations at all.

IOW, it might make sense to replace erofs_buf->inode with
pointer to address space.  And use file_mapping() instead of
file_inode() in that patch...
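
The round trip described above can be checked with a small userspace sketch. It assumes 4 KiB pages and a power-of-two block size no larger than a page; the names page_ref, locate_by_pos() and locate_by_block() are hypothetical and do not exist in erofs.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

struct page_ref {
	uint64_t index;		/* which page of the page cache */
	uint64_t offset;	/* byte offset inside that page */
};

/* Proposed shape: take the byte position directly. */
static struct page_ref locate_by_pos(uint64_t pos)
{
	struct page_ref r = {
		.index = pos >> MODEL_PAGE_SHIFT,
		.offset = pos & (MODEL_PAGE_SIZE - 1),
	};
	return r;
}

/* Existing shape: divide by block size, multiply back, re-add remainder. */
static struct page_ref locate_by_block(uint64_t pos, unsigned int blkbits)
{
	uint64_t blknr, blkoff, byte;
	struct page_ref r;

	assert(blkbits <= MODEL_PAGE_SHIFT);
	blknr = pos >> blkbits;			/* caller: position -> block */
	blkoff = pos & ((1ULL << blkbits) - 1);	/* caller: remainder */
	byte = blknr << blkbits;		/* callee: block -> position */
	r.index = byte >> MODEL_PAGE_SHIFT;
	r.offset = (byte & (MODEL_PAGE_SIZE - 1)) + blkoff;
	return r;
}
```

Under those assumptions both routes land on the same page and offset for every position, which is why the block size drops out of the picture once the position is passed directly.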

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-07  4:05   ` Al Viro
@ 2024-04-07  4:08     ` Al Viro
  2024-04-11 16:13     ` Gao Xiang
  1 sibling, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-07  4:08 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

On Sun, Apr 07, 2024 at 05:05:31AM +0100, Al Viro wrote:

> IOW, it might make sense to replace erofs_buf->inode with
> pointer to address space.  And use file_mapping() instead of
				     ->f_mapping, that is.
> file_inode() in that patch...
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  3:21               ` Yu Kuai
@ 2024-04-07  4:57                 ` Al Viro
  2024-04-07  5:11                   ` Al Viro
  2024-04-09  4:26                 ` Al Viro
  1 sibling, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-07  4:57 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 11:21:56AM +0800, Yu Kuai wrote:

> Perhaps this is what you missed, like the title of this set, in order to
> remove direct access of bdev->bd_inode from fs/buffer, we must store
> bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is replaced
> with 'file_inode(bdev_file)' now.

TBH, that looks like a very massive overkill that doesn't address
the real issues - you are still poking in that coallocated struct
inode, only instead of fetching it from pointer in struct block_device
you get it from a pointer in a dummy struct file.  How is that an
improvement?  After all, grepping for '\->[ 	]*bd_inode\>' and
looking through the few remaining users in e.g. fs/buffer.c is
*much* easier than grepping for file_inode callers.

AFAICS, Christoph's objections had been about the need to use saner
APIs instead of getting to inode in some way and poking in it.
And I agree that quite a few things in that series do just
that.  The final part doesn't.

IMO, those dummy struct file (used as convenient storage for pointer
to address_space *and* to the damn inode, with all its guts hanging
out) are simply wrong.

To reiterate:
	* we need to reduce the number of uses of those inodes
	* we need to find out what *is* getting used and sort out
the sane set of primitives; that's hard to do when we still have
a lot of noise.
	* we need to convert to those saner primitives
	* we need to prevent reintroduction of noise, or at least
make such reintroduced noise easy to catch and whack.

->bd_inode is a problem because it's an attractive nuisance.  Removing
it would be fine, if there wasn't a harder to spot alternative way to
get the same pointer.  Try to grep for file_inode and bd_inode resp.
and compare the hit counts.  Seriously reduced set of bd_inode users is
fine - my impression is that after this series without the final part
we'd be down to 20 users or so.  In the meanwhile, there's about 1.4e3
users of file_inode(), most of them completely unrelated to block
devices.

PS: in grow_dev_folio() we probably want
	struct address_space *mapping = bdev->bd_inode->i_mapping;
instead of
	struct inode *inode = bdev->bd_inode;
as one of the preliminary chunks.
FWIW, it really looks like address_space (== page cache of block device,
not an unreasonable candidate for primitive) and block size (well,
logarithm thereof) cover the majority of what remains, with device
size possibly being (remote) third...

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  4:57                 ` Al Viro
@ 2024-04-07  5:11                   ` Al Viro
  2024-04-07  5:21                     ` Al Viro
  2024-04-11 15:22                     ` Matthew Wilcox
  0 siblings, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-07  5:11 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 05:57:58AM +0100, Al Viro wrote:

> PS: in grow_dev_folio() we probably want
> 	struct address_space *mapping = bdev->bd_inode->i_mapping;
> instead of
> 	struct inode *inode = bdev->bd_inode;
> as one of the preliminary chunks.
> FWIW, it really looks like address_space (== page cache of block device,
> not an unreasonable candidate for primitive) and block size (well,
> logarithm thereof) cover the majority of what remains, with device
> size possibly being (remote) third...

Incidentally, how painful would it be to switch __bread_gfp() and __bread()
to passing *logarithm* of block size instead of block size?  And possibly
supply the same to clean_bdev_aliases()...

That would reduce fs/buffer.c uses to just "give me the address_space of
that block device"...
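
As a rough userspace illustration of the idea above, and not of the actual fs/buffer.c interfaces, converting a block number to a byte position needs only the logarithm of the block size. The helper names below are hypothetical, and block sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a power-of-two block size to its base-2 logarithm. */
static unsigned int blksize_to_bits(unsigned int size)
{
	unsigned int bits = 0;

	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two only */
	while ((1u << bits) < size)
		bits++;
	return bits;
}

/* Byte position of a block when the caller passes the block size. */
static uint64_t block_to_pos_by_size(uint64_t block, unsigned int size)
{
	return block * size;
}

/* Same position when the caller passes log2(block size) instead. */
static uint64_t block_to_pos_by_bits(uint64_t block, unsigned int bits)
{
	return block << bits;
}
```

A caller that already knows the logarithm can do the conversion with a shift alone, without looking the block size up anywhere else.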

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  5:11                   ` Al Viro
@ 2024-04-07  5:21                     ` Al Viro
  2024-04-11 15:22                     ` Matthew Wilcox
  1 sibling, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-07  5:21 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 06:11:19AM +0100, Al Viro wrote:
> On Sun, Apr 07, 2024 at 05:57:58AM +0100, Al Viro wrote:
> 
> > PS: in grow_dev_folio() we probably want
> > 	struct address_space *mapping = bdev->bd_inode->i_mapping;
> > instead of
> > 	struct inode *inode = bdev->bd_inode;
> > as one of the preliminary chunks.
> > FWIW, it really looks like address_space (== page cache of block device,
> > not an unreasonable candidate for primitive) and block size (well,
> > logarithm thereof) cover the majority of what remains, with device
> > size possibly being (remote) third...
> 
> Incidentally, how painful would it be to switch __bread_gfp() and __bread()
> to passing *logarithm* of block size instead of block size?  And possibly
> supply the same to clean_bdev_aliases()...
> 
> That would reduce fs/buffer.c uses to just "give me the address_space of
> that block device"...

... and from what I've seen in your series, it very much looks like after
that we could replace ->bd_inode with ->bd_mapping, turning your bdev_mapping()
into an inline and (hopefully) leaving the few remaining uses of bdev_inode()
outside of block/bdev.c _not_ on hot paths.  If nothing else, it would
make it much easier to grep for remaining odd stuff.

Might trim the btrfs parts of the series, at that - a lot of that seems to
be "how do we propagate opened file instead of just bdev, so that we could
get to its ->f_mapping deep in call chain"...

Again, all of that is only if __bread...() conversion to log(size) is feasible
without a massive PITA - there might be dragons...

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode
  2024-04-07  2:20 ` [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
@ 2024-04-08 14:05   ` Jan Kara
  0 siblings, 0 replies; 116+ messages in thread
From: Jan Kara @ 2024-04-08 14:05 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, viro, axboe, gustavoars, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Sun 07-04-24 10:20:39, Yu Kuai wrote:
> Hi, Christian!
> Hi, Jan!
> +CC Gustavo
> 
> While testing this set, I found that the branch vfs.all seems broken:
> xfstests reports success while lots of BUGs are reported in dmesg:
> 
> [22709.079704] =============================================================================
> [22709.082404] BUG kmalloc-16 (Not tainted): Right Redzone overwritten
> [22709.084148] -----------------------------------------------------------------------------
> [22709.084148]
> [22709.086784] 0xffff88817d52e7a0-0xffff88817d52e7a7 @offset=1952. First byte 0x0 instead of 0xcc
> [22709.089169] Allocated in do_handle_open+0x97/0x440 age=10 cpu=13 pid=814795
> [22709.091158]  __kmalloc+0x41d/0x5e0
> [22709.092153]  do_handle_open+0x97/0x440
> [22709.093240]  __x64_sys_open_by_handle_at+0x23/0x30
> [22709.094482]  do_syscall_64+0xb1/0x210
> [22709.095316]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
> [22709.096414] Freed in kvfree+0x4c/0x60 age=43560 cpu=15 pid=813506
> [22709.097719]  kfree+0x31c/0x530
> [22709.098396]  kvfree+0x4c/0x60
> [22709.099048]  ext4_mb_release+0x29c/0x570
> [22709.099901]  ext4_put_super+0x17f/0x590
> [22709.100735]  generic_shutdown_super+0xba/0x240
> [22709.101698]  kill_block_super+0x22/0x70
> [22709.102525]  ext4_kill_sb+0x2a/0x70
> [22709.103297]  deactivate_locked_super+0x4f/0xe0
> [22709.104261]  deactivate_super+0x81/0x90
> [22709.104876]  cleanup_mnt+0xe0/0x1b0
> [22709.105419]  __cleanup_mnt+0x1a/0x30
> [22709.105964]  task_work_run+0x88/0x100
> [22709.106531]  syscall_exit_to_user_mode+0x3cc/0x3e0
> [22709.107263]  do_syscall_64+0xc5/0x210
> [22709.107820]  entry_SYSCALL_64_after_hwframe+0x6c/0x74
> 
> While digging into this problem, I found that commit 1b43c4629756 ("fs:
> Annotate struct file_handle with __counted_by() and use struct_size()")
> might have made a mistake, and I verified that the following patch fixes
> the problem.

Yep, this should have been fixed recently in VFS tree as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  3:21               ` Yu Kuai
  2024-04-07  4:57                 ` Al Viro
@ 2024-04-09  4:26                 ` Al Viro
  2024-04-09  4:53                   ` Al Viro
  2024-04-09  6:22                   ` Yu Kuai
  1 sibling, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-09  4:26 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 11:21:56AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2024/04/07 11:06, Al Viro wrote:
> > On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
> > 
> > > Other than raw block_device fops, other filesystems can use the opened
> > > bdev_file directly for iomap and buffer_head, and they actually don't
> > > need to reference block_device anymore. The point here is that whether
> > 
> > What do you mean, "reference"?  The counting reference is to opened
> > file; ->s_bdev is a cached pointer to associated struct block_device,
> > and neither it nor pointers in buffer_head are valid past the moment
> > when you close the file.  Storing (non-counting) pointers to struct
> > file in struct buffer_head is not different in that respect - they
> > are *still* only valid while the "master" reference is held.
> > 
> > Again, what's the point of storing struct file * in struct buffer_head
> > or struct iomap?  In any instances of those structures?
> 
> Perhaps this is what you missed, like the title of this set, in order to
> remove direct access of bdev->bd_inode from fs/buffer, we must store
> bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is replaced
> with 'file_inode(bdev)' now.

BTW, what does that have to do with iomap?  All it passes ->bdev to is
	1) bio_alloc()
	2) bio_alloc_bioset()
	3) bio_init()
	4) bdev_logical_block_size()
	5) bdev_iter_is_aligned()
	6) bdev_fua() 
	7) bdev_write_cache()

None of those goes anywhere near fs/buffer.c or uses ->bd_inode, AFAICS.

Again, what's the point?  It feels like you are trying to replace *all*
uses of struct block_device with struct file, just because.

If that's what's going on, please don't.  Using struct file instead
of that bdev_handle crap - sure, makes perfect sense.  But shoving it
down into struct bio really, really does not.

I'd suggest to start with adding ->bd_mapping as the first step and
converting the places where mapping is all we want to using that.
Right at the beginning of your series.  Then let's see what gets
left.

And leave ->bd_inode there for now; don't blindly replace it with
->bd_mapping->host everywhere.  It's much easier to grep for.
The point of the exercise is to find what do we really need ->bd_inode
for and what primitives are missing, not getting rid of a bad word...
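
As a rough illustration of that first step, here is a userspace sketch
with mock structs standing in for the kernel types.  The ->bd_mapping
field is the one proposed above; mock_bdev_init() and
mock_bdev_get_mapping() are made-up names for this example only, not
existing kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields needed here. */
struct inode;

struct address_space {
	struct inode *host;		/* owning inode */
};

struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
	long long i_size;
};

struct block_device {
	struct inode *bd_inode;			/* left in place, easy to grep for */
	struct address_space *bd_mapping;	/* proposed cached pointer */
};

/* Hypothetical setup helper: fill in both pointers once, at allocation. */
static void mock_bdev_init(struct block_device *bdev, struct inode *inode)
{
	assert(bdev != NULL && inode != NULL);
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
	bdev->bd_inode = inode;
	bdev->bd_mapping = inode->i_mapping;
}

/* A user that only wants the mapping never touches ->bd_inode. */
static struct address_space *mock_bdev_get_mapping(struct block_device *bdev)
{
	return bdev->bd_mapping;
}
```

Whatever still needs ->bd_inode after such a conversion is the list
worth looking at.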

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-09  4:26                 ` Al Viro
@ 2024-04-09  4:53                   ` Al Viro
  2024-04-09  6:22                   ` Yu Kuai
  1 sibling, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-09  4:53 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Tue, Apr 09, 2024 at 05:26:43AM +0100, Al Viro wrote:
> On Sun, Apr 07, 2024 at 11:21:56AM +0800, Yu Kuai wrote:
> > Hi,
> > 
> > On 2024/04/07 11:06, Al Viro wrote:
> > > On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
> > > 
> > > > Other than raw block_device fops, other filesystems can use the opened
> > > > bdev_file directly for iomap and buffer_head, and they actually don't
> > > > need to reference block_device anymore. The point here is that whether
> > > 
> > > What do you mean, "reference"?  The counting reference is to opened
> > > file; ->s_bdev is a cached pointer to associated struct block_device,
> > > and neither it nor pointers in buffer_head are valid past the moment
> > > when you close the file.  Storing (non-counting) pointers to struct
> > > file in struct buffer_head is not different in that respect - they
> > > are *still* only valid while the "master" reference is held.
> > > 
> > > Again, what's the point of storing struct file * in struct buffer_head
> > > or struct iomap?  In any instances of those structures?
> > 
> > Perhaps this is what you missed, like the title of this set, in order to
> > remove direct access of bdev->bd_inode from fs/buffer, we must store
> > bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is replaced
> > with 'file_inode(bdev)' now.
> 
> BTW, what does that have to do with iomap?  All it passes ->bdev to is
> 	1) bio_alloc()
> 	2) bio_alloc_bioset()
> 	3) bio_init()
> 	4) bdev_logical_block_size()
> 	5) bdev_iter_is_aligned()
> 	6) bdev_fua() 
> 	7) bdev_write_cache()
> 
> None of those goes anywhere near fs/buffer.c or uses ->bd_inode, AFAICS.

Note that callers of iomap stuff in block/fops.c *do* have struct file *,
so there's no problem with getting to inode - there the use of ->f_mapping->host
is normal for ->write_iter()/->read_iter() instances.  Same for filemap_read()
and iomap_file_buffered_write().

As a matter of fact, the only use of ->bd_inode in block/fops.c is easily
killable, as discussed upthread.
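
A minimal userspace sketch of that access pattern, with mock structs
instead of the real kernel types; mock_open_bdev_file() and
mock_bdev_size_from_file() are hypothetical names used only for
illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel types; only the fields needed here. */
struct inode;

struct address_space {
	struct inode *host;		/* owning inode */
};

struct inode {
	struct address_space i_data;
	long long i_size;
};

struct file {
	struct address_space *f_mapping;
};

/* Mock of what opening sets up: f_mapping points at the bdev inode's mapping. */
static void mock_open_bdev_file(struct file *file, struct inode *bd_inode)
{
	assert(file != NULL && bd_inode != NULL);
	bd_inode->i_data.host = bd_inode;
	file->f_mapping = &bd_inode->i_data;
}

/* ->read_iter()/->write_iter() style: reach the inode through the file. */
static long long mock_bdev_size_from_file(const struct file *file)
{
	struct inode *bd_inode = file->f_mapping->host;

	return bd_inode->i_size;
}
```

No struct block_device is consulted anywhere on that path.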

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-09  4:26                 ` Al Viro
  2024-04-09  4:53                   ` Al Viro
@ 2024-04-09  6:22                   ` Yu Kuai
  2024-04-10 10:59                     ` Jan Kara
  1 sibling, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-09  6:22 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/09 12:26, Al Viro wrote:
> On Sun, Apr 07, 2024 at 11:21:56AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/04/07 11:06, Al Viro wrote:
>>> On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
>>>
>>>> Other than raw block_device fops, other filesystems can use the opened
>>>> bdev_file directly for iomap and buffer_head, and they actually don't
>>>> need to reference block_device anymore. The point here is that whether
>>>
>>> What do you mean, "reference"?  The counting reference is to opened
>>> file; ->s_bdev is a cached pointer to associated struct block_device,
>>> and neither it nor pointers in buffer_head are valid past the moment
>>> when you close the file.  Storing (non-counting) pointers to struct
>>> file in struct buffer_head is not different in that respect - they
>>> are *still* only valid while the "master" reference is held.
>>>
>>> Again, what's the point of storing struct file * in struct buffer_head
>>> or struct iomap?  In any instances of those structures?
>>
>> Perhaps this is what you missed, like the title of this set, in order to
>> remove direct access of bdev->bd_inode from fs/buffer, we must store
>> bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is replaced
>> with 'file_inode(bdev)' now.
> 
> BTW, what does that have to do with iomap?  All it passes ->bdev to is
> 	1) bio_alloc()
> 	2) bio_alloc_bioset()
> 	3) bio_init()
> 	4) bdev_logical_block_size()
> 	5) bdev_iter_is_aligned()
> 	6) bdev_fua()
> 	7) bdev_write_cache()
> 
> None of those goes anywhere near fs/buffer.c or uses ->bd_inode, AFAICS.
> 
> Again, what's the point?  It feels like you are trying to replace *all*
> uses of struct block_device with struct file, just because.
> 
> If that's what's going on, please don't.  Using struct file instead
> of that bdev_handle crap - sure, makes perfect sense.  But shoving it
> down into struct bio really, really does not.
> 
> I'd suggest to start with adding ->bd_mapping as the first step and
> converting the places where mapping is all we want to using that.
> Right at the beginning of your series.  Then let's see what gets
> left.

Thanks so much for your advice. In fact, I totally agree with adding a
'bd_mapping' field or exposing the helper bdev_mapping().

However, I will let Christoph and Jan make the decision when they get
time to take a look at this.

Thanks!
Kuai

> 
> And leave ->bd_inode there for now; don't blindly replace it with
> ->bd_mapping->host everywhere.  It's much easier to grep for.
> The point of the exercise is to find what do we really need ->bd_inode
> for and what primitives are missing, not getting rid of a bad word...


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  3:06             ` Al Viro
  2024-04-07  3:21               ` Yu Kuai
@ 2024-04-09  9:00               ` Christian Brauner
  1 sibling, 0 replies; 116+ messages in thread
From: Christian Brauner @ 2024-04-09  9:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 04:06:10AM +0100, Al Viro wrote:
> On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
> 
> > Other than raw block_device fops, other filesystems can use the opened
> > bdev_file directly for iomap and buffer_head, and they actually don't
> > need to reference block_device anymore. The point here is that whether
> 
> What do you mean, "reference"?  The counting reference is to opened
> file; ->s_bdev is a cached pointer to associated struct block_device,
> and neither it nor pointers in buffer_head are valid past the moment
> when you close the file.  Storing (non-counting) pointers to struct
> file in struct buffer_head is not different in that respect - they
> are *still* only valid while the "master" reference is held.
> 
> Again, what's the point of storing struct file * in struct buffer_head
> or struct iomap?  In any instances of those structures?
> 
> There is a good reason to have it in places that keep a reference to
> opened block device - the kind that _keeps_ the device opened.  Namely,
> there's state that need to be carried from the place where we'd opened
> the sucker to the place where we close it, and that state is better
> carried by opened file.
> 
> But neither iomap nor buffer_head contain anything of that sort -
> the lifetime management of the opened device is not in their
> competence.  As a matter of fact, the logic around closing
> those opened devices (bdev_release()) makes sure that no
> instances of buffer_head (or iomap) will outlive them.
> And they don't care about any extra state - everything
> they use is in block_device and coallocated inode.
> 
> I could've easily missed something in one of the threads around
> the earlier iterations of the patchset; if that's the case,
> could somebody restate the rationale for that part and/or
> post relevant lore.kernel.org links?  Christian?  hch?
> What am I missing here?

The original series was a simple RFC/POC to show that struct file could
be used to remove bd_inode access in a wide variety of situations. But
as I've mentioned in that thread I wasn't happy with various aspects of
the approach which is why I never pushed forward with it. The part where
we pushed struct file into buffer_head was the most obvious one.
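
For reference, the lifetime rule in the quoted explanation can be
modelled in a few lines of userspace C.  This is only a toy model under
my own assumptions: all names are hypothetical, and the teardown on last
close is a simplification of what bdev_release() arranges, not a copy
of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_BH 4

struct mock_bdev;

/* Cached, non-counting pointer: holding it does not keep the device open. */
struct mock_bh {
	struct mock_bdev *b_bdev;
};

struct mock_bdev {
	int openers;				/* the counting ("master") references */
	struct mock_bh *cache[MOCK_MAX_BH];	/* buffers currently pointing at us */
	size_t nr_cached;
};

static void mock_open(struct mock_bdev *bdev)
{
	bdev->openers++;
}

/* Only legal while somebody holds the device open. */
static bool mock_bh_attach(struct mock_bh *bh, struct mock_bdev *bdev)
{
	if (bdev->openers == 0 || bdev->nr_cached == MOCK_MAX_BH)
		return false;
	bh->b_bdev = bdev;			/* openers is deliberately untouched */
	bdev->cache[bdev->nr_cached++] = bh;
	return true;
}

/* Last close tears the cache down, so no buffer outlives the open device. */
static void mock_release(struct mock_bdev *bdev)
{
	assert(bdev->openers > 0);
	if (--bdev->openers > 0)
		return;
	for (size_t i = 0; i < bdev->nr_cached; i++)
		bdev->cache[i]->b_bdev = NULL;
	bdev->nr_cached = 0;
}
```

Swapping the cached pointer's type from the device to the file would
not change any of this: it stays a borrowed pointer either way.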

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-06  9:09 ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
  2024-04-06 19:42   ` Al Viro
@ 2024-04-09 10:23   ` Christian Brauner
  2024-04-09 11:53     ` Yu Kuai
  1 sibling, 1 reply; 116+ messages in thread
From: Christian Brauner @ 2024-04-09 10:23 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, viro, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

> +static int __stash_bdev_file(struct block_device *bdev)

I've said that on the previous version. I think that this is really
error prone and seems overall like an unpleasant solution. I would
really like to avoid going down that route.

I think a chunk of this series is good though, specifically the simple
conversions of individual filesystems where file_inode() or f_mapping
makes sense. There are a few exceptions where we might be better off
replacing the current APIs with something else (I think Al touched on
that somewhere further down the thread).

I'd suggest splitting the straightforward bd_inode removals into a
separate series that I can take.

Thanks for working on all of this. It's certainly a contentious area.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-09 10:23   ` Christian Brauner
@ 2024-04-09 11:53     ` Yu Kuai
  0 siblings, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-09 11:53 UTC (permalink / raw)
  To: Christian Brauner, Yu Kuai
  Cc: jack, hch, viro, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/09 18:23, Christian Brauner wrote:
>> +static int __stash_bdev_file(struct block_device *bdev)
> 
> I've said that on the previous version. I think that this is really
> error prone and seems overall like an unpleasant solution. I would
> really like to avoid going down that route.

Yes, I see your point, and it's indeed reasonable.

> 
> I think a chunk of this series is good though, specifically the simple
> conversions of individual filesystems where file_inode() or f_mapping
> makes sense. There are a few exceptions where we might be better off
> replacing the current APIs with something else (I think Al touched on
> that somewhere further down the thread).
> 
> I'd suggest splitting the straightforward bd_inode removals into a
> separate series that I can take.
> 
> Thanks for working on all of this. It's certainly a contentious area.

How about the following simple patch to expose bdev_mapping() for
fs/buffer.c for now?

Thanks,
Kuai

diff --git a/block/blk.h b/block/blk.h
index a34bb590cce6..f8bcb43a12c6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -428,7 +428,6 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
  #endif /* CONFIG_BLK_DEV_ZONED */

  struct inode *bdev_inode(struct block_device *bdev);
-struct address_space *bdev_mapping(struct block_device *bdev);
  struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
  void bdev_add(struct block_device *bdev, dev_t dev);

diff --git a/fs/buffer.c b/fs/buffer.c
index 4f73d23c2c46..e2bd19e3fe48 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -189,8 +189,8 @@ EXPORT_SYMBOL(end_buffer_write_sync);
  static struct buffer_head *
  __find_get_block_slow(struct block_device *bdev, sector_t block)
  {
-       struct inode *bd_inode = bdev->bd_inode;
-       struct address_space *bd_mapping = bd_inode->i_mapping;
+       struct address_space *bd_mapping = bdev_mapping(bdev);
+       struct inode *bd_inode = bd_mapping->host;
         struct buffer_head *ret = NULL;
         pgoff_t index;
         struct buffer_head *bh;
@@ -1034,12 +1034,12 @@ static sector_t folio_init_buffers(struct folio *folio,
  static bool grow_dev_folio(struct block_device *bdev, sector_t block,
                 pgoff_t index, unsigned size, gfp_t gfp)
  {
-       struct inode *inode = bdev->bd_inode;
+       struct address_space *bd_mapping = bdev_mapping(bdev);
         struct folio *folio;
         struct buffer_head *bh;
         sector_t end_block = 0;

-       folio = __filemap_get_folio(inode->i_mapping, index,
+       folio = __filemap_get_folio(bd_mapping, index,
                         FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
         if (IS_ERR(folio))
                 return false;
@@ -1073,10 +1073,10 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
          * lock to be atomic wrt __find_get_block(), which does not
          * run under the folio lock.
          */
-       spin_lock(&inode->i_mapping->i_private_lock);
+       spin_lock(&bd_mapping->i_private_lock);
         link_dev_buffers(folio, bh);
         end_block = folio_init_buffers(folio, bdev, size);
-       spin_unlock(&inode->i_mapping->i_private_lock);
+       spin_unlock(&bd_mapping->i_private_lock);
  unlock:
         folio_unlock(folio);
         folio_put(folio);
@@ -1463,7 +1463,7 @@ __bread_gfp(struct block_device *bdev, sector_t block,
  {
         struct buffer_head *bh;

-       gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+       gfp |= mapping_gfp_constraint(bdev_mapping(bdev), ~__GFP_FS);

         /*
          * Prefer looping in the allocator rather than here, at least that
@@ -1696,8 +1696,8 @@ EXPORT_SYMBOL(create_empty_buffers);
   */
  void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
  {
-       struct inode *bd_inode = bdev->bd_inode;
-       struct address_space *bd_mapping = bd_inode->i_mapping;
+       struct address_space *bd_mapping = bdev_mapping(bdev);
+       struct inode *bd_inode = bd_mapping->host;
         struct folio_batch fbatch;
         pgoff_t index = ((loff_t)block << bd_inode->i_blkbits) / PAGE_SIZE;
         pgoff_t end;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bc840e0fb6e5..bbae55535d53 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1527,6 +1527,7 @@ void blkdev_put_no_open(struct block_device *bdev);

  struct block_device *I_BDEV(struct inode *inode);
  struct block_device *file_bdev(struct file *bdev_file);
+struct address_space *bdev_mapping(struct block_device *bdev);
  bool disk_live(struct gendisk *disk);
  unsigned int block_size(struct block_device *bdev);



^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-06  9:09 ` [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file Yu Kuai
@ 2024-04-10 10:56   ` Jan Kara
  2024-04-10 17:26   ` Matthew Sakai
  1 sibling, 0 replies; 116+ messages in thread
From: Jan Kara @ 2024-04-10 10:56 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, hch, brauner, viro, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

On Sat 06-04-24 17:09:23, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that the dm upper layer already stashes the file of the opened
> device in 'dm_dev->bdev_file', it's OK to get the inode from the file.
> 
> There are no functional changes; this prepares for removing 'bd_inode'
> from block_device.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/md/dm-vdo/dedupe.c                |  7 ++++---
>  drivers/md/dm-vdo/dm-vdo-target.c         |  9 +++++++--
>  drivers/md/dm-vdo/indexer/config.c        |  2 +-
>  drivers/md/dm-vdo/indexer/config.h        |  4 ++--
>  drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
>  drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
>  drivers/md/dm-vdo/indexer/index-session.c | 18 ++++++++++--------
>  drivers/md/dm-vdo/indexer/index.c         |  4 ++--
>  drivers/md/dm-vdo/indexer/index.h         |  2 +-
>  drivers/md/dm-vdo/indexer/indexer.h       |  6 +++---
>  drivers/md/dm-vdo/indexer/io-factory.c    | 17 +++++++++--------
>  drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
>  drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
>  drivers/md/dm-vdo/indexer/volume.h        |  2 +-
>  drivers/md/dm-vdo/vdo.c                   |  2 +-
>  15 files changed, 49 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
> index 117266e1b3ae..0e311989247e 100644
> --- a/drivers/md/dm-vdo/dedupe.c
> +++ b/drivers/md/dm-vdo/dedupe.c
> @@ -2191,7 +2191,7 @@ static int initialize_index(struct vdo *vdo, struct hash_zones *zones)
>  	uds_offset = ((vdo_get_index_region_start(geometry) -
>  		       geometry.bio_offset) * VDO_BLOCK_SIZE);
>  	zones->parameters = (struct uds_parameters) {
> -		.bdev = vdo->device_config->owned_device->bdev,
> +		.bdev_file = vdo->device_config->owned_device->bdev_file,
>  		.offset = uds_offset,
>  		.size = (vdo_get_index_region_size(geometry) * VDO_BLOCK_SIZE),
>  		.memory_size = geometry.index_config.mem,
> @@ -2582,8 +2582,9 @@ static void resume_index(void *context, struct vdo_completion *parent)
>  	struct device_config *config = parent->vdo->device_config;
>  	int result;
>  
> -	zones->parameters.bdev = config->owned_device->bdev;
> -	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
> +	zones->parameters.bdev_file = config->owned_device->bdev_file;
> +	result = uds_resume_index_session(zones->index_session,
> +					  zones->parameters.bdev_file);
>  	if (result != UDS_SUCCESS)
>  		vdo_log_error_strerror(result, "Error resuming dedupe index");
>  
> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
> index 5a4b0a927f56..79e861c2887c 100644
> --- a/drivers/md/dm-vdo/dm-vdo-target.c
> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
> @@ -696,6 +696,11 @@ static void handle_parse_error(struct device_config *config, char **error_ptr,
>  	*error_ptr = error_str;
>  }
>  
> +static loff_t vdo_get_device_size(const struct device_config *config)
> +{
> +	return i_size_read(file_inode(config->owned_device->bdev_file));
> +}
> +
>  /**
>   * parse_device_config() - Convert the dmsetup table into a struct device_config.
>   * @argc: The number of table values.
> @@ -878,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
>  	}
>  
>  	if (config->version == 0) {
> -		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
> +		u64 device_size = vdo_get_device_size(config);
>  
>  		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>  	}
> @@ -1011,7 +1016,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
>  
>  static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
>  {
> -	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
> +	return vdo_get_device_size(vdo->device_config) / VDO_BLOCK_SIZE;
>  }
>  
>  static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
> diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
> index 5532371b952f..dcf0742a6145 100644
> --- a/drivers/md/dm-vdo/indexer/config.c
> +++ b/drivers/md/dm-vdo/indexer/config.c
> @@ -344,7 +344,7 @@ int uds_make_configuration(const struct uds_parameters *params,
>  	config->volume_index_mean_delta = DEFAULT_VOLUME_INDEX_MEAN_DELTA;
>  	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
>  	config->nonce = params->nonce;
> -	config->bdev = params->bdev;
> +	config->bdev_file = params->bdev_file;
>  	config->offset = params->offset;
>  	config->size = params->size;
>  
> diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
> index 08507dc2f7a1..8ba0cf72dec9 100644
> --- a/drivers/md/dm-vdo/indexer/config.h
> +++ b/drivers/md/dm-vdo/indexer/config.h
> @@ -25,8 +25,8 @@ enum {
>  
>  /* A set of configuration parameters for the indexer. */
>  struct uds_configuration {
> -	/* Storage device for the index */
> -	struct block_device *bdev;
> +	/* File of opened storage device for the index */
> +	struct file *bdev_file;
>  
>  	/* The maximum allowable size of the index */
>  	size_t size;
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
> index 627adc24af3b..32eee76bc246 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.c
> +++ b/drivers/md/dm-vdo/indexer/index-layout.c
> @@ -1668,7 +1668,7 @@ static int create_layout_factory(struct index_layout *layout,
>  	size_t writable_size;
>  	struct io_factory *factory = NULL;
>  
> -	result = uds_make_io_factory(config->bdev, &factory);
> +	result = uds_make_io_factory(config->bdev_file, &factory);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> @@ -1741,9 +1741,9 @@ void uds_free_index_layout(struct index_layout *layout)
>  }
>  
>  int uds_replace_index_layout_storage(struct index_layout *layout,
> -				     struct block_device *bdev)
> +				     struct file *bdev_file)
>  {
> -	return uds_replace_storage(layout->factory, bdev);
> +	return uds_replace_storage(layout->factory, bdev_file);
>  }
>  
>  /* Obtain a dm_bufio_client for the volume region. */
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
> index e9ac6f4302d6..28f9be577631 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.h
> +++ b/drivers/md/dm-vdo/indexer/index-layout.h
> @@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
>  void uds_free_index_layout(struct index_layout *layout);
>  
>  int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
> -						  struct block_device *bdev);
> +						  struct file *bdev_file);
>  
>  int __must_check uds_load_index_state(struct index_layout *layout,
>  				      struct uds_index *index);
> diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
> index aee0914d604a..914abf5e006b 100644
> --- a/drivers/md/dm-vdo/indexer/index-session.c
> +++ b/drivers/md/dm-vdo/indexer/index-session.c
> @@ -335,7 +335,7 @@ int uds_open_index(enum uds_open_index_type open_type,
>  		vdo_log_error("missing required parameters");
>  		return -EINVAL;
>  	}
> -	if (parameters->bdev == NULL) {
> +	if (parameters->bdev_file == NULL) {
>  		vdo_log_error("missing required block device");
>  		return -EINVAL;
>  	}
> @@ -349,7 +349,7 @@ int uds_open_index(enum uds_open_index_type open_type,
>  		return uds_status_to_errno(result);
>  
>  	session->parameters = *parameters;
> -	format_dev_t(name, parameters->bdev->bd_dev);
> +	format_dev_t(name, file_bdev(parameters->bdev_file)->bd_dev);
>  	vdo_log_info("%s: %s", get_open_type_string(open_type), name);
>  
>  	result = initialize_index_session(session, open_type);
> @@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
>  	return uds_status_to_errno(result);
>  }
>  
> -static int replace_device(struct uds_index_session *session, struct block_device *bdev)
> +static int replace_device(struct uds_index_session *session,
> +			  struct file *bdev_file)
>  {
>  	int result;
>  
> -	result = uds_replace_index_storage(session->index, bdev);
> +	result = uds_replace_index_storage(session->index, bdev_file);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> -	session->parameters.bdev = bdev;
> +	session->parameters.bdev_file = bdev_file;
>  	return UDS_SUCCESS;
>  }
>  
> @@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
>   * device differs from the current backing store, the index will start using the new backing store.
>   */
>  int uds_resume_index_session(struct uds_index_session *session,
> -			     struct block_device *bdev)
> +			     struct file *bdev_file)
>  {
>  	int result = UDS_SUCCESS;
>  	bool no_work = false;
> @@ -502,8 +503,9 @@ int uds_resume_index_session(struct uds_index_session *session,
>  	if (no_work)
>  		return result;
>  
> -	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
> -		result = replace_device(session, bdev);
> +	if (session->index != NULL &&
> +	    bdev_file != session->parameters.bdev_file) {
> +		result = replace_device(session, bdev_file);
>  		if (result != UDS_SUCCESS) {
>  			mutex_lock(&session->request_mutex);
>  			session->state &= ~IS_FLAG_WAITING;
> diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
> index 1ba767144426..48b16275a067 100644
> --- a/drivers/md/dm-vdo/indexer/index.c
> +++ b/drivers/md/dm-vdo/indexer/index.c
> @@ -1336,9 +1336,9 @@ int uds_save_index(struct uds_index *index)
>  	return result;
>  }
>  
> -int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
> +int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
>  {
> -	return uds_replace_volume_storage(index->volume, index->layout, bdev);
> +	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
>  }
>  
>  /* Accessing statistics should be safe from any thread. */
> diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
> index edabb239548e..6e2e203f43f7 100644
> --- a/drivers/md/dm-vdo/indexer/index.h
> +++ b/drivers/md/dm-vdo/indexer/index.h
> @@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
>  void uds_free_index(struct uds_index *index);
>  
>  int __must_check uds_replace_index_storage(struct uds_index *index,
> -					   struct block_device *bdev);
> +					   struct file *bdev_file);
>  
>  void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
>  
> diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
> index 3744aaf625b0..246ff2810e01 100644
> --- a/drivers/md/dm-vdo/indexer/indexer.h
> +++ b/drivers/md/dm-vdo/indexer/indexer.h
> @@ -128,8 +128,8 @@ struct uds_volume_record {
>  };
>  
>  struct uds_parameters {
> -	/* The block_device used for storage */
> -	struct block_device *bdev;
> +	/* The bdev_file used for storage */
> +	struct file *bdev_file;
>  	/* The maximum allowable size of the index on storage */
>  	size_t size;
>  	/* The offset where the index should start */
> @@ -314,7 +314,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
>   * start using the new backing store instead.
>   */
>  int __must_check uds_resume_index_session(struct uds_index_session *session,
> -					  struct block_device *bdev);
> +					  struct file *bdev_file);
>  
>  /* Wait until all outstanding index operations are complete. */
>  int __must_check uds_flush_index_session(struct uds_index_session *session);
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
> index 515765d35794..f4dedb7b7f40 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.c
> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
> @@ -22,7 +22,7 @@
>   * make helper structures that can be used to access sections of the index.
>   */
>  struct io_factory {
> -	struct block_device *bdev;
> +	struct file *bdev_file;
>  	atomic_t ref_count;
>  };
>  
> @@ -59,7 +59,7 @@ static void uds_get_io_factory(struct io_factory *factory)
>  	atomic_inc(&factory->ref_count);
>  }
>  
> -int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
> +int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
>  {
>  	int result;
>  	struct io_factory *factory;
> @@ -68,16 +68,16 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
>  	if (result != VDO_SUCCESS)
>  		return result;
>  
> -	factory->bdev = bdev;
> +	factory->bdev_file = bdev_file;
>  	atomic_set_release(&factory->ref_count, 1);
>  
>  	*factory_ptr = factory;
>  	return UDS_SUCCESS;
>  }
>  
> -int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
> +int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
>  {
> -	factory->bdev = bdev;
> +	factory->bdev_file = bdev_file;
>  	return UDS_SUCCESS;
>  }
>  
> @@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
>  
>  size_t uds_get_writable_size(struct io_factory *factory)
>  {
> -	return i_size_read(factory->bdev->bd_inode);
> +	return i_size_read(file_inode(factory->bdev_file));
>  }
>  
>  /* Create a struct dm_bufio_client for an index region starting at offset. */
> @@ -99,8 +99,9 @@ int uds_make_bufio(struct io_factory *factory, off_t block_offset, size_t block_
>  {
>  	struct dm_bufio_client *client;
>  
> -	client = dm_bufio_client_create(factory->bdev, block_size, reserved_buffers, 0,
> -					NULL, NULL, 0);
> +	client = dm_bufio_client_create(file_bdev(factory->bdev_file),
> +					block_size, reserved_buffers,
> +					0, NULL, NULL, 0);
>  	if (IS_ERR(client))
>  		return -PTR_ERR(client);
>  
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
> index 7fb5a0616a79..a3ca84d62f2d 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.h
> +++ b/drivers/md/dm-vdo/indexer/io-factory.h
> @@ -24,11 +24,11 @@ enum {
>  	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
>  };
>  
> -int __must_check uds_make_io_factory(struct block_device *bdev,
> +int __must_check uds_make_io_factory(struct file *bdev_file,
>  				     struct io_factory **factory_ptr);
>  
>  int __must_check uds_replace_storage(struct io_factory *factory,
> -				     struct block_device *bdev);
> +				     struct file *bdev_file);
>  
>  void uds_put_io_factory(struct io_factory *factory);
>  
> diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
> index 655453bb276b..edbe46252657 100644
> --- a/drivers/md/dm-vdo/indexer/volume.c
> +++ b/drivers/md/dm-vdo/indexer/volume.c
> @@ -1465,12 +1465,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
>  
>  int __must_check uds_replace_volume_storage(struct volume *volume,
>  					    struct index_layout *layout,
> -					    struct block_device *bdev)
> +					    struct file *bdev_file)
>  {
>  	int result;
>  	u32 i;
>  
> -	result = uds_replace_index_layout_storage(layout, bdev);
> +	result = uds_replace_index_layout_storage(layout, bdev_file);
>  	if (result != UDS_SUCCESS)
>  		return result;
>  
> diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
> index 8679a5e55347..1dc3561b8b43 100644
> --- a/drivers/md/dm-vdo/indexer/volume.h
> +++ b/drivers/md/dm-vdo/indexer/volume.h
> @@ -130,7 +130,7 @@ void uds_free_volume(struct volume *volume);
>  
>  int __must_check uds_replace_volume_storage(struct volume *volume,
>  					    struct index_layout *layout,
> -					    struct block_device *bdev);
> +					    struct file *bdev_file);
>  
>  int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
>  						    u64 *lowest_vcn, u64 *highest_vcn,
> diff --git a/drivers/md/dm-vdo/vdo.c b/drivers/md/dm-vdo/vdo.c
> index fff847767755..eca9f8b51535 100644
> --- a/drivers/md/dm-vdo/vdo.c
> +++ b/drivers/md/dm-vdo/vdo.c
> @@ -809,7 +809,7 @@ void vdo_load_super_block(struct vdo *vdo, struct vdo_completion *parent)
>   */
>  struct block_device *vdo_get_backing_device(const struct vdo *vdo)
>  {
> -	return vdo->device_config->owned_device->bdev;
> +	return file_bdev(vdo->device_config->owned_device->bdev_file);
>  }
>  
>  /**
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-09  6:22                   ` Yu Kuai
@ 2024-04-10 10:59                     ` Jan Kara
  2024-04-10 22:34                       ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Jan Kara @ 2024-04-10 10:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Al Viro, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Tue 09-04-24 14:22:37, Yu Kuai wrote:
> Hi,
> 
> On 2024/04/09 12:26, Al Viro wrote:
> > On Sun, Apr 07, 2024 at 11:21:56AM +0800, Yu Kuai wrote:
> > > Hi,
> > > 
> > > On 2024/04/07 11:06, Al Viro wrote:
> > > > On Sun, Apr 07, 2024 at 10:34:56AM +0800, Yu Kuai wrote:
> > > > 
> > > > > Other than raw block_device fops, other filesystems can use the opened
> > > > > bdev_file directly for iomap and buffer_head, and they actually don't
> > > > > need to reference block_device anymore. The point here is that whether
> > > > 
> > > > What do you mean, "reference"?  The counting reference is to opened
> > > > file; ->s_bdev is a cached pointer to associated struct block_device,
> > > > and neither it nor pointers in buffer_head are valid past the moment
> > > > when you close the file.  Storing (non-counting) pointers to struct
> > > > file in struct buffer_head is not different in that respect - they
> > > > are *still* only valid while the "master" reference is held.
> > > > 
> > > > Again, what's the point of storing struct file * in struct buffer_head
> > > > or struct iomap?  In any instances of those structures?
> > > 
> > > > Perhaps this is what you missed: as the title of this set says, in
> > > > order to remove direct access of bdev->bd_inode from fs/buffer, we must
> > > > store bdev_file in buffer_head and iomap, and 'bdev->bd_inode' is
> > > > replaced with 'file_inode(bdev_file)' now.
> > 
> > BTW, what does that have to do with iomap?  All it passes ->bdev to is
> > 	1) bio_alloc()
> > 	2) bio_alloc_bioset()
> > 	3) bio_init()
> > 	4) bdev_logical_block_size()
> > 	5) bdev_iter_is_aligned()
> > 	6) bdev_fua()
> > 	7) bdev_write_cache()
> > 
> > None of those goes anywhere near fs/buffer.c or uses ->bd_inode, AFAICS.
> > 
> > Again, what's the point?  It feels like you are trying to replace *all*
> > uses of struct block_device with struct file, just because.
> > 
> > If that's what's going on, please don't.  Using struct file instead
> > of that bdev_handle crap - sure, makes perfect sense.  But shoving it
> > down into struct bio really, really does not.
> > 
> > I'd suggest to start with adding ->bd_mapping as the first step and
> > converting the places where mapping is all we want to using that.
> > Right at the beginning of your series.  Then let's see what gets
> > left.
> 
> Thanks so much for your advice. In fact, I totally agree with adding a
> 'bd_mapping' or exposing the helper bdev_mapping().
> 
> However, I will let Christoph and Jan make the decision when they get
> time to take a look at this.

I agree with Christian and Al - and I think I've expressed that already in
the previous version of the series [1] but I guess I was not explicit
enough :). I think the initial part of the series (up to patch 21, perhaps
excluding patch 20) is a nice cleanup but the latter part playing with
stashing struct file is not an improvement and seems pointless to me. So
I'd separate the initial part cleaning up the obvious places and let
Christian merge it and then we can figure out what (if anything) to do with
remaining bd_inode uses in fs/buffer.c etc. E.g. what Al suggests with
bd_mapping makes sense to me but I didn't check what's left after your
initial patches...

								Honza

[1] https://lore.kernel.org/all/20240322125750.jov4f3alsrkmqnq7@quack3

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-06  9:09 ` [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file Yu Kuai
  2024-04-10 10:56   ` Jan Kara
@ 2024-04-10 17:26   ` Matthew Sakai
  2024-04-10 17:40     ` Al Viro
  1 sibling, 1 reply; 116+ messages in thread
From: Matthew Sakai @ 2024-04-10 17:26 UTC (permalink / raw)
  To: Yu Kuai, jack, hch, brauner, viro, axboe
  Cc: linux-fsdevel, linux-block, yi.zhang, yangerkun, yukuai3, dm-devel

+dm-devel

On 4/6/24 05:09, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Now that dm upper layer already statsh the file of opened device in

                                  ^ stashes

> 'dm_dev->bdev_file', it's ok to get inode from the file.
> There are no functional changes, prepare to remove 'bd_inode' from
> block_device.
> 
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   drivers/md/dm-vdo/dedupe.c                |  7 ++++---
>   drivers/md/dm-vdo/dm-vdo-target.c         |  9 +++++++--
>   drivers/md/dm-vdo/indexer/config.c        |  2 +-
>   drivers/md/dm-vdo/indexer/config.h        |  4 ++--
>   drivers/md/dm-vdo/indexer/index-layout.c  |  6 +++---
>   drivers/md/dm-vdo/indexer/index-layout.h  |  2 +-
>   drivers/md/dm-vdo/indexer/index-session.c | 18 ++++++++++--------
>   drivers/md/dm-vdo/indexer/index.c         |  4 ++--
>   drivers/md/dm-vdo/indexer/index.h         |  2 +-
>   drivers/md/dm-vdo/indexer/indexer.h       |  6 +++---
>   drivers/md/dm-vdo/indexer/io-factory.c    | 17 +++++++++--------
>   drivers/md/dm-vdo/indexer/io-factory.h    |  4 ++--
>   drivers/md/dm-vdo/indexer/volume.c        |  4 ++--
>   drivers/md/dm-vdo/indexer/volume.h        |  2 +-
>   drivers/md/dm-vdo/vdo.c                   |  2 +-
>   15 files changed, 49 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/md/dm-vdo/dedupe.c b/drivers/md/dm-vdo/dedupe.c
> index 117266e1b3ae..0e311989247e 100644
> --- a/drivers/md/dm-vdo/dedupe.c
> +++ b/drivers/md/dm-vdo/dedupe.c
> @@ -2191,7 +2191,7 @@ static int initialize_index(struct vdo *vdo, struct hash_zones *zones)
>   	uds_offset = ((vdo_get_index_region_start(geometry) -
>   		       geometry.bio_offset) * VDO_BLOCK_SIZE);
>   	zones->parameters = (struct uds_parameters) {
> -		.bdev = vdo->device_config->owned_device->bdev,
> +		.bdev_file = vdo->device_config->owned_device->bdev_file,
>   		.offset = uds_offset,
>   		.size = (vdo_get_index_region_size(geometry) * VDO_BLOCK_SIZE),
>   		.memory_size = geometry.index_config.mem,
> @@ -2582,8 +2582,9 @@ static void resume_index(void *context, struct vdo_completion *parent)
>   	struct device_config *config = parent->vdo->device_config;
>   	int result;
>   
> -	zones->parameters.bdev = config->owned_device->bdev;
> -	result = uds_resume_index_session(zones->index_session, zones->parameters.bdev);
> +	zones->parameters.bdev_file = config->owned_device->bdev_file;
> +	result = uds_resume_index_session(zones->index_session,
> +					  zones->parameters.bdev_file);
>   	if (result != UDS_SUCCESS)
>   		vdo_log_error_strerror(result, "Error resuming dedupe index");
>   
> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
> index 5a4b0a927f56..79e861c2887c 100644
> --- a/drivers/md/dm-vdo/dm-vdo-target.c
> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
> @@ -696,6 +696,11 @@ static void handle_parse_error(struct device_config *config, char **error_ptr,
>   	*error_ptr = error_str;
>   }
>   
> +static loff_t vdo_get_device_size(const struct device_config *config)
> +{
> +	return i_size_read(file_inode(config->owned_device->bdev_file));
> +}
> +
>   /**
>    * parse_device_config() - Convert the dmsetup table into a struct device_config.
>    * @argc: The number of table values.
> @@ -878,7 +883,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
>   	}
>   
>   	if (config->version == 0) {
> -		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
> +		u64 device_size = vdo_get_device_size(config);
>   
>   		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>   	}
> @@ -1011,7 +1016,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
>   
>   static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
>   {
> -	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
> +	return vdo_get_device_size(vdo->device_config) / VDO_BLOCK_SIZE;
>   }
>   
>   static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
> diff --git a/drivers/md/dm-vdo/indexer/config.c b/drivers/md/dm-vdo/indexer/config.c
> index 5532371b952f..dcf0742a6145 100644
> --- a/drivers/md/dm-vdo/indexer/config.c
> +++ b/drivers/md/dm-vdo/indexer/config.c
> @@ -344,7 +344,7 @@ int uds_make_configuration(const struct uds_parameters *params,
>   	config->volume_index_mean_delta = DEFAULT_VOLUME_INDEX_MEAN_DELTA;
>   	config->sparse_sample_rate = (params->sparse ? DEFAULT_SPARSE_SAMPLE_RATE : 0);
>   	config->nonce = params->nonce;
> -	config->bdev = params->bdev;
> +	config->bdev_file = params->bdev_file;
>   	config->offset = params->offset;
>   	config->size = params->size;
>   
> diff --git a/drivers/md/dm-vdo/indexer/config.h b/drivers/md/dm-vdo/indexer/config.h
> index 08507dc2f7a1..8ba0cf72dec9 100644
> --- a/drivers/md/dm-vdo/indexer/config.h
> +++ b/drivers/md/dm-vdo/indexer/config.h
> @@ -25,8 +25,8 @@ enum {
>   
>   /* A set of configuration parameters for the indexer. */
>   struct uds_configuration {
> -	/* Storage device for the index */
> -	struct block_device *bdev;
> +	/* File of opened storage device for the index */
> +	struct file *bdev_file;
>   
>   	/* The maximum allowable size of the index */
>   	size_t size;
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.c b/drivers/md/dm-vdo/indexer/index-layout.c
> index 627adc24af3b..32eee76bc246 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.c
> +++ b/drivers/md/dm-vdo/indexer/index-layout.c
> @@ -1668,7 +1668,7 @@ static int create_layout_factory(struct index_layout *layout,
>   	size_t writable_size;
>   	struct io_factory *factory = NULL;
>   
> -	result = uds_make_io_factory(config->bdev, &factory);
> +	result = uds_make_io_factory(config->bdev_file, &factory);
>   	if (result != UDS_SUCCESS)
>   		return result;
>   
> @@ -1741,9 +1741,9 @@ void uds_free_index_layout(struct index_layout *layout)
>   }
>   
>   int uds_replace_index_layout_storage(struct index_layout *layout,
> -				     struct block_device *bdev)
> +				     struct file *bdev_file)
>   {
> -	return uds_replace_storage(layout->factory, bdev);
> +	return uds_replace_storage(layout->factory, bdev_file);
>   }
>   
>   /* Obtain a dm_bufio_client for the volume region. */
> diff --git a/drivers/md/dm-vdo/indexer/index-layout.h b/drivers/md/dm-vdo/indexer/index-layout.h
> index e9ac6f4302d6..28f9be577631 100644
> --- a/drivers/md/dm-vdo/indexer/index-layout.h
> +++ b/drivers/md/dm-vdo/indexer/index-layout.h
> @@ -24,7 +24,7 @@ int __must_check uds_make_index_layout(struct uds_configuration *config, bool ne
>   void uds_free_index_layout(struct index_layout *layout);
>   
>   int __must_check uds_replace_index_layout_storage(struct index_layout *layout,
> -						  struct block_device *bdev);
> +						  struct file *bdev_file);
>   
>   int __must_check uds_load_index_state(struct index_layout *layout,
>   				      struct uds_index *index);
> diff --git a/drivers/md/dm-vdo/indexer/index-session.c b/drivers/md/dm-vdo/indexer/index-session.c
> index aee0914d604a..914abf5e006b 100644
> --- a/drivers/md/dm-vdo/indexer/index-session.c
> +++ b/drivers/md/dm-vdo/indexer/index-session.c
> @@ -335,7 +335,7 @@ int uds_open_index(enum uds_open_index_type open_type,
>   		vdo_log_error("missing required parameters");
>   		return -EINVAL;
>   	}
> -	if (parameters->bdev == NULL) {
> +	if (parameters->bdev_file == NULL) {
>   		vdo_log_error("missing required block device");
>   		return -EINVAL;
>   	}
> @@ -349,7 +349,7 @@ int uds_open_index(enum uds_open_index_type open_type,
>   		return uds_status_to_errno(result);
>   
>   	session->parameters = *parameters;
> -	format_dev_t(name, parameters->bdev->bd_dev);
> +	format_dev_t(name, file_bdev(parameters->bdev_file)->bd_dev);
>   	vdo_log_info("%s: %s", get_open_type_string(open_type), name);
>   
>   	result = initialize_index_session(session, open_type);
> @@ -460,15 +460,16 @@ int uds_suspend_index_session(struct uds_index_session *session, bool save)
>   	return uds_status_to_errno(result);
>   }
>   
> -static int replace_device(struct uds_index_session *session, struct block_device *bdev)
> +static int replace_device(struct uds_index_session *session,
> +			  struct file *bdev_file)
>   {
>   	int result;
>   
> -	result = uds_replace_index_storage(session->index, bdev);
> +	result = uds_replace_index_storage(session->index, bdev_file);
>   	if (result != UDS_SUCCESS)
>   		return result;
>   
> -	session->parameters.bdev = bdev;
> +	session->parameters.bdev_file = bdev_file;
>   	return UDS_SUCCESS;
>   }
>   
> @@ -477,7 +478,7 @@ static int replace_device(struct uds_index_session *session, struct block_device
>    * device differs from the current backing store, the index will start using the new backing store.
>    */
>   int uds_resume_index_session(struct uds_index_session *session,
> -			     struct block_device *bdev)
> +			     struct file *bdev_file)
>   {
>   	int result = UDS_SUCCESS;
>   	bool no_work = false;
> @@ -502,8 +503,9 @@ int uds_resume_index_session(struct uds_index_session *session,
>   	if (no_work)
>   		return result;
>   
> -	if ((session->index != NULL) && (bdev != session->parameters.bdev)) {
> -		result = replace_device(session, bdev);
> +	if (session->index != NULL &&
> +	    bdev_file != session->parameters.bdev_file) {
> +		result = replace_device(session, bdev_file);
>   		if (result != UDS_SUCCESS) {
>   			mutex_lock(&session->request_mutex);
>   			session->state &= ~IS_FLAG_WAITING;
> diff --git a/drivers/md/dm-vdo/indexer/index.c b/drivers/md/dm-vdo/indexer/index.c
> index 1ba767144426..48b16275a067 100644
> --- a/drivers/md/dm-vdo/indexer/index.c
> +++ b/drivers/md/dm-vdo/indexer/index.c
> @@ -1336,9 +1336,9 @@ int uds_save_index(struct uds_index *index)
>   	return result;
>   }
>   
> -int uds_replace_index_storage(struct uds_index *index, struct block_device *bdev)
> +int uds_replace_index_storage(struct uds_index *index, struct file *bdev_file)
>   {
> -	return uds_replace_volume_storage(index->volume, index->layout, bdev);
> +	return uds_replace_volume_storage(index->volume, index->layout, bdev_file);
>   }
>   
>   /* Accessing statistics should be safe from any thread. */
> diff --git a/drivers/md/dm-vdo/indexer/index.h b/drivers/md/dm-vdo/indexer/index.h
> index edabb239548e..6e2e203f43f7 100644
> --- a/drivers/md/dm-vdo/indexer/index.h
> +++ b/drivers/md/dm-vdo/indexer/index.h
> @@ -72,7 +72,7 @@ int __must_check uds_save_index(struct uds_index *index);
>   void uds_free_index(struct uds_index *index);
>   
>   int __must_check uds_replace_index_storage(struct uds_index *index,
> -					   struct block_device *bdev);
> +					   struct file *bdev_file);
>   
>   void uds_get_index_stats(struct uds_index *index, struct uds_index_stats *counters);
>   
> diff --git a/drivers/md/dm-vdo/indexer/indexer.h b/drivers/md/dm-vdo/indexer/indexer.h
> index 3744aaf625b0..246ff2810e01 100644
> --- a/drivers/md/dm-vdo/indexer/indexer.h
> +++ b/drivers/md/dm-vdo/indexer/indexer.h
> @@ -128,8 +128,8 @@ struct uds_volume_record {
>   };
>   
>   struct uds_parameters {
> -	/* The block_device used for storage */
> -	struct block_device *bdev;
> +	/* The bdev_file used for storage */
> +	struct file *bdev_file;
>   	/* The maximum allowable size of the index on storage */
>   	size_t size;
>   	/* The offset where the index should start */
> @@ -314,7 +314,7 @@ int __must_check uds_suspend_index_session(struct uds_index_session *session, bo
>    * start using the new backing store instead.
>    */
>   int __must_check uds_resume_index_session(struct uds_index_session *session,
> -					  struct block_device *bdev);
> +					  struct file *bdev_file);
>   
>   /* Wait until all outstanding index operations are complete. */
>   int __must_check uds_flush_index_session(struct uds_index_session *session);
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
> index 515765d35794..f4dedb7b7f40 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.c
> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
> @@ -22,7 +22,7 @@
>    * make helper structures that can be used to access sections of the index.
>    */
>   struct io_factory {
> -	struct block_device *bdev;
> +	struct file *bdev_file;
>   	atomic_t ref_count;
>   };
>   
> @@ -59,7 +59,7 @@ static void uds_get_io_factory(struct io_factory *factory)
>   	atomic_inc(&factory->ref_count);
>   }
>   
> -int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_ptr)
> +int uds_make_io_factory(struct file *bdev_file, struct io_factory **factory_ptr)
>   {
>   	int result;
>   	struct io_factory *factory;
> @@ -68,16 +68,16 @@ int uds_make_io_factory(struct block_device *bdev, struct io_factory **factory_p
>   	if (result != VDO_SUCCESS)
>   		return result;
>   
> -	factory->bdev = bdev;
> +	factory->bdev_file = bdev_file;
>   	atomic_set_release(&factory->ref_count, 1);
>   
>   	*factory_ptr = factory;
>   	return UDS_SUCCESS;
>   }
>   
> -int uds_replace_storage(struct io_factory *factory, struct block_device *bdev)
> +int uds_replace_storage(struct io_factory *factory, struct file *bdev_file)
>   {
> -	factory->bdev = bdev;
> +	factory->bdev_file = bdev_file;
>   	return UDS_SUCCESS;
>   }
>   
> @@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
>   
>   size_t uds_get_writable_size(struct io_factory *factory)
>   {
> -	return i_size_read(factory->bdev->bd_inode);
> +	return i_size_read(file_inode(factory->bdev_file));
>   }
>   
>   /* Create a struct dm_bufio_client for an index region starting at offset. */
> @@ -99,8 +99,9 @@ int uds_make_bufio(struct io_factory *factory, off_t block_offset, size_t block_
>   {
>   	struct dm_bufio_client *client;
>   
> -	client = dm_bufio_client_create(factory->bdev, block_size, reserved_buffers, 0,
> -					NULL, NULL, 0);
> +	client = dm_bufio_client_create(file_bdev(factory->bdev_file),
> +					block_size, reserved_buffers,
> +					0, NULL, NULL, 0);
>   	if (IS_ERR(client))
>   		return -PTR_ERR(client);
>   
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.h b/drivers/md/dm-vdo/indexer/io-factory.h
> index 7fb5a0616a79..a3ca84d62f2d 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.h
> +++ b/drivers/md/dm-vdo/indexer/io-factory.h
> @@ -24,11 +24,11 @@ enum {
>   	SECTORS_PER_BLOCK = UDS_BLOCK_SIZE >> SECTOR_SHIFT,
>   };
>   
> -int __must_check uds_make_io_factory(struct block_device *bdev,
> +int __must_check uds_make_io_factory(struct file *bdev_file,
>   				     struct io_factory **factory_ptr);
>   
>   int __must_check uds_replace_storage(struct io_factory *factory,
> -				     struct block_device *bdev);
> +				     struct file *bdev_file);
>   
>   void uds_put_io_factory(struct io_factory *factory);
>   
> diff --git a/drivers/md/dm-vdo/indexer/volume.c b/drivers/md/dm-vdo/indexer/volume.c
> index 655453bb276b..edbe46252657 100644
> --- a/drivers/md/dm-vdo/indexer/volume.c
> +++ b/drivers/md/dm-vdo/indexer/volume.c
> @@ -1465,12 +1465,12 @@ int uds_find_volume_chapter_boundaries(struct volume *volume, u64 *lowest_vcn,
>   
>   int __must_check uds_replace_volume_storage(struct volume *volume,
>   					    struct index_layout *layout,
> -					    struct block_device *bdev)
> +					    struct file *bdev_file)
>   {
>   	int result;
>   	u32 i;
>   
> -	result = uds_replace_index_layout_storage(layout, bdev);
> +	result = uds_replace_index_layout_storage(layout, bdev_file);
>   	if (result != UDS_SUCCESS)
>   		return result;
>   
> diff --git a/drivers/md/dm-vdo/indexer/volume.h b/drivers/md/dm-vdo/indexer/volume.h
> index 8679a5e55347..1dc3561b8b43 100644
> --- a/drivers/md/dm-vdo/indexer/volume.h
> +++ b/drivers/md/dm-vdo/indexer/volume.h
> @@ -130,7 +130,7 @@ void uds_free_volume(struct volume *volume);
>   
>   int __must_check uds_replace_volume_storage(struct volume *volume,
>   					    struct index_layout *layout,
> -					    struct block_device *bdev);
> +					    struct file *bdev_file);
>   
>   int __must_check uds_find_volume_chapter_boundaries(struct volume *volume,
>   						    u64 *lowest_vcn, u64 *highest_vcn,
> diff --git a/drivers/md/dm-vdo/vdo.c b/drivers/md/dm-vdo/vdo.c
> index fff847767755..eca9f8b51535 100644
> --- a/drivers/md/dm-vdo/vdo.c
> +++ b/drivers/md/dm-vdo/vdo.c
> @@ -809,7 +809,7 @@ void vdo_load_super_block(struct vdo *vdo, struct vdo_completion *parent)
>    */
>   struct block_device *vdo_get_backing_device(const struct vdo *vdo)
>   {
> -	return vdo->device_config->owned_device->bdev;
> +	return file_bdev(vdo->device_config->owned_device->bdev_file);
>   }
>   
>   /**



* Re: [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-10 17:26   ` Matthew Sakai
@ 2024-04-10 17:40     ` Al Viro
  2024-04-10 18:59       ` Matthew Sakai
  2024-04-11 11:12       ` Christian Brauner
  0 siblings, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-10 17:40 UTC (permalink / raw)
  To: Matthew Sakai
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3, dm-devel

On Wed, Apr 10, 2024 at 01:26:47PM -0400, Matthew Sakai wrote:

> > 'dm_dev->bdev_file', it's ok to get inode from the file.

It can be done much easier, though -

[PATCH] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)

going to be faster, actually - shift is cheaper than dereference...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
index 5a4b0a927f56..b423bec6458b 100644
--- a/drivers/md/dm-vdo/dm-vdo-target.c
+++ b/drivers/md/dm-vdo/dm-vdo-target.c
@@ -878,7 +878,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
 	}
 
 	if (config->version == 0) {
-		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
+		u64 device_size = bdev_nr_bytes(config->owned_device->bdev);
 
 		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
 	}
@@ -1011,7 +1011,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
 
 static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
 {
-	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
+	return bdev_nr_bytes(vdo_get_backing_device(vdo)) / VDO_BLOCK_SIZE;
 }
 
 static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
index 515765d35794..1bee9d63dc0a 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.c
+++ b/drivers/md/dm-vdo/indexer/io-factory.c
@@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
 
 size_t uds_get_writable_size(struct io_factory *factory)
 {
-	return i_size_read(factory->bdev->bd_inode);
+	return bdev_nr_bytes(factory->bdev);
 }
 
 /* Create a struct dm_bufio_client for an index region starting at offset. */
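
The equivalence this patch relies on can be sketched in user space. This is
only a minimal model, not kernel code: struct mock_inode, struct mock_bdev
and the mock_* helpers are hypothetical stand-ins, and the sketch assumes
that the block layer keeps the inode size equal to the sector count shifted
left by SECTOR_SHIFT (9).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the kernel */

/* Hypothetical stand-in for struct inode: only the size field. */
struct mock_inode {
	int64_t i_size;
};

/* Hypothetical stand-in for struct block_device. */
struct mock_bdev {
	uint64_t bd_nr_sectors;
	struct mock_inode *bd_inode;
};

/* Models a capacity update: both copies of the size are kept in sync. */
static void mock_bdev_set_nr_sectors(struct mock_bdev *bdev, uint64_t sectors)
{
	/* The byte count must still fit in a signed 64-bit size. */
	assert(sectors <= ((uint64_t)INT64_MAX >> SECTOR_SHIFT));
	bdev->bd_inode->i_size = (int64_t)(sectors << SECTOR_SHIFT);
	bdev->bd_nr_sectors = sectors;
}

/* Old style: follow bd_inode, then read the size from the inode. */
static int64_t mock_i_size_read(const struct mock_inode *inode)
{
	return inode->i_size;
}

/* New style: one field read from the bdev itself, plus a shift. */
static int64_t mock_bdev_nr_bytes(const struct mock_bdev *bdev)
{
	return (int64_t)(bdev->bd_nr_sectors << SECTOR_SHIFT);
}
```

Under that assumption both accessors return the same byte count, so callers
that only want the device size have no need to touch the inode.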


* Re: [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-10 17:40     ` Al Viro
@ 2024-04-10 18:59       ` Matthew Sakai
  2024-04-11 11:12       ` Christian Brauner
  1 sibling, 0 replies; 116+ messages in thread
From: Matthew Sakai @ 2024-04-10 18:59 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3, dm-devel



On 4/10/24 13:40, Al Viro wrote:
> On Wed, Apr 10, 2024 at 01:26:47PM -0400, Matthew Sakai wrote:
> 
>>> 'dm_dev->bdev_file', it's ok to get inode from the file.
> 
> It can be done much easier, though -
> 
> [PATCH] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
> 
> going to be faster, actually - shift is cheaper than dereference...

This does look simpler. And doing this means there's no reason to switch 
dm-vdo from using struct block_device * to using struct file *, so the 
rest of the original patch is unnecessary.

Reviewed-by: Matthew Sakai <msakai@redhat.com>

> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
> index 5a4b0a927f56..b423bec6458b 100644
> --- a/drivers/md/dm-vdo/dm-vdo-target.c
> +++ b/drivers/md/dm-vdo/dm-vdo-target.c
> @@ -878,7 +878,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
>   	}
>   
>   	if (config->version == 0) {
> -		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
> +		u64 device_size = bdev_nr_bytes(config->owned_device->bdev);
>   
>   		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
>   	}
> @@ -1011,7 +1011,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
>   
>   static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
>   {
> -	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
> +	return bdev_nr_bytes(vdo_get_backing_device(vdo)) / VDO_BLOCK_SIZE;
>   }
>   
>   static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
> diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
> index 515765d35794..1bee9d63dc0a 100644
> --- a/drivers/md/dm-vdo/indexer/io-factory.c
> +++ b/drivers/md/dm-vdo/indexer/io-factory.c
> @@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
>   
>   size_t uds_get_writable_size(struct io_factory *factory)
>   {
> -	return i_size_read(factory->bdev->bd_inode);
> +	return bdev_nr_bytes(factory->bdev);
>   }
>   
>   /* Create a struct dm_bufio_client for an index region starting at offset. */
> 



* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-10 10:59                     ` Jan Kara
@ 2024-04-10 22:34                       ` Al Viro
  2024-04-11 11:56                         ` Christian Brauner
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-10 22:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Wed, Apr 10, 2024 at 12:59:11PM +0200, Jan Kara wrote:

> I agree with Christian and Al - and I think I've expressed that already in
> the previous version of the series [1] but I guess I was not explicit
> enough :). I think the initial part of the series (up to patch 21, perhaps
> excluding patch 20) is a nice cleanup but the latter part playing with
> stashing struct file is not an improvement and seems pointless to me. So
> I'd separate the initial part cleaning up the obvious places and let
> Christian merge it and then we can figure out what (if anything) to do with
> remaining bd_inode uses in fs/buffer.c etc. E.g. what Al suggests with
> bd_mapping makes sense to me but I didn't check what's left after your
> initial patches...

FWIW, experimental on top of -next:
Al Viro (7):
      block_device: add a pointer to struct address_space (page cache of bdev)
      use ->bd_mapping instead of ->bd_inode->i_mapping
      grow_dev_folio(): we only want ->bd_inode->i_mapping there
      gfs2: more obvious initializations of mapping->host
      blkdev_write_iter(): saner way to get inode and bdev
      blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
      dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)

Yu Kuai (4):
      ext4: remove block_device_ejected()
      block: move two helpers into bdev.c
      bcachefs: remove dead function bdev_sectors()
      block2mtd: prevent direct access of bd_inode	[slightly modified]

leaves only this:
block/bdev.c:60:        struct inode *inode = bdev->bd_inode;
block/bdev.c:137:       loff_t size = i_size_read(bdev->bd_inode);
block/bdev.c:144:       bdev->bd_inode->i_blkbits = blksize_bits(bsize);
block/bdev.c:158:       if (bdev->bd_inode->i_blkbits != blksize_bits(size)) {
block/bdev.c:160:               bdev->bd_inode->i_blkbits = blksize_bits(size);
block/bdev.c:415:       bdev->bd_inode = inode;
block/bdev.c:434:       i_size_write(bdev->bd_inode, (loff_t)sectors << SECTOR_SHIFT);
block/bdev.c:444:       bdev->bd_inode->i_rdev = dev;
block/bdev.c:445:       bdev->bd_inode->i_ino = dev;
block/bdev.c:446:       insert_inode_hash(bdev->bd_inode);
block/bdev.c:974:       bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
block/bdev.c:980:       ihold(bdev->bd_inode);
block/bdev.c:1257:      return !inode_unhashed(disk->part0->bd_inode);
block/bdev.c:1263:      return 1 << bdev->bd_inode->i_blkbits;
block/genhd.c:659:              remove_inode_hash(part->bd_inode);
block/genhd.c:1194:     iput(disk->part0->bd_inode);    /* frees the disk */
block/genhd.c:1384:     iput(disk->part0->bd_inode);
block/partitions/core.c:246:    iput(dev_to_bdev(dev)->bd_inode);
block/partitions/core.c:472:    remove_inode_hash(part->bd_inode);
block/partitions/core.c:658:            remove_inode_hash(part->bd_inode);
drivers/s390/block/dasd_ioctl.c:218:            block->gdp->part0->bd_inode->i_blkbits =
fs/buffer.c:192:        struct inode *bd_inode = bdev->bd_inode;
fs/buffer.c:1699:       struct inode *bd_inode = bdev->bd_inode;
fs/erofs/data.c:73:             buf->inode = sb->s_bdev->bd_inode;
fs/nilfs2/segment.c:2793:       inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);

I've got erofs patches that get rid of that instance; bdev.c is obviously privileged
since it sees coallocated inode directly.  Other than those we have
	* 3 callers of remove_inode_hash()
	* 3 callers of iput()
	* one caller of inode_attach_wb() (nilfs2)
	* weird shit in DASD (redundant, that; incidentally, I don't see anything
	  that might prevent DASD format requested with mounted partitions on that
	  disk - and won't that be fun and joy for an admin to step into...)
	* two places in fs/buffer.c that want to convert block numbers to positions
	  in bytes.  Either the function itself or its caller has the block size
	  as argument; replacing that with passing the block _shift_ instead of the size
	  would reduce those two to ->bd_mapping.
And that's it.  iput() and remove_inode_hash() are obvious candidates for
helpers (internal to block/*; no exporting those, it's private to bdev.c,
genhd.c and partitions/core.c).

fs/buffer.c ones need a bit more code audit (not quite done with that), but
it looks at least plausible.  Which would leave us with whatever nilfs2 is
doing and that weirdness in dasd_format() (why set ->i_blkbits but not
->i_blocksize?  why not use set_blocksize(), for that matter?  where the
hell is check for exclusive open?)
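
For illustration only (not part of the original message): a minimal userspace
sketch of the block-shift idea for the fs/buffer.c cases, assuming a
power-of-two block size.  block_shift() and block_to_pos() are hypothetical
names, not kernel helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-two block size (4096 -> 12). */
static unsigned int block_shift(unsigned int size)
{
	unsigned int shift = 0;

	assert(size != 0 && (size & (size - 1)) == 0);	/* power of two only */
	while ((1u << shift) < size)
		shift++;
	return shift;
}

/* Hypothetical helper: byte position of a block, given only the shift. */
static int64_t block_to_pos(uint64_t block, unsigned int shift)
{
	return (int64_t)(block << shift);
}
```

With the shift passed down by the caller, the conversion no longer has to
read i_blkbits from the inode, so the only thing such a function would still
need from the block device is its page cache.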


* Re: (subset) [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path
  2024-04-06  9:09 ` [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path Yu Kuai
@ 2024-04-11  9:16   ` Christian Brauner
  0 siblings, 0 replies; 116+ messages in thread
From: Christian Brauner @ 2024-04-11  9:16 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christian Brauner, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3, jack, hch, viro, axboe

On Sat, 06 Apr 2024 17:09:25 +0800, Yu Kuai wrote:
> At the time bdev_may_open() is called, module reference is grabbed
> already, hence module reference should be released if bdev_may_open()
> failed.
> 
> This problem is found by code review.
> 
> 
> [...]

Bugfix for current code that should go separately.

---

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[21/26] block: fix module reference leakage from bdev_open_by_dev error path
        https://git.kernel.org/vfs/vfs/c/9617cd6f24b2


* Re: [PATCH vfs.all 04/26] block: prevent direct access of bd_inode
  2024-04-07  2:37     ` Yu Kuai
@ 2024-04-11 11:12       ` Christian Brauner
  0 siblings, 0 replies; 116+ messages in thread
From: Christian Brauner @ 2024-04-11 11:12 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Al Viro, jack, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 10:37:08AM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2024/04/07 10:22, Al Viro wrote:
> > On Sat, Apr 06, 2024 at 05:09:08PM +0800, Yu Kuai wrote:
> > > @@ -669,7 +669,7 @@ static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
> > >   {
> > >   	struct file *file = iocb->ki_filp;
> > >   	struct block_device *bdev = I_BDEV(file->f_mapping->host);
> > > -	struct inode *bd_inode = bdev->bd_inode;
> > > +	struct inode *bd_inode = bdev_inode(bdev);
> > 
> > What you want here is this:
> > 
> > 	struct inode *bd_inode = file->f_mapping->host;
> > 	struct block_device *bdev = I_BDEV(bd_inode);
> 
> Yes, this way is better, logically.
> > 
> > 
> > > --- a/block/ioctl.c
> > > +++ b/block/ioctl.c
> > > @@ -97,7 +97,7 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
> > >   {
> > >   	uint64_t range[2];
> > >   	uint64_t start, len;
> > > -	struct inode *inode = bdev->bd_inode;
> > > +	struct inode *inode = bdev_inode(bdev);
> > >   	int err;
> > 
> > The uses of 'inode' in this function are
> >          filemap_invalidate_lock(inode->i_mapping);
> > and
> >          filemap_invalidate_unlock(inode->i_mapping);
> > 
> > IOW, you want bdev_mapping(bdev), not bdev_inode(bdev).
> > 
> > > @@ -166,7 +166,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
> > >   {
> > >   	uint64_t range[2];
> > >   	uint64_t start, end, len;
> > > -	struct inode *inode = bdev->bd_inode;
> > > +	struct inode *inode = bdev_inode(bdev);
> > 
> > Same story.
> 
> Yes.

I've folded in those changes while applying.
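
For illustration only (not part of the original message): a userspace sketch
of why the suggested order works in blkdev_write_iter().  The block device
and its inode are coallocated, so going from the inode back to the bdev is
pointer arithmetic rather than a lookup.  All structures here are simplified
stand-ins, and mock_I_BDEV() and mock_init() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct inode;
struct address_space { struct inode *host; };
struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};
struct block_device { int bd_partno; };
struct file { struct address_space *f_mapping; };

/* Block device and its inode live in one allocation. */
struct bdev_inode {
	struct block_device bdev;
	struct inode vfs_inode;
};

/* Mock of I_BDEV(): step back from the embedded inode to its container. */
static struct block_device *mock_I_BDEV(struct inode *inode)
{
	struct bdev_inode *bi = (struct bdev_inode *)
		((char *)inode - offsetof(struct bdev_inode, vfs_inode));

	return &bi->bdev;
}

/* Hypothetical setup helper wiring the mock objects together. */
static void mock_init(struct bdev_inode *bi, struct file *file)
{
	bi->vfs_inode.i_data.host = &bi->vfs_inode;
	bi->vfs_inode.i_mapping = &bi->vfs_inode.i_data;
	file->f_mapping = bi->vfs_inode.i_mapping;
}
```

Starting from file->f_mapping->host gives the inode first and the bdev
second, so nothing has to go through bdev->bd_inode at all.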


* Re: [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file
  2024-04-10 17:40     ` Al Viro
  2024-04-10 18:59       ` Matthew Sakai
@ 2024-04-11 11:12       ` Christian Brauner
  1 sibling, 0 replies; 116+ messages in thread
From: Christian Brauner @ 2024-04-11 11:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Matthew Sakai, Yu Kuai, jack, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, dm-devel

On Wed, Apr 10, 2024 at 06:40:22PM +0100, Al Viro wrote:
> On Wed, Apr 10, 2024 at 01:26:47PM -0400, Matthew Sakai wrote:
> 
> > > 'dm_dev->bdev_file', it's ok to get inode from the file.
> 
> It can be done much easier, though -
> 
> [PATCH] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
> 
> going to be faster, actually - shift is cheaper than dereference...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

I've used that patch instead of the original one.
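
For illustration only (not part of the original message): a userspace sketch
of the two ways to get the device size in bytes.  The mock structures and
function names are hypothetical; the only assumption carried over from the
kernel is that the inode size and the sector count are kept in sync, with
512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_SECTOR_SHIFT 9	/* 512-byte sectors, like the kernel's SECTOR_SHIFT */

struct mock_inode { int64_t i_size; };
struct mock_bdev {
	uint64_t bd_nr_sectors;
	struct mock_inode *bd_inode;
};

/* Size via the inode: one extra pointer dereference. */
static int64_t size_via_inode(const struct mock_bdev *bdev)
{
	return bdev->bd_inode->i_size;
}

/* Size via the sector count: a load from the bdev itself plus a shift. */
static int64_t size_via_sectors(const struct mock_bdev *bdev)
{
	return (int64_t)(bdev->bd_nr_sectors << MOCK_SECTOR_SHIFT);
}
```

Both return the same value as long as the two fields are updated together;
the second form simply never touches the inode.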


* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-10 22:34                       ` Al Viro
@ 2024-04-11 11:56                         ` Christian Brauner
  2024-04-11 14:04                           ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Christian Brauner @ 2024-04-11 11:56 UTC (permalink / raw)
  To: Al Viro, Jan Kara
  Cc: Yu Kuai, hch, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

On Wed, Apr 10, 2024 at 11:34:43PM +0100, Al Viro wrote:
> On Wed, Apr 10, 2024 at 12:59:11PM +0200, Jan Kara wrote:
> 
> > I agree with Christian and Al - and I think I've expressed that already in
> > the previous version of the series [1] but I guess I was not explicit
> > enough :). I think the initial part of the series (up to patch 21, perhaps
> > excluding patch 20) is a nice cleanup but the latter part playing with
> > stashing struct file is not an improvement and seems pointless to me. So
> > I'd separate the initial part cleaning up the obvious places and let
> > Christian merge it and then we can figure out what (if anything) to do with
> > remaining bd_inode uses in fs/buffer.c etc. E.g. what Al suggests with
> > bd_mapping makes sense to me but I didn't check what's left after your
> > initial patches...
> 
> FWIW, experimental on top of -next:

Ok, let's move forward with this. I've applied the first 19 patches.
Patch 20 is the start of what we all disliked. 21 is clearly a bugfix
for current code so that'll go separately from the rest. I've replaced
open-coded f_mapping accesses with file_mapping(). The symmetry between
file_inode() and file_mapping() is quite nice.

Al, your idea to switch erofs away from buf->inode can go on top of what
Yu did imho. There's no real reason to throw it away imho.

I've exported bdev_mapping() because it really makes the btrfs change a
lot slimmer and we don't need to care about messing with a lot of that
code. I didn't care about making it static inline because that might've
meant we need to move other stuff into the header as well. Imho, it's
not that important but if it's a big deal to any of you just do the
changes on top of it, please.

Pushed to
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.super

If I hear no objections that'll show up in -next tomorrow. Al, would be
nice if you could do your changes on top of this, please.


* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-11 11:56                         ` Christian Brauner
@ 2024-04-11 14:04                           ` Al Viro
  2024-04-11 14:49                             ` Al Viro
  2024-04-12  9:21                             ` Christian Brauner
  0 siblings, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:04 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Thu, Apr 11, 2024 at 01:56:03PM +0200, Christian Brauner wrote:
> On Wed, Apr 10, 2024 at 11:34:43PM +0100, Al Viro wrote:
> > On Wed, Apr 10, 2024 at 12:59:11PM +0200, Jan Kara wrote:
> > 
> > > I agree with Christian and Al - and I think I've expressed that already in
> > > the previous version of the series [1] but I guess I was not explicit
> > > enough :). I think the initial part of the series (up to patch 21, perhaps
> > > excluding patch 20) is a nice cleanup but the latter part playing with
> > > stashing struct file is not an improvement and seems pointless to me. So
> > > I'd separate the initial part cleaning up the obvious places and let
> > > Christian merge it and then we can figure out what (if anything) to do with
> > > remaining bd_inode uses in fs/buffer.c etc. E.g. what Al suggests with
> > > bd_mapping makes sense to me but I didn't check what's left after your
> > > initial patches...
> > 
> > FWIW, experimental on top of -next:
> 
> Ok, let's move forward with this. I've applied the first 19 patches.
> Patch 20 is the start of what we all disliked. 21 is clearly a bugfix
> for current code so that'll go separately from the rest. I've replaced
> open-coded f_mapping accesses with file_mapping(). The symmetry between
> file_inode() and file_mapping() is quite nice.
> 
> Al, your idea to switch erofs away from buf->inode can go on top of what
> Yu did imho. There's no real reason to throw it away imho.
> 
> I've exported bdev_mapping() because it really makes the btrfs change a
> lot slimmer and we don't need to care about messing with a lot of that
> code. I didn't care about making it static inline because that might've
> meant we need to move other stuff into the header as well. Imho, it's
> not that important but if it's a big deal to any of you just do the
> changes on top of it, please.
> 
> Pushed to
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.super
> 
> If I hear no objections that'll show up in -next tomorrow. Al, would be
> nice if you could do your changes on top of this, please.

Objection: start with adding bdev->bd_mapping, next convert the really
obvious instances to it and most of this series becomes not needed at
all.

Really.  There is no need whatsoever to push struct file down all those
paths.

And yes, erofs and buffer.c stuff belongs on top of that, no arguments here.


* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-11 14:04                           ` Al Viro
@ 2024-04-11 14:49                             ` Al Viro
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
  2024-04-12  1:38                               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
  2024-04-12  9:21                             ` Christian Brauner
  1 sibling, 2 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:49 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Thu, Apr 11, 2024 at 03:04:09PM +0100, Al Viro wrote:
> > lot slimmer and we don't need to care about messing with a lot of that
> > code. I didn't care about making it static inline because that might've
> > meant we need to move other stuff into the header as well. Imho, it's
> > not that important but if it's a big deal to any of you just do the
> > changes on top of it, please.
> > 
> > Pushed to
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.super
> > 
> > If I hear no objections that'll show up in -next tomorrow. Al, would be
> > nice if you could do your changes on top of this, please.
> 
> Objection: start with adding bdev->bd_mapping, next convert the really
> obvious instances to it and most of this series becomes not needed at
> all.
> 
> Really.  There is no need whatsoever to push struct file down all those
> paths.
> 
> And yes, erofs and buffer.c stuff belongs on top of that, no arguments here.

FWIW, here's what you get if this is done in such order:

block/bdev.c                           | 31 ++++++++++++++++++++++---------
block/blk-zoned.c                      |  4 ++--
block/fops.c                           |  4 ++--
block/genhd.c                          |  2 +-
block/ioctl.c                          | 14 ++++++--------
block/partitions/core.c                |  2 +-
drivers/md/bcache/super.c              |  2 +-
drivers/md/dm-vdo/dm-vdo-target.c      |  4 ++--
drivers/md/dm-vdo/indexer/io-factory.c |  2 +-
drivers/mtd/devices/block2mtd.c        |  6 ++++--
drivers/scsi/scsicam.c                 |  2 +-
fs/bcachefs/util.h                     |  5 -----
fs/btrfs/disk-io.c                     |  6 +++---
fs/btrfs/volumes.c                     |  2 +-
fs/btrfs/zoned.c                       |  2 +-
fs/buffer.c                            | 10 +++++-----
fs/cramfs/inode.c                      |  2 +-
fs/ext4/dir.c                          |  2 +-
fs/ext4/ext4_jbd2.c                    |  2 +-
fs/ext4/super.c                        | 24 +++---------------------
fs/gfs2/glock.c                        |  2 +-
fs/gfs2/ops_fstype.c                   |  2 +-
fs/jbd2/journal.c                      |  2 +-
include/linux/blk_types.h              |  1 +
include/linux/blkdev.h                 | 12 ++----------
include/linux/buffer_head.h            |  4 ++--
include/linux/jbd2.h                   |  4 ++--
27 files changed, 69 insertions(+), 86 deletions(-)

The bulk of the changes is straight replacements of foo->bd_inode->i_mapping
with foo->bd_mapping.  That's completely mechanical and that takes out most
of the bd_inode uses.  Anyway, patches in followups
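
For illustration only (not part of the original message): a userspace sketch
of why the replacement is mechanical.  The mock structures are hypothetical
stand-ins; mock_bdev_alloc() mirrors the two assignments in the bdev_alloc()
hunk of patch 01/11, plus the assumption that a fresh inode's i_mapping
points at its own i_data.

```c
#include <assert.h>

struct mock_address_space { unsigned long nrpages; };
struct mock_inode {
	struct mock_address_space *i_mapping;
	struct mock_address_space i_data;
};
struct mock_bdev {
	struct mock_inode *bd_inode;		/* will die */
	struct mock_address_space *bd_mapping;	/* page cache */
};

/* Set up the mock bdev the way the patch 01/11 hunk does. */
static void mock_bdev_alloc(struct mock_bdev *bdev, struct mock_inode *inode)
{
	inode->i_mapping = &inode->i_data;	/* assumed default for a fresh inode */
	bdev->bd_inode = inode;
	bdev->bd_mapping = &inode->i_data;
}

/* Old spelling: two dereferences, goes through the inode. */
static struct mock_address_space *old_way(struct mock_bdev *bdev)
{
	return bdev->bd_inode->i_mapping;
}

/* New spelling: one dereference, no inode involved. */
static struct mock_address_space *new_way(struct mock_bdev *bdev)
{
	return bdev->bd_mapping;
}
```

Since both spellings yield the same pointer for the lifetime of the bdev,
each call site can be converted without any change in behaviour.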


* [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev)
  2024-04-11 14:49                             ` Al Viro
@ 2024-04-11 14:53                               ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 02/11] use ->bd_mapping instead of ->bd_inode->i_mapping Al Viro
                                                   ` (10 more replies)
  2024-04-12  1:38                               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
  1 sibling, 11 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

points to ->i_data of coallocated inode.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c              | 1 +
 include/linux/blk_types.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/block/bdev.c b/block/bdev.c
index dd26d37356aa..1c3462fba6ce 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -413,6 +413,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	mutex_init(&bdev->bd_holder_lock);
 	bdev->bd_partno = partno;
 	bdev->bd_inode = inode;
+	bdev->bd_mapping = &inode->i_data;
 	bdev->bd_queue = disk->queue;
 	if (partno)
 		bdev->bd_has_submit_bio = disk->part0->bd_has_submit_bio;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cb1526ec44b5..6438c75cbb35 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -51,6 +51,7 @@ struct block_device {
 	bool			bd_has_submit_bio;
 	dev_t			bd_dev;
 	struct inode		*bd_inode;	/* will die */
+	struct address_space	*bd_mapping;	/* page cache */
 
 	atomic_t		bd_openers;
 	spinlock_t		bd_size_lock; /* for bd_inode->i_size updates */
-- 
2.39.2



* [PATCH 02/11] use ->bd_mapping instead of ->bd_inode->i_mapping
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there Al Viro
                                                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Just the low-hanging fruit...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c                | 18 +++++++++---------
 block/blk-zoned.c           |  4 ++--
 block/genhd.c               |  2 +-
 block/ioctl.c               |  4 ++--
 block/partitions/core.c     |  2 +-
 drivers/md/bcache/super.c   |  2 +-
 drivers/scsi/scsicam.c      |  2 +-
 fs/btrfs/disk-io.c          |  6 +++---
 fs/btrfs/volumes.c          |  2 +-
 fs/btrfs/zoned.c            |  2 +-
 fs/buffer.c                 |  2 +-
 fs/cramfs/inode.c           |  2 +-
 fs/ext4/dir.c               |  2 +-
 fs/ext4/ext4_jbd2.c         |  2 +-
 fs/ext4/super.c             |  6 +++---
 fs/jbd2/journal.c           |  2 +-
 include/linux/buffer_head.h |  4 ++--
 include/linux/jbd2.h        |  4 ++--
 18 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 1c3462fba6ce..39a2fe9f84dd 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -76,7 +76,7 @@ static void bdev_write_inode(struct block_device *bdev)
 /* Kill _all_ buffers and pagecache , dirty or not.. */
 static void kill_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev->bd_mapping;
 
 	if (mapping_empty(mapping))
 		return;
@@ -88,7 +88,7 @@ static void kill_bdev(struct block_device *bdev)
 /* Invalidate clean unused buffers and pagecache. */
 void invalidate_bdev(struct block_device *bdev)
 {
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev->bd_mapping;
 
 	if (mapping->nrpages) {
 		invalidate_bh_lrus();
@@ -116,7 +116,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 			goto invalidate;
 	}
 
-	truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
+	truncate_inode_pages_range(bdev->bd_mapping, lstart, lend);
 	if (!(mode & BLK_OPEN_EXCL))
 		bd_abort_claiming(bdev, truncate_bdev_range);
 	return 0;
@@ -126,7 +126,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 	 * Someone else has handle exclusively open. Try invalidating instead.
 	 * The 'end' argument is inclusive so the rounding is safe.
 	 */
-	return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
+	return invalidate_inode_pages2_range(bdev->bd_mapping,
 					     lstart >> PAGE_SHIFT,
 					     lend >> PAGE_SHIFT);
 }
@@ -192,7 +192,7 @@ int sync_blockdev_nowait(struct block_device *bdev)
 {
 	if (!bdev)
 		return 0;
-	return filemap_flush(bdev->bd_inode->i_mapping);
+	return filemap_flush(bdev->bd_mapping);
 }
 EXPORT_SYMBOL_GPL(sync_blockdev_nowait);
 
@@ -204,13 +204,13 @@ int sync_blockdev(struct block_device *bdev)
 {
 	if (!bdev)
 		return 0;
-	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
+	return filemap_write_and_wait(bdev->bd_mapping);
 }
 EXPORT_SYMBOL(sync_blockdev);
 
 int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend)
 {
-	return filemap_write_and_wait_range(bdev->bd_inode->i_mapping,
+	return filemap_write_and_wait_range(bdev->bd_mapping,
 			lstart, lend);
 }
 EXPORT_SYMBOL(sync_blockdev_range);
@@ -439,7 +439,7 @@ void bdev_set_nr_sectors(struct block_device *bdev, sector_t sectors)
 void bdev_add(struct block_device *bdev, dev_t dev)
 {
 	if (bdev_stable_writes(bdev))
-		mapping_set_stable_writes(bdev->bd_inode->i_mapping);
+		mapping_set_stable_writes(bdev->bd_mapping);
 	bdev->bd_dev = dev;
 	bdev->bd_inode->i_rdev = dev;
 	bdev->bd_inode->i_ino = dev;
@@ -909,7 +909,7 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 		bdev_file->f_mode |= FMODE_NOWAIT;
 	if (mode & BLK_OPEN_RESTRICT_WRITES)
 		bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
-	bdev_file->f_mapping = bdev->bd_inode->i_mapping;
+	bdev_file->f_mapping = bdev->bd_mapping;
 	bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
 	bdev_file->private_data = holder;
 
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index da0f4b2a8fa0..b008bcd4889c 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -398,7 +398,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 		op = REQ_OP_ZONE_RESET;
 
 		/* Invalidate the page cache, including dirty pages. */
-		filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_lock(bdev->bd_mapping);
 		ret = blkdev_truncate_zone_range(bdev, mode, &zrange);
 		if (ret)
 			goto fail;
@@ -420,7 +420,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
 
 fail:
 	if (cmd == BLKRESETZONE)
-		filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+		filemap_invalidate_unlock(bdev->bd_mapping);
 
 	return ret;
 }
diff --git a/block/genhd.c b/block/genhd.c
index bb29a68e1d67..b294d56961fb 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -745,7 +745,7 @@ void invalidate_disk(struct gendisk *disk)
 	struct block_device *bdev = disk->part0;
 
 	invalidate_bdev(bdev);
-	bdev->bd_inode->i_mapping->wb_err = 0;
+	bdev->bd_mapping->wb_err = 0;
 	set_capacity(disk, 0);
 }
 EXPORT_SYMBOL(invalidate_disk);
diff --git a/block/ioctl.c b/block/ioctl.c
index 0c76137adcaa..d365d8e92f98 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -151,12 +151,12 @@ static int blk_ioctl_secure_erase(struct block_device *bdev, blk_mode_t mode,
 	if (start + len > bdev_nr_bytes(bdev))
 		return -EINVAL;
 
-	filemap_invalidate_lock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_lock(bdev->bd_mapping);
 	err = truncate_bdev_range(bdev, mode, start, start + len - 1);
 	if (!err)
 		err = blkdev_issue_secure_erase(bdev, start >> 9, len >> 9,
 						GFP_KERNEL);
-	filemap_invalidate_unlock(bdev->bd_inode->i_mapping);
+	filemap_invalidate_unlock(bdev->bd_mapping);
 	return err;
 }
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index b11e88c82c8c..899f2093835f 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -704,7 +704,7 @@ EXPORT_SYMBOL_GPL(bdev_disk_changed);
 
 void *read_part_sector(struct parsed_partitions *state, sector_t n, Sector *p)
 {
-	struct address_space *mapping = state->disk->part0->bd_inode->i_mapping;
+	struct address_space *mapping = state->disk->part0->bd_mapping;
 	struct folio *folio;
 
 	if (n >= get_capacity(state->disk)) {
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 330bcd9ea4a9..707836a7d8b2 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -171,7 +171,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 	struct page *page;
 	unsigned int i;
 
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
+	page = read_cache_page_gfp(bdev->bd_mapping,
 				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
 	if (IS_ERR(page))
 		return "IO error";
diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..dd69342bbe78 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,7 +32,7 @@
  */
 unsigned char *scsi_bios_ptable(struct block_device *dev)
 {
-	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
+	struct address_space *mapping = bdev_whole(dev)->bd_mapping;
 	unsigned char *res = NULL;
 	struct folio *folio;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0474e9b6d302..343811c914b8 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3651,7 +3651,7 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 	struct btrfs_super_block *super;
 	struct page *page;
 	u64 bytenr, bytenr_orig;
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	struct address_space *mapping = bdev->bd_mapping;
 	int ret;
 
 	bytenr_orig = btrfs_sb_offset(copy_num);
@@ -3738,7 +3738,7 @@ static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
+	struct address_space *mapping = device->bdev->bd_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
 	int errors = 0;
@@ -3855,7 +3855,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		    device->commit_total_bytes)
 			break;
 
-		page = find_get_page(device->bdev->bd_inode->i_mapping,
+		page = find_get_page(device->bdev->bd_mapping,
 				     bytenr >> PAGE_SHIFT);
 		if (!page) {
 			errors++;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a3dc88e420d1..224df46cf938 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1287,7 +1287,7 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 		return ERR_PTR(-EINVAL);
 
 	/* pull in the page with our super */
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
+	page = read_cache_page_gfp(bdev->bd_mapping, index, GFP_KERNEL);
 
 	if (IS_ERR(page))
 		return ERR_CAST(page);
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4b52a8916dbb..1d8e0f762918 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -118,7 +118,7 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 		return -ENOENT;
 	} else if (full[0] && full[1]) {
 		/* Compare two super blocks */
-		struct address_space *mapping = bdev->bd_inode->i_mapping;
+		struct address_space *mapping = bdev->bd_mapping;
 		struct page *page[BTRFS_NR_SB_LOG_ZONES];
 		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
 		int i;
diff --git a/fs/buffer.c b/fs/buffer.c
index 4f73d23c2c46..d5a0932ae68d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1463,7 +1463,7 @@ __bread_gfp(struct block_device *bdev, sector_t block,
 {
 	struct buffer_head *bh;
 
-	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp |= mapping_gfp_constraint(bdev->bd_mapping, ~__GFP_FS);
 
 	/*
 	 * Prefer looping in the allocator rather than here, at least that
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 9901057a15ba..460690ca0174 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -183,7 +183,7 @@ static int next_buffer;
 static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 				unsigned int len)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev->bd_mapping;
 	struct file_ra_state ra = {};
 	struct page *pages[BLKS_PER_BUF];
 	unsigned i, blocknr, buffer;
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..ff4514e4626b 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -192,7 +192,7 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 					(PAGE_SHIFT - inode->i_blkbits);
 			if (!ra_has_index(&file->f_ra, index))
 				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
+					sb->s_bdev->bd_mapping,
 					&file->f_ra, file,
 					index, 1);
 			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index 5d8055161acd..da4a82456383 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,7 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
 
 static void ext4_check_bdev_write_error(struct super_block *sb)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+	struct address_space *mapping = sb->s_bdev->bd_mapping;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int err;
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3fce1b80c419..0be1c3a7ffa0 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -244,7 +244,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
 struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 				   blk_opf_t op_flags)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_mapping,
 			~__GFP_FS) | __GFP_MOVABLE;
 
 	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
@@ -253,7 +253,7 @@ struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 					    sector_t block)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
+	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_mapping,
 			~__GFP_FS);
 
 	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
@@ -5568,7 +5568,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
+	errseq_check_and_advance(&sb->s_bdev->bd_mapping->wb_err,
 				 &sbi->s_bdev_wb_err);
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..03c4b9214f56 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2009,7 +2009,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		byte_count = (block_stop - block_start + 1) *
 				journal->j_blocksize;
 
-		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
+		truncate_inode_pages_range(journal->j_dev->bd_mapping,
 				byte_start, byte_stop);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index d78454a4dd1f..e58a0d63409a 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -338,7 +338,7 @@ static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev->bd_mapping, ~__GFP_FS);
 	gfp |= __GFP_NOFAIL;
 
 	return bdev_getblk(bdev, block, size, gfp);
@@ -349,7 +349,7 @@ static inline struct buffer_head *__getblk(struct block_device *bdev,
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = mapping_gfp_constraint(bdev->bd_mapping, ~__GFP_FS);
 	gfp |= __GFP_MOVABLE | __GFP_NOFAIL;
 
 	return bdev_getblk(bdev, block, size, gfp);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..ac31c37816f7 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1696,7 +1696,7 @@ static inline void jbd2_journal_abort_handle(handle_t *handle)
 
 static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev->bd_mapping;
 
 	/*
 	 * Save the original wb_err value of client fs's bdev mapping which
@@ -1707,7 +1707,7 @@ static inline void jbd2_init_fs_dev_write_error(journal_t *journal)
 
 static inline int jbd2_check_fs_dev_write_error(journal_t *journal)
 {
-	struct address_space *mapping = journal->j_fs_dev->bd_inode->i_mapping;
+	struct address_space *mapping = journal->j_fs_dev->bd_mapping;
 
 	return errseq_check(&mapping->wb_err,
 			    READ_ONCE(journal->j_fs_dev_wb_err));
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
  2024-04-11 14:53                                 ` [PATCH 02/11] use ->bd_mapping instead of ->bd_inode->i_mapping Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:59                                   ` Matthew Wilcox
  2024-04-11 14:53                                 ` [PATCH 04/11] gfs2: more obvious initializations of mapping->host Al Viro
                                                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/buffer.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index d5a0932ae68d..78a4e95ba2f2 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1034,12 +1034,12 @@ static sector_t folio_init_buffers(struct folio *folio,
 static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 		pgoff_t index, unsigned size, gfp_t gfp)
 {
-	struct inode *inode = bdev->bd_inode;
+	struct address_space *mapping = bdev->bd_mapping;
 	struct folio *folio;
 	struct buffer_head *bh;
 	sector_t end_block = 0;
 
-	folio = __filemap_get_folio(inode->i_mapping, index,
+	folio = __filemap_get_folio(mapping, index,
 			FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
 	if (IS_ERR(folio))
 		return false;
@@ -1073,10 +1073,10 @@ static bool grow_dev_folio(struct block_device *bdev, sector_t block,
 	 * lock to be atomic wrt __find_get_block(), which does not
 	 * run under the folio lock.
 	 */
-	spin_lock(&inode->i_mapping->i_private_lock);
+	spin_lock(&mapping->i_private_lock);
 	link_dev_buffers(folio, bh);
 	end_block = folio_init_buffers(folio, bdev, size);
-	spin_unlock(&inode->i_mapping->i_private_lock);
+	spin_unlock(&mapping->i_private_lock);
 unlock:
 	folio_unlock(folio);
 	folio_put(folio);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 04/11] gfs2: more obvious initializations of mapping->host
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
  2024-04-11 14:53                                 ` [PATCH 02/11] use ->bd_mapping instead of ->bd_inode->i_mapping Al Viro
  2024-04-11 14:53                                 ` [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 05/11] blkdev_write_iter(): saner way to get inode and bdev Al Viro
                                                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

what's going on is copying the ->host of bdev's address_space

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/gfs2/glock.c      | 2 +-
 fs/gfs2/ops_fstype.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 34540f9d011c..1ebcf6c90f2b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1227,7 +1227,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	mapping = gfs2_glock2aspace(gl);
 	if (mapping) {
                 mapping->a_ops = &gfs2_meta_aops;
-		mapping->host = s->s_bdev->bd_inode;
+		mapping->host = s->s_bdev->bd_mapping->host;
 		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_NOFS);
 		mapping->i_private_data = NULL;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 572d58e86296..fcf7dfd14f52 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -114,7 +114,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	address_space_init_once(mapping);
 	mapping->a_ops = &gfs2_rgrp_aops;
-	mapping->host = sb->s_bdev->bd_inode;
+	mapping->host = sb->s_bdev->bd_mapping->host;
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	mapping->i_private_data = NULL;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 05/11] blkdev_write_iter(): saner way to get inode and bdev
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (2 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 04/11] gfs2: more obvious initializations of mapping->host Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 06/11] blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here Al Viro
                                                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

... same as in other methods - bdev_file_inode() and I_BDEV() of that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/fops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index af6c244314af..040743a3b43d 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -668,8 +668,8 @@ static ssize_t blkdev_buffered_write(struct kiocb *iocb, struct iov_iter *from)
 static ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
-	struct block_device *bdev = I_BDEV(file->f_mapping->host);
-	struct inode *bd_inode = bdev->bd_inode;
+	struct inode *bd_inode = bdev_file_inode(file);
+	struct block_device *bdev = I_BDEV(bd_inode);
 	loff_t size = bdev_nr_bytes(bdev);
 	size_t shorted = 0;
 	ssize_t ret;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 06/11] blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (3 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 05/11] blkdev_write_iter(): saner way to get inode and bdev Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 07/11] ext4: remove block_device_ejected() Al Viro
                                                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/ioctl.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index d365d8e92f98..e0c2d834df7a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -97,7 +97,6 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, len;
-	struct inode *inode = bdev->bd_inode;
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
@@ -120,13 +119,13 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode,
 	if (start + len > bdev_nr_bytes(bdev))
 		return -EINVAL;
 
-	filemap_invalidate_lock(inode->i_mapping);
+	filemap_invalidate_lock(bdev->bd_mapping);
 	err = truncate_bdev_range(bdev, mode, start, start + len - 1);
 	if (err)
 		goto fail;
 	err = blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL);
 fail:
-	filemap_invalidate_unlock(inode->i_mapping);
+	filemap_invalidate_unlock(bdev->bd_mapping);
 	return err;
 }
 
@@ -166,7 +165,6 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
 {
 	uint64_t range[2];
 	uint64_t start, end, len;
-	struct inode *inode = bdev->bd_inode;
 	int err;
 
 	if (!(mode & BLK_OPEN_WRITE))
@@ -189,7 +187,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
 		return -EINVAL;
 
 	/* Invalidate the page cache, including dirty pages */
-	filemap_invalidate_lock(inode->i_mapping);
+	filemap_invalidate_lock(bdev->bd_mapping);
 	err = truncate_bdev_range(bdev, mode, start, end);
 	if (err)
 		goto fail;
@@ -198,7 +196,7 @@ static int blk_ioctl_zeroout(struct block_device *bdev, blk_mode_t mode,
 				   BLKDEV_ZERO_NOUNMAP);
 
 fail:
-	filemap_invalidate_unlock(inode->i_mapping);
+	filemap_invalidate_unlock(bdev->bd_mapping);
 	return err;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 07/11] ext4: remove block_device_ejected()
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (4 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 06/11] blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 08/11] block: move two helpers into bdev.c Al Viro
                                                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

From: Yu Kuai <yukuai3@huawei.com>

block_device_ejected() was added by commit bdfe0cbd746a ("Revert
"ext4: remove block_device_ejected"") in 2015. At that time 'bdi->wb'
was destroyed synchronously from del_gendisk(), so if ext4 was still
mounted, a later mark_buffer_dirty() would reference the destroyed 'wb'.
However, that problem no longer exists:

- commit d03f6cdc1fc4 ("block: Dynamically allocate and refcount
backing_dev_info") switched bdi to refcounting;
- commit 13eec2363ef0 ("fs: Get proper reference for s_bdi") grabs an
additional reference to the bdi while mounting, so that 'bdi->wb' is not
destroyed until generic_shutdown_super().

Hence remove block_device_ejected(), which is no longer needed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ext4/super.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0be1c3a7ffa0..6e2bd802b50c 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -492,22 +492,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else.  Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
-	struct inode *bd_inode = sb->s_bdev->bd_inode;
-	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
-	return bdi->dev == NULL;
-}
-
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 {
 	struct super_block		*sb = journal->j_private;
@@ -6168,8 +6152,6 @@ static int ext4_commit_super(struct super_block *sb)
 
 	if (!sbh)
 		return -EINVAL;
-	if (block_device_ejected(sb))
-		return -ENODEV;
 
 	ext4_update_super(sb);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 08/11] block: move two helpers into bdev.c
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (5 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 07/11] ext4: remove block_device_ejected() Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) Al Viro
                                                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

From: Yu Kuai <yukuai3@huawei.com>

disk_live() and block_size() access bd_inode directly. In preparation
for removing the bd_inode field from block_device, move them into bdev.c
so that bd_inode is only accessed within the block layer.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bdev.c           | 12 ++++++++++++
 include/linux/blkdev.h | 12 ++----------
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 39a2fe9f84dd..31384396fc31 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1252,6 +1252,18 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 	blkdev_put_no_open(bdev);
 }
 
+bool disk_live(struct gendisk *disk)
+{
+	return !inode_unhashed(disk->part0->bd_inode);
+}
+EXPORT_SYMBOL_GPL(disk_live);
+
+unsigned int block_size(struct block_device *bdev)
+{
+	return 1 << bdev->bd_inode->i_blkbits;
+}
+EXPORT_SYMBOL_GPL(block_size);
+
 static int __init setup_bdev_allow_write_mounted(char *str)
 {
 	if (kstrtobool(str, &bdev_allow_write_mounted))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 172c91879999..2c0d3a89002c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -211,11 +211,6 @@ struct gendisk {
 	struct blk_independent_access_ranges *ia_ranges;
 };
 
-static inline bool disk_live(struct gendisk *disk)
-{
-	return !inode_unhashed(disk->part0->bd_inode);
-}
-
 /**
  * disk_openers - returns how many openers are there for a disk
  * @disk: disk to check
@@ -1364,11 +1359,6 @@ static inline unsigned int blksize_bits(unsigned int size)
 	return order_base_2(size >> SECTOR_SHIFT) + SECTOR_SHIFT;
 }
 
-static inline unsigned int block_size(struct block_device *bdev)
-{
-	return 1 << bdev->bd_inode->i_blkbits;
-}
-
 int kblockd_schedule_work(struct work_struct *work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
 
@@ -1536,6 +1526,8 @@ void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
 struct block_device *file_bdev(struct file *bdev_file);
+bool disk_live(struct gendisk *disk);
+unsigned int block_size(struct block_device *bdev);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (6 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 08/11] block: move two helpers into bdev.c Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 18:04                                   ` Matthew Sakai
  2024-04-11 14:53                                 ` [PATCH 10/11] bcachefs: remove dead function bdev_sectors() Al Viro
                                                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

going to be faster, actually - shift is cheaper than dereference...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/md/dm-vdo/dm-vdo-target.c      | 4 ++--
 drivers/md/dm-vdo/indexer/io-factory.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-vdo/dm-vdo-target.c b/drivers/md/dm-vdo/dm-vdo-target.c
index 5a4b0a927f56..b423bec6458b 100644
--- a/drivers/md/dm-vdo/dm-vdo-target.c
+++ b/drivers/md/dm-vdo/dm-vdo-target.c
@@ -878,7 +878,7 @@ static int parse_device_config(int argc, char **argv, struct dm_target *ti,
 	}
 
 	if (config->version == 0) {
-		u64 device_size = i_size_read(config->owned_device->bdev->bd_inode);
+		u64 device_size = bdev_nr_bytes(config->owned_device->bdev);
 
 		config->physical_blocks = device_size / VDO_BLOCK_SIZE;
 	}
@@ -1011,7 +1011,7 @@ static void vdo_status(struct dm_target *ti, status_type_t status_type,
 
 static block_count_t __must_check get_underlying_device_block_count(const struct vdo *vdo)
 {
-	return i_size_read(vdo_get_backing_device(vdo)->bd_inode) / VDO_BLOCK_SIZE;
+	return bdev_nr_bytes(vdo_get_backing_device(vdo)) / VDO_BLOCK_SIZE;
 }
 
 static int __must_check process_vdo_message_locked(struct vdo *vdo, unsigned int argc,
diff --git a/drivers/md/dm-vdo/indexer/io-factory.c b/drivers/md/dm-vdo/indexer/io-factory.c
index 515765d35794..1bee9d63dc0a 100644
--- a/drivers/md/dm-vdo/indexer/io-factory.c
+++ b/drivers/md/dm-vdo/indexer/io-factory.c
@@ -90,7 +90,7 @@ void uds_put_io_factory(struct io_factory *factory)
 
 size_t uds_get_writable_size(struct io_factory *factory)
 {
-	return i_size_read(factory->bdev->bd_inode);
+	return bdev_nr_bytes(factory->bdev);
 }
 
 /* Create a struct dm_bufio_client for an index region starting at offset. */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 10/11] bcachefs: remove dead function bdev_sectors()
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (7 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-11 14:53                                 ` [PATCH 11/11] block2mtd: prevent direct access of bd_inode Al Viro
  2024-04-17 11:05                                 ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Christian Brauner
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

From: Yu Kuai <yukuai3@huawei.com>

bdev_sectors() is unused, so remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/bcachefs/util.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index b7e7c29278fc..6d666986f39a 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -445,11 +445,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
 void bch2_bio_map(struct bio *bio, void *base, size_t);
 int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
 
-static inline sector_t bdev_sectors(struct block_device *bdev)
-{
-	return bdev->bd_inode->i_size >> 9;
-}
-
 #define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 11/11] block2mtd: prevent direct access of bd_inode
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (8 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 10/11] bcachefs: remove dead function bdev_sectors() Al Viro
@ 2024-04-11 14:53                                 ` Al Viro
  2024-04-17 11:05                                 ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Christian Brauner
  10 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-11 14:53 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

From: Yu Kuai <yukuai3@huawei.com>

Now that block2mtd stashes the file of the opened bdev, it's OK to get
the inode from the file.

[AV: use bdev_nr_bytes()]

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/mtd/devices/block2mtd.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index caacdc0a3819..b06c8dd51562 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -265,6 +265,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 	struct file *bdev_file;
 	struct block_device *bdev;
 	struct block2mtd_dev *dev;
+	loff_t size;
 	char *name;
 
 	if (!devname)
@@ -291,7 +292,8 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		goto err_free_block2mtd;
 	}
 
-	if ((long)bdev->bd_inode->i_size % erase_size) {
+	size = bdev_nr_bytes(bdev);
+	if ((long)size % erase_size) {
 		pr_err("erasesize must be a divisor of device size\n");
 		goto err_free_block2mtd;
 	}
@@ -309,7 +311,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 
 	dev->mtd.name = name;
 
-	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+	dev->mtd.size = size & PAGE_MASK;
 	dev->mtd.erasesize = erase_size;
 	dev->mtd.writesize = 1;
 	dev->mtd.writebufsize = PAGE_SIZE;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there
  2024-04-11 14:53                                 ` [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there Al Viro
@ 2024-04-11 14:59                                   ` Matthew Wilcox
  0 siblings, 0 replies; 116+ messages in thread
From: Matthew Wilcox @ 2024-04-11 14:59 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Thu, Apr 11, 2024 at 03:53:38PM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-07  5:11                   ` Al Viro
  2024-04-07  5:21                     ` Al Viro
@ 2024-04-11 15:22                     ` Matthew Wilcox
  1 sibling, 0 replies; 116+ messages in thread
From: Matthew Wilcox @ 2024-04-11 15:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Sun, Apr 07, 2024 at 06:11:19AM +0100, Al Viro wrote:
> On Sun, Apr 07, 2024 at 05:57:58AM +0100, Al Viro wrote:
> 
> > PS: in grow_dev_folio() we probably want
> > 	struct address_space *mapping = bdev->bd_inode->i_mapping;
> > instead of
> > 	struct inode *inode = bdev->bd_inode;
> > as one of the preliminary chunks.
> > FWIW, it really looks like address_space (== page cache of block device,
> > not an unreasonable candidate for primitive) and block size (well,
> > logarithm thereof) cover the majority of what remains, with device
> > size possibly being (remote) third...
> 
> Incidentally, how painful would it be to switch __bread_gfp() and __bread()
> to passing *logarithm* of block size instead of block size?  And possibly
> supply the same to clean_bdev_aliases()...

I've looked at it because blksize_bits() was pretty horrid.  But I got
scared because I couldn't figure out how to make unconverted places
fail to compile, without doing something ugly like

-__bread(struct block_device *bdev, sector_t block, unsigned size)
+__bread(unsigned shift, struct block_device *bdev, sector_t block)

I assume you're not talking about changing bh->b_size, just passing in
the log and comparing bh->b_size to 1<<shift?
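The idea discussed above (pass the logarithm of the block size and compare
a buffer's byte size against 1 << shift) can be illustrated with a small
userspace sketch. It is not kernel code: sketch_blksize_bits() and
sketch_size_matches_shift() are hypothetical names invented for this
example, and the first one only imitates what blksize_bits() computes for
power-of-two sizes of at least 512 bytes.

```c
#include <assert.h>
#include <stddef.h>

#define SECTOR_SHIFT 9

/*
 * Hypothetical userspace stand-in for blksize_bits(): returns log2 of a
 * power-of-two block size that is at least one 512-byte sector.
 */
static inline unsigned int sketch_blksize_bits(unsigned int size)
{
	unsigned int bits = SECTOR_SHIFT;

	/* Only power-of-two sizes >= 512 are meaningful here. */
	assert(size >= (1u << SECTOR_SHIFT) && (size & (size - 1)) == 0);
	while ((1u << bits) < size)
		bits++;
	return bits;
}

/*
 * Hypothetical check in the spirit of "compare bh->b_size to 1 << shift":
 * a callee that receives only the shift can still validate a byte size.
 */
static inline int sketch_size_matches_shift(size_t b_size, unsigned int shift)
{
	return b_size == ((size_t)1 << shift);
}
```

With this shape the conversion from size to shift happens once at the
caller, and the callee needs only a shift and a comparison.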

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-07  4:05   ` Al Viro
  2024-04-07  4:08     ` Al Viro
@ 2024-04-11 16:13     ` Gao Xiang
  2024-04-12  1:14       ` Yu Kuai
  2024-04-25 19:56       ` Al Viro
  1 sibling, 2 replies; 116+ messages in thread
From: Gao Xiang @ 2024-04-11 16:13 UTC (permalink / raw)
  To: Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3

Hi Al,

On 2024/4/7 12:05, Al Viro wrote:
> On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Now that all filesystems stash the bdev file, it's ok to get inode
>> for the file.
> 
> Looking at the only user of erofs_buf->inode (erofs_bread())...  We
> use the inode for two things there - block size calculation (to get
> from block number to position in bytes) and access to page cache.
> We read in full pages anyway.  And frankly, looking at the callers,
> we really would be better off if we passed position in bytes instead
> of block number.  IOW, it smells like erofs_bread() having wrong type.
> 
> Look at the callers.  With 3 exceptions it's
> fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
> fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
> fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
> fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
> 
> and all of them actually want the return value + erofs_offset(...).  IOW,
> we take a linear position (in bytes).  Divide it by block size (from sb).
> Pass the factor to erofs_bread(), where we multiply that by block size
> (from inode), see which page will that be in, get that page and return a
> pointer *into* that page.  Then we again divide the same position
> by block size (from sb) and add the remainder to the pointer returned
> by erofs_bread().
> 
> IOW, it would be much easier to pass the position directly and to hell
> with block size logics.  Three exceptions to that pattern:
> 
> fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
> fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
> fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
> 
> Those could bloody well multiply the argument by block size;
> the first one (erofs_read_metabuf()) is also interesting - its
> callers themselves follow the similar pattern.  So it might be
> worth passing it a position in bytes as well...
> 
> In any case, all 3 have superblock reference, so they can convert
> from blocks to bytes conveniently.  Which means that erofs_bread()
> doesn't need to mess with block size considerations at all.
> 
> IOW, it might make sense to replace erofs_buf->inode with
> pointer to address space.  And use file_mapping() instead of
> file_inode() in that patch...

Just saw this again by chance, which is unexpected.

Yeah, I think that is a good idea.  The story is that erofs_bread()
was derived from a page-based interface:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35

so it was once a page index number.  I think a byte offset will be
a better interface to clean up these, thanks for your time and work
on this!
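The round trip described in the quoted analysis (bytes to block number,
back to bytes, then to a page plus an in-page offset) can be checked with
a small userspace sketch. This is not erofs code: every sketch_* name is
hypothetical, a 4096-byte page is assumed, and the block size is assumed
not to exceed the page size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4096-byte pages */
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

struct sketch_loc {
	uint64_t page_index;	/* which page of the page cache */
	uint64_t in_page;	/* byte offset inside that page */
};

/* Block-number pattern: the caller splits pos, the reader multiplies back. */
static inline struct sketch_loc sketch_via_block(uint64_t pos, unsigned int blkbits)
{
	struct sketch_loc loc;
	uint64_t blknr, blkoff, start;

	assert(blkbits <= SKETCH_PAGE_SHIFT);	/* block size <= page size */
	blknr = pos >> blkbits;				/* caller: pos / block size */
	blkoff = pos & (((uint64_t)1 << blkbits) - 1);	/* caller: remainder */
	start = blknr << blkbits;			/* reader: back to bytes */
	loc.page_index = start >> SKETCH_PAGE_SHIFT;
	loc.in_page = (start & (SKETCH_PAGE_SIZE - 1)) + blkoff;
	return loc;
}

/* Byte-position pattern: no block size involved at all. */
static inline struct sketch_loc sketch_via_pos(uint64_t pos)
{
	struct sketch_loc loc;

	loc.page_index = pos >> SKETCH_PAGE_SHIFT;
	loc.in_page = pos & (SKETCH_PAGE_SIZE - 1);
	return loc;
}

/* Returns 1 when both patterns land on the same byte of the same page. */
static inline int sketch_same(uint64_t pos, unsigned int blkbits)
{
	struct sketch_loc a = sketch_via_block(pos, blkbits);
	struct sketch_loc b = sketch_via_pos(pos);

	return a.page_index == b.page_index && a.in_page == b.in_page;
}
```

Under those assumptions both patterns select the same byte, which is why
a reader that takes a byte position would not need the block size.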

BTW, slightly off topic:

I'm a little confused about why I wasn't looped in on this version,
even though:

  1) I explicitly asked to Cc the mailing list so that I could find
     the latest discussion and respond in time:
      https://lore.kernel.org/r/5e04a86d-8bbd-41da-95f6-cf1562ed04f9@linux.alibaba.com

  2) I sent my Reviewed-by tag on RFC v4 (and the tag was added in this
     version) but I didn't receive this new version.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
  2024-04-11 14:53                                 ` [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) Al Viro
@ 2024-04-11 18:04                                   ` Matthew Sakai
  0 siblings, 0 replies; 116+ messages in thread
From: Matthew Sakai @ 2024-04-11 18:04 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On 4/11/24 10:53, Al Viro wrote:
> going to be faster, actually - shift is cheaper than dereference...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Matthew Sakai <msakai@redhat.com>
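
As a rough userspace illustration of the remark quoted above: the device
size in bytes can be derived from a sector count with one shift, without
going through an inode. The struct and the sketch_* names below are
hypothetical stand-ins, not the kernel definitions, and the 4096-byte VDO
block size is an assumption made only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SKETCH_VDO_BLOCK_SIZE 4096	/* assumed value, for illustration */

/* Hypothetical stand-in holding only the field this sketch needs. */
struct sketch_bdev {
	uint64_t nr_sectors;	/* device size in 512-byte sectors */
};

/* Size in bytes: sector count shifted left by SECTOR_SHIFT. */
static inline int64_t sketch_bdev_nr_bytes(const struct sketch_bdev *bdev)
{
	return (int64_t)(bdev->nr_sectors << SECTOR_SHIFT);
}

/* Number of whole VDO-sized blocks that fit on the device. */
static inline uint64_t sketch_vdo_blocks(const struct sketch_bdev *bdev)
{
	int64_t bytes = sketch_bdev_nr_bytes(bdev);

	assert(bytes >= 0);
	return (uint64_t)bytes / SKETCH_VDO_BLOCK_SIZE;
}
```

For a device of 2048 sectors this gives 1048576 bytes, i.e. 256 blocks of
the assumed size.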


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-11 16:13     ` Gao Xiang
@ 2024-04-12  1:14       ` Yu Kuai
  2024-04-25 19:56       ` Al Viro
  1 sibling, 0 replies; 116+ messages in thread
From: Yu Kuai @ 2024-04-12  1:14 UTC (permalink / raw)
  To: Gao Xiang, Al Viro, Yu Kuai
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C)

Hi,

On 2024/04/12 0:13, Gao Xiang wrote:
> Hi Al,
> 
> On 2024/4/7 12:05, Al Viro wrote:
>> On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Now that all filesystems stash the bdev file, it's ok to get inode
>>> for the file.
>>
>> Looking at the only user of erofs_buf->inode (erofs_bread())...  We
>> use the inode for two things there - block size calculation (to get
>> from block number to position in bytes) and access to page cache.
>> We read in full pages anyway.  And frankly, looking at the callers,
>> we really would be better off if we passed position in bytes instead
>> of block number.  IOW, it smells like erofs_bread() having wrong type.
>>
>> Look at the callers.  With 3 exceptions it's
>> fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
>> fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
>> fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
>> fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
>> fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
>> fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
>> fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
>> fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
>> fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
>>
>> and all of them actually want the return value + erofs_offset(...).  IOW,
>> we take a linear position (in bytes).  Divide it by block size (from sb).
>> Pass the quotient to erofs_bread(), where we multiply that by block size
>> (from inode), see which page will that be in, get that page and return a
>> pointer *into* that page.  Then we again divide the same position
>> by block size (from sb) and add the remainder to the pointer returned
>> by erofs_bread().
>>
>> IOW, it would be much easier to pass the position directly and to hell
>> with block size logics.  Three exceptions to that pattern:
>>
>> fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
>> fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
>> fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
>>
>> Those could bloody well multiply the argument by block size;
>> the first one (erofs_read_metabuf()) is also interesting - its
>> callers themselves follow the similar pattern.  So it might be
>> worth passing it a position in bytes as well...
>>
>> In any case, all 3 have superblock reference, so they can convert
>> from blocks to bytes conveniently.  Which means that erofs_bread()
>> doesn't need to mess with block size considerations at all.
>>
>> IOW, it might make sense to replace erofs_buf->inode with
>> pointer to address space.  And use file_mapping() instead of
>> file_inode() in that patch...
> 
> Just saw this again by chance, which is unexpected.
> 
> Yeah, I think that is a good idea.  The story is that erofs_bread()
> was derived from a page-based interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35
> 
> so it was once a page index number.  I think a byte offset would be
> a better interface for cleaning these up.  Thanks for your time and
> work on this!
> 
> BTW, slightly off topic:
> 
> I'm a little confused about why I wasn't looped in on this version,
> even though:
> 
>   1) I explicitly asked to Cc the mailing list so that I could find
>      the latest discussion and respond in time:
>       https://lore.kernel.org/r/5e04a86d-8bbd-41da-95f6-cf1562ed04f9@linux.alibaba.com
> 
>   2) I sent my Reviewed-by tag on RFC v4 (and the tag was added in this
>      version) but I didn't receive this new version.

This is my fault. I gave up on CCing every address from
get_maintainer.pl for this set, because I somehow can't send it with
too many CCs. However, I should still have CCed you.

Thanks,
Kuai

> 
> Thanks,
> Gao Xiang
> .
> 
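
The round trip described in the quoted analysis (divide the byte position by the block size, multiply it back inside the reader, then add the remainder) can be compared with the direct byte-position approach in a small user-space model. This is a sketch under stated assumptions: the names below are hypothetical and are not erofs code, pages are taken to be 4096 bytes, and the block size is assumed not to exceed the page size.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12			/* assumed 4096-byte pages */
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)

/* Where a byte lives: which page, and how far into that page. */
struct model_loc {
	uint64_t page_index;
	uint32_t offset_in_page;
};

/* Block-number pattern: bytes -> block number -> bytes -> page, plus remainder. */
static inline struct model_loc locate_via_block(uint64_t pos, unsigned int blkbits)
{
	uint64_t blknr, blkstart;
	uint32_t blkoff;
	struct model_loc loc;

	assert(blkbits <= MODEL_PAGE_SHIFT);	/* a block never spans two pages */
	blknr = pos >> blkbits;				/* caller: position / block size */
	blkoff = (uint32_t)(pos & ((1u << blkbits) - 1)); /* caller: the remainder */
	blkstart = blknr << blkbits;			/* reader: back to bytes */
	loc.page_index = blkstart >> MODEL_PAGE_SHIFT;
	loc.offset_in_page = (uint32_t)(blkstart & (MODEL_PAGE_SIZE - 1)) + blkoff;
	return loc;
}

/* Byte-position pattern: no block size involved at all. */
static inline struct model_loc locate_via_pos(uint64_t pos)
{
	struct model_loc loc;

	loc.page_index = pos >> MODEL_PAGE_SHIFT;
	loc.offset_in_page = (uint32_t)(pos & (MODEL_PAGE_SIZE - 1));
	return loc;
}
```

Both functions return the same location for any position, which is why the reader does not need to know the block size once it is handed a byte offset.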


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-11 14:49                             ` Al Viro
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
@ 2024-04-12  1:38                               ` Yu Kuai
  2024-04-12  2:59                                 ` Al Viro
  1 sibling, 1 reply; 116+ messages in thread
From: Yu Kuai @ 2024-04-12  1:38 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

Hi,

On 2024/04/11 22:49, Al Viro wrote:
> On Thu, Apr 11, 2024 at 03:04:09PM +0100, Al Viro wrote:
>>> lot slimmer and we don't need to care about messing with a lot of that
>>> code. I didn't care about making it static inline because that might've
>>> meant we need to move other stuff into the header as well. Imho, it's
>>> not that important but if it's a big deal to any of you just do the
>>> changes on top of it, please.
>>>
>>> Pushed to
>>> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.super
>>>
>>> If I hear no objections that'll show up in -next tomorrow. Al, would be
>>> nice if you could do your changes on top of this, please.
>>
>> Objection: start with adding bdev->bd_mapping, next convert the really
>> obvious instances to it and most of this series becomes not needed at
>> all.
>>
>> Really.  There is no need whatsoever to push struct file down all those
>> paths.
>>

There really is a long history here. The attempt to remove the
'bd_inode' field began because I wanted to make room in the first
cacheline (64 bytes) for a new 'unsigned long flags' field, since we
keep adding new 'bool xxx' fields [1]. Adding a new 'bd_mapping' field
would make that impossible.

I do like the idea of passing 'bd_mapping' here. However, would you
consider exposing bdev_mapping() for the slow path, or passing in
bd_file and getting the mapping via 'f_mapping' for the fast path? That
way a new field in the first cacheline would still be possible. Other
than needing more code changes, I don't see any difference in
performance.

Thanks,
Kuai

[1] https://lore.kernel.org/all/20231122103103.1104589-3-yukuai1@huaweicloud.com/

>> And yes, erofs and buffer.c stuff belongs on top of that, no arguments here.
> 
> FWIW, here's what you get if this is done in such order:
> 
> block/bdev.c                           | 31 ++++++++++++++++++++++---------
> block/blk-zoned.c                      |  4 ++--
> block/fops.c                           |  4 ++--
> block/genhd.c                          |  2 +-
> block/ioctl.c                          | 14 ++++++--------
> block/partitions/core.c                |  2 +-
> drivers/md/bcache/super.c              |  2 +-
> drivers/md/dm-vdo/dm-vdo-target.c      |  4 ++--
> drivers/md/dm-vdo/indexer/io-factory.c |  2 +-
> drivers/mtd/devices/block2mtd.c        |  6 ++++--
> drivers/scsi/scsicam.c                 |  2 +-
> fs/bcachefs/util.h                     |  5 -----
> fs/btrfs/disk-io.c                     |  6 +++---
> fs/btrfs/volumes.c                     |  2 +-
> fs/btrfs/zoned.c                       |  2 +-
> fs/buffer.c                            | 10 +++++-----
> fs/cramfs/inode.c                      |  2 +-
> fs/ext4/dir.c                          |  2 +-
> fs/ext4/ext4_jbd2.c                    |  2 +-
> fs/ext4/super.c                        | 24 +++---------------------
> fs/gfs2/glock.c                        |  2 +-
> fs/gfs2/ops_fstype.c                   |  2 +-
> fs/jbd2/journal.c                      |  2 +-
> include/linux/blk_types.h              |  1 +
> include/linux/blkdev.h                 | 12 ++----------
> include/linux/buffer_head.h            |  4 ++--
> include/linux/jbd2.h                   |  4 ++--
> 27 files changed, 69 insertions(+), 86 deletions(-)
> 
> The bulk of the changes is straight replacements of foo->bd_inode->i_mapping
> with foo->bd_mapping.  That's completely mechanical and that takes out most
> of the bd_inode uses.  Anyway, patches in followups
> 
> .
> 


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-12  1:38                               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
@ 2024-04-12  2:59                                 ` Al Viro
  2024-04-12  4:41                                   ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-12  2:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christian Brauner, Jan Kara, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Apr 12, 2024 at 09:38:16AM +0800, Yu Kuai wrote:

> There really is a long history here. The attempt to remove the
> 'bd_inode' field began because I wanted to make room in the first
> cacheline (64 bytes) for a new 'unsigned long flags' field, since we
> keep adding new 'bool xxx' fields [1]. Adding a new 'bd_mapping' field
> would make that impossible.

Why does it need to be unsigned long?  dev_t is 32bit; what you need
is to keep this
        bool                    bd_read_only;   /* read-only policy */
	u8                      bd_partno;
	bool                    bd_write_holder;
	bool                    bd_has_submit_bio;

from blowing past u32.  Sure, you can't use test_bit() et al. with u16,
but what's wrong with explicit bitwise operations?  You need some protection
for multiple writers, but you need it anyway - e.g. this
        if (bdev->bd_disk->fops->set_read_only) {
		ret = bdev->bd_disk->fops->set_read_only(bdev, n);
		if (ret)
			return ret;
	}
	bdev->bd_read_only = n;
will need the exclusion over the entire "call ->set_read_only() and set
the flag", not just for setting the flag itself.

And yes, it's a real-world bug - two threads calling BLKROSET on the
same opened file can race, with inconsistency between the flag and
whatever state ->set_read_only() modifies.

AFAICS, ->bd_write_holder is (apparently) relying upon ->open_mutex.
Whether it would be a good solution for ->bd_read_only is a question
to block folks, but some exclusion is obviously needed.

Let's sort that out, rather than papering it over with set_bit() et al.
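
The exclusion being asked for can be sketched in user space. Everything here is a hypothetical model, not kernel code: struct model_dev, its fields and model_roset() are invented for illustration, and the spin lock built on a C11 atomic_flag merely stands in for whichever lock the block folks would choose (a real driver callback may sleep, so a mutex is the more likely candidate). What matters is that one critical section covers both the callback and the flag update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device with a driver-side read-only state. */
struct model_dev {
	atomic_flag lock;	/* stand-in for whichever lock is chosen */
	bool driver_ro;		/* state changed by the driver callback */
	bool flag_ro;		/* stand-in for the bd_read_only flag */
	int (*set_read_only)(struct model_dev *dev, bool ro);	/* optional */
};

/* Example driver callback: records the policy and reports success. */
static int model_driver_set_ro(struct model_dev *dev, bool ro)
{
	dev->driver_ro = ro;
	return 0;
}

/* The callback and the flag update share one critical section. */
static int model_roset(struct model_dev *dev, bool ro)
{
	int ret = 0;

	assert(dev);
	while (atomic_flag_test_and_set_explicit(&dev->lock, memory_order_acquire))
		;	/* spin until the lock is free */
	if (dev->set_read_only)
		ret = dev->set_read_only(dev, ro);
	if (!ret)
		dev->flag_ro = ro;	/* only on success, still under the lock */
	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
	return ret;
}
```

With the lock held across both steps, two concurrent callers cannot leave the driver state and the flag disagreeing; making only the flag write atomic would not give that guarantee.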

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-12  2:59                                 ` Al Viro
@ 2024-04-12  4:41                                   ` Al Viro
  2024-04-12  7:13                                     ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-12  4:41 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christian Brauner, Jan Kara, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Apr 12, 2024 at 03:59:10AM +0100, Al Viro wrote:
> for multiple writers, but you need it anyway - e.g. this
>         if (bdev->bd_disk->fops->set_read_only) {
> 		ret = bdev->bd_disk->fops->set_read_only(bdev, n);
> 		if (ret)
> 			return ret;
> 	}
> 	bdev->bd_read_only = n;
> will need the exclusion over the entire "call ->set_read_only() and set
> the flag", not just for setting the flag itself.
> 
> And yes, it's a real-world bug - two threads calling BLKROSET on the
> same opened file can race, with inconsistency between the flag and
> whatever state ->set_read_only() modifies.

BLKROSET is CAP_SYS_ADMIN-only, so it's not CVE fodder; the sky
is not falling.  The bug is real, though.

I see Christoph's postings in that thread; the thing is, it's not
just the flags that need to be protected.  If we end up deciding
that serialization for different flags should not be tied to the
same thing (which is reasonable - e.g. md_set_read_only() is not
something you want to shove under existing lock), I would still
suggest something along the lines of

	u32 __bd_flags;		// partno and flags

static inline u8 bd_partno(struct block_device *bdev)
{
	return bdev->__bd_flags & 0xff;
}

static void bd_set_flag(struct block_device *bdev, int flag)
{
	u32 v = bdev->__bd_flags;

	for (;;) {
		u32 w = cmpxchg(&bdev->__bd_flags, v, v | (1 << (flag + 8)));
		if (w == v)
			return;
		v = w;
	}
}

and similar for bd_clear_flag().  Changes of ->bd_partno never
happen - we set it at allocation time and never modify the sucker.
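
For readers who want to try the packing scheme outside the kernel, here is a user-space sketch using C11 atomics. It is an illustration only: the model_* names are hypothetical, and atomic_compare_exchange_weak() stands in for the kernel's cmpxchg(). The layout follows the description above, with the partition number in the low 8 bits and flag N at bit N + 8.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag numbers, mirroring the three flags discussed. */
enum model_flag { MODEL_READ_ONLY, MODEL_WRITE_HOLDER, MODEL_HAS_SUBMIT_BIO };

struct model_bdev {
	_Atomic uint32_t flags;	/* bits 0-7: partition number, bits 8+: flags */
};

static inline uint32_t model_flag_mask(int flag)
{
	assert(flag >= 0 && flag < 24);	/* must fit above the partno byte */
	return UINT32_C(1) << (flag + 8);
}

/* The partition number is written once, at initialisation. */
static inline void model_init(struct model_bdev *b, uint8_t partno)
{
	atomic_init(&b->flags, partno);
}

static inline uint8_t model_partno(struct model_bdev *b)
{
	return atomic_load(&b->flags) & 0xff;
}

static inline bool model_test_flag(struct model_bdev *b, int flag)
{
	return atomic_load(&b->flags) & model_flag_mask(flag);
}

static inline void model_set_flag(struct model_bdev *b, int flag)
{
	uint32_t old = atomic_load(&b->flags);

	/* On failure the current value is written back into 'old'; retry. */
	while (!atomic_compare_exchange_weak(&b->flags, &old,
					     old | model_flag_mask(flag)))
		;
}

static inline void model_clear_flag(struct model_bdev *b, int flag)
{
	uint32_t old = atomic_load(&b->flags);

	while (!atomic_compare_exchange_weak(&b->flags, &old,
					     old & ~model_flag_mask(flag)))
		;
}
```

In plain C11, atomic_fetch_or() and atomic_fetch_and() would do the same job in one call; the explicit loop is kept here only because it mirrors the cmpxchg() pattern in the message.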

This is orthogonal to BLKROSET/BLKROSET exclusion, converting
->bd_inode accesses, etc.

Christoph, do you have any problems with that approach?

COMPLETELY UNTESTED patch along those lines follows; if it works,
it would need to be carved up.  And I would probably switch the
places where we do if (bdev->bd_partno) to if (bdev_is_partition(bdev)),
for better readability.

I'd converted only those 3 flags; again, this is just an untested
illustration to the above.

diff --git a/block/bdev.c b/block/bdev.c
index 7a5f611c3d2e..9aa23620fe92 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -411,13 +411,11 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
 	mutex_init(&bdev->bd_fsfreeze_mutex);
 	spin_lock_init(&bdev->bd_size_lock);
 	mutex_init(&bdev->bd_holder_lock);
-	bdev->bd_partno = partno;
+	bdev->__bd_flags = partno;
 	bdev->bd_inode = inode;
 	bdev->bd_queue = disk->queue;
-	if (partno)
-		bdev->bd_has_submit_bio = disk->part0->bd_has_submit_bio;
-	else
-		bdev->bd_has_submit_bio = false;
+	if (partno && bdev_test_flag(disk->part0, BD_HAS_SUBMIT_BIO))
+		bdev_set_flag(bdev, BD_HAS_SUBMIT_BIO);
 	bdev->bd_stats = alloc_percpu(struct disk_stats);
 	if (!bdev->bd_stats) {
 		iput(inode);
@@ -624,7 +622,7 @@ static void bd_end_claim(struct block_device *bdev, void *holder)
 		bdev->bd_holder = NULL;
 		bdev->bd_holder_ops = NULL;
 		mutex_unlock(&bdev->bd_holder_lock);
-		if (bdev->bd_write_holder)
+		if (bdev_test_flag(bdev, BD_WRITE_HOLDER))
 			unblock = true;
 	}
 	if (!whole->bd_holders)
@@ -640,7 +638,7 @@ static void bd_end_claim(struct block_device *bdev, void *holder)
 	 */
 	if (unblock) {
 		disk_unblock_events(bdev->bd_disk);
-		bdev->bd_write_holder = false;
+		bdev_clear_flag(bdev, BD_WRITE_HOLDER);
 	}
 }
 
@@ -892,9 +890,10 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
 		 * writeable reference is too fragile given the way @mode is
 		 * used in blkdev_get/put().
 		 */
-		if ((mode & BLK_OPEN_WRITE) && !bdev->bd_write_holder &&
+		if ((mode & BLK_OPEN_WRITE) &&
+		    !bdev_test_flag(bdev, BD_WRITE_HOLDER) &&
 		    (disk->event_flags & DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE)) {
-			bdev->bd_write_holder = true;
+			bdev_set_flag(bdev, BD_WRITE_HOLDER);
 			unblock_events = false;
 		}
 	}
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..6a28b6b7062a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -615,7 +615,7 @@ static void __submit_bio(struct bio *bio)
 	if (unlikely(!blk_crypto_bio_prep(&bio)))
 		return;
 
-	if (!bio->bi_bdev->bd_has_submit_bio) {
+	if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO)) {
 		blk_mq_submit_bio(bio);
 	} else if (likely(bio_queue_enter(bio) == 0)) {
 		struct gendisk *disk = bio->bi_bdev->bd_disk;
@@ -723,7 +723,7 @@ void submit_bio_noacct_nocheck(struct bio *bio)
 	 */
 	if (current->bio_list)
 		bio_list_add(&current->bio_list[0], bio);
-	else if (!bio->bi_bdev->bd_has_submit_bio)
+	else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
 		__submit_bio_noacct_mq(bio);
 	else
 		__submit_bio_noacct(bio);
@@ -759,7 +759,7 @@ void submit_bio_noacct(struct bio *bio)
 	if (!bio_flagged(bio, BIO_REMAPPED)) {
 		if (unlikely(bio_check_eod(bio)))
 			goto end_io;
-		if (bdev->bd_partno && unlikely(blk_partition_remap(bio)))
+		if (bdev_partno(bdev) && unlikely(blk_partition_remap(bio)))
 			goto end_io;
 	}
 
@@ -989,7 +989,7 @@ void update_io_ticks(struct block_device *part, unsigned long now, bool end)
 		if (likely(try_cmpxchg(&part->bd_stamp, &stamp, now)))
 			__part_stat_add(part, io_ticks, end ? now - stamp : 1);
 	}
-	if (part->bd_partno) {
+	if (bdev_partno(part)) {
 		part = bdev_whole(part);
 		goto again;
 	}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32afb87efbd0..1c4bd891fd6d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -92,7 +92,7 @@ static bool blk_mq_check_inflight(struct request *rq, void *priv)
 	struct mq_inflight *mi = priv;
 
 	if (rq->part && blk_do_io_stat(rq) &&
-	    (!mi->part->bd_partno || rq->part == mi->part) &&
+	    (!bdev_partno(mi->part) || rq->part == mi->part) &&
 	    blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
 		mi->inflight[rq_data_dir(rq)]++;
 
diff --git a/block/early-lookup.c b/block/early-lookup.c
index 3effbd0d35e9..3fb57f7d2b12 100644
--- a/block/early-lookup.c
+++ b/block/early-lookup.c
@@ -78,7 +78,7 @@ static int __init devt_from_partuuid(const char *uuid_str, dev_t *devt)
 		 * to the partition number found by UUID.
 		 */
 		*devt = part_devt(dev_to_disk(dev),
-				  dev_to_bdev(dev)->bd_partno + offset);
+				  bdev_partno(dev_to_bdev(dev)) + offset);
 	} else {
 		*devt = dev->devt;
 	}
diff --git a/block/genhd.c b/block/genhd.c
index bb29a68e1d67..19cd1a31fa80 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -413,7 +413,8 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
 	elevator_init_mq(disk->queue);
 
 	/* Mark bdev as having a submit_bio, if needed */
-	disk->part0->bd_has_submit_bio = disk->fops->submit_bio != NULL;
+	if (disk->fops->submit_bio)
+		bdev_set_flag(disk->part0, BD_HAS_SUBMIT_BIO);
 
 	/*
 	 * If the driver provides an explicit major number it also must provide
diff --git a/block/ioctl.c b/block/ioctl.c
index 0c76137adcaa..be173e4ff43d 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -402,7 +402,10 @@ static int blkdev_roset(struct block_device *bdev, unsigned cmd,
 		if (ret)
 			return ret;
 	}
-	bdev->bd_read_only = n;
+	if (n)
+		bdev_set_flag(bdev, BD_READ_ONLY);
+	else
+		bdev_clear_flag(bdev, BD_READ_ONLY);
 	return 0;
 }
 
diff --git a/block/partitions/core.c b/block/partitions/core.c
index b11e88c82c8c..edd5309dc4ba 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -173,7 +173,7 @@ static struct parsed_partitions *check_partition(struct gendisk *hd)
 static ssize_t part_partition_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", dev_to_bdev(dev)->bd_partno);
+	return sprintf(buf, "%d\n", bdev_partno(dev_to_bdev(dev)));
 }
 
 static ssize_t part_start_show(struct device *dev,
@@ -250,7 +250,7 @@ static int part_uevent(const struct device *dev, struct kobj_uevent_env *env)
 {
 	const struct block_device *part = dev_to_bdev(dev);
 
-	add_uevent_var(env, "PARTN=%u", part->bd_partno);
+	add_uevent_var(env, "PARTN=%u", bdev_partno(part));
 	if (part->bd_meta_info && part->bd_meta_info->volname[0])
 		add_uevent_var(env, "PARTNAME=%s", part->bd_meta_info->volname);
 	return 0;
@@ -267,7 +267,7 @@ void drop_partition(struct block_device *part)
 {
 	lockdep_assert_held(&part->bd_disk->open_mutex);
 
-	xa_erase(&part->bd_disk->part_tbl, part->bd_partno);
+	xa_erase(&part->bd_disk->part_tbl, bdev_partno(part));
 	kobject_put(part->bd_holder_dir);
 
 	device_del(&part->bd_device);
@@ -338,8 +338,8 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
 	pdev->parent = ddev;
 
 	/* in consecutive minor range? */
-	if (bdev->bd_partno < disk->minors) {
-		devt = MKDEV(disk->major, disk->first_minor + bdev->bd_partno);
+	if (bdev_partno(bdev) < disk->minors) {
+		devt = MKDEV(disk->major, disk->first_minor + bdev_partno(bdev));
 	} else {
 		err = blk_alloc_ext_minor();
 		if (err < 0)
@@ -404,7 +404,7 @@ static bool partition_overlaps(struct gendisk *disk, sector_t start,
 
 	rcu_read_lock();
 	xa_for_each_start(&disk->part_tbl, idx, part, 1) {
-		if (part->bd_partno != skip_partno &&
+		if (bdev_partno(part) != skip_partno &&
 		    start < part->bd_start_sect + bdev_nr_sectors(part) &&
 		    start + length > part->bd_start_sect) {
 			overlap = true;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index cb1526ec44b5..bbbcbb36fb6e 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -45,10 +45,7 @@ struct block_device {
 	struct request_queue *	bd_queue;
 	struct disk_stats __percpu *bd_stats;
 	unsigned long		bd_stamp;
-	bool			bd_read_only;	/* read-only policy */
-	u8			bd_partno;
-	bool			bd_write_holder;
-	bool			bd_has_submit_bio;
+	u32			__bd_flags;	// partition number + flags
 	dev_t			bd_dev;
 	struct inode		*bd_inode;	/* will die */
 
@@ -86,6 +83,12 @@ struct block_device {
 #define bdev_kobj(_bdev) \
 	(&((_bdev)->bd_device.kobj))
 
+enum {
+	BD_READ_ONLY,		// read-only policy
+	BD_WRITE_HOLDER,
+	BD_HAS_SUBMIT_BIO
+};
+
 /*
  * Block error status values.  See block/blk-core:blk_errors for the details.
  * Alpha cannot write a byte atomically, so we need to use 32-bit value.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c3e8f7cf96be..d556cec9224b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -720,15 +720,51 @@ void invalidate_disk(struct gendisk *disk);
 void set_disk_ro(struct gendisk *disk, bool read_only);
 void disk_uevent(struct gendisk *disk, enum kobject_action action);
 
+static inline u8 bdev_partno(const struct block_device *bdev)
+{
+	return bdev->__bd_flags & 0xff;
+}
+
+static inline bool bdev_test_flag(const struct block_device *bdev, int flag)
+{
+	return bdev->__bd_flags & (1 << (flag + 8));
+}
+
+static inline void bdev_set_flag(struct block_device *bdev, int flag)
+{
+	u32 v = bdev->__bd_flags;
+
+	for (;;) {
+		u32 w = cmpxchg(&bdev->__bd_flags, v, v | (1 << (flag + 8)));
+
+		if (v == w)
+			return;
+		v = w;
+	}
+}
+
+static inline void bdev_clear_flag(struct block_device *bdev, int flag)
+{
+	u32 v = bdev->__bd_flags;
+
+	for (;;) {
+		u32 w = cmpxchg(&bdev->__bd_flags, v, v & ~(1 << (flag + 8)));
+
+		if (v == w)
+			return;
+		v = w;
+	}
+}
+
 static inline int get_disk_ro(struct gendisk *disk)
 {
-	return disk->part0->bd_read_only ||
+	return bdev_test_flag(disk->part0, BD_READ_ONLY) ||
 		test_bit(GD_READ_ONLY, &disk->state);
 }
 
 static inline int bdev_read_only(struct block_device *bdev)
 {
-	return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
+	return bdev_test_flag(bdev, BD_READ_ONLY) || get_disk_ro(bdev->bd_disk);
 }
 
 bool set_capacity_and_notify(struct gendisk *disk, sector_t size);
@@ -1095,7 +1131,7 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
 
 static inline bool bdev_is_partition(struct block_device *bdev)
 {
-	return bdev->bd_partno;
+	return bdev_partno(bdev) != 0;
 }
 
 enum blk_default_limits {
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index abeba356bc3f..ec7eb365b152 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -59,7 +59,7 @@ static inline void part_stat_set_all(struct block_device *part, int value)
 
 #define part_stat_add(part, field, addnd)	do {			\
 	__part_stat_add((part), field, addnd);				\
-	if ((part)->bd_partno)						\
+	if (bdev_partno(part))						\
 		__part_stat_add(bdev_whole(part), field, addnd);	\
 } while (0)
 
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 552738f14275..e05583e54fa5 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -966,13 +966,13 @@ char *bdev_name(char *buf, char *end, struct block_device *bdev,
 
 	hd = bdev->bd_disk;
 	buf = string(buf, end, hd->disk_name, spec);
-	if (bdev->bd_partno) {
+	if (bdev_partno(bdev)) {
 		if (isdigit(hd->disk_name[strlen(hd->disk_name)-1])) {
 			if (buf < end)
 				*buf = 'p';
 			buf++;
 		}
-		buf = number(buf, end, bdev->bd_partno, spec);
+		buf = number(buf, end, bdev_partno(bdev), spec);
 	}
 	return buf;
 }

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-12  4:41                                   ` Al Viro
@ 2024-04-12  7:13                                     ` Al Viro
  0 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-12  7:13 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christian Brauner, Jan Kara, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Fri, Apr 12, 2024 at 05:41:16AM +0100, Al Viro wrote:

> Christoph, do you have any problems with that approach?
> 
> COMPLETELY UNTESTED patch along those lines follows; if it works,
> it would need to be carved up.  And I would probably switch the
> places where we do if (bdev->bd_partno) to if (bdev_is_partition(bdev)),
> for better readability.

See 
git://git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git bd_flags
for a carve-up

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-11 14:04                           ` Al Viro
  2024-04-11 14:49                             ` Al Viro
@ 2024-04-12  9:21                             ` Christian Brauner
  2024-04-12 11:29                               ` Al Viro
  1 sibling, 1 reply; 116+ messages in thread
From: Christian Brauner @ 2024-04-12  9:21 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Thu, Apr 11, 2024 at 03:04:09PM +0100, Al Viro wrote:
> On Thu, Apr 11, 2024 at 01:56:03PM +0200, Christian Brauner wrote:
> > On Wed, Apr 10, 2024 at 11:34:43PM +0100, Al Viro wrote:
> > > On Wed, Apr 10, 2024 at 12:59:11PM +0200, Jan Kara wrote:
> > > 
> > > > I agree with Christian and Al - and I think I've expressed that already in
> > > > the previous version of the series [1] but I guess I was not explicit
> > > > enough :). I think the initial part of the series (up to patch 21, perhaps
> > > > excluding patch 20) is a nice cleanup but the latter part playing with
> > > > stashing struct file is not an improvement and seems pointless to me. So
> > > > I'd separate the initial part cleaning up the obvious places and let
> > > > Christian merge it and then we can figure out what (if anything) to do with
> > > > remaining bd_inode uses in fs/buffer.c etc. E.g. what Al suggests with
> > > > bd_mapping makes sense to me but I didn't check what's left after your
> > > > initial patches...
> > > 
> > > FWIW, experimental on top of -next:
> > 
> > Ok, let's move forward with this. I've applied the first 19 patches.
> > Patch 20 is the start of what we all disliked. 21 is clearly a bugfix
> > for current code so that'll go separately from the rest. I've replaced
> > open-coded f_mapping access with file_mapping(). The symmetry between
> > file_inode() and file_mapping() is quite nice.
> > 
> > Al, your idea to switch erofs away from buf->inode can go on top of what
> > Yu did imho. There's no real reason to throw it away imho.
> > 
> > I've exported bdev_mapping() because it really makes the btrfs change a
> > lot slimmer and we don't need to care about messing with a lot of that
> > code. I didn't care about making it static inline because that might've
> > meant we need to move other stuff into the header as well. Imho, it's
> > not that important but if it's a big deal to any of you just do the
> > changes on top of it, please.
> > 
> > Pushed to
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.super
> > 
> > If I hear no objections that'll show up in -next tomorrow. Al, would be
> > nice if you could do your changes on top of this, please.
> 
> Objection: start with adding bdev->bd_mapping, next convert the really
> obvious instances to it and most of this series becomes not needed at
> all.
> 
> Really.  There is no need whatsoever to push struct file down all those
> paths.

Your series just replaces bd_inode in struct block_device with
bd_mapping. In a lot of places we do have immediate access to the bdev
file without changing any calling conventions whatsoever. IMO it's
perfectly fine to just use file_mapping() there. Sure, let's use
bdev_mapping() in instances like btrfs where we'd otherwise have to
change function signatures; I'm not opposed to that. But there's no good
reason to just replace everything with bdev->bd_mapping access. And
really, why keep that thing in struct block_device when we can avoid it?

> 
> And yes, erofs and buffer.c stuff belongs on top of that, no arguments here.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-12  9:21                             ` Christian Brauner
@ 2024-04-12 11:29                               ` Al Viro
  2024-04-13 15:25                                 ` Christian Brauner
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-12 11:29 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Fri, Apr 12, 2024 at 11:21:08AM +0200, Christian Brauner wrote:

> Your series just replaces bd_inode in struct block_device with
> bd_mapping. In a lot of places we do have immediate access to the bdev
> file without changing any calling conventions whatsoever. IMO it's
> perfectly fine to just use file_mapping() there. Sure, let's use
> bdev_mapping() in instances like btrfs where we'd otherwise have to
> change function signatures; I'm not opposed to that. But there's no good
> reason to just replace everything with bdev->bd_mapping access. And
> really, why keep that thing in struct block_device when we can avoid it?

Because having to have struct file around in the places where we want to
get to page cache of block device fast is often inconvenient (see fs/buffer.c,
if nothing else).

It also simplifies the hell out of the patch series - it's one obviously
safe automatic change in a single commit.

And AFAICS the flags-related rationale can be dealt with in a much simpler
way - see #bd_flags in my tree.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-12 11:29                               ` Al Viro
@ 2024-04-13 15:25                                 ` Christian Brauner
  2024-04-15 20:45                                   ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Christian Brauner @ 2024-04-13 15:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Fri, Apr 12, 2024 at 12:29:19PM +0100, Al Viro wrote:
> On Fri, Apr 12, 2024 at 11:21:08AM +0200, Christian Brauner wrote:
> 
> > Your series just replaces bd_inode in struct block_device with
> > bd_mapping. In a lot of places we do have immediate access to the bdev
> > file without changing any calling conventions whatsoever. IMO it's
> > perfectly fine to just use file_mapping() there. Sure, let's use
> > bdev_mapping() in instances like btrfs where we'd otherwise have to
> > change function signatures; I'm not opposed to that. But there's no good
> > reason to just replace everything with bdev->bd_mapping access. And
> > really, why keep that thing in struct block_device when we can avoid it?
> 
> Because having to have struct file around in the places where we want to
> get to page cache of block device fast is often inconvenient (see fs/buffer.c,
> if nothing else).

Yes, agreed. But my point is why can't we expose bdev_mapping() for
exactly that purpose without having to have that bd_mapping member in
struct block_device? We don't want to trade bd_inode for bd_mapping in
that struct imho. IOW, if we can avoid bloating struct block device with
additional members then we should do that. Is there some performance
concern that I'm missing and if so are there numbers to back this?

> It also simplifies the hell out of the patch series - it's one obviously
> safe automatic change in a single commit.

It's trivial to fold the simple file_mapping() conversion into a single
patch as well. It's a pure artifact of splitting the patches per
subsystem/driver. That's just because people have wildly different
opinions on how to do such conversion. But really, that can be trivially
dealt with.

> And AFAICS the flags-related rationale can be dealt with in a much simpler
> way - see #bf_flags in my tree.

That's certainly worth doing independent of this discussion.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-13 15:25                                 ` Christian Brauner
@ 2024-04-15 20:45                                   ` Al Viro
  2024-04-16  6:32                                     ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-15 20:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Sat, Apr 13, 2024 at 05:25:01PM +0200, Christian Brauner wrote:

> > It also simplifies the hell out of the patch series - it's one obviously
> > safe automatic change in a single commit.
> 
> It's trivial to fold the simple file_mapping() conversion into a single
> patch as well.

... after a bunch of patches that propagate struct file to places where
it has no business being.  Compared to a variant that doesn't need those
patches at all.

> It's a pure artifact of splitting the patches per
> subsystem/driver.

No, it is not.  ->bd_mapping conversion can be done without any
preliminaries.  Note that it doesn't need messing with bdev_read_folio(),
it doesn't need this journal->j_fs_dev_file thing, etc.

One thing I believe is completely wrong in this series is bdev_inode()
existence.  It (and equivalent use of file_inode() on struct file is
even worse) is papering over the real interface deficiencies.  And
extra file_inode() uses are just about impossible to catch ;-/

IMO we should *never* use file_inode() on opened block devices.
At all.  It's brittle, it's asking for trouble as soon as somebody
passes a normally opened struct file to one of the functions using it
and it papers over the missing primitives.

As for the space concerns...  With struct device embedded into those
things, it's not even funny.  Space within the first cacheline - sure,
but we can have a pointer in there just fine.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-06  9:09 ` [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2024-04-16  1:35   ` Al Viro
  2024-04-16  8:47     ` Alexander Gordeev
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-16  1:35 UTC (permalink / raw)
  To: linux-s390
  Cc: jack, hch, brauner, axboe, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai3, Yu Kuai

On Sat, Apr 06, 2024 at 05:09:19PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Avoid accessing bd_inode directly, in preparation for removing bd_inode from
> block_device.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
>  drivers/s390/block/dasd_ioctl.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
> index 7e0ed7032f76..c1201590f343 100644
> --- a/drivers/s390/block/dasd_ioctl.c
> +++ b/drivers/s390/block/dasd_ioctl.c
> @@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
>  	 * enabling the device later.
>  	 */
>  	if (fdata->start_unit == 0) {
> -		block->gdp->part0->bd_inode->i_blkbits =
> -			blksize_bits(fdata->blksize);
> +		rc = set_blocksize(block->gdp->part0, fdata->blksize);

Could somebody (preferably s390 folks) explain what is going on in
dasd_format()?  The change in this commit is *NOT* an equivalent
transformation - mainline does not evict the page cache of device.

Is that
	* intentional behaviour in mainline version, possibly broken
by this patch
	* a bug in mainline accidentally fixed by this patch
	* something else?

And shouldn't there be an exclusion between that and having a filesystem
on a partition of that disk currently mounted?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-15 20:45                                   ` Al Viro
@ 2024-04-16  6:32                                     ` Al Viro
  2024-04-17  4:35                                       ` [PATCH][RFC] set_blocksize() in pktcdvd (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
                                                         ` (2 more replies)
  0 siblings, 3 replies; 116+ messages in thread
From: Al Viro @ 2024-04-16  6:32 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Mon, Apr 15, 2024 at 09:45:11PM +0100, Al Viro wrote:
> On Sat, Apr 13, 2024 at 05:25:01PM +0200, Christian Brauner wrote:
> 
> > > It also simplifies the hell out of the patch series - it's one obviously
> > > safe automatic change in a single commit.
> > 
> > It's trivial to fold the simple file_mapping() conversion into a single
> > patch as well.
> 
> ... after a bunch of patches that propagate struct file to places where
> it has no business being.  Compared to a variant that doesn't need those
> patches at all.
> 
> > It's a pure artifact of splitting the patches per
> > subsystem/driver.
> 
> No, it is not.  ->bd_mapping conversion can be done without any
> preliminaries.  Note that it doesn't need messing with bdev_read_folio(),
> it doesn't need this journal->j_fs_dev_file thing, etc.
> 
> One thing I believe is completely wrong in this series is bdev_inode()
> existence.  It (and equivalent use of file_inode() on struct file is
> even worse) is papering over the real interface deficiencies.  And
> extra file_inode() uses are just about impossible to catch ;-/
> 
> IMO we should *never* use file_inode() on opened block devices.
> At all.  It's brittle, it's asking for trouble as soon as somebody
> passes a normally opened struct file to one of the functions using it
> and it papers over the missing primitives.

BTW, speaking of the things where opened struct file would be a good
idea - set_blocksize() should take an opened struct file, and it should
have non-NULL ->private_data.

Changing block size under e.g. a mounted filesystem should never happen;
doing that is asking for serious breakage.

Looking through the current callers (mainline), most are OK (and easy
to switch).  However,
	
drivers/block/pktcdvd.c:2285:           set_blocksize(disk->part0, CD_FRAMESIZE);
drivers/block/pktcdvd.c:2529:   set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
	Might be broken; pktcdvd.c being what it is...

drivers/md/bcache/super.c:2558: if (set_blocksize(file_bdev(bdev_file), 4096))
	Almost certainly broken; hit register_bcache() with pathname of a mounted
block device, and if the block size on filesystem in question is not 4K, the things
will get interesting.

fs/btrfs/volumes.c:485: ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
	Some of the callers do not bother with exclusive open;
in particular, if btrfs_get_dev_args_from_path() ever gets a pathname
of a mounted device with something other than btrfs on it, it won't
be pretty.

kernel/power/swap.c:371:        res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
kernel/power/swap.c:1577:               set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
	Special cases (for obvious reasons); said that, why do we bother
with set_blocksize() on those anyway?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-16  1:35   ` Al Viro
@ 2024-04-16  8:47     ` Alexander Gordeev
  2024-04-17 12:47       ` Stefan Haberland
  0 siblings, 1 reply; 116+ messages in thread
From: Alexander Gordeev @ 2024-04-16  8:47 UTC (permalink / raw)
  To: Al Viro, Stefan Haberland, Jan Hoeppner
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin

On Tue, Apr 16, 2024 at 02:35:55AM +0100, Al Viro wrote:
> >  drivers/s390/block/dasd_ioctl.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
> > index 7e0ed7032f76..c1201590f343 100644
> > --- a/drivers/s390/block/dasd_ioctl.c
> > +++ b/drivers/s390/block/dasd_ioctl.c
> > @@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
> >  	 * enabling the device later.
> >  	 */
> >  	if (fdata->start_unit == 0) {
> > -		block->gdp->part0->bd_inode->i_blkbits =
> > -			blksize_bits(fdata->blksize);
> > +		rc = set_blocksize(block->gdp->part0, fdata->blksize);
> 
> Could somebody (preferably s390 folks) explain what is going on in
> dasd_format()?  The change in this commit is *NOT* an equivalent
> transformation - mainline does not evict the page cache of device.
> 
> Is that
> 	* intentional behaviour in mainline version, possibly broken
> by this patch
> 	* a bug in mainline accidentally fixed by this patch
> 	* something else?
> 
> And shouldn't there be an exclusion between that and having a filesystem
> on a partition of that disk currently mounted?

CC-ing Stefan and Jan.

Thanks!

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH][RFC] set_blocksize() in pktcdvd (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device)
  2024-04-16  6:32                                     ` Al Viro
@ 2024-04-17  4:35                                       ` Al Viro
  2024-04-17 13:43                                       ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Jan Kara
  2024-04-17 20:45                                       ` [RFC] set_blocksize() in kernel/power/swap.c (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
  2 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-17  4:35 UTC (permalink / raw)
  To: axboe
  Cc: Jan Kara, Yu Kuai, hch, linux-fsdevel, linux-block, yi.zhang,
	yangerkun, yukuai (C),
	Christian Brauner

On Tue, Apr 16, 2024 at 07:32:53AM +0100, Al Viro wrote:

> drivers/block/pktcdvd.c:2285:           set_blocksize(disk->part0, CD_FRAMESIZE);

	We had hardsect_size set to that 2Kb from the very beginning
(well, logical_block_size these days).  And the first ->open() is
(and had been since before the pktcdvd went into mainline) followed by
setting block size anyway, so any effects of that set_blocksize() had
always been lost.  Candidate block sizes start at logical_block_size...
Rudiment of something from 2000--2004 when it existed out of tree?
<checks>  That logic went into the tree in 2.5.13; May 2002...

	AFAICS, this one can be simply removed.  Jens, do you have
any objections to that?  It's safe, but really pointless...

> drivers/block/pktcdvd.c:2529:   set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);

	This, OTOH, is not safe at all - we don't have the underlying device
exclusive, and it's possible that it is in use with e.g. 4Kb block size (e.g.
from ext* read-only mount, with 4Kb blocks).  This set_blocksize() will screw
the filesystem very badly - block numbers mapping to LBA will change, for starters.
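
	A minimal userspace sketch of that last point, assuming 512-byte
sectors as the LBA unit and a hypothetical helper block_to_lba() (neither
is taken from the kernel): the same block number resolves to a different
sector once the block size drops from 4Kb to 2Kb.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512u	/* assumed LBA unit for this sketch */

/*
 * Sector at which logical block `blocknr` starts, for a given block size.
 * Hypothetical helper for illustration only; not a kernel function.
 */
static uint64_t block_to_lba(uint64_t blocknr, uint32_t blocksize)
{
	/* The sketch only models power-of-two sizes of at least one sector. */
	assert(blocksize >= SKETCH_SECTOR_SIZE &&
	       (blocksize & (blocksize - 1)) == 0);
	return blocknr * (blocksize / SKETCH_SECTOR_SIZE);
}
```

	Under these assumptions block 100 starts at sector 800 with 4Kb
blocks and at sector 400 with 2Kb blocks, so anything that cached block
numbers under the old size would end up looking at the wrong place.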

	We are setting a pktcdvd device up here, and that set_blocksize()
is done to the underlying device.  It does *not* prevent changes of block
size of the underlying device by the time we actually open the device
we'd set up - set_blocksize() in ->open() is done to pktcdvd device,
not the underlying one.  So... what is it for?

	It might make sense to move it into ->open(), where we do have
the underlying device claimed.  But doing that at the setup time looks
very odd...

	Do you have any objections against this:

commit d1d93f2c26f70fbcd714615d1a3ea7a104fc0f43
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 17 00:28:03 2024 -0400

    pktcdvd: sort set_blocksize() calls out
    
    1) it doesn't make any sense to have ->open() call set_blocksize() on the
    device being opened - the caller will override that anyway.
    
    2) setting block size on underlying device, OTOH, ought to be done when
    we are opening it exclusive - i.e. as part of pkt_open_dev().  Having
    it done at setup time doesn't guarantee us anything about the state
    at the time we start talking to it.  Worse, if you happen to have
    the underlying device containing e.g. ext2 with 4Kb blocks that
    is currently mounted r/o, that set_blocksize() will confuse the hell
    out of filesystem.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 21728e9ea5c3..05933f25b397 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2215,6 +2215,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
 		}
 		dev_info(ddev, "%lukB available on disc\n", lba << 1);
 	}
+	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	return 0;
 
@@ -2278,11 +2279,6 @@ static int pkt_open(struct gendisk *disk, blk_mode_t mode)
 		ret = pkt_open_dev(pd, mode & BLK_OPEN_WRITE);
 		if (ret)
 			goto out_dec;
-		/*
-		 * needed here as well, since ext2 (among others) may change
-		 * the blocksize at mount time
-		 */
-		set_blocksize(disk->part0, CD_FRAMESIZE);
 	}
 	mutex_unlock(&ctl_mutex);
 	mutex_unlock(&pktcdvd_mutex);
@@ -2526,7 +2522,6 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 	__module_get(THIS_MODULE);
 
 	pd->bdev_file = bdev_file;
-	set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
 	atomic_set(&pd->cdrw.pending_bios, 0);
 	pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev)
  2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
                                                   ` (9 preceding siblings ...)
  2024-04-11 14:53                                 ` [PATCH 11/11] block2mtd: prevent direct access of bd_inode Al Viro
@ 2024-04-17 11:05                                 ` Christian Brauner
  10 siblings, 0 replies; 116+ messages in thread
From: Christian Brauner @ 2024-04-17 11:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C)

On Thu, Apr 11, 2024 at 03:53:36PM +0100, Al Viro wrote:
> points to ->i_data of coallocated inode.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

I've picked all of this into vfs.super btw. I still want to go through
your reply but as you know I'm a bit time-constrained for a bit more. :/

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-16  8:47     ` Alexander Gordeev
@ 2024-04-17 12:47       ` Stefan Haberland
  2024-04-28 18:58         ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Stefan Haberland @ 2024-04-17 12:47 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

On 16.04.24 at 10:47, Alexander Gordeev wrote:
> On Tue, Apr 16, 2024 at 02:35:55AM +0100, Al Viro wrote:
>>>   drivers/s390/block/dasd_ioctl.c | 5 +++--
>>>   1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
>>> index 7e0ed7032f76..c1201590f343 100644
>>> --- a/drivers/s390/block/dasd_ioctl.c
>>> +++ b/drivers/s390/block/dasd_ioctl.c
>>> @@ -215,8 +215,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
>>>   	 * enabling the device later.
>>>   	 */
>>>   	if (fdata->start_unit == 0) {
>>> -		block->gdp->part0->bd_inode->i_blkbits =
>>> -			blksize_bits(fdata->blksize);
>>> +		rc = set_blocksize(block->gdp->part0, fdata->blksize);
>> Could somebody (preferably s390 folks) explain what is going on in
>> dasd_format()?  The change in this commit is *NOT* an equivalent
>> transformation - mainline does not evict the page cache of device.
>>
>> Is that
>> 	* intentional behaviour in mainline version, possibly broken
>> by this patch
>> 	* a bug in mainline accidentally fixed by this patch
>> 	* something else?
>>
>> And shouldn't there be an exclusion between that and having a filesystem
>> on a partition of that disk currently mounted?
> CC-ing Stefan and Jan.
>
> Thanks!

Hi,
from my point of view this was an equivalent transformation.

set_blocksize() basically also sets i_blkbits, as the old code did.
The dasd_format ioctl only works on a disabled device. To achieve this,
all partitions need to be unmounted.
The tooling also refuses to work on disks that are actually in use.

So there should be no page cache to evict.

The comment above this code says:

/* Since dasdfmt keeps the device open after it was disabled,
  * there still exists an inode for this device.
  * We must update i_blkbits, otherwise we might get errors when
  * enabling the device later.
  */

This is the reason for updating i_blkbits.
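
For reference, i_blkbits holds the base-2 logarithm of the block size,
which is what blksize_bits(fdata->blksize) computed in the removed lines.
A minimal userspace sketch of that relationship, with blksize_bits_sketch()
as a hypothetical stand-in rather than the kernel helper itself:

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-in for blksize_bits(): returns log2 of a
 * power-of-two block size. Illustration only, not the kernel code.
 */
static unsigned int blksize_bits_sketch(unsigned int size)
{
	unsigned int bits = 0;

	/* Only power-of-two sizes have a well-defined shift. */
	assert(size != 0 && (size & (size - 1)) == 0);
	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits;
}
```

So a 512-byte block size corresponds to 9 bits, and a 4096-byte block
size to 12 bits.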

However, I get your point in questioning the code itself.

Honestly, this code has existed for many years and I cannot tell whether the
circumstances described in the comment have changed in the meantime.
A quick test without this code did not show any change or errors, but
there might be corner cases I am missing.

Maybe you can give a hint whether this makes any sense from your point of view.

Thanks,
Stefan


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-16  6:32                                     ` Al Viro
  2024-04-17  4:35                                       ` [PATCH][RFC] set_blocksize() in pktcdvd (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
@ 2024-04-17 13:43                                       ` Jan Kara
  2024-04-17 15:23                                         ` Al Viro
  2024-04-17 20:45                                       ` [RFC] set_blocksize() in kernel/power/swap.c (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
  2 siblings, 1 reply; 116+ messages in thread
From: Jan Kara @ 2024-04-17 13:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Christian Brauner, Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Tue 16-04-24 07:32:53, Al Viro wrote:
> On Mon, Apr 15, 2024 at 09:45:11PM +0100, Al Viro wrote:
> > On Sat, Apr 13, 2024 at 05:25:01PM +0200, Christian Brauner wrote:
> > 
> > > > It also simplifies the hell out of the patch series - it's one obviously
> > > > safe automatic change in a single commit.
> > > 
> > > It's trivial to fold the simple file_mapping() conversion into a single
> > > patch as well.
> > 
> > ... after a bunch of patches that propagate struct file to places where
> > it has no business being.  Compared to a variant that doesn't need those
> > patches at all.
> > 
> > > It's a pure artifact of splitting the patches per
> > > subsystem/driver.
> > 
> > No, it is not.  ->bd_mapping conversion can be done without any
> > preliminaries.  Note that it doesn't need messing with bdev_read_folio(),
> > it doesn't need this journal->j_fs_dev_file thing, etc.
> > 
> > One thing I believe is completely wrong in this series is bdev_inode()
> > existence.  It (and equivalent use of file_inode() on struct file is
> > even worse) is papering over the real interface deficiencies.  And
> > extra file_inode() uses are just about impossible to catch ;-/
> > 
> > IMO we should *never* use file_inode() on opened block devices.
> > At all.  It's brittle, it's asking for trouble as soon as somebody
> > passes a normally opened struct file to one of the functions using it
> > and it papers over the missing primitives.
> 
> BTW, speaking of the things where opened struct file would be a good
> idea - set_blocksize() should take an opened struct file, and it should
> have non-NULL ->private_data.
> 
> Changing block size under e.g. a mounted filesystem should never happen;
> doing that is asking for serious breakage.
> 
> Looking through the current callers (mainline), most are OK (and easy
> to switch).  However,
> 	
> drivers/block/pktcdvd.c:2285:           set_blocksize(disk->part0, CD_FRAMESIZE);
> drivers/block/pktcdvd.c:2529:   set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
> 	Might be broken; pktcdvd.c being what it is...
> 
> drivers/md/bcache/super.c:2558: if (set_blocksize(file_bdev(bdev_file), 4096))
> 	Almost certainly broken; hit register_bcache() with pathname of a mounted
> block device, and if the block size on filesystem in question is not 4K, the things
> will get interesting.

Agreed. Furthermore that set_blocksize() seems to be completely pointless
these days AFAICT because we use read_cache_page_gfp() to read in the data
from the device. Sure we may be creating more bhs per page than necessary
but who cares?

> fs/btrfs/volumes.c:485: ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
> 	Some of the callers do not bother with exclusive open;
> in particular, if btrfs_get_dev_args_from_path() ever gets a pathname
> of a mounted device with something other than btrfs on it, it won't
> be pretty.

Yeah and frankly reading through btrfs_read_dev_super() I'm not sure which
code needs the block size set either. We use read_cache_page_gfp() for the
IO there as well.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device
  2024-04-17 13:43                                       ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Jan Kara
@ 2024-04-17 15:23                                         ` Al Viro
  0 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-17 15:23 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christian Brauner, Yu Kuai, hch, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai (C)

On Wed, Apr 17, 2024 at 03:43:12PM +0200, Jan Kara wrote:

> > fs/btrfs/volumes.c:485: ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
> > 	Some of the callers do not bother with exclusive open;
> > in particular, if btrfs_get_dev_args_from_path() ever gets a pathname
> > of a mounted device with something other than btrfs on it, it won't
> > be pretty.
> 
> Yeah and frankly reading through btrfs_read_dev_super() I'm not sure which
> code needs the block size set either. We use read_cache_page_gfp() for the
> IO there as well.

FWIW, I don't understand the use of invalidate_bdev() in btrfs_get_bdev_and_sb(),
especially when called from btrfs_get_dev_args_from_path() - what's the point
of evicting page cache before reading the on-disk superblock, when all we are
going to do with the data we get is scan through internal list of opened devices
for uuid, etc.  matches?

Could btrfs folks comment on that one?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [RFC] set_blocksize() in kernel/power/swap.c (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device)
  2024-04-16  6:32                                     ` Al Viro
  2024-04-17  4:35                                       ` [PATCH][RFC] set_blocksize() in pktcdvd (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
  2024-04-17 13:43                                       ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Jan Kara
@ 2024-04-17 20:45                                       ` Al Viro
  2 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-17 20:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jan Kara, Yu Kuai, hch, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai (C),
	Christian Brauner, linux-pm

On Tue, Apr 16, 2024 at 07:32:53AM +0100, Al Viro wrote:

> kernel/power/swap.c:371:        res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
> kernel/power/swap.c:1577:               set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
> 	Special cases (for obvious reasons); said that, why do we bother
> with set_blocksize() on those anyway?

AFAICS, we really don't need either - all IO is done via hib_submit_io(),
which sets a single-page bio and feeds it to submit_bio{,_wait}()
directly.  We are *not* using the page cache of the block device
in question, let alone any buffer_head instances.

Could swsusp folks comment?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-11 16:13     ` Gao Xiang
  2024-04-12  1:14       ` Yu Kuai
@ 2024-04-25 19:56       ` Al Viro
  2024-04-25 19:57         ` [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number Al Viro
                           ` (7 more replies)
  1 sibling, 8 replies; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:56 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

On Fri, Apr 12, 2024 at 12:13:42AM +0800, Gao Xiang wrote:
> Hi Al,
> 
> On 2024/4/7 12:05, Al Viro wrote:
> > On Sat, Apr 06, 2024 at 05:09:12PM +0800, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@huawei.com>
> > > 
> > > Now that all filesystems stash the bdev file, it's ok to get inode
> > > for the file.
> > 
> > Looking at the only user of erofs_buf->inode (erofs_bread())...  We
> > use the inode for two things there - block size calculation (to get
> > from block number to position in bytes) and access to page cache.
> > We read in full pages anyway.  And frankly, looking at the callers,
> > we really would be better off if we passed position in bytes instead
> > of block number.  IOW, it smells like erofs_bread() having wrong type.
> > 
> > Look at the callers.  With 3 exceptions it's
> > fs/erofs/super.c:135:   ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> > fs/erofs/super.c:151:           ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
> > fs/erofs/xattr.c:84:    it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
> > fs/erofs/xattr.c:105:           it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
> > fs/erofs/xattr.c:188:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/xattr.c:294:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/xattr.c:339:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
> > fs/erofs/xattr.c:378:           it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
> > fs/erofs/zdata.c:943:           src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
> > 
> > and all of them actually want the return value + erofs_offset(...).  IOW,
> > we take a linear position (in bytes).  Divide it by block size (from sb).
> > Pass the factor to erofs_bread(), where we multiply that by block size
> > (from inode), see which page will that be in, get that page and return a
> > pointer *into* that page.  Then we again divide the same position
> > by block size (from sb) and add the remainder to the pointer returned
> > by erofs_bread().
> > 
> > IOW, it would be much easier to pass the position directly and to hell
> > with block size logics.  Three exceptions to that pattern:
> > 
> > fs/erofs/data.c:80:     return erofs_bread(buf, blkaddr, type);
> > fs/erofs/dir.c:66:              de = erofs_bread(&buf, i, EROFS_KMAP);
> > fs/erofs/namei.c:103:           de = erofs_bread(&buf, mid, EROFS_KMAP);
> > 
> > Those could bloody well multiply the argument by block size;
> > the first one (erofs_read_metabuf()) is also interesting - its
> > callers themselves follow the similar pattern.  So it might be
> > worth passing it a position in bytes as well...
> > 
> > In any case, all 3 have superblock reference, so they can convert
> > from blocks to bytes conveniently.  Which means that erofs_bread()
> > doesn't need to mess with block size considerations at all.
> > 
> > IOW, it might make sense to replace erofs_buf->inode with
> > pointer to address space.  And use file_mapping() instead of
> > file_inode() in that patch...
> 
> Just saw this again by chance, which is unexpected.
> 
> Yeah, I think that is a good idea.  The story is that erofs_bread()
> was derived from a page-based interface:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35
> 
> so it was once a page index number.  I think a byte offset will be
> a better interface to clean up these, thanks for your time and work
> on this!

FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
minimal conversion of erofs_bread() and switch from buf->inode
to buf->mapping, the latter follows that up with massage for
erofs_read_metabuf().

Completely untested; it builds, but that's all I can promise.  Individual
patches in followups.
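
The arithmetic behind the offset-based interface can be sketched in
userspace as follows. This is a sketch under assumptions, not erofs code:
the page size is fixed at 4Kb, the block size is assumed not to exceed the
page size, and struct page_pos, locate_via_blocks() and locate_via_offset()
are hypothetical names. The first function mimics the old convention
(caller splits the position into a block number plus an in-block remainder,
the callee turns the block number back into bytes); the second uses the
byte position directly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u			/* assumed 4Kb pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

struct page_pos {
	uint64_t index;		/* page index within the mapping */
	uint32_t offset;	/* byte offset within that page */
};

/* Old convention: split into block number + remainder, then recombine. */
static struct page_pos locate_via_blocks(uint64_t pos, unsigned int blkbits)
{
	struct page_pos p;
	uint64_t blknr, start;
	uint32_t blkoff;

	assert(blkbits <= SKETCH_PAGE_SHIFT);
	blknr = pos >> blkbits;				/* caller: block number */
	blkoff = (uint32_t)(pos & ((1u << blkbits) - 1));	/* caller: remainder */
	start = blknr << blkbits;			/* callee: back to bytes */
	p.index = start >> SKETCH_PAGE_SHIFT;
	p.offset = (uint32_t)(start & (SKETCH_PAGE_SIZE - 1)) + blkoff;
	return p;
}

/* New convention: the byte position alone determines page and offset. */
static struct page_pos locate_via_offset(uint64_t pos)
{
	struct page_pos p;

	p.index = pos >> SKETCH_PAGE_SHIFT;
	p.offset = (uint32_t)(pos & (SKETCH_PAGE_SIZE - 1));
	return p;
}
```

Under those assumptions both functions give the same page index and
in-page offset for any position, whatever the block size, which is why the
block size can drop out of the interface.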

^ permalink raw reply	[flat|nested] 116+ messages in thread

* [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number
  2024-04-25 19:56       ` Al Viro
@ 2024-04-25 19:57         ` Al Viro
  2024-04-29  3:01           ` Gao Xiang
  2024-04-25 19:58         ` [PATCH 2/6] erofs_buf: store address_space instead of inode Al Viro
                           ` (6 subsequent siblings)
  7 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:57 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

Callers are happier that way, especially since we no longer need to
play with splitting offset into block number and offset within block,
passing the former to erofs_bread(), then adding the latter...

erofs_bread() always reads entire pages, anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/data.c     |  5 ++---
 fs/erofs/dir.c      |  2 +-
 fs/erofs/internal.h |  2 +-
 fs/erofs/namei.c    |  2 +-
 fs/erofs/super.c    |  8 ++++----
 fs/erofs/xattr.c    | 35 +++++++++++++----------------------
 fs/erofs/zdata.c    |  4 ++--
 7 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 52524bd9698b..d3c446dda2ff 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -29,11 +29,10 @@ void erofs_put_metabuf(struct erofs_buf *buf)
  * Derive the block size from inode->i_blkbits to make compatible with
  * anonymous inode in fscache mode.
  */
-void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
+void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 		  enum erofs_kmap_type type)
 {
 	struct inode *inode = buf->inode;
-	erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
 	struct folio *folio;
@@ -77,7 +76,7 @@ void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
 			 erofs_blk_t blkaddr, enum erofs_kmap_type type)
 {
 	erofs_init_metabuf(buf, sb);
-	return erofs_bread(buf, blkaddr, type);
+	return erofs_bread(buf, erofs_pos(sb, blkaddr), type);
 }
 
 static int erofs_map_blocks_flatmode(struct inode *inode,
diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
index b80abec0531a..9d38f39bb4f7 100644
--- a/fs/erofs/dir.c
+++ b/fs/erofs/dir.c
@@ -63,7 +63,7 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
 		struct erofs_dirent *de;
 		unsigned int nameoff, maxsize;
 
-		de = erofs_bread(&buf, i, EROFS_KMAP);
+		de = erofs_bread(&buf, erofs_pos(sb, i), EROFS_KMAP);
 		if (IS_ERR(de)) {
 			erofs_err(sb, "fail to readdir of logical block %u of nid %llu",
 				  i, EROFS_I(dir)->nid);
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 39c67119f43b..9e30c67c135c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -409,7 +409,7 @@ void *erofs_read_metadata(struct super_block *sb, struct erofs_buf *buf,
 			  erofs_off_t *offset, int *lengthp);
 void erofs_unmap_metabuf(struct erofs_buf *buf);
 void erofs_put_metabuf(struct erofs_buf *buf);
-void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
+void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 		  enum erofs_kmap_type type);
 void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb);
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
diff --git a/fs/erofs/namei.c b/fs/erofs/namei.c
index f0110a78acb2..11afa48996a3 100644
--- a/fs/erofs/namei.c
+++ b/fs/erofs/namei.c
@@ -100,7 +100,7 @@ static void *erofs_find_target_block(struct erofs_buf *target,
 		struct erofs_dirent *de;
 
 		buf.inode = dir;
-		de = erofs_bread(&buf, mid, EROFS_KMAP);
+		de = erofs_bread(&buf, erofs_pos(dir->i_sb, mid), EROFS_KMAP);
 		if (!IS_ERR(de)) {
 			const int nameoff = nameoff_from_disk(de->nameoff, bsz);
 			const int ndirents = nameoff / sizeof(*de);
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index c0eb139adb07..fdefc3772620 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -132,11 +132,11 @@ void *erofs_read_metadata(struct super_block *sb, struct erofs_buf *buf,
 	int len, i, cnt;
 
 	*offset = round_up(*offset, 4);
-	ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
+	ptr = erofs_bread(buf, *offset, EROFS_KMAP);
 	if (IS_ERR(ptr))
 		return ptr;
 
-	len = le16_to_cpu(*(__le16 *)&ptr[erofs_blkoff(sb, *offset)]);
+	len = le16_to_cpu(*(__le16 *)ptr);
 	if (!len)
 		len = U16_MAX + 1;
 	buffer = kmalloc(len, GFP_KERNEL);
@@ -148,12 +148,12 @@ void *erofs_read_metadata(struct super_block *sb, struct erofs_buf *buf,
 	for (i = 0; i < len; i += cnt) {
 		cnt = min_t(int, sb->s_blocksize - erofs_blkoff(sb, *offset),
 			    len - i);
-		ptr = erofs_bread(buf, erofs_blknr(sb, *offset), EROFS_KMAP);
+		ptr = erofs_bread(buf, *offset, EROFS_KMAP);
 		if (IS_ERR(ptr)) {
 			kfree(buffer);
 			return ptr;
 		}
-		memcpy(buffer + i, ptr + erofs_blkoff(sb, *offset), cnt);
+		memcpy(buffer + i, ptr, cnt);
 		*offset += cnt;
 	}
 	return buffer;
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index b58316b49a43..ec233917830a 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -81,13 +81,13 @@ static int erofs_init_inode_xattrs(struct inode *inode)
 	it.pos = erofs_iloc(inode) + vi->inode_isize;
 
 	/* read in shared xattr array (non-atomic, see kmalloc below) */
-	it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos), EROFS_KMAP);
+	it.kaddr = erofs_bread(&it.buf, it.pos, EROFS_KMAP);
 	if (IS_ERR(it.kaddr)) {
 		ret = PTR_ERR(it.kaddr);
 		goto out_unlock;
 	}
 
-	ih = it.kaddr + erofs_blkoff(sb, it.pos);
+	ih = it.kaddr;
 	vi->xattr_name_filter = le32_to_cpu(ih->h_name_filter);
 	vi->xattr_shared_count = ih->h_shared_count;
 	vi->xattr_shared_xattrs = kmalloc_array(vi->xattr_shared_count,
@@ -102,16 +102,14 @@ static int erofs_init_inode_xattrs(struct inode *inode)
 	it.pos += sizeof(struct erofs_xattr_ibody_header);
 
 	for (i = 0; i < vi->xattr_shared_count; ++i) {
-		it.kaddr = erofs_bread(&it.buf, erofs_blknr(sb, it.pos),
-				       EROFS_KMAP);
+		it.kaddr = erofs_bread(&it.buf, it.pos, EROFS_KMAP);
 		if (IS_ERR(it.kaddr)) {
 			kfree(vi->xattr_shared_xattrs);
 			vi->xattr_shared_xattrs = NULL;
 			ret = PTR_ERR(it.kaddr);
 			goto out_unlock;
 		}
-		vi->xattr_shared_xattrs[i] = le32_to_cpu(*(__le32 *)
-				(it.kaddr + erofs_blkoff(sb, it.pos)));
+		vi->xattr_shared_xattrs[i] = le32_to_cpu(*(__le32 *)it.kaddr);
 		it.pos += sizeof(__le32);
 	}
 	erofs_put_metabuf(&it.buf);
@@ -185,12 +183,11 @@ static int erofs_xattr_copy_to_buffer(struct erofs_xattr_iter *it,
 	void *src;
 
 	for (processed = 0; processed < len; processed += slice) {
-		it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
-					EROFS_KMAP);
+		it->kaddr = erofs_bread(&it->buf, it->pos, EROFS_KMAP);
 		if (IS_ERR(it->kaddr))
 			return PTR_ERR(it->kaddr);
 
-		src = it->kaddr + erofs_blkoff(sb, it->pos);
+		src = it->kaddr;
 		slice = min_t(unsigned int, sb->s_blocksize -
 				erofs_blkoff(sb, it->pos), len - processed);
 		memcpy(it->buffer + it->buffer_ofs, src, slice);
@@ -208,8 +205,7 @@ static int erofs_listxattr_foreach(struct erofs_xattr_iter *it)
 	int err;
 
 	/* 1. handle xattr entry */
-	entry = *(struct erofs_xattr_entry *)
-			(it->kaddr + erofs_blkoff(it->sb, it->pos));
+	entry = *(struct erofs_xattr_entry *)it->kaddr;
 	it->pos += sizeof(struct erofs_xattr_entry);
 
 	base_index = entry.e_name_index;
@@ -259,8 +255,7 @@ static int erofs_getxattr_foreach(struct erofs_xattr_iter *it)
 	unsigned int slice, processed, value_sz;
 
 	/* 1. handle xattr entry */
-	entry = *(struct erofs_xattr_entry *)
-			(it->kaddr + erofs_blkoff(sb, it->pos));
+	entry = *(struct erofs_xattr_entry *)it->kaddr;
 	it->pos += sizeof(struct erofs_xattr_entry);
 	value_sz = le16_to_cpu(entry.e_value_size);
 
@@ -291,8 +286,7 @@ static int erofs_getxattr_foreach(struct erofs_xattr_iter *it)
 
 	/* 2. handle xattr name */
 	for (processed = 0; processed < entry.e_name_len; processed += slice) {
-		it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
-					EROFS_KMAP);
+		it->kaddr = erofs_bread(&it->buf, it->pos, EROFS_KMAP);
 		if (IS_ERR(it->kaddr))
 			return PTR_ERR(it->kaddr);
 
@@ -300,7 +294,7 @@ static int erofs_getxattr_foreach(struct erofs_xattr_iter *it)
 				sb->s_blocksize - erofs_blkoff(sb, it->pos),
 				entry.e_name_len - processed);
 		if (memcmp(it->name.name + it->infix_len + processed,
-			   it->kaddr + erofs_blkoff(sb, it->pos), slice))
+			   it->kaddr, slice))
 			return -ENOATTR;
 		it->pos += slice;
 	}
@@ -336,13 +330,11 @@ static int erofs_xattr_iter_inline(struct erofs_xattr_iter *it,
 	it->pos = erofs_iloc(inode) + vi->inode_isize + xattr_header_sz;
 
 	while (remaining) {
-		it->kaddr = erofs_bread(&it->buf, erofs_blknr(it->sb, it->pos),
-					EROFS_KMAP);
+		it->kaddr = erofs_bread(&it->buf, it->pos, EROFS_KMAP);
 		if (IS_ERR(it->kaddr))
 			return PTR_ERR(it->kaddr);
 
-		entry_sz = erofs_xattr_entry_size(it->kaddr +
-				erofs_blkoff(it->sb, it->pos));
+		entry_sz = erofs_xattr_entry_size(it->kaddr);
 		/* xattr on-disk corruption: xattr entry beyond xattr_isize */
 		if (remaining < entry_sz) {
 			DBG_BUGON(1);
@@ -375,8 +367,7 @@ static int erofs_xattr_iter_shared(struct erofs_xattr_iter *it,
 	for (i = 0; i < vi->xattr_shared_count; ++i) {
 		it->pos = erofs_pos(sb, sbi->xattr_blkaddr) +
 				vi->xattr_shared_xattrs[i] * sizeof(__le32);
-		it->kaddr = erofs_bread(&it->buf, erofs_blknr(sb, it->pos),
-					EROFS_KMAP);
+		it->kaddr = erofs_bread(&it->buf, it->pos, EROFS_KMAP);
 		if (IS_ERR(it->kaddr))
 			return PTR_ERR(it->kaddr);
 
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 3216b920d369..9ffdae7fcd5b 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -940,12 +940,12 @@ static int z_erofs_read_fragment(struct super_block *sb, struct page *page,
 	for (; cur < end; cur += cnt, pos += cnt) {
 		cnt = min_t(unsigned int, end - cur,
 			    sb->s_blocksize - erofs_blkoff(sb, pos));
-		src = erofs_bread(&buf, erofs_blknr(sb, pos), EROFS_KMAP);
+		src = erofs_bread(&buf, pos, EROFS_KMAP);
 		if (IS_ERR(src)) {
 			erofs_put_metabuf(&buf);
 			return PTR_ERR(src);
 		}
-		memcpy_to_page(page, cur, src + erofs_blkoff(sb, pos), cnt);
+		memcpy_to_page(page, cur, src, cnt);
 	}
 	erofs_put_metabuf(&buf);
 	return 0;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 2/6] erofs_buf: store address_space instead of inode
  2024-04-25 19:56       ` Al Viro
  2024-04-25 19:57         ` [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number Al Viro
@ 2024-04-25 19:58         ` Al Viro
  2024-04-29  3:01           ` Gao Xiang
  2024-04-25 19:58         ` erofs: mechanically convert erofs_read_metabuf() to offsets Al Viro
                           ` (5 subsequent siblings)
  7 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:58 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

... seeing that ->i_mapping is the only thing we want from the inode.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/data.c     | 7 +++----
 fs/erofs/dir.c      | 2 +-
 fs/erofs/internal.h | 2 +-
 fs/erofs/namei.c    | 4 ++--
 fs/erofs/xattr.c    | 2 +-
 fs/erofs/zdata.c    | 2 +-
 6 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index d3c446dda2ff..e1a170e45c70 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -32,7 +32,6 @@ void erofs_put_metabuf(struct erofs_buf *buf)
 void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 		  enum erofs_kmap_type type)
 {
-	struct inode *inode = buf->inode;
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
 	struct folio *folio;
@@ -42,7 +41,7 @@ void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 		erofs_put_metabuf(buf);
 
 		nofs_flag = memalloc_nofs_save();
-		folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
+		folio = read_cache_folio(buf->mapping, index, NULL, NULL);
 		memalloc_nofs_restore(nofs_flag);
 		if (IS_ERR(folio))
 			return folio;
@@ -67,9 +66,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 {
 	if (erofs_is_fscache_mode(sb))
-		buf->inode = EROFS_SB(sb)->s_fscache->inode;
+		buf->mapping = EROFS_SB(sb)->s_fscache->inode->i_mapping;
 	else
-		buf->inode = sb->s_bdev->bd_inode;
+		buf->mapping = sb->s_bdev->bd_inode->i_mapping;
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c
index 9d38f39bb4f7..2193a6710c8f 100644
--- a/fs/erofs/dir.c
+++ b/fs/erofs/dir.c
@@ -58,7 +58,7 @@ static int erofs_readdir(struct file *f, struct dir_context *ctx)
 	int err = 0;
 	bool initial = true;
 
-	buf.inode = dir;
+	buf.mapping = dir->i_mapping;
 	while (ctx->pos < dirsize) {
 		struct erofs_dirent *de;
 		unsigned int nameoff, maxsize;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 9e30c67c135c..12a179818897 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -223,7 +223,7 @@ enum erofs_kmap_type {
 };
 
 struct erofs_buf {
-	struct inode *inode;
+	struct address_space *mapping;
 	struct page *page;
 	void *base;
 	enum erofs_kmap_type kmap_type;
diff --git a/fs/erofs/namei.c b/fs/erofs/namei.c
index 11afa48996a3..c94d0c1608a8 100644
--- a/fs/erofs/namei.c
+++ b/fs/erofs/namei.c
@@ -99,7 +99,7 @@ static void *erofs_find_target_block(struct erofs_buf *target,
 		struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
 		struct erofs_dirent *de;
 
-		buf.inode = dir;
+		buf.mapping = dir->i_mapping;
 		de = erofs_bread(&buf, erofs_pos(dir->i_sb, mid), EROFS_KMAP);
 		if (!IS_ERR(de)) {
 			const int nameoff = nameoff_from_disk(de->nameoff, bsz);
@@ -171,7 +171,7 @@ int erofs_namei(struct inode *dir, const struct qstr *name, erofs_nid_t *nid,
 
 	qn.name = name->name;
 	qn.end = name->name + name->len;
-	buf.inode = dir;
+	buf.mapping = dir->i_mapping;
 
 	ndirents = 0;
 	de = erofs_find_target_block(&buf, dir, &qn, &ndirents);
diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c
index ec233917830a..a90d7d649739 100644
--- a/fs/erofs/xattr.c
+++ b/fs/erofs/xattr.c
@@ -483,7 +483,7 @@ int erofs_xattr_prefixes_init(struct super_block *sb)
 		return -ENOMEM;
 
 	if (sbi->packed_inode)
-		buf.inode = sbi->packed_inode;
+		buf.mapping = sbi->packed_inode->i_mapping;
 	else
 		erofs_init_metabuf(&buf, sb);
 
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 9ffdae7fcd5b..283c9c3a611d 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -936,7 +936,7 @@ static int z_erofs_read_fragment(struct super_block *sb, struct page *page,
 	if (!packed_inode)
 		return -EFSCORRUPTED;
 
-	buf.inode = packed_inode;
+	buf.mapping = packed_inode->i_mapping;
 	for (; cur < end; cur += cnt, pos += cnt) {
 		cnt = min_t(unsigned int, end - cur,
 			    sb->s_blocksize - erofs_blkoff(sb, pos));
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* erofs: mechanically convert erofs_read_metabuf() to offsets
  2024-04-25 19:56       ` Al Viro
  2024-04-25 19:57         ` [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number Al Viro
  2024-04-25 19:58         ` [PATCH 2/6] erofs_buf: store address_space instead of inode Al Viro
@ 2024-04-25 19:58         ` Al Viro
  2024-04-25 19:59         ` [PATCH 4/6] erofs: don't align offset for erofs_read_metabuf() (simple cases) Al Viro
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:58 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

just lift the call of erofs_pos() into the callers; it will
collapse in most of them, but that's better done caller-by-caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/data.c     | 8 ++++----
 fs/erofs/fscache.c  | 2 +-
 fs/erofs/inode.c    | 4 ++--
 fs/erofs/internal.h | 2 +-
 fs/erofs/super.c    | 2 +-
 fs/erofs/zdata.c    | 2 +-
 fs/erofs/zmap.c     | 6 +++---
 7 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index e1a170e45c70..82a196e02b5c 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -72,10 +72,10 @@ void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
-			 erofs_blk_t blkaddr, enum erofs_kmap_type type)
+			 erofs_off_t offset, enum erofs_kmap_type type)
 {
 	erofs_init_metabuf(buf, sb);
-	return erofs_bread(buf, erofs_pos(sb, blkaddr), type);
+	return erofs_bread(buf, offset, type);
 }
 
 static int erofs_map_blocks_flatmode(struct inode *inode,
@@ -152,7 +152,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
 	pos = ALIGN(erofs_iloc(inode) + vi->inode_isize +
 		    vi->xattr_isize, unit) + unit * chunknr;
 
-	kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(sb, pos), EROFS_KMAP);
+	kaddr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, erofs_blknr(sb, pos)), EROFS_KMAP);
 	if (IS_ERR(kaddr)) {
 		err = PTR_ERR(kaddr);
 		goto out;
@@ -295,7 +295,7 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 
 		iomap->type = IOMAP_INLINE;
 		ptr = erofs_read_metabuf(&buf, sb,
-				erofs_blknr(sb, mdev.m_pa), EROFS_KMAP);
+				erofs_pos(sb, erofs_blknr(sb, mdev.m_pa)), EROFS_KMAP);
 		if (IS_ERR(ptr))
 			return PTR_ERR(ptr);
 		iomap->inline_data = ptr + erofs_blkoff(sb, mdev.m_pa);
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 8aff1a724805..4df4617d99f2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -282,7 +282,7 @@ static int erofs_fscache_data_read_slice(struct erofs_fscache_rq *req)
 		blknr = erofs_blknr(sb, map.m_pa);
 		size = map.m_llen;
 
-		src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
+		src = erofs_read_metabuf(&buf, sb, erofs_pos(sb, blknr), EROFS_KMAP);
 		if (IS_ERR(src))
 			return PTR_ERR(src);
 
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 0eb0e6f933c3..5f6439a63af7 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -26,7 +26,7 @@ static void *erofs_read_inode(struct erofs_buf *buf,
 	blkaddr = erofs_blknr(sb, inode_loc);
 	*ofs = erofs_blkoff(sb, inode_loc);
 
-	kaddr = erofs_read_metabuf(buf, sb, blkaddr, EROFS_KMAP);
+	kaddr = erofs_read_metabuf(buf, sb, erofs_pos(sb, blkaddr), EROFS_KMAP);
 	if (IS_ERR(kaddr)) {
 		erofs_err(sb, "failed to get inode (nid: %llu) page, err %ld",
 			  vi->nid, PTR_ERR(kaddr));
@@ -66,7 +66,7 @@ static void *erofs_read_inode(struct erofs_buf *buf,
 				goto err_out;
 			}
 			memcpy(copied, dic, gotten);
-			kaddr = erofs_read_metabuf(buf, sb, blkaddr + 1,
+			kaddr = erofs_read_metabuf(buf, sb, erofs_pos(sb, blkaddr + 1),
 						   EROFS_KMAP);
 			if (IS_ERR(kaddr)) {
 				erofs_err(sb, "failed to get inode payload block (nid: %llu), err %ld",
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 12a179818897..f82a5eb79c8e 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -413,7 +413,7 @@ void *erofs_bread(struct erofs_buf *buf, erofs_off_t offset,
 		  enum erofs_kmap_type type);
 void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb);
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
-			 erofs_blk_t blkaddr, enum erofs_kmap_type type);
+			 erofs_off_t offset, enum erofs_kmap_type type);
 int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *dev);
 int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		 u64 start, u64 len);
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index fdefc3772620..5466118c7e2d 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -180,7 +180,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 	struct file *bdev_file;
 	void *ptr;
 
-	ptr = erofs_read_metabuf(buf, sb, erofs_blknr(sb, *pos), EROFS_KMAP);
+	ptr = erofs_read_metabuf(buf, sb, erofs_pos(sb, erofs_blknr(sb, *pos)), EROFS_KMAP);
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
 	dis = ptr + erofs_blkoff(sb, *pos);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 283c9c3a611d..d417e189f1a0 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -868,7 +868,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
 	} else {
 		void *mptr;
 
-		mptr = erofs_read_metabuf(&map->buf, sb, blknr, EROFS_NO_KMAP);
+		mptr = erofs_read_metabuf(&map->buf, sb, erofs_pos(sb, blknr), EROFS_NO_KMAP);
 		if (IS_ERR(mptr)) {
 			ret = PTR_ERR(mptr);
 			erofs_err(sb, "failed to get inline data %d", ret);
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index e313c936351d..bd8dfe8c65ae 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -34,7 +34,7 @@ static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m,
 	unsigned int advise, type;
 
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_blknr(inode->i_sb, pos), EROFS_KMAP);
+				      erofs_pos(inode->i_sb, erofs_blknr(inode->i_sb, pos)), EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 
@@ -267,7 +267,7 @@ static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m,
 out:
 	pos += lcn * (1 << amortizedshift);
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_blknr(inode->i_sb, pos), EROFS_KMAP);
+				      erofs_pos(inode->i_sb, erofs_blknr(inode->i_sb, pos)), EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 	return unpack_compacted_index(m, amortizedshift, pos, lookahead);
@@ -600,7 +600,7 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 		goto out_unlock;
 
 	pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8);
-	kaddr = erofs_read_metabuf(&buf, sb, erofs_blknr(sb, pos), EROFS_KMAP);
+	kaddr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, erofs_blknr(sb, pos)), EROFS_KMAP);
 	if (IS_ERR(kaddr)) {
 		err = PTR_ERR(kaddr);
 		goto out_unlock;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 4/6] erofs: don't align offset for erofs_read_metabuf() (simple cases)
  2024-04-25 19:56       ` Al Viro
                           ` (2 preceding siblings ...)
  2024-04-25 19:58         ` erofs: mechanically convert erofs_read_metabuf() to offsets Al Viro
@ 2024-04-25 19:59         ` Al Viro
  2024-04-25 19:59         ` [PATCH 5/6] erofs: don't round offset down for erofs_read_metabuf() Al Viro
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:59 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

Most of the callers of erofs_read_metabuf() have the following form:

	block = erofs_blknr(sb, offset);
	off = erofs_blkoff(sb, offset);
	p = erofs_read_metabuf(...., erofs_pos(sb, block), ...);
	if (IS_ERR(p))
		return PTR_ERR(p);
	q = p + off;
	// no further uses of p, block or off.

The value passed to erofs_read_metabuf() is offset rounded down to block
size, i.e. offset - off.  Passing offset as-is would increase the return
value by off in case of success and keep the return value unchanged in
case of error.  In other words, the same could be achieved by

	q = erofs_read_metabuf(...., offset, ...);
	if (IS_ERR(q))
		return PTR_ERR(q);

This commit converts these simple cases.
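
The equivalence argued above can be checked with a small user-space
model.  It is only a sketch: mb_read() is a hypothetical stand-in for
erofs_read_metabuf(), the "disk" is a flat array, MB_ERR plays the role
of an ERR_PTR() value, and the block size is assumed to be a power of
two no larger than the page size.

```c
#include <assert.h>
#include <stdint.h>

#define MB_PAGE_SIZE 4096u
#define MB_NPAGES    4u
#define MB_ERR       ((void *)(intptr_t)-5)	/* stand-in for ERR_PTR(-EIO) */

static unsigned char mb_disk[MB_NPAGES * MB_PAGE_SIZE];

static int mb_is_err(const void *p)
{
	return p == MB_ERR;
}

/* Model reader: fails for pages past the end, else points at the byte asked for. */
static void *mb_read(uint64_t offset)
{
	if (offset / MB_PAGE_SIZE >= MB_NPAGES)
		return MB_ERR;
	return mb_disk + offset;
}

/* Form before the conversion: read the block start, then add the remainder. */
static void *mb_old(uint64_t offset, unsigned int blksz)
{
	uint64_t off = offset & (blksz - 1);	/* blksz is a power of two */
	void *p = mb_read(offset - off);

	if (mb_is_err(p))
		return p;
	return (unsigned char *)p + off;
}

/* Form after the conversion: pass the offset as-is. */
static void *mb_new(uint64_t offset)
{
	return mb_read(offset);
}
```

In this model mb_old(offset, blksz) and mb_new(offset) return the same
pointer on success and the same error value on failure, since offset and
offset - off always fall into the same page.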

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/data.c    | 11 +++++------
 fs/erofs/fscache.c | 12 +++---------
 fs/erofs/super.c   |  8 +++-----
 fs/erofs/zmap.c    |  8 +++-----
 4 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 82a196e02b5c..604d0bc82a0e 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -152,7 +152,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
 	pos = ALIGN(erofs_iloc(inode) + vi->inode_isize +
 		    vi->xattr_isize, unit) + unit * chunknr;
 
-	kaddr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, erofs_blknr(sb, pos)), EROFS_KMAP);
+	kaddr = erofs_read_metabuf(&buf, sb, pos, EROFS_KMAP);
 	if (IS_ERR(kaddr)) {
 		err = PTR_ERR(kaddr);
 		goto out;
@@ -163,7 +163,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
 
 	/* handle block map */
 	if (!(vi->chunkformat & EROFS_CHUNK_FORMAT_INDEXES)) {
-		__le32 *blkaddr = kaddr + erofs_blkoff(sb, pos);
+		__le32 *blkaddr = kaddr;
 
 		if (le32_to_cpu(*blkaddr) == EROFS_NULL_ADDR) {
 			map->m_flags = 0;
@@ -174,7 +174,7 @@ int erofs_map_blocks(struct inode *inode, struct erofs_map_blocks *map)
 		goto out_unlock;
 	}
 	/* parse chunk indexes */
-	idx = kaddr + erofs_blkoff(sb, pos);
+	idx = kaddr;
 	switch (le32_to_cpu(idx->blkaddr)) {
 	case EROFS_NULL_ADDR:
 		map->m_flags = 0;
@@ -294,11 +294,10 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
 
 		iomap->type = IOMAP_INLINE;
-		ptr = erofs_read_metabuf(&buf, sb,
-				erofs_pos(sb, erofs_blknr(sb, mdev.m_pa)), EROFS_KMAP);
+		ptr = erofs_read_metabuf(&buf, sb, mdev.m_pa, EROFS_KMAP);
 		if (IS_ERR(ptr))
 			return PTR_ERR(ptr);
-		iomap->inline_data = ptr + erofs_blkoff(sb, mdev.m_pa);
+		iomap->inline_data = ptr;
 		iomap->private = buf.base;
 	} else {
 		iomap->type = IOMAP_MAPPED;
diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 4df4617d99f2..c1b42392b854 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -273,21 +273,15 @@ static int erofs_fscache_data_read_slice(struct erofs_fscache_rq *req)
 	if (map.m_flags & EROFS_MAP_META) {
 		struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
 		struct iov_iter iter;
-		erofs_blk_t blknr;
-		size_t offset, size;
+		size_t size = map.m_llen;
 		void *src;
 
-		/* For tail packing layout, the offset may be non-zero. */
-		offset = erofs_blkoff(sb, map.m_pa);
-		blknr = erofs_blknr(sb, map.m_pa);
-		size = map.m_llen;
-
-		src = erofs_read_metabuf(&buf, sb, erofs_pos(sb, blknr), EROFS_KMAP);
+		src = erofs_read_metabuf(&buf, sb, map.m_pa, EROFS_KMAP);
 		if (IS_ERR(src))
 			return PTR_ERR(src);
 
 		iov_iter_xarray(&iter, ITER_DEST, &mapping->i_pages, pos, PAGE_SIZE);
-		if (copy_to_iter(src + offset, size, &iter) != size) {
+		if (copy_to_iter(src, size, &iter) != size) {
 			erofs_put_metabuf(&buf);
 			return -EFAULT;
 		}
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 5466118c7e2d..49dc34ea70b2 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -178,12 +178,10 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 	struct erofs_fscache *fscache;
 	struct erofs_deviceslot *dis;
 	struct file *bdev_file;
-	void *ptr;
 
-	ptr = erofs_read_metabuf(buf, sb, erofs_pos(sb, erofs_blknr(sb, *pos)), EROFS_KMAP);
-	if (IS_ERR(ptr))
-		return PTR_ERR(ptr);
-	dis = ptr + erofs_blkoff(sb, *pos);
+	dis = erofs_read_metabuf(buf, sb, *pos, EROFS_KMAP);
+	if (IS_ERR(dis))
+		return PTR_ERR(dis);
 
 	if (!sbi->devs->flatdev && !dif->path) {
 		if (!dis->tag[0]) {
diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index bd8dfe8c65ae..7c7151c22067 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -580,7 +580,6 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 	int err, headnr;
 	erofs_off_t pos;
 	struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
-	void *kaddr;
 	struct z_erofs_map_header *h;
 
 	if (test_bit(EROFS_I_Z_INITED_BIT, &vi->flags)) {
@@ -600,13 +599,12 @@ static int z_erofs_fill_inode_lazy(struct inode *inode)
 		goto out_unlock;
 
 	pos = ALIGN(erofs_iloc(inode) + vi->inode_isize + vi->xattr_isize, 8);
-	kaddr = erofs_read_metabuf(&buf, sb, erofs_pos(sb, erofs_blknr(sb, pos)), EROFS_KMAP);
-	if (IS_ERR(kaddr)) {
-		err = PTR_ERR(kaddr);
+	h = erofs_read_metabuf(&buf, sb, pos, EROFS_KMAP);
+	if (IS_ERR(h)) {
+		err = PTR_ERR(h);
 		goto out_unlock;
 	}
 
-	h = kaddr + erofs_blkoff(sb, pos);
 	/*
 	 * if the highest bit of the 8-byte map header is set, the whole file
 	 * is stored in the packed inode. The rest bits keeps z_fragmentoff.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 5/6] erofs: don't round offset down for erofs_read_metabuf()
  2024-04-25 19:56       ` Al Viro
                           ` (3 preceding siblings ...)
  2024-04-25 19:59         ` [PATCH 4/6] erofs: don't align offset for erofs_read_metabuf() (simple cases) Al Viro
@ 2024-04-25 19:59         ` Al Viro
  2024-04-25 20:00         ` [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down Al Viro
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 116+ messages in thread
From: Al Viro @ 2024-04-25 19:59 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

There's only one place where struct z_erofs_maprecorder ->kaddr is
used outside the function that assigned it: the value read in
unpack_compacted_index() gets calculated in
z_erofs_load_compact_lcluster().  With minor massage we can switch
to storing it with the offset within the block already added.
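
A user-space sketch of that massage, under stated assumptions: addresses
are modelled as plain byte positions, the pack size (vcnt << amortizedshift)
is a power of two that divides the block size, and the ci_* names are
invented for the illustration.  ci_old() mirrors the arithmetic being
removed, ci_new() the arithmetic being added.

```c
#include <assert.h>
#include <stdint.h>

struct ci_result {
	uint64_t in;	/* address of the start of the compacted pack */
	unsigned int i;	/* index of the entry within that pack */
};

/* Before: kaddr points at the block start; the in-block offset is applied here. */
static struct ci_result ci_old(uint64_t pos, unsigned int blksz,
			       unsigned int vcnt, unsigned int amortizedshift)
{
	uint64_t kaddr = pos & ~(uint64_t)(blksz - 1);
	unsigned int eofs = (unsigned int)(pos & (blksz - 1));
	unsigned int pack = vcnt << amortizedshift;
	unsigned int base = eofs & ~(pack - 1);	/* round_down(eofs, pack) */
	struct ci_result r = {
		.in = kaddr + base,
		.i = (eofs - base) >> amortizedshift,
	};

	return r;
}

/* After: kaddr already points at pos; step back to the start of the pack. */
static struct ci_result ci_new(uint64_t pos, unsigned int vcnt,
			       unsigned int amortizedshift)
{
	uint64_t kaddr = pos;
	unsigned int bytes = (unsigned int)(pos & ((vcnt << amortizedshift) - 1));
	struct ci_result r = {
		.in = kaddr - bytes,
		.i = bytes >> amortizedshift,
	};

	return r;
}

/* Returns 1 if both forms agree for every pos below limit. */
static int ci_agree_upto(uint64_t limit, unsigned int blksz,
			 unsigned int vcnt, unsigned int amortizedshift)
{
	for (uint64_t pos = 0; pos < limit; pos++) {
		struct ci_result a = ci_old(pos, blksz, vcnt, amortizedshift);
		struct ci_result b = ci_new(pos, vcnt, amortizedshift);

		if (a.in != b.in || a.i != b.i)
			return 0;
	}
	return 1;
}
```

The two forms agree because eofs - base is eofs modulo the pack size,
which equals pos modulo the pack size whenever the pack size divides the
block size.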

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/zmap.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7c7151c22067..5f9ece0c2a03 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -34,13 +34,13 @@ static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m,
 	unsigned int advise, type;
 
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_pos(inode->i_sb, erofs_blknr(inode->i_sb, pos)), EROFS_KMAP);
+				      pos, EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 
 	m->nextpackoff = pos + sizeof(struct z_erofs_lcluster_index);
 	m->lcn = lcn;
-	di = m->kaddr + erofs_blkoff(inode->i_sb, pos);
+	di = m->kaddr;
 
 	advise = le16_to_cpu(di->di_advise);
 	type = (advise >> Z_EROFS_LI_LCLUSTER_TYPE_BIT) &
@@ -120,7 +120,7 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 {
 	struct erofs_inode *const vi = EROFS_I(m->inode);
 	const unsigned int lclusterbits = vi->z_logical_clusterbits;
-	unsigned int vcnt, base, lo, lobits, encodebits, nblk, eofs;
+	unsigned int vcnt, lo, lobits, encodebits, nblk, bytes;
 	int i;
 	u8 *in, type;
 	bool big_pcluster;
@@ -138,11 +138,11 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 	big_pcluster = vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1;
 	lobits = max(lclusterbits, ilog2(Z_EROFS_LI_D0_CBLKCNT) + 1U);
 	encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
-	eofs = erofs_blkoff(m->inode->i_sb, pos);
-	base = round_down(eofs, vcnt << amortizedshift);
-	in = m->kaddr + base;
+	bytes = pos & ((vcnt << amortizedshift) - 1);
 
-	i = (eofs - base) >> amortizedshift;
+	in = m->kaddr - bytes;
+
+	i = bytes >> amortizedshift;
 
 	lo = decode_compactedbits(lobits, in, encodebits * i, &type);
 	m->type = type;
@@ -267,7 +267,7 @@ static int z_erofs_load_compact_lcluster(struct z_erofs_maprecorder *m,
 out:
 	pos += lcn * (1 << amortizedshift);
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_pos(inode->i_sb, erofs_blknr(inode->i_sb, pos)), EROFS_KMAP);
+				      pos, EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 	return unpack_compacted_index(m, amortizedshift, pos, lookahead);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down
  2024-04-25 19:56       ` Al Viro
                           ` (4 preceding siblings ...)
  2024-04-25 19:59         ` [PATCH 5/6] erofs: don't round offset down for erofs_read_metabuf() Al Viro
@ 2024-04-25 20:00         ` Al Viro
  2024-04-26  5:32           ` Gao Xiang
  2024-04-25 20:08         ` [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode Al Viro
  2024-04-25 23:22         ` Gao Xiang
  7 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-25 20:00 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

... and be more idiomatic when calculating ->pageofs_in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/erofs/zdata.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index d417e189f1a0..a4ff20b54cc1 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -868,7 +868,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
 	} else {
 		void *mptr;
 
-		mptr = erofs_read_metabuf(&map->buf, sb, erofs_pos(sb, blknr), EROFS_NO_KMAP);
+		mptr = erofs_read_metabuf(&map->buf, sb, map->m_pa, EROFS_NO_KMAP);
 		if (IS_ERR(mptr)) {
 			ret = PTR_ERR(mptr);
 			erofs_err(sb, "failed to get inline data %d", ret);
@@ -876,7 +876,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
 		}
 		get_page(map->buf.page);
 		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page);
-		fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK;
+		fe->pcl->pageofs_in = offset_in_page(mptr);
 		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
 	}
 	/* file-backed inplace I/O pages are traversed in reverse order */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-25 19:56       ` Al Viro
                           ` (5 preceding siblings ...)
  2024-04-25 20:00         ` [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down Al Viro
@ 2024-04-25 20:08         ` Al Viro
  2024-04-25 21:56           ` Gao Xiang
  2024-04-25 23:22         ` Gao Xiang
  7 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-25 20:08 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:

> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> minimal conversion of erofs_read_buf() and switch from buf->inode
> to buf->mapping, the latter follows that up with massage for
> erofs_read_metabuf().

First two and last four patches resp.  BTW, what are the intended rules
for inline symlinks?  "Should fit within the same block as the last
byte of on-disk erofs_inode_{compact,extended}"?  Feels like
erofs_read_inode() might be better off if it copied the symlink
body itself instead of leaving it to erofs_fill_symlink(), complete with
the sanity checks...  I'd left that logic alone, though - I'm nowhere
near familiar enough with erofs layout.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-25 20:08         ` [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode Al Viro
@ 2024-04-25 21:56           ` Gao Xiang
  2024-04-25 22:28             ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Gao Xiang @ 2024-04-25 21:56 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

Hi Al,

On 2024/4/26 04:08, Al Viro wrote:
> On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
> 
>> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
>> minimal conversion of erofs_read_buf() and switch from buf->inode
>> to buf->mapping, the latter follows that up with massage for
>> erofs_read_metabuf().
> 
> First two and last four patches resp.  BTW, what are the intended rules
> for inline symlinks?  "Should fit within the same block as the last

The symlink on-disk layout follows the same rule as regular files.  The
last logical block can be inlined right after the on-disk inode (called
tail-packing inline), or a separate fs block is used to keep the symlink
if the tail-packing inline data doesn't fit.

> byte of on-disk erofs_inode_{compact,extended}"?  Feels like
> erofs_read_inode() might be better off if it did copying the symlink
> body instead of leaving it to erofs_fill_symlink(), complete with
> the sanity checks...  I'd left that logics alone, though - I'm nowhere
> near familiar enough with erofs layout.
If I understand correctly, do you mean just fold erofs_fill_symlink()
into the caller?  That is fine with me, I can change this in the
future.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-25 21:56           ` Gao Xiang
@ 2024-04-25 22:28             ` Al Viro
  2024-04-25 23:11               ` Gao Xiang
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-25 22:28 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

On Fri, Apr 26, 2024 at 05:56:52AM +0800, Gao Xiang wrote:
> Hi Al,
> 
> On 2024/4/26 04:08, Al Viro wrote:
> > On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
> > 
> > > FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> > > minimal conversion of erofs_read_buf() and switch from buf->inode
> > > to buf->mapping, the latter follows that up with massage for
> > > erofs_read_metabuf().
> > 
> > First two and last four patches resp.  BTW, what are the intended rules
> > for inline symlinks?  "Should fit within the same block as the last
> 
> symlink on-disk layout follows the same rule of regular files.  The last
> logical block can be inlined right after the on-disk inode (called tail
> packing inline) or use a separate fs block to keep the symlink if tail
> packing inline doesn't fit.
> 
> > byte of on-disk erofs_inode_{compact,extended}"?  Feels like
> > erofs_read_inode() might be better off if it did copying the symlink
> > body instead of leaving it to erofs_fill_symlink(), complete with
> > the sanity checks...  I'd left that logics alone, though - I'm nowhere
> > near familiar enough with erofs layout.
> If I understand correctly, do you mean just fold erofs_fill_symlink()
> into the caller?  That is fine with me, I can change this in the
> future.

It's just that the calling conventions of erofs_read_inode() feel wrong ;-/
We return a pointer and offset, with (ERR_PTR(...), anything) used to
indicate an error and (pointer into page, offset) used (in case of
fast symlinks and only in case of fast symlinks) to encode the address
of symlink body, with data starting at pointer + offset + vi->xattr_isize
and length being ->i_size, no greater than block size - offset - vi->xattr_isize.

If anything, it would be easier to follow (and document) if you had
allocated and filled the symlink body right there in erofs_read_inode().
That way you could lift erofs_put_metabuf() call into erofs_read_inode(),
along with the variable itself.

Perhaps something like void *erofs_read_inode(inode) with
	ERR_PTR(-E...) => error
	NULL => success, not a fast symlink
	pointer to string => success, a fast symlink, body allocated and returned
to caller.

Or, for that matter, have it return an int and stuff the body into ->i_link -
it's just that you'd need to set ->i_op there with such approach.

Not sure, really.  BTW, one comment about erofs_fill_symlink() - it's probably
a good idea to use kmemdup_nul() rather than open-coding it (and do that after
the block overflow check, obviously).
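
For illustration only, a minimal userspace sketch of the three-way return
convention described above (ERR_PTR on error, NULL for "not a fast
symlink", allocated string for a fast symlink).  The ERR_PTR helpers are
stand-ins for the kernel ones; struct toy_inode, toy_read_inode() and
memdup_nul() are hypothetical names, not erofs code.  The body offset is
collapsed into a single field, and -EINVAL stands in for whatever error
the real code would return on a corrupted layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (illustration only). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified view of what was parsed from the on-disk inode. */
struct toy_inode {
	int is_fast_symlink;	/* body is inlined after the inode */
	size_t i_size;		/* symlink length in bytes */
	size_t body_ofs;	/* offset of the body within the block */
};

/* Userspace analogue of kmemdup_nul(): copy len bytes, NUL-terminate. */
static char *memdup_nul(const char *s, size_t len)
{
	char *p = malloc(len + 1);

	if (p) {
		memcpy(p, s, len);
		p[len] = '\0';
	}
	return p;
}

/*
 * Returns ERR_PTR(-E...) on error, NULL on success when the inode is not
 * a fast symlink, or a freshly allocated NUL-terminated body otherwise.
 */
static void *toy_read_inode(const struct toy_inode *vi, const char *block,
			    size_t blksz)
{
	char *link;

	assert(vi && block);
	if (!vi->is_fast_symlink)
		return NULL;
	/* Overflow check first: the body must fit inside the block. */
	if (vi->body_ofs > blksz || vi->i_size > blksz - vi->body_ofs)
		return ERR_PTR(-EINVAL);
	/* Only then duplicate the body. */
	link = memdup_nul(block + vi->body_ofs, vi->i_size);
	return link ? link : ERR_PTR(-ENOMEM);
}
```

With this shape a caller checks IS_ERR() first, then treats a non-NULL
result as the symlink body it now owns and has to free.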

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-25 22:28             ` Al Viro
@ 2024-04-25 23:11               ` Gao Xiang
  0 siblings, 0 replies; 116+ messages in thread
From: Gao Xiang @ 2024-04-25 23:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3



On 2024/4/26 06:28, Al Viro wrote:
> On Fri, Apr 26, 2024 at 05:56:52AM +0800, Gao Xiang wrote:
>> Hi Al,
>>
>> On 2024/4/26 04:08, Al Viro wrote:
>>> On Thu, Apr 25, 2024 at 08:56:41PM +0100, Al Viro wrote:
>>>
>>>> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
>>>> minimal conversion of erofs_read_buf() and switch from buf->inode
>>>> to buf->mapping, the latter follows that up with massage for
>>>> erofs_read_metabuf().
>>>
>>> First two and last four patches resp.  BTW, what are the intended rules
>>> for inline symlinks?  "Should fit within the same block as the last
>>
>> symlink on-disk layout follows the same rule of regular files.  The last
>> logical block can be inlined right after the on-disk inode (called tail
>> packing inline) or use a separate fs block to keep the symlink if tail
>> packing inline doesn't fit.
>>
>>> byte of on-disk erofs_inode_{compact,extended}"?  Feels like
>>> erofs_read_inode() might be better off if it did copying the symlink
>>> body instead of leaving it to erofs_fill_symlink(), complete with
>>> the sanity checks...  I'd left that logics alone, though - I'm nowhere
>>> near familiar enough with erofs layout.
>> If I understand correctly, do you mean just fold erofs_fill_symlink()
>> into the caller?  That is fine with me, I can change this in the
>> future.
> 
> It's just that the calling conventions of erofs_read_inode() feel wrong ;-/
> We return a pointer and offset, with (ERR_PTR(...), anything) used to
> indicate an error and (pointer into page, offset) used (in case of
> fast symlinks and only in case of fast symlinks) to encode the address
> of symlink body, with data starting at pointer + offset + vi->xattr_isize
> and length being ->i_size, no greater than block size - offset - vi->xattr_size.
> 
> If anything, it would be easier to follow (and document) if you had
> allocated and filled the symlink body right there in erofs_read_inode().
> That way you could lift erofs_put_metabuf() call into erofs_read_inode(),
> along with the variable itself.
> 
> Perhaps something like void *erofs_read_inode(inode) with
> 	ERR_PTR(-E...) => error
> 	NULL => success, not a fast symlink
> 	pointer to string => success, a fast symlink, body allocated and returned
> to caller.

Got it.  My original plan was that erofs_read_inode() didn't need
to handle inode->i_mode stuff (IOW, different types of inodes).

But I think symlink i_link cases can be handled specifically in
erofs_read_inode().

> 
> Or, for that matter, have it return an int and stuff the body into ->i_link -
> it's just that you'd need to set ->i_op there with such approach.
> 
> Not sure, really.  BTW, one comment about erofs_fill_symlink() - it's probably
> a good idea to use kmemdup_nul() rather than open-coding it (and do that after
> the block overflow check, obviously).

Yes, let me handle all of this later.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode
  2024-04-25 19:56       ` Al Viro
                           ` (6 preceding siblings ...)
  2024-04-25 20:08         ` [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode Al Viro
@ 2024-04-25 23:22         ` Gao Xiang
  7 siblings, 0 replies; 116+ messages in thread
From: Gao Xiang @ 2024-04-25 23:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

Hi Al,

On 2024/4/26 03:56, Al Viro wrote:
> On Fri, Apr 12, 2024 at 12:13:42AM +0800, Gao Xiang wrote:

...

>>
>> Just saw this again by chance, which is unexpected.
>>
>> Yeah, I think that is a good idea.  The story is that erofs_bread()
>> was derived from a page-based interface:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/erofs/data.c?h=v5.10#n35
>>
>> so it was once a page index number.  I think a byte offset will be
>> a better interface to clean up these, thanks for your time and work
>> on this!
> 
> FWIW, see #misc.erofs and #more.erofs in my tree; the former is the
> minimal conversion of erofs_read_buf() and switch from buf->inode
> to buf->mapping, the latter follows that up with massage for
> erofs_read_metabuf().
> 
> Completely untested; it builds, but that's all I can promise.  Individual
> patches in followups.

Thanks for spending so much time on this.  I will review and test
these patches and give feedback by the end of this week, since an
internal project for my employer is also ongoing.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down
  2024-04-25 20:00         ` [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down Al Viro
@ 2024-04-26  5:32           ` Gao Xiang
  2024-05-03  4:15             ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Gao Xiang @ 2024-04-26  5:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

Hi Al,

On 2024/4/26 04:00, Al Viro wrote:
> ... and be more idiomatic when calculating ->pageofs_in.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>   fs/erofs/zdata.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index d417e189f1a0..a4ff20b54cc1 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -868,7 +868,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
>   	} else {
>   		void *mptr;
>   
> -		mptr = erofs_read_metabuf(&map->buf, sb, erofs_pos(sb, blknr), EROFS_NO_KMAP);
> +		mptr = erofs_read_metabuf(&map->buf, sb, map->m_pa, EROFS_NO_KMAP);

This patch caused some corruption failures: erofs_read_metabuf()
is called with EROFS_NO_KMAP here, and there is no need to get
a mapped address since only a page reference is needed.

>   		if (IS_ERR(mptr)) {
>   			ret = PTR_ERR(mptr);
>   			erofs_err(sb, "failed to get inline data %d", ret);
> @@ -876,7 +876,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
>   		}
>   		get_page(map->buf.page);
>   		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page);
> -		fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK;
> +		fe->pcl->pageofs_in = offset_in_page(mptr);

So it's unnecessary to change this line IMHO.
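
A minimal userspace sketch of why the two expressions can disagree.  It
assumes that a read done with EROFS_NO_KMAP hands back no mapped address
(modeled here as NULL) and that the page size is 4096 bytes; the toy_*
names are hypothetical and are not the erofs helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_PAGE_MASK (~(TOY_PAGE_SIZE - 1))

/* Same arithmetic as offset_in_page(): keep the bits below the page size. */
static unsigned long toy_offset_in_page(uintptr_t v)
{
	return v & ~TOY_PAGE_MASK;
}

/*
 * Toy model of a metabuf read.  With want_kmap == 0 (the EROFS_NO_KMAP
 * case, as assumed above) no mapped address is produced.
 */
static void *toy_read_metabuf(char *page_base, uint64_t pos, int want_kmap)
{
	assert(page_base);
	if (!want_kmap)
		return NULL;
	return page_base + toy_offset_in_page((uintptr_t)pos);
}
```

In this model the offset derived from the position (the m_pa & ~PAGE_MASK
form) is still correct without a mapping, while the offset derived from
the returned pointer collapses to 0.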

BTW, would you mind routing this series through the erofs tree
with other erofs patches for -next (as long as this series
isn't entangled with vfs and block stuff...)?  I may need to
test more to ensure they don't break anything, and that way
I could fix them immediately by hand...

Thanks,
Gao Xiang


>   		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
>   	}
>   	/* file-backed inplace I/O pages are traversed in reverse order */

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-17 12:47       ` Stefan Haberland
@ 2024-04-28 18:58         ` Al Viro
  2024-04-28 23:23           ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-28 18:58 UTC (permalink / raw)
  To: Stefan Haberland
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

On Wed, Apr 17, 2024 at 02:47:14PM +0200, Stefan Haberland wrote:

> set_blocksize() does basically also set i_blkbits like it was before.
> The dasd_format ioctl does only work on a disabled device. To achieve this
> all partitions need to be unmounted.
> The tooling also refuses to work on disks actually in use.
> 
> So there should be no page cache to evict.

You mean this?
        if (base->state != DASD_STATE_BASIC) {
                pr_warn("%s: The DASD cannot be formatted while it is enabled\n",
                        dev_name(&base->cdev->dev));
                return -EBUSY;
        }  

OK, but what would prevent dasd_ioctl_disable() from working while
disk is in use?  And I don't see anything that would evict the
page cache in dasd_ioctl_disable() either, actually...

What am I missing here?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-28 18:58         ` Al Viro
@ 2024-04-28 23:23           ` Al Viro
  2024-04-29 14:41             ` Stefan Haberland
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-28 23:23 UTC (permalink / raw)
  To: Stefan Haberland
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

On Sun, Apr 28, 2024 at 07:58:23PM +0100, Al Viro wrote:
> On Wed, Apr 17, 2024 at 02:47:14PM +0200, Stefan Haberland wrote:
> 
> > set_blocksize() does basically also set i_blkbits like it was before.
> > The dasd_format ioctl does only work on a disabled device. To achieve this
> > all partitions need to be unmounted.
> > The tooling also refuses to work on disks actually in use.
> > 
> > So there should be no page cache to evict.
> 
> You mean this?
>         if (base->state != DASD_STATE_BASIC) {
>                 pr_warn("%s: The DASD cannot be formatted while it is enabled\n",
>                         dev_name(&base->cdev->dev));
>                 return -EBUSY;
>         }  
> 
> OK, but what would prevent dasd_ioctl_disable() from working while
> disk is in use?  And I don't see anything that would evict the
> page cache in dasd_ioctl_disable() either, actually...
> 
> What am I missing here?

BTW, you are updating block size according to new device size, before
        rc = base->discipline->format_device(base, fdata, 1);
	if (rc == -EAGAIN)
		rc = base->discipline->format_device(base, fdata, 0);
Unless something very unidiomatic is going on, this attempt to
format might fail...

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number
  2024-04-25 19:57         ` [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number Al Viro
@ 2024-04-29  3:01           ` Gao Xiang
  0 siblings, 0 replies; 116+ messages in thread
From: Gao Xiang @ 2024-04-29  3:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3



On 2024/4/26 03:57, Al Viro wrote:
> Callers are happier that way, especially since we no longer need to
> play with splitting offset into block number and offset within block,
> passing the former to erofs_bread(), then adding the latter...
> 
> erofs_bread() always reads entire pages, anyway.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 2/6] erofs_buf: store address_space instead of inode
  2024-04-25 19:58         ` [PATCH 2/6] erofs_buf: store address_space instead of inode Al Viro
@ 2024-04-29  3:01           ` Gao Xiang
  0 siblings, 0 replies; 116+ messages in thread
From: Gao Xiang @ 2024-04-29  3:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3



On 2024/4/26 03:58, Al Viro wrote:
> ... seeing that ->i_mapping is the only thing we want from the inode.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-28 23:23           ` Al Viro
@ 2024-04-29 14:41             ` Stefan Haberland
  2024-04-30  0:30               ` Al Viro
  0 siblings, 1 reply; 116+ messages in thread
From: Stefan Haberland @ 2024-04-29 14:41 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

Am 29.04.24 um 01:23 schrieb Al Viro:
> On Sun, Apr 28, 2024 at 07:58:23PM +0100, Al Viro wrote:
>> On Wed, Apr 17, 2024 at 02:47:14PM +0200, Stefan Haberland wrote:
>>
>>> set_blocksize() does basically also set i_blkbits like it was before.
>>> The dasd_format ioctl does only work on a disabled device. To achieve this
>>> all partitions need to be unmounted.
>>> The tooling also refuses to work on disks actually in use.
>>>
>>> So there should be no page cache to evict.
>> You mean this?
>>          if (base->state != DASD_STATE_BASIC) {
>>                  pr_warn("%s: The DASD cannot be formatted while it is enabled\n",
>>                          dev_name(&base->cdev->dev));
>>                  return -EBUSY;
>>          }
>>
>> OK, but what would prevent dasd_ioctl_disable() from working while
>> disk is in use?  And I don't see anything that would evict the
>> page cache in dasd_ioctl_disable() either, actually...
>>
>> What am I missing here?

Thank you for your input.
Let me provide some more insight into how it is intended to work.
Maybe there is something we should improve.

This whole code is basically intended to be used by the dasdfmt tool.

For the dasdfmt tool and the dasd_format ioctl we are talking about DASD
ECKD devices.
An important note: for those devices a partition has to be used to access
the disk because the first tracks of the disks are not safe to store user
data. A partition has to be created by fdasd.

A disk in use has the state DASD_STATE_ONLINE.
To format a device the dasdfmt tool has to be called, it does the
following:

The dasdfmt tool checks if the disk is actually in use and refuses to
work on an 'in use' DASD.
So for example a partition that was in use has to be unmounted first.

Afterwards it does the following calls:

BIODASDDISABLE
  - to disable the device and prevent further usage
  - sets the disk in state DASD_STATE_BASIC
BIODASDFMT
  - does the actual formatting
  - checks if the disk is in state DASD_STATE_BASIC (if BIODASDDISABLE was
    called before)
  - this ioctl is usually called multiple times to format smaller parts of
    the disk each time
  - in the first call to this ioctl the first track (track 0) is
    invalidated (basically wiped out) and format_data_t.intensity equals
    DASD_FMT_INT_INVAL
  - the last step is to finally format the first track to indicate a
    successful formatting of the whole disk
BIODASDENABLE
  - to enable the disk again for general usage
  - sets the disk to state DASD_STATE_ONLINE again
  - NOTE: a disabled device refuses an open call, so the tooling needs to
    keep the file descriptor open.

So the assumption in this processing is that a possibly used page cache is
evicted when removing the partition from actual usage (e.g. unmounting, ..).

While writing this I get to the point that it might not be the best idea to
rely on proper tool handling only, and it might be a good idea to check for
an open count in BIODASDDISABLE as well so that the ioctls themselves are
safe to use. (While it does not make a lot of sense to use them alone.)
My assumption was that this is already done but obviously it isn't.
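
As a rough illustration of the sequence above together with the open-count
check being considered, here is a small userspace model.  It is not driver
code: the toy_* names are hypothetical, and the "more than one opener"
rule is only an assumption about how such a check might look (the
formatting tool itself keeps one descriptor open).

```c
#include <assert.h>
#include <errno.h>

/* Toy device states, named after the two states mentioned above. */
enum toy_state { TOY_STATE_BASIC, TOY_STATE_ONLINE };

struct toy_dasd {
	enum toy_state state;
	int open_count;		/* openers, including the formatting tool */
};

/* BIODASDDISABLE-like step: refuse when anyone else holds the device. */
static int toy_disable(struct toy_dasd *d)
{
	assert(d->open_count >= 1);	/* the caller itself has it open */
	if (d->open_count > 1)
		return -EBUSY;
	d->state = TOY_STATE_BASIC;
	return 0;
}

/* BIODASDFMT-like step: only allowed on a disabled device. */
static int toy_format(const struct toy_dasd *d)
{
	if (d->state != TOY_STATE_BASIC)
		return -EBUSY;
	return 0;
}

/* BIODASDENABLE-like step: back to general usage. */
static int toy_enable(struct toy_dasd *d)
{
	d->state = TOY_STATE_ONLINE;
	return 0;
}
```

In this model a second opener makes the disable step fail, so the format
step is never reached on a device that is still in use.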

> BTW, you are updating block size according to new device size, before
>          rc = base->discipline->format_device(base, fdata, 1);
> 	if (rc == -EAGAIN)
> 		rc = base->discipline->format_device(base, fdata, 0);
> Unless something very unidiomatic is going on, this attempt to
> format might fail...

This is true. I guess the idea here was that the actual formatting of
track 0 is done last after the whole disk was successfully formatted and
everything went fine.
But actually also the invalidation of the first track would do this here.

So we should not only move this after the format_device call but we should
also add a check for DASD_FMT_INT_INVAL which is the first step in the
whole formatting.


My current conclusion would be that this patch itself is fine as is but I
should submit patches later to address the findings in this discussion.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-29 14:41             ` Stefan Haberland
@ 2024-04-30  0:30               ` Al Viro
  2024-04-30 11:35                 ` Stefan Haberland
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-04-30  0:30 UTC (permalink / raw)
  To: Stefan Haberland
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

On Mon, Apr 29, 2024 at 04:41:19PM +0200, Stefan Haberland wrote:

> The dasdfmt tool checks if the disk is actually in use and refuses to
> work on an 'in use' DASD.
> So for example a partition that was in use has to be unmounted first.

Hmm...  How is that check done?  Does it open device exclusive?

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format()
  2024-04-30  0:30               ` Al Viro
@ 2024-04-30 11:35                 ` Stefan Haberland
  0 siblings, 0 replies; 116+ messages in thread
From: Stefan Haberland @ 2024-04-30 11:35 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-s390, jack, hch, brauner, axboe, linux-fsdevel,
	linux-block, yi.zhang, yangerkun, yukuai3, Yu Kuai,
	Eduard Shishkin, Alexander Gordeev, Jan Hoeppner

Am 30.04.24 um 02:30 schrieb Al Viro:
> On Mon, Apr 29, 2024 at 04:41:19PM +0200, Stefan Haberland wrote:
>
>> The dasdfmt tool checks if the disk is actually in use and refuses to
>> work on an 'in use' DASD.
>> So for example a partition that was in use has to be unmounted first.
> Hmm...  How is that check done?  Does it open device exclusive?
>

No, it just checks the open_count gathered from the driver through
another ioctl.

And yes, of course there is a race in this check: between gathering
the data and disabling the device, it could be opened.
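
For reference, a minimal sketch of what an exclusive open looks like from
userspace.  On Linux, O_EXCL without O_CREAT on a block device makes
open() fail with EBUSY if the device is in use by the system (e.g.
mounted).  Whether dasdfmt could rely on this is not established in this
thread; open_bdev_exclusive() is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Try to open a block device exclusively.  Returns a file descriptor on
 * success, or a negative errno value on failure (-EBUSY when the device
 * is already in use by the system).
 */
static int open_bdev_exclusive(const char *path)
{
	int fd;

	assert(path);
	fd = open(path, O_RDWR | O_EXCL);
	return fd < 0 ? -errno : fd;
}
```

Holding such a descriptor across the whole disable/format/enable sequence
would close the window between the check and the disable, under the
assumption that every other user of the device also claims it.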


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down
  2024-04-26  5:32           ` Gao Xiang
@ 2024-05-03  4:15             ` Al Viro
  2024-05-03 13:01               ` Gao Xiang
  0 siblings, 1 reply; 116+ messages in thread
From: Al Viro @ 2024-05-03  4:15 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3

On Fri, Apr 26, 2024 at 01:32:04PM +0800, Gao Xiang wrote:
> Hi Al,

> This patch caused some corrupted failure, since
> here erofs_read_metabuf() is EROFS_NO_KMAP and
> it's no needed to get a maped-address since only
> a page reference is needed.
> 
> >   		if (IS_ERR(mptr)) {
> >   			ret = PTR_ERR(mptr);
> >   			erofs_err(sb, "failed to get inline data %d", ret);
> > @@ -876,7 +876,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
> >   		}
> >   		get_page(map->buf.page);
> >   		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page);
> > -		fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK;
> > +		fe->pcl->pageofs_in = offset_in_page(mptr);
> 
> So it's unnecessary to change this line IMHO.

*nod*

thanks for catching that.

> BTW, would you mind routing this series through erofs tree
> with other erofs patches for -next (as long as this series
> isn't twisted with vfs and block stuffs...)?  Since I may
> need to test more to ensure they don't break anything and
> could fix them immediately by hand...

FWIW, my immediate interest here is the first couple of patches.

How about the following variant:

#misc.erofs (the first two commits) is put into never-rebased mode;
you pull it into your tree and do whatever's convenient with the rest.
I merge the same branch into block_device work; that way it doesn't
cause conflicts whatever else happens in our trees.

Are you OK with that?  At the moment I have
; git shortlog v6.9-rc2^..misc.erofs 
Al Viro (2):
      erofs: switch erofs_bread() to passing offset instead of block number
      erofs_buf: store address_space instead of inode

Linus Torvalds (1):
      Linux 6.9-rc2

IOW, it's those two commits, based at -rc2.  I can rebase that to other
starting point if that'd be more convenient for you.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down
  2024-05-03  4:15             ` Al Viro
@ 2024-05-03 13:01               ` Gao Xiang
  0 siblings, 0 replies; 116+ messages in thread
From: Gao Xiang @ 2024-05-03 13:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Yu Kuai, jack, hch, brauner, axboe, linux-fsdevel, linux-block,
	yi.zhang, yangerkun, yukuai3



On 2024/5/3 12:15, Al Viro wrote:
> On Fri, Apr 26, 2024 at 01:32:04PM +0800, Gao Xiang wrote:
>> Hi Al,
> 
>> This patch caused some corrupted failure, since
>> here erofs_read_metabuf() is EROFS_NO_KMAP and
>> it's no needed to get a maped-address since only
>> a page reference is needed.
>>
>>>    		if (IS_ERR(mptr)) {
>>>    			ret = PTR_ERR(mptr);
>>>    			erofs_err(sb, "failed to get inline data %d", ret);
>>> @@ -876,7 +876,7 @@ static int z_erofs_pcluster_begin(struct z_erofs_decompress_frontend *fe)
>>>    		}
>>>    		get_page(map->buf.page);
>>>    		WRITE_ONCE(fe->pcl->compressed_bvecs[0].page, map->buf.page);
>>> -		fe->pcl->pageofs_in = map->m_pa & ~PAGE_MASK;
>>> +		fe->pcl->pageofs_in = offset_in_page(mptr);
>>
>> So it's unnecessary to change this line IMHO.
> 
> *nod*
> 
> thanks for catching that.
> 
>> BTW, would you mind routing this series through erofs tree
>> with other erofs patches for -next (as long as this series
>> isn't twisted with vfs and block stuffs...)?  Since I may
>> need to test more to ensure they don't break anything and
>> could fix them immediately by hand...
> 
> FWIW, my immediate interest here is the first couple of patches.

Yes, the first two patches are fine by me, you could submit
directly.

> 
> How about the following variant:
> 
> #misc.erofs (the first two commits) is put into never-rebased mode;
> you pull it into your tree and do whatever's convenient with the rest.
> I merge the same branch into block_device work; that way it doesn't
> cause conflicts whatever else happens in our trees.
> 
> Are you OK with that?  At the moment I have
> ; git shortlog v6.9-rc2^..misc.erofs
> Al Viro (2):
>        erofs: switch erofs_bread() to passing offset instead of block number
>        erofs_buf: store address_space instead of inode
> 
> Linus Torvalds (1):
>        Linux 6.9-rc2
> 
> IOW, it's those two commits, based at -rc2.  I can rebase that to other
> starting point if that'd be more convenient for you.

Yeah, thanks for that.  I think I will submit two pull requests for
the next cycle; I will send the second pull request after your
vfs work has landed upstream, and it will include the remaining
patches you sent (a bit slow this week since we're on holiday here).

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 116+ messages in thread

end of thread, other threads:[~2024-05-03 13:01 UTC | newest]

Thread overview: 116+ messages
2024-04-06  9:09 [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 01/26] block: move two helpers into bdev.c Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 02/26] block: remove sync_blockdev_nowait() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 03/26] block: remove sync_blockdev_range() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 04/26] block: prevent direct access of bd_inode Yu Kuai
2024-04-07  2:22   ` Al Viro
2024-04-07  2:37     ` Yu Kuai
2024-04-11 11:12       ` Christian Brauner
2024-04-06  9:09 ` [PATCH vfs.all 05/26] block: add a helper bdev_read_folio() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 06/26] bcachefs: remove dead function bdev_sectors() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 07/26] cramfs: prevent direct access of bd_inode Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 08/26] erofs: " Yu Kuai
2024-04-07  4:05   ` Al Viro
2024-04-07  4:08     ` Al Viro
2024-04-11 16:13     ` Gao Xiang
2024-04-12  1:14       ` Yu Kuai
2024-04-25 19:56       ` Al Viro
2024-04-25 19:57         ` [PATCH 1/6] erofs: switch erofs_bread() to passing offset instead of block number Al Viro
2024-04-29  3:01           ` Gao Xiang
2024-04-25 19:58         ` [PATCH 2/6] erofs_buf: store address_space instead of inode Al Viro
2024-04-29  3:01           ` Gao Xiang
2024-04-25 19:58         ` erofs: mechanically convert erofs_read_metabuf() to offsets Al Viro
2024-04-25 19:59         ` [PATCH 4/6] erofs: don't align offset for erofs_read_metabuf() (simple cases) Al Viro
2024-04-25 19:59         ` [PATCH 5/6] erofs: don't round offset down for erofs_read_metabuf() Al Viro
2024-04-25 20:00         ` [PATCH 6/6] z_erofs_pcluster_begin(): don't bother with rounding position down Al Viro
2024-04-26  5:32           ` Gao Xiang
2024-05-03  4:15             ` Al Viro
2024-05-03 13:01               ` Gao Xiang
2024-04-25 20:08         ` [PATCH vfs.all 08/26] erofs: prevent direct access of bd_inode Al Viro
2024-04-25 21:56           ` Gao Xiang
2024-04-25 22:28             ` Al Viro
2024-04-25 23:11               ` Gao Xiang
2024-04-25 23:22         ` Gao Xiang
2024-04-06  9:09 ` [PATCH vfs.all 09/26] nilfs2: " Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 10/26] gfs2: " Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 11/26] btrfs: " Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 12/26] ext4: remove block_device_ejected() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 13/26] ext4: prevent direct access of bd_inode Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 14/26] jbd2: " Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 15/26] s390/dasd: use bdev api in dasd_format() Yu Kuai
2024-04-16  1:35   ` Al Viro
2024-04-16  8:47     ` Alexander Gordeev
2024-04-17 12:47       ` Stefan Haberland
2024-04-28 18:58         ` Al Viro
2024-04-28 23:23           ` Al Viro
2024-04-29 14:41             ` Stefan Haberland
2024-04-30  0:30               ` Al Viro
2024-04-30 11:35                 ` Stefan Haberland
2024-04-06  9:09 ` [PATCH vfs.all 16/26] bcache: prevent direct access of bd_inode Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 17/26] block2mtd: " Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 18/26] scsi: use bdev helper in scsi_bios_ptable() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 19/26] dm-vdo: convert to use bdev_file Yu Kuai
2024-04-10 10:56   ` Jan Kara
2024-04-10 17:26   ` Matthew Sakai
2024-04-10 17:40     ` Al Viro
2024-04-10 18:59       ` Matthew Sakai
2024-04-11 11:12       ` Christian Brauner
2024-04-06  9:09 ` [PATCH vfs.all 20/26] block: factor out a helper init_bdev_file() Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 21/26] block: fix module reference leakage from bdev_open_by_dev error path Yu Kuai
2024-04-11  9:16   ` (subset) " Christian Brauner
2024-04-06  9:09 ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
2024-04-06 19:42   ` Al Viro
2024-04-06 20:29     ` Al Viro
2024-04-07  1:18       ` Yu Kuai
2024-04-07  1:51         ` Al Viro
2024-04-07  2:34           ` Yu Kuai
2024-04-07  3:06             ` Al Viro
2024-04-07  3:21               ` Yu Kuai
2024-04-07  4:57                 ` Al Viro
2024-04-07  5:11                   ` Al Viro
2024-04-07  5:21                     ` Al Viro
2024-04-11 15:22                     ` Matthew Wilcox
2024-04-09  4:26                 ` Al Viro
2024-04-09  4:53                   ` Al Viro
2024-04-09  6:22                   ` Yu Kuai
2024-04-10 10:59                     ` Jan Kara
2024-04-10 22:34                       ` Al Viro
2024-04-11 11:56                         ` Christian Brauner
2024-04-11 14:04                           ` Al Viro
2024-04-11 14:49                             ` Al Viro
2024-04-11 14:53                               ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Al Viro
2024-04-11 14:53                                 ` [PATCH 02/11] use ->bd_mapping instead of ->bd_inode->i_mapping Al Viro
2024-04-11 14:53                                 ` [PATCH 03/11] grow_dev_folio(): we only want ->bd_inode->i_mapping there Al Viro
2024-04-11 14:59                                   ` Matthew Wilcox
2024-04-11 14:53                                 ` [PATCH 04/11] gfs2: more obvious initializations of mapping->host Al Viro
2024-04-11 14:53                                 ` [PATCH 05/11] blkdev_write_iter(): saner way to get inode and bdev Al Viro
2024-04-11 14:53                                 ` [PATCH 06/11] blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here Al Viro
2024-04-11 14:53                                 ` [PATCH 07/11] ext4: remove block_device_ejected() Al Viro
2024-04-11 14:53                                 ` [PATCH 08/11] block: move two helpers into bdev.c Al Viro
2024-04-11 14:53                                 ` [PATCH 09/11] dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) Al Viro
2024-04-11 18:04                                   ` Matthew Sakai
2024-04-11 14:53                                 ` [PATCH 10/11] bcachefs: remove dead function bdev_sectors() Al Viro
2024-04-11 14:53                                 ` [PATCH 11/11] block2mtd: prevent direct access of bd_inode Al Viro
2024-04-17 11:05                                 ` [PATCH 01/11] block_device: add a pointer to struct address_space (page cache of bdev) Christian Brauner
2024-04-12  1:38                               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Yu Kuai
2024-04-12  2:59                                 ` Al Viro
2024-04-12  4:41                                   ` Al Viro
2024-04-12  7:13                                     ` Al Viro
2024-04-12  9:21                             ` Christian Brauner
2024-04-12 11:29                               ` Al Viro
2024-04-13 15:25                                 ` Christian Brauner
2024-04-15 20:45                                   ` Al Viro
2024-04-16  6:32                                     ` Al Viro
2024-04-17  4:35                                       ` [PATCH][RFC] set_blocksize() in pktcdvd (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
2024-04-17 13:43                                       ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Jan Kara
2024-04-17 15:23                                         ` Al Viro
2024-04-17 20:45                                       ` [RFC] set_blocksize() in kernel/power/swap.c (was Re: [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device) Al Viro
2024-04-09  9:00               ` [PATCH vfs.all 22/26] block: stash a bdev_file to read/write raw blcok_device Christian Brauner
2024-04-09 10:23   ` Christian Brauner
2024-04-09 11:53     ` Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 23/26] iomap: add helpers helpers to get and set bdev Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 24/26] iomap: convert to use bdev_file Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 25/26] buffer: add helpers to get and set bdev Yu Kuai
2024-04-06  9:09 ` [PATCH vfs.all 26/26] buffer: convert to use bdev_file Yu Kuai
2024-04-07  2:20 ` [PATCH vfs.all 00/26] fs & block: remove bdev->bd_inode Yu Kuai
2024-04-08 14:05   ` Jan Kara
