linux-bcache.vger.kernel.org archive mirror
* [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules
@ 2023-12-05 12:37 Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
                   ` (14 more replies)
  0 siblings, 15 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Patch 1 adds some bdev APIs, and the follow-up patches use these APIs to
avoid accessing bd_inode directly, so that the bd_inode field can hopefully
be removed eventually (after figuring out a way to handle fs/buffer.c).
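
A rough userspace sketch of the encapsulation idea follows. It is not kernel code: `fake_inode`, `fake_bdev` and the `fake_*` helpers are hypothetical stand-ins for `struct inode`, `struct block_device` and accessors such as `bdev_size()` and `block_bits()`. Only the shape (callers use accessors, never the field) mirrors the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the internal inode; callers should not touch it. */
struct fake_inode {
	int64_t i_size;          /* device size in bytes */
	unsigned char i_blkbits; /* log2 of the block size */
};

/* Hypothetical stand-in for struct block_device. */
struct fake_bdev {
	struct fake_inode *bd_inode; /* the field the series wants to hide */
};

/* Accessors: the only code that knows bd_inode exists. */
static int64_t fake_bdev_size(const struct fake_bdev *bdev)
{
	return bdev->bd_inode->i_size;
}

static unsigned char fake_block_bits(const struct fake_bdev *bdev)
{
	return bdev->bd_inode->i_blkbits;
}

/* Built on another accessor, so it never dereferences bd_inode itself. */
static unsigned int fake_block_size(const struct fake_bdev *bdev)
{
	return 1u << fake_block_bits(bdev);
}
```

If every caller goes through accessors like these, the backing field can later be moved or dropped by editing only the accessor bodies, which is the property the series is after.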

Yu Kuai (14):
  block: add some bdev apis
  xen/blkback: use bdev api in xen_update_blkif_status()
  bcache: use bdev api in read_super()
  mtd: block2mtd: use bdev apis
  s390/dasd: use bdev api in dasd_format()
  scsicam: use bdev api in scsi_bios_ptable()
  bcachefs: remove dead function bdev_sectors()
  btrfs: use bdev apis
  cramfs: use bdev apis in cramfs_blkdev_read()
  erofs: use bdev api
  ext4: use bdev apis
  jbd2: use bdev apis
  gfs2: use bdev api
  nilfs2: use bdev api in nilfs_attach_log_writer()

 block/bdev.c                       | 116 +++++++++++++++++++++++++++++
 block/bio.c                        |   1 +
 block/blk.h                        |   2 -
 drivers/block/xen-blkback/xenbus.c |   3 +-
 drivers/md/bcache/super.c          |  11 ++-
 drivers/mtd/devices/block2mtd.c    |  80 +++++++++-----------
 drivers/s390/block/dasd_ioctl.c    |   5 +-
 drivers/scsi/scsicam.c             |   3 +-
 fs/bcachefs/util.h                 |   5 --
 fs/btrfs/disk-io.c                 |  68 ++++++++---------
 fs/btrfs/volumes.c                 |  17 ++---
 fs/btrfs/zoned.c                   |  12 ++-
 fs/cramfs/inode.c                  |  35 +++------
 fs/erofs/data.c                    |  17 +++--
 fs/erofs/internal.h                |   1 +
 fs/ext4/dir.c                      |   6 +-
 fs/ext4/ext4_jbd2.c                |   6 +-
 fs/ext4/super.c                    |  27 +------
 fs/gfs2/glock.c                    |   2 +-
 fs/gfs2/ops_fstype.c               |   2 +-
 fs/jbd2/journal.c                  |   3 +-
 fs/jbd2/recovery.c                 |   6 +-
 fs/nilfs2/segment.c                |   2 +-
 include/linux/blkdev.h             |  27 +++++++
 include/linux/buffer_head.h        |   5 +-
 25 files changed, 273 insertions(+), 189 deletions(-)

-- 
2.39.2


* [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 17:03   ` Bart Van Assche
                     ` (2 more replies)
  2023-12-05 12:37 ` [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

These APIs will be used by other modules, so that those modules no longer
access bd_inode directly.
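
As a userspace model of `bdev_size()` from the diff below: the reader takes the same lock a writer holds while updating the size. This is a sketch under assumptions: a C11 `atomic_flag` spinlock stands in for `bd_size_lock`, a plain `int64_t` stands in for `i_size_read(bdev->bd_inode)`, and `struct model_bdev` with its helpers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of the fields bdev_size() touches. */
struct model_bdev {
	atomic_flag size_lock; /* stand-in for bd_size_lock */
	int64_t size;          /* stand-in for the bd_inode size */
};

static void model_lock(atomic_flag *lock)
{
	/* Spin until this caller is the one that set the flag. */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;
}

static void model_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Same shape as bdev_size(): lock, copy the size, unlock, return the copy. */
static int64_t model_bdev_size(struct model_bdev *bdev)
{
	int64_t size;

	model_lock(&bdev->size_lock);
	size = bdev->size;
	model_unlock(&bdev->size_lock);
	return size;
}

/* A writer updates the size under the same lock. */
static void model_bdev_set_size(struct model_bdev *bdev, int64_t size)
{
	model_lock(&bdev->size_lock);
	bdev->size = size;
	model_unlock(&bdev->size_lock);
}
```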

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           | 116 +++++++++++++++++++++++++++++++++++++++++
 block/bio.c            |   1 +
 block/blk.h            |   2 -
 include/linux/blkdev.h |  27 ++++++++++
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 6f73b02d549c..fcba5c1bd113 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -92,6 +92,13 @@ void invalidate_bdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(invalidate_bdev);
 
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+			   pgoff_t end)
+{
+	invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
+}
+EXPORT_SYMBOL_GPL(invalidate_bdev_range);
+
 /*
  * Drop all buffers & page cache for given bdev range. This function bails
  * with error if bdev has other exclusive owner (such as filesystem).
@@ -124,6 +131,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 					     lstart >> PAGE_SHIFT,
 					     lend >> PAGE_SHIFT);
 }
+EXPORT_SYMBOL_GPL(truncate_bdev_range);
 
 static void set_init_blocksize(struct block_device *bdev)
 {
@@ -138,6 +146,18 @@ static void set_init_blocksize(struct block_device *bdev)
 	bdev->bd_inode->i_blkbits = blksize_bits(bsize);
 }
 
+loff_t bdev_size(struct block_device *bdev)
+{
+	loff_t size;
+
+	spin_lock(&bdev->bd_size_lock);
+	size = i_size_read(bdev->bd_inode);
+	spin_unlock(&bdev->bd_size_lock);
+
+	return size;
+}
+EXPORT_SYMBOL_GPL(bdev_size);
+
 int set_blocksize(struct block_device *bdev, int size)
 {
 	/* Size must be a power of two, and between 512 and PAGE_SIZE */
@@ -1144,3 +1164,99 @@ static int __init setup_bdev_allow_write_mounted(char *str)
 	return 1;
 }
 __setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
+
+struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
+{
+	return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio);
+
+struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
+				  gfp_t gfp)
+{
+	return mapping_read_folio_gfp(bdev->bd_inode->i_mapping, index, gfp);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio_gfp);
+
+struct folio *bdev_get_folio(struct block_device *bdev, pgoff_t index)
+{
+	return filemap_get_folio(bdev->bd_inode->i_mapping, index);
+}
+EXPORT_SYMBOL_GPL(bdev_get_folio);
+
+struct folio *bdev_find_or_create_folio(struct block_device *bdev,
+					pgoff_t index, gfp_t gfp)
+{
+	return __filemap_get_folio(bdev->bd_inode->i_mapping, index,
+				   FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
+}
+EXPORT_SYMBOL_GPL(bdev_find_or_create_folio);
+
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since)
+{
+	return errseq_check(&bdev->bd_inode->i_mapping->wb_err, since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check);
+
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since)
+{
+	return errseq_check_and_advance(&bdev->bd_inode->i_mapping->wb_err,
+					since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check_and_advance);
+
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
+{
+	balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
+}
+EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);
+
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+			 struct file *file, pgoff_t index,
+			 unsigned long req_count)
+{
+	struct file_ra_state tmp_ra = {};
+
+	if (!ra) {
+		ra = &tmp_ra;
+		file_ra_state_init(ra, bdev->bd_inode->i_mapping);
+	}
+	page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index,
+				  req_count);
+}
+EXPORT_SYMBOL_GPL(bdev_sync_readahead);
+
+void bdev_attach_wb(struct block_device *bdev)
+{
+	inode_attach_wb(bdev->bd_inode, NULL);
+}
+EXPORT_SYMBOL_GPL(bdev_attach_wb);
+
+void bdev_correlate_mapping(struct block_device *bdev,
+			    struct address_space *mapping)
+{
+	mapping->host = bdev->bd_inode;
+}
+EXPORT_SYMBOL_GPL(bdev_correlate_mapping);
+
+gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp)
+{
+	return mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
+}
+EXPORT_SYMBOL_GPL(bdev_gfp_constraint);
+
+/*
+ * The del_gendisk() function uninitializes the disk-specific data
+ * structures, including the bdi structure, without telling anyone
+ * else.  Once this happens, any attempt to call mark_buffer_dirty()
+ * (for example, by ext4_commit_super), will cause a kernel OOPS.
+ * This is a kludge to prevent these oops until we can put in a proper
+ * hook in del_gendisk() to inform the VFS and file system layers.
+ */
+int bdev_ejected(struct block_device *bdev)
+{
+	struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
+
+	return bdi->dev == NULL;
+}
+EXPORT_SYMBOL_GPL(bdev_ejected);
diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..f7123ad9b4ee 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1119,6 +1119,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
 	WARN_ON_ONCE(off > UINT_MAX);
 	__bio_add_page(bio, &folio->page, len, off);
 }
+EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
 
 /**
  * bio_add_folio - Attempt to add part of a folio to a bio.
diff --git a/block/blk.h b/block/blk.h
index 08a358bc0919..da4becd4f7e9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -467,8 +467,6 @@ extern struct device_attribute dev_attr_events_poll_msecs;
 extern struct attribute_group blk_trace_attr_group;
 
 blk_mode_t file_to_blk_mode(struct file *file);
-int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
-		loff_t lstart, loff_t lend);
 long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f8a21cd9233..a55db77274a4 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1342,6 +1342,11 @@ static inline unsigned int block_size(struct block_device *bdev)
 	return 1 << bdev->bd_inode->i_blkbits;
 }
 
+static inline u8 block_bits(struct block_device *bdev)
+{
+	return bdev->bd_inode->i_blkbits;
+}
+
 int kblockd_schedule_work(struct work_struct *work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay);
 
@@ -1515,6 +1520,28 @@ struct block_device *blkdev_get_no_open(dev_t dev);
 void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
+loff_t bdev_size(struct block_device *bdev);
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+			   pgoff_t end);
+int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
+		loff_t lstart, loff_t lend);
+struct folio *bdev_get_folio(struct block_device *bdev, pgoff_t index);
+struct folio *bdev_find_or_create_folio(struct block_device *bdev,
+					pgoff_t index, gfp_t gfp);
+struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index);
+struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
+				  gfp_t gfp);
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since);
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since);
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev);
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+			 struct file *file, pgoff_t index,
+			 unsigned long req_count);
+void bdev_attach_wb(struct block_device *bdev);
+void bdev_correlate_mapping(struct block_device *bdev,
+			    struct address_space *mapping);
+gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp);
+int bdev_ejected(struct block_device *bdev);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);

* [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-06  5:55   ` Christoph Hellwig
  2023-12-05 12:37 ` [PATCH -next RFC 03/14] bcache: use bdev api in read_super() Yu Kuai
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/xen-blkback/xenbus.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e34219ea2b05..e645afa4af57 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 		xenbus_dev_error(blkif->be->dev, err, "block flush");
 		return;
 	}
-	invalidate_inode_pages2(
-			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
+	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
 
 	for (i = 0; i < blkif->nr_rings; i++) {
 		ring = &blkif->rings[i];

* [PATCH -next RFC 03/14] bcache: use bdev api in read_super()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 04/14] mtd: block2mtd: use bdev apis Yu Kuai
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Convert to using folios when reading from the bdev inode, and avoid
accessing bd_inode directly.
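
The index/offset arithmetic behind this conversion can be sketched in plain C. Assumptions: 4 KiB pages, and the `model_*` helpers are hypothetical stand-ins for `pos >> PAGE_SHIFT`, `offset_in_page()` and `offset_in_folio()`. Because folios are naturally aligned and `folio_address()` points at the start of the folio, the offset has to be taken relative to the folio size rather than the page size once a folio can span more than one page.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumed 4 KiB pages; PAGE_SHIFT is arch dependent */
#define MODEL_PAGE_SIZE  (1ULL << MODEL_PAGE_SHIFT)

/* Page-cache index containing byte offset pos, like pos >> PAGE_SHIFT. */
static uint64_t model_index(uint64_t pos)
{
	return pos >> MODEL_PAGE_SHIFT;
}

/* Offset of pos within its page, like offset_in_page(). */
static uint64_t model_offset_in_page(uint64_t pos)
{
	return pos & (MODEL_PAGE_SIZE - 1);
}

/* Offset of pos within a naturally aligned folio of folio_bytes
 * (a power of two), like offset_in_folio(). */
static uint64_t model_offset_in_folio(uint64_t pos, uint64_t folio_bytes)
{
	return pos & (folio_bytes - 1);
}
```

For example, with 4 KiB pages a byte offset of 5000 is 904 bytes into page 1, but 5000 bytes into a 16 KiB folio that starts at page 0.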

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/bcache/super.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1402096b8076..376b9dc2523f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -168,14 +168,13 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 {
 	const char *err;
 	struct cache_sb_disk *s;
-	struct page *page;
+	struct folio *folio;
 	unsigned int i;
 
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
-				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
-	if (IS_ERR(page))
+	folio = bdev_read_folio_gfp(bdev, SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
+	if (IS_ERR(folio))
 		return "IO error";
-	s = page_address(page) + offset_in_page(SB_OFFSET);
+	s = folio_address(folio) + offset_in_folio(folio, SB_OFFSET);
 
 	sb->offset		= le64_to_cpu(s->offset);
 	sb->version		= le64_to_cpu(s->version);
@@ -272,7 +271,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 	*res = s;
 	return NULL;
 err:
-	put_page(page);
+	folio_put(folio);
 	return err;
 }
 

* [PATCH -next RFC 04/14] mtd: block2mtd: use bdev apis
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (2 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 03/14] bcache: use bdev api in read_super() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 05/14] s390/dasd: use bdev api in dasd_format() Yu Kuai
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Convert to using folios when reading from the bdev inode, and avoid
accessing bd_inode directly.
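
The read path in the diff below keeps block2mtd's chunking: each iteration copies at most up to the end of the current page, then moves to the next index at offset 0. A self-contained userspace sketch of that loop, assuming 4 KiB pages and a flat byte array in place of the page cache (`model_dev`, `model_read_page()` and `model_read()` are hypothetical stand-ins for the device, `bdev_read_folio()` and `block2mtd_read()`):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed page size */

/* Hypothetical backing store standing in for the block device's page cache. */
struct model_dev {
	unsigned char *data;
	size_t size;
};

/* Stand-in for bdev_read_folio(): start of page index, or NULL past the end. */
static const unsigned char *model_read_page(const struct model_dev *dev,
					    size_t index)
{
	if ((index + 1) * MODEL_PAGE_SIZE > dev->size)
		return NULL;
	return dev->data + index * MODEL_PAGE_SIZE;
}

/* Same chunking as block2mtd_read(): never copy across a page boundary. */
static int model_read(const struct model_dev *dev, size_t from, size_t len,
		      size_t *retlen, unsigned char *buf)
{
	size_t index = from / MODEL_PAGE_SIZE;
	size_t offset = from % MODEL_PAGE_SIZE;

	if (retlen)
		*retlen = 0;
	while (len) {
		size_t cpylen = MODEL_PAGE_SIZE - offset; /* rest of this page */
		const unsigned char *page;

		if (cpylen > len)
			cpylen = len;
		len -= cpylen;

		page = model_read_page(dev, index);
		if (!page)
			return -1; /* the kernel code returns PTR_ERR() here */
		memcpy(buf, page + offset, cpylen);

		if (retlen)
			*retlen += cpylen;
		buf += cpylen;
		offset = 0; /* later pages are read from their start */
		index++;
	}
	return 0;
}
```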

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/mtd/devices/block2mtd.c | 80 +++++++++++++++------------------
 1 file changed, 35 insertions(+), 45 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index aa44a23ec045..927fc9cf0856 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -46,40 +46,34 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, pgoff_t index)
-{
-	return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
-	struct page *page;
+	struct block_device *bdev = dev->bdev_handle->bdev;
+	struct folio *folio;
 	pgoff_t index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
 	u_long *p;
 	u_long *max;
 
 	while (pages) {
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(bdev, index);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		max = page_address(page) + PAGE_SIZE;
-		for (p=page_address(page); p<max; p++)
+		max = folio_address(folio) + folio_size(folio);
+		for (p = folio_address(folio); p < max; p++)
 			if (*p != -1UL) {
-				lock_page(page);
-				memset(page_address(page), 0xff, PAGE_SIZE);
-				set_page_dirty(page);
-				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				folio_lock(folio);
+				memset(folio_address(folio), 0xff,
+				       folio_size(folio));
+				folio_mark_dirty(folio);
+				folio_unlock(folio);
+				bdev_balance_dirty_pages_ratelimited(bdev);
 				break;
 			}
 
-		put_page(page);
+		folio_put(folio);
 		pages--;
 		index++;
 	}
@@ -106,9 +100,7 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 		size_t *retlen, u_char *buf)
 {
 	struct block2mtd_dev *dev = mtd->priv;
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
-	struct page *page;
+	struct folio *folio;
 	pgoff_t index = from >> PAGE_SHIFT;
 	int offset = from & (PAGE_SIZE-1);
 	int cpylen;
@@ -120,12 +112,12 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 			cpylen = len;	// this page
 		len = len - cpylen;
 
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(dev->bdev_handle->bdev, index);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		memcpy(buf, page_address(page) + offset, cpylen);
-		put_page(page);
+		memcpy(buf, folio_address(folio) + offset, cpylen);
+		folio_put(folio);
 
 		if (retlen)
 			*retlen += cpylen;
@@ -141,9 +133,8 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
-	struct page *page;
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
+	struct block_device *bdev = dev->bdev_handle->bdev;
+	struct folio *folio;
 	pgoff_t index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -155,18 +146,18 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			cpylen = len;			// this page
 		len = len - cpylen;
 
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(bdev, index);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		if (memcmp(page_address(page)+offset, buf, cpylen)) {
-			lock_page(page);
-			memcpy(page_address(page) + offset, buf, cpylen);
-			set_page_dirty(page);
-			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+		if (memcmp(folio_address(folio) + offset, buf, cpylen)) {
+			folio_lock(folio);
+			memcpy(folio_address(folio) + offset, buf, cpylen);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+			bdev_balance_dirty_pages_ratelimited(bdev);
 		}
-		put_page(page);
+		folio_put(folio);
 
 		if (retlen)
 			*retlen += cpylen;
@@ -211,8 +202,7 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
 	kfree(dev->mtd.name);
 
 	if (dev->bdev_handle) {
-		invalidate_mapping_pages(
-			dev->bdev_handle->bdev->bd_inode->i_mapping, 0, -1);
+		invalidate_bdev(dev->bdev_handle->bdev);
 		bdev_release(dev->bdev_handle);
 	}
 
@@ -295,7 +285,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		goto err_free_block2mtd;
 	}
 
-	if ((long)bdev->bd_inode->i_size % erase_size) {
+	if ((long)bdev_size(bdev) % erase_size) {
 		pr_err("erasesize must be a divisor of device size\n");
 		goto err_free_block2mtd;
 	}
@@ -313,7 +303,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 
 	dev->mtd.name = name;
 
-	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+	dev->mtd.size = bdev_size(bdev) & PAGE_MASK;
 	dev->mtd.erasesize = erase_size;
 	dev->mtd.writesize = 1;
 	dev->mtd.writebufsize = PAGE_SIZE;

* [PATCH -next RFC 05/14] s390/dasd: use bdev api in dasd_format()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (3 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 04/14] mtd: block2mtd: use bdev apis Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 06/14] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from block_device.
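
Switching from a raw `i_blkbits` store to `set_blocksize()` also adds validation, which is why the new code checks `rc`. The sketch below models those checks as described by the comment in `set_blocksize()` (power of two, between 512 and `PAGE_SIZE`) plus, as we understand the current code, a check against the device's logical block size. 4 KiB pages and the `model_*` names are assumptions, not kernel API.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096u /* assumed; PAGE_SIZE is arch dependent */

/* log2 of a power-of-two block size, like blksize_bits() for such inputs. */
static unsigned int model_blksize_bits(unsigned int size)
{
	unsigned int bits = 0;

	while ((1u << bits) < size)
		bits++;
	return bits;
}

/* Validate size, then record log2(size) the way i_blkbits would be set. */
static int model_set_blocksize(unsigned char *blkbits, unsigned int logical_bs,
			       unsigned int size)
{
	/* Power of two, and between 512 and PAGE_SIZE. */
	if (size > MODEL_PAGE_SIZE || size < 512 || (size & (size - 1)))
		return -EINVAL;
	/* Not smaller than what the device can address. */
	if (size < logical_bs)
		return -EINVAL;
	*blkbits = (unsigned char)model_blksize_bits(size);
	return 0;
}
```

Unlike the old direct assignment, an invalid `fdata->blksize` now makes `dasd_format()` return early with an error before `format_device` is called.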

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/s390/block/dasd_ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 61b9675e2a67..bbfb958237e6 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -221,8 +221,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
 	 * enabling the device later.
 	 */
 	if (fdata->start_unit == 0) {
-		block->gdp->part0->bd_inode->i_blkbits =
-			blksize_bits(fdata->blksize);
+		rc = set_blocksize(block->gdp->part0, fdata->blksize);
+		if (rc)
+			return rc;
 	}
 
 	rc = base->discipline->format_device(base, fdata, 1);

* [PATCH -next RFC 06/14] scsicam: use bdev api in scsi_bios_ptable()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (4 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 05/14] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 07/14] bcachefs: remove dead function bdev_sectors() Yu Kuai
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/scsi/scsicam.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..1c99b964a0eb 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,11 +32,10 @@
  */
 unsigned char *scsi_bios_ptable(struct block_device *dev)
 {
-	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
 	unsigned char *res = NULL;
 	struct folio *folio;
 
-	folio = read_mapping_folio(mapping, 0, NULL);
+	folio = bdev_read_folio(bdev_whole(dev), 0);
 	if (IS_ERR(folio))
 		return NULL;
 

* [PATCH -next RFC 07/14] bcachefs: remove dead function bdev_sectors()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (5 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 06/14] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 08/14] btrfs: use bdev apis Yu Kuai
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

bdev_sectors() is unused, so remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/bcachefs/util.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index b93d5f481c7e..932ca6f7a37b 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -541,11 +541,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
 void bch2_bio_map(struct bio *bio, void *base, size_t);
 int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
 
-static inline sector_t bdev_sectors(struct block_device *bdev)
-{
-	return bdev->bd_inode->i_size >> 9;
-}
-
 #define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\

* [PATCH -next RFC 08/14] btrfs: use bdev apis
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (6 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 07/14] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 09/14] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC

From: Yu Kuai <yukuai3@huawei.com>

Convert to using folios when reading from the bdev inode, and avoid
accessing bd_inode directly.
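
For context on the page ranges `btrfs_read_dev_one_super()` hands to `invalidate_bdev_range()`, here is a userspace sketch of the superblock offset arithmetic. The constants (primary copy at 64 KiB, 4 KiB superblock, mirror shift of 12) are recalled from the btrfs headers and should be treated as assumptions, as should 4 KiB pages; `model_sb_offset()` is modeled on `btrfs_sb_offset()` but is not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, recalled from the btrfs headers. */
#define MODEL_SUPER_INFO_OFFSET  (64ULL * 1024) /* primary copy at 64 KiB */
#define MODEL_SUPER_INFO_SIZE    4096ULL        /* bytes per superblock copy */
#define MODEL_SUPER_MIRROR_SHIFT 12
#define MODEL_PAGE_SHIFT         12             /* assumed 4 KiB pages */

/* Mirror 0 sits at 64 KiB; mirror N > 0 is 16 KiB shifted by 12 * N bits. */
static uint64_t model_sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (MODEL_SUPER_MIRROR_SHIFT * mirror);
	return MODEL_SUPER_INFO_OFFSET;
}

/* Page indices computed by the patch's invalidate_bdev_range() call. */
static void model_sb_page_range(uint64_t bytenr, uint64_t *start,
				uint64_t *end)
{
	*start = bytenr >> MODEL_PAGE_SHIFT;
	*end = (bytenr + MODEL_SUPER_INFO_SIZE) >> MODEL_PAGE_SHIFT;
}
```

With these values the copies land at 64 KiB, 64 MiB and 256 GiB, and the range computed for the primary copy is page indices 16 through 17.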

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/btrfs/disk-io.c | 68 ++++++++++++++++++++--------------------------
 fs/btrfs/volumes.c | 17 ++++++------
 fs/btrfs/zoned.c   | 12 ++++----
 3 files changed, 42 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9317606017e2..cfe7ea417760 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3597,28 +3597,24 @@ ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 static void btrfs_end_super_write(struct bio *bio)
 {
 	struct btrfs_device *device = bio->bi_private;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
-	struct page *page;
-
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		page = bvec->bv_page;
+	struct folio_iter fi;
 
+	bio_for_each_folio_all(fi, bio) {
 		if (bio->bi_status) {
 			btrfs_warn_rl_in_rcu(device->fs_info,
 				"lost page write due to IO error on %s (%d)",
 				btrfs_dev_name(device),
 				blk_status_to_errno(bio->bi_status));
-			ClearPageUptodate(page);
-			SetPageError(page);
+			folio_clear_uptodate(fi.folio);
+			folio_set_error(fi.folio);
 			btrfs_dev_stat_inc_and_print(device,
 						     BTRFS_DEV_STAT_WRITE_ERRS);
 		} else {
-			SetPageUptodate(page);
+			folio_mark_uptodate(fi.folio);
 		}
 
-		put_page(page);
-		unlock_page(page);
+		folio_put(fi.folio);
+		folio_unlock(fi.folio);
 	}
 
 	bio_put(bio);
@@ -3628,9 +3624,8 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 						   int copy_num, bool drop_cache)
 {
 	struct btrfs_super_block *super;
-	struct page *page;
+	struct folio *folio;
 	u64 bytenr, bytenr_orig;
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
 	int ret;
 
 	bytenr_orig = btrfs_sb_offset(copy_num);
@@ -3651,16 +3646,15 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 		 * Drop the page of the primary superblock, so later read will
 		 * always read from the device.
 		 */
-		invalidate_inode_pages2_range(mapping,
-				bytenr >> PAGE_SHIFT,
+		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
 				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
 	}
 
-	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
-	if (IS_ERR(page))
-		return ERR_CAST(page);
+	folio = bdev_read_folio_gfp(bdev, bytenr >> PAGE_SHIFT, GFP_NOFS);
+	if (IS_ERR(folio))
+		return ERR_CAST(folio);
 
-	super = page_address(page);
+	super = folio_address(folio);
 	if (btrfs_super_magic(super) != BTRFS_MAGIC) {
 		btrfs_release_disk_super(super);
 		return ERR_PTR(-ENODATA);
@@ -3717,7 +3711,6 @@ static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
 	int errors = 0;
@@ -3730,7 +3723,7 @@ static int write_dev_supers(struct btrfs_device *device,
 	shash->tfm = fs_info->csum_shash;
 
 	for (i = 0; i < max_mirrors; i++) {
-		struct page *page;
+		struct folio *folio;
 		struct bio *bio;
 		struct btrfs_super_block *disk_super;
 
@@ -3755,9 +3748,10 @@ static int write_dev_supers(struct btrfs_device *device,
 				    BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE,
 				    sb->csum);
 
-		page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT,
-					   GFP_NOFS);
-		if (!page) {
+		folio = bdev_find_or_create_folio(device->bdev,
+						  bytenr >> PAGE_SHIFT,
+						  GFP_NOFS);
+		if (IS_ERR(folio)) {
 			btrfs_err(device->fs_info,
 			    "couldn't get super block page for bytenr %llu",
 			    bytenr);
@@ -3766,9 +3760,9 @@ static int write_dev_supers(struct btrfs_device *device,
 		}
 
 		/* Bump the refcount for wait_dev_supers() */
-		get_page(page);
+		folio_get(folio);
 
-		disk_super = page_address(page);
+		disk_super = folio_address(folio);
 		memcpy(disk_super, sb, BTRFS_SUPER_INFO_SIZE);
 
 		/*
@@ -3782,8 +3776,8 @@ static int write_dev_supers(struct btrfs_device *device,
 		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
 		bio->bi_private = device;
 		bio->bi_end_io = btrfs_end_super_write;
-		__bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
-			       offset_in_page(bytenr));
+		bio_add_folio_nofail(bio, folio, BTRFS_SUPER_INFO_SIZE,
+				     offset_in_folio(folio, bytenr));
 
 		/*
 		 * We FUA only the first super block.  The others we allow to
@@ -3819,7 +3813,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		max_mirrors = BTRFS_SUPER_MIRROR_MAX;
 
 	for (i = 0; i < max_mirrors; i++) {
-		struct page *page;
+		struct folio *folio;
 
 		ret = btrfs_sb_log_location(device, i, READ, &bytenr);
 		if (ret == -ENOENT) {
@@ -3834,27 +3828,23 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		    device->commit_total_bytes)
 			break;
 
-		page = find_get_page(device->bdev->bd_inode->i_mapping,
-				     bytenr >> PAGE_SHIFT);
-		if (!page) {
+		folio = bdev_get_folio(device->bdev, bytenr >> PAGE_SHIFT);
+		if (!folio) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 			continue;
 		}
 		/* Page is submitted locked and unlocked once the IO completes */
-		wait_on_page_locked(page);
-		if (PageError(page)) {
+		folio_wait_locked(folio);
+		if (folio_test_error(folio)) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 		}
 
-		/* Drop our reference */
-		put_page(page);
-
-		/* Drop the reference from the writing run */
-		put_page(page);
+		/* Drop our reference and the reference from the writing run */
+		folio_put_refs(folio, 2);
 	}
 
 	/* log error, force error return */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1cc6b5d5eb61..3930495aebd1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1230,16 +1230,16 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 
 void btrfs_release_disk_super(struct btrfs_super_block *super)
 {
-	struct page *page = virt_to_page(super);
+	struct folio *folio = virt_to_folio(super);
 
-	put_page(page);
+	folio_put(folio);
 }
 
 static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
 						       u64 bytenr, u64 bytenr_orig)
 {
 	struct btrfs_super_block *disk_super;
-	struct page *page;
+	struct folio *folio;
 	void *p;
 	pgoff_t index;
 
@@ -1257,15 +1257,14 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 		return ERR_PTR(-EINVAL);
 
 	/* pull in the page with our super */
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
+	folio = bdev_read_folio_gfp(bdev, index, GFP_KERNEL);
+	if (IS_ERR(folio))
+		return ERR_CAST(folio);
 
-	if (IS_ERR(page))
-		return ERR_CAST(page);
-
-	p = page_address(page);
+	p = folio_address(folio);
 
 	/* align our pointer to the offset of the super block */
-	disk_super = p + offset_in_page(bytenr);
+	disk_super = p + offset_in_folio(folio, bytenr);
 
 	if (btrfs_super_bytenr(disk_super) != bytenr_orig ||
 	    btrfs_super_magic(disk_super) != BTRFS_MAGIC) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 12066afc235c..77d5f906ff16 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -120,8 +120,6 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 		return -ENOENT;
 	} else if (full[0] && full[1]) {
 		/* Compare two super blocks */
-		struct address_space *mapping = bdev->bd_inode->i_mapping;
-		struct page *page[BTRFS_NR_SB_LOG_ZONES];
 		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
 		int i;
 
@@ -129,15 +127,15 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 			u64 zone_end = (zones[i].start + zones[i].capacity) << SECTOR_SHIFT;
 			u64 bytenr = ALIGN_DOWN(zone_end, BTRFS_SUPER_INFO_SIZE) -
 						BTRFS_SUPER_INFO_SIZE;
-
-			page[i] = read_cache_page_gfp(mapping,
+			struct folio *folio = bdev_read_folio_gfp(bdev,
 					bytenr >> PAGE_SHIFT, GFP_NOFS);
-			if (IS_ERR(page[i])) {
+
+			if (IS_ERR(folio)) {
 				if (i == 1)
 					btrfs_release_disk_super(super[0]);
-				return PTR_ERR(page[i]);
+				return PTR_ERR(folio);
 			}
-			super[i] = page_address(page[i]);
+			super[i] = folio_address(folio);
 		}
 
 		if (btrfs_super_generation(super[0]) >
-- 
2.39.2

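In wait_dev_supers() above, the two put_page() calls are folded into a single folio_put_refs(folio, 2), which in the kernel subtracts both references in one step and frees the folio only if the count reaches zero. The toy userspace model below (hypothetical toy_folio type, not the kernel's struct folio) shows that both forms leave the count in the same state. The starting count of one reference held by the page cache is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for a folio's reference count. */
struct toy_folio {
	int refs;
	bool freed;
};

static void toy_folio_get(struct toy_folio *f)
{
	assert(!f->freed);
	f->refs++;
}

/* Drop 'nr' references at once; "free" the folio when the count hits zero. */
static void toy_folio_put_refs(struct toy_folio *f, int nr)
{
	assert(nr > 0 && f->refs >= nr);
	f->refs -= nr;
	if (f->refs == 0)
		f->freed = true;
}

/* A single put is just put_refs with nr == 1. */
static void toy_folio_put(struct toy_folio *f)
{
	toy_folio_put_refs(f, 1);
}
```

Collapsing the two puts keeps the comment and the code in one place ("our reference and the reference from the writing run") and avoids a window where only one of the two references has been dropped.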


* [PATCH -next RFC 09/14] cramfs: use bdev apis in cramfs_blkdev_read()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (7 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 08/14] btrfs: use bdev apis Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 10/14] erofs: use bdev api Yu Kuai
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Convert cramfs_blkdev_read() to use folios when reading from the bdev
page cache, and stop accessing bd_inode directly.

Also clean up the function: merge the two for loops into one, which
removes the need for the local pages array.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/cramfs/inode.c | 35 ++++++++++++-----------------------
 1 file changed, 12 insertions(+), 23 deletions(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 60dbfa0f8805..46ff4e5506fd 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -183,9 +183,6 @@ static int next_buffer;
 static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 				unsigned int len)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
-	struct file_ra_state ra = {};
-	struct page *pages[BLKS_PER_BUF];
 	unsigned i, blocknr, buffer;
 	unsigned long devsize;
 	char *data;
@@ -214,37 +211,29 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 	devsize = bdev_nr_bytes(sb->s_bdev) >> PAGE_SHIFT;
 
 	/* Ok, read in BLKS_PER_BUF pages completely first. */
-	file_ra_state_init(&ra, mapping);
-	page_cache_sync_readahead(mapping, &ra, NULL, blocknr, BLKS_PER_BUF);
-
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = NULL;
-
-		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
-			/* synchronous error? */
-			if (IS_ERR(page))
-				page = NULL;
-		}
-		pages[i] = page;
-	}
+	bdev_sync_readahead(sb->s_bdev, NULL, NULL, blocknr, BLKS_PER_BUF);
 
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;
 	buffer_dev[buffer] = sb;
-
 	data = read_buffers[buffer];
+
 	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
+		struct folio *folio = NULL;
+
+		if (blocknr + i < devsize)
+			folio = bdev_read_folio(sb->s_bdev, blocknr + i);
 
-		if (page) {
-			memcpy_from_page(data, page, 0, PAGE_SIZE);
-			put_page(page);
-		} else
+		if (IS_ERR_OR_NULL(folio)) {
 			memset(data, 0, PAGE_SIZE);
+		} else {
+			memcpy_from_folio(data, folio, 0, PAGE_SIZE);
+			folio_put(folio);
+		}
 		data += PAGE_SIZE;
 	}
+
 	return read_buffers[buffer] + offset;
 }
 
-- 
2.39.2

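The merged loop in cramfs_blkdev_read() above copies each page that lies inside the device and zero-fills pages past the end of the device as well as pages whose read fails (the IS_ERR_OR_NULL() case). A minimal userspace sketch of that control flow follows. read_page_fn, struct mem_dev, mem_read_page() and fill_buffer() are hypothetical names, a 4 KiB page size is assumed, and a NULL return stands in for both the NULL and ERR_PTR cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-in for bdev_read_folio(): page data, or NULL on failure. */
typedef const unsigned char *(*read_page_fn)(void *ctx, unsigned long index);

/* Hypothetical in-memory "device" made of nr_pages pages. */
struct mem_dev {
	const unsigned char *bytes;
	unsigned long nr_pages;
};

static const unsigned char *mem_read_page(void *ctx, unsigned long index)
{
	const struct mem_dev *dev = ctx;

	if (index >= dev->nr_pages)
		return NULL; /* behaves like a failed read */
	return dev->bytes + index * SKETCH_PAGE_SIZE;
}

/*
 * Fill nr_pages pages of 'data', starting at page 'blocknr'. Pages at or
 * past 'devsize' (in pages) are never read; they, and pages whose read
 * fails, are zero-filled. A single loop does both the read and the copy.
 */
static void fill_buffer(unsigned char *data, unsigned long blocknr,
			unsigned int nr_pages, unsigned long devsize,
			read_page_fn read_page, void *ctx)
{
	assert(data != NULL && read_page != NULL);

	for (unsigned int i = 0; i < nr_pages; i++) {
		const unsigned char *page = NULL;

		if (blocknr + i < devsize)
			page = read_page(ctx, blocknr + i);

		if (page == NULL)
			memset(data, 0, SKETCH_PAGE_SIZE);
		else
			memcpy(data, page, SKETCH_PAGE_SIZE);
		data += SKETCH_PAGE_SIZE;
	}
}
```

Folding the two loops together also means each page reference is dropped right after its copy, rather than being held in a pages[] array for the whole call.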


* [PATCH -next RFC 10/14] erofs: use bdev api
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (8 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 09/14] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:37 ` [PATCH -next RFC 11/14] ext4: use bdev apis Yu Kuai
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from struct block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/erofs/data.c     | 17 +++++++++++------
 fs/erofs/internal.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index c98aeda8abb2..b9d2c90f9b22 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
 void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 		  enum erofs_kmap_type type)
 {
-	struct inode *inode = buf->inode;
-	erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
+	u8 blkbits = buf->inode ? buf->inode->i_blkbits : block_bits(buf->bdev);
+	erofs_off_t offset = (erofs_off_t)blkaddr << blkbits;
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
 	struct folio *folio;
@@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 		erofs_put_metabuf(buf);
 
 		nofs_flag = memalloc_nofs_save();
-		folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
+		folio = buf->inode ?
+			read_mapping_folio(buf->inode->i_mapping, index, NULL) :
+			bdev_read_folio(buf->bdev, index);
 		memalloc_nofs_restore(nofs_flag);
 		if (IS_ERR(folio))
 			return folio;
@@ -67,10 +69,13 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 
 void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 {
-	if (erofs_is_fscache_mode(sb))
+	if (erofs_is_fscache_mode(sb)) {
 		buf->inode = EROFS_SB(sb)->s_fscache->inode;
-	else
-		buf->inode = sb->s_bdev->bd_inode;
+		buf->bdev = NULL;
+	} else {
+		buf->inode = NULL;
+		buf->bdev = sb->s_bdev;
+	}
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b0409badb017..a68b0924c052 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -224,6 +224,7 @@ enum erofs_kmap_type {
 
 struct erofs_buf {
 	struct inode *inode;
+	struct block_device *bdev;
 	struct page *page;
 	void *base;
 	enum erofs_kmap_type kmap_type;
-- 
2.39.2

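Whichever source supplies blkbits in erofs_bread() above (the fscache inode's i_blkbits, or block_bits() on the bdev with this patch), the block-to-page arithmetic is the same. A small sketch of it follows, assuming 4 KiB pages, 32-bit block addresses widened to a 64-bit offset as the (erofs_off_t) cast suggests, and a block size no larger than a page; blk_to_page() and struct blk_pos are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB pages */

/* Hypothetical result type for the conversion below. */
struct blk_pos {
	uint64_t offset;  /* byte offset of the block on the device */
	uint64_t index;   /* page-cache index of the page holding it */
	uint32_t in_page; /* byte offset of the block inside that page */
};

/* Mirrors: offset = (erofs_off_t)blkaddr << blkbits; index = offset >> PAGE_SHIFT; */
static struct blk_pos blk_to_page(uint32_t blkaddr, uint8_t blkbits)
{
	struct blk_pos pos;

	assert(blkbits >= 9 && blkbits <= SKETCH_PAGE_SHIFT);
	pos.offset = (uint64_t)blkaddr << blkbits; /* widen before shifting */
	pos.index = pos.offset >> SKETCH_PAGE_SHIFT;
	pos.in_page = (uint32_t)(pos.offset & ((1u << SKETCH_PAGE_SHIFT) - 1));
	return pos;
}
```

Widening before the shift matters: without the cast, a block address of 0x100000 with 4 KiB blocks would overflow a 32-bit offset.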


* [PATCH -next RFC 11/14] ext4: use bdev apis
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (9 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 10/14] erofs: use bdev api Yu Kuai
@ 2023-12-05 12:37 ` Yu Kuai
  2023-12-05 12:38 ` [PATCH -next RFC 12/14] jbd2: " Yu Kuai
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:37 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from struct block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/ext4/dir.c               |  6 ++----
 fs/ext4/ext4_jbd2.c         |  6 +++---
 fs/ext4/super.c             | 27 ++++-----------------------
 include/linux/buffer_head.h |  5 +++--
 4 files changed, 12 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..64e35eb6a324 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -191,10 +191,8 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 			pgoff_t index = map.m_pblk >>
 					(PAGE_SHIFT - inode->i_blkbits);
 			if (!ra_has_index(&file->f_ra, index))
-				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
-					&file->f_ra, file,
-					index, 1);
+				bdev_sync_readahead(sb->s_bdev, &file->f_ra,
+						    file, index, 1);
 			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
 			bh = ext4_bread(NULL, inode, map.m_lblk, 0);
 			if (IS_ERR(bh)) {
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index d1a2e6624401..c1bf3a00fad9 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,6 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
 
 static void ext4_check_bdev_write_error(struct super_block *sb)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int err;
 
@@ -216,9 +215,10 @@ static void ext4_check_bdev_write_error(struct super_block *sb)
 	 * we could read old data from disk and write it out again, which
 	 * may lead to on-disk filesystem inconsistency.
 	 */
-	if (errseq_check(&mapping->wb_err, READ_ONCE(sbi->s_bdev_wb_err))) {
+	if (bdev_wb_err_check(sb->s_bdev, READ_ONCE(sbi->s_bdev_wb_err))) {
 		spin_lock(&sbi->s_bdev_wb_lock);
-		err = errseq_check_and_advance(&mapping->wb_err, &sbi->s_bdev_wb_err);
+		err = bdev_wb_err_check_and_advance(sb->s_bdev,
+						    &sbi->s_bdev_wb_err);
 		spin_unlock(&sbi->s_bdev_wb_lock);
 		if (err)
 			ext4_error_err(sb, -err,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0980845c8b8f..243671d86db3 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -244,8 +244,7 @@ static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
 struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 				   blk_opf_t op_flags)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
-			~__GFP_FS) | __GFP_MOVABLE;
+	gfp_t gfp = bdev_gfp_constraint(sb->s_bdev, ~__GFP_FS) | __GFP_MOVABLE;
 
 	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
 }
@@ -253,8 +252,7 @@ struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 					    sector_t block)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
-			~__GFP_FS);
+	gfp_t gfp = bdev_gfp_constraint(sb->s_bdev, ~__GFP_FS);
 
 	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
 }
@@ -492,22 +490,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else.  Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
-	struct inode *bd_inode = sb->s_bdev->bd_inode;
-	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
-	return bdi->dev == NULL;
-}
-
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 {
 	struct super_block		*sb = journal->j_private;
@@ -5585,8 +5567,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
-				 &sbi->s_bdev_wb_err);
+	bdev_wb_err_check_and_advance(sb->s_bdev, &sbi->s_bdev_wb_err);
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
 	EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
@@ -6185,7 +6166,7 @@ static int ext4_commit_super(struct super_block *sb)
 
 	if (!sbh)
 		return -EINVAL;
-	if (block_device_ejected(sb))
+	if (bdev_ejected(sb->s_bdev))
 		return -ENODEV;
 
 	ext4_update_super(sb);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 5f23ee599889..3a88b295b4f2 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -15,6 +15,7 @@
 #include <linux/pagemap.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
+#include <linux/blkdev.h>
 
 enum bh_state_bits {
 	BH_Uptodate,	/* Contains valid data */
@@ -341,7 +342,7 @@ static inline struct buffer_head *getblk_unmovable(struct block_device *bdev,
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = bdev_gfp_constraint(bdev, ~__GFP_FS);
 	gfp |= __GFP_NOFAIL;
 
 	return bdev_getblk(bdev, block, size, gfp);
@@ -352,7 +353,7 @@ static inline struct buffer_head *__getblk(struct block_device *bdev,
 {
 	gfp_t gfp;
 
-	gfp = mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+	gfp = bdev_gfp_constraint(bdev, ~__GFP_FS);
 	gfp |= __GFP_MOVABLE | __GFP_NOFAIL;
 
 	return bdev_getblk(bdev, block, size, gfp);
-- 
2.39.2

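ext4 samples the block device's writeback error cursor once at mount (bdev_wb_err_check_and_advance() in __ext4_fill_super()); ext4_check_bdev_write_error() then does a cheap check first and only takes s_bdev_wb_lock to report and advance. jbd2_journal_recover() uses the same sample-then-check pattern around replay. A toy model of the errseq_t semantics behind these wrappers is sketched below. It is a simplification: the kernel's errseq_t packs an errno, a "seen" flag and a counter into one 32-bit word, whereas struct toy_errseq and its helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical toy errseq: bumped on every recorded error. */
struct toy_errseq {
	unsigned int seq;
	int err; /* negative errno of the latest error, 0 if none */
};

/* Record a writeback error (err is a negative errno). */
static void toy_errseq_set(struct toy_errseq *eseq, int err)
{
	assert(err < 0);
	eseq->err = err;
	eseq->seq++;
}

/* Lockless peek: report an error recorded since 'since', without advancing. */
static int toy_errseq_check(const struct toy_errseq *eseq, unsigned int since)
{
	return eseq->seq != since ? eseq->err : 0;
}

/* Report an error recorded since '*since' and move the cursor forward. */
static int toy_errseq_check_and_advance(const struct toy_errseq *eseq,
					unsigned int *since)
{
	int err = toy_errseq_check(eseq, *since);

	*since = eseq->seq;
	return err;
}
```

The initial check-and-advance deliberately ignores its return value: it only moves the cursor past errors that happened before mount (or before journal replay), so that later checks report new errors only.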


* [PATCH -next RFC 12/14] jbd2: use bdev apis
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (10 preceding siblings ...)
  2023-12-05 12:37 ` [PATCH -next RFC 11/14] ext4: use bdev apis Yu Kuai
@ 2023-12-05 12:38 ` Yu Kuai
  2023-12-05 12:39 ` [PATCH -next RFC 13/14] gfs2: use bdev api Yu Kuai
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:38 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from struct block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/jbd2/journal.c  | 3 +--
 fs/jbd2/recovery.c | 6 ++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ed53188472f9..f1b5ffeaf02a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2003,8 +2003,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		byte_count = (block_stop - block_start + 1) *
 				journal->j_blocksize;
 
-		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
-				byte_start, byte_stop);
+		truncate_bdev_range(journal->j_dev, 0, byte_start, byte_stop);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
 			err = blkdev_issue_discard(journal->j_dev,
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 01f744cb97a4..6b6a2c4585fa 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -290,7 +290,6 @@ int jbd2_journal_recover(journal_t *journal)
 
 	struct recovery_info	info;
 	errseq_t		wb_err;
-	struct address_space	*mapping;
 
 	memset(&info, 0, sizeof(info));
 	sb = journal->j_superblock;
@@ -309,8 +308,7 @@ int jbd2_journal_recover(journal_t *journal)
 	}
 
 	wb_err = 0;
-	mapping = journal->j_fs_dev->bd_inode->i_mapping;
-	errseq_check_and_advance(&mapping->wb_err, &wb_err);
+	bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
 	err = do_one_pass(journal, &info, PASS_SCAN);
 	if (!err)
 		err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -334,7 +332,7 @@ int jbd2_journal_recover(journal_t *journal)
 	err2 = sync_blockdev(journal->j_fs_dev);
 	if (!err)
 		err = err2;
-	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
+	err2 = bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
 	if (!err)
 		err = err2;
 	/* Make sure all replayed data is on permanent storage */
-- 
2.39.2



* [PATCH -next RFC 13/14] gfs2: use bdev api
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (11 preceding siblings ...)
  2023-12-05 12:38 ` [PATCH -next RFC 12/14] jbd2: " Yu Kuai
@ 2023-12-05 12:39 ` Yu Kuai
  2023-12-05 12:39 ` [PATCH -next RFC 14/14] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
  2023-12-06  5:54 ` [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Christoph Hellwig
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:39 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from struct block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/gfs2/glock.c      | 2 +-
 fs/gfs2/ops_fstype.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index f28c67181230..c66b0ed07e15 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1227,7 +1227,7 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
 	mapping = gfs2_glock2aspace(gl);
 	if (mapping) {
                 mapping->a_ops = &gfs2_meta_aops;
-		mapping->host = s->s_bdev->bd_inode;
+		bdev_correlate_mapping(s->s_bdev, mapping);
 		mapping->flags = 0;
 		mapping_set_gfp_mask(mapping, GFP_NOFS);
 		mapping->i_private_data = NULL;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 00ce89bdf32c..3145a56c88cb 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -114,7 +114,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	address_space_init_once(mapping);
 	mapping->a_ops = &gfs2_rgrp_aops;
-	mapping->host = sb->s_bdev->bd_inode;
+	bdev_correlate_mapping(sb->s_bdev, mapping);
 	mapping->flags = 0;
 	mapping_set_gfp_mask(mapping, GFP_NOFS);
 	mapping->i_private_data = NULL;
-- 
2.39.2



* [PATCH -next RFC 14/14] nilfs2: use bdev api in nilfs_attach_log_writer()
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (12 preceding siblings ...)
  2023-12-05 12:39 ` [PATCH -next RFC 13/14] gfs2: use bdev api Yu Kuai
@ 2023-12-05 12:39 ` Yu Kuai
  2023-12-06  5:54 ` [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Christoph Hellwig
  14 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-05 12:39 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yukuai1, yi.zhang,
	yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid accessing bd_inode directly, in preparation for removing bd_inode
from struct block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/nilfs2/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 52995838f2de..be47a1d21889 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2824,7 +2824,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
 	if (!nilfs->ns_writer)
 		return -ENOMEM;
 
-	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
+	bdev_attach_wb(nilfs->ns_bdev);
 
 	err = nilfs_segctor_start_thread(nilfs->ns_writer);
 	if (unlikely(err))
-- 
2.39.2



* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
@ 2023-12-05 17:03   ` Bart Van Assche
  2023-12-06  6:14   ` Christoph Hellwig
  2023-12-06 14:58   ` Matthew Wilcox
  2 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2023-12-05 17:03 UTC (permalink / raw)
  To: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, nico,
	xiang, chao, tytso, adilger.kernel, agruenba, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs,
	linux-ext4, gfs2, linux-nilfs, yukuai3, yi.zhang, yangerkun

On 12/5/23 04:37, Yu Kuai wrote:
> +static inline u8 block_bits(struct block_device *bdev)
> +{
> +	return bdev->bd_inode->i_blkbits;
> +}

This function needs a name that's more descriptive.

Thanks,

Bart.


* Re: [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules
  2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (13 preceding siblings ...)
  2023-12-05 12:39 ` [PATCH -next RFC 14/14] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
@ 2023-12-06  5:54 ` Christoph Hellwig
  2023-12-06  6:06   ` Yu Kuai
  14 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06  5:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs, yukuai3,
	yi.zhang, yangerkun

On Tue, Dec 05, 2023 at 08:37:14PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Patch 1 adds some bdev apis; the follow-up patches use these apis
> to avoid accessing bd_inode directly, so that the bd_inode field can
> hopefully be removed eventually (after figuring out a way for fs/buffer.c).

What tree is this against?  It fails to apply to either Jens'
for-6.8/block or Linus' tree in the very first patch.



* Re: [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-05 12:37 ` [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
@ 2023-12-06  5:55   ` Christoph Hellwig
  2023-12-06  6:56     ` Yu Kuai
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06  5:55 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs, yukuai3,
	yi.zhang, yangerkun

On Tue, Dec 05, 2023 at 08:37:16PM +0800, Yu Kuai wrote:
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index e34219ea2b05..e645afa4af57 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>  		xenbus_dev_error(blkif->be->dev, err, "block flush");
>  		return;
>  	}
> -	invalidate_inode_pages2(
> -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);

blkback is a bdev exporter.  I don't think it should ever call
invalidate_inode_pages2(), through a wrapper or not.



* Re: [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules
  2023-12-06  5:54 ` [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Christoph Hellwig
@ 2023-12-06  6:06   ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-06  6:06 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi,

On 2023/12/06 13:54, Christoph Hellwig wrote:
> On Tue, Dec 05, 2023 at 08:37:14PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Patch 1 adds some bdev apis; the follow-up patches use these apis
>> to avoid accessing bd_inode directly, so that the bd_inode field can
>> hopefully be removed eventually (after figuring out a way for fs/buffer.c).
> 
> What tree is this against?  It fails to apply to either Jens'
> for-6.8/block or Linus tree in the very first patch.

It was against the linux-next branch, tag next-20231201, because I'm
not sure yet whether this patchset should be applied to Jens' tree.
Please let me know if I should switch to Jens' tree for v2.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
  2023-12-05 17:03   ` Bart Van Assche
@ 2023-12-06  6:14   ` Christoph Hellwig
  2023-12-06  6:50     ` Yu Kuai
  2023-12-06 17:50     ` Theodore Ts'o
  2023-12-06 14:58   ` Matthew Wilcox
  2 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06  6:14 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs, yukuai3,
	yi.zhang, yangerkun

> +void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
> +			   pgoff_t end)
> +{
> +	invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
> +}
> +EXPORT_SYMBOL_GPL(invalidate_bdev_range);

All these could probably use kerneldoc comments.

For this one I really don't like it existing at all, but we'll have to
discuss that in the btrfs patch.

> +loff_t bdev_size(struct block_device *bdev)
> +{
> +	loff_t size;
> +
> +	spin_lock(&bdev->bd_size_lock);
> +	size = i_size_read(bdev->bd_inode);
> +	spin_unlock(&bdev->bd_size_lock);
> +
> +	return size;
> +}
> +EXPORT_SYMBOL_GPL(bdev_size);

No need for this one.  The callers can simply use bdev_nr_bytes.

> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
> +{
> +	return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio);
> +
> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
> +				  gfp_t gfp)
> +{
> +	return mapping_read_folio_gfp(bdev->bd_inode->i_mapping, index, gfp);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio_gfp);

I think we can just drop bdev_read_folio_gfp. Half of the callers simply
pass GFP_KERNEL, and the other half pass GFP_NOFS and could just use
memalloc_nofs_save().

> +void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
> +{
> +	return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
> +}
> +EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);

Hmm, this is just used for block2mtd, and feels a little too low-level
to me, as block2mtd really should be using the normal file read/write
APIs.  I guess we'll have to live with it for now if we want to expedite
killing off bd_inode.

> +void bdev_correlate_mapping(struct block_device *bdev,
> +			    struct address_space *mapping)
> +{
> +	mapping->host = bdev->bd_inode;
> +}
> +EXPORT_SYMBOL_GPL(bdev_correlate_mapping);

Maybe "associate" instead of "correlate"?  Either way this basically
fully exposes the bdev inode again :(

> +gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp)
> +{
> +	return mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
> +}
> +EXPORT_SYMBOL_GPL(bdev_gfp_constraint);

The right fix here is to:

 - use memalloc_nofs_save() in ext4 instead of using
   mapping_gfp_constraint() to clear __GFP_FS from the mapping flags
 - remove __ext4_sb_bread_gfp and just have a buffer.c helper that does
   the right thing (either by changing the calling conventions of an
   existing one, or adding a new one).

> +/*
> + * The del_gendisk() function uninitializes the disk-specific data
> + * structures, including the bdi structure, without telling anyone
> + * else.  Once this happens, any attempt to call mark_buffer_dirty()
> + * (for example, by ext4_commit_super), will cause a kernel OOPS.
> + * This is a kludge to prevent these oops until we can put in a proper
> + * hook in del_gendisk() to inform the VFS and file system layers.
> + */
> +int bdev_ejected(struct block_device *bdev)
> +{
> +	struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
> +
> +	return bdi->dev == NULL;
> +}
> +EXPORT_SYMBOL_GPL(bdev_ejected);

And this code in ext4 should just go away entirely.  The bdi has been
guaranteed to be valid for a live bdev for years.

> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1119,6 +1119,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
>  	WARN_ON_ONCE(off > UINT_MAX);
>  	__bio_add_page(bio, &folio->page, len, off);
>  }
> +EXPORT_SYMBOL_GPL(bio_add_folio_nofail);

How is this related?  The export is fine, but it really should be a
separate, well-documented commit.

>  
> +static inline u8 block_bits(struct block_device *bdev)
> +{
> +	return bdev->bd_inode->i_blkbits;
> +}

Not sure we should need this.  i_blkbits comes from the blocksize
the fs set, so it should have other ways to get at it.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-06  6:14   ` Christoph Hellwig
@ 2023-12-06  6:50     ` Yu Kuai
  2023-12-06  7:20       ` Christoph Hellwig
  2023-12-06 17:50     ` Theodore Ts'o
  1 sibling, 1 reply; 28+ messages in thread
From: Yu Kuai @ 2023-12-06  6:50 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi,

On 2023/12/06 14:14, Christoph Hellwig wrote:
>> +void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
>> +			   pgoff_t end)
>> +{
>> +	invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
>> +}
>> +EXPORT_SYMBOL_GPL(invalidate_bdev_range);
> 
> All these could probably use kerneldoc comments.

Ok, and thanks for reviewing the patchset!
> 
> For this one I really don't like it existing at all, but we'll have to
> discuss that in the btrfs patch.
> 
>> +loff_t bdev_size(struct block_device *bdev)
>> +{
>> +	loff_t size;
>> +
>> +	spin_lock(&bdev->bd_size_lock);
>> +	size = i_size_read(bdev->bd_inode);
>> +	spin_unlock(&bdev->bd_size_lock);
>> +
>> +	return size;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_size);
> 
> No need for this one.  The callers can simply use bdev_nr_bytes.

Ok, I'll replace it with bdev_nr_bytes.
> 
>> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
>> +{
>> +	return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_read_folio);
>> +
>> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
>> +				  gfp_t gfp)
>> +{
>> +	return mapping_read_folio_gfp(bdev->bd_inode->i_mapping, index, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_read_folio_gfp);
> 
> I think we can just drop bdev_read_folio_gfp. Half of the callers simply
> pass GFP_KERNEL, and the other half pass GFP_NOFS and could just use
> memalloc_nofs_save().

I'm a little confused; there are 3 use cases:
1) GFP_USER, the default gfp mask set by bdev_alloc()
2) GFP_KERNEL
3) GFP_NOFS

I understand that you're suggesting memalloc_nofs_save() to distinguish
2 and 3, but how can I distinguish 1?
> 
>> +void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
>> +{
>> +	return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);
> 
> Hmm, this is just used for block2mtd, and feels a little too low-level
> to me, as block2mtd really should be using the normal file read/write
> APIs.  I guess we'll have to live with it for now if we want to expedite
> killing off bd_inode.
> 
>> +void bdev_correlate_mapping(struct block_device *bdev,
>> +			    struct address_space *mapping)
>> +{
>> +	mapping->host = bdev->bd_inode;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_correlate_mapping);
> 
> Maybe "associate" instead of "correlate"?  Either way this basically
> fully exposes the bdev inode again :(
> 
>> +gfp_t bdev_gfp_constraint(struct block_device *bdev, gfp_t gfp)
>> +{
>> +	return mapping_gfp_constraint(bdev->bd_inode->i_mapping, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_gfp_constraint);
> 
> The right fix here is to:
> 
>   - use memalloc_nofs_save() in ext4 instead of using
>     mapping_gfp_constraint() to clear __GFP_FS from the mapping flags
>   - remove __ext4_sb_bread_gfp and just have a buffer.c helper that does
>     the right thing (either by changing the calling conventions of an
>     existing one, or adding a new one).

Thanks for the suggestions, but I'm not sure how to do this yet; I need
to read more ext4 code.
> 
>> +/*
>> + * The del_gendisk() function uninitializes the disk-specific data
>> + * structures, including the bdi structure, without telling anyone
>> + * else.  Once this happens, any attempt to call mark_buffer_dirty()
>> + * (for example, by ext4_commit_super), will cause a kernel OOPS.
>> + * This is a kludge to prevent these oops until we can put in a proper
>> + * hook in del_gendisk() to inform the VFS and file system layers.
>> + */
>> +int bdev_ejected(struct block_device *bdev)
>> +{
>> +	struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
>> +
>> +	return bdi->dev == NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_ejected);
> 
> And this code in ext4 should just go away entirely.  The bdi has been
> guaranteed to be valid for a live bdev for years.

Sounds good, I was confused about this code as well.

> 
>> --- a/block/bio.c
>> +++ b/block/bio.c
>> @@ -1119,6 +1119,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
>>   	WARN_ON_ONCE(off > UINT_MAX);
>>   	__bio_add_page(bio, &folio->page, len, off);
>>   }
>> +EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
> 
> How is this related?  The export is fine, but it really should be a
> separate, well-documented commit.

This is used to replace __bio_add_page() in btrfs while converting pages
to folios. Please let me know if I should keep it; if so, I'll split it
into a separate commit.
> 
>>   
>> +static inline u8 block_bits(struct block_device *bdev)
>> +{
>> +	return bdev->bd_inode->i_blkbits;
>> +}
> 
> Not sure we should need this.  i_blkbits comes from the blocksize
> the fs set, so it should have other ways to get at it.

Yes, this is currently only used by erofs, and erofs does call
sb_set_blocksize() during initialization, so you're right that there is
another way to get blkbits and this helper is not needed.

Thanks,
Kuai



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-06  5:55   ` Christoph Hellwig
@ 2023-12-06  6:56     ` Yu Kuai
  2023-12-06  7:21       ` Christoph Hellwig
  0 siblings, 1 reply; 28+ messages in thread
From: Yu Kuai @ 2023-12-06  6:56 UTC (permalink / raw)
  To: Christoph Hellwig, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-erofs, linux-ext4, gfs2, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi,

On 2023/12/06 13:55, Christoph Hellwig wrote:
> On Tue, Dec 05, 2023 at 08:37:16PM +0800, Yu Kuai wrote:
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index e34219ea2b05..e645afa4af57 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>>   		xenbus_dev_error(blkif->be->dev, err, "block flush");
>>   		return;
>>   	}
>> -	invalidate_inode_pages2(
>> -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
>> +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
> 
> blkback is a bdev exporter.  I don't think it should ever call
> invalidate_inode_pages2(), through a wrapper or not.

I'm not sure about this. I'm not familiar with xen/blkback, but I saw
that xen-blkback opens a bdev in xen_vbd_create(), so it looks like
dm/md to me, and it sounds reasonable to sync and invalidate the
opened bdev during initialization. Please correct me if I'm wrong.

Thanks,
Kuai



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-06  6:50     ` Yu Kuai
@ 2023-12-06  7:20       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06  7:20 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, axboe, roger.pau, colyli, kent.overstreet,
	joern, miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, nico,
	xiang, chao, tytso, adilger.kernel, agruenba, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs, linux-ext4,
	gfs2, linux-nilfs, yi.zhang, yangerkun, yukuai (C)

On Wed, Dec 06, 2023 at 02:50:56PM +0800, Yu Kuai wrote:
> I'm a little confused; there are 3 use cases:
> 1) GFP_USER, the default gfp mask set by bdev_alloc()
> 2) use GFP_KERNEL
> 3) use GFP_NOFS
> 
> I understand that you're suggesting memalloc_nofs_save() to distinguish
> 2 and 3, but how can I distinguish 1?

You shouldn't.  Diverging from the default flags except for clearing
the FS or IO flags is simply a bug.  Note that things like block2mtd
should probably also ensure a noio allocation if they aren't doing that
yet.

> >   - use memalloc_nofs_save() in ext4 instead of using
> >     mapping_gfp_constraint() to clear __GFP_FS from the mapping flags
> >   - remove __ext4_sb_bread_gfp and just have a buffer.c helper that does
> >     the right thing (either by changing the calling conventions of an
> >     existing one, or adding a new one).
> 
> Thanks for the suggestions, but I'm not sure how to do this yet; I need
> to read more ext4 code.

The nofs save part should be trivial.  You can just skip the rest for
now, as it's not needed for this patch series.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-06  6:56     ` Yu Kuai
@ 2023-12-06  7:21       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06  7:21 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Christoph Hellwig, axboe, roger.pau, colyli, kent.overstreet,
	joern, miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, nico,
	xiang, chao, tytso, adilger.kernel, agruenba, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs, linux-ext4,
	gfs2, linux-nilfs, yi.zhang, yangerkun, yukuai (C)

On Wed, Dec 06, 2023 at 02:56:05PM +0800, Yu Kuai wrote:
> > > -	invalidate_inode_pages2(
> > > -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> > > +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
> > 
> > blkback is a bdev exporter.  I don't think it should ever call
> > invalidate_inode_pages2(), through a wrapper or not.
> 
> I'm not sure about this. I'm not familiar with xen/blkback, but I saw
> that xen-blkback opens a bdev in xen_vbd_create(), so it looks like
> dm/md to me, and it sounds reasonable to sync and invalidate the
> opened bdev during initialization. Please correct me if I'm wrong.

I guess we have enough precedent for this, so the switchover here
isn't wrong.  But all this invalidating of the bdev cache seems to
be asking for trouble.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
  2023-12-05 17:03   ` Bart Van Assche
  2023-12-06  6:14   ` Christoph Hellwig
@ 2023-12-06 14:58   ` Matthew Wilcox
  2023-12-07  2:45     ` Yu Kuai
  2 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2023-12-06 14:58 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, akpm, hare,
	p.raghav, linux-block, linux-kernel, xen-devel, linux-bcache,
	linux-mtd, linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-erofs, linux-ext4, gfs2, linux-nilfs, yukuai3, yi.zhang,
	yangerkun

On Tue, Dec 05, 2023 at 08:37:15PM +0800, Yu Kuai wrote:
> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
> +{
> +	return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
> +}
> +EXPORT_SYMBOL_GPL(bdev_read_folio);

I'm coming to the opinion that 'index' is the wrong parameter here.
Looking through all the callers of bdev_read_folio() in this patchset,
they all have a position in bytes, and they all convert it to
index for this call.  The API should probably be:

struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
{
	return read_mapping_folio(bdev->bd_inode->i_mapping,
			pos / PAGE_SIZE, NULL);
}

... and at some point, we'll get round to converting read_mapping_folio()
to take its argument in loff_t.

Similarly for these two APIs:

> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
> +				  gfp_t gfp)
> +struct folio *bdev_get_folio(struct block_device *bdev, pgoff_t index)

> +struct folio *bdev_find_or_create_folio(struct block_device *bdev,
> +					pgoff_t index, gfp_t gfp)
> +{
> +	return __filemap_get_folio(bdev->bd_inode->i_mapping, index,
> +				   FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
> +}
> +EXPORT_SYMBOL_GPL(bdev_find_or_create_folio);

This one probably shouldn't exist.  I've been converting callers of
find_or_create_page() to call __filemap_get_folio; I suspect we
should expose a __bdev_get_folio and have the callers use the FGP
arguments directly, but I'm open to other opinions here.

> +void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
> +			 struct file *file, pgoff_t index,
> +			 unsigned long req_count)
> +{
> +	struct file_ra_state tmp_ra = {};
> +
> +	if (!ra) {
> +		ra = &tmp_ra;
> +		file_ra_state_init(ra, bdev->bd_inode->i_mapping);
> +	}
> +	page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index,
> +				  req_count);
> +}

I think the caller should always be passing in a valid file_ra_state.
It's only cramfs that doesn't have one, and it really should!
Not entirely sure about the arguments here; part of me says "bytes",
but this is weird enough to maybe take arguments in pages.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-06  6:14   ` Christoph Hellwig
  2023-12-06  6:50     ` Yu Kuai
@ 2023-12-06 17:50     ` Theodore Ts'o
  2023-12-06 17:57       ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Theodore Ts'o @ 2023-12-06 17:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, nico,
	xiang, chao, adilger.kernel, agruenba, jack, konishi.ryusuke,
	willy, akpm, hare, p.raghav, linux-block, linux-kernel,
	xen-devel, linux-bcache, linux-mtd, linux-s390, linux-scsi,
	linux-bcachefs, linux-btrfs, linux-erofs, linux-ext4, gfs2,
	linux-nilfs, yukuai3, yi.zhang, yangerkun

On Tue, Dec 05, 2023 at 10:14:00PM -0800, Christoph Hellwig wrote:
> > +/*
> > + * The del_gendisk() function uninitializes the disk-specific data
> > + * structures, including the bdi structure, without telling anyone
> > + * else.  Once this happens, any attempt to call mark_buffer_dirty()
> > + * (for example, by ext4_commit_super), will cause a kernel OOPS.
> > + * This is a kludge to prevent these oops until we can put in a proper
> > + * hook in del_gendisk() to inform the VFS and file system layers.
> > + */
> > +int bdev_ejected(struct block_device *bdev)
> > +{
> > +	struct backing_dev_info *bdi = inode_to_bdi(bdev->bd_inode);
> > +
> > +	return bdi->dev == NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(bdev_ejected);
> 
> And this code in ext4 should just go away entirely.  The bdi has been
> guaranteed to be valid for a live bdev for years.

This was added because pulling a USB thumb drive (or having an HDD
drop off the SATA bus) while the file system is mounted and actively
in use would result in a kernel OOPS.  If that's no longer true,
that's great, but it would be good to test to make sure this is the
case....

If we really want to remove it, I'd suggest doing this as a separate
commit, so that after we see syzbot reports, or users complaining
about kernel crashes, we can revert the removal if necessary.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-06 17:50     ` Theodore Ts'o
@ 2023-12-06 17:57       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2023-12-06 17:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Christoph Hellwig, Yu Kuai, axboe, roger.pau, colyli,
	kent.overstreet, joern, miquel.raynal, richard, vigneshr, sth,
	hoeppner, hca, gor, agordeev, jejb, martin.petersen, clm, josef,
	dsterba, nico, xiang, chao, adilger.kernel, agruenba, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-erofs, linux-ext4,
	gfs2, linux-nilfs, yukuai3, yi.zhang, yangerkun

On Wed, Dec 06, 2023 at 12:50:38PM -0500, Theodore Ts'o wrote:
> This was added because pulling a USB thumb drive (or having an HDD
> drop off the SATA bus) while the file system is mounted and actively
> in use would result in a kernel OOPS.  If that's no longer true,
> that's great, but it would be good to test to make sure this is the
> case....

And, surprise, surprise - that didn't just affect ext4.  So I ended
up fixing this properly in the block layer.

> If we really want to remove it, I'd suggest doing this as a separate
> commit, so that after we see syzbot reports, or users complaining
> about kernel crashes, we can revert the removal if necessary.

Yes, this should of course be a separate, well-documented commit.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -next RFC 01/14] block: add some bdev apis
  2023-12-06 14:58   ` Matthew Wilcox
@ 2023-12-07  2:45     ` Yu Kuai
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Kuai @ 2023-12-07  2:45 UTC (permalink / raw)
  To: Matthew Wilcox, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, nico, xiang, chao, tytso,
	adilger.kernel, agruenba, jack, konishi.ryusuke, akpm, hare,
	p.raghav, linux-block, linux-kernel, xen-devel, linux-bcache,
	linux-mtd, linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-erofs, linux-ext4, gfs2, linux-nilfs, yi.zhang, yangerkun,
	yukuai (C)

Hi,

On 2023/12/06 22:58, Matthew Wilcox wrote:
> On Tue, Dec 05, 2023 at 08:37:15PM +0800, Yu Kuai wrote:
>> +struct folio *bdev_read_folio(struct block_device *bdev, pgoff_t index)
>> +{
>> +	return read_mapping_folio(bdev->bd_inode->i_mapping, index, NULL);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_read_folio);
> 
> I'm coming to the opinion that 'index' is the wrong parameter here.
> Looking through all the callers of bdev_read_folio() in this patchset,
> they all have a position in bytes, and they all convert it to
> index for this call.  The API should probably be:
> 
> struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
> {
> 	return read_mapping_folio(bdev->bd_inode->i_mapping,
> 			pos / PAGE_SIZE, NULL);
> }

Thanks for reviewing this patchset! Okay, I'll convert it to take "pos"
in v2.
> 
> ... and at some point, we'll get round to converting read_mapping_folio()
> to take its argument in loff_t.
> 
> Similarly for these two APIs:
> 
>> +struct folio *bdev_read_folio_gfp(struct block_device *bdev, pgoff_t index,
>> +				  gfp_t gfp)
>> +struct folio *bdev_get_folio(struct block_device *bdev, pgoff_t index)
> 
>> +struct folio *bdev_find_or_create_folio(struct block_device *bdev,
>> +					pgoff_t index, gfp_t gfp)
>> +{
>> +	return __filemap_get_folio(bdev->bd_inode->i_mapping, index,
>> +				   FGP_LOCK | FGP_ACCESSED | FGP_CREAT, gfp);
>> +}
>> +EXPORT_SYMBOL_GPL(bdev_find_or_create_folio);
> 
> This one probably shouldn't exist.  I've been converting callers of
> find_or_create_page() to call __filemap_get_folio; I suspect we
> should expose a __bdev_get_folio and have the callers use the FGP
> arguments directly, but I'm open to other opinions here.

If nobody objects, I will expose a single __bdev_get_folio() in v2.
> 
>> +void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
>> +			 struct file *file, pgoff_t index,
>> +			 unsigned long req_count)
>> +{
>> +	struct file_ra_state tmp_ra = {};
>> +
>> +	if (!ra) {
>> +		ra = &tmp_ra;
>> +		file_ra_state_init(ra, bdev->bd_inode->i_mapping);
>> +	}
>> +	page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index,
>> +				  req_count);
>> +}
> 
> I think the caller should always be passing in a valid file_ra_state.
> It's only cramfs that doesn't have one, and it really should!
> Not entirely sure about the arguments here; part of me says "bytes",
> but this is weird enough to maybe take arguments in pages.

In fact, bdev_sync_readahead() is only called by cramfs and ext4.

For ext4 it's used in ext4_readdir() so there is valid file_ra_state.

However, for cramfs it's used in cramfs_read(), and cramfs_read() is used
for:

1) cramfs_read_folio
2) cramfs_readdir
3) cramfs_lookup
4) cramfs_read_super

It looks easy to pass in a valid file_ra_state for 1) and 2);
however, I don't see an easy way to do this for 3) and 4).

Thanks,
Kuai



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2023-12-07  2:45 UTC | newest]

Thread overview: 28+ messages
2023-12-05 12:37 [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 01/14] block: add some bdev apis Yu Kuai
2023-12-05 17:03   ` Bart Van Assche
2023-12-06  6:14   ` Christoph Hellwig
2023-12-06  6:50     ` Yu Kuai
2023-12-06  7:20       ` Christoph Hellwig
2023-12-06 17:50     ` Theodore Ts'o
2023-12-06 17:57       ` Christoph Hellwig
2023-12-06 14:58   ` Matthew Wilcox
2023-12-07  2:45     ` Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 02/14] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
2023-12-06  5:55   ` Christoph Hellwig
2023-12-06  6:56     ` Yu Kuai
2023-12-06  7:21       ` Christoph Hellwig
2023-12-05 12:37 ` [PATCH -next RFC 03/14] bcache: use bdev api in read_super() Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 04/14] mtd: block2mtd: use bdev apis Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 05/14] s390/dasd: use bdev api in dasd_format() Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 06/14] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 07/14] bcachefs: remove dead function bdev_sectors() Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 08/14] btrfs: use bdev apis Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 09/14] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 10/14] erofs: use bdev api Yu Kuai
2023-12-05 12:37 ` [PATCH -next RFC 11/14] ext4: use bdev apis Yu Kuai
2023-12-05 12:38 ` [PATCH -next RFC 12/14] jbd2: " Yu Kuai
2023-12-05 12:39 ` [PATCH -next RFC 13/14] gfs2: use bdev api Yu Kuai
2023-12-05 12:39 ` [PATCH -next RFC 14/14] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
2023-12-06  5:54 ` [PATCH -next RFC 00/14] block: don't access bd_inode directly from other modules Christoph Hellwig
2023-12-06  6:06   ` Yu Kuai
