linux-bcache.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules
@ 2023-12-21  8:56 Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis Yu Kuai
                   ` (16 more replies)
  0 siblings, 17 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:56 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Changes in v3:
 - remove bdev_associated_mapping() and patch 12 from v1;
 - add kerneldoc comments for new bdev apis;
 - rename __bdev_get_folio() to bdev_get_folio;
 - fix a problem in erofs that erofs_init_metabuf() is not always
 called.
 - add reviewed-by tag for patch 15-17;
Changes in v2:
 - remove some bdev apis that is not necessary;
 - pass in offset for bdev_read_folio() and __bdev_get_folio();
 - remove bdev_gfp_constraint() and add a new helper in fs/buffer.c to
 prevent access bd_indoe() directly from mapping_gfp_constraint() in
 ext4.(patch 15, 16);
 - remove block_device_ejected() from ext4.


Patch 1 add some bdev apis, then follow up patches will use these apis
to avoid access bd_inode directly, and hopefully the field bd_inode can
be removed eventually(after figure out a way for fs/buffer.c).

Yu Kuai (17):
  block: add some bdev apis
  xen/blkback: use bdev api in xen_update_blkif_status()
  bcache: use bdev api in read_super()
  mtd: block2mtd: use bdev apis
  s390/dasd: use bdev api in dasd_format()
  scsicam: use bdev api in scsi_bios_ptable()
  bcachefs: remove dead function bdev_sectors()
  bio: export bio_add_folio_nofail()
  btrfs: use bdev apis
  cramfs: use bdev apis in cramfs_blkdev_read()
  erofs: use bdev api
  nilfs2: use bdev api in nilfs_attach_log_writer()
  jbd2: use bdev apis
  buffer: add a new helper to read sb block
  ext4: use new helper to read sb block
  ext4: remove block_device_ejected()
  ext4: use bdev apis

 block/bdev.c                       | 148 +++++++++++++++++++++++++++++
 block/bio.c                        |   1 +
 block/blk.h                        |   2 -
 drivers/block/xen-blkback/xenbus.c |   3 +-
 drivers/md/bcache/super.c          |  11 +--
 drivers/mtd/devices/block2mtd.c    |  81 +++++++---------
 drivers/s390/block/dasd_ioctl.c    |   5 +-
 drivers/scsi/scsicam.c             |   4 +-
 fs/bcachefs/util.h                 |   5 -
 fs/btrfs/disk-io.c                 |  71 +++++++-------
 fs/btrfs/volumes.c                 |  17 ++--
 fs/btrfs/zoned.c                   |  15 +--
 fs/buffer.c                        |  68 +++++++++----
 fs/cramfs/inode.c                  |  36 +++----
 fs/erofs/data.c                    |  18 ++--
 fs/erofs/internal.h                |   2 +
 fs/ext4/dir.c                      |   6 +-
 fs/ext4/ext4.h                     |  13 ---
 fs/ext4/ext4_jbd2.c                |   6 +-
 fs/ext4/inode.c                    |   8 +-
 fs/ext4/super.c                    |  66 +++----------
 fs/ext4/symlink.c                  |   2 +-
 fs/jbd2/journal.c                  |   3 +-
 fs/jbd2/recovery.c                 |   6 +-
 fs/nilfs2/segment.c                |   2 +-
 include/linux/blkdev.h             |  17 ++++
 include/linux/buffer_head.h        |  18 +++-
 27 files changed, 377 insertions(+), 257 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
@ 2023-12-21  8:56 ` Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:56 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Those apis will be used for other modules, so that bd_inode won't be
accessed directly from other modules.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bdev.c           | 148 +++++++++++++++++++++++++++++++++++++++++
 block/blk.h            |   2 -
 include/linux/blkdev.h |  17 +++++
 3 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 750aec178b6a..6204621c6db6 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -89,6 +89,25 @@ void invalidate_bdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(invalidate_bdev);
 
+/**
+ * invalidate_bdev_pages - Invalidate clean unused buffers and pagecache.
+ * @bdev: the block device which holds the cache to invalidate
+ * @start: the offset 'from' which to invalidate
+ * @end: the offset 'to' which to invalidate (inclusive)
+ *
+ * This function removes pages that are clean, unmapped and unlocked,
+ * as well as shadow entries. It will not block on IO activity.
+ *
+ * If you want to remove all the pages of one block device, regardless of
+ * their use and writeback state, use truncate_bdev_range().
+ */
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+			   pgoff_t end)
+{
+	invalidate_mapping_pages(bdev->bd_inode->i_mapping, start, end);
+}
+EXPORT_SYMBOL_GPL(invalidate_bdev_range);
+
 /*
  * Drop all buffers & page cache for given bdev range. This function bails
  * with error if bdev has other exclusive owner (such as filesystem).
@@ -121,6 +140,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
 					     lstart >> PAGE_SHIFT,
 					     lend >> PAGE_SHIFT);
 }
+EXPORT_SYMBOL_GPL(truncate_bdev_range);
 
 static void set_init_blocksize(struct block_device *bdev)
 {
@@ -1102,3 +1122,131 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 	blkdev_put_no_open(bdev);
 }
+
+/**
+ * bdev_read_folio - Read into block device page cache.
+ * @bdev: the block device which holds the cache to read.
+ * @pos: the offset that allocated folio will contain.
+ *
+ * Read one page into the block device page cache. If it succeeds, the folio
+ * returned will contain @pos;
+ *
+ * Return: Uptodate folio on success, ERR_PTR() on failure.
+ */
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos)
+{
+	return mapping_read_folio_gfp(bdev->bd_inode->i_mapping,
+				      pos >> PAGE_SHIFT, GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(bdev_read_folio);
+
+/**
+ * bdev_get_folio - Find and get a reference to a folio.
+ * @bdev: the block device which holds the address_space to search.
+ * @pos: the offset the returned folio will contain.
+ * @fgp_flags: %FGP flags modify how the folio is returned.
+ * @gfp: Memory allocation flags to use if %FGP_CREAT is specified.
+ *
+ * Looks up the page cache entry at @bdev->bd_inode->i_mapping from @pos. If
+ * this function returns a folio, it is returned with an increased refcount.
+ *
+ * Return: The found folio or an ERR_PTR() otherwise.
+ */
+struct folio *bdev_get_folio(struct block_device *bdev, loff_t pos,
+			     fgf_t fgp_flags, gfp_t gfp)
+{
+	return __filemap_get_folio(bdev->bd_inode->i_mapping, pos >> PAGE_SHIFT,
+				   fgp_flags, gfp);
+}
+EXPORT_SYMBOL_GPL(bdev_get_folio);
+
+/**
+ * bdev_wb_err_check - Has block device writeback error occurred?
+ * @bdev: the block device to check.
+ * @since: Previously-sampled @bdev->bd_inode->i_mapping->wb_err.
+ *
+ * Grab @bdev->bd_inode->i_mapping->wb_err, and see if it has changed @since
+ * the given value was sampled.
+ *
+ * Return: The latest error or 0 if it hasn't changed.
+ */
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since)
+{
+	return errseq_check(&bdev->bd_inode->i_mapping->wb_err, since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check);
+
+/**
+ * bdev_wb_err_check_and_advance() - Check block device writeback error and
+ * advance to current value.
+ * @bdev: the block device to check;
+ * @since: Pointer to previously-sampled @bdev->bd_inode->i_mapping->wb_err to
+ * check against and advance.
+ *
+ * Grab @bdev->bd_inode->i_mapping->wb_err, and see whether it matches the
+ * value that @since points to. If it does, then just return 0; If it doesn't,
+ * then the value has changed. Set the "seen" flag, and try to swap it into
+ * place as the new eseq value. Then, set that value as the new @since value,
+ * and return whatever the error portion is set to.
+ *
+ * Return: Negative errno if one has been stored, or 0 if no new error has
+ * occurred.
+ */
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since)
+{
+	return errseq_check_and_advance(&bdev->bd_inode->i_mapping->wb_err,
+					since);
+}
+EXPORT_SYMBOL_GPL(bdev_wb_err_check_and_advance);
+
+/**
+ * bdev_balance_dirty_pages_ratelimited - balance dirty memory state.
+ * @bdev: the block device which was dirtied.
+ *
+ * Check the system's dirty state and will initiate writeback if needed.
+ */
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev)
+{
+	return balance_dirty_pages_ratelimited(bdev->bd_inode->i_mapping);
+}
+EXPORT_SYMBOL_GPL(bdev_balance_dirty_pages_ratelimited);
+
+/**
+ * bdev_async_readahead - readahead for marked block device pages
+ * @bdev: the block device to read.
+ * @ra: file_ra_state which holds the readahead state.
+ * @file: Used by the filesystem for authentication.
+ * @index: Index of first page to be read.
+ * @req_count: Total number of pages being read by the caller.
+ *
+ * Read multiple pages into the block device page cache. The readahead logic may
+ * decide to piggyback more pages onto the read request if access patterns
+ * suggest it will improve performance.
+ */
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+			 struct file *file, pgoff_t index,
+			 unsigned long req_count)
+{
+	struct file_ra_state tmp_ra = {};
+
+	if (!ra) {
+		ra = &tmp_ra;
+		file_ra_state_init(ra, bdev->bd_inode->i_mapping);
+	}
+	page_cache_sync_readahead(bdev->bd_inode->i_mapping, ra, file, index,
+				  req_count);
+}
+EXPORT_SYMBOL_GPL(bdev_sync_readahead);
+
+/**
+ * bdev_attach_wb - associate an block device with its wb
+ * @bdev: block device of interest
+ *
+ * If @bdev->bd_inode doesn't have its wb, associate it with the wb matching the
+ * %current.
+ */
+void bdev_attach_wb(struct block_device *bdev)
+{
+	inode_attach_wb(bdev->bd_inode, NULL);
+}
+EXPORT_SYMBOL_GPL(bdev_attach_wb);
diff --git a/block/blk.h b/block/blk.h
index 1ef920f72e0f..ce5fcd927eaa 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -465,8 +465,6 @@ extern struct device_attribute dev_attr_events_poll_msecs;
 extern struct attribute_group blk_trace_attr_group;
 
 blk_mode_t file_to_blk_mode(struct file *file);
-int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
-		loff_t lstart, loff_t lend);
 long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index bc236e77d85e..ccac7d32bb86 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/sbitmap.h>
 #include <linux/uuid.h>
 #include <linux/xarray.h>
+#include <linux/pagemap.h>
 
 struct module;
 struct request_queue;
@@ -1475,6 +1476,22 @@ struct block_device *blkdev_get_no_open(dev_t dev);
 void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
+void invalidate_bdev_range(struct block_device *bdev, pgoff_t start,
+			   pgoff_t end);
+int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
+		loff_t lstart, loff_t lend);
+struct folio *bdev_read_folio(struct block_device *bdev, loff_t pos);
+struct folio *bdev_get_folio(struct block_device *bdev, loff_t pos,
+			     fgf_t fgp_flags, gfp_t gfp);
+int bdev_wb_err_check(struct block_device *bdev, errseq_t since);
+int bdev_wb_err_check_and_advance(struct block_device *bdev, errseq_t *since);
+void bdev_balance_dirty_pages_ratelimited(struct block_device *bdev);
+void bdev_sync_readahead(struct block_device *bdev, struct file_ra_state *ra,
+			 struct file *file, pgoff_t index,
+			 unsigned long req_count);
+void bdev_attach_wb(struct block_device *bdev);
+void bdev_associated_mapping(struct block_device *bdev,
+			     struct address_space *mapping);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis Yu Kuai
@ 2023-12-21  8:56 ` Yu Kuai
  2024-01-04 11:06   ` Jan Kara
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 03/17] bcache: use bdev api in read_super() Yu Kuai
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:56 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/block/xen-blkback/xenbus.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e34219ea2b05..e645afa4af57 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
 		xenbus_dev_error(blkif->be->dev, err, "block flush");
 		return;
 	}
-	invalidate_inode_pages2(
-			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
+	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
 
 	for (i = 0; i < blkif->nr_rings; i++) {
 		ring = &blkif->rings[i];
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 03/17] bcache: use bdev api in read_super()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
@ 2023-12-21  8:56 ` Yu Kuai
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis Yu Kuai
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:56 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/bcache/super.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index bfe1685dbae5..23892b32c582 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -168,14 +168,13 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 {
 	const char *err;
 	struct cache_sb_disk *s;
-	struct page *page;
+	struct folio *folio;
 	unsigned int i;
 
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
-				   SB_OFFSET >> PAGE_SHIFT, GFP_KERNEL);
-	if (IS_ERR(page))
+	folio = bdev_read_folio(bdev, SB_OFFSET);
+	if (IS_ERR(folio))
 		return "IO error";
-	s = page_address(page) + offset_in_page(SB_OFFSET);
+	s = folio_address(folio) + offset_in_folio(folio, SB_OFFSET);
 
 	sb->offset		= le64_to_cpu(s->offset);
 	sb->version		= le64_to_cpu(s->version);
@@ -272,7 +271,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
 	*res = s;
 	return NULL;
 err:
-	put_page(page);
+	folio_put(folio);
 	return err;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (2 preceding siblings ...)
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 03/17] bcache: use bdev api in read_super() Yu Kuai
@ 2023-12-21  8:56 ` Yu Kuai
  2024-01-04 11:28   ` Jan Kara
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 05/17] s390/dasd: use bdev api in dasd_format() Yu Kuai
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:56 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/mtd/devices/block2mtd.c | 81 +++++++++++++++------------------
 1 file changed, 36 insertions(+), 45 deletions(-)

diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index aa44a23ec045..cf201bf73184 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -46,40 +46,34 @@ struct block2mtd_dev {
 /* Static info about the MTD, used in cleanup_module */
 static LIST_HEAD(blkmtd_device_list);
 
-
-static struct page *page_read(struct address_space *mapping, pgoff_t index)
-{
-	return read_mapping_page(mapping, index, NULL);
-}
-
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
-	struct page *page;
+	struct block_device *bdev = dev->bdev_handle->bdev;
+	struct folio *folio;
 	pgoff_t index = to >> PAGE_SHIFT;	// page index
 	int pages = len >> PAGE_SHIFT;
 	u_long *p;
 	u_long *max;
 
 	while (pages) {
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(bdev, index << PAGE_SHIFT);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		max = page_address(page) + PAGE_SIZE;
-		for (p=page_address(page); p<max; p++)
+		max = folio_address(folio) + folio_size(folio);
+		for (p = folio_address(folio); p < max; p++)
 			if (*p != -1UL) {
-				lock_page(page);
-				memset(page_address(page), 0xff, PAGE_SIZE);
-				set_page_dirty(page);
-				unlock_page(page);
-				balance_dirty_pages_ratelimited(mapping);
+				folio_lock(folio);
+				memset(folio_address(folio), 0xff,
+				       folio_size(folio));
+				folio_mark_dirty(folio);
+				folio_unlock(folio);
+				bdev_balance_dirty_pages_ratelimited(bdev);
 				break;
 			}
 
-		put_page(page);
+		folio_put(folio);
 		pages--;
 		index++;
 	}
@@ -106,9 +100,7 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 		size_t *retlen, u_char *buf)
 {
 	struct block2mtd_dev *dev = mtd->priv;
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
-	struct page *page;
+	struct folio *folio;
 	pgoff_t index = from >> PAGE_SHIFT;
 	int offset = from & (PAGE_SIZE-1);
 	int cpylen;
@@ -120,12 +112,13 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 			cpylen = len;	// this page
 		len = len - cpylen;
 
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(dev->bdev_handle->bdev,
+					index << PAGE_SHIFT);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		memcpy(buf, page_address(page) + offset, cpylen);
-		put_page(page);
+		memcpy(buf, folio_address(folio) + offset, cpylen);
+		folio_put(folio);
 
 		if (retlen)
 			*retlen += cpylen;
@@ -141,9 +134,8 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
 static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 		loff_t to, size_t len, size_t *retlen)
 {
-	struct page *page;
-	struct address_space *mapping =
-				dev->bdev_handle->bdev->bd_inode->i_mapping;
+	struct block_device *bdev = dev->bdev_handle->bdev;
+	struct folio *folio;
 	pgoff_t index = to >> PAGE_SHIFT;	// page index
 	int offset = to & ~PAGE_MASK;	// page offset
 	int cpylen;
@@ -155,18 +147,18 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
 			cpylen = len;			// this page
 		len = len - cpylen;
 
-		page = page_read(mapping, index);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
+		folio = bdev_read_folio(bdev, index << PAGE_SHIFT);
+		if (IS_ERR(folio))
+			return PTR_ERR(folio);
 
-		if (memcmp(page_address(page)+offset, buf, cpylen)) {
-			lock_page(page);
-			memcpy(page_address(page) + offset, buf, cpylen);
-			set_page_dirty(page);
-			unlock_page(page);
-			balance_dirty_pages_ratelimited(mapping);
+		if (memcmp(folio_address(folio) + offset, buf, cpylen)) {
+			folio_lock(folio);
+			memcpy(folio_address(folio) + offset, buf, cpylen);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+			bdev_balance_dirty_pages_ratelimited(bdev);
 		}
-		put_page(page);
+		folio_put(folio);
 
 		if (retlen)
 			*retlen += cpylen;
@@ -211,8 +203,7 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
 	kfree(dev->mtd.name);
 
 	if (dev->bdev_handle) {
-		invalidate_mapping_pages(
-			dev->bdev_handle->bdev->bd_inode->i_mapping, 0, -1);
+		invalidate_bdev(dev->bdev_handle->bdev);
 		bdev_release(dev->bdev_handle);
 	}
 
@@ -295,7 +286,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 		goto err_free_block2mtd;
 	}
 
-	if ((long)bdev->bd_inode->i_size % erase_size) {
+	if (bdev_nr_bytes(bdev) % erase_size) {
 		pr_err("erasesize must be a divisor of device size\n");
 		goto err_free_block2mtd;
 	}
@@ -313,7 +304,7 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
 
 	dev->mtd.name = name;
 
-	dev->mtd.size = bdev->bd_inode->i_size & PAGE_MASK;
+	dev->mtd.size = bdev_nr_bytes(bdev) & PAGE_MASK;
 	dev->mtd.erasesize = erase_size;
 	dev->mtd.writesize = 1;
 	dev->mtd.writebufsize = PAGE_SIZE;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 05/17] s390/dasd: use bdev api in dasd_format()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (3 preceding siblings ...)
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 06/17] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/s390/block/dasd_ioctl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/block/dasd_ioctl.c b/drivers/s390/block/dasd_ioctl.c
index 61b9675e2a67..bbfb958237e6 100644
--- a/drivers/s390/block/dasd_ioctl.c
+++ b/drivers/s390/block/dasd_ioctl.c
@@ -221,8 +221,9 @@ dasd_format(struct dasd_block *block, struct format_data_t *fdata)
 	 * enabling the device later.
 	 */
 	if (fdata->start_unit == 0) {
-		block->gdp->part0->bd_inode->i_blkbits =
-			blksize_bits(fdata->blksize);
+		rc = set_blocksize(block->gdp->part0, fdata->blksize);
+		if (rc)
+			return rc;
 	}
 
 	rc = base->discipline->format_device(base, fdata, 1);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 06/17] scsicam: use bdev api in scsi_bios_ptable()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (4 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 05/17] s390/dasd: use bdev api in dasd_format() Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors() Yu Kuai
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_devcie.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/scsi/scsicam.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/scsicam.c b/drivers/scsi/scsicam.c
index e2c7d8ef205f..9617d70c0ed1 100644
--- a/drivers/scsi/scsicam.c
+++ b/drivers/scsi/scsicam.c
@@ -32,11 +32,9 @@
  */
 unsigned char *scsi_bios_ptable(struct block_device *dev)
 {
-	struct address_space *mapping = bdev_whole(dev)->bd_inode->i_mapping;
 	unsigned char *res = NULL;
-	struct folio *folio;
+	struct folio *folio = bdev_read_folio(bdev_whole(dev), 0);
 
-	folio = read_mapping_folio(mapping, 0, NULL);
 	if (IS_ERR(folio))
 		return NULL;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (5 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 06/17] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2024-04-11 17:49   ` Kent Overstreet
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 08/17] bio: export bio_add_folio_nofail() Yu Kuai
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

bdev_sectors() is not used hence remove it.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/bcachefs/util.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 2984b57b2958..22a0acc1704f 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -516,11 +516,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
 void bch2_bio_map(struct bio *bio, void *base, size_t);
 int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
 
-static inline sector_t bdev_sectors(struct block_device *bdev)
-{
-	return bdev->bd_inode->i_size >> 9;
-}
-
 #define closure_bio_submit(bio, cl)					\
 do {									\
 	closure_get(cl);						\
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 08/17] bio: export bio_add_folio_nofail()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (6 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis Yu Kuai
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Currently btrfs is using __bio_add_page() in write_dev_supers(). In order
to convert to use folio for bdev in btrfs, export bio_add_folio_nofail()
so that it can replace __bio_add_page().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/bio.c b/block/bio.c
index b9642a41f286..c7459839ca40 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1122,6 +1122,7 @@ void bio_add_folio_nofail(struct bio *bio, struct folio *folio, size_t len,
 	WARN_ON_ONCE(off > UINT_MAX);
 	__bio_add_page(bio, &folio->page, len, off);
 }
+EXPORT_SYMBOL_GPL(bio_add_folio_nofail);
 
 /**
  * bio_add_folio - Attempt to add part of a folio to a bio.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (7 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 08/17] bio: export bio_add_folio_nofail() Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2023-12-23 17:31   ` Matthew Wilcox
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 10/17] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/btrfs/disk-io.c | 71 +++++++++++++++++++++-------------------------
 fs/btrfs/volumes.c | 17 ++++++-----
 fs/btrfs/zoned.c   | 15 +++++-----
 3 files changed, 48 insertions(+), 55 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 401ea09ae4b8..a1cfdee99a81 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3620,28 +3620,24 @@ ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 static void btrfs_end_super_write(struct bio *bio)
 {
 	struct btrfs_device *device = bio->bi_private;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
-	struct page *page;
-
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		page = bvec->bv_page;
+	struct folio_iter fi;
 
+	bio_for_each_folio_all(fi, bio) {
 		if (bio->bi_status) {
 			btrfs_warn_rl_in_rcu(device->fs_info,
 				"lost page write due to IO error on %s (%d)",
 				btrfs_dev_name(device),
 				blk_status_to_errno(bio->bi_status));
-			ClearPageUptodate(page);
-			SetPageError(page);
+			folio_clear_uptodate(fi.folio);
+			folio_set_error(fi.folio);
 			btrfs_dev_stat_inc_and_print(device,
 						     BTRFS_DEV_STAT_WRITE_ERRS);
 		} else {
-			SetPageUptodate(page);
+			folio_mark_uptodate(fi.folio);
 		}
 
-		put_page(page);
-		unlock_page(page);
+		folio_put(fi.folio);
+		folio_unlock(fi.folio);
 	}
 
 	bio_put(bio);
@@ -3651,9 +3647,9 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 						   int copy_num, bool drop_cache)
 {
 	struct btrfs_super_block *super;
-	struct page *page;
+	struct folio *folio;
 	u64 bytenr, bytenr_orig;
-	struct address_space *mapping = bdev->bd_inode->i_mapping;
+	unsigned int nofs_flag;
 	int ret;
 
 	bytenr_orig = btrfs_sb_offset(copy_num);
@@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 		 * Drop the page of the primary superblock, so later read will
 		 * always read from the device.
 		 */
-		invalidate_inode_pages2_range(mapping,
-				bytenr >> PAGE_SHIFT,
+		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
 				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
 	}
 
-	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
-	if (IS_ERR(page))
-		return ERR_CAST(page);
+	nofs_flag = memalloc_nofs_save();
+	folio = bdev_read_folio(bdev, bytenr);
+	memalloc_nofs_restore(nofs_flag);
+	if (IS_ERR(folio))
+		return ERR_CAST(folio);
 
-	super = page_address(page);
+	super = folio_address(folio);
 	if (btrfs_super_magic(super) != BTRFS_MAGIC) {
 		btrfs_release_disk_super(super);
 		return ERR_PTR(-ENODATA);
@@ -3740,7 +3737,6 @@ static int write_dev_supers(struct btrfs_device *device,
 			    struct btrfs_super_block *sb, int max_mirrors)
 {
 	struct btrfs_fs_info *fs_info = device->fs_info;
-	struct address_space *mapping = device->bdev->bd_inode->i_mapping;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
 	int i;
 	int errors = 0;
@@ -3753,7 +3749,7 @@ static int write_dev_supers(struct btrfs_device *device,
 	shash->tfm = fs_info->csum_shash;
 
 	for (i = 0; i < max_mirrors; i++) {
-		struct page *page;
+		struct folio *folio;
 		struct bio *bio;
 		struct btrfs_super_block *disk_super;
 
@@ -3778,9 +3774,10 @@ static int write_dev_supers(struct btrfs_device *device,
 				    BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE,
 				    sb->csum);
 
-		page = find_or_create_page(mapping, bytenr >> PAGE_SHIFT,
-					   GFP_NOFS);
-		if (!page) {
+		folio = bdev_get_folio(device->bdev, bytenr,
+				       FGP_LOCK|FGP_ACCESSED|FGP_CREAT,
+				       GFP_NOFS);
+		if (IS_ERR(folio)) {
 			btrfs_err(device->fs_info,
 			    "couldn't get super block page for bytenr %llu",
 			    bytenr);
@@ -3789,9 +3786,9 @@ static int write_dev_supers(struct btrfs_device *device,
 		}
 
 		/* Bump the refcount for wait_dev_supers() */
-		get_page(page);
+		folio_get(folio);
 
-		disk_super = page_address(page);
+		disk_super = folio_address(folio);
 		memcpy(disk_super, sb, BTRFS_SUPER_INFO_SIZE);
 
 		/*
@@ -3805,8 +3802,8 @@ static int write_dev_supers(struct btrfs_device *device,
 		bio->bi_iter.bi_sector = bytenr >> SECTOR_SHIFT;
 		bio->bi_private = device;
 		bio->bi_end_io = btrfs_end_super_write;
-		__bio_add_page(bio, page, BTRFS_SUPER_INFO_SIZE,
-			       offset_in_page(bytenr));
+		bio_add_folio_nofail(bio, folio, BTRFS_SUPER_INFO_SIZE,
+				     offset_in_folio(folio, bytenr));
 
 		/*
 		 * We FUA only the first super block.  The others we allow to
@@ -3842,7 +3839,7 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		max_mirrors = BTRFS_SUPER_MIRROR_MAX;
 
 	for (i = 0; i < max_mirrors; i++) {
-		struct page *page;
+		struct folio *folio;
 
 		ret = btrfs_sb_log_location(device, i, READ, &bytenr);
 		if (ret == -ENOENT) {
@@ -3857,27 +3854,23 @@ static int wait_dev_supers(struct btrfs_device *device, int max_mirrors)
 		    device->commit_total_bytes)
 			break;
 
-		page = find_get_page(device->bdev->bd_inode->i_mapping,
-				     bytenr >> PAGE_SHIFT);
-		if (!page) {
+		folio = bdev_get_folio(device->bdev, bytenr, 0, 0);
+		if (!IS_ERR(folio)) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 			continue;
 		}
 		/* Page is submitted locked and unlocked once the IO completes */
-		wait_on_page_locked(page);
-		if (PageError(page)) {
+		folio_wait_locked(folio);
+		if (folio_test_error(folio)) {
 			errors++;
 			if (i == 0)
 				primary_failed = true;
 		}
 
-		/* Drop our reference */
-		put_page(page);
-
-		/* Drop the reference from the writing run */
-		put_page(page);
+		/* Drop our reference and the reference from the writing run */
+		folio_put_refs(folio, 2);
 	}
 
 	/* log error, force error return */
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c6f16625af51..b3538777683c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1230,16 +1230,16 @@ int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 
 void btrfs_release_disk_super(struct btrfs_super_block *super)
 {
-	struct page *page = virt_to_page(super);
+	struct folio *folio = virt_to_folio(super);
 
-	put_page(page);
+	folio_put(folio);
 }
 
 static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev,
 						       u64 bytenr, u64 bytenr_orig)
 {
 	struct btrfs_super_block *disk_super;
-	struct page *page;
+	struct folio *folio;
 	void *p;
 	pgoff_t index;
 
@@ -1257,15 +1257,14 @@ static struct btrfs_super_block *btrfs_read_disk_super(struct block_device *bdev
 		return ERR_PTR(-EINVAL);
 
 	/* pull in the page with our super */
-	page = read_cache_page_gfp(bdev->bd_inode->i_mapping, index, GFP_KERNEL);
+	folio = bdev_read_folio(bdev, index);
+	if (IS_ERR(folio))
+		return ERR_CAST(folio);
 
-	if (IS_ERR(page))
-		return ERR_CAST(page);
-
-	p = page_address(page);
+	p = folio_address(folio);
 
 	/* align our pointer to the offset of the super block */
-	disk_super = p + offset_in_page(bytenr);
+	disk_super = p + offset_in_folio(folio, bytenr);
 
 	if (btrfs_super_bytenr(disk_super) != bytenr_orig ||
 	    btrfs_super_magic(disk_super) != BTRFS_MAGIC) {
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 23c1e6b19a65..eb1a9a8b5959 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -120,8 +120,6 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 		return -ENOENT;
 	} else if (full[0] && full[1]) {
 		/* Compare two super blocks */
-		struct address_space *mapping = bdev->bd_inode->i_mapping;
-		struct page *page[BTRFS_NR_SB_LOG_ZONES];
 		struct btrfs_super_block *super[BTRFS_NR_SB_LOG_ZONES];
 		int i;
 
@@ -129,15 +127,18 @@ static int sb_write_pointer(struct block_device *bdev, struct blk_zone *zones,
 			u64 zone_end = (zones[i].start + zones[i].capacity) << SECTOR_SHIFT;
 			u64 bytenr = ALIGN_DOWN(zone_end, BTRFS_SUPER_INFO_SIZE) -
 						BTRFS_SUPER_INFO_SIZE;
+			struct folio *folio;
+			unsigned int nofs_flag;
 
-			page[i] = read_cache_page_gfp(mapping,
-					bytenr >> PAGE_SHIFT, GFP_NOFS);
-			if (IS_ERR(page[i])) {
+			nofs_flag = memalloc_nofs_save();
+			folio = bdev_read_folio(bdev, bytenr);
+			memalloc_nofs_restore(nofs_flag);
+			if (IS_ERR(folio)) {
 				if (i == 1)
 					btrfs_release_disk_super(super[0]);
-				return PTR_ERR(page[i]);
+				return PTR_ERR(folio);
 			}
-			super[i] = page_address(page[i]);
+			super[i] = folio_address(folio);
 		}
 
 		if (btrfs_super_generation(super[0]) >
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 10/17] cramfs: use bdev apis in cramfs_blkdev_read()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (8 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis Yu Kuai
@ 2023-12-21  8:57 ` Yu Kuai
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api Yu Kuai
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:57 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

On the one hand covert to use folio while reading bdev inode, on the
other hand prevent to access bd_inode directly.

Also do some cleanup that there is no need for two for loop, and remove
local array pages.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/cramfs/inode.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 60dbfa0f8805..fad95d683d97 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -183,9 +183,6 @@ static int next_buffer;
 static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 				unsigned int len)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
-	struct file_ra_state ra = {};
-	struct page *pages[BLKS_PER_BUF];
 	unsigned i, blocknr, buffer;
 	unsigned long devsize;
 	char *data;
@@ -214,37 +211,30 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
 	devsize = bdev_nr_bytes(sb->s_bdev) >> PAGE_SHIFT;
 
 	/* Ok, read in BLKS_PER_BUF pages completely first. */
-	file_ra_state_init(&ra, mapping);
-	page_cache_sync_readahead(mapping, &ra, NULL, blocknr, BLKS_PER_BUF);
-
-	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = NULL;
-
-		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
-			/* synchronous error? */
-			if (IS_ERR(page))
-				page = NULL;
-		}
-		pages[i] = page;
-	}
+	bdev_sync_readahead(sb->s_bdev, NULL, NULL, blocknr, BLKS_PER_BUF);
 
 	buffer = next_buffer;
 	next_buffer = NEXT_BUFFER(buffer);
 	buffer_blocknr[buffer] = blocknr;
 	buffer_dev[buffer] = sb;
-
 	data = read_buffers[buffer];
+
 	for (i = 0; i < BLKS_PER_BUF; i++) {
-		struct page *page = pages[i];
+		struct folio *folio = NULL;
+
+		if (blocknr + i < devsize)
+			folio = bdev_read_folio(sb->s_bdev,
+						(blocknr + i) << PAGE_SHIFT);
 
-		if (page) {
-			memcpy_from_page(data, page, 0, PAGE_SIZE);
-			put_page(page);
-		} else
+		if (IS_ERR_OR_NULL(folio)) {
 			memset(data, 0, PAGE_SIZE);
+		} else {
+			memcpy_from_folio(data, folio, 0, PAGE_SIZE);
+			folio_put(folio);
+		}
 		data += PAGE_SIZE;
 	}
+
 	return read_buffers[buffer] + offset;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (9 preceding siblings ...)
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 10/17] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
@ 2023-12-21  8:58 ` Yu Kuai
  2024-01-04 12:02   ` Jan Kara
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:58 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/erofs/data.c     | 18 ++++++++++++------
 fs/erofs/internal.h |  2 ++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index c98aeda8abb2..bbe2fe199bf3 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
 void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 		  enum erofs_kmap_type type)
 {
-	struct inode *inode = buf->inode;
-	erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
+	u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
+	erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
 	pgoff_t index = offset >> PAGE_SHIFT;
 	struct page *page = buf->page;
 	struct folio *folio;
@@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 		erofs_put_metabuf(buf);
 
 		nofs_flag = memalloc_nofs_save();
-		folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
+		folio = buf->inode ?
+			read_mapping_folio(buf->inode->i_mapping, index, NULL) :
+			bdev_read_folio(buf->bdev, offset);
 		memalloc_nofs_restore(nofs_flag);
 		if (IS_ERR(folio))
 			return folio;
@@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
 
 void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
 {
-	if (erofs_is_fscache_mode(sb))
+	if (erofs_is_fscache_mode(sb)) {
 		buf->inode = EROFS_SB(sb)->s_fscache->inode;
-	else
-		buf->inode = sb->s_bdev->bd_inode;
+		buf->bdev = NULL;
+	} else {
+		buf->inode = NULL;
+		buf->bdev = sb->s_bdev;
+		buf->blkszbits = EROFS_SB(sb)->blkszbits;
+	}
 }
 
 void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index b0409badb017..c9206351b485 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -224,8 +224,10 @@ enum erofs_kmap_type {
 
 struct erofs_buf {
 	struct inode *inode;
+	struct block_device *bdev;
 	struct page *page;
 	void *base;
+	u8 blkszbits;
 	enum erofs_kmap_type kmap_type;
 };
 #define __EROFS_BUF_INITIALIZER	((struct erofs_buf){ .page = NULL })
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (10 preceding siblings ...)
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api Yu Kuai
@ 2023-12-21  8:58 ` Yu Kuai
  2023-12-21 14:54   ` Ryusuke Konishi
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis Yu Kuai
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:58 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/nilfs2/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 55e31cc903d1..a1130e384937 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2823,7 +2823,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
 	if (!nilfs->ns_writer)
 		return -ENOMEM;
 
-	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
+	bdev_attach_wb(nilfs->ns_bdev);
 
 	err = nilfs_segctor_start_thread(nilfs->ns_writer);
 	if (unlikely(err))
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (11 preceding siblings ...)
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
@ 2023-12-21  8:58 ` Yu Kuai
  2024-01-04 12:11   ` Jan Kara
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block Yu Kuai
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:58 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/jbd2/journal.c  | 3 +--
 fs/jbd2/recovery.c | 6 ++----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index ed53188472f9..f1b5ffeaf02a 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2003,8 +2003,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
 		byte_count = (block_stop - block_start + 1) *
 				journal->j_blocksize;
 
-		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
-				byte_start, byte_stop);
+		truncate_bdev_range(journal->j_dev, 0, byte_start, byte_stop);
 
 		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
 			err = blkdev_issue_discard(journal->j_dev,
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 01f744cb97a4..6b6a2c4585fa 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -290,7 +290,6 @@ int jbd2_journal_recover(journal_t *journal)
 
 	struct recovery_info	info;
 	errseq_t		wb_err;
-	struct address_space	*mapping;
 
 	memset(&info, 0, sizeof(info));
 	sb = journal->j_superblock;
@@ -309,8 +308,7 @@ int jbd2_journal_recover(journal_t *journal)
 	}
 
 	wb_err = 0;
-	mapping = journal->j_fs_dev->bd_inode->i_mapping;
-	errseq_check_and_advance(&mapping->wb_err, &wb_err);
+	bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
 	err = do_one_pass(journal, &info, PASS_SCAN);
 	if (!err)
 		err = do_one_pass(journal, &info, PASS_REVOKE);
@@ -334,7 +332,7 @@ int jbd2_journal_recover(journal_t *journal)
 	err2 = sync_blockdev(journal->j_fs_dev);
 	if (!err)
 		err = err2;
-	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
+	err2 = bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
 	if (!err)
 		err = err2;
 	/* Make sure all replayed data is on permanent storage */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (12 preceding siblings ...)
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis Yu Kuai
@ 2023-12-21  8:58 ` Yu Kuai
  2024-01-04 12:22   ` Jan Kara
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 15/17] ext4: use " Yu Kuai
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:58 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Unlike __bread_gfp(), ext4 has special handing while reading sb block:

1) __GFP_NOFAIL is not set, and memory allocation can fail;
2) If buffer write failed before, set buffer uptodate and don't read
   block from disk;
3) REQ_META is set for all IO, and REQ_PRIO is set for reading xattr;
4) If failed, return error ptr instead of NULL;

This patch add a new helper __bread_gfp2() that will match above 2 and 3(
1 will be used, and 4 will still be encapsulated by ext4), and prepare to
prevent calling mapping_gfp_constraint() directly on bd_inode->i_mapping
in ext4.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 fs/buffer.c                 | 68 ++++++++++++++++++++++++++-----------
 include/linux/buffer_head.h | 18 +++++++++-
 2 files changed, 65 insertions(+), 21 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 967f34b70aa8..188bd36c9fea 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1255,16 +1255,19 @@ void __bforget(struct buffer_head *bh)
 }
 EXPORT_SYMBOL(__bforget);
 
-static struct buffer_head *__bread_slow(struct buffer_head *bh)
+static struct buffer_head *__bread_slow(struct buffer_head *bh,
+					blk_opf_t op_flags,
+					bool check_write_error)
 {
 	lock_buffer(bh);
-	if (buffer_uptodate(bh)) {
+	if (buffer_uptodate(bh) ||
+	    (check_write_error && buffer_uptodate_or_error(bh))) {
 		unlock_buffer(bh);
 		return bh;
 	} else {
 		get_bh(bh);
 		bh->b_end_io = end_buffer_read_sync;
-		submit_bh(REQ_OP_READ, bh);
+		submit_bh(REQ_OP_READ | op_flags, bh);
 		wait_on_buffer(bh);
 		if (buffer_uptodate(bh))
 			return bh;
@@ -1445,6 +1448,31 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
 }
 EXPORT_SYMBOL(__breadahead);
 
+static struct buffer_head *
+bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
+	  blk_opf_t op_flags, gfp_t gfp, bool check_write_error)
+{
+	struct buffer_head *bh;
+
+	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
+
+	/*
+	 * Prefer looping in the allocator rather than here, at least that
+	 * code knows what it's doing.
+	 */
+	gfp |= __GFP_NOFAIL;
+
+	bh = bdev_getblk(bdev, block, size, gfp);
+	if (unlikely(!bh))
+		return NULL;
+
+	if (buffer_uptodate(bh) ||
+	    (check_write_error && buffer_uptodate_or_error(bh)))
+		return bh;
+
+	return __bread_slow(bh, op_flags, check_write_error);
+}
+
 /**
  *  __bread_gfp() - reads a specified block and returns the bh
  *  @bdev: the block_device to read from
@@ -1458,27 +1486,27 @@ EXPORT_SYMBOL(__breadahead);
  *  It returns NULL if the block was unreadable.
  */
 struct buffer_head *
-__bread_gfp(struct block_device *bdev, sector_t block,
-		   unsigned size, gfp_t gfp)
+__bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
+	    gfp_t gfp)
 {
-	struct buffer_head *bh;
-
-	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
-
-	/*
-	 * Prefer looping in the allocator rather than here, at least that
-	 * code knows what it's doing.
-	 */
-	gfp |= __GFP_NOFAIL;
-
-	bh = bdev_getblk(bdev, block, size, gfp);
-
-	if (likely(bh) && !buffer_uptodate(bh))
-		bh = __bread_slow(bh);
-	return bh;
+	return bread_gfp(bdev, block, size, 0, gfp, false);
 }
 EXPORT_SYMBOL(__bread_gfp);
 
+/*
+ * This works like __bread_gfp() except:
+ * 1) If buffer write failed before, set buffer uptodate and don't read
+ * block from disk;
+ * 2) Caller can pass in additional op_flags like REQ_META;
+ */
+struct buffer_head *
+__bread_gfp2(struct block_device *bdev, sector_t block, unsigned int size,
+	     blk_opf_t op_flags, gfp_t gfp)
+{
+	return bread_gfp(bdev, block, size, op_flags, gfp, true);
+}
+EXPORT_SYMBOL(__bread_gfp2);
+
 static void __invalidate_bh_lrus(struct bh_lru *b)
 {
 	int i;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 5f23ee599889..751b2744b4ae 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -171,6 +171,18 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh)
 	return test_bit_acquire(BH_Uptodate, &bh->b_state);
 }
 
+static __always_inline int buffer_uptodate_or_error(struct buffer_head *bh)
+{
+	/*
+	 * If the buffer has the write error flag, data was failed to write
+	 * out in the block. In this case, set buffer uptodate to prevent
+	 * reading old data.
+	 */
+	if (buffer_write_io_error(bh))
+		set_buffer_uptodate(bh);
+	return buffer_uptodate(bh);
+}
+
 static inline unsigned long bh_offset(const struct buffer_head *bh)
 {
 	return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
@@ -231,7 +243,11 @@ void __brelse(struct buffer_head *);
 void __bforget(struct buffer_head *);
 void __breadahead(struct block_device *, sector_t block, unsigned int size);
 struct buffer_head *__bread_gfp(struct block_device *,
-				sector_t block, unsigned size, gfp_t gfp);
+				sector_t block, unsigned int size, gfp_t gfp);
+struct buffer_head *__bread_gfp2(struct block_device *bdev, sector_t block,
+				 unsigned int size, blk_opf_t op_flags,
+				 gfp_t gfp);
+
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void unlock_buffer(struct buffer_head *bh);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 15/17] ext4: use new helper to read sb block
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (13 preceding siblings ...)
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block Yu Kuai
@ 2023-12-21  8:58 ` Yu Kuai
  2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 16/17] ext4: remove block_device_ejected() Yu Kuai
  2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 17/17] ext4: use bdev apis Yu Kuai
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:58 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Remove __ext4_sb_bread_gfp() and ext4_buffer_uptodate() that is defined
by ext4, and convert to use common helper __bread_gfp2() and
buffer_uptodate_or_error().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h    | 13 -------------
 fs/ext4/inode.c   |  8 ++++----
 fs/ext4/super.c   | 45 ++++++++++-----------------------------------
 fs/ext4/symlink.c |  2 +-
 4 files changed, 15 insertions(+), 53 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a5d784872303..8377f6c5264f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3824,19 +3824,6 @@ extern const struct iomap_ops ext4_iomap_ops;
 extern const struct iomap_ops ext4_iomap_overwrite_ops;
 extern const struct iomap_ops ext4_iomap_report_ops;
 
-static inline int ext4_buffer_uptodate(struct buffer_head *bh)
-{
-	/*
-	 * If the buffer has the write error flag, we have failed
-	 * to write out data in the block.  In this  case, we don't
-	 * have to read the block because we may read the old data
-	 * successfully.
-	 */
-	if (buffer_write_io_error(bh))
-		set_buffer_uptodate(bh);
-	return buffer_uptodate(bh);
-}
-
 #endif	/* __KERNEL__ */
 
 #define EFSBADCRC	EBADMSG		/* Bad CRC detected */
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 61277f7f8722..efb0af6f02f7 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -887,7 +887,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct inode *inode,
 	bh = ext4_getblk(handle, inode, block, map_flags);
 	if (IS_ERR(bh))
 		return bh;
-	if (!bh || ext4_buffer_uptodate(bh))
+	if (!bh || buffer_uptodate_or_error(bh))
 		return bh;
 
 	ret = ext4_read_bh_lock(bh, REQ_META | REQ_PRIO, true);
@@ -915,7 +915,7 @@ int ext4_bread_batch(struct inode *inode, ext4_lblk_t block, int bh_count,
 
 	for (i = 0; i < bh_count; i++)
 		/* Note that NULL bhs[i] is valid because of holes. */
-		if (bhs[i] && !ext4_buffer_uptodate(bhs[i]))
+		if (bhs[i] && !buffer_uptodate_or_error(bhs[i]))
 			ext4_read_bh_lock(bhs[i], REQ_META | REQ_PRIO, false);
 
 	if (!wait)
@@ -4392,11 +4392,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino,
 	bh = sb_getblk(sb, block);
 	if (unlikely(!bh))
 		return -ENOMEM;
-	if (ext4_buffer_uptodate(bh))
+	if (buffer_uptodate_or_error(bh))
 		goto has_buffer;
 
 	lock_buffer(bh);
-	if (ext4_buffer_uptodate(bh)) {
+	if (buffer_uptodate_or_error(bh)) {
 		/* Someone brought it uptodate while we waited */
 		unlock_buffer(bh);
 		goto has_buffer;
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c5fcf377ab1f..3f07eaa33332 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -180,7 +180,7 @@ void ext4_read_bh_nowait(struct buffer_head *bh, blk_opf_t op_flags,
 {
 	BUG_ON(!buffer_locked(bh));
 
-	if (ext4_buffer_uptodate(bh)) {
+	if (buffer_uptodate_or_error(bh)) {
 		unlock_buffer(bh);
 		return;
 	}
@@ -191,7 +191,7 @@ int ext4_read_bh(struct buffer_head *bh, blk_opf_t op_flags, bh_end_io_t *end_io
 {
 	BUG_ON(!buffer_locked(bh));
 
-	if (ext4_buffer_uptodate(bh)) {
+	if (buffer_uptodate_or_error(bh)) {
 		unlock_buffer(bh);
 		return 0;
 	}
@@ -214,49 +214,24 @@ int ext4_read_bh_lock(struct buffer_head *bh, blk_opf_t op_flags, bool wait)
 	return ext4_read_bh(bh, op_flags, NULL);
 }
 
-/*
- * This works like __bread_gfp() except it uses ERR_PTR for error
- * returns.  Currently with sb_bread it's impossible to distinguish
- * between ENOMEM and EIO situations (since both result in a NULL
- * return.
- */
-static struct buffer_head *__ext4_sb_bread_gfp(struct super_block *sb,
-					       sector_t block,
-					       blk_opf_t op_flags, gfp_t gfp)
-{
-	struct buffer_head *bh;
-	int ret;
-
-	bh = sb_getblk_gfp(sb, block, gfp);
-	if (bh == NULL)
-		return ERR_PTR(-ENOMEM);
-	if (ext4_buffer_uptodate(bh))
-		return bh;
-
-	ret = ext4_read_bh_lock(bh, REQ_META | op_flags, true);
-	if (ret) {
-		put_bh(bh);
-		return ERR_PTR(ret);
-	}
-	return bh;
-}
-
 struct buffer_head *ext4_sb_bread(struct super_block *sb, sector_t block,
 				   blk_opf_t op_flags)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
-			~__GFP_FS) | __GFP_MOVABLE;
+	struct buffer_head *bh = __bread_gfp2(sb->s_bdev, block,
+					      sb->s_blocksize,
+					      REQ_META | op_flags,
+					      __GFP_MOVABLE);
 
-	return __ext4_sb_bread_gfp(sb, block, op_flags, gfp);
+	return bh ? bh : ERR_PTR(-EIO);
 }
 
 struct buffer_head *ext4_sb_bread_unmovable(struct super_block *sb,
 					    sector_t block)
 {
-	gfp_t gfp = mapping_gfp_constraint(sb->s_bdev->bd_inode->i_mapping,
-			~__GFP_FS);
+	struct buffer_head *bh = __bread_gfp2(sb->s_bdev, block,
+					      sb->s_blocksize, REQ_META, 0);
 
-	return __ext4_sb_bread_gfp(sb, block, 0, gfp);
+	return bh ? bh : ERR_PTR(-EIO);
 }
 
 void ext4_sb_breadahead_unmovable(struct super_block *sb, sector_t block)
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index 75bf1f88843c..49e918221aac 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -94,7 +94,7 @@ static const char *ext4_get_link(struct dentry *dentry, struct inode *inode,
 		bh = ext4_getblk(NULL, inode, 0, EXT4_GET_BLOCKS_CACHED_NOWAIT);
 		if (IS_ERR(bh))
 			return ERR_CAST(bh);
-		if (!bh || !ext4_buffer_uptodate(bh))
+		if (!bh || !buffer_uptodate_or_error(bh))
 			return ERR_PTR(-ECHILD);
 	} else {
 		bh = ext4_bread(NULL, inode, 0, 0);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 16/17] ext4: remove block_device_ejected()
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (14 preceding siblings ...)
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 15/17] ext4: use " Yu Kuai
@ 2023-12-21  8:59 ` Yu Kuai
  2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 17/17] ext4: use bdev apis Yu Kuai
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:59 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

block_device_ejected() is added by commit bdfe0cbd746a ("Revert
"ext4: remove block_device_ejected"") in 2015. At that time 'bdi->wb'
is destroyed synchronized from del_gendisk(), hence if ext4 is still
mounted, and then mark_buffer_dirty() will reference destroyed 'wb'.
However, such problem doesn't exist anymore:

- commit d03f6cdc1fc4 ("block: Dynamically allocate and refcount
backing_dev_info") switch bdi to use refcounting;
- commit 13eec2363ef0 ("fs: Get proper reference for s_bdi"), will grab
additional reference of bdi while mounting, so that 'bdi->wb' will not
be destroyed until generic_shutdown_super().

Hence remove this dead function block_device_ejected().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/super.c | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3f07eaa33332..a7935edbd7b1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -467,22 +467,6 @@ static void ext4_maybe_update_superblock(struct super_block *sb)
 		schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
 }
 
-/*
- * The del_gendisk() function uninitializes the disk-specific data
- * structures, including the bdi structure, without telling anyone
- * else.  Once this happens, any attempt to call mark_buffer_dirty()
- * (for example, by ext4_commit_super), will cause a kernel OOPS.
- * This is a kludge to prevent these oops until we can put in a proper
- * hook in del_gendisk() to inform the VFS and file system layers.
- */
-static int block_device_ejected(struct super_block *sb)
-{
-	struct inode *bd_inode = sb->s_bdev->bd_inode;
-	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
-
-	return bdi->dev == NULL;
-}
-
 static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
 {
 	struct super_block		*sb = journal->j_private;
@@ -6162,8 +6146,6 @@ static int ext4_commit_super(struct super_block *sb)
 
 	if (!sbh)
 		return -EINVAL;
-	if (block_device_ejected(sb))
-		return -ENODEV;
 
 	ext4_update_super(sb);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH RFC v3 for-6.8/block 17/17] ext4: use bdev apis
  2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
                   ` (15 preceding siblings ...)
  2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 16/17] ext4: remove block_device_ejected() Yu Kuai
@ 2023-12-21  8:59 ` Yu Kuai
  16 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2023-12-21  8:59 UTC (permalink / raw)
  To: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav
  Cc: linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yukuai1, yi.zhang, yangerkun

From: Yu Kuai <yukuai3@huawei.com>

Avoid to access bd_inode directly, prepare to remove bd_inode from
block_device.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/dir.c       | 6 ++----
 fs/ext4/ext4_jbd2.c | 6 +++---
 fs/ext4/super.c     | 3 +--
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 3985f8c33f95..64e35eb6a324 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -191,10 +191,8 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
 			pgoff_t index = map.m_pblk >>
 					(PAGE_SHIFT - inode->i_blkbits);
 			if (!ra_has_index(&file->f_ra, index))
-				page_cache_sync_readahead(
-					sb->s_bdev->bd_inode->i_mapping,
-					&file->f_ra, file,
-					index, 1);
+				bdev_sync_readahead(sb->s_bdev, &file->f_ra,
+						    file, index, 1);
 			file->f_ra.prev_pos = (loff_t)index << PAGE_SHIFT;
 			bh = ext4_bread(NULL, inode, map.m_lblk, 0);
 			if (IS_ERR(bh)) {
diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index d1a2e6624401..c1bf3a00fad9 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -206,7 +206,6 @@ static void ext4_journal_abort_handle(const char *caller, unsigned int line,
 
 static void ext4_check_bdev_write_error(struct super_block *sb)
 {
-	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	int err;
 
@@ -216,9 +215,10 @@ static void ext4_check_bdev_write_error(struct super_block *sb)
 	 * we could read old data from disk and write it out again, which
 	 * may lead to on-disk filesystem inconsistency.
 	 */
-	if (errseq_check(&mapping->wb_err, READ_ONCE(sbi->s_bdev_wb_err))) {
+	if (bdev_wb_err_check(sb->s_bdev, READ_ONCE(sbi->s_bdev_wb_err))) {
 		spin_lock(&sbi->s_bdev_wb_lock);
-		err = errseq_check_and_advance(&mapping->wb_err, &sbi->s_bdev_wb_err);
+		err = bdev_wb_err_check_and_advance(sb->s_bdev,
+						    &sbi->s_bdev_wb_err);
 		spin_unlock(&sbi->s_bdev_wb_lock);
 		if (err)
 			ext4_error_err(sb, -err,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a7935edbd7b1..25c3d2ac8559 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5544,8 +5544,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * used to detect the metadata async write error.
 	 */
 	spin_lock_init(&sbi->s_bdev_wb_lock);
-	errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
-				 &sbi->s_bdev_wb_err);
+	bdev_wb_err_check_and_advance(sb->s_bdev, &sbi->s_bdev_wb_err);
 	EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
 	ext4_orphan_cleanup(sb, es);
 	EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer()
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
@ 2023-12-21 14:54   ` Ryusuke Konishi
  0 siblings, 0 replies; 37+ messages in thread
From: Ryusuke Konishi @ 2023-12-21 14:54 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, willy, akpm, hare, p.raghav,
	linux-block, linux-kernel, xen-devel, linux-bcache, linux-mtd,
	linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yi.zhang, yangerkun

On Thu, Dec 21, 2023 at 6:00 PM Yu Kuai wrote:
>
> From: Yu Kuai <yukuai3@huawei.com>
>
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_device.
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  fs/nilfs2/segment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index 55e31cc903d1..a1130e384937 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -2823,7 +2823,7 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
>         if (!nilfs->ns_writer)
>                 return -ENOMEM;
>
> -       inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
> +       bdev_attach_wb(nilfs->ns_bdev);
>
>         err = nilfs_segctor_start_thread(nilfs->ns_writer);
>         if (unlikely(err))
> --
> 2.39.2
>

Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>

Thanks,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis Yu Kuai
@ 2023-12-23 17:31   ` Matthew Wilcox
  2023-12-23 18:39     ` Kent Overstreet
  2024-01-04 11:49     ` Jan Kara
  0 siblings, 2 replies; 37+ messages in thread
From: Matthew Wilcox @ 2023-12-23 17:31 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, akpm, hare,
	p.raghav, linux-block, linux-kernel, xen-devel, linux-bcache,
	linux-mtd, linux-s390, linux-scsi, linux-bcachefs, linux-btrfs,
	linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs, yukuai3,
	yi.zhang, yangerkun

On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
>  		 * Drop the page of the primary superblock, so later read will
>  		 * always read from the device.
>  		 */
> -		invalidate_inode_pages2_range(mapping,
> -				bytenr >> PAGE_SHIFT,
> +		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
>  				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
>  	}
>  
> -	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> -	if (IS_ERR(page))
> -		return ERR_CAST(page);
> +	nofs_flag = memalloc_nofs_save();
> +	folio = bdev_read_folio(bdev, bytenr);
> +	memalloc_nofs_restore(nofs_flag);

This is the wrong way to use memalloc_nofs_save/restore.  They should be
used at the point that the filesystem takes/releases whatever lock is
also used during reclaim.  I don't know btrfs well enough to suggest
what lock is missing these annotations.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis
  2023-12-23 17:31   ` Matthew Wilcox
@ 2023-12-23 18:39     ` Kent Overstreet
  2024-01-04 11:49     ` Jan Kara
  1 sibling, 0 replies; 37+ messages in thread
From: Kent Overstreet @ 2023-12-23 18:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, akpm, hare, p.raghav, linux-block, linux-kernel,
	xen-devel, linux-bcache, linux-mtd, linux-s390, linux-scsi,
	linux-bcachefs, linux-btrfs, linux-fsdevel, linux-erofs,
	linux-ext4, linux-nilfs, yukuai3, yi.zhang, yangerkun

On Sat, Dec 23, 2023 at 05:31:55PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> >  		 * Drop the page of the primary superblock, so later read will
> >  		 * always read from the device.
> >  		 */
> > -		invalidate_inode_pages2_range(mapping,
> > -				bytenr >> PAGE_SHIFT,
> > +		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> >  				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> >  	}
> >  
> > -	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > -	if (IS_ERR(page))
> > -		return ERR_CAST(page);
> > +	nofs_flag = memalloc_nofs_save();
> > +	folio = bdev_read_folio(bdev, bytenr);
> > +	memalloc_nofs_restore(nofs_flag);
> 
> This is the wrong way to use memalloc_nofs_save/restore.  They should be
> used at the point that the filesystem takes/releases whatever lock is
> also used during reclaim.  I don't know btrfs well enough to suggest
> what lock is missing these annotations.

Yes, but considering this is a cross-filesystem cleanup I wouldn't want
to address that in this patchset. And the easier, more incremental
approach for the conversion would be to first convert every GFP_NOFS
usage  to memalloc_nofs_save() like this patch does, as small local
changes, and then let the btrfs people combine them and move them to the
approproate location in a separate patchstet.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
@ 2024-01-04 11:06   ` Jan Kara
  2024-01-04 12:19     ` Yu Kuai
  2024-01-05  6:08     ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Kara @ 2024-01-04 11:06 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu 21-12-23 16:56:57, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_devcie.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>  drivers/block/xen-blkback/xenbus.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index e34219ea2b05..e645afa4af57 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>  		xenbus_dev_error(blkif->be->dev, err, "block flush");
>  		return;
>  	}
> -	invalidate_inode_pages2(
> -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);

This function uses invalidate_inode_pages2() while invalidate_bdev() ends
up using mapping_try_invalidate() and there are subtle behavioral
differences between these two (for example invalidate_inode_pages2() tries
to clean dirty pages using the ->launder_folio method). So I think you'll
need helper like invalidate_bdev2() for this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis
  2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis Yu Kuai
@ 2024-01-04 11:28   ` Jan Kara
  2024-01-04 12:22     ` Yu Kuai
  2024-01-05  6:10     ` Christoph Hellwig
  0 siblings, 2 replies; 37+ messages in thread
From: Jan Kara @ 2024-01-04 11:28 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu 21-12-23 16:56:59, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> On the one hand covert to use folio while reading bdev inode, on the
> other hand prevent to access bd_inode directly.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
...
> +		for (p = folio_address(folio); p < max; p++)
>  			if (*p != -1UL) {
> -				lock_page(page);
> -				memset(page_address(page), 0xff, PAGE_SIZE);
> -				set_page_dirty(page);
> -				unlock_page(page);
> -				balance_dirty_pages_ratelimited(mapping);
> +				folio_lock(folio);
> +				memset(folio_address(folio), 0xff,
> +				       folio_size(folio));
> +				folio_mark_dirty(folio);
> +				folio_unlock(folio);
> +				bdev_balance_dirty_pages_ratelimited(bdev);

Rather then creating this bdev_balance_dirty_pages_ratelimited() just for
MTD perhaps we can have here (and in other functions):

				...
				mapping = folio_mapping(folio);
				folio_unlock(folio);
				if (mapping)
					balance_dirty_pages_ratelimited(mapping);

What do you think? Because when we are working with the folios it is rather
natural to use their mapping for dirty balancing?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis
  2023-12-23 17:31   ` Matthew Wilcox
  2023-12-23 18:39     ` Kent Overstreet
@ 2024-01-04 11:49     ` Jan Kara
  2024-04-10 17:28       ` David Sterba
  1 sibling, 1 reply; 37+ messages in thread
From: Jan Kara @ 2024-01-04 11:49 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, akpm, hare, p.raghav, linux-block, linux-kernel,
	xen-devel, linux-bcache, linux-mtd, linux-s390, linux-scsi,
	linux-bcachefs, linux-btrfs, linux-fsdevel, linux-erofs,
	linux-ext4, linux-nilfs, yukuai3, yi.zhang, yangerkun

On Sat 23-12-23 17:31:55, Matthew Wilcox wrote:
> On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> >  		 * Drop the page of the primary superblock, so later read will
> >  		 * always read from the device.
> >  		 */
> > -		invalidate_inode_pages2_range(mapping,
> > -				bytenr >> PAGE_SHIFT,
> > +		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> >  				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> >  	}
> >  
> > -	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > -	if (IS_ERR(page))
> > -		return ERR_CAST(page);
> > +	nofs_flag = memalloc_nofs_save();
> > +	folio = bdev_read_folio(bdev, bytenr);
> > +	memalloc_nofs_restore(nofs_flag);
> 
> This is the wrong way to use memalloc_nofs_save/restore.  They should be
> used at the point that the filesystem takes/releases whatever lock is
> also used during reclaim.  I don't know btrfs well enough to suggest
> what lock is missing these annotations.

In principle I agree with you but in this particular case I agree the ask
is just too big. I suspect it is one of btrfs btree locks or maybe
chunk_mutex but I doubt even btrfs developers know and maybe it is just a
cargo cult. And it is not like this would be the first occurence of this
anti-pattern in btrfs - see e.g. device_list_add(), add_missing_dev(),
btrfs_destroy_delalloc_inodes() (here the wrapping around
invalidate_inode_pages2() looks really weird), and many others...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api Yu Kuai
@ 2024-01-04 12:02   ` Jan Kara
  2024-01-04 12:32     ` Yu Kuai
  0 siblings, 1 reply; 37+ messages in thread
From: Jan Kara @ 2024-01-04 12:02 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu 21-12-23 16:58:26, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_device.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
right thing in erofs_bread() which is the only place that seems to care
about the erofs_is_fscache_mode() distinction... Also blkszbits is then
trivially sb->s_blocksize_bits so it would all seem much more
straightforward.

								Honza

> ---
>  fs/erofs/data.c     | 18 ++++++++++++------
>  fs/erofs/internal.h |  2 ++
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index c98aeda8abb2..bbe2fe199bf3 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
>  void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>  		  enum erofs_kmap_type type)
>  {
> -	struct inode *inode = buf->inode;
> -	erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
> +	u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
> +	erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
>  	pgoff_t index = offset >> PAGE_SHIFT;
>  	struct page *page = buf->page;
>  	struct folio *folio;
> @@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>  		erofs_put_metabuf(buf);
>  
>  		nofs_flag = memalloc_nofs_save();
> -		folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
> +		folio = buf->inode ?
> +			read_mapping_folio(buf->inode->i_mapping, index, NULL) :
> +			bdev_read_folio(buf->bdev, offset);
>  		memalloc_nofs_restore(nofs_flag);
>  		if (IS_ERR(folio))
>  			return folio;
> @@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>  
>  void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
>  {
> -	if (erofs_is_fscache_mode(sb))
> +	if (erofs_is_fscache_mode(sb)) {
>  		buf->inode = EROFS_SB(sb)->s_fscache->inode;
> -	else
> -		buf->inode = sb->s_bdev->bd_inode;
> +		buf->bdev = NULL;
> +	} else {
> +		buf->inode = NULL;
> +		buf->bdev = sb->s_bdev;
> +		buf->blkszbits = EROFS_SB(sb)->blkszbits;
> +	}
>  }
>  
>  void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
> index b0409badb017..c9206351b485 100644
> --- a/fs/erofs/internal.h
> +++ b/fs/erofs/internal.h
> @@ -224,8 +224,10 @@ enum erofs_kmap_type {
>  
>  struct erofs_buf {
>  	struct inode *inode;
> +	struct block_device *bdev;
>  	struct page *page;
>  	void *base;
> +	u8 blkszbits;
>  	enum erofs_kmap_type kmap_type;
>  };
>  #define __EROFS_BUF_INITIALIZER	((struct erofs_buf){ .page = NULL })
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis Yu Kuai
@ 2024-01-04 12:11   ` Jan Kara
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kara @ 2024-01-04 12:11 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu 21-12-23 16:58:46, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Avoid to access bd_inode directly, prepare to remove bd_inode from
> block_device.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

But note there are changes pending to this code for the coming merge window
so you'll have to rebase...

								Honza

> ---
>  fs/jbd2/journal.c  | 3 +--
>  fs/jbd2/recovery.c | 6 ++----
>  2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
> index ed53188472f9..f1b5ffeaf02a 100644
> --- a/fs/jbd2/journal.c
> +++ b/fs/jbd2/journal.c
> @@ -2003,8 +2003,7 @@ static int __jbd2_journal_erase(journal_t *journal, unsigned int flags)
>  		byte_count = (block_stop - block_start + 1) *
>  				journal->j_blocksize;
>  
> -		truncate_inode_pages_range(journal->j_dev->bd_inode->i_mapping,
> -				byte_start, byte_stop);
> +		truncate_bdev_range(journal->j_dev, 0, byte_start, byte_stop);
>  
>  		if (flags & JBD2_JOURNAL_FLUSH_DISCARD) {
>  			err = blkdev_issue_discard(journal->j_dev,
> diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
> index 01f744cb97a4..6b6a2c4585fa 100644
> --- a/fs/jbd2/recovery.c
> +++ b/fs/jbd2/recovery.c
> @@ -290,7 +290,6 @@ int jbd2_journal_recover(journal_t *journal)
>  
>  	struct recovery_info	info;
>  	errseq_t		wb_err;
> -	struct address_space	*mapping;
>  
>  	memset(&info, 0, sizeof(info));
>  	sb = journal->j_superblock;
> @@ -309,8 +308,7 @@ int jbd2_journal_recover(journal_t *journal)
>  	}
>  
>  	wb_err = 0;
> -	mapping = journal->j_fs_dev->bd_inode->i_mapping;
> -	errseq_check_and_advance(&mapping->wb_err, &wb_err);
> +	bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
>  	err = do_one_pass(journal, &info, PASS_SCAN);
>  	if (!err)
>  		err = do_one_pass(journal, &info, PASS_REVOKE);
> @@ -334,7 +332,7 @@ int jbd2_journal_recover(journal_t *journal)
>  	err2 = sync_blockdev(journal->j_fs_dev);
>  	if (!err)
>  		err = err2;
> -	err2 = errseq_check_and_advance(&mapping->wb_err, &wb_err);
> +	err2 = bdev_wb_err_check_and_advance(journal->j_fs_dev, &wb_err);
>  	if (!err)
>  		err = err2;
>  	/* Make sure all replayed data is on permanent storage */
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()
  2024-01-04 11:06   ` Jan Kara
@ 2024-01-04 12:19     ` Yu Kuai
  2024-01-04 15:16       ` Jan Kara
  2024-01-05  6:08     ` Christoph Hellwig
  1 sibling, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-01-04 12:19 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi, Jan!

在 2024/01/04 19:06, Jan Kara 写道:
> On Thu 21-12-23 16:56:57, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>> block_devcie.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>>   drivers/block/xen-blkback/xenbus.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
>> index e34219ea2b05..e645afa4af57 100644
>> --- a/drivers/block/xen-blkback/xenbus.c
>> +++ b/drivers/block/xen-blkback/xenbus.c
>> @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
>>   		xenbus_dev_error(blkif->be->dev, err, "block flush");
>>   		return;
>>   	}
>> -	invalidate_inode_pages2(
>> -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
>> +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
> 
> This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> up using mapping_try_invalidate() and there are subtle behavioral
> differences between these two (for example invalidate_inode_pages2() tries
> to clean dirty pages using the ->launder_folio method). So I think you'll
> need helper like invalidate_bdev2() for this.

Thanks for reviewing this patch, I know the differenct between then,
what I don't understand is that why using invalidate_inode_pages2()
here. sync_blockdev() is just called and 0 is returned, I think in this
case it's safe to call invalidate_bdev() directly, or am I missing
other things?

Thanks,
Kuai

> 
> 								Honza
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block
  2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block Yu Kuai
@ 2024-01-04 12:22   ` Jan Kara
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kara @ 2024-01-04 12:22 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu 21-12-23 16:58:53, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Unlike __bread_gfp(), ext4 has special handing while reading sb block:
> 
> 1) __GFP_NOFAIL is not set, and memory allocation can fail;
> 2) If buffer write failed before, set buffer uptodate and don't read
>    block from disk;
> 3) REQ_META is set for all IO, and REQ_PRIO is set for reading xattr;
> 4) If failed, return error ptr instead of NULL;
> 
> This patch add a new helper __bread_gfp2() that will match above 2 and 3(
> 1 will be used, and 4 will still be encapsulated by ext4), and prepare to
> prevent calling mapping_gfp_constraint() directly on bd_inode->i_mapping
> in ext4.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

I'm not enthusiastic about this but I guess it is as good as it gets
without larger cleanups in this area. So feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/buffer.c                 | 68 ++++++++++++++++++++++++++-----------
>  include/linux/buffer_head.h | 18 +++++++++-
>  2 files changed, 65 insertions(+), 21 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 967f34b70aa8..188bd36c9fea 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1255,16 +1255,19 @@ void __bforget(struct buffer_head *bh)
>  }
>  EXPORT_SYMBOL(__bforget);
>  
> -static struct buffer_head *__bread_slow(struct buffer_head *bh)
> +static struct buffer_head *__bread_slow(struct buffer_head *bh,
> +					blk_opf_t op_flags,
> +					bool check_write_error)
>  {
>  	lock_buffer(bh);
> -	if (buffer_uptodate(bh)) {
> +	if (buffer_uptodate(bh) ||
> +	    (check_write_error && buffer_uptodate_or_error(bh))) {
>  		unlock_buffer(bh);
>  		return bh;
>  	} else {
>  		get_bh(bh);
>  		bh->b_end_io = end_buffer_read_sync;
> -		submit_bh(REQ_OP_READ, bh);
> +		submit_bh(REQ_OP_READ | op_flags, bh);
>  		wait_on_buffer(bh);
>  		if (buffer_uptodate(bh))
>  			return bh;
> @@ -1445,6 +1448,31 @@ void __breadahead(struct block_device *bdev, sector_t block, unsigned size)
>  }
>  EXPORT_SYMBOL(__breadahead);
>  
> +static struct buffer_head *
> +bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
> +	  blk_opf_t op_flags, gfp_t gfp, bool check_write_error)
> +{
> +	struct buffer_head *bh;
> +
> +	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
> +
> +	/*
> +	 * Prefer looping in the allocator rather than here, at least that
> +	 * code knows what it's doing.
> +	 */
> +	gfp |= __GFP_NOFAIL;
> +
> +	bh = bdev_getblk(bdev, block, size, gfp);
> +	if (unlikely(!bh))
> +		return NULL;
> +
> +	if (buffer_uptodate(bh) ||
> +	    (check_write_error && buffer_uptodate_or_error(bh)))
> +		return bh;
> +
> +	return __bread_slow(bh, op_flags, check_write_error);
> +}
> +
>  /**
>   *  __bread_gfp() - reads a specified block and returns the bh
>   *  @bdev: the block_device to read from
> @@ -1458,27 +1486,27 @@ EXPORT_SYMBOL(__breadahead);
>   *  It returns NULL if the block was unreadable.
>   */
>  struct buffer_head *
> -__bread_gfp(struct block_device *bdev, sector_t block,
> -		   unsigned size, gfp_t gfp)
> +__bread_gfp(struct block_device *bdev, sector_t block, unsigned int size,
> +	    gfp_t gfp)
>  {
> -	struct buffer_head *bh;
> -
> -	gfp |= mapping_gfp_constraint(bdev->bd_inode->i_mapping, ~__GFP_FS);
> -
> -	/*
> -	 * Prefer looping in the allocator rather than here, at least that
> -	 * code knows what it's doing.
> -	 */
> -	gfp |= __GFP_NOFAIL;
> -
> -	bh = bdev_getblk(bdev, block, size, gfp);
> -
> -	if (likely(bh) && !buffer_uptodate(bh))
> -		bh = __bread_slow(bh);
> -	return bh;
> +	return bread_gfp(bdev, block, size, 0, gfp, false);
>  }
>  EXPORT_SYMBOL(__bread_gfp);
>  
> +/*
> + * This works like __bread_gfp() except:
> + * 1) If buffer write failed before, set buffer uptodate and don't read
> + * block from disk;
> + * 2) Caller can pass in additional op_flags like REQ_META;
> + */
> +struct buffer_head *
> +__bread_gfp2(struct block_device *bdev, sector_t block, unsigned int size,
> +	     blk_opf_t op_flags, gfp_t gfp)
> +{
> +	return bread_gfp(bdev, block, size, op_flags, gfp, true);
> +}
> +EXPORT_SYMBOL(__bread_gfp2);
> +
>  static void __invalidate_bh_lrus(struct bh_lru *b)
>  {
>  	int i;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index 5f23ee599889..751b2744b4ae 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -171,6 +171,18 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh)
>  	return test_bit_acquire(BH_Uptodate, &bh->b_state);
>  }
>  
> +static __always_inline int buffer_uptodate_or_error(struct buffer_head *bh)
> +{
> +	/*
> +	 * If the buffer has the write error flag, data was failed to write
> +	 * out in the block. In this case, set buffer uptodate to prevent
> +	 * reading old data.
> +	 */
> +	if (buffer_write_io_error(bh))
> +		set_buffer_uptodate(bh);
> +	return buffer_uptodate(bh);
> +}
> +
>  static inline unsigned long bh_offset(const struct buffer_head *bh)
>  {
>  	return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
> @@ -231,7 +243,11 @@ void __brelse(struct buffer_head *);
>  void __bforget(struct buffer_head *);
>  void __breadahead(struct block_device *, sector_t block, unsigned int size);
>  struct buffer_head *__bread_gfp(struct block_device *,
> -				sector_t block, unsigned size, gfp_t gfp);
> +				sector_t block, unsigned int size, gfp_t gfp);
> +struct buffer_head *__bread_gfp2(struct block_device *bdev, sector_t block,
> +				 unsigned int size, blk_opf_t op_flags,
> +				 gfp_t gfp);
> +
>  struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
>  void free_buffer_head(struct buffer_head * bh);
>  void unlock_buffer(struct buffer_head *bh);
> -- 
> 2.39.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis
  2024-01-04 11:28   ` Jan Kara
@ 2024-01-04 12:22     ` Yu Kuai
  2024-01-05  6:10     ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2024-01-04 12:22 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi,

在 2024/01/04 19:28, Jan Kara 写道:
> On Thu 21-12-23 16:56:59, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> On the one hand covert to use folio while reading bdev inode, on the
>> other hand prevent to access bd_inode directly.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ...
>> +		for (p = folio_address(folio); p < max; p++)
>>   			if (*p != -1UL) {
>> -				lock_page(page);
>> -				memset(page_address(page), 0xff, PAGE_SIZE);
>> -				set_page_dirty(page);
>> -				unlock_page(page);
>> -				balance_dirty_pages_ratelimited(mapping);
>> +				folio_lock(folio);
>> +				memset(folio_address(folio), 0xff,
>> +				       folio_size(folio));
>> +				folio_mark_dirty(folio);
>> +				folio_unlock(folio);
>> +				bdev_balance_dirty_pages_ratelimited(bdev);
> 
> Rather then creating this bdev_balance_dirty_pages_ratelimited() just for
> MTD perhaps we can have here (and in other functions):
> 
> 				...
> 				mapping = folio_mapping(folio);
> 				folio_unlock(folio);
> 				if (mapping)
> 					balance_dirty_pages_ratelimited(mapping);
> 
> What do you think? Because when we are working with the folios it is rather
> natural to use their mapping for dirty balancing?

I think this is a great idea! And bdev_balance_dirty_pages_ratelimited()
can be removed as well.

Thanks,
Kuai

> 
> 								Honza
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api
  2024-01-04 12:02   ` Jan Kara
@ 2024-01-04 12:32     ` Yu Kuai
  2024-01-05  4:43       ` Gao Xiang
  0 siblings, 1 reply; 37+ messages in thread
From: Yu Kuai @ 2024-01-04 12:32 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)

Hi, Jan!

在 2024/01/04 20:02, Jan Kara 写道:
> On Thu 21-12-23 16:58:26, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>> block_device.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
> unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
> right thing in erofs_bread() which is the only place that seems to care
> about the erofs_is_fscache_mode() distinction... Also blkszbits is then
> trivially sb->s_blocksize_bits so it would all seem much more
> straightforward.

Thanks for your suggestion, I'll follow this unless Gao Xiang has other
suggestions.

Kuai
> 
> 								Honza
> 
>> ---
>>   fs/erofs/data.c     | 18 ++++++++++++------
>>   fs/erofs/internal.h |  2 ++
>>   2 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>> index c98aeda8abb2..bbe2fe199bf3 100644
>> --- a/fs/erofs/data.c
>> +++ b/fs/erofs/data.c
>> @@ -32,8 +32,8 @@ void erofs_put_metabuf(struct erofs_buf *buf)
>>   void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>>   		  enum erofs_kmap_type type)
>>   {
>> -	struct inode *inode = buf->inode;
>> -	erofs_off_t offset = (erofs_off_t)blkaddr << inode->i_blkbits;
>> +	u8 blkszbits = buf->inode ? buf->inode->i_blkbits : buf->blkszbits;
>> +	erofs_off_t offset = (erofs_off_t)blkaddr << blkszbits;
>>   	pgoff_t index = offset >> PAGE_SHIFT;
>>   	struct page *page = buf->page;
>>   	struct folio *folio;
>> @@ -43,7 +43,9 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>>   		erofs_put_metabuf(buf);
>>   
>>   		nofs_flag = memalloc_nofs_save();
>> -		folio = read_cache_folio(inode->i_mapping, index, NULL, NULL);
>> +		folio = buf->inode ?
>> +			read_mapping_folio(buf->inode->i_mapping, index, NULL) :
>> +			bdev_read_folio(buf->bdev, offset);
>>   		memalloc_nofs_restore(nofs_flag);
>>   		if (IS_ERR(folio))
>>   			return folio;
>> @@ -67,10 +69,14 @@ void *erofs_bread(struct erofs_buf *buf, erofs_blk_t blkaddr,
>>   
>>   void erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb)
>>   {
>> -	if (erofs_is_fscache_mode(sb))
>> +	if (erofs_is_fscache_mode(sb)) {
>>   		buf->inode = EROFS_SB(sb)->s_fscache->inode;
>> -	else
>> -		buf->inode = sb->s_bdev->bd_inode;
>> +		buf->bdev = NULL;
>> +	} else {
>> +		buf->inode = NULL;
>> +		buf->bdev = sb->s_bdev;
>> +		buf->blkszbits = EROFS_SB(sb)->blkszbits;
>> +	}
>>   }
>>   
>>   void *erofs_read_metabuf(struct erofs_buf *buf, struct super_block *sb,
>> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
>> index b0409badb017..c9206351b485 100644
>> --- a/fs/erofs/internal.h
>> +++ b/fs/erofs/internal.h
>> @@ -224,8 +224,10 @@ enum erofs_kmap_type {
>>   
>>   struct erofs_buf {
>>   	struct inode *inode;
>> +	struct block_device *bdev;
>>   	struct page *page;
>>   	void *base;
>> +	u8 blkszbits;
>>   	enum erofs_kmap_type kmap_type;
>>   };
>>   #define __EROFS_BUF_INITIALIZER	((struct erofs_buf){ .page = NULL })
>> -- 
>> 2.39.2
>>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()
  2024-01-04 12:19     ` Yu Kuai
@ 2024-01-04 15:16       ` Jan Kara
  0 siblings, 0 replies; 37+ messages in thread
From: Jan Kara @ 2024-01-04 15:16 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-fsdevel,
	linux-erofs, linux-ext4, linux-nilfs, yi.zhang, yangerkun,
	yukuai (C)

Hi Kuai!

On Thu 04-01-24 20:19:05, Yu Kuai wrote:
> 在 2024/01/04 19:06, Jan Kara 写道:
> > On Thu 21-12-23 16:56:57, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@huawei.com>
> > > 
> > > Avoid to access bd_inode directly, prepare to remove bd_inode from
> > > block_devcie.
> > > 
> > > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > > ---
> > >   drivers/block/xen-blkback/xenbus.c | 3 +--
> > >   1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > index e34219ea2b05..e645afa4af57 100644
> > > --- a/drivers/block/xen-blkback/xenbus.c
> > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > @@ -104,8 +104,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
> > >   		xenbus_dev_error(blkif->be->dev, err, "block flush");
> > >   		return;
> > >   	}
> > > -	invalidate_inode_pages2(
> > > -			blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
> > > +	invalidate_bdev(blkif->vbd.bdev_handle->bdev);
> > 
> > This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> > up using mapping_try_invalidate() and there are subtle behavioral
> > differences between these two (for example invalidate_inode_pages2() tries
> > to clean dirty pages using the ->launder_folio method). So I think you'll
> > need helper like invalidate_bdev2() for this.
> 
> Thanks for reviewing this patch, I know the differenct between then,
> what I don't understand is that why using invalidate_inode_pages2()
> here.

Well, then the change in behavior should be at least noted in the
changelog.

> sync_blockdev() is just called and 0 is returned, I think in this
> case it's safe to call invalidate_bdev() directly, or am I missing
> other things?

I still think there's a difference. invalidate_inode_pages2() also unmaps
memory mappings which mapping_try_invalidate() does not do. That being said
in xen_update_blkif_status() we seem to be bringing up a virtual block
device so before this function is called, anybody would have hard time
using anything in it. But this definitely needs a confirmation from Xen
maintainers and a good documentation of the behavioral change in the
changelog.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api
  2024-01-04 12:32     ` Yu Kuai
@ 2024-01-05  4:43       ` Gao Xiang
  0 siblings, 0 replies; 37+ messages in thread
From: Gao Xiang @ 2024-01-05  4:43 UTC (permalink / raw)
  To: Yu Kuai, Jan Kara
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yi.zhang, yangerkun, yukuai (C)



On 2024/1/4 20:32, Yu Kuai wrote:
> Hi, Jan!
> 
> 在 2024/01/04 20:02, Jan Kara 写道:
>> On Thu 21-12-23 16:58:26, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> Avoid to access bd_inode directly, prepare to remove bd_inode from
>>> block_device.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>
>> I'm not erofs maintainer but IMO this is quite ugly and grows erofs_buf
>> unnecessarily. I'd rather store 'sb' pointer in erofs_buf and then do the
>> right thing in erofs_bread() which is the only place that seems to care
>> about the erofs_is_fscache_mode() distinction... Also blkszbits is then
>> trivially sb->s_blocksize_bits so it would all seem much more
>> straightforward.
> 
> Thanks for your suggestion, I'll follow this unless Gao Xiang has other
> suggestions.

Yes, that would be better, I'm fine with that.  Yet in the future we
may support a seperate large dirblocksize more than block size, but
we could revisit later.

Thanks,
Gao Xiang

> 
> Kuai
>>
>>                                 Honza
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status()
  2024-01-04 11:06   ` Jan Kara
  2024-01-04 12:19     ` Yu Kuai
@ 2024-01-05  6:08     ` Christoph Hellwig
  1 sibling, 0 replies; 37+ messages in thread
From: Christoph Hellwig @ 2024-01-05  6:08 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-fsdevel,
	linux-erofs, linux-ext4, linux-nilfs, yukuai3, yi.zhang,
	yangerkun

On Thu, Jan 04, 2024 at 12:06:31PM +0100, Jan Kara wrote:
> This function uses invalidate_inode_pages2() while invalidate_bdev() ends
> up using mapping_try_invalidate() and there are subtle behavioral
> differences between these two (for example invalidate_inode_pages2() tries
> to clean dirty pages using the ->launder_folio method). So I think you'll
> need helper like invalidate_bdev2() for this.

That assues that the existing code actually does this intentionally,
which seems doubtful.  But the change in behavior does not to be
documented and explained.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis
  2024-01-04 11:28   ` Jan Kara
  2024-01-04 12:22     ` Yu Kuai
@ 2024-01-05  6:10     ` Christoph Hellwig
  2024-01-05 10:31       ` Yu Kuai
  1 sibling, 1 reply; 37+ messages in thread
From: Christoph Hellwig @ 2024-01-05  6:10 UTC (permalink / raw)
  To: Jan Kara
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-fsdevel,
	linux-erofs, linux-ext4, linux-nilfs, yukuai3, yi.zhang,
	yangerkun

On Thu, Jan 04, 2024 at 12:28:55PM +0100, Jan Kara wrote:
> What do you think? Because when we are working with the folios it is rather
> natural to use their mapping for dirty balancing?

The real problem is that block2mtd pokes way to deep into block
internals.

I think the saviour here is Christians series to replace the bdev handle
with a struct file, which will allow to use the normal file write path
here and get rid of the entire layering volation.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis
  2024-01-05  6:10     ` Christoph Hellwig
@ 2024-01-05 10:31       ` Yu Kuai
  0 siblings, 0 replies; 37+ messages in thread
From: Yu Kuai @ 2024-01-05 10:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jan Kara
  Cc: Yu Kuai, axboe, roger.pau, colyli, kent.overstreet, joern,
	miquel.raynal, richard, vigneshr, sth, hoeppner, hca, gor,
	agordeev, jejb, martin.petersen, clm, josef, dsterba, viro,
	brauner, nico, xiang, chao, tytso, adilger.kernel, jack,
	konishi.ryusuke, willy, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-fsdevel,
	linux-erofs, linux-ext4, linux-nilfs, yi.zhang, yangerkun,
	yukuai (C)

Hi,

在 2024/01/05 14:10, Christoph Hellwig 写道:
> On Thu, Jan 04, 2024 at 12:28:55PM +0100, Jan Kara wrote:
>> What do you think? Because when we are working with the folios it is rather
>> natural to use their mapping for dirty balancing?
> 
> The real problem is that block2mtd pokes way to deep into block
> internals.
> 
> I think the saviour here is Christians series to replace the bdev handle
> with a struct file, which will allow to use the normal file write path
> here and get rid of the entire layering volation.

Yes, looks like lots of patches from this set is not needed anymore.
I'll stop sending v4 and just send some patches that is not related to
'bd_inode' separately.

Thanks,
Kuai

> 
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis
  2024-01-04 11:49     ` Jan Kara
@ 2024-04-10 17:28       ` David Sterba
  0 siblings, 0 replies; 37+ messages in thread
From: David Sterba @ 2024-04-10 17:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: Matthew Wilcox, Yu Kuai, axboe, roger.pau, colyli,
	kent.overstreet, joern, miquel.raynal, richard, vigneshr, sth,
	hoeppner, hca, gor, agordeev, jejb, martin.petersen, clm, josef,
	dsterba, viro, brauner, nico, xiang, chao, tytso, adilger.kernel,
	jack, konishi.ryusuke, akpm, hare, p.raghav, linux-block,
	linux-kernel, xen-devel, linux-bcache, linux-mtd, linux-s390,
	linux-scsi, linux-bcachefs, linux-btrfs, linux-fsdevel,
	linux-erofs, linux-ext4, linux-nilfs, yukuai3, yi.zhang,
	yangerkun

On Thu, Jan 04, 2024 at 12:49:58PM +0100, Jan Kara wrote:
> On Sat 23-12-23 17:31:55, Matthew Wilcox wrote:
> > On Thu, Dec 21, 2023 at 04:57:04PM +0800, Yu Kuai wrote:
> > > @@ -3674,16 +3670,17 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
> > >  		 * Drop the page of the primary superblock, so later read will
> > >  		 * always read from the device.
> > >  		 */
> > > -		invalidate_inode_pages2_range(mapping,
> > > -				bytenr >> PAGE_SHIFT,
> > > +		invalidate_bdev_range(bdev, bytenr >> PAGE_SHIFT,
> > >  				(bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT);
> > >  	}
> > >  
> > > -	page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> > > -	if (IS_ERR(page))
> > > -		return ERR_CAST(page);
> > > +	nofs_flag = memalloc_nofs_save();
> > > +	folio = bdev_read_folio(bdev, bytenr);
> > > +	memalloc_nofs_restore(nofs_flag);
> > 
> > This is the wrong way to use memalloc_nofs_save/restore.  They should be
> > used at the point that the filesystem takes/releases whatever lock is
> > also used during reclaim.  I don't know btrfs well enough to suggest
> > what lock is missing these annotations.
> 
> In principle I agree with you but in this particular case I agree the ask
> is just too big. I suspect it is one of btrfs btree locks or maybe
> chunk_mutex but I doubt even btrfs developers know and maybe it is just a
> cargo cult. And it is not like this would be the first occurence of this
> anti-pattern in btrfs - see e.g. device_list_add(), add_missing_dev(),
> btrfs_destroy_delalloc_inodes() (here the wrapping around
> invalidate_inode_pages2() looks really weird), and many others...

The pattern is intentional and a temporary solution before we could
implement the scoped NOFS. Functions calling allocations get converted
from GFP_NOFS to GFP_KERNEL but in case they're called from a context
that either holds big locks or can recursively enter the filesystem then
it's protected by the memalloc calls. This should not be surprising.
What may not be obvious is which locks or kmalloc calling functions it
could be, this depends on the analysis of the function call chain and
usually there's enough evidence why it's needed.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors()
  2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors() Yu Kuai
@ 2024-04-11 17:49   ` Kent Overstreet
  0 siblings, 0 replies; 37+ messages in thread
From: Kent Overstreet @ 2024-04-11 17:49 UTC (permalink / raw)
  To: Yu Kuai
  Cc: axboe, roger.pau, colyli, kent.overstreet, joern, miquel.raynal,
	richard, vigneshr, sth, hoeppner, hca, gor, agordeev, jejb,
	martin.petersen, clm, josef, dsterba, viro, brauner, nico, xiang,
	chao, tytso, adilger.kernel, jack, konishi.ryusuke, willy, akpm,
	hare, p.raghav, linux-block, linux-kernel, xen-devel,
	linux-bcache, linux-mtd, linux-s390, linux-scsi, linux-bcachefs,
	linux-btrfs, linux-fsdevel, linux-erofs, linux-ext4, linux-nilfs,
	yukuai3, yi.zhang, yangerkun

On Thu, Dec 21, 2023 at 04:57:02PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> bdev_sectors() is not used hence remove it.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Acked-by: Kent Overstreet <kent.overstreet@linux.dev>

> ---
>  fs/bcachefs/util.h | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
> index 2984b57b2958..22a0acc1704f 100644
> --- a/fs/bcachefs/util.h
> +++ b/fs/bcachefs/util.h
> @@ -516,11 +516,6 @@ static inline unsigned fract_exp_two(unsigned x, unsigned fract_bits)
>  void bch2_bio_map(struct bio *bio, void *base, size_t);
>  int bch2_bio_alloc_pages(struct bio *, size_t, gfp_t);
>  
> -static inline sector_t bdev_sectors(struct block_device *bdev)
> -{
> -	return bdev->bd_inode->i_size >> 9;
> -}
> -
>  #define closure_bio_submit(bio, cl)					\
>  do {									\
>  	closure_get(cl);						\
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2024-04-11 17:49 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21  8:56 [PATCH RFC v3 for-6.8/block 00/17] block: don't access bd_inode directly from other modules Yu Kuai
2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 01/17] block: add some bdev apis Yu Kuai
2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 02/17] xen/blkback: use bdev api in xen_update_blkif_status() Yu Kuai
2024-01-04 11:06   ` Jan Kara
2024-01-04 12:19     ` Yu Kuai
2024-01-04 15:16       ` Jan Kara
2024-01-05  6:08     ` Christoph Hellwig
2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 03/17] bcache: use bdev api in read_super() Yu Kuai
2023-12-21  8:56 ` [PATCH RFC v3 for-6.8/block 04/17] mtd: block2mtd: use bdev apis Yu Kuai
2024-01-04 11:28   ` Jan Kara
2024-01-04 12:22     ` Yu Kuai
2024-01-05  6:10     ` Christoph Hellwig
2024-01-05 10:31       ` Yu Kuai
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 05/17] s390/dasd: use bdev api in dasd_format() Yu Kuai
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 06/17] scsicam: use bdev api in scsi_bios_ptable() Yu Kuai
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 07/17] bcachefs: remove dead function bdev_sectors() Yu Kuai
2024-04-11 17:49   ` Kent Overstreet
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 08/17] bio: export bio_add_folio_nofail() Yu Kuai
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 09/17] btrfs: use bdev apis Yu Kuai
2023-12-23 17:31   ` Matthew Wilcox
2023-12-23 18:39     ` Kent Overstreet
2024-01-04 11:49     ` Jan Kara
2024-04-10 17:28       ` David Sterba
2023-12-21  8:57 ` [PATCH RFC v3 for-6.8/block 10/17] cramfs: use bdev apis in cramfs_blkdev_read() Yu Kuai
2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 11/17] erofs: use bdev api Yu Kuai
2024-01-04 12:02   ` Jan Kara
2024-01-04 12:32     ` Yu Kuai
2024-01-05  4:43       ` Gao Xiang
2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 12/17] nilfs2: use bdev api in nilfs_attach_log_writer() Yu Kuai
2023-12-21 14:54   ` Ryusuke Konishi
2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 13/17] jbd2: use bdev apis Yu Kuai
2024-01-04 12:11   ` Jan Kara
2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 14/17] buffer: add a new helper to read sb block Yu Kuai
2024-01-04 12:22   ` Jan Kara
2023-12-21  8:58 ` [PATCH RFC v3 for-6.8/block 15/17] ext4: use " Yu Kuai
2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 16/17] ext4: remove block_device_ejected() Yu Kuai
2023-12-21  8:59 ` [PATCH RFC v3 for-6.8/block 17/17] ext4: use bdev apis Yu Kuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).