All of lore.kernel.org
 help / color / mirror / Atom feed
* consolidate btrfs checksumming, repair and bio splitting v4
@ 2023-01-21  6:49 Christoph Hellwig
  2023-01-21  6:49 ` [PATCH 01/34] block: export bio_split_rw Christoph Hellwig
                   ` (34 more replies)
  0 siblings, 35 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:49 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Hi all,

this series moves a large amount of duplicate code below btrfs_submit_bio
into what I call the 'storage' layer.  Instead of duplicating code to
checksum, check checksums and repair and split bios in all the caller
of btrfs_submit_bio (buffered I/O, direct I/O, compressed I/O, encoded
I/O), the work is done one in a central place, often more optiomal and
without slight changes in behavior.  Once that is done the upper layers
also don't need to split the bios for extent boundaries, as the storage
layer can do that itself, including splitting the bios for the zone
append limits for zoned I/O.

The split work is inspired by an earlier series from Qu, from which it
also reuses a few patches.

A git tree is also available:

    git://git.infradead.org/users/hch/misc.git btrfs-bio-split

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-bio-split

Changes since v2:
 - split up one patch into 15 and update the commit message
 - fix a use after free in run_one_async_done
   (the end result is identical to v2 except for the one line fix
   in run_one_async_done)

Changes since v2:
 - minor rebase to latest misc-next
 - pick up various reviews for all patches

Changes since v1:
 - rebased to latest btrfs/misc-next
 - added a new patch to remove the fs_info argument to btrfs_submit_bio
 - clean up the repair_one_sector calling convention a bit
 - don't move file_offset for ONE_ORDERED bios in btrfs_split_bio
 - temporarily use btrfs_zoned_get_device in alloc_compressed_bio
 - minor refactoring of some of the checksum helpers
 - split up a few patches
 - drop a few of the blk_status_t to int conversion for now
 - various whitespace fixes

Diffstat:
 block/blk-merge.c         |    3 
 fs/btrfs/bio.c            |  547 ++++++++++++++++++++++++++++++++++++----
 fs/btrfs/bio.h            |   67 +---
 fs/btrfs/btrfs_inode.h    |   22 -
 fs/btrfs/compression.c    |  273 +++-----------------
 fs/btrfs/compression.h    |    3 
 fs/btrfs/disk-io.c        |  203 +--------------
 fs/btrfs/disk-io.h        |   11 
 fs/btrfs/extent-io-tree.h |    1 
 fs/btrfs/extent_io.c      |  556 ++++-------------------------------------
 fs/btrfs/extent_io.h      |   31 --
 fs/btrfs/file-item.c      |   74 +----
 fs/btrfs/file-item.h      |    8 
 fs/btrfs/fs.h             |    5 
 fs/btrfs/inode.c          |  621 ++++++----------------------------------------
 fs/btrfs/volumes.c        |  115 ++------
 fs/btrfs/volumes.h        |   18 -
 fs/btrfs/zoned.c          |   85 ++----
 fs/btrfs/zoned.h          |   16 -
 fs/iomap/direct-io.c      |   10 
 include/linux/bio.h       |    4 
 include/linux/iomap.h     |    3 
 22 files changed, 819 insertions(+), 1857 deletions(-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 01/34] block: export bio_split_rw
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
@ 2023-01-21  6:49 ` Christoph Hellwig
  2023-01-21  6:49 ` [PATCH 02/34] btrfs: better document struct btrfs_bio Christoph Hellwig
                   ` (33 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:49 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel, Chaitanya Kulkarni

bio_split_rw can be used by file systems to split and incoming write
bio into multiple bios fitting the hardware limit for use as ZONE_APPEND
bios.  Export it for initial use in btrfs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 block/blk-merge.c   | 3 ++-
 include/linux/bio.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b7c193d67185de..64bf7d9dd8e852 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -276,7 +276,7 @@ static bool bvec_split_segs(const struct queue_limits *lim,
  * responsible for ensuring that @bs is only destroyed after processing of the
  * split bio has finished.
  */
-static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
+struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 		unsigned *segs, struct bio_set *bs, unsigned max_bytes)
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
@@ -336,6 +336,7 @@ static struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
 	bio_clear_polled(bio);
 	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
 }
+EXPORT_SYMBOL_GPL(bio_split_rw);
 
 /**
  * __bio_split_to_limits - split a bio to fit the queue limits
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c1da63f6c80800..d766be7152e151 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -12,6 +12,8 @@
 
 #define BIO_MAX_VECS		256U
 
+struct queue_limits;
+
 static inline unsigned int bio_max_segs(unsigned int nr_segs)
 {
 	return min(nr_segs, BIO_MAX_VECS);
@@ -375,6 +377,8 @@ static inline void bip_set_seed(struct bio_integrity_payload *bip,
 void bio_trim(struct bio *bio, sector_t offset, sector_t size);
 extern struct bio *bio_split(struct bio *bio, int sectors,
 			     gfp_t gfp, struct bio_set *bs);
+struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
+		unsigned *segs, struct bio_set *bs, unsigned max_bytes);
 
 /**
  * bio_next_split - get next @sectors from a bio, splitting if necessary
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 02/34] btrfs: better document struct btrfs_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
  2023-01-21  6:49 ` [PATCH 01/34] block: export bio_split_rw Christoph Hellwig
@ 2023-01-21  6:49 ` Christoph Hellwig
  2023-01-22 10:19   ` Anand Jain
  2023-01-23 15:25   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
                   ` (32 subsequent siblings)
  34 siblings, 2 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:49 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Update the comments on btrfs_bio to better describe the structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index b12f84b3b3410f..baaa27961cc812 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -26,9 +26,8 @@ struct btrfs_fs_info;
 typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
 
 /*
- * Additional info to pass along bio.
- *
- * Mostly for btrfs specific features like csum and mirror_num.
+ * Highlevel btrfs I/O structure.  It is allocated by btrfs_bio_alloc and
+ * passed to btrfs_submit_bio for mapping to the physical devices.
  */
 struct btrfs_bio {
 	unsigned int mirror_num:7;
@@ -42,7 +41,7 @@ struct btrfs_bio {
 	unsigned int is_metadata:1;
 	struct bvec_iter iter;
 
-	/* for direct I/O */
+	/* File offset that this I/O operates on. */
 	u64 file_offset;
 
 	/* @device is for stripe IO submission. */
@@ -62,7 +61,7 @@ struct btrfs_bio {
 	btrfs_bio_end_io_t end_io;
 	void *private;
 
-	/* For read end I/O handling */
+	/* For internal use in read end I/O handling */
 	struct work_struct end_io_work;
 
 	/*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
  2023-01-21  6:49 ` [PATCH 01/34] block: export bio_split_rw Christoph Hellwig
  2023-01-21  6:49 ` [PATCH 02/34] btrfs: better document struct btrfs_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-22 10:20   ` Anand Jain
                     ` (3 more replies)
  2023-01-21  6:50 ` [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization Christoph Hellwig
                   ` (31 subsequent siblings)
  34 siblings, 4 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

All btrfs_bio I/Os are associated with an inode.  Add a pointer to that
inode, which will allow to simplify a lot of calling conventions, and
which will be needed in the I/O completion path in the future.

This grow the btrfs_bio struture by a pointer, but that grows will
be offset by the removal of the device pointer soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c         | 8 ++++++--
 fs/btrfs/bio.h         | 5 ++++-
 fs/btrfs/compression.c | 3 ++-
 fs/btrfs/extent_io.c   | 8 ++++----
 fs/btrfs/inode.c       | 4 +++-
 5 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 8affc88b0e0a4b..2398bb263957b2 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -22,9 +22,11 @@ static struct bio_set btrfs_bioset;
  * is already initialized by the block layer.
  */
 static inline void btrfs_bio_init(struct btrfs_bio *bbio,
+				  struct btrfs_inode *inode,
 				  btrfs_bio_end_io_t end_io, void *private)
 {
 	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->inode = inode;
 	bbio->end_io = end_io;
 	bbio->private = private;
 }
@@ -37,16 +39,18 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio,
  * a mempool.
  */
 struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
+			    struct btrfs_inode *inode,
 			    btrfs_bio_end_io_t end_io, void *private)
 {
 	struct bio *bio;
 
 	bio = bio_alloc_bioset(NULL, nr_vecs, opf, GFP_NOFS, &btrfs_bioset);
-	btrfs_bio_init(btrfs_bio(bio), end_io, private);
+	btrfs_bio_init(btrfs_bio(bio), inode, end_io, private);
 	return bio;
 }
 
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
+				    struct btrfs_inode *inode,
 				    btrfs_bio_end_io_t end_io, void *private)
 {
 	struct bio *bio;
@@ -56,7 +60,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
 
 	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
 	bbio = btrfs_bio(bio);
-	btrfs_bio_init(bbio, end_io, private);
+	btrfs_bio_init(bbio, inode, end_io, private);
 
 	bio_trim(bio, offset >> 9, size >> 9);
 	bbio->iter = bio->bi_iter;
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index baaa27961cc812..8d69d0b226d99b 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -41,7 +41,8 @@ struct btrfs_bio {
 	unsigned int is_metadata:1;
 	struct bvec_iter iter;
 
-	/* File offset that this I/O operates on. */
+	/* Inode and offset into it that this I/O operates on. */
+	struct btrfs_inode *inode;
 	u64 file_offset;
 
 	/* @device is for stripe IO submission. */
@@ -80,8 +81,10 @@ int __init btrfs_bioset_init(void);
 void __cold btrfs_bioset_exit(void);
 
 struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
+			    struct btrfs_inode *inode,
 			    btrfs_bio_end_io_t end_io, void *private);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
+				    struct btrfs_inode *inode,
 				    btrfs_bio_end_io_t end_io, void *private);
 
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4a5aeb8dd4793a..b8e3e899974b34 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -344,7 +344,8 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, endio_func, cb);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, BTRFS_I(cb->inode), endio_func,
+			      cb);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
 
 	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9bd32daa9b9a6f..faf9312a46c0e1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -740,7 +740,8 @@ int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_
 		return -EIO;
 	}
 
-	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
+	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->inode,
+				     failed_bbio->end_io,
 				     failed_bbio->private);
 	repair_bbio = btrfs_bio(repair_bio);
 	repair_bbio->file_offset = start;
@@ -1394,9 +1395,8 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	struct bio *bio;
 	int ret;
 
-	ASSERT(bio_ctrl->end_io_func);
-
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, bio_ctrl->end_io_func, NULL);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, inode, bio_ctrl->end_io_func,
+			      NULL);
 	/*
 	 * For compressed page range, its disk_bytenr is always @disk_bytenr
 	 * passed in, no matter if we have added any range into previous bio.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3c49742f0d4556..0a85e42f114cc5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8097,7 +8097,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		 * the allocation is backed by btrfs_bioset.
 		 */
 		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len,
-					      btrfs_end_dio_bio, dip);
+					      BTRFS_I(inode), btrfs_end_dio_bio,
+					      dip);
 		btrfs_bio(bio)->file_offset = file_offset;
 
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
@@ -10409,6 +10410,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 
 			if (!bio) {
 				bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ,
+						      inode,
 						      btrfs_encoded_read_endio,
 						      &priv);
 				bio->bi_iter.bi_sector =
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 15:59   ` Johannes Thumshirn
  2023-01-24 14:55   ` Anand Jain
  2023-01-21  6:50 ` [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums Christoph Hellwig
                   ` (30 subsequent siblings)
  34 siblings, 2 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

To prepare for pending changes drop the optimization to only look up
csums once per bio that is submitted from the iomap layer.  In the
short run this does cause additional lookups for fragmented direct
reads, but later in the series, the bio based lookup will be used on
the entire bio submitted from iomap, restoring the old behavior
in common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 32 +++++---------------------------
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0a85e42f114cc5..863a5527853c66 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -100,9 +100,6 @@ struct btrfs_dio_private {
 	 */
 	refcount_t refs;
 
-	/* Array of checksums */
-	u8 *csums;
-
 	/* This must be last */
 	struct bio bio;
 };
@@ -7907,7 +7904,6 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 			      dip->file_offset + dip->bytes - 1, NULL);
 	}
 
-	kfree(dip->csums);
 	bio_endio(&dip->bio);
 }
 
@@ -7990,7 +7986,6 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 				 u64 file_offset, int async_submit)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_dio_private *dip = btrfs_bio(bio)->private;
 	blk_status_t ret;
 
 	/* Save the original iter for read repair */
@@ -8017,8 +8012,11 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 			return;
 		}
 	} else {
-		btrfs_bio(bio)->csum = btrfs_csum_ptr(fs_info, dip->csums,
-						      file_offset - dip->file_offset);
+		ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL);
+		if (ret) {
+			btrfs_bio_end_io(btrfs_bio(bio), ret);
+			return;
+		}
 	}
 map:
 	btrfs_submit_bio(fs_info, bio, 0);
@@ -8030,7 +8028,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	struct btrfs_dio_private *dip =
 		container_of(dio_bio, struct btrfs_dio_private, bio);
 	struct inode *inode = iter->inode;
-	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
@@ -8051,25 +8048,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	dip->file_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	refcount_set(&dip->refs, 1);
-	dip->csums = NULL;
-
-	if (!write && !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
-		unsigned int nr_sectors =
-			(dio_bio->bi_iter.bi_size >> fs_info->sectorsize_bits);
-
-		/*
-		 * Load the csums up front to reduce csum tree searches and
-		 * contention when submitting bios.
-		 */
-		status = BLK_STS_RESOURCE;
-		dip->csums = kcalloc(nr_sectors, fs_info->csum_size, GFP_NOFS);
-		if (!dip->csums)
-			goto out_err;
-
-		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
-		if (status != BLK_STS_OK)
-			goto out_err;
-	}
 
 	start_sector = dio_bio->bi_iter.bi_sector;
 	submit_len = dio_bio->bi_iter.bi_size;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:06   ` Johannes Thumshirn
  2023-01-24 15:16   ` Anand Jain
  2023-01-21  6:50 ` [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio Christoph Hellwig
                   ` (29 subsequent siblings)
  34 siblings, 2 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

The csums argument is always NULL now, so remove it and always allocate
the csums array in the btrfs_bio.  Also pass the btrfs_bio instead of
inode + bio to document that this function requires a btrfs_bio and
not just any bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c |  2 +-
 fs/btrfs/file-item.c   | 52 ++++++++++++++++--------------------------
 fs/btrfs/file-item.h   |  2 +-
 fs/btrfs/inode.c       |  6 ++---
 4 files changed, 25 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b8e3e899974b34..ab7f7ea499d9ca 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -801,7 +801,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			 */
 			btrfs_bio(comp_bio)->file_offset = file_offset;
 
-			ret = btrfs_lookup_bio_sums(inode, comp_bio, NULL);
+			ret = btrfs_lookup_bio_sums(btrfs_bio(comp_bio));
 			if (ret) {
 				btrfs_bio_end_io(btrfs_bio(comp_bio), ret);
 				break;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 5de73466b2ca2d..c5324fe8f4be7a 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -380,32 +380,25 @@ static int search_file_offset_in_bio(struct bio *bio, struct inode *inode,
 /*
  * Lookup the checksum for the read bio in csum tree.
  *
- * @inode:  inode that the bio is for.
- * @bio:    bio to look up.
- * @dst:    Buffer of size nblocks * btrfs_super_csum_size() used to return
- *          checksum (nblocks = bio->bi_iter.bi_size / fs_info->sectorsize). If
- *          NULL, the checksum buffer is allocated and returned in
- *          btrfs_bio(bio)->csum instead.
- *
  * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
  */
-blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst)
+blk_status_t btrfs_lookup_bio_sums(struct btrfs_bio *bbio)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
-	struct btrfs_bio *bbio = NULL;
+	struct btrfs_inode *inode = bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct extent_io_tree *io_tree = &inode->io_tree;
+	struct bio *bio = &bbio->bio;
 	struct btrfs_path *path;
 	const u32 sectorsize = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
 	u32 orig_len = bio->bi_iter.bi_size;
 	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 cur_disk_bytenr;
-	u8 *csum;
 	const unsigned int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
 	blk_status_t ret = BLK_STS_OK;
 
-	if ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
+	if ((inode->flags & BTRFS_INODE_NODATASUM) ||
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
 		return BLK_STS_OK;
 
@@ -426,21 +419,14 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	if (!path)
 		return BLK_STS_RESOURCE;
 
-	if (!dst) {
-		bbio = btrfs_bio(bio);
-
-		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
-			bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
-			if (!bbio->csum) {
-				btrfs_free_path(path);
-				return BLK_STS_RESOURCE;
-			}
-		} else {
-			bbio->csum = bbio->csum_inline;
+	if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
+		bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
+		if (!bbio->csum) {
+			btrfs_free_path(path);
+			return BLK_STS_RESOURCE;
 		}
-		csum = bbio->csum;
 	} else {
-		csum = dst;
+		bbio->csum = bbio->csum_inline;
 	}
 
 	/*
@@ -456,7 +442,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	 * read from the commit root and sidestep a nasty deadlock
 	 * between reading the free space cache and updating the csum tree.
 	 */
-	if (btrfs_is_free_space_inode(BTRFS_I(inode))) {
+	if (btrfs_is_free_space_inode(inode)) {
 		path->search_commit_root = 1;
 		path->skip_locking = 1;
 	}
@@ -479,14 +465,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 		ASSERT(cur_disk_bytenr - orig_disk_bytenr < UINT_MAX);
 		sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
 				fs_info->sectorsize_bits;
-		csum_dst = csum + sector_offset * csum_size;
+		csum_dst = bbio->csum + sector_offset * csum_size;
 
 		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
 					 search_len, csum_dst);
 		if (count < 0) {
 			ret = errno_to_blk_status(count);
-			if (bbio)
-				btrfs_bio_free_csum(bbio);
+			if (bbio->csum != bbio->csum_inline)
+				kfree(bbio->csum);
+			bbio->csum = NULL;
 			break;
 		}
 
@@ -504,12 +491,13 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 			memset(csum_dst, 0, csum_size);
 			count = 1;
 
-			if (BTRFS_I(inode)->root->root_key.objectid ==
+			if (inode->root->root_key.objectid ==
 			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
 				u64 file_offset;
 				int ret;
 
-				ret = search_file_offset_in_bio(bio, inode,
+				ret = search_file_offset_in_bio(bio,
+						&inode->vfs_inode,
 						cur_disk_bytenr, &file_offset);
 				if (ret)
 					set_extent_bits(io_tree, file_offset,
diff --git a/fs/btrfs/file-item.h b/fs/btrfs/file-item.h
index 0312256684349b..a2f9747adf3ac0 100644
--- a/fs/btrfs/file-item.h
+++ b/fs/btrfs/file-item.h
@@ -38,7 +38,7 @@ static inline u32 btrfs_file_extent_calc_inline_size(u32 datasize)
 
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
-blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst);
+blk_status_t btrfs_lookup_bio_sums(struct btrfs_bio *bbio);
 int btrfs_insert_hole_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root, u64 objectid, u64 pos,
 			     u64 num_bytes);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 863a5527853c66..7c8f5349ed7a4c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2780,7 +2780,7 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 	 * Lookup bio sums does extra checks around whether we need to csum or
 	 * not, which is why we ignore skip_sum here.
 	 */
-	ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL);
+	ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
 	if (ret) {
 		btrfs_bio_end_io(btrfs_bio(bio), ret);
 		return;
@@ -8012,7 +8012,7 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 			return;
 		}
 	} else {
-		ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL);
+		ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
 		if (ret) {
 			btrfs_bio_end_io(btrfs_bio(bio), ret);
 			return;
@@ -10279,7 +10279,7 @@ static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
 	blk_status_t ret;
 
 	if (!priv->skip_csum) {
-		ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL);
+		ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
 		if (ret)
 			return ret;
 	}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:13   ` Johannes Thumshirn
  2023-01-24 15:27   ` Anand Jain
  2023-01-21  6:50 ` [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code Christoph Hellwig
                   ` (28 subsequent siblings)
  34 siblings, 2 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Add a bbio local variable and to prepare for calling functions that
return a blk_status_t, rename the existing int used for error handling
so that ret can be reused for the blk_status_t, and a label that can be
reused for failing the passed in bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2398bb263957b2..23afeeb6dbc8eb 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -230,20 +230,21 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr)
 
 void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num)
 {
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
 	struct btrfs_io_context *bioc = NULL;
 	struct btrfs_io_stripe smap;
-	int ret;
+	blk_status_t ret;
+	int error;
 
 	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
-				&bioc, &smap, &mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
-		return;
+	error = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
+				  &bioc, &smap, &mirror_num, 1);
+	if (error) {
+		ret = errno_to_blk_status(error);
+		goto fail;
 	}
 
 	if (map_length < length) {
@@ -255,8 +256,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 
 	if (!bioc) {
 		/* Single mirror read/write fast path */
-		btrfs_bio(bio)->mirror_num = mirror_num;
-		btrfs_bio(bio)->device = smap.dev;
+		bbio->mirror_num = mirror_num;
+		bbio->device = smap.dev;
 		bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
 		bio->bi_private = fs_info;
 		bio->bi_end_io = btrfs_simple_end_io;
@@ -278,6 +279,11 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 		for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
 			btrfs_submit_mirrored_bio(bioc, dev_nr);
 	}
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(bbio, ret);
 }
 
 /*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:18   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio Christoph Hellwig
                   ` (27 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

All callers of btrfs_submit_bio that want to validate checksums
currently have to store a copy of the iter in the btrfs_bio.  Move
the assignment into common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c         | 7 ++++++-
 fs/btrfs/compression.c | 4 ----
 fs/btrfs/extent_io.c   | 1 -
 fs/btrfs/inode.c       | 7 -------
 4 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 23afeeb6dbc8eb..2651f20891f08f 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -63,7 +63,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
 	btrfs_bio_init(bbio, inode, end_io, private);
 
 	bio_trim(bio, offset >> 9, size >> 9);
-	bbio->iter = bio->bi_iter;
 	return bio;
 }
 
@@ -254,6 +253,12 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 		BUG();
 	}
 
+	/*
+	 * Save the iter for the end_io handler for data reads.
+	 */
+	if (bio_op(bio) == REQ_OP_READ && !(bio->bi_opf & REQ_META))
+		bbio->iter = bio->bi_iter;
+
 	if (!bioc) {
 		/* Single mirror read/write fast path */
 		bbio->mirror_num = mirror_num;
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ab7f7ea499d9ca..ba458b88be26b0 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -789,10 +789,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			submit = true;
 
 		if (submit) {
-			/* Save the original iter for read repair */
-			if (bio_op(comp_bio) == REQ_OP_READ)
-				btrfs_bio(comp_bio)->iter = comp_bio->bi_iter;
-
 			/*
 			 * Save the initial offset of this chunk, as there
 			 * is no direct correlation between compressed pages and
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index faf9312a46c0e1..44cacf62e4314e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -756,7 +756,6 @@ int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_
 	}
 
 	bio_add_page(repair_bio, page, failrec->len, pgoff);
-	repair_bbio->iter = repair_bio->bi_iter;
 
 	btrfs_debug(fs_info,
 		    "repair read error: submitting new read to mirror %d",
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7c8f5349ed7a4c..c368a45bc079d6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2773,9 +2773,6 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 		return;
 	}
 
-	/* Save the original iter for read repair */
-	btrfs_bio(bio)->iter = bio->bi_iter;
-
 	/*
 	 * Lookup bio sums does extra checks around whether we need to csum or
 	 * not, which is why we ignore skip_sum here.
@@ -7988,10 +7985,6 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	blk_status_t ret;
 
-	/* Save the original iter for read repair */
-	if (btrfs_op(bio) == BTRFS_MAP_READ)
-		btrfs_bio(bio)->iter = bio->bi_iter;
-
 	if (inode->flags & BTRFS_INODE_NODATASUM)
 		goto map;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:32   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper Christoph Hellwig
                   ` (26 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Instead of calling btrfs_lookup_bio_sums in every caller of
btrfs_submit_bio that reads data, do the call once in btrfs_submit_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c         | 10 ++++++++--
 fs/btrfs/compression.c |  6 ------
 fs/btrfs/inode.c       | 24 ------------------------
 3 files changed, 8 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 2651f20891f08f..6fbb71e60037be 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -14,6 +14,7 @@
 #include "dev-replace.h"
 #include "rcu-string.h"
 #include "zoned.h"
+#include "file-item.h"
 
 static struct bio_set btrfs_bioset;
 
@@ -254,10 +255,15 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	}
 
 	/*
-	 * Save the iter for the end_io handler for data reads.
+	 * Save the iter for the end_io handler and preload the checksums for
+	 * data reads.
 	 */
-	if (bio_op(bio) == REQ_OP_READ && !(bio->bi_opf & REQ_META))
+	if (bio_op(bio) == REQ_OP_READ && !(bio->bi_opf & REQ_META)) {
 		bbio->iter = bio->bi_iter;
+		ret = btrfs_lookup_bio_sums(bbio);
+		if (ret)
+			goto fail;
+	}
 
 	if (!bioc) {
 		/* Single mirror read/write fast path */
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ba458b88be26b0..fdcc6f3b90b128 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -797,12 +797,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			 */
 			btrfs_bio(comp_bio)->file_offset = file_offset;
 
-			ret = btrfs_lookup_bio_sums(btrfs_bio(comp_bio));
-			if (ret) {
-				btrfs_bio_end_io(btrfs_bio(comp_bio), ret);
-				break;
-			}
-
 			ASSERT(comp_bio->bi_iter.bi_size);
 			btrfs_submit_bio(fs_info, comp_bio, mirror_num);
 			comp_bio = NULL;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c368a45bc079d6..598897b0d661da 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2762,7 +2762,6 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	blk_status_t ret;
 
 	if (compress_type != BTRFS_COMPRESS_NONE) {
 		/*
@@ -2773,16 +2772,6 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 		return;
 	}
 
-	/*
-	 * Lookup bio sums does extra checks around whether we need to csum or
-	 * not, which is why we ignore skip_sum here.
-	 */
-	ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
-	if (ret) {
-		btrfs_bio_end_io(btrfs_bio(bio), ret);
-		return;
-	}
-
 	btrfs_submit_bio(fs_info, bio, mirror_num);
 }
 
@@ -8004,12 +7993,6 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 			btrfs_bio_end_io(btrfs_bio(bio), ret);
 			return;
 		}
-	} else {
-		ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
-		if (ret) {
-			btrfs_bio_end_io(btrfs_bio(bio), ret);
-			return;
-		}
 	}
 map:
 	btrfs_submit_bio(fs_info, bio, 0);
@@ -10269,13 +10252,6 @@ static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
 {
 	struct btrfs_encoded_read_private *priv = btrfs_bio(bio)->private;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	blk_status_t ret;
-
-	if (!priv->skip_csum) {
-		ret = btrfs_lookup_bio_sums(btrfs_bio(bio));
-		if (ret)
-			return ret;
-	}
 
 	atomic_inc(&priv->pending);
 	btrfs_submit_bio(fs_info, bio, mirror_num);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:36   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer Christoph Hellwig
                   ` (25 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Add a new checksumming helper that wraps btrfs_check_data_csum and
does all the checks to if we're dealing with some form of nodatacsum
I/O.  This helper will be used by the new storage layer checksum
validation and repair code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h |  2 ++
 fs/btrfs/inode.c       | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 195c09e20609e4..3faabcef9898fb 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -423,6 +423,8 @@ int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
 int btrfs_check_data_csum(struct btrfs_inode *inode, struct btrfs_bio *bbio,
 			  u32 bio_offset, struct page *page, u32 pgoff);
+bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
+			u32 bio_offset, struct bio_vec *bv);
 unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 				    u32 bio_offset, struct page *page,
 				    u64 start, u64 end);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 598897b0d661da..12b76d272f08d5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3495,6 +3495,45 @@ int btrfs_check_data_csum(struct btrfs_inode *inode, struct btrfs_bio *bbio,
 	return -EIO;
 }
 
+/*
+ * btrfs_data_csum_ok - verify the checksum of single data sector
+ * @bbio:	btrfs_io_bio which contains the csum
+ * @dev:	device the sector is on
+ * @bio_offset:	offset to the beginning of the bio (in bytes)
+ * @bv:		bio_vec to check
+ *
+ * Check if the checksum on a data block is valid.  When a checksum mismatch is
+ * detected, report the error and fill the corrupted range with zero.
+ *
+ * Return %true if the sector is ok or had no checksum to start with, else
+ * %false.
+ */
+bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
+			u32 bio_offset, struct bio_vec *bv)
+{
+	struct btrfs_inode *inode = bbio->inode;
+	u64 file_offset = bbio->file_offset + bio_offset;
+	u64 end = file_offset + bv->bv_len - 1;
+
+	if (!bbio->csum)
+		return true;
+
+	if (btrfs_is_data_reloc_root(inode->root) &&
+	    test_range_bit(&inode->io_tree, file_offset, end, EXTENT_NODATASUM,
+			1, NULL)) {
+		/* Skip the range without csum for data reloc inode */
+		clear_extent_bits(&inode->io_tree, file_offset, end,
+				  EXTENT_NODATASUM);
+		return true;
+	}
+
+	if (btrfs_check_data_csum(inode, bbio, bio_offset, bv->bv_page,
+				  bv->bv_offset) < 0)
+		return false;
+	return true;
+}
+
+
 /*
  * When reads are done, we need to check csums to verify the data is correct.
  * if there's a match, we allow the bio to finish.  If not, the code in
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:39   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 11/34] btrfs: remove btrfs_bio_free_csum Christoph Hellwig
                   ` (24 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Currently btrfs handles checksum validation and repair in the end I/O
handler for the btrfs_bio.  This leads to a lot of duplicate code
plus issues with variying semantics or bugs, e.g.

 - the until recently broken repair for compressed extents
 - the fact that encoded reads validate the checksums but do not kick
   of read repair
 - the inconsistent checking of the BTRFS_FS_STATE_NO_CSUMS flag

This commit revamps the checksum validation and repair code to instead
work below the btrfs_submit_bio interfaces.

In case of a checksum failure (or a plain old I/O error), the repair
is now kicked off before the upper level ->end_io handler is invoked.

Progress of an in-progress repair is tracked by a small structure
that is allocated using a mempool for each original bio with failed
sectors, which holds a reference to the original bio.   This new
structure is allocated using a mempool to guarantee forward progress
even under memory pressure.  The mempool will be replenished when
the repair completes, just as the mempools backing the bios.

There is one significant behavior change here:  If repair fails or
is impossible to start with, the whole bio will be failed to the
upper layer.  This is the behavior that all I/O submitters except
for buffered I/O already emulated in their end_io handler.  For
buffered I/O this now means that a large readahead request can
fail due to a single bad sector, but as readahead errors are igored
the following readpage if the sector is actually accessed will
still be able to read.  This also matches the I/O failure handling
in other file systems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         | 193 ++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/compression.c |  41 +--------
 fs/btrfs/extent_io.c   | 125 ++------------------------
 fs/btrfs/inode.c       |  81 +----------------
 4 files changed, 206 insertions(+), 234 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 6fbb71e60037be..d1a545158bb0a0 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -17,6 +17,14 @@
 #include "file-item.h"
 
 static struct bio_set btrfs_bioset;
+static struct bio_set btrfs_repair_bioset;
+static mempool_t btrfs_failed_bio_pool;
+
+struct btrfs_failed_bio {
+	struct btrfs_bio *bbio;
+	int num_copies;
+	atomic_t repair_count;
+};
 
 /*
  * Initialize a btrfs_bio structure.  This skips the embedded bio itself as it
@@ -67,6 +75,165 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
 	return bio;
 }
 
+static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
+{
+	if (cur_mirror == fbio->num_copies)
+		return cur_mirror + 1 - fbio->num_copies;
+	return cur_mirror + 1;
+}
+
+static int prev_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
+{
+	if (cur_mirror == 1)
+		return fbio->num_copies;
+	return cur_mirror - 1;
+}
+
+static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
+{
+	if (atomic_dec_and_test(&fbio->repair_count)) {
+		fbio->bbio->end_io(fbio->bbio);
+		mempool_free(fbio, &btrfs_failed_bio_pool);
+	}
+}
+
+static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
+				 struct btrfs_device *dev)
+{
+	struct btrfs_failed_bio *fbio = repair_bbio->private;
+	struct btrfs_inode *inode = repair_bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	struct bio_vec *bv = bio_first_bvec_all(&repair_bbio->bio);
+	int mirror = repair_bbio->mirror_num;
+
+	if (repair_bbio->bio.bi_status ||
+	    !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) {
+		bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ);
+		repair_bbio->bio.bi_iter = repair_bbio->iter;
+
+		mirror = next_repair_mirror(fbio, mirror);
+		if (mirror == fbio->bbio->mirror_num) {
+			btrfs_debug(fs_info, "no mirror left");
+			fbio->bbio->bio.bi_status = BLK_STS_IOERR;
+			goto done;
+		}
+
+		btrfs_submit_bio(fs_info, &repair_bbio->bio, mirror);
+		return;
+	}
+
+	do {
+		mirror = prev_repair_mirror(fbio, mirror);
+		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
+				  repair_bbio->file_offset, fs_info->sectorsize,
+				  repair_bbio->iter.bi_sector <<
+					SECTOR_SHIFT,
+				  bv->bv_page, bv->bv_offset, mirror);
+	} while (mirror != fbio->bbio->mirror_num);
+
+done:
+	btrfs_repair_done(fbio);
+	bio_put(&repair_bbio->bio);
+}
+
+/*
+ * Try to kick off a repair read to the next available mirror for a bad
+ * sector.
+ *
+ * This primarily tries to recover good data to serve the actual read request,
+ * but also tries to write the good data back to the bad mirror(s) when a
+ * read succeeded to restore the redundancy.
+ */
+static struct btrfs_failed_bio *repair_one_sector(struct btrfs_bio *failed_bbio,
+						  u32 bio_offset,
+						  struct bio_vec *bv,
+						  struct btrfs_failed_bio *fbio)
+{
+	struct btrfs_inode *inode = failed_bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	const u32 sectorsize = fs_info->sectorsize;
+	const u64 logical = failed_bbio->iter.bi_sector << SECTOR_SHIFT;
+	struct btrfs_bio *repair_bbio;
+	struct bio *repair_bio;
+	int num_copies;
+	int mirror;
+
+	btrfs_debug(fs_info, "repair read error: read error at %llu",
+		    failed_bbio->file_offset + bio_offset);
+
+	num_copies = btrfs_num_copies(fs_info, logical, sectorsize);
+	if (num_copies == 1) {
+		btrfs_debug(fs_info, "no copy to repair from");
+		failed_bbio->bio.bi_status = BLK_STS_IOERR;
+		return fbio;
+	}
+
+	if (!fbio) {
+		fbio = mempool_alloc(&btrfs_failed_bio_pool, GFP_NOFS);
+		fbio->bbio = failed_bbio;
+		fbio->num_copies = num_copies;
+		atomic_set(&fbio->repair_count, 1);
+	}
+
+	atomic_inc(&fbio->repair_count);
+
+	repair_bio = bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_NOFS,
+				      &btrfs_repair_bioset);
+	repair_bio->bi_iter.bi_sector = failed_bbio->iter.bi_sector;
+	bio_add_page(repair_bio, bv->bv_page, bv->bv_len, bv->bv_offset);
+
+	repair_bbio = btrfs_bio(repair_bio);
+	btrfs_bio_init(repair_bbio, failed_bbio->inode, NULL, fbio);
+	repair_bbio->file_offset = failed_bbio->file_offset + bio_offset;
+
+	mirror = next_repair_mirror(fbio, failed_bbio->mirror_num);
+	btrfs_debug(fs_info, "submitting repair read to mirror %d", mirror);
+	btrfs_submit_bio(fs_info, repair_bio, mirror);
+	return fbio;
+}
+
+static void btrfs_check_read_bio(struct btrfs_bio *bbio,
+				 struct btrfs_device *dev)
+{
+	struct btrfs_inode *inode = bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
+	unsigned int sectorsize = fs_info->sectorsize;
+	struct bvec_iter *iter = &bbio->iter;
+	blk_status_t status = bbio->bio.bi_status;
+	struct btrfs_failed_bio *fbio = NULL;
+	u32 offset = 0;
+
+	/*
+	 * Hand off repair bios to the repair code as there is no upper level
+	 * submitter for them.
+	 */
+	if (unlikely(bbio->bio.bi_pool == &btrfs_repair_bioset)) {
+		btrfs_end_repair_bio(bbio, dev);
+		return;
+	}
+
+	/* Clear the I/O error.  A failed repair will reset it */
+	bbio->bio.bi_status = BLK_STS_OK;
+
+	while (iter->bi_size) {
+		struct bio_vec bv = bio_iter_iovec(&bbio->bio, *iter);
+
+		bv.bv_len = min(bv.bv_len, sectorsize);
+		if (status || !btrfs_data_csum_ok(bbio, dev, offset, &bv))
+			fbio = repair_one_sector(bbio, offset, &bv, fbio);
+
+		bio_advance_iter_single(&bbio->bio, iter, sectorsize);
+		offset += sectorsize;
+	}
+
+	btrfs_bio_free_csum(bbio);
+
+	if (unlikely(fbio))
+		btrfs_repair_done(fbio);
+	else
+		bbio->end_io(bbio);
+}
+
 static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
 {
 	if (!dev || !dev->bdev)
@@ -94,7 +261,11 @@ static void btrfs_end_bio_work(struct work_struct *work)
 {
 	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
-	bbio->end_io(bbio);
+	/* Metadata reads are checked and repaired by the submitter */
+	if (bbio->bio.bi_opf & REQ_META)
+		bbio->end_io(bbio);
+	else
+		btrfs_check_read_bio(bbio, bbio->device);
 }
 
 static void btrfs_simple_end_io(struct bio *bio)
@@ -122,7 +293,10 @@ static void btrfs_raid56_end_io(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
-	bbio->end_io(bbio);
+	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
+		btrfs_check_read_bio(bbio, NULL);
+	else
+		bbio->end_io(bbio);
 
 	btrfs_put_bioc(bioc);
 }
@@ -402,10 +576,25 @@ int __init btrfs_bioset_init(void)
 			offsetof(struct btrfs_bio, bio),
 			BIOSET_NEED_BVECS))
 		return -ENOMEM;
+	if (bioset_init(&btrfs_repair_bioset, BIO_POOL_SIZE,
+			offsetof(struct btrfs_bio, bio),
+			BIOSET_NEED_BVECS))
+		goto out_free_bioset;
+	if (mempool_init_kmalloc_pool(&btrfs_failed_bio_pool, BIO_POOL_SIZE,
+				      sizeof(struct btrfs_failed_bio)))
+		goto out_free_repair_bioset;
 	return 0;
+
+out_free_repair_bioset:
+	bioset_exit(&btrfs_repair_bioset);
+out_free_bioset:
+	bioset_exit(&btrfs_bioset);
+	return -ENOMEM;
 }
 
 void __cold btrfs_bioset_exit(void)
 {
+	mempool_exit(&btrfs_failed_bio_pool);
+	bioset_exit(&btrfs_repair_bioset);
 	bioset_exit(&btrfs_bioset);
 }
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index fdcc6f3b90b128..f84ccb185c2a6f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -164,52 +164,15 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
 	kfree(cb);
 }
 
-/*
- * Verify the checksums and kick off repair if needed on the uncompressed data
- * before decompressing it into the original bio and freeing the uncompressed
- * pages.
- */
 static void end_compressed_bio_read(struct btrfs_bio *bbio)
 {
 	struct compressed_bio *cb = bbio->private;
-	struct inode *inode = cb->inode;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_inode *bi = BTRFS_I(inode);
-	bool csum = !(bi->flags & BTRFS_INODE_NODATASUM) &&
-		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
-	blk_status_t status = bbio->bio.bi_status;
-	struct bvec_iter iter;
-	struct bio_vec bv;
-	u32 offset;
-
-	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
-		u64 start = bbio->file_offset + offset;
-
-		if (!status &&
-		    (!csum || !btrfs_check_data_csum(bi, bbio, offset,
-						     bv.bv_page, bv.bv_offset))) {
-			btrfs_clean_io_failure(bi, start, bv.bv_page,
-					       bv.bv_offset);
-		} else {
-			int ret;
-
-			refcount_inc(&cb->pending_ios);
-			ret = btrfs_repair_one_sector(BTRFS_I(inode), bbio, offset,
-						      bv.bv_page, bv.bv_offset,
-						      true);
-			if (ret) {
-				refcount_dec(&cb->pending_ios);
-				status = errno_to_blk_status(ret);
-			}
-		}
-	}
 
-	if (status)
-		cb->status = status;
+	if (bbio->bio.bi_status)
+		cb->status = bbio->bio.bi_status;
 
 	if (refcount_dec_and_test(&cb->pending_ios))
 		finish_compressed_bio_read(cb);
-	btrfs_bio_free_csum(bbio);
 	bio_put(&bbio->bio);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 44cacf62e4314e..81f44b6c9fad50 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -803,79 +803,6 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 		btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
-static void end_sector_io(struct page *page, u64 offset, bool uptodate)
-{
-	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-	const u32 sectorsize = inode->root->fs_info->sectorsize;
-
-	end_page_read(page, uptodate, offset, sectorsize);
-	unlock_extent(&inode->io_tree, offset, offset + sectorsize - 1, NULL);
-}
-
-static void submit_data_read_repair(struct inode *inode,
-				    struct btrfs_bio *failed_bbio,
-				    u32 bio_offset, const struct bio_vec *bvec,
-				    unsigned int error_bitmap)
-{
-	const unsigned int pgoff = bvec->bv_offset;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct page *page = bvec->bv_page;
-	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
-	const u64 end = start + bvec->bv_len - 1;
-	const u32 sectorsize = fs_info->sectorsize;
-	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
-	int i;
-
-	BUG_ON(bio_op(&failed_bbio->bio) == REQ_OP_WRITE);
-
-	/* This repair is only for data */
-	ASSERT(is_data_inode(inode));
-
-	/* We're here because we had some read errors or csum mismatch */
-	ASSERT(error_bitmap);
-
-	/*
-	 * We only get called on buffered IO, thus page must be mapped and bio
-	 * must not be cloned.
-	 */
-	ASSERT(page->mapping && !bio_flagged(&failed_bbio->bio, BIO_CLONED));
-
-	/* Iterate through all the sectors in the range */
-	for (i = 0; i < nr_bits; i++) {
-		const unsigned int offset = i * sectorsize;
-		bool uptodate = false;
-		int ret;
-
-		if (!(error_bitmap & (1U << i))) {
-			/*
-			 * This sector has no error, just end the page read
-			 * and unlock the range.
-			 */
-			uptodate = true;
-			goto next;
-		}
-
-		ret = btrfs_repair_one_sector(BTRFS_I(inode), failed_bbio,
-				bio_offset + offset, page, pgoff + offset,
-				true);
-		if (!ret) {
-			/*
-			 * We have submitted the read repair, the page release
-			 * will be handled by the endio function of the
-			 * submitted repair bio.
-			 * Thus we don't need to do any thing here.
-			 */
-			continue;
-		}
-		/*
-		 * Continue on failed repair, otherwise the remaining sectors
-		 * will not be properly unlocked.
-		 */
-next:
-		end_sector_io(page, start + offset, uptodate);
-	}
-}
-
 /* lots and lots of room for performance fixes in the end_bio funcs */
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
@@ -1093,8 +1020,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
-		unsigned int error_bitmap = (unsigned int)-1;
-		bool repair = false;
 		u64 start;
 		u64 end;
 		u32 len;
@@ -1126,25 +1051,15 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		len = bvec->bv_len;
 
 		mirror = bbio->mirror_num;
-		if (likely(uptodate)) {
-			if (is_data_inode(inode)) {
-				error_bitmap = btrfs_verify_data_csum(bbio,
-						bio_offset, page, start, end);
-				if (error_bitmap)
-					uptodate = false;
-			} else {
-				if (btrfs_validate_metadata_buffer(bbio,
-						page, start, end, mirror))
-					uptodate = false;
-			}
-		}
+		if (uptodate && !is_data_inode(inode) &&
+		    btrfs_validate_metadata_buffer(bbio, page, start, end,
+						   mirror))
+			uptodate = false;
 
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
-			btrfs_clean_io_failure(BTRFS_I(inode), start, page, 0);
-
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
@@ -1161,19 +1076,7 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 				zero_user_segment(page, zero_start,
 						  offset_in_page(end) + 1);
 			}
-		} else if (is_data_inode(inode)) {
-			/*
-			 * Only try to repair bios that actually made it to a
-			 * device.  If the bio failed to be submitted mirror
-			 * is 0 and we need to fail it without retrying.
-			 *
-			 * This also includes the high level bios for compressed
-			 * extents - these never make it to a device and repair
-			 * is already handled on the lower compressed bio.
-			 */
-			if (mirror > 0)
-				repair = true;
-		} else {
+		} else if (!is_data_inode(inode)) {
 			struct extent_buffer *eb;
 
 			eb = find_extent_buffer_readpage(fs_info, page, start);
@@ -1182,19 +1085,10 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 			atomic_dec(&eb->io_pages);
 		}
 
-		if (repair) {
-			/*
-			 * submit_data_read_repair() will handle all the good
-			 * and bad sectors, we just continue to the next bvec.
-			 */
-			submit_data_read_repair(inode, bbio, bio_offset, bvec,
-						error_bitmap);
-		} else {
-			/* Update page status and unlock */
-			end_page_read(page, uptodate, start, len);
-			endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					start, end, PageUptodate(page));
-		}
+		/* Update page status and unlock */
+		end_page_read(page, uptodate, start, len);
+		endio_readpage_release_extent(&processed, BTRFS_I(inode),
+				start, end, PageUptodate(page));
 
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
@@ -1202,7 +1096,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 	}
 	/* Release the last extent */
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
-	btrfs_bio_free_csum(bbio);
 	bio_put(bio);
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 12b76d272f08d5..d122bf9a72aaff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7942,39 +7942,6 @@ void btrfs_submit_dio_repair_bio(struct btrfs_inode *inode, struct bio *bio, int
 	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
 }
 
-static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
-					     struct btrfs_bio *bbio,
-					     const bool uptodate)
-{
-	struct inode *inode = &dip->inode->vfs_inode;
-	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
-	blk_status_t err = BLK_STS_OK;
-	struct bvec_iter iter;
-	struct bio_vec bv;
-	u32 offset;
-
-	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
-		u64 start = bbio->file_offset + offset;
-
-		if (uptodate &&
-		    (!csum || !btrfs_check_data_csum(BTRFS_I(inode), bbio, offset,
-						     bv.bv_page, bv.bv_offset))) {
-			btrfs_clean_io_failure(BTRFS_I(inode), start,
-					       bv.bv_page, bv.bv_offset);
-		} else {
-			int ret;
-
-			ret = btrfs_repair_one_sector(BTRFS_I(inode), bbio, offset,
-					bv.bv_page, bv.bv_offset, false);
-			if (ret)
-				err = errno_to_blk_status(ret);
-		}
-	}
-
-	return err;
-}
-
 blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
 					      struct bio *bio,
 					      u64 dio_file_offset)
@@ -7988,18 +7955,14 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
 	struct bio *bio = &bbio->bio;
 	blk_status_t err = bio->bi_status;
 
-	if (err)
+	if (err) {
 		btrfs_warn(dip->inode->root->fs_info,
 			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
 			   btrfs_ino(dip->inode), bio_op(bio),
 			   bio->bi_opf, bio->bi_iter.bi_sector,
 			   bio->bi_iter.bi_size, err);
-
-	if (bio_op(bio) == REQ_OP_READ)
-		err = btrfs_check_read_dio_bio(dip, bbio, !err);
-
-	if (err)
 		dip->bio.bi_status = err;
+	}
 
 	btrfs_record_physical_zoned(&dip->inode->vfs_inode, bbio->file_offset, bio);
 
@@ -10283,7 +10246,6 @@ struct btrfs_encoded_read_private {
 	wait_queue_head_t wait;
 	atomic_t pending;
 	blk_status_t status;
-	bool skip_csum;
 };
 
 static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
@@ -10297,44 +10259,11 @@ static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
 	return BLK_STS_OK;
 }
 
-static blk_status_t btrfs_encoded_read_verify_csum(struct btrfs_bio *bbio)
-{
-	const bool uptodate = (bbio->bio.bi_status == BLK_STS_OK);
-	struct btrfs_encoded_read_private *priv = bbio->private;
-	struct btrfs_inode *inode = priv->inode;
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u32 sectorsize = fs_info->sectorsize;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
-	u32 bio_offset = 0;
-
-	if (priv->skip_csum || !uptodate)
-		return bbio->bio.bi_status;
-
-	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
-		unsigned int i, nr_sectors, pgoff;
-
-		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
-		pgoff = bvec->bv_offset;
-		for (i = 0; i < nr_sectors; i++) {
-			ASSERT(pgoff < PAGE_SIZE);
-			if (btrfs_check_data_csum(inode, bbio, bio_offset,
-					    bvec->bv_page, pgoff))
-				return BLK_STS_IOERR;
-			bio_offset += sectorsize;
-			pgoff += sectorsize;
-		}
-	}
-	return BLK_STS_OK;
-}
-
 static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
 {
 	struct btrfs_encoded_read_private *priv = bbio->private;
-	blk_status_t status;
 
-	status = btrfs_encoded_read_verify_csum(bbio);
-	if (status) {
+	if (bbio->bio.bi_status) {
 		/*
 		 * The memory barrier implied by the atomic_dec_return() here
 		 * pairs with the memory barrier implied by the
@@ -10343,11 +10272,10 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
 		 * write is observed before the load of status in
 		 * btrfs_encoded_read_regular_fill_pages().
 		 */
-		WRITE_ONCE(priv->status, status);
+		WRITE_ONCE(priv->status, bbio->bio.bi_status);
 	}
 	if (!atomic_dec_return(&priv->pending))
 		wake_up(&priv->wait);
-	btrfs_bio_free_csum(bbio);
 	bio_put(&bbio->bio);
 }
 
@@ -10360,7 +10288,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 		.inode = inode,
 		.file_offset = file_offset,
 		.pending = ATOMIC_INIT(1),
-		.skip_csum = (inode->flags & BTRFS_INODE_NODATASUM),
 	};
 	unsigned long i = 0;
 	u64 cur = 0;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 11/34] btrfs: remove btrfs_bio_free_csum
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:41   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector Christoph Hellwig
                   ` (23 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

btrfs_bio_free_csum has only one caller left, and that calle is always
for an data inode and doesn't need zeroing of the csum pointer as that
pointer will never be touched again.  Just open code the conditional
kfree there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c |  3 ++-
 fs/btrfs/bio.h | 10 ----------
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index d1a545158bb0a0..cdee76e3a6121a 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -226,7 +226,8 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio,
 		offset += sectorsize;
 	}
 
-	btrfs_bio_free_csum(bbio);
+	if (bbio->csum != bbio->csum_inline)
+		kfree(bbio->csum);
 
 	if (unlikely(fbio))
 		btrfs_repair_done(fbio);
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 8d69d0b226d99b..996275eb106260 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -94,16 +94,6 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 	bbio->end_io(bbio);
 }
 
-static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
-{
-	if (bbio->is_metadata)
-		return;
-	if (bbio->csum != bbio->csum_inline) {
-		kfree(bbio->csum);
-		bbio->csum = NULL;
-	}
-}
-
 /*
  * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
  *
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 11/34] btrfs: remove btrfs_bio_free_csum Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:42   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 13/34] btrfs: remove now unused checksumming helpers Christoph Hellwig
                   ` (22 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

btrfs_bio_for_each_sector is unused now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 996275eb106260..2e799c3348d5a6 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -94,22 +94,6 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 	bbio->end_io(bbio);
 }
 
-/*
- * Iterate through a btrfs_bio (@bbio) on a per-sector basis.
- *
- * bvl        - struct bio_vec
- * bbio       - struct btrfs_bio
- * iters      - struct bvec_iter
- * bio_offset - unsigned int
- */
-#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset)	\
-	for ((iter) = (bbio)->iter, (bio_offset) = 0;			\
-	     (iter).bi_size &&					\
-	     (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1);	\
-	     (bio_offset) += fs_info->sectorsize,			\
-	     bio_advance_iter_single(&(bbio)->bio, &(iter),		\
-	     (fs_info)->sectorsize))
-
 void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		      int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 13/34] btrfs: remove now unused checksumming helpers
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-23 16:49   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio Christoph Hellwig
                   ` (21 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Remove the unused btrfs_verify_data_csum helper, and fold
btrfs_check_data_csum into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h |   5 --
 fs/btrfs/inode.c       | 127 ++++++-----------------------------------
 2 files changed, 17 insertions(+), 115 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 3faabcef9898fb..99430d0ebbb5fd 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -421,13 +421,8 @@ blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
 					      u64 dio_file_offset);
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
-int btrfs_check_data_csum(struct btrfs_inode *inode, struct btrfs_bio *bbio,
-			  u32 bio_offset, struct page *page, u32 pgoff);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
-unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
-				    u32 bio_offset, struct page *page,
-				    u64 start, u64 end);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
 			      u64 *orig_start, u64 *orig_block_len,
 			      u64 *ram_bytes, bool nowait, bool strict);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d122bf9a72aaff..c63160cf135d98 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3456,45 +3456,6 @@ static u8 *btrfs_csum_ptr(const struct btrfs_fs_info *fs_info, u8 *csums, u64 of
 	return csums + offset_in_sectors * fs_info->csum_size;
 }
 
-/*
- * check_data_csum - verify checksum of one sector of uncompressed data
- * @inode:	inode
- * @bbio:	btrfs_bio which contains the csum
- * @bio_offset:	offset to the beginning of the bio (in bytes)
- * @page:	page where is the data to be verified
- * @pgoff:	offset inside the page
- *
- * The length of such check is always one sector size.
- *
- * When csum mismatch is detected, we will also report the error and fill the
- * corrupted range with zero. (Thus it needs the extra parameters)
- */
-int btrfs_check_data_csum(struct btrfs_inode *inode, struct btrfs_bio *bbio,
-			  u32 bio_offset, struct page *page, u32 pgoff)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	u32 len = fs_info->sectorsize;
-	u8 *csum_expected;
-	u8 csum[BTRFS_CSUM_SIZE];
-
-	ASSERT(pgoff + len <= PAGE_SIZE);
-
-	csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
-
-	if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected))
-		goto zeroit;
-	return 0;
-
-zeroit:
-	btrfs_print_data_csum_error(inode, bbio->file_offset + bio_offset,
-				    csum, csum_expected, bbio->mirror_num);
-	if (bbio->device)
-		btrfs_dev_stat_inc_and_print(bbio->device,
-					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
-	memzero_page(page, pgoff, len);
-	return -EIO;
-}
-
 /*
  * btrfs_data_csum_ok - verify the checksum of single data sector
  * @bbio:	btrfs_io_bio which contains the csum
@@ -3512,8 +3473,13 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv)
 {
 	struct btrfs_inode *inode = bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	u64 file_offset = bbio->file_offset + bio_offset;
 	u64 end = file_offset + bv->bv_len - 1;
+	u8 *csum_expected;
+	u8 csum[BTRFS_CSUM_SIZE];
+
+	ASSERT(bv->bv_len == fs_info->sectorsize);
 
 	if (!bbio->csum)
 		return true;
@@ -3527,78 +3493,19 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 		return true;
 	}
 
-	if (btrfs_check_data_csum(inode, bbio, bio_offset, bv->bv_page,
-				  bv->bv_offset) < 0)
-		return false;
+	csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
+	if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
+				    csum_expected))
+		goto zeroit;
 	return true;
-}
-
-
-/*
- * When reads are done, we need to check csums to verify the data is correct.
- * if there's a match, we allow the bio to finish.  If not, the code in
- * extent_io.c will try to find good copies for us.
- *
- * @bio_offset:	offset to the beginning of the bio (in bytes)
- * @start:	file offset of the range start
- * @end:	file offset of the range end (inclusive)
- *
- * Return a bitmap where bit set means a csum mismatch, and bit not set means
- * csum match.
- */
-unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
-				    u32 bio_offset, struct page *page,
-				    u64 start, u64 end)
-{
-	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-	struct btrfs_root *root = inode->root;
-	struct btrfs_fs_info *fs_info = root->fs_info;
-	struct extent_io_tree *io_tree = &inode->io_tree;
-	const u32 sectorsize = root->fs_info->sectorsize;
-	u32 pg_off;
-	unsigned int result = 0;
-
-	/*
-	 * This only happens for NODATASUM or compressed read.
-	 * Normally this should be covered by above check for compressed read
-	 * or the next check for NODATASUM.  Just do a quicker exit here.
-	 */
-	if (bbio->csum == NULL)
-		return 0;
-
-	if (inode->flags & BTRFS_INODE_NODATASUM)
-		return 0;
-
-	if (unlikely(test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state)))
-		return 0;
-
-	ASSERT(page_offset(page) <= start &&
-	       end <= page_offset(page) + PAGE_SIZE - 1);
-	for (pg_off = offset_in_page(start);
-	     pg_off < offset_in_page(end);
-	     pg_off += sectorsize, bio_offset += sectorsize) {
-		u64 file_offset = pg_off + page_offset(page);
-		int ret;
-
-		if (btrfs_is_data_reloc_root(root) &&
-		    test_range_bit(io_tree, file_offset,
-				   file_offset + sectorsize - 1,
-				   EXTENT_NODATASUM, 1, NULL)) {
-			/* Skip the range without csum for data reloc inode */
-			clear_extent_bits(io_tree, file_offset,
-					  file_offset + sectorsize - 1,
-					  EXTENT_NODATASUM);
-			continue;
-		}
-		ret = btrfs_check_data_csum(inode, bbio, bio_offset, page, pg_off);
-		if (ret < 0) {
-			const int nr_bit = (pg_off - offset_in_page(start)) >>
-				     root->fs_info->sectorsize_bits;
-
-			result |= (1U << nr_bit);
-		}
-	}
-	return result;
+zeroit:
+	btrfs_print_data_csum_error(inode, file_offset, csum, csum_expected,
+				    bbio->mirror_num);
+	if (dev)
+		btrfs_dev_stat_inc_and_print(dev,
+					     BTRFS_DEV_STAT_CORRUPTION_ERRS);
+	memzero_bvec(bv);
+	return false;
 }
 
 /*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 13/34] btrfs: remove now unused checksumming helpers Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 11:01   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 15/34] btrfs: remove the io_failure_record infrastructure Christoph Hellwig
                   ` (20 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

The device field is only used by the simple end I/O handler, and for
that it can simply be stored in the bi_private field of the bio,
which is currently used for the fs_info that can be retreived through
bbio->inode as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c | 10 +++++-----
 fs/btrfs/bio.h |  2 --
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index cdee76e3a6121a..e32a2dac97c541 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -266,18 +266,19 @@ static void btrfs_end_bio_work(struct work_struct *work)
 	if (bbio->bio.bi_opf & REQ_META)
 		bbio->end_io(bbio);
 	else
-		btrfs_check_read_bio(bbio, bbio->device);
+		btrfs_check_read_bio(bbio, bbio->bio.bi_private);
 }
 
 static void btrfs_simple_end_io(struct bio *bio)
 {
-	struct btrfs_fs_info *fs_info = bio->bi_private;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_device *dev = bio->bi_private;
+	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
 
 	btrfs_bio_counter_dec(fs_info);
 
 	if (bio->bi_status)
-		btrfs_log_dev_io_error(bio, bbio->device);
+		btrfs_log_dev_io_error(bio, dev);
 
 	if (bio_op(bio) == REQ_OP_READ) {
 		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
@@ -443,9 +444,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	if (!bioc) {
 		/* Single mirror read/write fast path */
 		bbio->mirror_num = mirror_num;
-		bbio->device = smap.dev;
 		bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
-		bio->bi_private = fs_info;
+		bio->bi_private = smap.dev;
 		bio->bi_end_io = btrfs_simple_end_io;
 		btrfs_submit_dev_bio(smap.dev, bio);
 	} else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 2e799c3348d5a6..61a791cf5360f3 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -45,8 +45,6 @@ struct btrfs_bio {
 	struct btrfs_inode *inode;
 	u64 file_offset;
 
-	/* @device is for stripe IO submission. */
-	struct btrfs_device *device;
 	union {
 		/* For data checksum verification. */
 		struct {
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/34] btrfs: remove the io_failure_record infrastructure
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (13 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 11:04   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio Christoph Hellwig
                   ` (19 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

strutc io_failure_record and the io_failure_tree tree are unused now,
so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h    |   7 -
 fs/btrfs/extent-io-tree.h |   1 -
 fs/btrfs/extent_io.c      | 260 --------------------------------------
 fs/btrfs/extent_io.h      |  31 -----
 fs/btrfs/inode.c          |  16 ---
 5 files changed, 315 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 99430d0ebbb5fd..78c7979b8dcac7 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -93,12 +93,6 @@ struct btrfs_inode {
 	/* the io_tree does range state (DIRTY, LOCKED etc) */
 	struct extent_io_tree io_tree;
 
-	/* special utility tree used to record which mirrors have already been
-	 * tried when checksums fail for a given block
-	 */
-	struct rb_root io_failure_tree;
-	spinlock_t io_failure_lock;
-
 	/*
 	 * Keep track of where the inode has extent items mapped in order to
 	 * make sure the i_size adjustments are accurate
@@ -414,7 +408,6 @@ static inline void btrfs_inode_split_flags(u64 inode_item_flags,
 void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
 void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type);
-void btrfs_submit_dio_repair_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
 blk_status_t btrfs_submit_bio_start(struct btrfs_inode *inode, struct bio *bio);
 blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
 					      struct bio *bio,
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index e3eeec380844c7..21766e49ec02e7 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -6,7 +6,6 @@
 #include "misc.h"
 
 struct extent_changeset;
-struct io_failure_record;
 
 /* Bits for the extent state */
 enum {
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 81f44b6c9fad50..75b717a7b0f97d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -515,266 +515,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 			       start, end, page_ops, NULL);
 }
 
-static int insert_failrec(struct btrfs_inode *inode,
-			  struct io_failure_record *failrec)
-{
-	struct rb_node *exist;
-
-	spin_lock(&inode->io_failure_lock);
-	exist = rb_simple_insert(&inode->io_failure_tree, failrec->bytenr,
-				 &failrec->rb_node);
-	spin_unlock(&inode->io_failure_lock);
-
-	return (exist == NULL) ? 0 : -EEXIST;
-}
-
-static struct io_failure_record *get_failrec(struct btrfs_inode *inode, u64 start)
-{
-	struct rb_node *node;
-	struct io_failure_record *failrec = ERR_PTR(-ENOENT);
-
-	spin_lock(&inode->io_failure_lock);
-	node = rb_simple_search(&inode->io_failure_tree, start);
-	if (node)
-		failrec = rb_entry(node, struct io_failure_record, rb_node);
-	spin_unlock(&inode->io_failure_lock);
-	return failrec;
-}
-
-static void free_io_failure(struct btrfs_inode *inode,
-			    struct io_failure_record *rec)
-{
-	spin_lock(&inode->io_failure_lock);
-	rb_erase(&rec->rb_node, &inode->io_failure_tree);
-	spin_unlock(&inode->io_failure_lock);
-
-	kfree(rec);
-}
-
-static int next_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
-	if (cur_mirror == failrec->num_copies)
-		return cur_mirror + 1 - failrec->num_copies;
-	return cur_mirror + 1;
-}
-
-static int prev_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
-	if (cur_mirror == 1)
-		return failrec->num_copies;
-	return cur_mirror - 1;
-}
-
-/*
- * each time an IO finishes, we do a fast check in the IO failure tree
- * to see if we need to process or clean up an io_failure_record
- */
-int btrfs_clean_io_failure(struct btrfs_inode *inode, u64 start,
-			   struct page *page, unsigned int pg_offset)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct extent_io_tree *io_tree = &inode->io_tree;
-	u64 ino = btrfs_ino(inode);
-	u64 locked_start, locked_end;
-	struct io_failure_record *failrec;
-	int mirror;
-	int ret;
-
-	failrec = get_failrec(inode, start);
-	if (IS_ERR(failrec))
-		return 0;
-
-	BUG_ON(!failrec->this_mirror);
-
-	if (sb_rdonly(fs_info->sb))
-		goto out;
-
-	ret = find_first_extent_bit(io_tree, failrec->bytenr, &locked_start,
-				    &locked_end, EXTENT_LOCKED, NULL);
-	if (ret || locked_start > failrec->bytenr ||
-	    locked_end < failrec->bytenr + failrec->len - 1)
-		goto out;
-
-	mirror = failrec->this_mirror;
-	do {
-		mirror = prev_mirror(failrec, mirror);
-		btrfs_repair_io_failure(fs_info, ino, start, failrec->len,
-				  failrec->logical, page, pg_offset, mirror);
-	} while (mirror != failrec->failed_mirror);
-
-out:
-	free_io_failure(inode, failrec);
-	return 0;
-}
-
-/*
- * Can be called when
- * - hold extent lock
- * - under ordered extent
- * - the inode is freeing
- */
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
-{
-	struct io_failure_record *failrec;
-	struct rb_node *node, *next;
-
-	if (RB_EMPTY_ROOT(&inode->io_failure_tree))
-		return;
-
-	spin_lock(&inode->io_failure_lock);
-	node = rb_simple_search_first(&inode->io_failure_tree, start);
-	while (node) {
-		failrec = rb_entry(node, struct io_failure_record, rb_node);
-		if (failrec->bytenr > end)
-			break;
-
-		next = rb_next(node);
-		rb_erase(&failrec->rb_node, &inode->io_failure_tree);
-		kfree(failrec);
-
-		node = next;
-	}
-	spin_unlock(&inode->io_failure_lock);
-}
-
-static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
-							     struct btrfs_bio *bbio,
-							     unsigned int bio_offset)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	u64 start = bbio->file_offset + bio_offset;
-	struct io_failure_record *failrec;
-	const u32 sectorsize = fs_info->sectorsize;
-	int ret;
-
-	failrec = get_failrec(BTRFS_I(inode), start);
-	if (!IS_ERR(failrec)) {
-		btrfs_debug(fs_info,
-	"Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
-			failrec->logical, failrec->bytenr, failrec->len);
-		/*
-		 * when data can be on disk more than twice, add to failrec here
-		 * (e.g. with a list for failed_mirror) to make
-		 * clean_io_failure() clean all those errors at once.
-		 */
-		ASSERT(failrec->this_mirror == bbio->mirror_num);
-		ASSERT(failrec->len == fs_info->sectorsize);
-		return failrec;
-	}
-
-	failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
-	if (!failrec)
-		return ERR_PTR(-ENOMEM);
-
-	RB_CLEAR_NODE(&failrec->rb_node);
-	failrec->bytenr = start;
-	failrec->len = sectorsize;
-	failrec->failed_mirror = bbio->mirror_num;
-	failrec->this_mirror = bbio->mirror_num;
-	failrec->logical = (bbio->iter.bi_sector << SECTOR_SHIFT) + bio_offset;
-
-	btrfs_debug(fs_info,
-		    "new io failure record logical %llu start %llu",
-		    failrec->logical, start);
-
-	failrec->num_copies = btrfs_num_copies(fs_info, failrec->logical, sectorsize);
-	if (failrec->num_copies == 1) {
-		/*
-		 * We only have a single copy of the data, so don't bother with
-		 * all the retry and error correction code that follows. No
-		 * matter what the error is, it is very likely to persist.
-		 */
-		btrfs_debug(fs_info,
-			"cannot repair logical %llu num_copies %d",
-			failrec->logical, failrec->num_copies);
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	/* Set the bits in the private failure tree */
-	ret = insert_failrec(BTRFS_I(inode), failrec);
-	if (ret) {
-		kfree(failrec);
-		return ERR_PTR(ret);
-	}
-
-	return failrec;
-}
-
-int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_bbio,
-			    u32 bio_offset, struct page *page, unsigned int pgoff,
-			    bool submit_buffered)
-{
-	u64 start = failed_bbio->file_offset + bio_offset;
-	struct io_failure_record *failrec;
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct bio *failed_bio = &failed_bbio->bio;
-	const int icsum = bio_offset >> fs_info->sectorsize_bits;
-	struct bio *repair_bio;
-	struct btrfs_bio *repair_bbio;
-
-	btrfs_debug(fs_info,
-		   "repair read error: read error at %llu", start);
-
-	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
-	failrec = btrfs_get_io_failure_record(&inode->vfs_inode, failed_bbio, bio_offset);
-	if (IS_ERR(failrec))
-		return PTR_ERR(failrec);
-
-	/*
-	 * There are two premises:
-	 * a) deliver good data to the caller
-	 * b) correct the bad sectors on disk
-	 *
-	 * Since we're only doing repair for one sector, we only need to get
-	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for btrfs_repair_io_failure to do the rest for us.
-	 */
-	failrec->this_mirror = next_mirror(failrec, failrec->this_mirror);
-	if (failrec->this_mirror == failrec->failed_mirror) {
-		btrfs_debug(fs_info,
-			"failed to repair num_copies %d this_mirror %d failed_mirror %d",
-			failrec->num_copies, failrec->this_mirror, failrec->failed_mirror);
-		free_io_failure(inode, failrec);
-		return -EIO;
-	}
-
-	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->inode,
-				     failed_bbio->end_io,
-				     failed_bbio->private);
-	repair_bbio = btrfs_bio(repair_bio);
-	repair_bbio->file_offset = start;
-	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-
-	if (failed_bbio->csum) {
-		const u32 csum_size = fs_info->csum_size;
-
-		repair_bbio->csum = repair_bbio->csum_inline;
-		memcpy(repair_bbio->csum,
-		       failed_bbio->csum + csum_size * icsum, csum_size);
-	}
-
-	bio_add_page(repair_bio, page, failrec->len, pgoff);
-
-	btrfs_debug(fs_info,
-		    "repair read error: submitting new read to mirror %d",
-		    failrec->this_mirror);
-
-	/*
-	 * At this point we have a bio, so any errors from bio submission will
-	 * be handled by the endio on the repair_bio, so we can't return an
-	 * error here.
-	 */
-	if (submit_buffered)
-		btrfs_submit_data_read_bio(inode, repair_bio,
-					   failrec->this_mirror, 0);
-	else
-		btrfs_submit_dio_repair_bio(inode, repair_bio, failrec->this_mirror);
-
-	return BLK_STS_OK;
-}
-
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index a2c82448b2e076..1b311cd6978321 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -60,11 +60,9 @@ enum {
 #define BITMAP_LAST_BYTE_MASK(nbits) \
 	(BYTE_MASK >> (-(nbits) & (BITS_PER_BYTE - 1)))
 
-struct btrfs_bio;
 struct btrfs_root;
 struct btrfs_inode;
 struct btrfs_fs_info;
-struct io_failure_record;
 struct extent_io_tree;
 struct btrfs_tree_parent_check;
 
@@ -279,35 +277,6 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 
-/*
- * When IO fails, either with EIO or csum verification fails, we
- * try other mirrors that might have a good copy of the data.  This
- * io_failure_record is used to record state as we go through all the
- * mirrors.  If another mirror has good data, the sector is set up to date
- * and things continue.  If a good mirror can't be found, the original
- * bio end_io callback is called to indicate things have failed.
- */
-struct io_failure_record {
-	/* Use rb_simple_node for search/insert */
-	struct {
-		struct rb_node rb_node;
-		u64 bytenr;
-	};
-	struct page *page;
-	u64 len;
-	u64 logical;
-	int this_mirror;
-	int failed_mirror;
-	int num_copies;
-};
-
-int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_bbio,
-			    u32 bio_offset, struct page *page, unsigned int pgoff,
-			    bool submit_buffered);
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end);
-int btrfs_clean_io_failure(struct btrfs_inode *inode, u64 start,
-			   struct page *page, unsigned int pg_offset);
-
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
 			     struct page *locked_page, u64 *start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c63160cf135d98..190c5ba92f9320 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3249,8 +3249,6 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 					ordered_extent->disk_num_bytes);
 	}
 
-	btrfs_free_io_failure_record(inode, start, end);
-
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -5395,8 +5393,6 @@ void btrfs_evict_inode(struct inode *inode)
 	if (is_bad_inode(inode))
 		goto no_delete;
 
-	btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1);
-
 	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
 		goto no_delete;
 
@@ -7839,16 +7835,6 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	bio_endio(&dip->bio);
 }
 
-void btrfs_submit_dio_repair_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
-{
-	struct btrfs_dio_private *dip = btrfs_bio(bio)->private;
-
-	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
-
-	refcount_inc(&dip->refs);
-	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
-}
-
 blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
 					      struct bio *bio,
 					      u64 dio_file_offset)
@@ -8714,7 +8700,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->last_log_commit = 0;
 
 	spin_lock_init(&ei->lock);
-	spin_lock_init(&ei->io_failure_lock);
 	ei->outstanding_extents = 0;
 	if (sb->s_magic != BTRFS_TEST_MAGIC)
 		btrfs_init_metadata_block_rsv(fs_info, &ei->block_rsv,
@@ -8734,7 +8719,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->io_tree.inode = ei;
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT);
-	ei->io_failure_tree = RB_ROOT;
 	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (14 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 15/34] btrfs: remove the io_failure_record infrastructure Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 11:09   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 17/34] btrfs: remove the is_metadata flag " Christoph Hellwig
                   ` (18 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Rename iter to saved_iter and move it next to the repair internals
and nothing outside of bio.c should be touching it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.c | 12 ++++++------
 fs/btrfs/bio.h |  7 +++++--
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index e32a2dac97c541..f1566919fdb969 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -109,7 +109,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
 	if (repair_bbio->bio.bi_status ||
 	    !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) {
 		bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ);
-		repair_bbio->bio.bi_iter = repair_bbio->iter;
+		repair_bbio->bio.bi_iter = repair_bbio->saved_iter;
 
 		mirror = next_repair_mirror(fbio, mirror);
 		if (mirror == fbio->bbio->mirror_num) {
@@ -126,7 +126,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
 		mirror = prev_repair_mirror(fbio, mirror);
 		btrfs_repair_io_failure(fs_info, btrfs_ino(inode),
 				  repair_bbio->file_offset, fs_info->sectorsize,
-				  repair_bbio->iter.bi_sector <<
+				  repair_bbio->saved_iter.bi_sector <<
 					SECTOR_SHIFT,
 				  bv->bv_page, bv->bv_offset, mirror);
 	} while (mirror != fbio->bbio->mirror_num);
@@ -152,7 +152,7 @@ static struct btrfs_failed_bio *repair_one_sector(struct btrfs_bio *failed_bbio,
 	struct btrfs_inode *inode = failed_bbio->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	const u32 sectorsize = fs_info->sectorsize;
-	const u64 logical = failed_bbio->iter.bi_sector << SECTOR_SHIFT;
+	const u64 logical = failed_bbio->saved_iter.bi_sector << SECTOR_SHIFT;
 	struct btrfs_bio *repair_bbio;
 	struct bio *repair_bio;
 	int num_copies;
@@ -179,7 +179,7 @@ static struct btrfs_failed_bio *repair_one_sector(struct btrfs_bio *failed_bbio,
 
 	repair_bio = bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_NOFS,
 				      &btrfs_repair_bioset);
-	repair_bio->bi_iter.bi_sector = failed_bbio->iter.bi_sector;
+	repair_bio->bi_iter.bi_sector = failed_bbio->saved_iter.bi_sector;
 	bio_add_page(repair_bio, bv->bv_page, bv->bv_len, bv->bv_offset);
 
 	repair_bbio = btrfs_bio(repair_bio);
@@ -198,7 +198,7 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio,
 	struct btrfs_inode *inode = bbio->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	unsigned int sectorsize = fs_info->sectorsize;
-	struct bvec_iter *iter = &bbio->iter;
+	struct bvec_iter *iter = &bbio->saved_iter;
 	blk_status_t status = bbio->bio.bi_status;
 	struct btrfs_failed_bio *fbio = NULL;
 	u32 offset = 0;
@@ -435,7 +435,7 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	 * data reads.
 	 */
 	if (bio_op(bio) == REQ_OP_READ && !(bio->bi_opf & REQ_META)) {
-		bbio->iter = bio->bi_iter;
+		bbio->saved_iter = bio->bi_iter;
 		ret = btrfs_lookup_bio_sums(bbio);
 		if (ret)
 			goto fail;
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 61a791cf5360f3..c232148348dfe3 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -39,17 +39,20 @@ struct btrfs_bio {
 	 * it's a metadata bio.
 	 */
 	unsigned int is_metadata:1;
-	struct bvec_iter iter;
 
 	/* Inode and offset into it that this I/O operates on. */
 	struct btrfs_inode *inode;
 	u64 file_offset;
 
 	union {
-		/* For data checksum verification. */
+		/*
+		 * Data checksumming and original I/O information for internal
+		 * use in the btrfs_submit_bio machinery.
+		 */
 		struct {
 			u8 *csum;
 			u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+			struct bvec_iter saved_iter;
 		};
 
 		/* For metadata parentness verification. */
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/34] btrfs: remove the is_metadata flag in struct btrfs_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (15 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 11:13   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 18/34] btrfs: remove the submit_bio_start helpers Christoph Hellwig
                   ` (17 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

This flag is unused now, so remove it.  Re-expand the mirror_num field
to 8 bits, and move it to the I/O completion internal section of the
structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/bio.h     | 11 +----------
 fs/btrfs/disk-io.c |  1 -
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index c232148348dfe3..a96bcb3f36f684 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -30,16 +30,6 @@ typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
  * passed to btrfs_submit_bio for mapping to the physical devices.
  */
 struct btrfs_bio {
-	unsigned int mirror_num:7;
-
-	/*
-	 * Extra indicator for metadata bios.
-	 * For some btrfs bios they use pages without a mapping, thus
-	 * we can not rely on page->mapping->host to determine if
-	 * it's a metadata bio.
-	 */
-	unsigned int is_metadata:1;
-
 	/* Inode and offset into it that this I/O operates on. */
 	struct btrfs_inode *inode;
 	u64 file_offset;
@@ -64,6 +54,7 @@ struct btrfs_bio {
 	void *private;
 
 	/* For internal use in read end I/O handling */
+	unsigned int mirror_num;
 	struct work_struct end_io_work;
 
 	/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a6f89ac1c0865c..5a8ca6a518cf62 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -846,7 +846,6 @@ void btrfs_submit_metadata_bio(struct btrfs_inode *inode, struct bio *bio, int m
 	blk_status_t ret;
 
 	bio->bi_opf |= REQ_META;
-	bbio->is_metadata = 1;
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 		btrfs_submit_bio(fs_info, bio, mirror_num);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 18/34] btrfs: remove the submit_bio_start helpers
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (16 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 17/34] btrfs: remove the is_metadata flag " Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 11:49   ` Johannes Thumshirn
  2023-01-21  6:50 ` [PATCH 19/34] btrfs: simplify the btrfs_csum_one_bio calling convention Christoph Hellwig
                   ` (16 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Just call btrfs_csum_one_bio and btree_csum_one_bio directly.

Note that btree_csum_one_bio has to be moved up in the file a bit
to avoid a forward declaration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/btrfs_inode.h |  4 ----
 fs/btrfs/disk-io.c     | 54 ++++++++++++++++++------------------------
 fs/btrfs/disk-io.h     |  1 -
 fs/btrfs/inode.c       | 20 ----------------
 4 files changed, 23 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 78c7979b8dcac7..ba5f023aaf5573 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -408,10 +408,6 @@ static inline void btrfs_inode_split_flags(u64 inode_item_flags,
 void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
 void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type);
-blk_status_t btrfs_submit_bio_start(struct btrfs_inode *inode, struct bio *bio);
-blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
-					      struct bio *bio,
-					      u64 dio_file_offset);
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5a8ca6a518cf62..d89ba4d0f898e5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,6 +52,7 @@
 #include "relocation.h"
 #include "scrub.h"
 #include "super.h"
+#include "file-item.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -455,6 +456,24 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return csum_one_extent_buffer(eb);
 }
 
+static blk_status_t btree_csum_one_bio(struct bio *bio)
+{
+	struct bio_vec *bvec;
+	struct btrfs_root *root;
+	int ret = 0;
+	struct bvec_iter_all iter_all;
+
+	ASSERT(!bio_flagged(bio, BIO_CLONED));
+	bio_for_each_segment_all(bvec, bio, iter_all) {
+		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
+		ret = csum_dirty_buffer(root->fs_info, bvec);
+		if (ret)
+			break;
+	}
+
+	return errno_to_blk_status(ret);
+}
+
 static int check_tree_block_fsid(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
@@ -708,14 +727,14 @@ static void run_one_async_start(struct btrfs_work *work)
 	async = container_of(work, struct  async_submit_bio, work);
 	switch (async->submit_cmd) {
 	case WQ_SUBMIT_METADATA:
-		ret = btree_submit_bio_start(async->bio);
+		ret = btree_csum_one_bio(async->bio);
 		break;
 	case WQ_SUBMIT_DATA:
-		ret = btrfs_submit_bio_start(async->inode, async->bio);
+		ret = btrfs_csum_one_bio(async->inode, async->bio, (u64)-1, false);
 		break;
 	case WQ_SUBMIT_DATA_DIO:
-		ret = btrfs_submit_bio_start_direct_io(async->inode,
-				async->bio, async->dio_file_offset);
+		ret = btrfs_csum_one_bio(async->inode, async->bio,
+					 async->dio_file_offset, false);
 		break;
 	default:
 		/* Can't happen so return something that would prevent the IO. */
@@ -800,33 +819,6 @@ bool btrfs_wq_submit_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_
 	return true;
 }
 
-static blk_status_t btree_csum_one_bio(struct bio *bio)
-{
-	struct bio_vec *bvec;
-	struct btrfs_root *root;
-	int ret = 0;
-	struct bvec_iter_all iter_all;
-
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-		ret = csum_dirty_buffer(root->fs_info, bvec);
-		if (ret)
-			break;
-	}
-
-	return errno_to_blk_status(ret);
-}
-
-blk_status_t btree_submit_bio_start(struct bio *bio)
-{
-	/*
-	 * when we're called for a write, we're already in the async
-	 * submission context.  Just jump into btrfs_submit_bio.
-	 */
-	return btree_csum_one_bio(bio);
-}
-
 static bool should_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index f2f295eb6103da..5898beb648484a 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -122,7 +122,6 @@ enum btrfs_wq_submit_cmd {
 
 bool btrfs_wq_submit_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num,
 			 u64 dio_file_offset, enum btrfs_wq_submit_cmd cmd);
-blk_status_t btree_submit_bio_start(struct bio *bio);
 int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 190c5ba92f9320..014cd6a4544132 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2532,19 +2532,6 @@ void btrfs_clear_delalloc_extent(struct btrfs_inode *inode,
 	}
 }
 
-/*
- * in order to insert checksums into the metadata in large chunks,
- * we wait until bio submission time.   All the pages in the bio are
- * checksummed and sums are attached onto the ordered extent record.
- *
- * At IO completion time the cums attached on the ordered extent record
- * are inserted into the btree
- */
-blk_status_t btrfs_submit_bio_start(struct btrfs_inode *inode, struct bio *bio)
-{
-	return btrfs_csum_one_bio(inode, bio, (u64)-1, false);
-}
-
 /*
  * Split an extent_map at [start, start + len]
  *
@@ -7835,13 +7822,6 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 	bio_endio(&dip->bio);
 }
 
-blk_status_t btrfs_submit_bio_start_direct_io(struct btrfs_inode *inode,
-					      struct bio *bio,
-					      u64 dio_file_offset)
-{
-	return btrfs_csum_one_bio(inode, bio, dio_file_offset, false);
-}
-
 static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
 {
 	struct btrfs_dio_private *dip = bbio->private;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 19/34] btrfs: simplify the btrfs_csum_one_bio calling convention
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (17 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 18/34] btrfs: remove the submit_bio_start helpers Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 20/34] btrfs: handle checksum generation in the storage layer Christoph Hellwig
                   ` (15 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

To prepare for further bio submission changes btrfs_csum_one_bio
should be able to take all it's arguments from the btrfs_bio structure.
It can always use the bbio->inode already, and once the compression code
is updated to set ->file_offset that one can be used unconditionally
as well instead of looking at the page mapping now that btrfs doesn't
allow ordered extents to span discontiguous data ranges.

The only slightly tricky bit is the one_ordered flag set by the
compressed writes.  Replace that one with the driver private bio
flag, which gets cleared before the bio is handed off to the block layer
so that we don't get in the way of driver use.

Note: this leaves an argument an a flag to btrfs_wq_submit_bio unused.
But that whole mechanism will be removed in its current form in the
next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         |  3 +++
 fs/btrfs/bio.h         |  3 +++
 fs/btrfs/compression.c |  6 ++++--
 fs/btrfs/disk-io.c     |  5 +----
 fs/btrfs/file-item.c   | 22 ++++++----------------
 fs/btrfs/file-item.h   |  6 ++++--
 fs/btrfs/inode.c       |  4 ++--
 7 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index f1566919fdb969..aae56e226fcdd9 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -441,6 +441,9 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 			goto fail;
 	}
 
+	/* Do not leak our private flag into the block layer */
+	bio->bi_opf &= ~REQ_BTRFS_ONE_ORDERED;
+
 	if (!bioc) {
 		/* Single mirror read/write fast path */
 		bbio->mirror_num = mirror_num;
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index a96bcb3f36f684..334dcc3d5feb95 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -86,6 +86,9 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 	bbio->end_io(bbio);
 }
 
+/* bio only refers to one ordered extent */
+#define REQ_BTRFS_ONE_ORDERED	REQ_DRV
+
 void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		      int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index f84ccb185c2a6f..002af327705605 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -357,7 +357,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	blk_status_t ret = BLK_STS_OK;
 	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
-	const enum req_op bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE;
+	const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED |
+		(use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE);
 
 	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
 	       IS_ALIGNED(len, fs_info->sectorsize));
@@ -395,6 +396,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 				ret = errno_to_blk_status(PTR_ERR(bio));
 				break;
 			}
+			btrfs_bio(bio)->file_offset = start;
 			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
 		}
@@ -436,7 +438,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 		if (submit) {
 			if (!skip_sum) {
-				ret = btrfs_csum_one_bio(inode, bio, start, true);
+				ret = btrfs_csum_one_bio(btrfs_bio(bio));
 				if (ret) {
 					btrfs_bio_end_io(btrfs_bio(bio), ret);
 					break;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d89ba4d0f898e5..27d1df2b80dd38 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -730,11 +730,8 @@ static void run_one_async_start(struct btrfs_work *work)
 		ret = btree_csum_one_bio(async->bio);
 		break;
 	case WQ_SUBMIT_DATA:
-		ret = btrfs_csum_one_bio(async->inode, async->bio, (u64)-1, false);
-		break;
 	case WQ_SUBMIT_DATA_DIO:
-		ret = btrfs_csum_one_bio(async->inode, async->bio,
-					 async->dio_file_offset, false);
+		ret = btrfs_csum_one_bio(btrfs_bio(async->bio));
 		break;
 	default:
 		/* Can't happen so return something that would prevent the IO. */
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index c5324fe8f4be7a..a097b03c0bdcdf 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -771,24 +771,17 @@ int btrfs_lookup_csums_bitmap(struct btrfs_root *root, u64 start, u64 end,
 }
 
 /*
- * Calculate checksums of the data contained inside a bio.
- *
- * @inode:	 Owner of the data inside the bio
- * @bio:	 Contains the data to be checksummed
- * @offset:      If (u64)-1, @bio may contain discontiguous bio vecs, so the
- *               file offsets are determined from the page offsets in the bio.
- *               Otherwise, this is the starting file offset of the bio vecs in
- *               @bio, which must be contiguous.
- * @one_ordered: If true, @bio only refers to one ordered extent.
+ * Calculate checksums of the data contained inside a bio
  */
-blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
-				u64 offset, bool one_ordered)
+blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio)
 {
+	struct btrfs_inode *inode = bbio->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
+	struct bio *bio = &bbio->bio;
+	u64 offset = bbio->file_offset;
 	struct btrfs_ordered_sum *sums;
 	struct btrfs_ordered_extent *ordered = NULL;
-	const bool use_page_offsets = (offset == (u64)-1);
 	char *data;
 	struct bvec_iter iter;
 	struct bio_vec bvec;
@@ -816,9 +809,6 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 	shash->tfm = fs_info->csum_shash;
 
 	bio_for_each_segment(bvec, bio, iter) {
-		if (use_page_offsets)
-			offset = page_offset(bvec.bv_page) + bvec.bv_offset;
-
 		if (!ordered) {
 			ordered = btrfs_lookup_ordered_extent(inode, offset);
 			/*
@@ -840,7 +830,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
 						 - 1);
 
 		for (i = 0; i < blockcount; i++) {
-			if (!one_ordered &&
+			if (!(bio->bi_opf & REQ_BTRFS_ONE_ORDERED) &&
 			    !in_range(offset, ordered->file_offset,
 				      ordered->num_bytes)) {
 				unsigned long bytes_left;
diff --git a/fs/btrfs/file-item.h b/fs/btrfs/file-item.h
index a2f9747adf3ac0..cd7f2ae515c0ca 100644
--- a/fs/btrfs/file-item.h
+++ b/fs/btrfs/file-item.h
@@ -49,8 +49,10 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans,
 int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root,
 			   struct btrfs_ordered_sum *sums);
-blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio,
-				u64 offset, bool one_ordered);
+blk_status_t btrfs_csum_one_bio(struct btrfs_bio *bbio);
+int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
+			     struct list_head *list, int search_commit,
+			     bool nowait);
 int btrfs_lookup_csums_list(struct btrfs_root *root, u64 start, u64 end,
 			    struct list_head *list, int search_commit,
 			    bool nowait);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 014cd6a4544132..2a25ea88a9e8bd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2736,7 +2736,7 @@ void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int
 		    btrfs_wq_submit_bio(inode, bio, mirror_num, 0, WQ_SUBMIT_DATA))
 			return;
 
-		ret = btrfs_csum_one_bio(inode, bio, (u64)-1, false);
+		ret = btrfs_csum_one_bio(btrfs_bio(bio));
 		if (ret) {
 			btrfs_bio_end_io(btrfs_bio(bio), ret);
 			return;
@@ -7863,7 +7863,7 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
 		 * If we aren't doing async submit, calculate the csum of the
 		 * bio now.
 		 */
-		ret = btrfs_csum_one_bio(inode, bio, file_offset, false);
+		ret = btrfs_csum_one_bio(btrfs_bio(bio));
 		if (ret) {
 			btrfs_bio_end_io(btrfs_bio(bio), ret);
 			return;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 20/34] btrfs: handle checksum generation in the storage layer
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (18 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 19/34] btrfs: simplify the btrfs_csum_one_bio calling convention Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 21/34] btrfs: handle recording of zoned writes " Christoph Hellwig
                   ` (14 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Instead of letting the callers of btrfs_submit_bio deal with checksumming
the (meta)data in the bio and making decisions on when to offload the
checksumming to the bio, leave that to btrfs_submit_bio.  Do do so the
existing btrfs_submit_bio function is split into an upper and a lower
half, so that the lower half can be offloaded to a workqueue.

Note that this changes the behavior for direct writes to raid56 volumes so
that async checksum offloading is not skipped when more I/O is expected.
This runs counter to the argument explaining why it was done, although I
can't measure any affects of the change.  Commits later in this series
will make sure the entire direct writes is offloaded to the workqueue
at once and thus make sure it is sent to the raid56 code from a single
thread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         | 208 +++++++++++++++++++++++++++++++++++------
 fs/btrfs/compression.c |   9 --
 fs/btrfs/disk-io.c     | 155 +-----------------------------
 fs/btrfs/disk-io.h     |   9 +-
 fs/btrfs/inode.c       |  67 +------------
 5 files changed, 186 insertions(+), 262 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index aae56e226fcdd9..0dae2e4666c83a 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -404,7 +404,169 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr)
 	btrfs_submit_dev_bio(bioc->stripes[dev_nr].dev, bio);
 }
 
-void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num)
+static void __btrfs_submit_bio(struct bio *bio, struct btrfs_io_context *bioc,
+			       struct btrfs_io_stripe *smap, int mirror_num)
+{
+	/* Do not leak our private flag into the block layer */
+	bio->bi_opf &= ~REQ_BTRFS_ONE_ORDERED;
+
+	if (!bioc) {
+		/* Single mirror read/write fast path */
+		btrfs_bio(bio)->mirror_num = mirror_num;
+		bio->bi_iter.bi_sector = smap->physical >> SECTOR_SHIFT;
+		bio->bi_private = smap->dev;
+		bio->bi_end_io = btrfs_simple_end_io;
+		btrfs_submit_dev_bio(smap->dev, bio);
+	} else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		/* Parity RAID write or read recovery */
+		bio->bi_private = bioc;
+		bio->bi_end_io = btrfs_raid56_end_io;
+		if (bio_op(bio) == REQ_OP_READ)
+			raid56_parity_recover(bio, bioc, mirror_num);
+		else
+			raid56_parity_write(bio, bioc);
+	} else {
+		/* Write to multiple mirrors */
+		int total_devs = bioc->num_stripes;
+		int dev_nr;
+
+		bioc->orig_bio = bio;
+		for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
+			btrfs_submit_mirrored_bio(bioc, dev_nr);
+	}
+}
+
+static blk_status_t btrfs_bio_csum(struct btrfs_bio *bbio)
+{
+	if (bbio->bio.bi_opf & REQ_META)
+		return btree_csum_one_bio(&bbio->bio);
+	return btrfs_csum_one_bio(bbio);
+}
+
+/*
+ * async submit bios are used to offload expensive checksumming
+ * onto the worker threads.
+ */
+struct async_submit_bio {
+	struct btrfs_bio *bbio;
+	struct btrfs_io_context *bioc;
+	struct btrfs_io_stripe smap;
+	int mirror_num;
+	struct btrfs_work work;
+};
+
+/*
+ * In order to insert checksums into the metadata in large chunks,
+ * we wait until bio submission time.   All the pages in the bio are
+ * checksummed and sums are attached onto the ordered extent record.
+ *
+ * At IO completion time the cums attached on the ordered extent record
+ * are inserted into the btree
+ */
+static void run_one_async_start(struct btrfs_work *work)
+{
+	struct async_submit_bio *async =
+		container_of(work, struct async_submit_bio, work);
+	blk_status_t ret;
+
+	ret = btrfs_bio_csum(async->bbio);
+	if (ret)
+		async->bbio->bio.bi_status = ret;
+}
+
+/*
+ * In order to insert checksums into the metadata in large chunks, we wait
+ * until bio submission time.   All the pages in the bio are checksummed and
+ * sums are attached onto the ordered extent record.
+ *
+ * At IO completion time the csums attached on the ordered extent record are
+ * inserted into the tree.
+ */
+static void run_one_async_done(struct btrfs_work *work)
+{
+	struct async_submit_bio *async =
+		container_of(work, struct async_submit_bio, work);
+	struct bio *bio = &async->bbio->bio;
+
+	/* If an error occurred we just want to clean up the bio and move on */
+	if (bio->bi_status) {
+		btrfs_bio_end_io(async->bbio, bio->bi_status);
+		return;
+	}
+
+	/*
+	 * All of the bios that pass through here are from async helpers.
+	 * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's context.
+	 * This changes nothing when cgroups aren't in use.
+	 */
+	bio->bi_opf |= REQ_CGROUP_PUNT;
+	__btrfs_submit_bio(bio, async->bioc, &async->smap, async->mirror_num);
+}
+
+static void run_one_async_free(struct btrfs_work *work)
+{
+	kfree(container_of(work, struct async_submit_bio, work));
+}
+
+static bool should_async_write(struct btrfs_bio *bbio)
+{
+	/*
+	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
+	 * then try to defer the submission to a workqueue to parallelize the
+	 * checksum calculation.
+	 */
+	if (atomic_read(&bbio->inode->sync_writers))
+		return false;
+
+	/*
+	 * Submit metadata writes synchronously if the checksum implementation
+	 * is fast, or we are on a zoned device that wants I/O to be submitted
+	 * in order.
+	 */
+	if (bbio->bio.bi_opf & REQ_META) {
+		struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+
+		if (btrfs_is_zoned(fs_info))
+			return false;
+		if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Submit bio to an async queue.
+ *
+ * Returns true if the work has been succesfuly submitted, else false.
+ */
+static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
+				struct btrfs_io_context *bioc,
+				struct btrfs_io_stripe *smap, int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct async_submit_bio *async;
+
+	async = kmalloc(sizeof(*async), GFP_NOFS);
+	if (!async)
+		return false;
+
+	async->bbio = bbio;
+	async->bioc = bioc;
+	async->smap = *smap;
+	async->mirror_num = mirror_num;
+
+	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
+			run_one_async_free);
+	if (op_is_sync(bbio->bio.bi_opf))
+		btrfs_queue_work(fs_info->hipri_workers, &async->work);
+	else
+		btrfs_queue_work(fs_info->workers, &async->work);
+	return true;
+}
+
+void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+		      int mirror_num)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	u64 logical = bio->bi_iter.bi_sector << 9;
@@ -441,33 +603,25 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 			goto fail;
 	}
 
-	/* Do not leak our private flag into the block layer */
-	bio->bi_opf &= ~REQ_BTRFS_ONE_ORDERED;
-
-	if (!bioc) {
-		/* Single mirror read/write fast path */
-		bbio->mirror_num = mirror_num;
-		bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
-		bio->bi_private = smap.dev;
-		bio->bi_end_io = btrfs_simple_end_io;
-		btrfs_submit_dev_bio(smap.dev, bio);
-	} else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		/* Parity RAID write or read recovery */
-		bio->bi_private = bioc;
-		bio->bi_end_io = btrfs_raid56_end_io;
-		if (bio_op(bio) == REQ_OP_READ)
-			raid56_parity_recover(bio, bioc, mirror_num);
-		else
-			raid56_parity_write(bio, bioc);
-	} else {
-		/* Write to multiple mirrors */
-		int total_devs = bioc->num_stripes;
-		int dev_nr;
-
-		bioc->orig_bio = bio;
-		for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
-			btrfs_submit_mirrored_bio(bioc, dev_nr);
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		/*
+		 * Csum items for reloc roots have already been cloned at this
+		 * point, so they are handled as part of the no-checksum case.
+		 */
+		if (!(bbio->inode->flags & BTRFS_INODE_NODATASUM) &&
+		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
+		    !btrfs_is_data_reloc_root(bbio->inode->root)) {
+			if (should_async_write(bbio) &&
+			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
+				return;
+
+			ret = btrfs_bio_csum(bbio);
+			if (ret)
+				goto fail;
+		}
 	}
+
+	__btrfs_submit_bio(bio, bioc, &smap, mirror_num);
 	return;
 
 fail:
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 002af327705605..21c064c8cf8ada 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -355,7 +355,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	u64 cur_disk_bytenr = disk_start;
 	u64 next_stripe_start;
 	blk_status_t ret = BLK_STS_OK;
-	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
 	const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED |
 		(use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE);
@@ -437,14 +436,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			submit = true;
 
 		if (submit) {
-			if (!skip_sum) {
-				ret = btrfs_csum_one_bio(btrfs_bio(bio));
-				if (ret) {
-					btrfs_bio_end_io(btrfs_bio(bio), ret);
-					break;
-				}
-			}
-
 			ASSERT(bio->bi_iter.bi_size);
 			btrfs_submit_bio(fs_info, bio, 0);
 			bio = NULL;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 27d1df2b80dd38..9ad6e85fcee6e9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,7 +52,6 @@
 #include "relocation.h"
 #include "scrub.h"
 #include "super.h"
-#include "file-item.h"
 
 #define BTRFS_SUPER_FLAG_SUPP	(BTRFS_HEADER_FLAG_WRITTEN |\
 				 BTRFS_HEADER_FLAG_RELOC |\
@@ -79,23 +78,6 @@ static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
 		crypto_free_shash(fs_info->csum_shash);
 }
 
-/*
- * async submit bios are used to offload expensive checksumming
- * onto the worker threads.  They checksum file and metadata bios
- * just before they are sent down the IO stack.
- */
-struct async_submit_bio {
-	struct btrfs_inode *inode;
-	struct bio *bio;
-	enum btrfs_wq_submit_cmd submit_cmd;
-	int mirror_num;
-
-	/* Optional parameter for used by direct io */
-	u64 dio_file_offset;
-	struct btrfs_work work;
-	blk_status_t status;
-};
-
 /*
  * Compute the csum of a btree block and store the result to provided buffer.
  */
@@ -456,7 +438,7 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return csum_one_extent_buffer(eb);
 }
 
-static blk_status_t btree_csum_one_bio(struct bio *bio)
+blk_status_t btree_csum_one_bio(struct bio *bio)
 {
 	struct bio_vec *bvec;
 	struct btrfs_root *root;
@@ -719,143 +701,10 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 	return ret;
 }
 
-static void run_one_async_start(struct btrfs_work *work)
-{
-	struct async_submit_bio *async;
-	blk_status_t ret;
-
-	async = container_of(work, struct  async_submit_bio, work);
-	switch (async->submit_cmd) {
-	case WQ_SUBMIT_METADATA:
-		ret = btree_csum_one_bio(async->bio);
-		break;
-	case WQ_SUBMIT_DATA:
-	case WQ_SUBMIT_DATA_DIO:
-		ret = btrfs_csum_one_bio(btrfs_bio(async->bio));
-		break;
-	default:
-		/* Can't happen so return something that would prevent the IO. */
-		ret = BLK_STS_IOERR;
-		ASSERT(0);
-	}
-	if (ret)
-		async->status = ret;
-}
-
-/*
- * In order to insert checksums into the metadata in large chunks, we wait
- * until bio submission time.   All the pages in the bio are checksummed and
- * sums are attached onto the ordered extent record.
- *
- * At IO completion time the csums attached on the ordered extent record are
- * inserted into the tree.
- */
-static void run_one_async_done(struct btrfs_work *work)
-{
-	struct async_submit_bio *async =
-		container_of(work, struct  async_submit_bio, work);
-	struct btrfs_inode *inode = async->inode;
-	struct btrfs_bio *bbio = btrfs_bio(async->bio);
-
-	/* If an error occurred we just want to clean up the bio and move on */
-	if (async->status) {
-		btrfs_bio_end_io(bbio, async->status);
-		return;
-	}
-
-	/*
-	 * All of the bios that pass through here are from async helpers.
-	 * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's context.
-	 * This changes nothing when cgroups aren't in use.
-	 */
-	async->bio->bi_opf |= REQ_CGROUP_PUNT;
-	btrfs_submit_bio(inode->root->fs_info, async->bio, async->mirror_num);
-}
-
-static void run_one_async_free(struct btrfs_work *work)
-{
-	struct async_submit_bio *async;
-
-	async = container_of(work, struct  async_submit_bio, work);
-	kfree(async);
-}
-
-/*
- * Submit bio to an async queue.
- *
- * Retrun:
- * - true if the work has been succesfuly submitted
- * - false in case of error
- */
-bool btrfs_wq_submit_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num,
-			 u64 dio_file_offset, enum btrfs_wq_submit_cmd cmd)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct async_submit_bio *async;
-
-	async = kmalloc(sizeof(*async), GFP_NOFS);
-	if (!async)
-		return false;
-
-	async->inode = inode;
-	async->bio = bio;
-	async->mirror_num = mirror_num;
-	async->submit_cmd = cmd;
-
-	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
-			run_one_async_free);
-
-	async->dio_file_offset = dio_file_offset;
-
-	async->status = 0;
-
-	if (op_is_sync(bio->bi_opf))
-		btrfs_queue_work(fs_info->hipri_workers, &async->work);
-	else
-		btrfs_queue_work(fs_info->workers, &async->work);
-	return true;
-}
-
-static bool should_async_write(struct btrfs_fs_info *fs_info,
-			     struct btrfs_inode *bi)
-{
-	if (btrfs_is_zoned(fs_info))
-		return false;
-	if (atomic_read(&bi->sync_writers))
-		return false;
-	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-		return false;
-	return true;
-}
-
 void btrfs_submit_metadata_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
-	blk_status_t ret;
-
 	bio->bi_opf |= REQ_META;
-
-	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		btrfs_submit_bio(fs_info, bio, mirror_num);
-		return;
-	}
-
-	/*
-	 * Kthread helpers are used to submit writes so that checksumming can
-	 * happen in parallel across all CPUs.
-	 */
-	if (should_async_write(fs_info, inode) &&
-	    btrfs_wq_submit_bio(inode, bio, mirror_num, 0, WQ_SUBMIT_METADATA))
-		return;
-
-	ret = btree_csum_one_bio(bio);
-	if (ret) {
-		btrfs_bio_end_io(bbio, ret);
-		return;
-	}
-
-	btrfs_submit_bio(fs_info, bio, mirror_num);
+	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
 }
 
 #ifdef CONFIG_MIGRATION
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 5898beb648484a..ac55f8ec3a31a7 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -114,14 +114,7 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 int btrfs_read_extent_buffer(struct extent_buffer *buf,
 			     struct btrfs_tree_parent_check *check);
 
-enum btrfs_wq_submit_cmd {
-	WQ_SUBMIT_METADATA,
-	WQ_SUBMIT_DATA,
-	WQ_SUBMIT_DATA_DIO,
-};
-
-bool btrfs_wq_submit_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num,
-			 u64 dio_file_offset, enum btrfs_wq_submit_cmd cmd);
+blk_status_t btree_csum_one_bio(struct bio *bio);
 int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2a25ea88a9e8bd..6acb2557a31fa6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2721,27 +2721,6 @@ void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int
 		}
 	}
 
-	/*
-	 * If we need to checksum, and the I/O is not issued by fsync and
-	 * friends, that is ->sync_writers != 0, defer the submission to a
-	 * workqueue to parallelize it.
-	 *
-	 * Csum items for reloc roots have already been cloned at this point,
-	 * so they are handled as part of the no-checksum case.
-	 */
-	if (!(inode->flags & BTRFS_INODE_NODATASUM) &&
-	    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
-	    !btrfs_is_data_reloc_root(inode->root)) {
-		if (!atomic_read(&inode->sync_writers) &&
-		    btrfs_wq_submit_bio(inode, bio, mirror_num, 0, WQ_SUBMIT_DATA))
-			return;
-
-		ret = btrfs_csum_one_bio(btrfs_bio(bio));
-		if (ret) {
-			btrfs_bio_end_io(btrfs_bio(bio), ret);
-			return;
-		}
-	}
 	btrfs_submit_bio(fs_info, bio, mirror_num);
 }
 
@@ -7843,36 +7822,6 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
 	btrfs_dio_private_put(dip);
 }
 
-static void btrfs_submit_dio_bio(struct bio *bio, struct btrfs_inode *inode,
-				 u64 file_offset, int async_submit)
-{
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	blk_status_t ret;
-
-	if (inode->flags & BTRFS_INODE_NODATASUM)
-		goto map;
-
-	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-		/* Check btrfs_submit_data_write_bio() for async submit rules */
-		if (async_submit && !atomic_read(&inode->sync_writers) &&
-		    btrfs_wq_submit_bio(inode, bio, 0, file_offset,
-					WQ_SUBMIT_DATA_DIO))
-			return;
-
-		/*
-		 * If we aren't doing async submit, calculate the csum of the
-		 * bio now.
-		 */
-		ret = btrfs_csum_one_bio(btrfs_bio(bio));
-		if (ret) {
-			btrfs_bio_end_io(btrfs_bio(bio), ret);
-			return;
-		}
-	}
-map:
-	btrfs_submit_bio(fs_info, bio, 0);
-}
-
 static void btrfs_submit_direct(const struct iomap_iter *iter,
 		struct bio *dio_bio, loff_t file_offset)
 {
@@ -7880,11 +7829,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		container_of(dio_bio, struct btrfs_dio_private, bio);
 	struct inode *inode = iter->inode;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
-			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct bio *bio;
 	u64 start_sector;
-	int async_submit = 0;
 	u64 submit_len;
 	u64 clone_offset = 0;
 	u64 clone_len;
@@ -7951,19 +7897,10 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		 * We transfer the initial reference to the last bio, so we
 		 * don't need to increment the reference count for the last one.
 		 */
-		if (submit_len > 0) {
+		if (submit_len > 0)
 			refcount_inc(&dip->refs);
-			/*
-			 * If we are submitting more than one bio, submit them
-			 * all asynchronously. The exception is RAID 5 or 6, as
-			 * asynchronous checksums make it difficult to collect
-			 * full stripe writes.
-			 */
-			if (!raid56)
-				async_submit = 1;
-		}
 
-		btrfs_submit_dio_bio(bio, BTRFS_I(inode), file_offset, async_submit);
+		btrfs_submit_bio(fs_info, bio, 0);
 
 		dio_data->submitted += clone_len;
 		clone_offset += clone_len;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 21/34] btrfs: handle recording of zoned writes in the storage layer
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (19 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 20/34] btrfs: handle checksum generation in the storage layer Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 22/34] btrfs: support cloned bios in btree_csum_one_bio Christoph Hellwig
                   ` (13 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Move the code that splits the ordered extents and records the physical
location for them to the storage layer so that the higher level consumers
don't have to care about physical block numbers at all.  This will also
allow to eventually remove accounting for the zone append write sizes in
the upper layer with a little bit more block layer work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         |  8 ++++++++
 fs/btrfs/btrfs_inode.h |  1 +
 fs/btrfs/compression.c |  1 -
 fs/btrfs/extent_io.c   |  6 ------
 fs/btrfs/inode.c       | 37 +++++++------------------------------
 fs/btrfs/zoned.c       | 13 +++++--------
 fs/btrfs/zoned.h       |  6 ++----
 7 files changed, 23 insertions(+), 49 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 0dae2e4666c83a..84673f552bb345 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -284,6 +284,8 @@ static void btrfs_simple_end_io(struct bio *bio)
 		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
 		queue_work(btrfs_end_io_wq(fs_info, bio), &bbio->end_io_work);
 	} else {
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
+			btrfs_record_physical_zoned(bbio);
 		bbio->end_io(bbio);
 	}
 }
@@ -604,6 +606,12 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	}
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+			ret = btrfs_extract_ordered_extent(btrfs_bio(bio));
+			if (ret)
+				goto fail;
+		}
+
 		/*
 		 * Csum items for reloc roots have already been cloned at this
 		 * point, so they are handled as part of the no-checksum case.
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index ba5f023aaf5573..b83b731c63e13e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -410,6 +410,7 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type);
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
+blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio);
 bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
 			u32 bio_offset, struct bio_vec *bv);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 21c064c8cf8ada..efa040eb6b19d1 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -273,7 +273,6 @@ static void end_compressed_bio_write(struct btrfs_bio *bbio)
 	if (refcount_dec_and_test(&cb->pending_ios)) {
 		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
 
-		btrfs_record_physical_zoned(cb->inode, cb->start, &bbio->bio);
 		queue_work(fs_info->compressed_write_workers, &cb->write_end_work);
 	}
 	bio_put(&bbio->bio);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 75b717a7b0f97d..81a56489db7323 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -586,7 +586,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 	u64 start;
 	u64 end;
 	struct bvec_iter_all iter_all;
-	bool first_bvec = true;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
@@ -608,11 +607,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 		start = page_offset(page) + bvec->bv_offset;
 		end = start + bvec->bv_len - 1;
 
-		if (first_bvec) {
-			btrfs_record_physical_zoned(inode, start, bio);
-			first_bvec = false;
-		}
-
 		end_extent_writepage(page, error, start, end);
 
 		btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6acb2557a31fa6..da71e2a6a2e68b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2647,19 +2647,19 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len,
 	return ret;
 }
 
-static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
-					   struct bio *bio, loff_t file_offset)
+blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 {
+	u64 start = (u64)bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 len = bbio->bio.bi_iter.bi_size;
+	struct btrfs_inode *inode = bbio->inode;
 	struct btrfs_ordered_extent *ordered;
-	u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 file_len;
-	u64 len = bio->bi_iter.bi_size;
 	u64 end = start + len;
 	u64 ordered_end;
 	u64 pre, post;
 	int ret = 0;
 
-	ordered = btrfs_lookup_ordered_extent(inode, file_offset);
+	ordered = btrfs_lookup_ordered_extent(inode, bbio->file_offset);
 	if (WARN_ON_ONCE(!ordered))
 		return BLK_STS_IOERR;
 
@@ -2699,7 +2699,7 @@ static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
 	ret = btrfs_split_ordered_extent(ordered, pre, post);
 	if (ret)
 		goto out;
-	ret = split_zoned_em(inode, file_offset, file_len, pre, post);
+	ret = split_zoned_em(inode, bbio->file_offset, file_len, pre, post);
 
 out:
 	btrfs_put_ordered_extent(ordered);
@@ -2709,19 +2709,7 @@ static blk_status_t extract_ordered_extent(struct btrfs_inode *inode,
 
 void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	blk_status_t ret;
-
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		ret = extract_ordered_extent(inode, bio,
-				page_offset(bio_first_bvec_all(bio)->bv_page));
-		if (ret) {
-			btrfs_bio_end_io(btrfs_bio(bio), ret);
-			return;
-		}
-	}
-
-	btrfs_submit_bio(fs_info, bio, mirror_num);
+	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
 }
 
 void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
@@ -7816,8 +7804,6 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
 		dip->bio.bi_status = err;
 	}
 
-	btrfs_record_physical_zoned(&dip->inode->vfs_inode, bbio->file_offset, bio);
-
 	bio_put(bio);
 	btrfs_dio_private_put(dip);
 }
@@ -7876,15 +7862,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 					      dip);
 		btrfs_bio(bio)->file_offset = file_offset;
 
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			status = extract_ordered_extent(BTRFS_I(inode), bio,
-							file_offset);
-			if (status) {
-				bio_put(bio);
-				goto out_err;
-			}
-		}
-
 		ASSERT(submit_len >= clone_len);
 		submit_len -= clone_len;
 
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d46701a77b1728..5bf67c3c9f846f 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -17,6 +17,7 @@
 #include "space-info.h"
 #include "fs.h"
 #include "accessors.h"
+#include "bio.h"
 
 /* Maximum number of zones to report per blkdev_report_zones() call */
 #define BTRFS_REPORT_NR_ZONES   4096
@@ -1660,21 +1661,17 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start)
 	return ret;
 }
 
-void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
-				 struct bio *bio)
+void btrfs_record_physical_zoned(struct btrfs_bio *bbio)
 {
+	const u64 physical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
 	struct btrfs_ordered_extent *ordered;
-	const u64 physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 
-	if (bio_op(bio) != REQ_OP_ZONE_APPEND)
-		return;
-
-	ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset);
+	ordered = btrfs_lookup_ordered_extent(bbio->inode, bbio->file_offset);
 	if (WARN_ON(!ordered))
 		return;
 
 	ordered->physical = physical;
-	ordered->bdev = bio->bi_bdev;
+	ordered->bdev = bbio->bio.bi_bdev;
 
 	btrfs_put_ordered_extent(ordered);
 }
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index f43990985d802c..bc93a740e7cf34 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -57,8 +57,7 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans,
 			    struct extent_buffer *eb);
 void btrfs_free_redirty_list(struct btrfs_transaction *trans);
 bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start);
-void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset,
-				 struct bio *bio);
+void btrfs_record_physical_zoned(struct btrfs_bio *bbio);
 void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered);
 bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info,
 				    struct extent_buffer *eb,
@@ -190,8 +189,7 @@ static inline bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start)
 	return false;
 }
 
-static inline void btrfs_record_physical_zoned(struct inode *inode,
-					       u64 file_offset, struct bio *bio)
+static inline void btrfs_record_physical_zoned(struct btrfs_bio *bbio)
 {
 }
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 22/34] btrfs: support cloned bios in btree_csum_one_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (20 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 21/34] btrfs: handle recording of zoned writes " Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios Christoph Hellwig
                   ` (12 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

To allow splitting bios in btrfs_submit_bio, btree_csum_one_bio needs to
be able to handle cloned bios.  As btree_csum_one_bio is always called
before handing the bio to the block layer that is trivially done by using
bio_for_each_segment instead of bio_for_each_segment_all.  Also switch
the function to take a btrfs_bio and use that to derive the fs_info.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c     |  2 +-
 fs/btrfs/disk-io.c | 14 ++++++--------
 fs/btrfs/disk-io.h |  2 +-
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 84673f552bb345..c7522ac7e0e71c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -441,7 +441,7 @@ static void __btrfs_submit_bio(struct bio *bio, struct btrfs_io_context *bioc,
 static blk_status_t btrfs_bio_csum(struct btrfs_bio *bbio)
 {
 	if (bbio->bio.bi_opf & REQ_META)
-		return btree_csum_one_bio(&bbio->bio);
+		return btree_csum_one_bio(bbio);
 	return btrfs_csum_one_bio(bbio);
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9ad6e85fcee6e9..2ae329b5ce9804 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -438,17 +438,15 @@ static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec
 	return csum_one_extent_buffer(eb);
 }
 
-blk_status_t btree_csum_one_bio(struct bio *bio)
+blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
 {
-	struct bio_vec *bvec;
-	struct btrfs_root *root;
+	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct bvec_iter iter;
+	struct bio_vec bv;
 	int ret = 0;
-	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		root = BTRFS_I(bvec->bv_page->mapping->host)->root;
-		ret = csum_dirty_buffer(root->fs_info, bvec);
+	bio_for_each_segment(bv, &bbio->bio, iter) {
+		ret = csum_dirty_buffer(fs_info, &bv);
 		if (ret)
 			break;
 	}
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index ac55f8ec3a31a7..f2dd4c6d9c258b 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -114,7 +114,7 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 int btrfs_read_extent_buffer(struct extent_buffer *buf,
 			     struct btrfs_tree_parent_check *check);
 
-blk_status_t btree_csum_one_bio(struct bio *bio);
+blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio);
 int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (21 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 22/34] btrfs: support cloned bios in btree_csum_one_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-25 21:51   ` Josef Bacik
  2023-01-21  6:50 ` [PATCH 24/34] btrfs: pass the iomap bio to btrfs_submit_bio Christoph Hellwig
                   ` (11 subsequent siblings)
  34 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Currently the I/O submitters have to split bios according to the
chunk stripe boundaries.  This leads to extra lookups in the extent
trees and a lot of boilerplate code.

To drop this requirement, split the bio when __btrfs_map_block
returns a mapping that is smaller than the requested size and
keep a count of pending bios in the original btrfs_bio so that
the upper level completion is only invoked when all clones have
completed.

Based on a patch from Qu Wenruo.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/bio.c | 108 ++++++++++++++++++++++++++++++++++++++++---------
 fs/btrfs/bio.h |   1 +
 2 files changed, 91 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index c7522ac7e0e71c..ff42b783902140 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -17,6 +17,7 @@
 #include "file-item.h"
 
 static struct bio_set btrfs_bioset;
+static struct bio_set btrfs_clone_bioset;
 static struct bio_set btrfs_repair_bioset;
 static mempool_t btrfs_failed_bio_pool;
 
@@ -38,6 +39,7 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio,
 	bbio->inode = inode;
 	bbio->end_io = end_io;
 	bbio->private = private;
+	atomic_set(&bbio->pending_ios, 1);
 }
 
 /*
@@ -75,6 +77,58 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
 	return bio;
 }
 
+static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length)
+{
+	struct btrfs_bio *orig_bbio = btrfs_bio(orig);
+	struct bio *bio;
+
+	bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
+			&btrfs_clone_bioset);
+	btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio);
+
+	btrfs_bio(bio)->file_offset = orig_bbio->file_offset;
+	if (!(orig->bi_opf & REQ_BTRFS_ONE_ORDERED))
+		orig_bbio->file_offset += map_length;
+
+	atomic_inc(&orig_bbio->pending_ios);
+	return bio;
+}
+
+static void btrfs_orig_write_end_io(struct bio *bio);
+static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio,
+				       struct btrfs_bio *orig_bbio)
+{
+	/*
+	 * For writes btrfs tolerates nr_mirrors - 1 write failures, so we
+	 * can't just blindly propagate a write failure here.
+	 * Instead increment the error count in the original I/O context so
+	 * that it is guaranteed to be larger than the error tolerance.
+	 */
+	if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) {
+		struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private;
+		struct btrfs_io_context *orig_bioc = orig_stripe->bioc;
+
+		atomic_add(orig_bioc->max_errors, &orig_bioc->error);
+	} else {
+		orig_bbio->bio.bi_status = bbio->bio.bi_status;
+	}
+}
+
+static void btrfs_orig_bbio_end_io(struct btrfs_bio *bbio)
+{
+	if (bbio->bio.bi_pool == &btrfs_clone_bioset) {
+		struct btrfs_bio *orig_bbio = bbio->private;
+
+		if (bbio->bio.bi_status)
+			btrfs_bbio_propagate_error(bbio, orig_bbio);
+		bio_put(&bbio->bio);
+		bbio = orig_bbio;
+	}
+
+	if (atomic_dec_and_test(&bbio->pending_ios))
+		bbio->end_io(bbio);
+}
+
 static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
 {
 	if (cur_mirror == fbio->num_copies)
@@ -92,7 +146,7 @@ static int prev_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
 static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
 {
 	if (atomic_dec_and_test(&fbio->repair_count)) {
-		fbio->bbio->end_io(fbio->bbio);
+		btrfs_orig_bbio_end_io(fbio->bbio);
 		mempool_free(fbio, &btrfs_failed_bio_pool);
 	}
 }
@@ -232,7 +286,7 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio,
 	if (unlikely(fbio))
 		btrfs_repair_done(fbio);
 	else
-		bbio->end_io(bbio);
+		btrfs_orig_bbio_end_io(bbio);
 }
 
 static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
@@ -286,7 +340,7 @@ static void btrfs_simple_end_io(struct bio *bio)
 	} else {
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
 			btrfs_record_physical_zoned(bbio);
-		bbio->end_io(bbio);
+		btrfs_orig_bbio_end_io(bbio);
 	}
 }
 
@@ -300,7 +354,7 @@ static void btrfs_raid56_end_io(struct bio *bio)
 	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
 		btrfs_check_read_bio(bbio, NULL);
 	else
-		bbio->end_io(bbio);
+		btrfs_orig_bbio_end_io(bbio);
 
 	btrfs_put_bioc(bioc);
 }
@@ -327,7 +381,7 @@ static void btrfs_orig_write_end_io(struct bio *bio)
 	else
 		bio->bi_status = BLK_STS_OK;
 
-	bbio->end_io(bbio);
+	btrfs_orig_bbio_end_io(bbio);
 	btrfs_put_bioc(bioc);
 }
 
@@ -492,7 +546,7 @@ static void run_one_async_done(struct btrfs_work *work)
 
 	/* If an error occurred we just want to clean up the bio and move on */
 	if (bio->bi_status) {
-		btrfs_bio_end_io(async->bbio, bio->bi_status);
+		btrfs_orig_bbio_end_io(async->bbio);
 		return;
 	}
 
@@ -567,8 +621,8 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 	return true;
 }
 
-void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-		      int mirror_num)
+static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
+			       int mirror_num)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	u64 logical = bio->bi_iter.bi_sector << 9;
@@ -587,11 +641,10 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		goto fail;
 	}
 
+	map_length = min(map_length, length);
 	if (map_length < length) {
-		btrfs_crit(fs_info,
-			   "mapping failed logical %llu bio len %llu len %llu",
-			   logical, length, map_length);
-		BUG();
+		bio = btrfs_split_bio(bio, map_length);
+		bbio = btrfs_bio(bio);
 	}
 
 	/*
@@ -602,14 +655,14 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		bbio->saved_iter = bio->bi_iter;
 		ret = btrfs_lookup_bio_sums(bbio);
 		if (ret)
-			goto fail;
+			goto fail_put_bio;
 	}
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 			ret = btrfs_extract_ordered_extent(btrfs_bio(bio));
 			if (ret)
-				goto fail;
+				goto fail_put_bio;
 		}
 
 		/*
@@ -621,20 +674,33 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		    !btrfs_is_data_reloc_root(bbio->inode->root)) {
 			if (should_async_write(bbio) &&
 			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
-				return;
+				goto done;
 
 			ret = btrfs_bio_csum(bbio);
 			if (ret)
-				goto fail;
+				goto fail_put_bio;
 		}
 	}
 
 	__btrfs_submit_bio(bio, bioc, &smap, mirror_num);
-	return;
+done:
+	return map_length == length;
 
+fail_put_bio:
+	if (map_length < length)
+		bio_put(bio);
 fail:
 	btrfs_bio_counter_dec(fs_info);
 	btrfs_bio_end_io(bbio, ret);
+	/* Do not submit another chunk */
+	return true;
+}
+
+void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+		      int mirror_num)
+{
+	while (!btrfs_submit_chunk(fs_info, bio, mirror_num))
+		;
 }
 
 /*
@@ -742,10 +808,13 @@ int __init btrfs_bioset_init(void)
 			offsetof(struct btrfs_bio, bio),
 			BIOSET_NEED_BVECS))
 		return -ENOMEM;
+	if (bioset_init(&btrfs_clone_bioset, BIO_POOL_SIZE,
+			offsetof(struct btrfs_bio, bio), 0))
+		goto out_free_bioset;
 	if (bioset_init(&btrfs_repair_bioset, BIO_POOL_SIZE,
 			offsetof(struct btrfs_bio, bio),
 			BIOSET_NEED_BVECS))
-		goto out_free_bioset;
+		goto out_free_clone_bioset;
 	if (mempool_init_kmalloc_pool(&btrfs_failed_bio_pool, BIO_POOL_SIZE,
 				      sizeof(struct btrfs_failed_bio)))
 		goto out_free_repair_bioset;
@@ -753,6 +822,8 @@ int __init btrfs_bioset_init(void)
 
 out_free_repair_bioset:
 	bioset_exit(&btrfs_repair_bioset);
+out_free_clone_bioset:
+	bioset_exit(&btrfs_clone_bioset);
 out_free_bioset:
 	bioset_exit(&btrfs_bioset);
 	return -ENOMEM;
@@ -762,5 +833,6 @@ void __cold btrfs_bioset_exit(void)
 {
 	mempool_exit(&btrfs_failed_bio_pool);
 	bioset_exit(&btrfs_repair_bioset);
+	bioset_exit(&btrfs_clone_bioset);
 	bioset_exit(&btrfs_bioset);
 }
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 334dcc3d5feb95..7c50f757cf5106 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -55,6 +55,7 @@ struct btrfs_bio {
 
 	/* For internal use in read end I/O handling */
 	unsigned int mirror_num;
+	atomic_t pending_ios;
 	struct work_struct end_io_work;
 
 	/*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 24/34] btrfs: pass the iomap bio to btrfs_submit_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (22 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 25/34] btrfs: remove stripe boundary calculation for buffered I/O Christoph Hellwig
                   ` (10 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Now that btrfs_submit_bio splits the bio when crossing stripe boundaries,
there is no need for the higher level code to do that manually.

For direct I/O this is really helpful, as btrfs_submit_io can now simply
take the bio allocated by iomap and send it on to btrfs_submit_bio
instead of allocating clones.

For that to work, the bio embedded into struct btrfs_dio_private needs to
become a full btrfs_bio as expected by btrfs_submit_bio.

With this change there is a single work item to offload the entire iomap
bio so the heuristics to skip async processing for bios that were split
isn't needed anymore either.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c   |  22 +------
 fs/btrfs/bio.h   |   6 +-
 fs/btrfs/inode.c | 162 ++++++++++-------------------------------------
 3 files changed, 37 insertions(+), 153 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index ff42b783902140..5d4b67fc44f4de 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -31,9 +31,8 @@ struct btrfs_failed_bio {
  * Initialize a btrfs_bio structure.  This skips the embedded bio itself as it
  * is already initialized by the block layer.
  */
-static inline void btrfs_bio_init(struct btrfs_bio *bbio,
-				  struct btrfs_inode *inode,
-				  btrfs_bio_end_io_t end_io, void *private)
+void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
+			   btrfs_bio_end_io_t end_io, void *private)
 {
 	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
 	bbio->inode = inode;
@@ -60,23 +59,6 @@ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
 	return bio;
 }
 
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
-				    struct btrfs_inode *inode,
-				    btrfs_bio_end_io_t end_io, void *private)
-{
-	struct bio *bio;
-	struct btrfs_bio *bbio;
-
-	ASSERT(offset <= UINT_MAX && size <= UINT_MAX);
-
-	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
-	bbio = btrfs_bio(bio);
-	btrfs_bio_init(bbio, inode, end_io, private);
-
-	bio_trim(bio, offset >> 9, size >> 9);
-	return bio;
-}
-
 static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length)
 {
 	struct btrfs_bio *orig_bbio = btrfs_bio(orig);
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 7c50f757cf5106..5acbcc18026d52 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -73,13 +73,11 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 int __init btrfs_bioset_init(void);
 void __cold btrfs_bioset_exit(void);
 
+void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_inode *inode,
+		    btrfs_bio_end_io_t end_io, void *private);
 struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
 			    struct btrfs_inode *inode,
 			    btrfs_bio_end_io_t end_io, void *private);
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
-				    struct btrfs_inode *inode,
-				    btrfs_bio_end_io_t end_io, void *private);
-
 
 static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index da71e2a6a2e68b..46ff73c09da7d3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -84,24 +84,12 @@ struct btrfs_dio_data {
 };
 
 struct btrfs_dio_private {
-	struct btrfs_inode *inode;
-
-	/*
-	 * Since DIO can use anonymous page, we cannot use page_offset() to
-	 * grab the file offset, thus need a dedicated member for file offset.
-	 */
+	/* Range of I/O */
 	u64 file_offset;
-	/* Used for bio::bi_size */
 	u32 bytes;
 
-	/*
-	 * References to this structure. There is one reference per in-flight
-	 * bio plus one while we're still setting up.
-	 */
-	refcount_t refs;
-
 	/* This must be last */
-	struct bio bio;
+	struct btrfs_bio bbio;
 };
 
 static struct bio_set btrfs_dio_bioset;
@@ -7767,132 +7755,48 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
+static void btrfs_dio_end_io(struct btrfs_bio *bbio)
 {
-	/*
-	 * This implies a barrier so that stores to dio_bio->bi_status before
-	 * this and loads of dio_bio->bi_status after this are fully ordered.
-	 */
-	if (!refcount_dec_and_test(&dip->refs))
-		return;
-
-	if (btrfs_op(&dip->bio) == BTRFS_MAP_WRITE) {
-		btrfs_mark_ordered_io_finished(dip->inode, NULL,
-					       dip->file_offset, dip->bytes,
-					       !dip->bio.bi_status);
-	} else {
-		unlock_extent(&dip->inode->io_tree,
-			      dip->file_offset,
-			      dip->file_offset + dip->bytes - 1, NULL);
-	}
-
-	bio_endio(&dip->bio);
-}
-
-static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
-{
-	struct btrfs_dio_private *dip = bbio->private;
+	struct btrfs_dio_private *dip =
+		container_of(bbio, struct btrfs_dio_private, bbio);
+	struct btrfs_inode *inode = bbio->inode;
 	struct bio *bio = &bbio->bio;
-	blk_status_t err = bio->bi_status;
 
-	if (err) {
-		btrfs_warn(dip->inode->root->fs_info,
-			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
-			   btrfs_ino(dip->inode), bio_op(bio),
-			   bio->bi_opf, bio->bi_iter.bi_sector,
-			   bio->bi_iter.bi_size, err);
-		dip->bio.bi_status = err;
+	if (bio->bi_status) {
+		btrfs_warn(inode->root->fs_info,
+			   "direct IO failed ino %llu op 0x%0x offset %#llx len %u err no %d",
+			   btrfs_ino(inode), bio->bi_opf,
+			   dip->file_offset, dip->bytes, bio->bi_status);
 	}
 
-	bio_put(bio);
-	btrfs_dio_private_put(dip);
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		btrfs_mark_ordered_io_finished(inode, NULL, dip->file_offset,
+					       dip->bytes, !bio->bi_status);
+	else
+		unlock_extent(&inode->io_tree, dip->file_offset,
+			      dip->file_offset + dip->bytes - 1, NULL);
+
+	bbio->bio.bi_private = bbio->private;
+	iomap_dio_bio_end_io(bio);
 }
 
-static void btrfs_submit_direct(const struct iomap_iter *iter,
-		struct bio *dio_bio, loff_t file_offset)
+static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
+				loff_t file_offset)
 {
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct btrfs_dio_private *dip =
-		container_of(dio_bio, struct btrfs_dio_private, bio);
-	struct inode *inode = iter->inode;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct bio *bio;
-	u64 start_sector;
-	u64 submit_len;
-	u64 clone_offset = 0;
-	u64 clone_len;
-	u64 logical;
-	int ret;
-	blk_status_t status;
-	struct btrfs_io_geometry geom;
+		container_of(bbio, struct btrfs_dio_private, bbio);
 	struct btrfs_dio_data *dio_data = iter->private;
-	struct extent_map *em = NULL;
 
-	dip->inode = BTRFS_I(inode);
-	dip->file_offset = file_offset;
-	dip->bytes = dio_bio->bi_iter.bi_size;
-	refcount_set(&dip->refs, 1);
+	btrfs_bio_init(bbio, BTRFS_I(iter->inode), btrfs_dio_end_io,
+			bio->bi_private);
+	bbio->file_offset = file_offset;
 
-	start_sector = dio_bio->bi_iter.bi_sector;
-	submit_len = dio_bio->bi_iter.bi_size;
-
-	do {
-		logical = start_sector << 9;
-		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
-		if (IS_ERR(em)) {
-			status = errno_to_blk_status(PTR_ERR(em));
-			em = NULL;
-			goto out_err;
-		}
-		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
-					    logical, &geom);
-		if (ret) {
-			status = errno_to_blk_status(ret);
-			goto out_err_em;
-		}
-
-		clone_len = min(submit_len, geom.len);
-		ASSERT(clone_len <= UINT_MAX);
-
-		/*
-		 * This will never fail as it's passing GPF_NOFS and
-		 * the allocation is backed by btrfs_bioset.
-		 */
-		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len,
-					      BTRFS_I(inode), btrfs_end_dio_bio,
-					      dip);
-		btrfs_bio(bio)->file_offset = file_offset;
-
-		ASSERT(submit_len >= clone_len);
-		submit_len -= clone_len;
-
-		/*
-		 * Increase the count before we submit the bio so we know
-		 * the end IO handler won't happen before we increase the
-		 * count. Otherwise, the dip might get freed before we're
-		 * done setting it up.
-		 *
-		 * We transfer the initial reference to the last bio, so we
-		 * don't need to increment the reference count for the last one.
-		 */
-		if (submit_len > 0)
-			refcount_inc(&dip->refs);
-
-		btrfs_submit_bio(fs_info, bio, 0);
-
-		dio_data->submitted += clone_len;
-		clone_offset += clone_len;
-		start_sector += clone_len >> 9;
-		file_offset += clone_len;
-
-		free_extent_map(em);
-	} while (submit_len > 0);
-	return;
+	dip->file_offset = file_offset;
+	dip->bytes = bio->bi_iter.bi_size;
 
-out_err_em:
-	free_extent_map(em);
-out_err:
-	dio_bio->bi_status = status;
-	btrfs_dio_private_put(dip);
+	dio_data->submitted += bio->bi_iter.bi_size;
+	btrfs_submit_bio(btrfs_sb(iter->inode->i_sb), bio, 0);
 }
 
 static const struct iomap_ops btrfs_dio_iomap_ops = {
@@ -7901,7 +7805,7 @@ static const struct iomap_ops btrfs_dio_iomap_ops = {
 };
 
 static const struct iomap_dio_ops btrfs_dio_ops = {
-	.submit_io		= btrfs_submit_direct,
+	.submit_io		= btrfs_dio_submit_io,
 	.bio_set		= &btrfs_dio_bioset,
 };
 
@@ -8736,7 +8640,7 @@ int __init btrfs_init_cachep(void)
 		goto fail;
 
 	if (bioset_init(&btrfs_dio_bioset, BIO_POOL_SIZE,
-			offsetof(struct btrfs_dio_private, bio),
+			offsetof(struct btrfs_dio_private, bbio.bio),
 			BIOSET_NEED_BVECS))
 		goto fail;
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 25/34] btrfs: remove stripe boundary calculation for buffered I/O
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (23 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 24/34] btrfs: pass the iomap bio to btrfs_submit_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 26/34] btrfs: remove stripe boundary calculation for compressed I/O Christoph Hellwig
                   ` (9 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

From: Qu Wenruo <wqu@suse.com>

Remove btrfs_bio_ctrl::len_to_stripe_boundary, so that buffer
I/O will no longer limit its bio size according to stripe length
now that btrfs_submit_bio can split bios at stripe boundaries.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: simplify calc_bio_boundaries a little more]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent_io.c | 71 ++++++++++++--------------------------------
 1 file changed, 19 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 81a56489db7323..c7346bc6d8bdd7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -99,7 +99,6 @@ struct btrfs_bio_ctrl {
 	struct bio *bio;
 	int mirror_num;
 	enum btrfs_compression_type compress_type;
-	u32 len_to_stripe_boundary;
 	u32 len_to_oe_boundary;
 	btrfs_bio_end_io_t end_io_func;
 
@@ -901,7 +900,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 	ASSERT(bio);
 	/* The limit should be calculated when bio_ctrl->bio is allocated */
-	ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary);
+	ASSERT(bio_ctrl->len_to_oe_boundary);
 	if (bio_ctrl->compress_type != compress_type)
 		return 0;
 
@@ -937,9 +936,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	if (!contig)
 		return 0;
 
-	real_size = min(bio_ctrl->len_to_oe_boundary,
-			bio_ctrl->len_to_stripe_boundary) - bio_size;
-	real_size = min(real_size, size);
+	real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size);
 
 	/*
 	 * If real_size is 0, never call bio_add_*_page(), as even size is 0,
@@ -956,58 +953,30 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	return ret;
 }
 
-static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
-			       struct btrfs_inode *inode, u64 file_offset)
+static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
+				struct btrfs_inode *inode, u64 file_offset)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_io_geometry geom;
 	struct btrfs_ordered_extent *ordered;
-	struct extent_map *em;
 	u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
-	int ret;
 
 	/*
-	 * Pages for compressed extent are never submitted to disk directly,
-	 * thus it has no real boundary, just set them to U32_MAX.
-	 *
-	 * The split happens for real compressed bio, which happens in
-	 * btrfs_submit_compressed_read/write().
+	 * Limit the extent to the ordered boundary for Zone Append.
+	 * Compressed bios aren't submitted directly, so it doesn't apply
+	 * to them.
 	 */
-	if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) {
-		bio_ctrl->len_to_oe_boundary = U32_MAX;
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
-		return 0;
-	}
-	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
-				    logical, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		return ret;
-	}
-	if (geom.len > U32_MAX)
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
-	else
-		bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
-
-	if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
-		bio_ctrl->len_to_oe_boundary = U32_MAX;
-		return 0;
-	}
-
-	/* Ordered extent not yet created, so we're good */
-	ordered = btrfs_lookup_ordered_extent(inode, file_offset);
-	if (!ordered) {
-		bio_ctrl->len_to_oe_boundary = U32_MAX;
-		return 0;
+	if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE &&
+	    bio_op(bio_ctrl->bio) == REQ_OP_ZONE_APPEND) {
+		ordered = btrfs_lookup_ordered_extent(inode, file_offset);
+		if (ordered) {
+			bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
+					ordered->disk_bytenr +
+					ordered->disk_num_bytes - logical);
+			btrfs_put_ordered_extent(ordered);
+			return;
+		}
 	}
 
-	bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
-		ordered->disk_bytenr + ordered->disk_num_bytes - logical);
-	btrfs_put_ordered_extent(ordered);
-	return 0;
+	bio_ctrl->len_to_oe_boundary = U32_MAX;
 }
 
 static int alloc_new_bio(struct btrfs_inode *inode,
@@ -1033,9 +1002,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 		bio->bi_iter.bi_sector = (disk_bytenr + offset) >> SECTOR_SHIFT;
 	bio_ctrl->bio = bio;
 	bio_ctrl->compress_type = compress_type;
-	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
-	if (ret < 0)
-		goto error;
+	calc_bio_boundaries(bio_ctrl, inode, file_offset);
 
 	if (wbc) {
 		/*
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 26/34] btrfs: remove stripe boundary calculation for compressed I/O
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (24 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 25/34] btrfs: remove stripe boundary calculation for buffered I/O Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 27/34] btrfs: remove stripe boundary calculation for encoded I/O Christoph Hellwig
                   ` (8 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

From: Qu Wenruo <wqu@suse.com>

Stop looking at the stripe boundary in alloc_compressed_bio() now that
that btrfs_submit_bio can split bios, open code the now trivial code
from alloc_compressed_bio() in btrfs_submit_compressed_read and stop
maintaining the pending_ios count for reads as there is always just
a single bio now.

Signed-off-by: Qu Wenruo <wqu@suse.com>
[hch: remove more cruft in btrfs_submit_compressed_read,
      use btrfs_zoned_get_device in alloc_compressed_bio]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/compression.c | 131 +++++++++++------------------------------
 1 file changed, 34 insertions(+), 97 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index efa040eb6b19d1..547cb626dfdeb9 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -141,12 +141,15 @@ static int compression_decompress(int type, struct list_head *ws,
 
 static int btrfs_decompress_bio(struct compressed_bio *cb);
 
-static void finish_compressed_bio_read(struct compressed_bio *cb)
+static void end_compressed_bio_read(struct btrfs_bio *bbio)
 {
+	struct compressed_bio *cb = bbio->private;
 	unsigned int index;
 	struct page *page;
 
-	if (cb->status == BLK_STS_OK)
+	if (bbio->bio.bi_status)
+		cb->status = bbio->bio.bi_status;
+	else
 		cb->status = errno_to_blk_status(btrfs_decompress_bio(cb));
 
 	/* Release the compressed pages */
@@ -162,17 +165,6 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
 	/* Finally free the cb struct */
 	kfree(cb->compressed_pages);
 	kfree(cb);
-}
-
-static void end_compressed_bio_read(struct btrfs_bio *bbio)
-{
-	struct compressed_bio *cb = bbio->private;
-
-	if (bbio->bio.bi_status)
-		cb->status = bbio->bio.bi_status;
-
-	if (refcount_dec_and_test(&cb->pending_ios))
-		finish_compressed_bio_read(cb);
 	bio_put(&bbio->bio);
 }
 
@@ -289,43 +281,30 @@ static void end_compressed_bio_write(struct btrfs_bio *bbio)
  *                      from or written to.
  * @endio_func:         The endio function to call after the IO for compressed data
  *                      is finished.
- * @next_stripe_start:  Return value of logical bytenr of where next stripe starts.
- *                      Let the caller know to only fill the bio up to the stripe
- *                      boundary.
  */
-
-
 static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
 					blk_opf_t opf,
-					btrfs_bio_end_io_t endio_func,
-					u64 *next_stripe_start)
+					btrfs_bio_end_io_t endio_func)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
-	struct btrfs_io_geometry geom;
-	struct extent_map *em;
 	struct bio *bio;
-	int ret;
 
 	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, BTRFS_I(cb->inode), endio_func,
 			      cb);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
 
-	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
-	if (IS_ERR(em)) {
-		bio_put(bio);
-		return ERR_CAST(em);
-	}
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+		struct btrfs_device *device;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-		bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev);
+		device = btrfs_zoned_get_device(fs_info, disk_bytenr,
+						fs_info->sectorsize);
+		if (IS_ERR(device)) {
+			bio_put(bio);
+			return ERR_CAST(device);
+		}
 
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		bio_put(bio);
-		return ERR_PTR(ret);
+		bio_set_dev(bio, device->bdev);
 	}
-	*next_stripe_start = disk_bytenr + geom.len;
 	refcount_inc(&cb->pending_ios);
 	return bio;
 }
@@ -352,7 +331,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	struct bio *bio = NULL;
 	struct compressed_bio *cb;
 	u64 cur_disk_bytenr = disk_start;
-	u64 next_stripe_start;
 	blk_status_t ret = BLK_STS_OK;
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
 	const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED |
@@ -388,8 +366,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		/* Allocate new bio if submitted or not yet allocated */
 		if (!bio) {
 			bio = alloc_compressed_bio(cb, cur_disk_bytenr,
-				bio_op | write_flags, end_compressed_bio_write,
-				&next_stripe_start);
+				bio_op | write_flags, end_compressed_bio_write);
 			if (IS_ERR(bio)) {
 				ret = errno_to_blk_status(PTR_ERR(bio));
 				break;
@@ -398,20 +375,12 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
 		}
-		/*
-		 * We should never reach next_stripe_start start as we will
-		 * submit comp_bio when reach the boundary immediately.
-		 */
-		ASSERT(cur_disk_bytenr != next_stripe_start);
-
 		/*
 		 * We have various limits on the real read size:
-		 * - stripe boundary
 		 * - page boundary
 		 * - compressed length boundary
 		 */
-		real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_bytenr);
-		real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset));
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
@@ -426,9 +395,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			submit = true;
 
 		cur_disk_bytenr += added;
-		/* Reached stripe boundary */
-		if (cur_disk_bytenr == next_stripe_start)
-			submit = true;
 
 		/* Finished the range */
 		if (cur_disk_bytenr == disk_start + compressed_len)
@@ -623,10 +589,9 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	struct extent_map_tree *em_tree;
 	struct compressed_bio *cb;
 	unsigned int compressed_len;
-	struct bio *comp_bio = NULL;
+	struct bio *comp_bio;
 	const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 cur_disk_byte = disk_bytenr;
-	u64 next_stripe_start;
 	u64 file_offset;
 	u64 em_len;
 	u64 em_start;
@@ -693,37 +658,24 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
+	comp_bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ,
+				   BTRFS_I(cb->inode),
+				   end_compressed_bio_read, cb);
+	comp_bio->bi_iter.bi_sector = cur_disk_byte >> SECTOR_SHIFT;
+
 	while (cur_disk_byte < disk_bytenr + compressed_len) {
 		u64 offset = cur_disk_byte - disk_bytenr;
 		unsigned int index = offset >> PAGE_SHIFT;
 		unsigned int real_size;
 		unsigned int added;
 		struct page *page = cb->compressed_pages[index];
-		bool submit = false;
 
-		/* Allocate new bio if submitted or not yet allocated */
-		if (!comp_bio) {
-			comp_bio = alloc_compressed_bio(cb, cur_disk_byte,
-					REQ_OP_READ, end_compressed_bio_read,
-					&next_stripe_start);
-			if (IS_ERR(comp_bio)) {
-				cb->status = errno_to_blk_status(PTR_ERR(comp_bio));
-				break;
-			}
-		}
-		/*
-		 * We should never reach next_stripe_start start as we will
-		 * submit comp_bio when reach the boundary immediately.
-		 */
-		ASSERT(cur_disk_byte != next_stripe_start);
 		/*
 		 * We have various limit on the real read size:
-		 * - stripe boundary
 		 * - page boundary
 		 * - compressed length boundary
 		 */
-		real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_byte);
-		real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset));
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
@@ -734,35 +686,20 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		 */
 		ASSERT(added == real_size);
 		cur_disk_byte += added;
-
-		/* Reached stripe boundary, need to submit */
-		if (cur_disk_byte == next_stripe_start)
-			submit = true;
-
-		/* Has finished the range, need to submit */
-		if (cur_disk_byte == disk_bytenr + compressed_len)
-			submit = true;
-
-		if (submit) {
-			/*
-			 * Save the initial offset of this chunk, as there
-			 * is no direct correlation between compressed pages and
-			 * the original file offset.  The field is only used for
-			 * priting error messages.
-			 */
-			btrfs_bio(comp_bio)->file_offset = file_offset;
-
-			ASSERT(comp_bio->bi_iter.bi_size);
-			btrfs_submit_bio(fs_info, comp_bio, mirror_num);
-			comp_bio = NULL;
-		}
 	}
 
 	if (memstall)
 		psi_memstall_leave(&pflags);
 
-	if (refcount_dec_and_test(&cb->pending_ios))
-		finish_compressed_bio_read(cb);
+	/*
+	 * Just stash the initial offset of this chunk, as there is no direct
+	 * correlation between compressed pages and the original file offset.
+	 * The field is only used for priting error messages anyway.
+	 */
+	btrfs_bio(comp_bio)->file_offset = file_offset;
+
+	ASSERT(comp_bio->bi_iter.bi_size);
+	btrfs_submit_bio(fs_info, comp_bio, mirror_num);
 	return;
 
 fail:
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 27/34] btrfs: remove stripe boundary calculation for encoded I/O
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (25 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 26/34] btrfs: remove stripe boundary calculation for compressed I/O Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 28/34] btrfs: remove struct btrfs_io_geometry Christoph Hellwig
                   ` (7 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

From: Qu Wenruo <wqu@suse.com>

Stop looking at the stripe boundary in
btrfs_encoded_read_regular_fill_pages() now that btrfs_submit_bio can
split bios.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/inode.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 46ff73c09da7d3..e246fa8e4ae7f6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9972,7 +9972,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 					  u64 file_offset, u64 disk_bytenr,
 					  u64 disk_io_size, struct page **pages)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_encoded_read_private priv = {
 		.inode = inode,
 		.file_offset = file_offset,
@@ -9980,33 +9979,15 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 	};
 	unsigned long i = 0;
 	u64 cur = 0;
-	int ret;
 
 	init_waitqueue_head(&priv.wait);
 	/*
-	 * Submit bios for the extent, splitting due to bio or stripe limits as
-	 * necessary.
+	 * Submit bios for the extent, splitting due to bio limits as necessary.
 	 */
 	while (cur < disk_io_size) {
-		struct extent_map *em;
-		struct btrfs_io_geometry geom;
 		struct bio *bio = NULL;
-		u64 remaining;
+		u64 remaining = disk_io_size - cur;
 
-		em = btrfs_get_chunk_map(fs_info, disk_bytenr + cur,
-					 disk_io_size - cur);
-		if (IS_ERR(em)) {
-			ret = PTR_ERR(em);
-		} else {
-			ret = btrfs_get_io_geometry(fs_info, em, BTRFS_MAP_READ,
-						    disk_bytenr + cur, &geom);
-			free_extent_map(em);
-		}
-		if (ret) {
-			WRITE_ONCE(priv.status, errno_to_blk_status(ret));
-			break;
-		}
-		remaining = min(geom.len, disk_io_size - cur);
 		while (bio || remaining) {
 			size_t bytes = min_t(u64, remaining, PAGE_SIZE);
 
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 28/34] btrfs: remove struct btrfs_io_geometry
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (26 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 27/34] btrfs: remove stripe boundary calculation for encoded I/O Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 29/34] btrfs: remove submit_encoded_read_bio Christoph Hellwig
                   ` (6 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Now that btrfs_get_io_geometry has a single caller, we can massage it
into a form that is more suitable for that caller and remove the
marshalling into and out of struct btrfs_io_geometry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 115 +++++++++++++--------------------------------
 fs/btrfs/volumes.h |  18 -------
 2 files changed, 32 insertions(+), 101 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4cdadf30163d5e..707dd0456cea21 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6270,91 +6270,43 @@ static bool need_full_stripe(enum btrfs_map_op op)
 	return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
 }
 
-/*
- * Calculate the geometry of a particular (address, len) tuple. This
- * information is used to calculate how big a particular bio can get before it
- * straddles a stripe.
- *
- * @fs_info: the filesystem
- * @em:      mapping containing the logical extent
- * @op:      type of operation - write or read
- * @logical: address that we want to figure out the geometry of
- * @io_geom: pointer used to return values
- *
- * Returns < 0 in case a chunk for the given logical address cannot be found,
- * usually shouldn't happen unless @logical is corrupted, 0 otherwise.
- */
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom)
+static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op,
+			    u64 offset, u64 *stripe_nr, u64 *stripe_offset,
+			    u64 *full_stripe_start)
 {
-	struct map_lookup *map;
-	u64 len;
-	u64 offset;
-	u64 stripe_offset;
-	u64 stripe_nr;
-	u32 stripe_len;
-	u64 raid56_full_stripe_start = (u64)-1;
-	int data_stripes;
+	u32 stripe_len = map->stripe_len;
 
 	ASSERT(op != BTRFS_MAP_DISCARD);
 
-	map = em->map_lookup;
-	/* Offset of this logical address in the chunk */
-	offset = logical - em->start;
-	/* Len of a stripe in a chunk */
-	stripe_len = map->stripe_len;
 	/*
-	 * Stripe_nr is where this block falls in
-	 * stripe_offset is the offset of this block in its stripe.
+	 * Stripe_nr is the stripe where this block falls.
+	 * Stripe_offset is the offset of this block in its stripe.
 	 */
-	stripe_nr = div64_u64_rem(offset, stripe_len, &stripe_offset);
-	ASSERT(stripe_offset < U32_MAX);
+	*stripe_nr = div64_u64_rem(offset, stripe_len, stripe_offset);
+	ASSERT(*stripe_offset < U32_MAX);
 
-	data_stripes = nr_data_stripes(map);
+	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		unsigned long full_stripe_len =
+			stripe_len * nr_data_stripes(map);
 
-	/* Only stripe based profiles needs to check against stripe length. */
-	if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) {
-		u64 max_len = stripe_len - stripe_offset;
+		*full_stripe_start =
+			div64_u64(offset, full_stripe_len) * full_stripe_len;
 
 		/*
-		 * In case of raid56, we need to know the stripe aligned start
+		 * For writes to RAID[56], allow to write a full stripe set, but
+		 * no straddling of stripe sets.
 		 */
-		if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-			unsigned long full_stripe_len = stripe_len * data_stripes;
-			raid56_full_stripe_start = offset;
-
-			/*
-			 * Allow a write of a full stripe, but make sure we
-			 * don't allow straddling of stripes
-			 */
-			raid56_full_stripe_start = div64_u64(raid56_full_stripe_start,
-					full_stripe_len);
-			raid56_full_stripe_start *= full_stripe_len;
-
-			/*
-			 * For writes to RAID[56], allow a full stripeset across
-			 * all disks. For other RAID types and for RAID[56]
-			 * reads, just allow a single stripe (on a single disk).
-			 */
-			if (op == BTRFS_MAP_WRITE) {
-				max_len = stripe_len * data_stripes -
-					  (offset - raid56_full_stripe_start);
-			}
-		}
-		len = min_t(u64, em->len - offset, max_len);
-	} else {
-		len = em->len - offset;
+		if (op == BTRFS_MAP_WRITE)
+			return full_stripe_len - (offset - *full_stripe_start);
 	}
 
-	io_geom->len = len;
-	io_geom->offset = offset;
-	io_geom->stripe_len = stripe_len;
-	io_geom->stripe_nr = stripe_nr;
-	io_geom->stripe_offset = stripe_offset;
-	io_geom->raid56_stripe_offset = raid56_full_stripe_start;
-
-	return 0;
+	/*
+	 * For other RAID types and for RAID[56] reads, just allow a single
+	 * stripe (on a single disk).
+	 */
+	if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK)
+		return stripe_len - *stripe_offset;
+	return U64_MAX;
 }
 
 static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup *map,
@@ -6373,6 +6325,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 {
 	struct extent_map *em;
 	struct map_lookup *map;
+	u64 map_offset;
 	u64 stripe_offset;
 	u64 stripe_nr;
 	u64 stripe_len;
@@ -6391,7 +6344,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	int patch_the_first_stripe_for_dev_replace = 0;
 	u64 physical_to_patch_in_first_stripe = 0;
 	u64 raid56_full_stripe_start = (u64)-1;
-	struct btrfs_io_geometry geom;
+	u64 max_len;
 
 	ASSERT(bioc_ret);
 	ASSERT(op != BTRFS_MAP_DISCARD);
@@ -6399,18 +6352,14 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	em = btrfs_get_chunk_map(fs_info, logical, *length);
 	ASSERT(!IS_ERR(em));
 
-	ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom);
-	if (ret < 0)
-		return ret;
-
 	map = em->map_lookup;
-
-	*length = geom.len;
-	stripe_len = geom.stripe_len;
-	stripe_nr = geom.stripe_nr;
-	stripe_offset = geom.stripe_offset;
-	raid56_full_stripe_start = geom.raid56_stripe_offset;
 	data_stripes = nr_data_stripes(map);
+	stripe_len = map->stripe_len;
+
+	map_offset = logical - em->start;
+	max_len = btrfs_max_io_len(map, op, map_offset, &stripe_nr,
+				   &stripe_offset, &raid56_full_stripe_start);
+	*length = min_t(u64, em->len - map_offset, max_len);
 
 	down_read(&dev_replace->rwsem);
 	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6b7a05f6cf8237..7e51f2238f72e6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -53,21 +53,6 @@ enum btrfs_raid_types {
 	BTRFS_NR_RAID_TYPES
 };
 
-struct btrfs_io_geometry {
-	/* remaining bytes before crossing a stripe */
-	u64 len;
-	/* offset of logical address in chunk */
-	u64 offset;
-	/* length of single IO stripe */
-	u32 stripe_len;
-	/* offset of address in stripe */
-	u32 stripe_offset;
-	/* number of stripe where address falls */
-	u64 stripe_nr;
-	/* offset of raid56 stripe into the chunk */
-	u64 raid56_stripe_offset;
-};
-
 /*
  * Use sequence counter to get consistent device stat data on
  * 32-bit processors.
@@ -545,9 +530,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info,
 					       u64 logical, u64 *length_ret,
 					       u32 *num_stripes);
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom);
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 29/34] btrfs: remove submit_encoded_read_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (27 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 28/34] btrfs: remove struct btrfs_io_geometry Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 30/34] btrfs: remove the fs_info argument to btrfs_submit_bio Christoph Hellwig
                   ` (5 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Just opencode the functionality in the only caller and remove the
now superfluous error handling there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/inode.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e246fa8e4ae7f6..aad802d100e134 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9937,17 +9937,6 @@ struct btrfs_encoded_read_private {
 	blk_status_t status;
 };
 
-static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
-					    struct bio *bio, int mirror_num)
-{
-	struct btrfs_encoded_read_private *priv = btrfs_bio(bio)->private;
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-
-	atomic_inc(&priv->pending);
-	btrfs_submit_bio(fs_info, bio, mirror_num);
-	return BLK_STS_OK;
-}
-
 static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
 {
 	struct btrfs_encoded_read_private *priv = bbio->private;
@@ -9972,6 +9961,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 					  u64 file_offset, u64 disk_bytenr,
 					  u64 disk_io_size, struct page **pages)
 {
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_encoded_read_private priv = {
 		.inode = inode,
 		.file_offset = file_offset,
@@ -10002,14 +9992,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 
 			if (!bytes ||
 			    bio_add_page(bio, pages[i], bytes, 0) < bytes) {
-				blk_status_t status;
-
-				status = submit_encoded_read_bio(inode, bio, 0);
-				if (status) {
-					WRITE_ONCE(priv.status, status);
-					bio_put(bio);
-					goto out;
-				}
+				atomic_inc(&priv.pending);
+				btrfs_submit_bio(fs_info, bio, 0);
 				bio = NULL;
 				continue;
 			}
@@ -10020,7 +10004,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 		}
 	}
 
-out:
 	if (atomic_dec_return(&priv.pending))
 		io_wait_event(priv.wait, !atomic_read(&priv.pending));
 	/* See btrfs_encoded_read_endio() for ordering. */
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 30/34] btrfs: remove the fs_info argument to btrfs_submit_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (28 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 29/34] btrfs: remove submit_encoded_read_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 31/34] btrfs: remove now spurious bio submission helpers Christoph Hellwig
                   ` (4 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

btrfs_submit_bio can derive it trivially from bbio->inode, so stop
bothering in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         | 13 ++++++-------
 fs/btrfs/bio.h         |  3 +--
 fs/btrfs/compression.c |  4 ++--
 fs/btrfs/disk-io.c     |  2 +-
 fs/btrfs/inode.c       | 11 ++++-------
 5 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 5d4b67fc44f4de..38a2d2bb5c2fa9 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -154,7 +154,7 @@ static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio,
 			goto done;
 		}
 
-		btrfs_submit_bio(fs_info, &repair_bbio->bio, mirror);
+		btrfs_submit_bio(&repair_bbio->bio, mirror);
 		return;
 	}
 
@@ -224,7 +224,7 @@ static struct btrfs_failed_bio *repair_one_sector(struct btrfs_bio *failed_bbio,
 
 	mirror = next_repair_mirror(fbio, failed_bbio->mirror_num);
 	btrfs_debug(fs_info, "submitting repair read to mirror %d", mirror);
-	btrfs_submit_bio(fs_info, repair_bio, mirror);
+	btrfs_submit_bio(repair_bio, mirror);
 	return fbio;
 }
 
@@ -603,10 +603,10 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 	return true;
 }
 
-static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
-			       int mirror_num)
+static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
@@ -678,10 +678,9 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
 	return true;
 }
 
-void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-		      int mirror_num)
+void btrfs_submit_bio(struct bio *bio, int mirror_num)
 {
-	while (!btrfs_submit_chunk(fs_info, bio, mirror_num))
+	while (!btrfs_submit_chunk(bio, mirror_num))
 		;
 }
 
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 5acbcc18026d52..20105806c8acc3 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -88,8 +88,7 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 /* bio only refers to one ordered extent */
 #define REQ_BTRFS_ONE_ORDERED	REQ_DRV
 
-void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-		      int mirror_num);
+void btrfs_submit_bio(struct bio *bio, int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 547cb626dfdeb9..41c1f8937bf5d6 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -402,7 +402,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 
 		if (submit) {
 			ASSERT(bio->bi_iter.bi_size);
-			btrfs_submit_bio(fs_info, bio, 0);
+			btrfs_submit_bio(bio, 0);
 			bio = NULL;
 		}
 		cond_resched();
@@ -699,7 +699,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	btrfs_bio(comp_bio)->file_offset = file_offset;
 
 	ASSERT(comp_bio->bi_iter.bi_size);
-	btrfs_submit_bio(fs_info, comp_bio, mirror_num);
+	btrfs_submit_bio(comp_bio, mirror_num);
 	return;
 
 fail:
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ae329b5ce9804..a84b662fdd220d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -702,7 +702,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 void btrfs_submit_metadata_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
 {
 	bio->bi_opf |= REQ_META;
-	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
+	btrfs_submit_bio(bio, mirror_num);
 }
 
 #ifdef CONFIG_MIGRATION
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index aad802d100e134..0c3e0ec842933e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2697,14 +2697,12 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 
 void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
 {
-	btrfs_submit_bio(inode->root->fs_info, bio, mirror_num);
+	btrfs_submit_bio(bio, mirror_num);
 }
 
 void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-
 	if (compress_type != BTRFS_COMPRESS_NONE) {
 		/*
 		 * btrfs_submit_compressed_read will handle completing the bio
@@ -2714,7 +2712,7 @@ void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
 		return;
 	}
 
-	btrfs_submit_bio(fs_info, bio, mirror_num);
+	btrfs_submit_bio(bio, mirror_num);
 }
 
 /*
@@ -7796,7 +7794,7 @@ static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio,
 	dip->bytes = bio->bi_iter.bi_size;
 
 	dio_data->submitted += bio->bi_iter.bi_size;
-	btrfs_submit_bio(btrfs_sb(iter->inode->i_sb), bio, 0);
+	btrfs_submit_bio(bio, 0);
 }
 
 static const struct iomap_ops btrfs_dio_iomap_ops = {
@@ -9961,7 +9959,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 					  u64 file_offset, u64 disk_bytenr,
 					  u64 disk_io_size, struct page **pages)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_encoded_read_private priv = {
 		.inode = inode,
 		.file_offset = file_offset,
@@ -9993,7 +9990,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 			if (!bytes ||
 			    bio_add_page(bio, pages[i], bytes, 0) < bytes) {
 				atomic_inc(&priv.pending);
-				btrfs_submit_bio(fs_info, bio, 0);
+				btrfs_submit_bio(bio, 0);
 				bio = NULL;
 				continue;
 			}
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 31/34] btrfs: remove now spurious bio submission helpers
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (29 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 30/34] btrfs: remove the fs_info argument to btrfs_submit_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 32/34] btrfs: calculate file system wide queue limit for zoned mode Christoph Hellwig
                   ` (3 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

Just call btrfs_submit_bio and btrfs_submit_compressed_read directly from
submit_one_bio now that all additional functionality has moved into
btrfs_submit_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/btrfs_inode.h |  3 ---
 fs/btrfs/disk-io.c     |  6 ------
 fs/btrfs/disk-io.h     |  1 -
 fs/btrfs/extent_io.c   | 19 ++++++++++---------
 fs/btrfs/inode.c       | 20 --------------------
 5 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b83b731c63e13e..49a92aa65de1f6 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -405,9 +405,6 @@ static inline void btrfs_inode_split_flags(u64 inode_item_flags,
 #define CSUM_FMT				"0x%*phN"
 #define CSUM_FMT_VALUE(size, bytes)		size, bytes
 
-void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
-void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
-			int mirror_num, enum btrfs_compression_type compress_type);
 int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
 			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
 blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a84b662fdd220d..0da0bde347e54a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -699,12 +699,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 	return ret;
 }
 
-void btrfs_submit_metadata_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
-{
-	bio->bi_opf |= REQ_META;
-	btrfs_submit_bio(bio, mirror_num);
-}
-
 #ifdef CONFIG_MIGRATION
 static int btree_migrate_folio(struct address_space *mapping,
 		struct folio *dst, struct folio *src, enum migrate_mode mode)
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index f2dd4c6d9c258b..3b53fc29a85888 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -86,7 +86,6 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
 int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror);
-void btrfs_submit_metadata_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
 #endif
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c7346bc6d8bdd7..66134bb75f8941 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -125,7 +125,7 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 {
 	struct bio *bio;
 	struct bio_vec *bv;
-	struct btrfs_inode *inode;
+	struct inode *inode;
 	int mirror_num;
 
 	if (!bio_ctrl->bio)
@@ -133,7 +133,7 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 
 	bio = bio_ctrl->bio;
 	bv = bio_first_bvec_all(bio);
-	inode = BTRFS_I(bv->bv_page->mapping->host);
+	inode = bv->bv_page->mapping->host;
 	mirror_num = bio_ctrl->mirror_num;
 
 	/* Caller should ensure the bio has at least some range added */
@@ -141,7 +141,7 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 
 	btrfs_bio(bio)->file_offset = page_offset(bv->bv_page) + bv->bv_offset;
 
-	if (!is_data_inode(&inode->vfs_inode)) {
+	if (!is_data_inode(inode)) {
 		if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
 			/*
 			 * For metadata read, we should have the parent_check,
@@ -152,14 +152,15 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 			       bio_ctrl->parent_check,
 			       sizeof(struct btrfs_tree_parent_check));
 		}
-		btrfs_submit_metadata_bio(inode, bio, mirror_num);
-	} else if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-		btrfs_submit_data_write_bio(inode, bio, mirror_num);
-	} else {
-		btrfs_submit_data_read_bio(inode, bio, mirror_num,
-					   bio_ctrl->compress_type);
+		bio->bi_opf |= REQ_META;
 	}
 
+	if (btrfs_op(bio) == BTRFS_MAP_READ &&
+	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
+		btrfs_submit_compressed_read(inode, bio, mirror_num);
+	else
+		btrfs_submit_bio(bio, mirror_num);
+
 	/* The bio is owned by the end_io handler now */
 	bio_ctrl->bio = NULL;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0c3e0ec842933e..383219f51d86ab 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2695,26 +2695,6 @@ blk_status_t btrfs_extract_ordered_extent(struct btrfs_bio *bbio)
 	return errno_to_blk_status(ret);
 }
 
-void btrfs_submit_data_write_bio(struct btrfs_inode *inode, struct bio *bio, int mirror_num)
-{
-	btrfs_submit_bio(bio, mirror_num);
-}
-
-void btrfs_submit_data_read_bio(struct btrfs_inode *inode, struct bio *bio,
-			int mirror_num, enum btrfs_compression_type compress_type)
-{
-	if (compress_type != BTRFS_COMPRESS_NONE) {
-		/*
-		 * btrfs_submit_compressed_read will handle completing the bio
-		 * if there were any errors, so just return here.
-		 */
-		btrfs_submit_compressed_read(&inode->vfs_inode, bio, mirror_num);
-		return;
-	}
-
-	btrfs_submit_bio(bio, mirror_num);
-}
-
 /*
  * given a list of ordered sums record them in the inode.  This happens
  * at IO completion time based on sums calculated at bio submission time.
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 32/34] btrfs: calculate file system wide queue limit for zoned mode
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (30 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 31/34] btrfs: remove now spurious bio submission helpers Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 33/34] btrfs: split zone append bios in btrfs_submit_bio Christoph Hellwig
                   ` (2 subsequent siblings)
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

To be able to split a write into properly sized zone append commands,
we need a queue_limits structure that contains the least common
denominator suitable for all devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/fs.h    |  5 ++++-
 fs/btrfs/zoned.c | 52 ++++++++++++++++++++++++------------------------
 fs/btrfs/zoned.h |  1 -
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 3d8156fc8523f2..4c477eae689148 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -3,6 +3,7 @@
 #ifndef BTRFS_FS_H
 #define BTRFS_FS_H
 
+#include <linux/blkdev.h>
 #include <linux/fs.h>
 #include <linux/btrfs_tree.h>
 #include <linux/sizes.h>
@@ -748,8 +749,10 @@ struct btrfs_fs_info {
 	 */
 	u64 zone_size;
 
-	/* Max size to emit ZONE_APPEND write command */
+	/* Constraints for ZONE_APPEND commands: */
+	struct queue_limits limits;
 	u64 max_zone_append_size;
+
 	struct mutex zoned_meta_io_lock;
 	spinlock_t treelog_bg_lock;
 	u64 treelog_bg;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 5bf67c3c9f846f..d6a8f8a07d7581 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -421,25 +421,6 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache)
 	nr_sectors = bdev_nr_sectors(bdev);
 	zone_info->zone_size_shift = ilog2(zone_info->zone_size);
 	zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors);
-	/*
-	 * We limit max_zone_append_size also by max_segments *
-	 * PAGE_SIZE. Technically, we can have multiple pages per segment. But,
-	 * since btrfs adds the pages one by one to a bio, and btrfs cannot
-	 * increase the metadata reservation even if it increases the number of
-	 * extents, it is safe to stick with the limit.
-	 *
-	 * With the zoned emulation, we can have non-zoned device on the zoned
-	 * mode. In this case, we don't have a valid max zone append size. So,
-	 * use max_segments * PAGE_SIZE as the pseudo max_zone_append_size.
-	 */
-	if (bdev_is_zoned(bdev)) {
-		zone_info->max_zone_append_size = min_t(u64,
-			(u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT,
-			(u64)bdev_max_segments(bdev) << PAGE_SHIFT);
-	} else {
-		zone_info->max_zone_append_size =
-			(u64)bdev_max_segments(bdev) << PAGE_SHIFT;
-	}
 	if (!IS_ALIGNED(nr_sectors, zone_sectors))
 		zone_info->nr_zones++;
 
@@ -719,9 +700,9 @@ static int btrfs_check_for_zoned_device(struct btrfs_fs_info *fs_info)
 
 int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 {
+	struct queue_limits *lim = &fs_info->limits;
 	struct btrfs_device *device;
 	u64 zone_size = 0;
-	u64 max_zone_append_size = 0;
 	int ret;
 
 	/*
@@ -731,6 +712,8 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	if (!btrfs_fs_incompat(fs_info, ZONED))
 		return btrfs_check_for_zoned_device(fs_info);
 
+	blk_set_stacking_limits(lim);
+
 	list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) {
 		struct btrfs_zoned_device_info *zone_info = device->zone_info;
 
@@ -745,10 +728,17 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 				  zone_info->zone_size, zone_size);
 			return -EINVAL;
 		}
-		if (!max_zone_append_size ||
-		    (zone_info->max_zone_append_size &&
-		     zone_info->max_zone_append_size < max_zone_append_size))
-			max_zone_append_size = zone_info->max_zone_append_size;
+
+		/*
+		 * With the zoned emulation, we can have non-zoned device on the
+		 * zoned mode. In this case, we don't have a valid max zone
+		 * append size.
+		 */
+		if (bdev_is_zoned(device->bdev)) {
+			blk_stack_limits(lim,
+					 &bdev_get_queue(device->bdev)->limits,
+					 0);
+		}
 	}
 
 	/*
@@ -769,8 +759,18 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info)
 	}
 
 	fs_info->zone_size = zone_size;
-	fs_info->max_zone_append_size = ALIGN_DOWN(max_zone_append_size,
-						   fs_info->sectorsize);
+	/*
+	 * Also limit max_zone_append_size by max_segments * PAGE_SIZE.
+	 * Technically, we can have multiple pages per segment. But,
+	 * since btrfs adds the pages one by one to a bio, and btrfs cannot
+	 * increase the metadata reservation even if it increases the number of
+	 * extents, it is safe to stick with the limit.
+	 */
+	fs_info->max_zone_append_size = ALIGN_DOWN(
+		min3((u64)lim->max_zone_append_sectors << SECTOR_SHIFT,
+		     (u64)lim->max_sectors << SECTOR_SHIFT,
+		     (u64)lim->max_segments << PAGE_SHIFT),
+		fs_info->sectorsize);
 	fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED;
 	if (fs_info->max_zone_append_size < fs_info->max_extent_size)
 		fs_info->max_extent_size = fs_info->max_zone_append_size;
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index bc93a740e7cf34..f25f332b772859 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -20,7 +20,6 @@ struct btrfs_zoned_device_info {
 	 */
 	u64 zone_size;
 	u8  zone_size_shift;
-	u64 max_zone_append_size;
 	u32 nr_zones;
 	unsigned int max_active_zones;
 	atomic_t active_zones_left;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 33/34] btrfs: split zone append bios in btrfs_submit_bio
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (31 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 32/34] btrfs: calculate file system wide queue limit for zoned mode Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-21  6:50 ` [PATCH 34/34] iomap: remove IOMAP_F_ZONE_APPEND Christoph Hellwig
  2023-01-24 13:22 ` consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

The current btrfs zoned device support is a little cumbersome in the data
I/O path as it requires the callers to not issue I/O larger than the
supported ZONE_APPEND size of the underlying device.  This leads to a lot
of extra accounting.  Instead change btrfs_submit_bio so that it can take
write bios of arbitrary size and form from the upper layers, and just
split them internally to the ZONE_APPEND queue limits.  Then remove all
the upper layer warts catering to limited write sized on zoned devices,
including the extra refcount in the compressed_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/bio.c         |  44 +++++++++-------
 fs/btrfs/compression.c | 112 ++++++++---------------------------------
 fs/btrfs/compression.h |   3 --
 fs/btrfs/extent_io.c   |  72 ++++++--------------------
 fs/btrfs/inode.c       |   4 --
 fs/btrfs/zoned.c       |  20 --------
 fs/btrfs/zoned.h       |   9 ----
 7 files changed, 64 insertions(+), 200 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 38a2d2bb5c2fa9..e6fe1b7dbc504c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -59,13 +59,22 @@ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
 	return bio;
 }
 
-static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length)
+static struct bio *btrfs_split_bio(struct btrfs_fs_info *fs_info,
+				   struct bio *orig, u64 map_length,
+				   bool use_append)
 {
 	struct btrfs_bio *orig_bbio = btrfs_bio(orig);
 	struct bio *bio;
 
-	bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
-			&btrfs_clone_bioset);
+	if (use_append) {
+		unsigned int nr_segs;
+
+		bio = bio_split_rw(orig, &fs_info->limits, &nr_segs,
+				   &btrfs_clone_bioset, map_length);
+	} else {
+		bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
+				&btrfs_clone_bioset);
+	}
 	btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio);
 
 	btrfs_bio(bio)->file_offset = orig_bbio->file_offset;
@@ -399,16 +408,10 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
 	 */
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 		u64 physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
+		u64 zone_start = round_down(physical, dev->fs_info->zone_size);
 
-		if (btrfs_dev_is_sequential(dev, physical)) {
-			u64 zone_start = round_down(physical,
-						    dev->fs_info->zone_size);
-
-			bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT;
-		} else {
-			bio->bi_opf &= ~REQ_OP_ZONE_APPEND;
-			bio->bi_opf |= REQ_OP_WRITE;
-		}
+		ASSERT(btrfs_dev_is_sequential(dev, physical));
+		bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT;
 	}
 	btrfs_debug_in_rcu(dev->fs_info,
 	"%s: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
@@ -606,10 +609,12 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
+	struct btrfs_inode *inode = bbio->inode;
+	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
+	bool use_append = btrfs_use_zone_append(inode, logical);
 	struct btrfs_io_context *bioc = NULL;
 	struct btrfs_io_stripe smap;
 	blk_status_t ret;
@@ -624,8 +629,11 @@ static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 	}
 
 	map_length = min(map_length, length);
+	if (use_append)
+		map_length = min(map_length, fs_info->max_zone_append_size);
+
 	if (map_length < length) {
-		bio = btrfs_split_bio(bio, map_length);
+		bio = btrfs_split_bio(fs_info, bio, map_length, use_append);
 		bbio = btrfs_bio(bio);
 	}
 
@@ -641,7 +649,9 @@ static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 	}
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		if (use_append) {
+			bio->bi_opf &= ~REQ_OP_WRITE;
+			bio->bi_opf |= REQ_OP_ZONE_APPEND;
 			ret = btrfs_extract_ordered_extent(btrfs_bio(bio));
 			if (ret)
 				goto fail_put_bio;
@@ -651,9 +661,9 @@ static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 		 * Csum items for reloc roots have already been cloned at this
 		 * point, so they are handled as part of the no-checksum case.
 		 */
-		if (!(bbio->inode->flags & BTRFS_INODE_NODATASUM) &&
+		if (!(inode->flags & BTRFS_INODE_NODATASUM) &&
 		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
-		    !btrfs_is_data_reloc_root(bbio->inode->root)) {
+		    !btrfs_is_data_reloc_root(inode->root)) {
 			if (should_async_write(bbio) &&
 			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
 				goto done;
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 41c1f8937bf5d6..361ed31db3a1fe 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -258,57 +258,14 @@ static void btrfs_finish_compressed_write_work(struct work_struct *work)
 static void end_compressed_bio_write(struct btrfs_bio *bbio)
 {
 	struct compressed_bio *cb = bbio->private;
+	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
 
-	if (bbio->bio.bi_status)
-		cb->status = bbio->bio.bi_status;
-
-	if (refcount_dec_and_test(&cb->pending_ios)) {
-		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	cb->status = bbio->bio.bi_status;
+	queue_work(fs_info->compressed_write_workers, &cb->write_end_work);
 
-		queue_work(fs_info->compressed_write_workers, &cb->write_end_work);
-	}
 	bio_put(&bbio->bio);
 }
 
-/*
- * Allocate a compressed_bio, which will be used to read/write on-disk
- * (aka, compressed) * data.
- *
- * @cb:                 The compressed_bio structure, which records all the needed
- *                      information to bind the compressed data to the uncompressed
- *                      page cache.
- * @disk_byten:         The logical bytenr where the compressed data will be read
- *                      from or written to.
- * @endio_func:         The endio function to call after the IO for compressed data
- *                      is finished.
- */
-static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
-					blk_opf_t opf,
-					btrfs_bio_end_io_t endio_func)
-{
-	struct bio *bio;
-
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, BTRFS_I(cb->inode), endio_func,
-			      cb);
-	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
-
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
-		struct btrfs_device *device;
-
-		device = btrfs_zoned_get_device(fs_info, disk_bytenr,
-						fs_info->sectorsize);
-		if (IS_ERR(device)) {
-			bio_put(bio);
-			return ERR_CAST(device);
-		}
-
-		bio_set_dev(bio, device->bdev);
-	}
-	refcount_inc(&cb->pending_ios);
-	return bio;
-}
-
 /*
  * worker function to build and submit bios for previously compressed pages.
  * The corresponding pages in the inode should be marked for writeback
@@ -332,16 +289,12 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	struct compressed_bio *cb;
 	u64 cur_disk_bytenr = disk_start;
 	blk_status_t ret = BLK_STS_OK;
-	const bool use_append = btrfs_use_zone_append(inode, disk_start);
-	const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED |
-		(use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE);
 
 	ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
 	       IS_ALIGNED(len, fs_info->sectorsize));
 	cb = kmalloc(sizeof(struct compressed_bio), GFP_NOFS);
 	if (!cb)
 		return BLK_STS_RESOURCE;
-	refcount_set(&cb->pending_ios, 1);
 	cb->status = BLK_STS_OK;
 	cb->inode = &inode->vfs_inode;
 	cb->start = start;
@@ -352,8 +305,16 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	INIT_WORK(&cb->write_end_work, btrfs_finish_compressed_write_work);
 	cb->nr_pages = nr_pages;
 
-	if (blkcg_css)
+	if (blkcg_css) {
 		kthread_associate_blkcg(blkcg_css);
+		write_flags |= REQ_CGROUP_PUNT;
+	}
+
+	write_flags |= REQ_BTRFS_ONE_ORDERED;
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_WRITE | write_flags,
+			      BTRFS_I(cb->inode), end_compressed_bio_write, cb);
+	bio->bi_iter.bi_sector = cur_disk_bytenr >> SECTOR_SHIFT;
+	btrfs_bio(bio)->file_offset = start;
 
 	while (cur_disk_bytenr < disk_start + compressed_len) {
 		u64 offset = cur_disk_bytenr - disk_start;
@@ -361,20 +322,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		unsigned int real_size;
 		unsigned int added;
 		struct page *page = compressed_pages[index];
-		bool submit = false;
-
-		/* Allocate new bio if submitted or not yet allocated */
-		if (!bio) {
-			bio = alloc_compressed_bio(cb, cur_disk_bytenr,
-				bio_op | write_flags, end_compressed_bio_write);
-			if (IS_ERR(bio)) {
-				ret = errno_to_blk_status(PTR_ERR(bio));
-				break;
-			}
-			btrfs_bio(bio)->file_offset = start;
-			if (blkcg_css)
-				bio->bi_opf |= REQ_CGROUP_PUNT;
-		}
+
 		/*
 		 * We have various limits on the real read size:
 		 * - page boundary
@@ -384,35 +332,20 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
-		if (use_append)
-			added = bio_add_zone_append_page(bio, page, real_size,
-					offset_in_page(offset));
-		else
-			added = bio_add_page(bio, page, real_size,
-					offset_in_page(offset));
-		/* Reached zoned boundary */
-		if (added == 0)
-			submit = true;
-
+		added = bio_add_page(bio, page, real_size, offset_in_page(offset));
+		/*
+		 * Maximum compressed extent is smaller than bio size limit,
+		 * thus bio_add_page() should always success.
+		 */
+		ASSERT(added == real_size);
 		cur_disk_bytenr += added;
-
-		/* Finished the range */
-		if (cur_disk_bytenr == disk_start + compressed_len)
-			submit = true;
-
-		if (submit) {
-			ASSERT(bio->bi_iter.bi_size);
-			btrfs_submit_bio(bio, 0);
-			bio = NULL;
-		}
-		cond_resched();
 	}
 
+	/* Finished the range */
+	ASSERT(bio->bi_iter.bi_size);
+	btrfs_submit_bio(bio, 0);
 	if (blkcg_css)
 		kthread_associate_blkcg(NULL);
-
-	if (refcount_dec_and_test(&cb->pending_ios))
-		finish_compressed_bio_write(cb);
 	return ret;
 }
 
@@ -624,7 +557,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		goto out;
 	}
 
-	refcount_set(&cb->pending_ios, 1);
 	cb->status = BLK_STS_OK;
 	cb->inode = inode;
 
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 6209d40a1e08e3..a5e3377db9adc9 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -31,9 +31,6 @@ static_assert((BTRFS_MAX_COMPRESSED % PAGE_SIZE) == 0);
 #define	BTRFS_ZLIB_DEFAULT_LEVEL		3
 
 struct compressed_bio {
-	/* Number of outstanding bios */
-	refcount_t pending_ios;
-
 	/* Number of compressed pages in the array */
 	unsigned int nr_pages;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 66134bb75f8941..477be146b981ec 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -897,7 +897,6 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	u32 real_size;
 	const sector_t sector = disk_bytenr >> SECTOR_SHIFT;
 	bool contig = false;
-	int ret;
 
 	ASSERT(bio);
 	/* The limit should be calculated when bio_ctrl->bio is allocated */
@@ -946,12 +945,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	if (real_size == 0)
 		return 0;
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-		ret = bio_add_zone_append_page(bio, page, real_size, pg_offset);
-	else
-		ret = bio_add_page(bio, page, real_size, pg_offset);
-
-	return ret;
+	return bio_add_page(bio, page, real_size, pg_offset);
 }
 
 static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
@@ -966,7 +960,7 @@ static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 	 * to them.
 	 */
 	if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE &&
-	    bio_op(bio_ctrl->bio) == REQ_OP_ZONE_APPEND) {
+	    btrfs_use_zone_append(inode, logical)) {
 		ordered = btrfs_lookup_ordered_extent(inode, file_offset);
 		if (ordered) {
 			bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX,
@@ -980,16 +974,14 @@ static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 	bio_ctrl->len_to_oe_boundary = U32_MAX;
 }
 
-static int alloc_new_bio(struct btrfs_inode *inode,
-			 struct btrfs_bio_ctrl *bio_ctrl,
-			 struct writeback_control *wbc,
-			 blk_opf_t opf,
-			 u64 disk_bytenr, u32 offset, u64 file_offset,
-			 enum btrfs_compression_type compress_type)
+static void alloc_new_bio(struct btrfs_inode *inode,
+			  struct btrfs_bio_ctrl *bio_ctrl,
+			  struct writeback_control *wbc, blk_opf_t opf,
+			  u64 disk_bytenr, u32 offset, u64 file_offset,
+			  enum btrfs_compression_type compress_type)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct bio *bio;
-	int ret;
 
 	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, inode, bio_ctrl->end_io_func,
 			      NULL);
@@ -1007,40 +999,14 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 
 	if (wbc) {
 		/*
-		 * For Zone append we need the correct block_device that we are
-		 * going to write to set in the bio to be able to respect the
-		 * hardware limitation.  Look it up here:
+		 * Pick the last added device to support cgroup writeback.  For
+		 * multi-device file systems this means blk-cgroup policies have
+		 * to always be set on the last added/replaced device.
+		 * This is a bit odd but has been like that for a long time.
 		 */
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			struct btrfs_device *dev;
-
-			dev = btrfs_zoned_get_device(fs_info, disk_bytenr,
-						     fs_info->sectorsize);
-			if (IS_ERR(dev)) {
-				ret = PTR_ERR(dev);
-				goto error;
-			}
-
-			bio_set_dev(bio, dev->bdev);
-		} else {
-			/*
-			 * Otherwise pick the last added device to support
-			 * cgroup writeback.  For multi-device file systems this
-			 * means blk-cgroup policies have to always be set on the
-			 * last added/replaced device.  This is a bit odd but has
-			 * been like that for a long time.
-			 */
-			bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
-		}
+		bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
 		wbc_init_bio(wbc, bio);
-	} else {
-		ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND);
 	}
-	return 0;
-error:
-	bio_ctrl->bio = NULL;
-	btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
-	return ret;
 }
 
 /*
@@ -1066,7 +1032,6 @@ static int submit_extent_page(blk_opf_t opf,
 			      enum btrfs_compression_type compress_type,
 			      bool force_bio_submit)
 {
-	int ret = 0;
 	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
 	unsigned int cur = pg_offset;
 
@@ -1086,12 +1051,9 @@ static int submit_extent_page(blk_opf_t opf,
 
 		/* Allocate new bio if needed */
 		if (!bio_ctrl->bio) {
-			ret = alloc_new_bio(inode, bio_ctrl, wbc, opf,
-					    disk_bytenr, offset,
-					    page_offset(page) + cur,
-					    compress_type);
-			if (ret < 0)
-				return ret;
+			alloc_new_bio(inode, bio_ctrl, wbc, opf, disk_bytenr,
+				      offset, page_offset(page) + cur,
+				      compress_type);
 		}
 		/*
 		 * We must go through btrfs_bio_add_page() to ensure each
@@ -1648,10 +1610,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		 * find_next_dirty_byte() are all exclusive
 		 */
 		iosize = min(min(em_end, end + 1), dirty_range_end) - cur;
-
-		if (btrfs_use_zone_append(inode, em->block_start))
-			op = REQ_OP_ZONE_APPEND;
-
 		free_extent_map(em);
 		em = NULL;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 383219f51d86ab..36ae541ad51baa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7678,10 +7678,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	iomap->offset = start;
 	iomap->bdev = fs_info->fs_devices->latest_dev->bdev;
 	iomap->length = len;
-
-	if (write && btrfs_use_zone_append(BTRFS_I(inode), em->block_start))
-		iomap->flags |= IOMAP_F_ZONE_APPEND;
-
 	free_extent_map(em);
 
 	return 0;
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index d6a8f8a07d7581..d862477f79f36d 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1845,26 +1845,6 @@ int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical,
 	return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length);
 }
 
-struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
-					    u64 logical, u64 length)
-{
-	struct btrfs_device *device;
-	struct extent_map *em;
-	struct map_lookup *map;
-
-	em = btrfs_get_chunk_map(fs_info, logical, length);
-	if (IS_ERR(em))
-		return ERR_CAST(em);
-
-	map = em->map_lookup;
-	/* We only support single profile for now */
-	device = map->stripes[0].dev;
-
-	free_extent_map(em);
-
-	return device;
-}
-
 /*
  * Activate block group and underlying device zones
  *
diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h
index f25f332b772859..157f46132c56e0 100644
--- a/fs/btrfs/zoned.h
+++ b/fs/btrfs/zoned.h
@@ -66,8 +66,6 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache,
 int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length);
 int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical,
 				  u64 physical_start, u64 physical_pos);
-struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info,
-					    u64 logical, u64 length);
 bool btrfs_zone_activate(struct btrfs_block_group *block_group);
 int btrfs_zone_finish(struct btrfs_block_group *block_group);
 bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags);
@@ -221,13 +219,6 @@ static inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev,
 	return -EOPNOTSUPP;
 }
 
-static inline struct btrfs_device *btrfs_zoned_get_device(
-						  struct btrfs_fs_info *fs_info,
-						  u64 logical, u64 length)
-{
-	return ERR_PTR(-EOPNOTSUPP);
-}
-
 static inline bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 {
 	return true;
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 34/34] iomap: remove IOMAP_F_ZONE_APPEND
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (32 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 33/34] btrfs: split zone append bios in btrfs_submit_bio Christoph Hellwig
@ 2023-01-21  6:50 ` Christoph Hellwig
  2023-01-24 13:22 ` consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-21  6:50 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel, Damien Le Moal

No users left now that btrfs takes REQ_OP_WRITE bios from iomap and
splits and converts them to REQ_OP_ZONE_APPEND internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/iomap/direct-io.c  | 10 ++--------
 include/linux/iomap.h |  3 +--
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9804714b17518e..f771001574d008 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -217,16 +217,10 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 {
 	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
 
-	if (!(dio->flags & IOMAP_DIO_WRITE)) {
-		WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND);
+	if (!(dio->flags & IOMAP_DIO_WRITE))
 		return REQ_OP_READ;
-	}
-
-	if (iomap->flags & IOMAP_F_ZONE_APPEND)
-		opflags |= REQ_OP_ZONE_APPEND;
-	else
-		opflags |= REQ_OP_WRITE;
 
+	opflags |= REQ_OP_WRITE;
 	if (use_fua)
 		opflags |= REQ_FUA;
 	else
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0983dfc9a203c3..fca43a4bd96b77 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -58,8 +58,7 @@ struct vm_fault;
 #define IOMAP_F_SHARED		(1U << 2)
 #define IOMAP_F_MERGED		(1U << 3)
 #define IOMAP_F_BUFFER_HEAD	(1U << 4)
-#define IOMAP_F_ZONE_APPEND	(1U << 5)
-#define IOMAP_F_XATTR		(1U << 6)
+#define IOMAP_F_XATTR		(1U << 5)
 
 /*
  * Flags set by the core iomap code during operations:
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/34] btrfs: better document struct btrfs_bio
  2023-01-21  6:49 ` [PATCH 02/34] btrfs: better document struct btrfs_bio Christoph Hellwig
@ 2023-01-22 10:19   ` Anand Jain
  2023-01-23 15:25   ` Johannes Thumshirn
  1 sibling, 0 replies; 80+ messages in thread
From: Anand Jain @ 2023-01-22 10:19 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

LGTM
Reviewed-by: Anand Jain <anand.jain@oralce.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
@ 2023-01-22 10:20   ` Anand Jain
  2023-01-23 15:26   ` Johannes Thumshirn
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Anand Jain @ 2023-01-22 10:20 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On 21/01/2023 14:50, Christoph Hellwig wrote:
> All btrfs_bio I/Os are associated with an inode.  Add a pointer to that
> inode, which will allow to simplify a lot of calling conventions, and
> which will be needed in the I/O completion path in the future.
> 
> This grow the btrfs_bio struture by a pointer, but that grows will
Nit: s/struture/structure

> be offset by the removal of the device pointer soon.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 02/34] btrfs: better document struct btrfs_bio
  2023-01-21  6:49 ` [PATCH 02/34] btrfs: better document struct btrfs_bio Christoph Hellwig
  2023-01-22 10:19   ` Anand Jain
@ 2023-01-23 15:25   ` Johannes Thumshirn
  1 sibling, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 15:25 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
  2023-01-22 10:20   ` Anand Jain
@ 2023-01-23 15:26   ` Johannes Thumshirn
  2023-01-23 15:47   ` Johannes Thumshirn
  2023-03-07  1:44   ` Qu Wenruo
  3 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 15:26 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:50, Christoph Hellwig wrote:
> This grow the btrfs_bio struture by a pointer, but that grows will

s/struture/structure

Otherwise looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
  2023-01-22 10:20   ` Anand Jain
  2023-01-23 15:26   ` Johannes Thumshirn
@ 2023-01-23 15:47   ` Johannes Thumshirn
  2023-03-07  1:44   ` Qu Wenruo
  3 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 15:47 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization
  2023-01-21  6:50 ` [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization Christoph Hellwig
@ 2023-01-23 15:59   ` Johannes Thumshirn
  2023-01-24 14:55   ` Anand Jain
  1 sibling, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 15:59 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:50, Christoph Hellwig wrote:
> To prepare for pending changes drop the optimization to only look up
> csums once per bio that is submitted from the iomap layer.  In the
> short run this does cause additional lookups for fragmented direct
> reads, but later in the series, the bio based lookup will be used on
> the entire bio submitted from iomap, restoring the old behavior
> in common code.

I was wondering how that would impact performance, when we do a csum
tree lookup on every DIO read until I realized you already accounted
for that in the commit message.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums
  2023-01-21  6:50 ` [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums Christoph Hellwig
@ 2023-01-23 16:06   ` Johannes Thumshirn
  2023-01-24 15:16   ` Anand Jain
  1 sibling, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:06 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio
  2023-01-21  6:50 ` [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio Christoph Hellwig
@ 2023-01-23 16:13   ` Johannes Thumshirn
  2023-01-24 15:27   ` Anand Jain
  1 sibling, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:13 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code
  2023-01-21  6:50 ` [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code Christoph Hellwig
@ 2023-01-23 16:18   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:18 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio
  2023-01-21  6:50 ` [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio Christoph Hellwig
@ 2023-01-23 16:32   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:32 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper
  2023-01-21  6:50 ` [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper Christoph Hellwig
@ 2023-01-23 16:36   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:36 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:51, Christoph Hellwig wrote:
> +	if (btrfs_check_data_csum(inode, bbio, bio_offset, bv->bv_page,
> +				  bv->bv_offset) < 0)
> +		return false;
> +	return true;
> +}
> +
> +

Nit: double newline

>  /*
>   * When reads are done, we need to check csums to verify the data is correct.
>   * if there's a match, we allow the bio to finish.  If not, the code in

Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer
  2023-01-21  6:50 ` [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer Christoph Hellwig
@ 2023-01-23 16:39   ` Johannes Thumshirn
  2023-01-24  6:47     ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:39 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:51, Christoph Hellwig wrote:
> Currently btrfs handles checksum validation and repair in the end I/O
> handler for the btrfs_bio.  This leads to a lot of duplicate code
> plus issues with variying semantics or bugs, e.g.

s/variying/varying


> buffered I/O this now means that a large readahead request can
> fail due to a single bad sector, but as readahead errors are igored

s/igored/ignored/

> the following readpage if the sector is actually accessed will

s/if/of

> still be able to read.  This also matches the I/O failure handling
> in other file systems.

Sorry for not having spotted these earlier.

Anyways, code still looks good:
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 11/34] btrfs: remove btrfs_bio_free_csum
  2023-01-21  6:50 ` [PATCH 11/34] btrfs: remove btrfs_bio_free_csum Christoph Hellwig
@ 2023-01-23 16:41   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:41 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector
  2023-01-21  6:50 ` [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector Christoph Hellwig
@ 2023-01-23 16:42   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:42 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/34] btrfs: remove now unused checksumming helpers
  2023-01-21  6:50 ` [PATCH 13/34] btrfs: remove now unused checksumming helpers Christoph Hellwig
@ 2023-01-23 16:49   ` Johannes Thumshirn
  2023-01-23 16:53     ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-23 16:49 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:51, Christoph Hellwig wrote:
> -	if (btrfs_check_data_csum(inode, bbio, bio_offset, bv->bv_page,
> -				  bv->bv_offset) < 0)
> -		return false;
> +	csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
> +	if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
> +				    csum_expected))
> +		goto zeroit;
>  	return true;

We could even go as far as that:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c63160cf135d..9f384551b722 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3449,13 +3449,6 @@ int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
        return 0;
 }
 
-static u8 *btrfs_csum_ptr(const struct btrfs_fs_info *fs_info, u8 *csums, u64 offset)
-{
-       u64 offset_in_sectors = offset >> fs_info->sectorsize_bits;
-
-       return csums + offset_in_sectors * fs_info->csum_size;
-}
-
 /*
  * btrfs_data_csum_ok - verify the checksum of single data sector
  * @bbio:      btrfs_io_bio which contains the csum
@@ -3493,7 +3486,8 @@ bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
                return true;
        }
 
-       csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset);
+       csum_expected = bbio->csum + (bio_offset >> fs_info->sectorsize_bits) *
+                               fs_info->csum_size;
        if (btrfs_check_sector_csum(fs_info, bv->bv_page, bv->bv_offset, csum,
                                    csum_expected))
                goto zeroit;

Anyways that's splitting hairs,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/34] btrfs: remove now unused checksumming helpers
  2023-01-23 16:49   ` Johannes Thumshirn
@ 2023-01-23 16:53     ` Christoph Hellwig
  2023-01-24  8:33       ` Johannes Thumshirn
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-23 16:53 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On Mon, Jan 23, 2023 at 04:49:30PM +0000, Johannes Thumshirn wrote:
> We could even go as far as that:

That looks nice.  Care to send an incremental patch once the series is
in?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer
  2023-01-23 16:39   ` Johannes Thumshirn
@ 2023-01-24  6:47     ` Christoph Hellwig
  2023-01-24  8:16       ` Johannes Thumshirn
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-24  6:47 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On Mon, Jan 23, 2023 at 04:39:29PM +0000, Johannes Thumshirn wrote:
> > the following readpage if the sector is actually accessed will
> 
> s/if/of

This is really supposed to be an if - readahead is speculative,
and thus the actual setor might not actually really be needed in the
end.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer
  2023-01-24  6:47     ` Christoph Hellwig
@ 2023-01-24  8:16       ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24  8:16 UTC (permalink / raw)
  To: hch
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 24.01.23 07:48, Christoph Hellwig wrote:
> On Mon, Jan 23, 2023 at 04:39:29PM +0000, Johannes Thumshirn wrote:
>>> the following readpage if the sector is actually accessed will
>>
>> s/if/of
> 
> This is really supposed to be an if - readahead is speculative,
> and thus the actual setor might not actually really be needed in the
> end.
> 

Ah OK. My bad.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/34] btrfs: remove now unused checksumming helpers
  2023-01-23 16:53     ` Christoph Hellwig
@ 2023-01-24  8:33       ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24  8:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Damien Le Moal,
	Naohiro Aota, Qu Wenruo, Jens Axboe, Darrick J. Wong,
	linux-block, linux-btrfs, linux-fsdevel

On 23.01.23 17:53, Christoph Hellwig wrote:
> On Mon, Jan 23, 2023 at 04:49:30PM +0000, Johannes Thumshirn wrote:
>> We could even go as far as that:
> 
> That looks nice.  Care to send an incremental patch once the series is
> in?
> 

Sure can do, but it's untested as of now. Although I just folded 
btrfs_csum_ptr() into it's only caller.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio Christoph Hellwig
@ 2023-01-24 11:01   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24 11:01 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:51, Christoph Hellwig wrote:
> The device field is only used by the simple end I/O handler, and for
> that it can simply be stored in the bi_private field of the bio,
> which is currently used for the fs_info that can be retreived through

 s/retreived/retrieved/

Other than that,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/34] btrfs: remove the io_failure_record infrastructure
  2023-01-21  6:50 ` [PATCH 15/34] btrfs: remove the io_failure_record infrastructure Christoph Hellwig
@ 2023-01-24 11:04   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24 11:04 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On 21.01.23 07:51, Christoph Hellwig wrote:
> strutc io_failure_record and the io_failure_tree tree are unused now,

s/strutc/struct/

> so remove them.
> 

Otherwise looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio Christoph Hellwig
@ 2023-01-24 11:09   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24 11:09 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/34] btrfs: remove the is_metadata flag in struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 17/34] btrfs: remove the is_metadata flag " Christoph Hellwig
@ 2023-01-24 11:13   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24 11:13 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 18/34] btrfs: remove the submit_bio_start helpers
  2023-01-21  6:50 ` [PATCH 18/34] btrfs: remove the submit_bio_start helpers Christoph Hellwig
@ 2023-01-24 11:49   ` Johannes Thumshirn
  0 siblings, 0 replies; 80+ messages in thread
From: Johannes Thumshirn @ 2023-01-24 11:49 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: consolidate btrfs checksumming, repair and bio splitting v4
  2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
                   ` (33 preceding siblings ...)
  2023-01-21  6:50 ` [PATCH 34/34] iomap: remove IOMAP_F_ZONE_APPEND Christoph Hellwig
@ 2023-01-24 13:22 ` Christoph Hellwig
  34 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-24 13:22 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On Sat, Jan 21, 2023 at 07:49:57AM +0100, Christoph Hellwig wrote:
> A git tree is also available:
> 
>     git://git.infradead.org/users/hch/misc.git btrfs-bio-split
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-bio-split

I've update the git tree with the typos pointed out during the reviews
fixed, and the Review tags added.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization
  2023-01-21  6:50 ` [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization Christoph Hellwig
  2023-01-23 15:59   ` Johannes Thumshirn
@ 2023-01-24 14:55   ` Anand Jain
  2023-01-24 19:55     ` Christoph Hellwig
  1 sibling, 1 reply; 80+ messages in thread
From: Anand Jain @ 2023-01-24 14:55 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On 21/01/2023 14:50, Christoph Hellwig wrote:
> To prepare for pending changes drop the optimization to only look up
> csums once per bio that is submitted from the iomap layer.  In the
> short run this does cause additional lookups for fragmented direct
> reads, but later in the series, the bio based lookup will be used on
> the entire bio submitted from iomap, restoring the old behavior
> in common code.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Anand Jain <anand.jain@oracle.com>


I was curious about the commit message.
I ran fio to test the performance before and after the change.
The results were similar.

fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1 
--size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16

before this patch
READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s), 
io=481MiB (504MB), run=60017-60017msec

after this patch
READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s), 
io=490MiB (513MB), run=60013-60013msec

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums
  2023-01-21  6:50 ` [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums Christoph Hellwig
  2023-01-23 16:06   ` Johannes Thumshirn
@ 2023-01-24 15:16   ` Anand Jain
  1 sibling, 0 replies; 80+ messages in thread
From: Anand Jain @ 2023-01-24 15:16 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel


Looks good.
Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio
  2023-01-21  6:50 ` [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio Christoph Hellwig
  2023-01-23 16:13   ` Johannes Thumshirn
@ 2023-01-24 15:27   ` Anand Jain
  1 sibling, 0 replies; 80+ messages in thread
From: Anand Jain @ 2023-01-24 15:27 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

looks good.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization
  2023-01-24 14:55   ` Anand Jain
@ 2023-01-24 19:55     ` Christoph Hellwig
  2023-01-25  7:42       ` Anand Jain
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-24 19:55 UTC (permalink / raw)
  To: Anand Jain
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On Tue, Jan 24, 2023 at 10:55:25PM +0800, Anand Jain wrote:
>
> I was curious about the commit message.
> I ran fio to test the performance before and after the change.
> The results were similar.
>
> fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1 
> --size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16
>
> before this patch
> READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s), 
> io=481MiB (504MB), run=60017-60017msec
>
> after this patch
> READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s), 
> io=490MiB (513MB), run=60013-60013msec

That's 4k reads.  The will benefit from the inline csum array in the
btrfs_bio, but won't benefit from the existing batching, so this is
kind of expected.

The good news is that the final series will still use the inline
csum array for small reads, while also only doing a single csum tree
lookup for larger reads, so you'll get the best of both worlds.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization
  2023-01-24 19:55     ` Christoph Hellwig
@ 2023-01-25  7:42       ` Anand Jain
  0 siblings, 0 replies; 80+ messages in thread
From: Anand Jain @ 2023-01-25  7:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel



On 25/01/2023 03:55, Christoph Hellwig wrote:
> On Tue, Jan 24, 2023 at 10:55:25PM +0800, Anand Jain wrote:
>>
>> I was curious about the commit message.
>> I ran fio to test the performance before and after the change.
>> The results were similar.
>>
>> fio --group_reporting=1 --directory /mnt/test --name dioread --direct=1
>> --size=1g --rw=read  --runtime=60 --iodepth=1024 --nrfiles=16 --numjobs=16
>>
>> before this patch
>> READ: bw=8208KiB/s (8405kB/s), 8208KiB/s-8208KiB/s (8405kB/s-8405kB/s),
>> io=481MiB (504MB), run=60017-60017msec
>>
>> after this patch
>> READ: bw=8353KiB/s (8554kB/s), 8353KiB/s-8353KiB/s (8554kB/s-8554kB/s),
>> io=490MiB (513MB), run=60013-60013msec
> 
> That's 4k reads.  The will benefit from the inline csum array in the
> btrfs_bio, but won't benefit from the existing batching, so this is
> kind of expected.
> 
> The good news is that the final series will still use the inline
> csum array for small reads, while also only doing a single csum tree
> lookup for larger reads, so you'll get the best of both worlds.
> 

Ok. Got this results for the whole series from an aarch64 
(pagesize=64k); Results finds little improvement/same.


Before:
Last commit:
b3b1ba7b8c0d btrfs: skip backref walking during fiemap if we know the 
leaf is shared

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=0 /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6052MiB/s (6346MB/s), 6052MiB/s-6052MiB/s 
(6346MB/s-6346MB/s), io=16.0GiB (17.2GB), run=2707-2707msec

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=64K /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6139MiB/s (6437MB/s), 6139MiB/s-6139MiB/s 
(6437MB/s-6437MB/s), io=16.0GiB (17.2GB), run=2669-2669msec


After:
last commit:
b488ab9aed15 iomap: remove IOMAP_F_ZONE_APPEND

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount -o max_inline=0 /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6100MiB/s (6396MB/s), 6100MiB/s-6100MiB/s 
(6396MB/s-6396MB/s), io=16.0GiB (17.2GB), run=2686-2686msec

---- mkfs.btrfs /dev/vdb ..... :0 ----
---- mount /dev/vdb /btrfs ..... :0 ----
---- fio --group_reporting=1 --directory /btrfs --name dioread 
--direct=1 --size=1g --rw=read --runtime=60 --iodepth=1024 --nrfiles=16 
--numjobs=16 | egrep "fio|READ" ..... :0 ----
fio-3.19
    READ: bw=6157MiB/s (6456MB/s), 6157MiB/s-6157MiB/s 
(6456MB/s-6456MB/s), io=16.0GiB (17.2GB), run=2661-2661msec




^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-21  6:50 ` [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios Christoph Hellwig
@ 2023-01-25 21:51   ` Josef Bacik
  2023-01-26  5:21     ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Josef Bacik @ 2023-01-25 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, David Sterba, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Qu Wenruo, Jens Axboe, Darrick J. Wong,
	linux-block, linux-btrfs, linux-fsdevel

On Sat, Jan 21, 2023 at 07:50:20AM +0100, Christoph Hellwig wrote:
> Currently the I/O submitters have to split bios according to the
> chunk stripe boundaries.  This leads to extra lookups in the extent
> trees and a lot of boilerplate code.
> 
> To drop this requirement, split the bio when __btrfs_map_block
> returns a mapping that is smaller than the requested size and
> keep a count of pending bios in the original btrfs_bio so that
> the upper level completion is only invoked when all clones have
> completed.
> 
> Based on a patch from Qu Wenruo.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/bio.c | 108 ++++++++++++++++++++++++++++++++++++++++---------
>  fs/btrfs/bio.h |   1 +
>  2 files changed, 91 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index c7522ac7e0e71c..ff42b783902140 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -17,6 +17,7 @@
>  #include "file-item.h"
>  
>  static struct bio_set btrfs_bioset;
> +static struct bio_set btrfs_clone_bioset;
>  static struct bio_set btrfs_repair_bioset;
>  static mempool_t btrfs_failed_bio_pool;
>  
> @@ -38,6 +39,7 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio,
>  	bbio->inode = inode;
>  	bbio->end_io = end_io;
>  	bbio->private = private;
> +	atomic_set(&bbio->pending_ios, 1);
>  }
>  
>  /*
> @@ -75,6 +77,58 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
>  	return bio;
>  }
>  
> +static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length)
> +{
> +	struct btrfs_bio *orig_bbio = btrfs_bio(orig);
> +	struct bio *bio;
> +
> +	bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS,
> +			&btrfs_clone_bioset);
> +	btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio);
> +
> +	btrfs_bio(bio)->file_offset = orig_bbio->file_offset;
> +	if (!(orig->bi_opf & REQ_BTRFS_ONE_ORDERED))
> +		orig_bbio->file_offset += map_length;
> +
> +	atomic_inc(&orig_bbio->pending_ios);
> +	return bio;
> +}
> +
> +static void btrfs_orig_write_end_io(struct bio *bio);
> +static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio,
> +				       struct btrfs_bio *orig_bbio)
> +{
> +	/*
> +	 * For writes btrfs tolerates nr_mirrors - 1 write failures, so we
> +	 * can't just blindly propagate a write failure here.
> +	 * Instead increment the error count in the original I/O context so
> +	 * that it is guaranteed to be larger than the error tolerance.
> +	 */
> +	if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) {
> +		struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private;
> +		struct btrfs_io_context *orig_bioc = orig_stripe->bioc;
> +
> +		atomic_add(orig_bioc->max_errors, &orig_bioc->error);
> +	} else {
> +		orig_bbio->bio.bi_status = bbio->bio.bi_status;
> +	}
> +}
> +
> +static void btrfs_orig_bbio_end_io(struct btrfs_bio *bbio)
> +{
> +	if (bbio->bio.bi_pool == &btrfs_clone_bioset) {
> +		struct btrfs_bio *orig_bbio = bbio->private;
> +
> +		if (bbio->bio.bi_status)
> +			btrfs_bbio_propagate_error(bbio, orig_bbio);
> +		bio_put(&bbio->bio);
> +		bbio = orig_bbio;
> +	}
> +
> +	if (atomic_dec_and_test(&bbio->pending_ios))
> +		bbio->end_io(bbio);
> +}
> +
>  static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
>  {
>  	if (cur_mirror == fbio->num_copies)
> @@ -92,7 +146,7 @@ static int prev_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror)
>  static void btrfs_repair_done(struct btrfs_failed_bio *fbio)
>  {
>  	if (atomic_dec_and_test(&fbio->repair_count)) {
> -		fbio->bbio->end_io(fbio->bbio);
> +		btrfs_orig_bbio_end_io(fbio->bbio);
>  		mempool_free(fbio, &btrfs_failed_bio_pool);
>  	}
>  }
> @@ -232,7 +286,7 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio,
>  	if (unlikely(fbio))
>  		btrfs_repair_done(fbio);
>  	else
> -		bbio->end_io(bbio);
> +		btrfs_orig_bbio_end_io(bbio);
>  }
>  
>  static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
> @@ -286,7 +340,7 @@ static void btrfs_simple_end_io(struct bio *bio)
>  	} else {
>  		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
>  			btrfs_record_physical_zoned(bbio);
> -		bbio->end_io(bbio);
> +		btrfs_orig_bbio_end_io(bbio);
>  	}
>  }
>  
> @@ -300,7 +354,7 @@ static void btrfs_raid56_end_io(struct bio *bio)
>  	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
>  		btrfs_check_read_bio(bbio, NULL);
>  	else
> -		bbio->end_io(bbio);
> +		btrfs_orig_bbio_end_io(bbio);
>  
>  	btrfs_put_bioc(bioc);
>  }
> @@ -327,7 +381,7 @@ static void btrfs_orig_write_end_io(struct bio *bio)
>  	else
>  		bio->bi_status = BLK_STS_OK;
>  
> -	bbio->end_io(bbio);
> +	btrfs_orig_bbio_end_io(bbio);
>  	btrfs_put_bioc(bioc);
>  }
>  
> @@ -492,7 +546,7 @@ static void run_one_async_done(struct btrfs_work *work)
>  
>  	/* If an error occurred we just want to clean up the bio and move on */
>  	if (bio->bi_status) {
> -		btrfs_bio_end_io(async->bbio, bio->bi_status);
> +		btrfs_orig_bbio_end_io(async->bbio);
>  		return;
>  	}
>  
> @@ -567,8 +621,8 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
>  	return true;
>  }
>  
> -void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
> -		      int mirror_num)
> +static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
> +			       int mirror_num)
>  {
>  	struct btrfs_bio *bbio = btrfs_bio(bio);
>  	u64 logical = bio->bi_iter.bi_sector << 9;
> @@ -587,11 +641,10 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>  		goto fail;
>  	}
>  
> +	map_length = min(map_length, length);
>  	if (map_length < length) {
> -		btrfs_crit(fs_info,
> -			   "mapping failed logical %llu bio len %llu len %llu",
> -			   logical, length, map_length);
> -		BUG();
> +		bio = btrfs_split_bio(bio, map_length);
> +		bbio = btrfs_bio(bio);
>  	}
>  
>  	/*
> @@ -602,14 +655,14 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>  		bbio->saved_iter = bio->bi_iter;
>  		ret = btrfs_lookup_bio_sums(bbio);
>  		if (ret)
> -			goto fail;
> +			goto fail_put_bio;
>  	}
>  
>  	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
>  		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
>  			ret = btrfs_extract_ordered_extent(btrfs_bio(bio));
>  			if (ret)
> -				goto fail;
> +				goto fail_put_bio;
>  		}
>  
>  		/*
> @@ -621,20 +674,33 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>  		    !btrfs_is_data_reloc_root(bbio->inode->root)) {
>  			if (should_async_write(bbio) &&
>  			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
> -				return;
> +				goto done;
>  
>  			ret = btrfs_bio_csum(bbio);
>  			if (ret)
> -				goto fail;
> +				goto fail_put_bio;
>  		}
>  	}
>  
>  	__btrfs_submit_bio(bio, bioc, &smap, mirror_num);
> -	return;
> +done:
> +	return map_length == length;
>  
> +fail_put_bio:
> +	if (map_length < length)
> +		bio_put(bio);

This is causing a panic in btrfs/125 because you set bbio to
btrfs_bio(split_bio), which has a NULL end_io.  You need something like the
following so that we're ending the correct bbio.  Thanks,

Josef

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 5d4b67fc44f4..f3a357c48e69 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -607,6 +607,7 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
 			       int mirror_num)
 {
 	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_bio *orig_bbio = bbio;
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
@@ -673,7 +674,7 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio,
 		bio_put(bio);
 fail:
 	btrfs_bio_counter_dec(fs_info);
-	btrfs_bio_end_io(bbio, ret);
+	btrfs_bio_end_io(orig_bbio, ret);
 	/* Do not submit another chunk */
 	return true;
 }

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-25 21:51   ` Josef Bacik
@ 2023-01-26  5:21     ` Christoph Hellwig
  2023-01-26 17:43       ` Josef Bacik
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-26  5:21 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, Chris Mason, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On Wed, Jan 25, 2023 at 04:51:16PM -0500, Josef Bacik wrote:
> This is causing a panic in btrfs/125 because you set bbio to
> btrfs_bio(split_bio), which has a NULL end_io.  You need something like the
> following so that we're ending the correct bbio.  Thanks,

Just curious, what are the other configuration details as I've never been
able to hit it?

The fix itself looks good.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-26  5:21     ` Christoph Hellwig
@ 2023-01-26 17:43       ` Josef Bacik
  2023-01-26 17:46         ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Josef Bacik @ 2023-01-26 17:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, David Sterba, Damien Le Moal, Naohiro Aota,
	Johannes Thumshirn, Qu Wenruo, Jens Axboe, Darrick J. Wong,
	linux-block, linux-btrfs, linux-fsdevel

On Thu, Jan 26, 2023 at 06:21:43AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 25, 2023 at 04:51:16PM -0500, Josef Bacik wrote:
> > This is causing a panic in btrfs/125 because you set bbio to
> > btrfs_bio(split_bio), which has a NULL end_io.  You need something like the
> > following so that we're ending the correct bbio.  Thanks,
> 
> Just curious, what are the other configuration details as I've never been
> able to hit it?
> 
> The fix itself looks good.

I reproduced it on the CI setup I've got for us, this was the config

[btrfs_normal_freespacetree]
TEST_DIR=/mnt/test
TEST_DEV=/dev/mapper/vg0-lv0
SCRATCH_DEV_POOL="/dev/mapper/vg0-lv7 /dev/mapper/vg0-lv6 /dev/mapper/vg0-lv5 /dev/mapper/vg0-lv4 /dev/mapper/vg0-lv3 /dev/mapper/vg0-lv2 /dev/mapper/vg0-lv1 "
SCRATCH_MNT=/mnt/scratch
LOGWRITES_DEV=/dev/mapper/vg0-lv8
PERF_CONFIGNAME=jbacik
MKFS_OPTIONS="-K -f -O ^no-holes"
MOUNT_OPTIONS="-o space_cache=v2"
FSTYP=btrfs

I actually hadn't been running 125 because it wasn't in the auto group, Dave
noticed it, I just tried it on this VM and hit it right away.  No worries,
that's why we have the CI stuff, sometimes it just doesn't trigger for us but
will trigger with the CI setup.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-26 17:43       ` Josef Bacik
@ 2023-01-26 17:46         ` Christoph Hellwig
  2023-01-26 18:33           ` David Sterba
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-26 17:46 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Christoph Hellwig, Chris Mason, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On Thu, Jan 26, 2023 at 12:43:01PM -0500, Josef Bacik wrote:
> I actually hadn't been running 125 because it wasn't in the auto group, Dave
> noticed it, I just tried it on this VM and hit it right away.  No worries,
> that's why we have the CI stuff, sometimes it just doesn't trigger for us but
> will trigger with the CI setup.  Thanks,

Oh, I guess the lack of auto group means I've never tested it.  But
it's a fairly bad bug, and I'm surprised nothing in auto hits an
error after a bio split.  I'll need to find out if I can find a simpler
reproducer as this warrants a regression test.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-26 17:46         ` Christoph Hellwig
@ 2023-01-26 18:33           ` David Sterba
  2023-01-27  7:02             ` test not in the auto group, was: " Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: David Sterba @ 2023-01-26 18:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Josef Bacik, Chris Mason, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel

On Thu, Jan 26, 2023 at 06:46:11PM +0100, Christoph Hellwig wrote:
> On Thu, Jan 26, 2023 at 12:43:01PM -0500, Josef Bacik wrote:
> > I actually hadn't been running 125 because it wasn't in the auto group, Dave
> > noticed it, I just tried it on this VM and hit it right away.  No worries,
> > that's why we have the CI stuff, sometimes it just doesn't trigger for us but
> > will trigger with the CI setup.  Thanks,
> 
> Oh, I guess the lack of auto group means I've never tested it.  But
> it's a fairly bad bug, and I'm surprised nothing in auto hits an
> error after a bio split.  I'll need to find out if I can find a simpler
> reproducer as this warrants a regression test.

The 'auto' group is good for first tests, I'm running 'check -g all' on
my VM setups. If this is enough to trigger errors then we probably don't
need a separate regression test.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* test not in the auto group, was: Re: [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios
  2023-01-26 18:33           ` David Sterba
@ 2023-01-27  7:02             ` Christoph Hellwig
  0 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-01-27  7:02 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, Chris Mason, David Sterba, Darrick J. Wong,
	linux-btrfs, linux-fsdevel, fstests

On Thu, Jan 26, 2023 at 07:33:04PM +0100, David Sterba wrote:
> > Oh, I guess the lack of auto group means I've never tested it.  But
> > it's a fairly bad bug, and I'm surprised nothing in auto hits an
> > error after a bio split.  I'll need to find out if I can find a simpler
> > reproducer as this warrants a regression test.
> 
> The 'auto' group is good for first tests, I'm running 'check -g all' on
> my VM setups. If this is enough to trigger errors then we probably don't
> need a separate regression test.

Hmm.  The xfstests README says:

"By default the tests suite will run all the tests in the auto group. These
 are the tests that are expected to function correctly as regression tests,
 and it excludes tests that exercise conditions known to cause machine
 failures (i.e. the "dangerous" tests)."

and my assumptions over decades of xfstests use has been that only tests
that are broken, non-deterministic, or cause recent upstream kernels
to crash are not in auto.

Is there some kind of different rule for btrfs?  e.g. btrfs/125
seems to complete quickly and does not actually seem to be dangerous.
Besides that there's btrfs/185, which is very quick fuzzer, and btrfs/198
which is a fairly normal test as far as I can tell.

The generic tests also have a few !auto tests that look like they
should be mostly in the auto group as well, in addition to a few broken
and dangerous ones, and the blockdev ones from Darrick that should
probably move to blktests.

XFS mostly seems to have dangerous fuzzer tests in the !auto category.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
                     ` (2 preceding siblings ...)
  2023-01-23 15:47   ` Johannes Thumshirn
@ 2023-03-07  1:44   ` Qu Wenruo
  2023-03-07 14:41     ` Christoph Hellwig
  3 siblings, 1 reply; 80+ messages in thread
From: Qu Wenruo @ 2023-03-07  1:44 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel



On 2023/1/21 14:50, Christoph Hellwig wrote:
> All btrfs_bio I/Os are associated with an inode.  Add a pointer to that
> inode, which will allow to simplify a lot of calling conventions, and
> which will be needed in the I/O completion path in the future.
> 
> This grow the btrfs_bio struture by a pointer, but that grows will
> be offset by the removal of the device pointer soon.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

With my recent restart on scrub rework, this patch makes me wonder, what 
if scrub wants to use btrfs_bio, but don't want to pass a valid 
btrfs_inode pointer?

E.g. scrub code just wants to read certain mirror of a logical bytenr.
This can simplify the handling of RAID56, as for data stripes the repair 
path is the same, just try the next mirror(s).

Furthermore most of the new btrfs_bio code is handling data reads by 
triggering read-repair automatically.
This can be unnecessary for scrub.

For now, I can workaround the behavior by setting REQ_META and pass
btree_inode as the inode, but this is only a workaround.
This can be problematic especially if we want to merge metadata and data 
verification behavior.

Any better ideas on this?


And since we're here, can we also have btrfs equivalent of on-stack bio?
As scrub can benefit a lot from that, as for sector-by-sector read, we 
want to avoid repeating allocating/freeing a btrfs_bio just for reading 
one sector.
(The existing behavior is using on-stack bio with bio_init/bio_uninit 
inside scrub_recheck_block())

Thanks,
Qu
> ---
>   fs/btrfs/bio.c         | 8 ++++++--
>   fs/btrfs/bio.h         | 5 ++++-
>   fs/btrfs/compression.c | 3 ++-
>   fs/btrfs/extent_io.c   | 8 ++++----
>   fs/btrfs/inode.c       | 4 +++-
>   5 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
> index 8affc88b0e0a4b..2398bb263957b2 100644
> --- a/fs/btrfs/bio.c
> +++ b/fs/btrfs/bio.c
> @@ -22,9 +22,11 @@ static struct bio_set btrfs_bioset;
>    * is already initialized by the block layer.
>    */
>   static inline void btrfs_bio_init(struct btrfs_bio *bbio,
> +				  struct btrfs_inode *inode,
>   				  btrfs_bio_end_io_t end_io, void *private)
>   {
>   	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
> +	bbio->inode = inode;
>   	bbio->end_io = end_io;
>   	bbio->private = private;
>   }
> @@ -37,16 +39,18 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio,
>    * a mempool.
>    */
>   struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
> +			    struct btrfs_inode *inode,
>   			    btrfs_bio_end_io_t end_io, void *private)
>   {
>   	struct bio *bio;
>   
>   	bio = bio_alloc_bioset(NULL, nr_vecs, opf, GFP_NOFS, &btrfs_bioset);
> -	btrfs_bio_init(btrfs_bio(bio), end_io, private);
> +	btrfs_bio_init(btrfs_bio(bio), inode, end_io, private);
>   	return bio;
>   }
>   
>   struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
> +				    struct btrfs_inode *inode,
>   				    btrfs_bio_end_io_t end_io, void *private)
>   {
>   	struct bio *bio;
> @@ -56,7 +60,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
>   
>   	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
>   	bbio = btrfs_bio(bio);
> -	btrfs_bio_init(bbio, end_io, private);
> +	btrfs_bio_init(bbio, inode, end_io, private);
>   
>   	bio_trim(bio, offset >> 9, size >> 9);
>   	bbio->iter = bio->bi_iter;
> diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
> index baaa27961cc812..8d69d0b226d99b 100644
> --- a/fs/btrfs/bio.h
> +++ b/fs/btrfs/bio.h
> @@ -41,7 +41,8 @@ struct btrfs_bio {
>   	unsigned int is_metadata:1;
>   	struct bvec_iter iter;
>   
> -	/* File offset that this I/O operates on. */
> +	/* Inode and offset into it that this I/O operates on. */
> +	struct btrfs_inode *inode;
>   	u64 file_offset;
>   
>   	/* @device is for stripe IO submission. */
> @@ -80,8 +81,10 @@ int __init btrfs_bioset_init(void);
>   void __cold btrfs_bioset_exit(void);
>   
>   struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf,
> +			    struct btrfs_inode *inode,
>   			    btrfs_bio_end_io_t end_io, void *private);
>   struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
> +				    struct btrfs_inode *inode,
>   				    btrfs_bio_end_io_t end_io, void *private);
>   
>   
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 4a5aeb8dd4793a..b8e3e899974b34 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -344,7 +344,8 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
>   	struct bio *bio;
>   	int ret;
>   
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, endio_func, cb);
> +	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, BTRFS_I(cb->inode), endio_func,
> +			      cb);
>   	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
>   
>   	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 9bd32daa9b9a6f..faf9312a46c0e1 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -740,7 +740,8 @@ int btrfs_repair_one_sector(struct btrfs_inode *inode, struct btrfs_bio *failed_
>   		return -EIO;
>   	}
>   
> -	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
> +	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->inode,
> +				     failed_bbio->end_io,
>   				     failed_bbio->private);
>   	repair_bbio = btrfs_bio(repair_bio);
>   	repair_bbio->file_offset = start;
> @@ -1394,9 +1395,8 @@ static int alloc_new_bio(struct btrfs_inode *inode,
>   	struct bio *bio;
>   	int ret;
>   
> -	ASSERT(bio_ctrl->end_io_func);
> -
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, bio_ctrl->end_io_func, NULL);
> +	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, inode, bio_ctrl->end_io_func,
> +			      NULL);
>   	/*
>   	 * For compressed page range, its disk_bytenr is always @disk_bytenr
>   	 * passed in, no matter if we have added any range into previous bio.
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3c49742f0d4556..0a85e42f114cc5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8097,7 +8097,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   		 * the allocation is backed by btrfs_bioset.
>   		 */
>   		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len,
> -					      btrfs_end_dio_bio, dip);
> +					      BTRFS_I(inode), btrfs_end_dio_bio,
> +					      dip);
>   		btrfs_bio(bio)->file_offset = file_offset;
>   
>   		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> @@ -10409,6 +10410,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
>   
>   			if (!bio) {
>   				bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ,
> +						      inode,
>   						      btrfs_encoded_read_endio,
>   						      &priv);
>   				bio->bi_iter.bi_sector =

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-07  1:44   ` Qu Wenruo
@ 2023-03-07 14:41     ` Christoph Hellwig
  2023-03-07 22:21       ` Qu Wenruo
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-03-07 14:41 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On Tue, Mar 07, 2023 at 09:44:32AM +0800, Qu Wenruo wrote:
> With my recent restart on scrub rework, this patch makes me wonder, what if 
> scrub wants to use btrfs_bio, but don't want to pass a valid btrfs_inode 
> pointer?

The full inode is only really needed for the data repair code.  But a lot
of code uses the fs_info, which would have to be added as a separate
counter.  The other usage is the sync_writers counter, which is a bit
odd and should probably be keyed off the REQ_SYNC flag instead.

> E.g. scrub code just wants to read certain mirror of a logical bytenr.
> This can simplify the handling of RAID56, as for data stripes the repair 
> path is the same, just try the next mirror(s).
>
> Furthermore most of the new btrfs_bio code is handling data reads by 
> triggering read-repair automatically.
> This can be unnecessary for scrub.

This sounds like you don't want to use the btrfs_bio at all as you
don't rely on any of the functionality from it.

>
> And since we're here, can we also have btrfs equivalent of on-stack bio?
> As scrub can benefit a lot from that, as for sector-by-sector read, we want 
> to avoid repeating allocating/freeing a btrfs_bio just for reading one 
> sector.
> (The existing behavior is using on-stack bio with bio_init/bio_uninit 
> inside scrub_recheck_block())

You can do that right now by declaring a btrfs_bio on-stack and then
calling bio_init on the embedded bio followed by a btrfs_bio_init on
the btrfs_bio.  But I don't think doing this will actually be a win
for the scrub code in terms of speed or code size.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-07 14:41     ` Christoph Hellwig
@ 2023-03-07 22:21       ` Qu Wenruo
  2023-03-08  6:04         ` Qu Wenruo
  0 siblings, 1 reply; 80+ messages in thread
From: Qu Wenruo @ 2023-03-07 22:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel



On 2023/3/7 22:41, Christoph Hellwig wrote:
> On Tue, Mar 07, 2023 at 09:44:32AM +0800, Qu Wenruo wrote:
>> With my recent restart on scrub rework, this patch makes me wonder, what if
>> scrub wants to use btrfs_bio, but don't want to pass a valid btrfs_inode
>> pointer?
> 
> The full inode is only really needed for the data repair code.  But a lot
> of code uses the fs_info, which would have to be added as a separate
> counter.  The other usage is the sync_writers counter, which is a bit
> odd and should probably be keyed off the REQ_SYNC flag instead.
> 
>> E.g. scrub code just wants to read certain mirror of a logical bytenr.
>> This can simplify the handling of RAID56, as for data stripes the repair
>> path is the same, just try the next mirror(s).
>>
>> Furthermore most of the new btrfs_bio code is handling data reads by
>> triggering read-repair automatically.
>> This can be unnecessary for scrub.
> 
> This sounds like you don't want to use the btrfs_bio at all as you
> don't rely on any of the functionality from it.

Well, to me the proper mirror_num based read is the core of btrfs_bio, 
not the read-repair thing.

Thus I'm not that convinced fully automatic read-repair integrated into 
btrfs_bio is a good idea.

Thanks,
Qu
> 
>>
>> And since we're here, can we also have btrfs equivalent of on-stack bio?
>> As scrub can benefit a lot from that, as for sector-by-sector read, we want
>> to avoid repeating allocating/freeing a btrfs_bio just for reading one
>> sector.
>> (The existing behavior is using on-stack bio with bio_init/bio_uninit
>> inside scrub_recheck_block())
> 
> You can do that right now by declaring a btrfs_bio on-stack and then
> calling bio_init on the embedded bio followed by a btrfs_bio_init on
> the btrfs_bio.  But I don't think doing this will actually be a win
> for the scrub code in terms of speed or code size.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-07 22:21       ` Qu Wenruo
@ 2023-03-08  6:04         ` Qu Wenruo
  2023-03-08 14:28           ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Qu Wenruo @ 2023-03-08  6:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Damien Le Moal,
	Naohiro Aota, Johannes Thumshirn, Qu Wenruo, Jens Axboe,
	Darrick J. Wong, linux-block, linux-btrfs, linux-fsdevel



On 2023/3/8 06:21, Qu Wenruo wrote:
> 
> 
> On 2023/3/7 22:41, Christoph Hellwig wrote:
>> On Tue, Mar 07, 2023 at 09:44:32AM +0800, Qu Wenruo wrote:
>>> With my recent restart on scrub rework, this patch makes me wonder, 
>>> what if
>>> scrub wants to use btrfs_bio, but don't want to pass a valid btrfs_inode
>>> pointer?
>>
>> The full inode is only really needed for the data repair code.  But a lot
>> of code uses the fs_info, which would have to be added as a separate
>> counter.  The other usage is the sync_writers counter, which is a bit
>> odd and should probably be keyed off the REQ_SYNC flag instead.
>>
>>> E.g. scrub code just wants to read certain mirror of a logical bytenr.
>>> This can simplify the handling of RAID56, as for data stripes the repair
>>> path is the same, just try the next mirror(s).
>>>
>>> Furthermore most of the new btrfs_bio code is handling data reads by
>>> triggering read-repair automatically.
>>> This can be unnecessary for scrub.
>>
>> This sounds like you don't want to use the btrfs_bio at all as you
>> don't rely on any of the functionality from it.
> 
> Well, to me the proper mirror_num based read is the core of btrfs_bio, 
> not the read-repair thing.
> 
> Thus I'm not that convinced fully automatic read-repair integrated into 
> btrfs_bio is a good idea.

BTW, I also checked if I can craft a scrub specific version of 
btrfs_submit_bio().

The result doesn't look good at all.

Without a btrfs_bio structure, it's already pretty hard to properly put 
bioc, decrease the bio counter.

Or I need to create a scrub_bio, and re-implement all the needed endio 
function handling.

So please really consider the simplest case, one just wants to 
read/write some data using logical + mirror_num, without any btrfs inode 
nor csum verification.

Thanks,
Qu

> 
> Thanks,
> Qu
>>
>>>
>>> And since we're here, can we also have btrfs equivalent of on-stack bio?
>>> As scrub can benefit a lot from that, as for sector-by-sector read, 
>>> we want
>>> to avoid repeating allocating/freeing a btrfs_bio just for reading one
>>> sector.
>>> (The existing behavior is using on-stack bio with bio_init/bio_uninit
>>> inside scrub_recheck_block())
>>
>> You can do that right now by declaring a btrfs_bio on-stack and then
>> calling bio_init on the embedded bio followed by a btrfs_bio_init on
>> the btrfs_bio.  But I don't think doing this will actually be a win
>> for the scrub code in terms of speed or code size.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-08  6:04         ` Qu Wenruo
@ 2023-03-08 14:28           ` Christoph Hellwig
  2023-03-09  0:08             ` Qu Wenruo
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-03-08 14:28 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
	Jens Axboe, Darrick J. Wong, linux-block, linux-btrfs,
	linux-fsdevel

On Wed, Mar 08, 2023 at 02:04:26PM +0800, Qu Wenruo wrote:
> BTW, I also checked if I can craft a scrub specific version of 
> btrfs_submit_bio().
>
> The result doesn't look good at all.
>
> Without a btrfs_bio structure, it's already pretty hard to properly put 
> bioc, decrease the bio counter.
>
> Or I need to create a scrub_bio, and re-implement all the needed endio 
> function handling.
>
> So please really consider the simplest case, one just wants to read/write 
> some data using logical + mirror_num, without any btrfs inode nor csum 
> verification.

As said before with a little more work we could get away without the
inode.  But the above sounds a little strange to me.  Can you share
your current code?  Maybe I can come up with some better ideas.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-08 14:28           ` Christoph Hellwig
@ 2023-03-09  0:08             ` Qu Wenruo
  2023-03-09  9:31               ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Qu Wenruo @ 2023-03-09  0:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1627 bytes --]



On 2023/3/8 22:28, Christoph Hellwig wrote:
> On Wed, Mar 08, 2023 at 02:04:26PM +0800, Qu Wenruo wrote:
>> BTW, I also checked if I can craft a scrub specific version of
>> btrfs_submit_bio().
>>
>> The result doesn't look good at all.
>>
>> Without a btrfs_bio structure, it's already pretty hard to properly put
>> bioc, decrease the bio counter.
>>
>> Or I need to create a scrub_bio, and re-implement all the needed endio
>> function handling.
>>
>> So please really consider the simplest case, one just wants to read/write
>> some data using logical + mirror_num, without any btrfs inode nor csum
>> verification.
> 
> As said before with a little more work we could get away without the
> inode.  But the above sounds a little strange to me.  Can you share
> your current code?  Maybe I can come up with some better ideas.

My current one is a new btrfs_submit_scrub_read() helper, getting rid of 
features I don't need and slightly modify the endio functions to avoid 
any checks if no bbio->inode. AKA, most of your idea.

So that would be mostly fine.


But what I'm exploring is to completely get rid of btrfs_ino, using 
plain bio.

Things like bio counter is not a big deal as the only caller is 
scrub/dev-replace, thus there would be no race to change dev-replace 
half-way.
So is bioc for non-RAID56 profiles, as we only need to grab the 
dev/physical and can release it immediately.

But for RAID56, the bioc has to live long enough for raid56 work to 
finish, thus has to go btrfs_raid56_end_io() and rely on the extra 
bbio->end_io().

I guess I have to accept the defeat and just go btrfs_bio.

Thanks,
Qu

[-- Attachment #2: 0002-btrfs-introduce-a-new-helper-to-submit-bio-for-scrub.patch --]
[-- Type: text/x-patch, Size: 4002 bytes --]

From 85abc1d8b0d463f9db6d74a5809b10487ae21de8 Mon Sep 17 00:00:00 2001
Message-Id: <85abc1d8b0d463f9db6d74a5809b10487ae21de8.1678261377.git.wqu@suse.com>
In-Reply-To: <cover.1678261377.git.wqu@suse.com>
References: <cover.1678261377.git.wqu@suse.com>
From: Qu Wenruo <wqu@suse.com>
Date: Wed, 8 Mar 2023 14:22:10 +0800
Subject: [PATCH 02/10] btrfs: introduce a new helper to submit bio for scrub

The new helper, btrfs_submit_scrub_read(), would be mostly a subset of
btrfs_submit_bio(), with the following limitations:

- Only supports read
- @mirror_num must be > 0
- No read-time repair nor checksum verification
- The @bbio must not cross stripe boundary

This would provide the basis for unified read repair for scrub, as we no
longer needs to handle RAID56 recovery all by scrub, and RAID56 data
stripes scrub can share the same code of read and repair.

The repair part would be the same as non-RAID56, as we only need to try
the next mirror.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/bio.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/bio.h |  2 ++
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 726592868e9c..966517ad81ce 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -305,8 +305,8 @@ static void btrfs_end_bio_work(struct work_struct *work)
 {
 	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work);
 
-	/* Metadata reads are checked and repaired by the submitter. */
-	if (bbio->bio.bi_opf & REQ_META)
+	/* Metadata or scrub reads are checked and repaired by the submitter. */
+	if (bbio->bio.bi_opf & REQ_META || !bbio->inode)
 		bbio->end_io(bbio);
 	else
 		btrfs_check_read_bio(bbio, bbio->bio.bi_private);
@@ -340,7 +340,8 @@ static void btrfs_raid56_end_io(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
-	if (bio_op(bio) == REQ_OP_READ && !(bbio->bio.bi_opf & REQ_META))
+	if (bio_op(bio) == REQ_OP_READ && bbio->inode &&
+	    !(bbio->bio.bi_opf & REQ_META))
 		btrfs_check_read_bio(bbio, NULL);
 	else
 		btrfs_orig_bbio_end_io(bbio);
@@ -686,6 +687,42 @@ static bool btrfs_submit_chunk(struct bio *bio, int mirror_num)
 	return true;
 }
 
+/*
+ * Scrub read special version, with extra limits:
+ *
+ * - Only support read for scrub usage
+ * - @mirror_num must be >0
+ * - No read-time repair nor checksum verification.
+ * - The @bbio must not cross stripe boundary.
+ */
+void btrfs_submit_scrub_read(struct btrfs_fs_info *fs_info,
+			     struct btrfs_bio *bbio, int mirror_num)
+{
+	struct btrfs_bio *orig_bbio = bbio;
+	u64 logical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT;
+	u64 length = bbio->bio.bi_iter.bi_size;
+	u64 map_length = length;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
+
+	ASSERT(mirror_num > 0);
+	ASSERT(btrfs_op(&bbio->bio) == BTRFS_MAP_READ);
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(&bbio->bio), logical,
+				&map_length, &bioc, &smap, &mirror_num, 1);
+	if (ret)
+		goto fail;
+
+	ASSERT(map_length == length);
+	__btrfs_submit_bio(&bbio->bio, bioc, &smap, mirror_num);
+	return;
+
+fail:
+	btrfs_bio_counter_dec(fs_info);
+	btrfs_bio_end_io(orig_bbio, ret);
+}
+
 void btrfs_submit_bio(struct bio *bio, int mirror_num)
 {
 	while (!btrfs_submit_chunk(bio, mirror_num))
diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h
index 873ff85817f0..341d579bd0b9 100644
--- a/fs/btrfs/bio.h
+++ b/fs/btrfs/bio.h
@@ -89,6 +89,8 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
 #define REQ_BTRFS_ONE_ORDERED			REQ_DRV
 
 void btrfs_submit_bio(struct bio *bio, int mirror_num);
+void btrfs_submit_scrub_read(struct btrfs_fs_info *fs_info,
+			     struct btrfs_bio *bbio, int mirror_num);
 int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			    u64 length, u64 logical, struct page *page,
 			    unsigned int pg_offset, int mirror_num);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-09  0:08             ` Qu Wenruo
@ 2023-03-09  9:31               ` Christoph Hellwig
  2023-03-09 10:32                 ` Qu Wenruo
  0 siblings, 1 reply; 80+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:31 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-fsdevel

On Thu, Mar 09, 2023 at 08:08:34AM +0800, Qu Wenruo wrote:
> My current one is a new btrfs_submit_scrub_read() helper, getting rid of 
> features I don't need and slightly modify the endio functions to avoid any 
> checks if no bbio->inode. AKA, most of your idea.
>
> So that would be mostly fine.

This looks mostly ok to me.  I suspect in the longer run all metadata
I/O might be able to use this helper as well.

> But for RAID56, the bioc has to live long enough for raid56 work to finish, 
> thus has to go btrfs_raid56_end_io() and rely on the extra bbio->end_io().

The bioc lifetimes for RAID56 are a bit odd and one of the things I'd
love to eventually look into, but іt's not very high on the priority list
right now.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-09  9:31               ` Christoph Hellwig
@ 2023-03-09 10:32                 ` Qu Wenruo
  2023-03-09 15:18                   ` Christoph Hellwig
  0 siblings, 1 reply; 80+ messages in thread
From: Qu Wenruo @ 2023-03-09 10:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, linux-fsdevel



On 2023/3/9 17:31, Christoph Hellwig wrote:
> On Thu, Mar 09, 2023 at 08:08:34AM +0800, Qu Wenruo wrote:
>> My current one is a new btrfs_submit_scrub_read() helper, getting rid of
>> features I don't need and slightly modify the endio functions to avoid any
>> checks if no bbio->inode. AKA, most of your idea.
>>
>> So that would be mostly fine.
> 
> This looks mostly ok to me.  I suspect in the longer run all metadata
> I/O might be able to use this helper as well.

IMHO metadata would also go into the btrfs_check_read_bio().

As for now, all the info for metadata verification is already integrated 
into bbio, thus in the long run, the no-check path would be the exception.
> 
>> But for RAID56, the bioc has to live long enough for raid56 work to finish,
>> thus has to go btrfs_raid56_end_io() and rely on the extra bbio->end_io().
> 
> The bioc lifetimes for RAID56 are a bit odd and one of the things I'd
> love to eventually look into, but іt's not very high on the priority list
> right now.

If you have some good ideas, I'm very happy to try.

My current idea is to dump a bioc for btrfs_raid_bio, so 
btrfs_submit_bio() path can just free the bioc after usage, no need to wait.

But not sure if this change would really cleanup the code, thus it would 
still be very low on priority.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 03/34] btrfs: add a btrfs_inode pointer to struct btrfs_bio
  2023-03-09 10:32                 ` Qu Wenruo
@ 2023-03-09 15:18                   ` Christoph Hellwig
  0 siblings, 0 replies; 80+ messages in thread
From: Christoph Hellwig @ 2023-03-09 15:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-fsdevel

On Thu, Mar 09, 2023 at 06:32:41PM +0800, Qu Wenruo wrote:
>> This looks mostly ok to me.  I suspect in the longer run all metadata
>> I/O might be able to use this helper as well.
>
> IMHO metadata would also go into the btrfs_check_read_bio().

Why?  The reason data checksum verification is down there is so that
it can deal with split bios.  Metadata never gets split, so it is more
logical to keep the metadata verification in the extent_buffer code.

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2023-03-09 15:18 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-21  6:49 consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig
2023-01-21  6:49 ` [PATCH 01/34] block: export bio_split_rw Christoph Hellwig
2023-01-21  6:49 ` [PATCH 02/34] btrfs: better document struct btrfs_bio Christoph Hellwig
2023-01-22 10:19   ` Anand Jain
2023-01-23 15:25   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 03/34] btrfs: add a btrfs_inode pointer to " Christoph Hellwig
2023-01-22 10:20   ` Anand Jain
2023-01-23 15:26   ` Johannes Thumshirn
2023-01-23 15:47   ` Johannes Thumshirn
2023-03-07  1:44   ` Qu Wenruo
2023-03-07 14:41     ` Christoph Hellwig
2023-03-07 22:21       ` Qu Wenruo
2023-03-08  6:04         ` Qu Wenruo
2023-03-08 14:28           ` Christoph Hellwig
2023-03-09  0:08             ` Qu Wenruo
2023-03-09  9:31               ` Christoph Hellwig
2023-03-09 10:32                 ` Qu Wenruo
2023-03-09 15:18                   ` Christoph Hellwig
2023-01-21  6:50 ` [PATCH 04/34] btrfs: remove the direct I/O read checksum lookup optimization Christoph Hellwig
2023-01-23 15:59   ` Johannes Thumshirn
2023-01-24 14:55   ` Anand Jain
2023-01-24 19:55     ` Christoph Hellwig
2023-01-25  7:42       ` Anand Jain
2023-01-21  6:50 ` [PATCH 05/34] btrfs: simplify btrfs_lookup_bio_sums Christoph Hellwig
2023-01-23 16:06   ` Johannes Thumshirn
2023-01-24 15:16   ` Anand Jain
2023-01-21  6:50 ` [PATCH 06/34] btrfs: slightly refactor btrfs_submit_bio Christoph Hellwig
2023-01-23 16:13   ` Johannes Thumshirn
2023-01-24 15:27   ` Anand Jain
2023-01-21  6:50 ` [PATCH 07/34] btrfs: save the bio iter for checksum validation in common code Christoph Hellwig
2023-01-23 16:18   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 08/34] btrfs: pre-load data checksum for reads in btrfs_submit_bio Christoph Hellwig
2023-01-23 16:32   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 09/34] btrfs: add a btrfs_data_csum_ok helper Christoph Hellwig
2023-01-23 16:36   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 10/34] btrfs: handle checksum validation and repair at the storage layer Christoph Hellwig
2023-01-23 16:39   ` Johannes Thumshirn
2023-01-24  6:47     ` Christoph Hellwig
2023-01-24  8:16       ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 11/34] btrfs: remove btrfs_bio_free_csum Christoph Hellwig
2023-01-23 16:41   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 12/34] btrfs: remove btrfs_bio_for_each_sector Christoph Hellwig
2023-01-23 16:42   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 13/34] btrfs: remove now unused checksumming helpers Christoph Hellwig
2023-01-23 16:49   ` Johannes Thumshirn
2023-01-23 16:53     ` Christoph Hellwig
2023-01-24  8:33       ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 14/34] btrfs: remove the device field in struct btrfs_bio Christoph Hellwig
2023-01-24 11:01   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 15/34] btrfs: remove the io_failure_record infrastructure Christoph Hellwig
2023-01-24 11:04   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 16/34] btrfs: rename the iter field in struct btrfs_bio Christoph Hellwig
2023-01-24 11:09   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 17/34] btrfs: remove the is_metadata flag " Christoph Hellwig
2023-01-24 11:13   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 18/34] btrfs: remove the submit_bio_start helpers Christoph Hellwig
2023-01-24 11:49   ` Johannes Thumshirn
2023-01-21  6:50 ` [PATCH 19/34] btrfs: simplify the btrfs_csum_one_bio calling convention Christoph Hellwig
2023-01-21  6:50 ` [PATCH 20/34] btrfs: handle checksum generation in the storage layer Christoph Hellwig
2023-01-21  6:50 ` [PATCH 21/34] btrfs: handle recording of zoned writes " Christoph Hellwig
2023-01-21  6:50 ` [PATCH 22/34] btrfs: support cloned bios in btree_csum_one_bio Christoph Hellwig
2023-01-21  6:50 ` [PATCH 23/34] btrfs: allow btrfs_submit_bio to split bios Christoph Hellwig
2023-01-25 21:51   ` Josef Bacik
2023-01-26  5:21     ` Christoph Hellwig
2023-01-26 17:43       ` Josef Bacik
2023-01-26 17:46         ` Christoph Hellwig
2023-01-26 18:33           ` David Sterba
2023-01-27  7:02             ` test not in the auto group, was: " Christoph Hellwig
2023-01-21  6:50 ` [PATCH 24/34] btrfs: pass the iomap bio to btrfs_submit_bio Christoph Hellwig
2023-01-21  6:50 ` [PATCH 25/34] btrfs: remove stripe boundary calculation for buffered I/O Christoph Hellwig
2023-01-21  6:50 ` [PATCH 26/34] btrfs: remove stripe boundary calculation for compressed I/O Christoph Hellwig
2023-01-21  6:50 ` [PATCH 27/34] btrfs: remove stripe boundary calculation for encoded I/O Christoph Hellwig
2023-01-21  6:50 ` [PATCH 28/34] btrfs: remove struct btrfs_io_geometry Christoph Hellwig
2023-01-21  6:50 ` [PATCH 29/34] btrfs: remove submit_encoded_read_bio Christoph Hellwig
2023-01-21  6:50 ` [PATCH 30/34] btrfs: remove the fs_info argument to btrfs_submit_bio Christoph Hellwig
2023-01-21  6:50 ` [PATCH 31/34] btrfs: remove now spurious bio submission helpers Christoph Hellwig
2023-01-21  6:50 ` [PATCH 32/34] btrfs: calculate file system wide queue limit for zoned mode Christoph Hellwig
2023-01-21  6:50 ` [PATCH 33/34] btrfs: split zone append bios in btrfs_submit_bio Christoph Hellwig
2023-01-21  6:50 ` [PATCH 34/34] iomap: remove IOMAP_F_ZONE_APPEND Christoph Hellwig
2023-01-24 13:22 ` consolidate btrfs checksumming, repair and bio splitting v4 Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.