All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: cleanup btrfs bio handling
@ 2022-03-22 15:55 Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 01/40] btrfs: fix submission hook error handling in btrfs_repair_one_sector Christoph Hellwig
                   ` (40 more replies)
  0 siblings, 41 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Hi all,

this series started out as an attempt to move btrfs to use the new
as of 5.18 bio interface, which then turned into cleaning up a lot
of the surrounding areas.  It can be roughly divided into 4 sub-series:

 - patches 1 to 4 are bug fixes for bugs found during code inspection.
   It might be a good idea if experienced btrfs developers could help
   review them or correct me if I misunderstood something
 - patches 5 to 22 are general cleanups on how bios are used and
   surrounding code
 - patches 23 to 29 clean up various extra memory allocations in the
   bio I/O path.  With I/Os that go to a single device (like all
   reads) only need the btrfs_bio memory allocation and not additional
   object.
 - patches 30 to 40 integrate the btrfs dio code more tightly with
   iomap and avoid the extra dio_private allocation and bio clone

All this is pretty rough.  It survices a xfstests auto group run on
a default file system config, though.

The tree is based on Jens' for-next tree as it started with the bio
cleanups, and will need a rebase once 5.18-rc1 is out.

A git tree is available here:

   git://git.infradead.org/users/hch/misc.git btrfs-bio-cleanup

Gitweb:

   http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-bio-cleanup

Diffstat:
 fs/btrfs/btrfs_inode.h     |   25 -
 fs/btrfs/check-integrity.c |  165 +++++------
 fs/btrfs/check-integrity.h |    8 
 fs/btrfs/compression.c     |   54 +--
 fs/btrfs/ctree.h           |    6 
 fs/btrfs/disk-io.c         |  272 +++---------------
 fs/btrfs/disk-io.h         |   21 -
 fs/btrfs/extent_io.c       |  210 ++++++--------
 fs/btrfs/extent_io.h       |   17 -
 fs/btrfs/file.c            |    6 
 fs/btrfs/inode.c           |  661 ++++++++++++++++++---------------------------
 fs/btrfs/raid56.c          |  156 ++++------
 fs/btrfs/scrub.c           |   92 ++----
 fs/btrfs/super.c           |   11 
 fs/btrfs/volumes.c         |  392 +++++++++++++++-----------
 fs/btrfs/volumes.h         |   60 ++--
 fs/iomap/direct-io.c       |   29 +
 fs/iomap/iter.c            |   13 
 include/linux/iomap.h      |   23 +
 19 files changed, 963 insertions(+), 1258 deletions(-)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 01/40] btrfs: fix submission hook error handling in btrfs_repair_one_sector
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 02/40] btrfs: fix direct I/O read repair for split bios Christoph Hellwig
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

btrfs_repair_one_sector just wants to free the bio if submit_bio_hook
fails, but btrfs_submit_data_bio will call bio_endio which will call
into the submitter of the original bio and free the bio there as well.

Move the bio_endio calls from btrfs_submit_data_bio and
btrfs_submit_metadata_bio into submit_one_bio to fix this double free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c   | 2 --
 fs/btrfs/extent_io.c | 4 ++++
 fs/btrfs/inode.c     | 4 ----
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b3e9cf3fd1dd1..c245e1b131964 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -941,8 +941,6 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	return 0;
 
 out_w_error:
-	bio->bi_status = ret;
-	bio_endio(bio);
 	return ret;
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3b386bbb85a7f..e9fa0f6d605ee 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -181,6 +181,10 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 		ret = btrfs_submit_metadata_bio(tree->private_data, bio,
 						mirror_num, bio_flags);
 
+	if (ret) {
+		bio->bi_status = ret;
+		bio_endio(bio);
+	}
 	return blk_status_to_errno(ret);
 }
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5bbea5ec31fc5..3ef8b63bb1b5c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2571,10 +2571,6 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 
 out:
-	if (ret) {
-		bio->bi_status = ret;
-		bio_endio(bio);
-	}
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 02/40] btrfs: fix direct I/O read repair for split bios
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 01/40] btrfs: fix submission hook error handling in btrfs_repair_one_sector Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 23:59   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices Christoph Hellwig
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_check_read_dio_bio is incorrect for subsequent bios.  Add
a file_offset field to struct btrfs_bio to pass along the correct offset.

Given that check_data_csum only uses start of an error message this
means problems with this miscalculation will only show up when I/O
fails or checksums mismatch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c |  1 +
 fs/btrfs/inode.c     | 13 +++++--------
 fs/btrfs/volumes.h   |  3 +++
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e9fa0f6d605ee..7ca4e9b80f023 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2662,6 +2662,7 @@ int btrfs_repair_one_sector(struct inode *inode,
 
 	repair_bio = btrfs_bio_alloc(1);
 	repair_bbio = btrfs_bio(repair_bio);
+	repair_bbio->file_offset = start;
 	repair_bio->bi_opf = REQ_OP_READ;
 	repair_bio->bi_end_io = failed_bio->bi_end_io;
 	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3ef8b63bb1b5c..93f00e9150ed0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7773,8 +7773,6 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct bio_vec bvec;
 	struct bvec_iter iter;
-	const u64 orig_file_offset = dip->file_offset;
-	u64 start = orig_file_offset;
 	u32 bio_offset = 0;
 	blk_status_t err = BLK_STS_OK;
 
@@ -7784,6 +7782,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
 		pgoff = bvec.bv_offset;
 		for (i = 0; i < nr_sectors; i++) {
+			u64 start = bbio->file_offset + bio_offset;
+
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
 			    (!csum || !check_data_csum(inode, bbio,
@@ -7796,17 +7796,13 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 			} else {
 				int ret;
 
-				ASSERT((start - orig_file_offset) < UINT_MAX);
-				ret = btrfs_repair_one_sector(inode,
-						&bbio->bio,
-						start - orig_file_offset,
-						bvec.bv_page, pgoff,
+				ret = btrfs_repair_one_sector(inode, &bbio->bio,
+						bio_offset, bvec.bv_page, pgoff,
 						start, bbio->mirror_num,
 						submit_dio_repair_bio);
 				if (ret)
 					err = errno_to_blk_status(ret);
 			}
-			start += sectorsize;
 			ASSERT(bio_offset + sectorsize > bio_offset);
 			bio_offset += sectorsize;
 			pgoff += sectorsize;
@@ -8009,6 +8005,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
 		bio->bi_private = dip;
 		bio->bi_end_io = btrfs_end_dio_bio;
+		btrfs_bio(bio)->file_offset = file_offset;
 
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 			status = extract_ordered_extent(BTRFS_I(inode), bio,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 005c9e2a491a1..c22148bebc2f5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -323,6 +323,9 @@ struct btrfs_fs_devices {
 struct btrfs_bio {
 	unsigned int mirror_num;
 
+	/* for direct I/O */
+	u64 file_offset;
+
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
 	u8 *csum;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 01/40] btrfs: fix submission hook error handling in btrfs_repair_one_sector Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 02/40] btrfs: fix direct I/O read repair for split bios Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:00   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 04/40] btrfs: fix and document the zoned device choice in alloc_new_bio Christoph Hellwig
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_end_dio_bio to record the write location for zone devices is
icorrect for subsequent bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 93f00e9150ed0..325e773c6e880 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7829,6 +7829,7 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
 static void btrfs_end_dio_bio(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t err = bio->bi_status;
 
 	if (err)
@@ -7839,12 +7840,12 @@ static void btrfs_end_dio_bio(struct bio *bio)
 			   bio->bi_iter.bi_size, err);
 
 	if (bio_op(bio) == REQ_OP_READ)
-		err = btrfs_check_read_dio_bio(dip, btrfs_bio(bio), !err);
+		err = btrfs_check_read_dio_bio(dip, bbio, !err);
 
 	if (err)
 		dip->dio_bio->bi_status = err;
 
-	btrfs_record_physical_zoned(dip->inode, dip->file_offset, bio);
+	btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio);
 
 	bio_put(bio);
 	btrfs_dio_private_put(dip);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 04/40] btrfs: fix and document the zoned device choice in alloc_new_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 05/40] btrfs: refactor __btrfsic_submit_bio Christoph Hellwig
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Zone Append bios only need a valid block device in struct bio, but
not the device in the btrfs_bio.  Use the information from
btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on
multi-device file system with non-homogeneous capabilities and remove
the pointless btrfs_bio.device assignment.

Add big fat comments explaining what is going on here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 7ca4e9b80f023..e789676373ab0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3330,24 +3330,37 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 	if (ret < 0)
 		goto error;
-	if (wbc) {
-		struct block_device *bdev;
 
-		bdev = fs_info->fs_devices->latest_dev->bdev;
-		bio_set_dev(bio, bdev);
-		wbc_init_bio(wbc, bio);
-	}
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		struct btrfs_device *device;
+	if (wbc) {
+		/*
+		 * For Zone append we need the correct block_device that we are
+		 * going to write to set in the bio to be able to respect the
+		 * hardware limitation.  Look it up here:
+		 */
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+			struct btrfs_device *dev;
+
+			dev = btrfs_zoned_get_device(fs_info, disk_bytenr,
+						     fs_info->sectorsize);
+			if (IS_ERR(dev)) {
+				ret = PTR_ERR(dev);
+				goto error;
+			}
 
-		device = btrfs_zoned_get_device(fs_info, disk_bytenr,
-						fs_info->sectorsize);
-		if (IS_ERR(device)) {
-			ret = PTR_ERR(device);
-			goto error;
+			bio_set_dev(bio, dev->bdev);
+		} else {
+			/*
+			 * Otherwise pick the last added device to support
+			 * cgroup writeback.  For multi-device file systems this
+			 * means blk-cgroup policies have to always be set on the
+			 * last added/replaced device.  This is a bit odd but has
+			 * been like that for a long time.
+			 */
+			bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev);
 		}
-
-		btrfs_bio(bio)->device = device;
+		wbc_init_bio(wbc, bio);
+	} else {
+		ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND);
 	}
 	return 0;
 error:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 05/40] btrfs: refactor __btrfsic_submit_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 04/40] btrfs: fix and document the zoned device choice in alloc_new_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 06/40] btrfs: split submit_bio from btrfsic checking Christoph Hellwig
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Split out two helpers to mak __btrfsic_submit_bio more readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/check-integrity.c | 150 +++++++++++++++++++------------------
 1 file changed, 78 insertions(+), 72 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index abac86a758401..9efc1feb6cb08 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2635,6 +2635,74 @@ static struct btrfsic_dev_state *btrfsic_dev_state_lookup(dev_t dev)
 						  &btrfsic_dev_state_hashtable);
 }
 
+static void btrfsic_check_write_bio(struct bio *bio,
+		struct btrfsic_dev_state *dev_state)
+{
+	unsigned int segs = bio_segments(bio);
+	u64 dev_bytenr = 512 * bio->bi_iter.bi_sector;
+	u64 cur_bytenr = dev_bytenr;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+	char **mapped_datav;
+	int bio_is_patched = 0;
+	int i = 0;
+
+	if (dev_state->state->print_mask & BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
+		pr_info("submit_bio(rw=%d,0x%x, bi_vcnt=%u, bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
+		       bio_op(bio), bio->bi_opf, segs,
+		       bio->bi_iter.bi_sector, dev_bytenr, bio->bi_bdev);
+
+	mapped_datav = kmalloc_array(segs, sizeof(*mapped_datav), GFP_NOFS);
+	if (!mapped_datav)
+		return;
+
+	bio_for_each_segment(bvec, bio, iter) {
+		BUG_ON(bvec.bv_len != PAGE_SIZE);
+		mapped_datav[i] = page_address(bvec.bv_page);
+		i++;
+
+		if (dev_state->state->print_mask &
+		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
+			pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
+			       i, cur_bytenr, bvec.bv_len, bvec.bv_offset);
+		cur_bytenr += bvec.bv_len;
+	}
+
+	btrfsic_process_written_block(dev_state, dev_bytenr, mapped_datav, segs,
+				      bio, &bio_is_patched, bio->bi_opf);
+	kfree(mapped_datav);
+}
+
+static void btrfsic_check_flush_bio(struct bio *bio,
+		struct btrfsic_dev_state *dev_state)
+{
+	if (dev_state->state->print_mask & BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
+		pr_info("submit_bio(rw=%d,0x%x FLUSH, bdev=%p)\n",
+		       bio_op(bio), bio->bi_opf, bio->bi_bdev);
+
+	if (dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
+		struct btrfsic_block *const block =
+			&dev_state->dummy_block_for_bio_bh_flush;
+
+		block->is_iodone = 0;
+		block->never_written = 0;
+		block->iodone_w_error = 0;
+		block->flush_gen = dev_state->last_flush_gen + 1;
+		block->submit_bio_bh_rw = bio->bi_opf;
+		block->orig_bio_private = bio->bi_private;
+		block->orig_bio_end_io = bio->bi_end_io;
+		block->next_in_same_bio = NULL;
+		bio->bi_private = block;
+		bio->bi_end_io = btrfsic_bio_end_io;
+	} else if ((dev_state->state->print_mask &
+		   (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
+		    BTRFSIC_PRINT_MASK_VERBOSE))) {
+		pr_info(
+"btrfsic_submit_bio(%pg) with FLUSH but dummy block already in use (ignored)!\n",
+		       dev_state->bdev);
+	}
+}
+
 static void __btrfsic_submit_bio(struct bio *bio)
 {
 	struct btrfsic_dev_state *dev_state;
@@ -2642,80 +2710,18 @@ static void __btrfsic_submit_bio(struct bio *bio)
 	if (!btrfsic_is_initialized)
 		return;
 
-	mutex_lock(&btrfsic_mutex);
-	/* since btrfsic_submit_bio() is also called before
-	 * btrfsic_mount(), this might return NULL */
+	/*
+	 * We can be called before btrfsic_mount, so there might not be a
+	 * dev_state.
+	 */
 	dev_state = btrfsic_dev_state_lookup(bio->bi_bdev->bd_dev);
-	if (NULL != dev_state &&
-	    (bio_op(bio) == REQ_OP_WRITE) && bio_has_data(bio)) {
-		int i = 0;
-		u64 dev_bytenr;
-		u64 cur_bytenr;
-		struct bio_vec bvec;
-		struct bvec_iter iter;
-		int bio_is_patched;
-		char **mapped_datav;
-		unsigned int segs = bio_segments(bio);
-
-		dev_bytenr = 512 * bio->bi_iter.bi_sector;
-		bio_is_patched = 0;
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bio(rw=%d,0x%x, bi_vcnt=%u, bi_sector=%llu (bytenr %llu), bi_bdev=%p)\n",
-			       bio_op(bio), bio->bi_opf, segs,
-			       bio->bi_iter.bi_sector, dev_bytenr, bio->bi_bdev);
-
-		mapped_datav = kmalloc_array(segs,
-					     sizeof(*mapped_datav), GFP_NOFS);
-		if (!mapped_datav)
-			goto leave;
-		cur_bytenr = dev_bytenr;
-
-		bio_for_each_segment(bvec, bio, iter) {
-			BUG_ON(bvec.bv_len != PAGE_SIZE);
-			mapped_datav[i] = page_address(bvec.bv_page);
-			i++;
-
-			if (dev_state->state->print_mask &
-			    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH_VERBOSE)
-				pr_info("#%u: bytenr=%llu, len=%u, offset=%u\n",
-				       i, cur_bytenr, bvec.bv_len, bvec.bv_offset);
-			cur_bytenr += bvec.bv_len;
-		}
-		btrfsic_process_written_block(dev_state, dev_bytenr,
-					      mapped_datav, segs,
-					      bio, &bio_is_patched,
-					      bio->bi_opf);
-		kfree(mapped_datav);
-	} else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) {
-		if (dev_state->state->print_mask &
-		    BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH)
-			pr_info("submit_bio(rw=%d,0x%x FLUSH, bdev=%p)\n",
-			       bio_op(bio), bio->bi_opf, bio->bi_bdev);
-		if (!dev_state->dummy_block_for_bio_bh_flush.is_iodone) {
-			if ((dev_state->state->print_mask &
-			     (BTRFSIC_PRINT_MASK_SUBMIT_BIO_BH |
-			      BTRFSIC_PRINT_MASK_VERBOSE)))
-				pr_info(
-"btrfsic_submit_bio(%pg) with FLUSH but dummy block already in use (ignored)!\n",
-				       dev_state->bdev);
-		} else {
-			struct btrfsic_block *const block =
-				&dev_state->dummy_block_for_bio_bh_flush;
-
-			block->is_iodone = 0;
-			block->never_written = 0;
-			block->iodone_w_error = 0;
-			block->flush_gen = dev_state->last_flush_gen + 1;
-			block->submit_bio_bh_rw = bio->bi_opf;
-			block->orig_bio_private = bio->bi_private;
-			block->orig_bio_end_io = bio->bi_end_io;
-			block->next_in_same_bio = NULL;
-			bio->bi_private = block;
-			bio->bi_end_io = btrfsic_bio_end_io;
-		}
+	mutex_lock(&btrfsic_mutex);
+	if (dev_state) {
+		if (bio_op(bio) == REQ_OP_WRITE && bio_has_data(bio))
+			btrfsic_check_write_bio(bio, dev_state);
+		else if (bio->bi_opf & REQ_PREFLUSH)
+			btrfsic_check_flush_bio(bio, dev_state);
 	}
-leave:
 	mutex_unlock(&btrfsic_mutex);
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 06/40] btrfs: split submit_bio from btrfsic checking
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 05/40] btrfs: refactor __btrfsic_submit_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:04   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 07/40] btrfs: simplify btrfsic_read_block Christoph Hellwig
                   ` (34 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Require a separate call to the integrity checking helpers from the
actual bio submission.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/check-integrity.c | 14 +-------------
 fs/btrfs/check-integrity.h |  8 ++++----
 fs/btrfs/disk-io.c         |  6 ++++--
 fs/btrfs/extent_io.c       |  3 ++-
 fs/btrfs/scrub.c           | 12 ++++++++----
 fs/btrfs/volumes.c         |  3 ++-
 6 files changed, 21 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 9efc1feb6cb08..49f9954f1438f 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2703,7 +2703,7 @@ static void btrfsic_check_flush_bio(struct bio *bio,
 	}
 }
 
-static void __btrfsic_submit_bio(struct bio *bio)
+void btrfsic_check_bio(struct bio *bio)
 {
 	struct btrfsic_dev_state *dev_state;
 
@@ -2725,18 +2725,6 @@ static void __btrfsic_submit_bio(struct bio *bio)
 	mutex_unlock(&btrfsic_mutex);
 }
 
-void btrfsic_submit_bio(struct bio *bio)
-{
-	__btrfsic_submit_bio(bio);
-	submit_bio(bio);
-}
-
-int btrfsic_submit_bio_wait(struct bio *bio)
-{
-	__btrfsic_submit_bio(bio);
-	return submit_bio_wait(bio);
-}
-
 int btrfsic_mount(struct btrfs_fs_info *fs_info,
 		  struct btrfs_fs_devices *fs_devices,
 		  int including_extent_data, u32 print_mask)
diff --git a/fs/btrfs/check-integrity.h b/fs/btrfs/check-integrity.h
index bcc730a06cb58..ed115e0f2ebbd 100644
--- a/fs/btrfs/check-integrity.h
+++ b/fs/btrfs/check-integrity.h
@@ -7,11 +7,11 @@
 #define BTRFS_CHECK_INTEGRITY_H
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
-void btrfsic_submit_bio(struct bio *bio);
-int btrfsic_submit_bio_wait(struct bio *bio);
+void btrfsic_check_bio(struct bio *bio);
 #else
-#define btrfsic_submit_bio submit_bio
-#define btrfsic_submit_bio_wait submit_bio_wait
+static inline void btrfsic_check_bio(struct bio *bio)
+{
+}
 #endif
 
 int btrfsic_mount(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c245e1b131964..9b8ee74144910 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4048,7 +4048,8 @@ static int write_dev_supers(struct btrfs_device *device,
 		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
 			bio->bi_opf |= REQ_FUA;
 
-		btrfsic_submit_bio(bio);
+		btrfsic_check_bio(bio);
+		submit_bio(bio);
 
 		if (btrfs_advance_sb_log(device, i))
 			errors++;
@@ -4161,7 +4162,8 @@ static void write_dev_flush(struct btrfs_device *device)
 	init_completion(&device->flush_wait);
 	bio->bi_private = &device->flush_wait;
 
-	btrfsic_submit_bio(bio);
+	btrfsic_check_bio(bio);
+	submit_bio(bio);
 	set_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state);
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e789676373ab0..1a39b9ffdd180 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2370,7 +2370,8 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
 	bio_add_page(bio, page, length, pg_offset);
 
-	if (btrfsic_submit_bio_wait(bio)) {
+	btrfsic_check_bio(bio);
+	if (submit_bio_wait(bio)) {
 		/* try to remap that extent elsewhere? */
 		btrfs_bio_counter_dec(fs_info);
 		bio_put(bio);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2e9a322773f28..605ecc675ba7c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1479,7 +1479,8 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 		bio->bi_iter.bi_sector = spage->physical >> 9;
 		bio->bi_opf = REQ_OP_READ;
 
-		if (btrfsic_submit_bio_wait(bio)) {
+		btrfsic_check_bio(bio);
+		if (submit_bio_wait(bio)) {
 			spage->io_error = 1;
 			sblock->no_io_error_seen = 0;
 		}
@@ -1565,7 +1566,8 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
 			return -EIO;
 		}
 
-		if (btrfsic_submit_bio_wait(bio)) {
+		btrfsic_check_bio(bio);
+		if (submit_bio_wait(bio)) {
 			btrfs_dev_stat_inc_and_print(spage_bad->dev,
 				BTRFS_DEV_STAT_WRITE_ERRS);
 			atomic64_inc(&fs_info->dev_replace.num_write_errors);
@@ -1723,7 +1725,8 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
 	 * orders the requests before sending them to the driver which
 	 * doubled the write performance on spinning disks when measured
 	 * with Linux 3.5 */
-	btrfsic_submit_bio(sbio->bio);
+	btrfsic_check_bio(sbio->bio);
+	submit_bio(sbio->bio);
 
 	if (btrfs_is_zoned(sctx->fs_info))
 		sctx->write_pointer = sbio->physical + sbio->page_count *
@@ -2057,7 +2060,8 @@ static void scrub_submit(struct scrub_ctx *sctx)
 	sbio = sctx->bios[sctx->curr];
 	sctx->curr = -1;
 	scrub_pending_bio_inc(sctx);
-	btrfsic_submit_bio(sbio->bio);
+	btrfsic_check_bio(sbio->bio);
+	submit_bio(sbio->bio);
 }
 
 static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b07d382d53a86..bfa8e825e5047 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6755,7 +6755,8 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 
-	btrfsic_submit_bio(bio);
+	btrfsic_check_bio(bio);
+	submit_bio(bio);
 }
 
 static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 07/40] btrfs: simplify btrfsic_read_block
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 06/40] btrfs: split submit_bio from btrfsic checking Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 08/40] btrfs: simplify repair_io_failure Christoph Hellwig
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

btrfsic_read_block does not need the btrfs_bio structure, so switch to
plain bio_alloc.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/check-integrity.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 49f9954f1438f..0fd3ca10ec569 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1563,10 +1563,9 @@ static int btrfsic_read_block(struct btrfsic_state *state,
 		struct bio *bio;
 		unsigned int j;
 
-		bio = btrfs_bio_alloc(num_pages - i);
-		bio_set_dev(bio, block_ctx->dev->bdev);
+		bio = bio_alloc(block_ctx->dev->bdev, num_pages - i,
+				REQ_OP_READ, GFP_NOFS);
 		bio->bi_iter.bi_sector = dev_bytenr >> 9;
-		bio->bi_opf = REQ_OP_READ;
 
 		for (j = i; j < num_pages; j++) {
 			ret = bio_add_page(bio, block_ctx->pagev[j],
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 08/40] btrfs: simplify repair_io_failure
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 07/40] btrfs: simplify btrfsic_read_block Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:06   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 09/40] btrfs: simplify scrub_recheck_block Christoph Hellwig
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio,
so just use an on-stack bio.  Also cleanup the error handling to use goto
labels and not discard the actual return values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 52 ++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1a39b9ffdd180..be523581c0ac1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2307,12 +2307,13 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 			     u64 length, u64 logical, struct page *page,
 			     unsigned int pg_offset, int mirror_num)
 {
-	struct bio *bio;
 	struct btrfs_device *dev;
+	struct bio_vec bvec;
+	struct bio bio;
 	u64 map_length = 0;
 	u64 sector;
 	struct btrfs_io_context *bioc = NULL;
-	int ret;
+	int ret = 0;
 
 	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
 	BUG_ON(!mirror_num);
@@ -2320,8 +2321,6 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 	if (btrfs_repair_one_zone(fs_info, logical))
 		return 0;
 
-	bio = btrfs_bio_alloc(1);
-	bio->bi_iter.bi_size = 0;
 	map_length = length;
 
 	/*
@@ -2339,53 +2338,50 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
 		 */
 		ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
 				      &map_length, &bioc, 0);
-		if (ret) {
-			btrfs_bio_counter_dec(fs_info);
-			bio_put(bio);
-			return -EIO;
-		}
+		if (ret)
+			goto out_counter_dec;
 		ASSERT(bioc->mirror_num == 1);
 	} else {
 		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
 				      &map_length, &bioc, mirror_num);
-		if (ret) {
-			btrfs_bio_counter_dec(fs_info);
-			bio_put(bio);
-			return -EIO;
-		}
+		if (ret)
+			goto out_counter_dec;
 		BUG_ON(mirror_num != bioc->mirror_num);
 	}
 
 	sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9;
-	bio->bi_iter.bi_sector = sector;
 	dev = bioc->stripes[bioc->mirror_num - 1].dev;
 	btrfs_put_bioc(bioc);
+
 	if (!dev || !dev->bdev ||
 	    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) {
-		btrfs_bio_counter_dec(fs_info);
-		bio_put(bio);
-		return -EIO;
+		ret = -EIO;
+		goto out_counter_dec;
 	}
-	bio_set_dev(bio, dev->bdev);
-	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
-	bio_add_page(bio, page, length, pg_offset);
 
-	btrfsic_check_bio(bio);
-	if (submit_bio_wait(bio)) {
+	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
+	bio.bi_iter.bi_sector = sector;
+	__bio_add_page(&bio, page, length, pg_offset);
+
+	btrfsic_check_bio(&bio);
+	ret = submit_bio_wait(&bio);
+	if (ret) {
 		/* try to remap that extent elsewhere? */
-		btrfs_bio_counter_dec(fs_info);
-		bio_put(bio);
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
-		return -EIO;
+		goto out_bio_uninit;
 	}
 
 	btrfs_info_rl_in_rcu(fs_info,
 		"read error corrected: ino %llu off %llu (dev %s sector %llu)",
 				  ino, start,
 				  rcu_str_deref(dev->name), sector);
+	ret = 0;
+
+out_bio_uninit:
+	bio_uninit(&bio);
+out_counter_dec:
 	btrfs_bio_counter_dec(fs_info);
-	bio_put(bio);
-	return 0;
+	return ret;
 }
 
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 09/40] btrfs: simplify scrub_recheck_block
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 08/40] btrfs: simplify repair_io_failure Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:10   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy Christoph Hellwig
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio,
so just use an on-stack bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/scrub.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 605ecc675ba7c..508c91e26b6e9 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1462,8 +1462,9 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 		return scrub_recheck_block_on_raid56(fs_info, sblock);
 
 	for (page_num = 0; page_num < sblock->page_count; page_num++) {
-		struct bio *bio;
 		struct scrub_page *spage = sblock->pagev[page_num];
+		struct bio bio;
+		struct bio_vec bvec;
 
 		if (spage->dev->bdev == NULL) {
 			spage->io_error = 1;
@@ -1472,20 +1473,17 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
 		}
 
 		WARN_ON(!spage->page);
-		bio = btrfs_bio_alloc(1);
-		bio_set_dev(bio, spage->dev->bdev);
-
-		bio_add_page(bio, spage->page, fs_info->sectorsize, 0);
-		bio->bi_iter.bi_sector = spage->physical >> 9;
-		bio->bi_opf = REQ_OP_READ;
+		bio_init(&bio, spage->dev->bdev, &bvec, 1, REQ_OP_READ);
+		__bio_add_page(&bio, spage->page, fs_info->sectorsize, 0);
+		bio.bi_iter.bi_sector = spage->physical >> 9;
 
-		btrfsic_check_bio(bio);
-		if (submit_bio_wait(bio)) {
+		btrfsic_check_bio(&bio);
+		if (submit_bio_wait(&bio)) {
 			spage->io_error = 1;
 			sblock->no_io_error_seen = 0;
 		}
 
-		bio_put(bio);
+		bio_uninit(&bio);
 	}
 
 	if (sblock->no_io_error_seen)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 09/40] btrfs: simplify scrub_recheck_block Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:12   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 11/40] btrfs: move the call to bio_set_dev out of submit_stripe_bio Christoph Hellwig
                   ` (30 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The I/O in repair_io_failue is synchronous and doesn't need a btrfs_bio,
so just use an on-stack bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/scrub.c | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 508c91e26b6e9..bb9382c02714f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1544,7 +1544,8 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
 	BUG_ON(spage_good->page == NULL);
 	if (force_write || sblock_bad->header_error ||
 	    sblock_bad->checksum_error || spage_bad->io_error) {
-		struct bio *bio;
+		struct bio bio;
+		struct bio_vec bvec;
 		int ret;
 
 		if (!spage_bad->dev->bdev) {
@@ -1553,26 +1554,20 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
 			return -EIO;
 		}
 
-		bio = btrfs_bio_alloc(1);
-		bio_set_dev(bio, spage_bad->dev->bdev);
-		bio->bi_iter.bi_sector = spage_bad->physical >> 9;
-		bio->bi_opf = REQ_OP_WRITE;
+		bio_init(&bio, spage_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE);
+		bio.bi_iter.bi_sector = spage_bad->physical >> 9;
+		__bio_add_page(&bio, spage_good->page, sectorsize, 0);
 
-		ret = bio_add_page(bio, spage_good->page, sectorsize, 0);
-		if (ret != sectorsize) {
-			bio_put(bio);
-			return -EIO;
-		}
+		btrfsic_check_bio(&bio);
+		ret = submit_bio_wait(&bio);
+		bio_uninit(&bio);
 
-		btrfsic_check_bio(bio);
-		if (submit_bio_wait(bio)) {
+		if (ret) {
 			btrfs_dev_stat_inc_and_print(spage_bad->dev,
 				BTRFS_DEV_STAT_WRITE_ERRS);
 			atomic64_inc(&fs_info->dev_replace.num_write_errors);
-			bio_put(bio);
 			return -EIO;
 		}
-		bio_put(bio);
 	}
 
 	return 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 11/40] btrfs: move the call to bio_set_dev out of submit_stripe_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 12/40] btrfs: pass a block_device to btrfs_bio_clone Christoph Hellwig
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Prepare for additional refactoring.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bfa8e825e5047..5dc2a89682a22 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6751,7 +6751,6 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 		bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
 		(unsigned long)dev->bdev->bd_dev, rcu_str_deref(dev->name),
 		dev->devid, bio->bi_iter.bi_size);
-	bio_set_dev(bio, dev->bdev);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 
@@ -6843,6 +6842,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		else
 			bio = first_bio;
 
+		bio_set_dev(bio, dev->bdev);
 		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
 	}
 	btrfs_bio_counter_dec(fs_info);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 12/40] btrfs: pass a block_device to btrfs_bio_clone
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 11/40] btrfs: move the call to bio_set_dev out of submit_stripe_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 13/40] btrfs: initialize ->bi_opf and ->bi_private in rbio_add_io_page Christoph Hellwig
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Pass the block_device to bio_alloc_clone instead of setting it later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 4 ++--
 fs/btrfs/extent_io.h | 2 +-
 fs/btrfs/volumes.c   | 9 +++++----
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index be523581c0ac1..88d3a46e89a51 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3150,13 +3150,13 @@ struct bio *btrfs_bio_alloc(unsigned int nr_iovecs)
 	return bio;
 }
 
-struct bio *btrfs_bio_clone(struct bio *bio)
+struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio)
 {
 	struct btrfs_bio *bbio;
 	struct bio *new;
 
 	/* Bio allocation backed by a bioset does not fail */
-	new = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOFS, &btrfs_bioset);
+	new = bio_alloc_clone(bdev, bio, GFP_NOFS, &btrfs_bioset);
 	bbio = btrfs_bio(new);
 	btrfs_bio_init(bbio);
 	bbio->iter = bio->bi_iter;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0399cf8e3c32c..72d86f228c56e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -278,7 +278,7 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
 				  u32 bits_to_clear, unsigned long page_ops);
 struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
-struct bio *btrfs_bio_clone(struct bio *bio);
+struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5dc2a89682a22..4dd54b80dac81 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6837,12 +6837,13 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			continue;
 		}
 
-		if (dev_nr < total_devs - 1)
-			bio = btrfs_bio_clone(first_bio);
-		else
+		if (dev_nr < total_devs - 1) {
+			bio = btrfs_bio_clone(dev->bdev, first_bio);
+		} else {
 			bio = first_bio;
+			bio_set_dev(bio, dev->bdev);
+		}
 
-		bio_set_dev(bio, dev->bdev);
 		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
 	}
 	btrfs_bio_counter_dec(fs_info);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 13/40] btrfs: initialize ->bi_opf and ->bi_private in rbio_add_io_page
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (11 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 12/40] btrfs: pass a block_device to btrfs_bio_clone Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios Christoph Hellwig
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Prepare for further refactoring by moving this initialization to a single
place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/raid56.c | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b264..2f1f7ca27acd5 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1069,7 +1069,8 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 			    struct page *page,
 			    int stripe_nr,
 			    unsigned long page_index,
-			    unsigned long bio_max_len)
+			    unsigned long bio_max_len,
+			    unsigned int opf)
 {
 	struct bio *last = bio_list->tail;
 	int ret;
@@ -1106,7 +1107,9 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	btrfs_bio(bio)->device = stripe->dev;
 	bio->bi_iter.bi_size = 0;
 	bio_set_dev(bio, stripe->dev->bdev);
+	bio->bi_opf = opf;
 	bio->bi_iter.bi_sector = disk_start >> 9;
+	bio->bi_private = rbio;
 
 	bio_add_page(bio, page, PAGE_SIZE, 0);
 	bio_list_add(bio_list, bio);
@@ -1275,7 +1278,8 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 			}
 
 			ret = rbio_add_io_page(rbio, &bio_list,
-				       page, stripe, pagenr, rbio->stripe_len);
+				       page, stripe, pagenr, rbio->stripe_len,
+				       REQ_OP_WRITE);
 			if (ret)
 				goto cleanup;
 		}
@@ -1300,7 +1304,8 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 
 			ret = rbio_add_io_page(rbio, &bio_list, page,
 					       rbio->bioc->tgtdev_map[stripe],
-					       pagenr, rbio->stripe_len);
+					       pagenr, rbio->stripe_len,
+					       REQ_OP_WRITE);
 			if (ret)
 				goto cleanup;
 		}
@@ -1311,9 +1316,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
 	BUG_ON(atomic_read(&rbio->stripes_pending) == 0);
 
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_private = rbio;
 		bio->bi_end_io = raid_write_end_io;
-		bio->bi_opf = REQ_OP_WRITE;
 
 		submit_bio(bio);
 	}
@@ -1517,7 +1520,8 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 				continue;
 
 			ret = rbio_add_io_page(rbio, &bio_list, page,
-				       stripe, pagenr, rbio->stripe_len);
+				       stripe, pagenr, rbio->stripe_len,
+				       REQ_OP_READ);
 			if (ret)
 				goto cleanup;
 		}
@@ -1540,9 +1544,7 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_private = rbio;
 		bio->bi_end_io = raid_rmw_end_io;
-		bio->bi_opf = REQ_OP_READ;
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
@@ -2059,7 +2061,8 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 
 			ret = rbio_add_io_page(rbio, &bio_list,
 				       rbio_stripe_page(rbio, stripe, pagenr),
-				       stripe, pagenr, rbio->stripe_len);
+				       stripe, pagenr, rbio->stripe_len,
+				       REQ_OP_READ);
 			if (ret < 0)
 				goto cleanup;
 		}
@@ -2086,9 +2089,7 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_private = rbio;
 		bio->bi_end_io = raid_recover_end_io;
-		bio->bi_opf = REQ_OP_READ;
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
@@ -2419,8 +2420,8 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 		struct page *page;
 
 		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
-		ret = rbio_add_io_page(rbio, &bio_list,
-			       page, rbio->scrubp, pagenr, rbio->stripe_len);
+		ret = rbio_add_io_page(rbio, &bio_list, page, rbio->scrubp,
+				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
 		if (ret)
 			goto cleanup;
 	}
@@ -2434,7 +2435,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 		page = rbio_stripe_page(rbio, rbio->scrubp, pagenr);
 		ret = rbio_add_io_page(rbio, &bio_list, page,
 				       bioc->tgtdev_map[rbio->scrubp],
-				       pagenr, rbio->stripe_len);
+				       pagenr, rbio->stripe_len, REQ_OP_WRITE);
 		if (ret)
 			goto cleanup;
 	}
@@ -2450,9 +2451,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	atomic_set(&rbio->stripes_pending, nr_data);
 
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_private = rbio;
 		bio->bi_end_io = raid_write_end_io;
-		bio->bi_opf = REQ_OP_WRITE;
 
 		submit_bio(bio);
 	}
@@ -2604,8 +2603,9 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 			if (PageUptodate(page))
 				continue;
 
-			ret = rbio_add_io_page(rbio, &bio_list, page,
-				       stripe, pagenr, rbio->stripe_len);
+			ret = rbio_add_io_page(rbio, &bio_list, page, stripe,
+					       pagenr, rbio->stripe_len,
+					       REQ_OP_READ);
 			if (ret)
 				goto cleanup;
 		}
@@ -2628,9 +2628,7 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_private = rbio;
 		bio->bi_end_io = raid56_parity_scrub_end_io;
-		bio->bi_opf = REQ_OP_READ;
 
 		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (12 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 13/40] btrfs: initialize ->bi_opf and ->bi_private in rbio_add_io_page Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:16   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios Christoph Hellwig
                   ` (26 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Except for the spurious initialization of ->device just after allocation
nothing uses the btrfs_bio, so just allocate a normal bio without extra
data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/raid56.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 2f1f7ca27acd5..a0d65f4b2b258 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1103,11 +1103,8 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
 	}
 
 	/* put a new bio on the list */
-	bio = btrfs_bio_alloc(bio_max_len >> PAGE_SHIFT ?: 1);
-	btrfs_bio(bio)->device = stripe->dev;
-	bio->bi_iter.bi_size = 0;
-	bio_set_dev(bio, stripe->dev->bdev);
-	bio->bi_opf = opf;
+	bio = bio_alloc(stripe->dev->bdev, max(bio_max_len >> PAGE_SHIFT, 1UL),
+			opf, GFP_NOFS);
 	bio->bi_iter.bi_sector = disk_start >> 9;
 	bio->bi_private = rbio;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (13 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:18   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 16/40] btrfs: stop using the btrfs_bio saved iter in index_rbio_pages Christoph Hellwig
                   ` (25 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

All the scrub bios go straight to the block device or the raid56 code,
none of which looks at the btrfs_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/scrub.c | 47 ++++++++++++++++++-----------------------------
 1 file changed, 18 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bb9382c02714f..250d271b02341 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1415,8 +1415,8 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
 	if (!first_page->dev->bdev)
 		goto out;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
-	bio_set_dev(bio, first_page->dev->bdev);
+	bio = bio_alloc(first_page->dev->bdev, BIO_MAX_VECS, REQ_OP_READ,
+			GFP_NOFS);
 
 	for (page_num = 0; page_num < sblock->page_count; page_num++) {
 		struct scrub_page *spage = sblock->pagev[page_num];
@@ -1649,8 +1649,6 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 	}
 	sbio = sctx->wr_curr_bio;
 	if (sbio->page_count == 0) {
-		struct bio *bio;
-
 		ret = fill_writer_pointer_gap(sctx,
 					      spage->physical_for_dev_replace);
 		if (ret) {
@@ -1661,17 +1659,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 		sbio->physical = spage->physical_for_dev_replace;
 		sbio->logical = spage->logical;
 		sbio->dev = sctx->wr_tgtdev;
-		bio = sbio->bio;
-		if (!bio) {
-			bio = btrfs_bio_alloc(sctx->pages_per_bio);
-			sbio->bio = bio;
+		if (!sbio->bio) {
+			sbio->bio = bio_alloc(sbio->dev->bdev,
+					      sctx->pages_per_bio,
+					      REQ_OP_WRITE, GFP_NOFS);
 		}
-
-		bio->bi_private = sbio;
-		bio->bi_end_io = scrub_wr_bio_end_io;
-		bio_set_dev(bio, sbio->dev->bdev);
-		bio->bi_iter.bi_sector = sbio->physical >> 9;
-		bio->bi_opf = REQ_OP_WRITE;
+		sbio->bio->bi_private = sbio;
+		sbio->bio->bi_end_io = scrub_wr_bio_end_io;
+		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
 		sbio->status = 0;
 	} else if (sbio->physical + sbio->page_count * sectorsize !=
 		   spage->physical_for_dev_replace ||
@@ -1712,7 +1707,6 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
 
 	sbio = sctx->wr_curr_bio;
 	sctx->wr_curr_bio = NULL;
-	WARN_ON(!sbio->bio->bi_bdev);
 	scrub_pending_bio_inc(sctx);
 	/* process all writes in a single worker thread. Then the block layer
 	 * orders the requests before sending them to the driver which
@@ -2084,22 +2078,17 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
 	}
 	sbio = sctx->bios[sctx->curr];
 	if (sbio->page_count == 0) {
-		struct bio *bio;
-
 		sbio->physical = spage->physical;
 		sbio->logical = spage->logical;
 		sbio->dev = spage->dev;
-		bio = sbio->bio;
-		if (!bio) {
-			bio = btrfs_bio_alloc(sctx->pages_per_bio);
-			sbio->bio = bio;
+		if (!sbio->bio) {
+			sbio->bio = bio_alloc(sbio->dev->bdev,
+					      sctx->pages_per_bio,
+					      REQ_OP_READ, GFP_NOFS);
 		}
-
-		bio->bi_private = sbio;
-		bio->bi_end_io = scrub_bio_end_io;
-		bio_set_dev(bio, sbio->dev->bdev);
-		bio->bi_iter.bi_sector = sbio->physical >> 9;
-		bio->bi_opf = REQ_OP_READ;
+		sbio->bio->bi_private = sbio;
+		sbio->bio->bi_end_io = scrub_bio_end_io;
+		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
 		sbio->status = 0;
 	} else if (sbio->physical + sbio->page_count * sectorsize !=
 		   spage->physical ||
@@ -2215,7 +2204,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 		goto bioc_out;
 	}
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
+	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
 	bio->bi_iter.bi_sector = logical >> 9;
 	bio->bi_private = sblock;
 	bio->bi_end_io = scrub_missing_raid56_end_io;
@@ -2831,7 +2820,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 	if (ret || !bioc || !bioc->raid_map)
 		goto bioc_out;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
+	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
 	bio->bi_iter.bi_sector = sparity->logic_start >> 9;
 	bio->bi_private = sparity;
 	bio->bi_end_io = scrub_parity_bio_endio;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 16/40] btrfs: stop using the btrfs_bio saved iter in index_rbio_pages
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (14 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair Christoph Hellwig
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The bios added to ->bio_list are the original bios fed into
btrfs_map_bio, which are never advanced.  Just use the iter in the
bio itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/raid56.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index a0d65f4b2b258..0c96e91e9ee03 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1155,9 +1155,6 @@ static void index_rbio_pages(struct btrfs_raid_bio *rbio)
 		stripe_offset = start - rbio->bioc->raid_map[0];
 		page_index = stripe_offset >> PAGE_SHIFT;
 
-		if (bio_flagged(bio, BIO_CLONED))
-			bio->bi_iter = btrfs_bio(bio)->iter;
-
 		bio_for_each_segment(bvec, bio, iter) {
 			rbio->bio_pages[page_index + i] = bvec.bv_page;
 			i++;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (15 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 16/40] btrfs: stop using the btrfs_bio saved iter in index_rbio_pages Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:20   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 18/40] btrfs: move more work into btrfs_end_bioc Christoph Hellwig
                   ` (23 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The submit_bio_hook argument is always set to btrfs_submit_data_bio, so
just remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 88d3a46e89a51..238252f86d5ad 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2721,8 +2721,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
 				      struct bio *failed_bio, u32 bio_offset,
 				      struct page *page, unsigned int pgoff,
 				      u64 start, u64 end, int failed_mirror,
-				      unsigned int error_bitmap,
-				      submit_bio_hook_t *submit_bio_hook)
+				      unsigned int error_bitmap)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const u32 sectorsize = fs_info->sectorsize;
@@ -2760,7 +2759,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
 		ret = btrfs_repair_one_sector(inode, failed_bio,
 				bio_offset + offset,
 				page, pgoff + offset, start + offset,
-				failed_mirror, submit_bio_hook);
+				failed_mirror, btrfs_submit_data_bio);
 		if (!ret) {
 			/*
 			 * We have submitted the read repair, the page release
@@ -3075,8 +3074,7 @@ static void end_bio_extent_readpage(struct bio *bio)
 			 */
 			submit_read_repair(inode, bio, bio_offset, page,
 					   start - page_offset(page), start,
-					   end, mirror, error_bitmap,
-					   btrfs_submit_data_bio);
+					   end, mirror, error_bitmap);
 
 			ASSERT(bio_offset + len > bio_offset);
 			bio_offset += len;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 18/40] btrfs: move more work into btrfs_end_bioc
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (16 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:29   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 19/40] btrfs: defer I/O completion based on the btrfs_raid_bio Christoph Hellwig
                   ` (22 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Assign ->mirror_num and ->bi_status in btrfs_end_bioc instead of
duplicating the logic in the callers.  Also remove the bio argument, as
it must always be bioc->orig_bio, and remove the now pointless bioc_error
helper, which did nothing but assign bi_sector to the same value just
sampled in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 68 ++++++++++++++--------------------------------
 1 file changed, 20 insertions(+), 48 deletions(-)
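
For reviewers, the single completion helper as it looks after this patch,
reassembled from the hunks below (identifiers exactly as in the patch):

	static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
	{
		struct bio *bio = bioc->orig_bio;

		btrfs_bio(bio)->mirror_num = bioc->mirror_num;
		bio->bi_private = bioc->private;
		bio->bi_end_io = bioc->end_io;

		/* Only report an error if it is beyond the tolerance threshold. */
		if (atomic_read(&bioc->error) > bioc->max_errors)
			bio->bi_status = BLK_STS_IOERR;
		else
			bio->bi_status = BLK_STS_OK;
		bio_endio(bio);
		btrfs_put_bioc(bioc);
	}

Both btrfs_end_bio() and the missing/non-writable device case in
btrfs_map_bio() now end with the same two steps: bump bioc->error if needed
and call btrfs_end_bioc() once stripes_pending drops to zero.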

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4dd54b80dac81..9d1f8c27eff33 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6659,19 +6659,29 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
 }
 
-static inline void btrfs_end_bioc(struct btrfs_io_context *bioc, struct bio *bio)
+static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
 {
+	struct bio *bio = bioc->orig_bio;
+
+	btrfs_bio(bio)->mirror_num = bioc->mirror_num;
 	bio->bi_private = bioc->private;
 	bio->bi_end_io = bioc->end_io;
-	bio_endio(bio);
 
+	/*
+	 * Only send an error to the higher layers if it is beyond the tolerance
+	 * threshold.
+	 */
+	if (atomic_read(&bioc->error) > bioc->max_errors)
+		bio->bi_status = BLK_STS_IOERR;
+	else
+		bio->bi_status = BLK_STS_OK;
+	bio_endio(bio);
 	btrfs_put_bioc(bioc);
 }
 
 static void btrfs_end_bio(struct bio *bio)
 {
 	struct btrfs_io_context *bioc = bio->bi_private;
-	int is_orig_bio = 0;
 
 	if (bio->bi_status) {
 		atomic_inc(&bioc->error);
@@ -6692,35 +6702,12 @@ static void btrfs_end_bio(struct bio *bio)
 		}
 	}
 
-	if (bio == bioc->orig_bio)
-		is_orig_bio = 1;
+	if (bio != bioc->orig_bio)
+		bio_put(bio);
 
 	btrfs_bio_counter_dec(bioc->fs_info);
-
-	if (atomic_dec_and_test(&bioc->stripes_pending)) {
-		if (!is_orig_bio) {
-			bio_put(bio);
-			bio = bioc->orig_bio;
-		}
-
-		btrfs_bio(bio)->mirror_num = bioc->mirror_num;
-		/* only send an error to the higher layers if it is
-		 * beyond the tolerance of the btrfs bio
-		 */
-		if (atomic_read(&bioc->error) > bioc->max_errors) {
-			bio->bi_status = BLK_STS_IOERR;
-		} else {
-			/*
-			 * this bio is actually up to date, we didn't
-			 * go over the max number of errors
-			 */
-			bio->bi_status = BLK_STS_OK;
-		}
-
-		btrfs_end_bioc(bioc, bio);
-	} else if (!is_orig_bio) {
-		bio_put(bio);
-	}
+	if (atomic_dec_and_test(&bioc->stripes_pending))
+		btrfs_end_bioc(bioc);
 }
 
 static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
@@ -6758,23 +6745,6 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 	submit_bio(bio);
 }
 
-static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
-{
-	atomic_inc(&bioc->error);
-	if (atomic_dec_and_test(&bioc->stripes_pending)) {
-		/* Should be the original bio. */
-		WARN_ON(bio != bioc->orig_bio);
-
-		btrfs_bio(bio)->mirror_num = bioc->mirror_num;
-		bio->bi_iter.bi_sector = logical >> 9;
-		if (atomic_read(&bioc->error) > bioc->max_errors)
-			bio->bi_status = BLK_STS_IOERR;
-		else
-			bio->bi_status = BLK_STS_OK;
-		btrfs_end_bioc(bioc, bio);
-	}
-}
-
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num)
 {
@@ -6833,7 +6803,9 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 						   &dev->dev_state) ||
 		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
 		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
-			bioc_error(bioc, first_bio, logical);
+			atomic_inc(&bioc->error);
+			if (atomic_dec_and_test(&bioc->stripes_pending))
+				btrfs_end_bioc(bioc);
 			continue;
 		}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 19/40] btrfs: defer I/O completion based on the btrfs_raid_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (17 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 18/40] btrfs: move more work into btrfs_end_bioc Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio Christoph Hellwig
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Instead of attaching an extra allocation and an indirect call to each
low-level bio issued by the RAID code, add a btrfs_work to struct
btrfs_raid_bio and only defer the per-rbio completion action.  The
per-bio actions for all the I/Os are trivial and can be safely done
from interrupt context.

As a nice side effect this also allows sharing the boilerplate code
for the per-bio completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c |   6 +--
 fs/btrfs/disk-io.h |   1 -
 fs/btrfs/raid56.c  | 110 +++++++++++++++++++--------------------------
 3 files changed, 46 insertions(+), 71 deletions(-)
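
For reviewers, the resulting split between the shared bio end_io and the
deferred per-rbio work, condensed from the hunks below (the unchanged
middle of the end_io handler is filled in from the matching handlers this
patch removes); the rmw read path is shown, the recover and scrub paths
differ only in the work function passed to btrfs_init_work():

	static void raid56_bio_end_io(struct bio *bio)
	{
		struct btrfs_raid_bio *rbio = bio->bi_private;

		/* trivial per-bio bookkeeping, safe in interrupt context */
		if (bio->bi_status)
			fail_bio_stripe(rbio, bio);
		else
			set_bio_pages_uptodate(bio);
		bio_put(bio);

		if (!atomic_dec_and_test(&rbio->stripes_pending))
			return;
		/* last bio for this rbio: defer the heavy lifting to a worker */
		btrfs_queue_work(rbio->bioc->fs_info->endio_raid56_workers,
				 &rbio->end_io_work);
	}

	/* submission side, e.g. raid56_rmw_stripe() */
	atomic_set(&rbio->stripes_pending, bios_to_read);
	btrfs_init_work(&rbio->end_io_work, raid56_rmw_end_io_work, NULL, NULL);
	while ((bio = bio_list_pop(&bio_list))) {
		bio->bi_end_io = raid56_bio_end_io;
		submit_bio(bio);
	}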

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9b8ee74144910..dc497e17dcd06 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -740,14 +740,10 @@ static void end_workqueue_bio(struct bio *bio)
 			wq = fs_info->endio_meta_write_workers;
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
 			wq = fs_info->endio_freespace_worker;
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			wq = fs_info->endio_raid56_workers;
 		else
 			wq = fs_info->endio_write_workers;
 	} else {
-		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			wq = fs_info->endio_raid56_workers;
-		else if (end_io_wq->metadata)
+		if (end_io_wq->metadata)
 			wq = fs_info->endio_meta_workers;
 		else
 			wq = fs_info->endio_workers;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 5e8bef4b7563a..2364a30cd9e32 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -21,7 +21,6 @@ enum btrfs_wq_endio_type {
 	BTRFS_WQ_ENDIO_DATA,
 	BTRFS_WQ_ENDIO_METADATA,
 	BTRFS_WQ_ENDIO_FREE_SPACE,
-	BTRFS_WQ_ENDIO_RAID56,
 };
 
 static inline u64 btrfs_sb_offset(int mirror)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0c96e91e9ee03..69e45e14a0b39 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -145,6 +145,9 @@ struct btrfs_raid_bio {
 	atomic_t stripes_pending;
 
 	atomic_t error;
+
+	struct btrfs_work end_io_work;
+
 	/*
 	 * these are two arrays of pointers.  We allocate the
 	 * rbio big enough to hold them both and setup their
@@ -1428,15 +1431,7 @@ static void set_bio_pages_uptodate(struct bio *bio)
 		SetPageUptodate(bvec->bv_page);
 }
 
-/*
- * end io for the read phase of the rmw cycle.  All the bios here are physical
- * stripe bios we've read from the disk so we can recalculate the parity of the
- * stripe.
- *
- * This will usually kick off finish_rmw once all the bios are read in, but it
- * may trigger parity reconstruction if we had any errors along the way
- */
-static void raid_rmw_end_io(struct bio *bio)
+static void raid56_bio_end_io(struct bio *bio)
 {
 	struct btrfs_raid_bio *rbio = bio->bi_private;
 
@@ -1449,21 +1444,33 @@ static void raid_rmw_end_io(struct bio *bio)
 
 	if (!atomic_dec_and_test(&rbio->stripes_pending))
 		return;
+	btrfs_queue_work(rbio->bioc->fs_info->endio_raid56_workers,
+			 &rbio->end_io_work);
+}
 
-	if (atomic_read(&rbio->error) > rbio->bioc->max_errors)
-		goto cleanup;
+/*
+ * End io handler for the read phase of the rmw cycle.  All the bios here are
+ * physical stripe bios we've read from the disk so we can recalculate the
+ * parity of the stripe.
+ *
+ * This will usually kick off finish_rmw once all the bios are read in, but it
+ * may trigger parity reconstruction if we had any errors along the way
+ */
+static void raid56_rmw_end_io_work(struct btrfs_work *work)
+{
+	struct btrfs_raid_bio *rbio =
+		container_of(work, struct btrfs_raid_bio, end_io_work);
+
+	if (atomic_read(&rbio->error) > rbio->bioc->max_errors) {
+		rbio_orig_end_io(rbio, BLK_STS_IOERR);
+		return;
+	}
 
 	/*
-	 * this will normally call finish_rmw to start our write
-	 * but if there are any failed stripes we'll reconstruct
-	 * from parity first
+	 * This will normally call finish_rmw to start our write but if there
+	 * are any failed stripes we'll reconstruct from parity first.
 	 */
 	validate_rbio_for_rmw(rbio);
-	return;
-
-cleanup:
-
-	rbio_orig_end_io(rbio, BLK_STS_IOERR);
 }
 
 /*
@@ -1537,11 +1544,9 @@ static int raid56_rmw_stripe(struct btrfs_raid_bio *rbio)
 	 * touch it after that.
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
+	btrfs_init_work(&rbio->end_io_work, raid56_rmw_end_io_work, NULL, NULL);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_end_io = raid_rmw_end_io;
-
-		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
-
+		bio->bi_end_io = raid56_bio_end_io;
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */
@@ -1980,25 +1985,13 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 }
 
 /*
- * This is called only for stripes we've read from disk to
- * reconstruct the parity.
+ * This is called only for stripes we've read from disk to reconstruct the
+ * parity.
  */
-static void raid_recover_end_io(struct bio *bio)
+static void raid_recover_end_io_work(struct btrfs_work *work)
 {
-	struct btrfs_raid_bio *rbio = bio->bi_private;
-
-	/*
-	 * we only read stripe pages off the disk, set them
-	 * up to date if there were no errors
-	 */
-	if (bio->bi_status)
-		fail_bio_stripe(rbio, bio);
-	else
-		set_bio_pages_uptodate(bio);
-	bio_put(bio);
-
-	if (!atomic_dec_and_test(&rbio->stripes_pending))
-		return;
+	struct btrfs_raid_bio *rbio =
+		container_of(work, struct btrfs_raid_bio, end_io_work);
 
 	if (atomic_read(&rbio->error) > rbio->bioc->max_errors)
 		rbio_orig_end_io(rbio, BLK_STS_IOERR);
@@ -2082,11 +2075,10 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	 * touch it after that.
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
+	btrfs_init_work(&rbio->end_io_work, raid_recover_end_io_work,
+			NULL, NULL);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_end_io = raid_recover_end_io;
-
-		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
-
+		bio->bi_end_io = raid56_bio_end_io;
 		submit_bio(bio);
 	}
 
@@ -2445,8 +2437,7 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
 	atomic_set(&rbio->stripes_pending, nr_data);
 
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_end_io = raid_write_end_io;
-
+		bio->bi_end_io = raid56_bio_end_io;
 		submit_bio(bio);
 	}
 	return;
@@ -2534,24 +2525,14 @@ static void validate_rbio_for_parity_scrub(struct btrfs_raid_bio *rbio)
  * This will usually kick off finish_rmw once all the bios are read in, but it
  * may trigger parity reconstruction if we had any errors along the way
  */
-static void raid56_parity_scrub_end_io(struct bio *bio)
+static void raid56_parity_scrub_end_io_work(struct btrfs_work *work)
 {
-	struct btrfs_raid_bio *rbio = bio->bi_private;
-
-	if (bio->bi_status)
-		fail_bio_stripe(rbio, bio);
-	else
-		set_bio_pages_uptodate(bio);
-
-	bio_put(bio);
-
-	if (!atomic_dec_and_test(&rbio->stripes_pending))
-		return;
+	struct btrfs_raid_bio *rbio =
+		container_of(work, struct btrfs_raid_bio, end_io_work);
 
 	/*
-	 * this will normally call finish_rmw to start our write
-	 * but if there are any failed stripes we'll reconstruct
-	 * from parity first
+	 * This will normally call finish_rmw to start our write, but if there
+	 * are any failed stripes we'll reconstruct from parity first
 	 */
 	validate_rbio_for_parity_scrub(rbio);
 }
@@ -2621,11 +2602,10 @@ static void raid56_parity_scrub_stripe(struct btrfs_raid_bio *rbio)
 	 * touch it after that.
 	 */
 	atomic_set(&rbio->stripes_pending, bios_to_read);
+	btrfs_init_work(&rbio->end_io_work, raid56_parity_scrub_end_io_work,
+			NULL, NULL);
 	while ((bio = bio_list_pop(&bio_list))) {
-		bio->bi_end_io = raid56_parity_scrub_end_io;
-
-		btrfs_bio_wq_end_io(rbio->bioc->fs_info, bio, BTRFS_WQ_ENDIO_RAID56);
-
+		bio->bi_end_io = raid56_bio_end_io;
 		submit_bio(bio);
 	}
 	/* the actual write will happen once the reads are done */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (18 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 19/40] btrfs: defer I/O completion based on the btrfs_raid_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:34   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio Christoph Hellwig
                   ` (20 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Remove the unused bio_flags argument and clean up the code flow to be
straightforward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c   | 42 ++++++++++++++++--------------------------
 fs/btrfs/disk-io.h   |  2 +-
 fs/btrfs/extent_io.c |  2 +-
 3 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dc497e17dcd06..f43c9ab86e617 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -890,6 +890,10 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
 	return btree_csum_one_bio(bio);
 }
 
+/*
+ * Check if metadata writes should be submitted by async threads so that
+ * checksumming can happen in parallel across all CPUs.
+ */
 static bool should_async_write(struct btrfs_fs_info *fs_info,
 			     struct btrfs_inode *bi)
 {
@@ -903,41 +907,27 @@ static bool should_async_write(struct btrfs_fs_info *fs_info,
 }
 
 blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
-				       int mirror_num, unsigned long bio_flags)
+				       int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	blk_status_t ret;
 
-	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		/*
-		 * called for a read, do the setup so that checksum validation
-		 * can happen in the async kernel threads
-		 */
-		ret = btrfs_bio_wq_end_io(fs_info, bio,
-					  BTRFS_WQ_ENDIO_METADATA);
-		if (ret)
-			goto out_w_error;
-		ret = btrfs_map_bio(fs_info, bio, mirror_num);
-	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		if (should_async_write(fs_info, BTRFS_I(inode)))
+			return btrfs_wq_submit_bio(inode, bio, mirror_num, 0, 0,
+						   btree_submit_bio_start);
 		ret = btree_csum_one_bio(bio);
 		if (ret)
-			goto out_w_error;
-		ret = btrfs_map_bio(fs_info, bio, mirror_num);
+			return ret;
 	} else {
-		/*
-		 * kthread helpers are used to submit writes so that
-		 * checksumming can happen in parallel across all CPUs
-		 */
-		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, 0,
-					  0, btree_submit_bio_start);
+		/* checksum validation should happen in async threads: */
+		ret = btrfs_bio_wq_end_io(fs_info, bio,
+					  BTRFS_WQ_ENDIO_METADATA);
+		if (ret)
+			return ret;
 	}
 
-	if (ret)
-		goto out_w_error;
-	return 0;
-
-out_w_error:
-	return ret;
+	return btrfs_map_bio(fs_info, bio, mirror_num);
 }
 
 #ifdef CONFIG_MIGRATION
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 2364a30cd9e32..afe3bb96616c9 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -87,7 +87,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 				   struct page *page, u64 start, u64 end,
 				   int mirror);
 blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
-				       int mirror_num, unsigned long bio_flags);
+				       int mirror_num);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
 #endif
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 238252f86d5ad..58ef0f4fca361 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -179,7 +179,7 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 					    bio_flags);
 	else
 		ret = btrfs_submit_metadata_bio(tree->private_data, bio,
-						mirror_num, bio_flags);
+						mirror_num);
 
 	if (ret) {
 		bio->bi_status = ret;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (19 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:44   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio Christoph Hellwig
                   ` (19 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Clean up the code flow to be straightforward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 85 +++++++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 48 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 325e773c6e880..a54b7fd4658d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2511,67 +2511,56 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_root *root = BTRFS_I(inode)->root;
-	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
-	blk_status_t ret = 0;
-	int skip_sum;
-	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
-
-	skip_sum = (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
-		test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
-
-	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
-		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
+	struct btrfs_inode *bi = BTRFS_I(inode);
+	blk_status_t ret;
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		struct page *page = bio_first_bvec_all(bio)->bv_page;
-		loff_t file_offset = page_offset(page);
-
-		ret = extract_ordered_extent(BTRFS_I(inode), bio, file_offset);
+		ret = extract_ordered_extent(bi, bio,
+				page_offset(bio_first_bvec_all(bio)->bv_page));
 		if (ret)
-			goto out;
+			return ret;
 	}
 
-	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		if ((bi->flags & BTRFS_INODE_NODATASUM) ||
+		    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
+			goto mapit;
+
+		if (!atomic_read(&bi->sync_writers)) {
+			/* csum items have already been cloned */
+			if (btrfs_is_data_reloc_root(bi->root))
+				goto mapit;
+			return btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
+						  0, btrfs_submit_bio_start);
+		}
+		ret = btrfs_csum_one_bio(bi, bio, 0, 0);
+		if (ret)
+			return ret;
+	} else {
+		enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
+
+		if (btrfs_is_free_space_inode(bi))
+			metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
+
 		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
 		if (ret)
-			goto out;
+			return ret;
 
-		if (bio_flags & EXTENT_BIO_COMPRESSED) {
-			ret = btrfs_submit_compressed_read(inode, bio,
+		if (bio_flags & EXTENT_BIO_COMPRESSED)
+			return btrfs_submit_compressed_read(inode, bio,
 							   mirror_num,
 							   bio_flags);
-			goto out;
-		} else {
-			/*
-			 * Lookup bio sums does extra checks around whether we
-			 * need to csum or not, which is why we ignore skip_sum
-			 * here.
-			 */
-			ret = btrfs_lookup_bio_sums(inode, bio, NULL);
-			if (ret)
-				goto out;
-		}
-		goto mapit;
-	} else if (async && !skip_sum) {
-		/* csum items have already been cloned */
-		if (btrfs_is_data_reloc_root(root))
-			goto mapit;
-		/* we're doing a write, do the async checksumming */
-		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
-					  0, btrfs_submit_bio_start);
-		goto out;
-	} else if (!skip_sum) {
-		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
+
+		/*
+		 * Lookup bio sums does extra checks around whether we need to
+		 * csum or not, which is why we ignore skip_sum here.
+		 */
+		ret = btrfs_lookup_bio_sums(inode, bio, NULL);
 		if (ret)
-			goto out;
+			return ret;
 	}
-
 mapit:
-	ret = btrfs_map_bio(fs_info, bio, mirror_num);
-
-out:
-	return ret;
+	return btrfs_map_bio(fs_info, bio, mirror_num);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (20 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:50   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio Christoph Hellwig
                   ` (18 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Remove the pointless goto that does nothing but return err, and clean up
the code flow to be a little more straightforward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 59 ++++++++++++++++++++++--------------------------
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a54b7fd4658d0..5c9d8e8a98466 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7844,47 +7844,42 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 		struct inode *inode, u64 file_offset, int async_submit)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_inode *bi = BTRFS_I(inode);
 	struct btrfs_dio_private *dip = bio->bi_private;
-	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
-	/* Check btrfs_submit_bio_hook() for rules about async submit. */
-	if (async_submit)
-		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
+			/* See btrfs_submit_data_bio for async submit rules */
+			if (async_submit && !atomic_read(&bi->sync_writers))
+				return btrfs_wq_submit_bio(inode, bio, 0, 0,
+					file_offset,
+					btrfs_submit_bio_start_direct_io);
 
-	if (!write) {
+			/*
+			 * If we aren't doing async submit, calculate the csum of the
+			 * bio now.
+			 */
+			ret = btrfs_csum_one_bio(bi, bio, file_offset, 1);
+			if (ret)
+				return ret;
+		}
+	} else {
 		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
 		if (ret)
-			goto err;
-	}
-
-	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
-		goto map;
+			return ret;
 
-	if (write && async_submit) {
-		ret = btrfs_wq_submit_bio(inode, bio, 0, 0, file_offset,
-					  btrfs_submit_bio_start_direct_io);
-		goto err;
-	} else if (write) {
-		/*
-		 * If we aren't doing async submit, calculate the csum of the
-		 * bio now.
-		 */
-		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, 1);
-		if (ret)
-			goto err;
-	} else {
-		u64 csum_offset;
+		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
+			u64 csum_offset;
 
-		csum_offset = file_offset - dip->file_offset;
-		csum_offset >>= fs_info->sectorsize_bits;
-		csum_offset *= fs_info->csum_size;
-		btrfs_bio(bio)->csum = dip->csums + csum_offset;
+			csum_offset = file_offset - dip->file_offset;
+			csum_offset >>= fs_info->sectorsize_bits;
+			csum_offset *= fs_info->csum_size;
+			btrfs_bio(bio)->csum = dip->csums + csum_offset;
+		}
 	}
-map:
-	ret = btrfs_map_bio(fs_info, bio, 0);
-err:
-	return ret;
+
+	return btrfs_map_bio(fs_info, bio, 0);
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (21 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:54   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 24/40] btrfs: remove btrfs_end_io_wq Christoph Hellwig
                   ` (17 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

All the I/O going through the btrfs_bio based path is associated with an
inode.  Add a pointer to it to simplify a few things soon.  Also pass the
bio operation to btrfs_bio_alloc, given that we have to touch it anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c |  4 +---
 fs/btrfs/extent_io.c   | 23 ++++++++++++-----------
 fs/btrfs/extent_io.h   |  6 ++++--
 fs/btrfs/inode.c       |  3 ++-
 fs/btrfs/volumes.h     |  2 ++
 5 files changed, 21 insertions(+), 17 deletions(-)
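
The calling convention change in one spot, condensed from the
alloc_new_bio() hunk below (identifiers as in the patch):

	/* before: operation and inode wired up by hand after allocation */
	bio = btrfs_bio_alloc(BIO_MAX_VECS);
	bio->bi_opf = opf;

	/* after: the owning inode and the operation travel with the btrfs_bio */
	bio = btrfs_bio_alloc(&inode->vfs_inode, BIO_MAX_VECS, opf);

Later patches can then find the inode through btrfs_bio(bio)->inode instead
of having it passed down every call chain.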

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 71e5b2e9a1ba8..419a09d924290 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -464,10 +464,8 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
-
+	bio = btrfs_bio_alloc(cb->inode, BIO_MAX_VECS, opf);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
-	bio->bi_opf = opf;
 	bio->bi_private = cb;
 	bio->bi_end_io = endio_func;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 58ef0f4fca361..116a65787e314 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2657,10 +2657,9 @@ int btrfs_repair_one_sector(struct inode *inode,
 		return -EIO;
 	}
 
-	repair_bio = btrfs_bio_alloc(1);
+	repair_bio = btrfs_bio_alloc(inode, 1, REQ_OP_READ);
 	repair_bbio = btrfs_bio(repair_bio);
 	repair_bbio->file_offset = start;
-	repair_bio->bi_opf = REQ_OP_READ;
 	repair_bio->bi_end_io = failed_bio->bi_end_io;
 	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
 	repair_bio->bi_private = failed_bio->bi_private;
@@ -3128,9 +3127,10 @@ static void end_bio_extent_readpage(struct bio *bio)
  * new bio by bio_alloc_bioset as it does not initialize the bytes outside of
  * 'bio' because use of __GFP_ZERO is not supported.
  */
-static inline void btrfs_bio_init(struct btrfs_bio *bbio)
+static inline void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode)
 {
 	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->inode = inode;
 }
 
 /*
@@ -3138,13 +3138,14 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio)
  *
  * The bio allocation is backed by bioset and does not fail.
  */
-struct bio *btrfs_bio_alloc(unsigned int nr_iovecs)
+struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
+		unsigned int opf)
 {
 	struct bio *bio;
 
 	ASSERT(0 < nr_iovecs && nr_iovecs <= BIO_MAX_VECS);
-	bio = bio_alloc_bioset(NULL, nr_iovecs, 0, GFP_NOFS, &btrfs_bioset);
-	btrfs_bio_init(btrfs_bio(bio));
+	bio = bio_alloc_bioset(NULL, nr_iovecs, opf, GFP_NOFS, &btrfs_bioset);
+	btrfs_bio_init(btrfs_bio(bio), inode);
 	return bio;
 }
 
@@ -3156,12 +3157,13 @@ struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio)
 	/* Bio allocation backed by a bioset does not fail */
 	new = bio_alloc_clone(bdev, bio, GFP_NOFS, &btrfs_bioset);
 	bbio = btrfs_bio(new);
-	btrfs_bio_init(bbio);
+	btrfs_bio_init(btrfs_bio(new), btrfs_bio(bio)->inode);
 	bbio->iter = bio->bi_iter;
 	return new;
 }
 
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
+struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
+		u64 offset, u64 size)
 {
 	struct bio *bio;
 	struct btrfs_bio *bbio;
@@ -3173,7 +3175,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 	ASSERT(bio);
 
 	bbio = btrfs_bio(bio);
-	btrfs_bio_init(bbio);
+	btrfs_bio_init(btrfs_bio(bio), inode);
 
 	bio_trim(bio, offset >> 9, size >> 9);
 	bbio->iter = bio->bi_iter;
@@ -3308,7 +3310,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
+	bio = btrfs_bio_alloc(&inode->vfs_inode, BIO_MAX_VECS, opf);
 	/*
 	 * For compressed page range, its disk_bytenr is always @disk_bytenr
 	 * passed in, no matter if we have added any range into previous bio.
@@ -3321,7 +3323,6 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	bio_ctrl->bio_flags = bio_flags;
 	bio->bi_end_io = end_io_func;
 	bio->bi_private = &inode->io_tree;
-	bio->bi_opf = opf;
 	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 	if (ret < 0)
 		goto error;
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 72d86f228c56e..d5f3d9692ea29 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -277,9 +277,11 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  struct page *locked_page,
 				  u32 bits_to_clear, unsigned long page_ops);
-struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
+struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
+		unsigned int opf);
 struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio);
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
+		u64 offset, u64 size);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5c9d8e8a98466..18d54cfedf829 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7987,7 +7987,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		 * This will never fail as it's passing GPF_NOFS and
 		 * the allocation is backed by btrfs_bioset.
 		 */
-		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
+		bio = btrfs_bio_clone_partial(inode, dio_bio, clone_offset,
+					      clone_len);
 		bio->bi_private = dip;
 		bio->bi_end_io = btrfs_end_dio_bio;
 		btrfs_bio(bio)->file_offset = file_offset;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c22148bebc2f5..a4f942547002e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -321,6 +321,8 @@ struct btrfs_fs_devices {
  * Mostly for btrfs specific features like csum and mirror_num.
  */
 struct btrfs_bio {
+	struct inode *inode;
+
 	unsigned int mirror_num;
 
 	/* for direct I/O */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 24/40] btrfs: remove btrfs_end_io_wq
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (22 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  0:57   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 25/40] btrfs: remove btrfs_wq_submit_bio Christoph Hellwig
                   ` (16 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Avoid the extra allocation for all read bios by embedding a btrfs_work
and an I/O end type into the btrfs_bio structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c |  24 +++------
 fs/btrfs/ctree.h       |   1 -
 fs/btrfs/disk-io.c     | 112 +----------------------------------------
 fs/btrfs/disk-io.h     |  10 ----
 fs/btrfs/inode.c       |  19 +++----
 fs/btrfs/super.c       |  11 +---
 fs/btrfs/volumes.c     |  44 ++++++++++++++--
 fs/btrfs/volumes.h     |  11 ++++
 8 files changed, 66 insertions(+), 166 deletions(-)
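
The short version of the new scheme, condensed from the hunks below
(identifiers as in the patch): the submitter tags the embedded btrfs_bio
instead of wrapping the bio in a separately allocated btrfs_end_io_wq, and
the completion side queues the work item that already lives in it.

	/* submit side, e.g. a buffered data read in btrfs_submit_data_bio() */
	btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
	ret = btrfs_map_bio(fs_info, bio, mirror_num);

	/* completion side, in btrfs_end_bioc(bioc, async) */
	wq = async ? btrfs_end_io_wq(bioc) : NULL;
	if (wq) {
		/* punt the final bio_endio() to the matching workqueue */
		btrfs_init_work(&bbio->work, btrfs_end_bio_work, NULL, NULL);
		btrfs_queue_work(wq, &bbio->work);
	} else {
		bio_endio(bio);
	}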

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 419a09d924290..ae6f986058c75 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -423,20 +423,6 @@ static void end_compressed_bio_write(struct bio *bio)
 	bio_put(bio);
 }
 
-static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
-					  struct compressed_bio *cb,
-					  struct bio *bio, int mirror_num)
-{
-	blk_status_t ret;
-
-	ASSERT(bio->bi_iter.bi_size);
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
-	ret = btrfs_map_bio(fs_info, bio, mirror_num);
-	return ret;
-}
-
 /*
  * Allocate a compressed_bio, which will be used to read/write on-disk
  * (aka, compressed) * data.
@@ -468,6 +454,10 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
 	bio->bi_private = cb;
 	bio->bi_end_io = endio_func;
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_WRITE;
+	else
+		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
 	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
 	if (IS_ERR(em)) {
@@ -594,7 +584,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 					goto finish_cb;
 			}
 
-			ret = submit_compressed_bio(fs_info, cb, bio, 0);
+			ASSERT(bio->bi_iter.bi_size);
+			ret = btrfs_map_bio(fs_info, bio, 0);
 			if (ret)
 				goto finish_cb;
 			bio = NULL;
@@ -930,7 +921,8 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 						  fs_info->sectorsize);
 			sums += fs_info->csum_size * nr_sectors;
 
-			ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
+			ASSERT(comp_bio->bi_iter.bi_size);
+			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
 			if (ret)
 				goto finish_cb;
 			comp_bio = NULL;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ebb2d109e8bb2..c22a24ca81652 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -823,7 +823,6 @@ struct btrfs_fs_info {
 	struct btrfs_workqueue *endio_meta_workers;
 	struct btrfs_workqueue *endio_raid56_workers;
 	struct btrfs_workqueue *rmw_workers;
-	struct btrfs_workqueue *endio_meta_write_workers;
 	struct btrfs_workqueue *endio_write_workers;
 	struct btrfs_workqueue *endio_freespace_worker;
 	struct btrfs_workqueue *caching_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f43c9ab86e617..bb910b78bbc82 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -51,7 +51,6 @@
 				 BTRFS_SUPER_FLAG_METADUMP |\
 				 BTRFS_SUPER_FLAG_METADUMP_V2)
 
-static void end_workqueue_fn(struct btrfs_work *work);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
 static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 				      struct btrfs_fs_info *fs_info);
@@ -64,40 +63,6 @@ static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
 static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info);
 static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info);
 
-/*
- * btrfs_end_io_wq structs are used to do processing in task context when an IO
- * is complete.  This is used during reads to verify checksums, and it is used
- * by writes to insert metadata for new file extents after IO is complete.
- */
-struct btrfs_end_io_wq {
-	struct bio *bio;
-	bio_end_io_t *end_io;
-	void *private;
-	struct btrfs_fs_info *info;
-	blk_status_t status;
-	enum btrfs_wq_endio_type metadata;
-	struct btrfs_work work;
-};
-
-static struct kmem_cache *btrfs_end_io_wq_cache;
-
-int __init btrfs_end_io_wq_init(void)
-{
-	btrfs_end_io_wq_cache = kmem_cache_create("btrfs_end_io_wq",
-					sizeof(struct btrfs_end_io_wq),
-					0,
-					SLAB_MEM_SPREAD,
-					NULL);
-	if (!btrfs_end_io_wq_cache)
-		return -ENOMEM;
-	return 0;
-}
-
-void __cold btrfs_end_io_wq_exit(void)
-{
-	kmem_cache_destroy(btrfs_end_io_wq_cache);
-}
-
 static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->csum_shash)
@@ -726,54 +691,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 	return ret;
 }
 
-static void end_workqueue_bio(struct bio *bio)
-{
-	struct btrfs_end_io_wq *end_io_wq = bio->bi_private;
-	struct btrfs_fs_info *fs_info;
-	struct btrfs_workqueue *wq;
-
-	fs_info = end_io_wq->info;
-	end_io_wq->status = bio->bi_status;
-
-	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
-			wq = fs_info->endio_meta_write_workers;
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
-			wq = fs_info->endio_freespace_worker;
-		else
-			wq = fs_info->endio_write_workers;
-	} else {
-		if (end_io_wq->metadata)
-			wq = fs_info->endio_meta_workers;
-		else
-			wq = fs_info->endio_workers;
-	}
-
-	btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
-	btrfs_queue_work(wq, &end_io_wq->work);
-}
-
-blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
-			enum btrfs_wq_endio_type metadata)
-{
-	struct btrfs_end_io_wq *end_io_wq;
-
-	end_io_wq = kmem_cache_alloc(btrfs_end_io_wq_cache, GFP_NOFS);
-	if (!end_io_wq)
-		return BLK_STS_RESOURCE;
-
-	end_io_wq->private = bio->bi_private;
-	end_io_wq->end_io = bio->bi_end_io;
-	end_io_wq->info = info;
-	end_io_wq->status = 0;
-	end_io_wq->bio = bio;
-	end_io_wq->metadata = metadata;
-
-	bio->bi_private = end_io_wq;
-	bio->bi_end_io = end_workqueue_bio;
-	return 0;
-}
-
 static void run_one_async_start(struct btrfs_work *work)
 {
 	struct async_submit_bio *async;
@@ -921,10 +838,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 			return ret;
 	} else {
 		/* checksum validation should happen in async threads: */
-		ret = btrfs_bio_wq_end_io(fs_info, bio,
-					  BTRFS_WQ_ENDIO_METADATA);
-		if (ret)
-			return ret;
+		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_METADATA_READ;
 	}
 
 	return btrfs_map_bio(fs_info, bio, mirror_num);
@@ -1888,25 +1802,6 @@ struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
 	return root;
 }
 
-/*
- * called by the kthread helper functions to finally call the bio end_io
- * functions.  This is where read checksum verification actually happens
- */
-static void end_workqueue_fn(struct btrfs_work *work)
-{
-	struct bio *bio;
-	struct btrfs_end_io_wq *end_io_wq;
-
-	end_io_wq = container_of(work, struct btrfs_end_io_wq, work);
-	bio = end_io_wq->bio;
-
-	bio->bi_status = end_io_wq->status;
-	bio->bi_private = end_io_wq->private;
-	bio->bi_end_io = end_io_wq->end_io;
-	bio_endio(bio);
-	kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
-}
-
 static int cleaner_kthread(void *arg)
 {
 	struct btrfs_root *root = arg;
@@ -2219,7 +2114,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	 * queues can do metadata I/O operations.
 	 */
 	btrfs_destroy_workqueue(fs_info->endio_meta_workers);
-	btrfs_destroy_workqueue(fs_info->endio_meta_write_workers);
 }
 
 static void free_root_extent_buffers(struct btrfs_root *root)
@@ -2404,9 +2298,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	fs_info->endio_meta_workers =
 		btrfs_alloc_workqueue(fs_info, "endio-meta", flags,
 				      max_active, 4);
-	fs_info->endio_meta_write_workers =
-		btrfs_alloc_workqueue(fs_info, "endio-meta-write", flags,
-				      max_active, 2);
 	fs_info->endio_raid56_workers =
 		btrfs_alloc_workqueue(fs_info, "endio-raid56", flags,
 				      max_active, 4);
@@ -2429,7 +2320,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	if (!(fs_info->workers && fs_info->delalloc_workers &&
 	      fs_info->flush_workers &&
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
-	      fs_info->endio_meta_write_workers &&
 	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
 	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
 	      fs_info->caching_workers && fs_info->fixup_workers &&
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index afe3bb96616c9..e8900c1b71664 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -17,12 +17,6 @@
  */
 #define BTRFS_BDEV_BLOCKSIZE	(4096)
 
-enum btrfs_wq_endio_type {
-	BTRFS_WQ_ENDIO_DATA,
-	BTRFS_WQ_ENDIO_METADATA,
-	BTRFS_WQ_ENDIO_FREE_SPACE,
-};
-
 static inline u64 btrfs_sb_offset(int mirror)
 {
 	u64 start = SZ_16K;
@@ -119,8 +113,6 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 			  int atomic);
 int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
 		      struct btrfs_key *first_key);
-blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
-			enum btrfs_wq_endio_type metadata);
 blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
 				 int mirror_num, unsigned long bio_flags,
 				 u64 dio_file_offset,
@@ -144,8 +136,6 @@ int btree_lock_page_hook(struct page *page, void *data,
 int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
 int btrfs_get_free_objectid(struct btrfs_root *root, u64 *objectid);
 int btrfs_init_root_free_objectid(struct btrfs_root *root);
-int __init btrfs_end_io_wq_init(void);
-void __cold btrfs_end_io_wq_exit(void);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void btrfs_set_buffer_lockdep_class(u64 objectid,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18d54cfedf829..5a5474fac0b28 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2512,6 +2512,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_inode *bi = BTRFS_I(inode);
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t ret;
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
@@ -2537,14 +2538,10 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		if (ret)
 			return ret;
 	} else {
-		enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
-
 		if (btrfs_is_free_space_inode(bi))
-			metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
-
-		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
-		if (ret)
-			return ret;
+			bbio->end_io_type = BTRFS_ENDIO_WQ_FREE_SPACE_READ;
+		else
+			bbio->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
 		if (bio_flags & EXTENT_BIO_COMPRESSED)
 			return btrfs_submit_compressed_read(inode, bio,
@@ -7739,9 +7736,7 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
+	btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
 	refcount_inc(&dip->refs);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
@@ -7865,9 +7860,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 				return ret;
 		}
 	} else {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-		if (ret)
-			return ret;
+		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
 		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
 			u64 csum_offset;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4d947ba32da9d..33dedca4f0862 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1835,8 +1835,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size);
-	btrfs_workqueue_set_max(fs_info->endio_meta_write_workers,
-				new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delayed_workers, new_pool_size);
@@ -2593,13 +2591,9 @@ static int __init init_btrfs_fs(void)
 	if (err)
 		goto free_delayed_ref;
 
-	err = btrfs_end_io_wq_init();
-	if (err)
-		goto free_prelim_ref;
-
 	err = btrfs_interface_init();
 	if (err)
-		goto free_end_io_wq;
+		goto free_prelim_ref;
 
 	btrfs_print_mod_info();
 
@@ -2615,8 +2609,6 @@ static int __init init_btrfs_fs(void)
 
 unregister_ioctl:
 	btrfs_interface_exit();
-free_end_io_wq:
-	btrfs_end_io_wq_exit();
 free_prelim_ref:
 	btrfs_prelim_ref_exit();
 free_delayed_ref:
@@ -2654,7 +2646,6 @@ static void __exit exit_btrfs_fs(void)
 	extent_state_cache_exit();
 	extent_io_exit();
 	btrfs_interface_exit();
-	btrfs_end_io_wq_exit();
 	unregister_filesystem(&btrfs_fs_type);
 	btrfs_exit_sysfs();
 	btrfs_cleanup_fs_uuids();
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9d1f8c27eff33..9a1eb1166d72f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6659,11 +6659,38 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
 }
 
-static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
+static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_io_context *bioc)
 {
+	struct btrfs_fs_info *fs_info = bioc->fs_info;
+
+	switch (btrfs_bio(bioc->orig_bio)->end_io_type) {
+	case BTRFS_ENDIO_WQ_DATA_READ:
+		return fs_info->endio_workers;
+	case BTRFS_ENDIO_WQ_DATA_WRITE:
+		return fs_info->endio_write_workers;
+	case BTRFS_ENDIO_WQ_METADATA_READ:
+		return fs_info->endio_meta_workers;
+	case BTRFS_ENDIO_WQ_FREE_SPACE_READ:
+		return fs_info->endio_freespace_worker;
+	default:
+		return NULL;
+	}
+}
+
+static void btrfs_end_bio_work(struct btrfs_work *work)
+{
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
+
+	bio_endio(&bbio->bio);
+}
+
+static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
+{
+	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bioc) : NULL;
 	struct bio *bio = bioc->orig_bio;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 
-	btrfs_bio(bio)->mirror_num = bioc->mirror_num;
+	bbio->mirror_num = bioc->mirror_num;
 	bio->bi_private = bioc->private;
 	bio->bi_end_io = bioc->end_io;
 
@@ -6675,7 +6702,14 @@ static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
 		bio->bi_status = BLK_STS_IOERR;
 	else
 		bio->bi_status = BLK_STS_OK;
-	bio_endio(bio);
+
+	if (wq) {
+		btrfs_init_work(&bbio->work, btrfs_end_bio_work, NULL, NULL);
+		btrfs_queue_work(wq, &bbio->work);
+	} else {
+		bio_endio(bio);
+	}
+
 	btrfs_put_bioc(bioc);
 }
 
@@ -6707,7 +6741,7 @@ static void btrfs_end_bio(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	if (atomic_dec_and_test(&bioc->stripes_pending))
-		btrfs_end_bioc(bioc);
+		btrfs_end_bioc(bioc, true);
 }
 
 static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
@@ -6805,7 +6839,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
 			atomic_inc(&bioc->error);
 			if (atomic_dec_and_test(&bioc->stripes_pending))
-				btrfs_end_bioc(bioc);
+				btrfs_end_bioc(bioc, false);
 			continue;
 		}
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index a4f942547002e..51a27180004eb 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -315,6 +315,14 @@ struct btrfs_fs_devices {
 				- 2 * sizeof(struct btrfs_chunk))	\
 				/ sizeof(struct btrfs_stripe) + 1)
 
+enum btrfs_endio_type {
+	BTRFS_ENDIO_NONE = 0,
+	BTRFS_ENDIO_WQ_DATA_READ,
+	BTRFS_ENDIO_WQ_DATA_WRITE,
+	BTRFS_ENDIO_WQ_METADATA_READ,
+	BTRFS_ENDIO_WQ_FREE_SPACE_READ,
+};
+
 /*
  * Additional info to pass along bio.
  *
@@ -324,6 +332,9 @@ struct btrfs_bio {
 	struct inode *inode;
 
 	unsigned int mirror_num;
+
+	enum btrfs_endio_type end_io_type;
+	struct btrfs_work work;
 
 	/* for direct I/O */
 	u64 file_offset;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 25/40] btrfs: remove btrfs_wq_submit_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (23 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 24/40] btrfs: remove btrfs_end_io_wq Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 26/40] btrfs: refactor btrfs_map_bio Christoph Hellwig
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Reuse the btrfs_work in struct btrfs_bio for asynchronous submission
and remove the extra allocation for async write bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 122 +++++++++++++--------------------------------
 fs/btrfs/disk-io.h |   8 +--
 fs/btrfs/inode.c   |  42 +++++++++-------
 3 files changed, 62 insertions(+), 110 deletions(-)
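
Caller side after the change, condensed from the metadata path below (the
buffered data and direct I/O write paths follow the same pattern): the
async submission state lives in the btrfs_bio itself, so deferring the
checksumming no longer needs an allocation that can fail.

	if (should_async_write(fs_info, BTRFS_I(inode))) {
		bbio->mirror_num = mirror_num;
		/* 'start' checksums in a worker, which then maps the bio */
		btrfs_submit_bio_async(bbio, btree_submit_bio_start);
		return BLK_STS_OK;
	}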

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bb910b78bbc82..59c1dc0b37399 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -69,23 +69,6 @@ static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
 		crypto_free_shash(fs_info->csum_shash);
 }
 
-/*
- * async submit bios are used to offload expensive checksumming
- * onto the worker threads.  They checksum file and metadata bios
- * just before they are sent down the IO stack.
- */
-struct async_submit_bio {
-	struct inode *inode;
-	struct bio *bio;
-	extent_submit_bio_start_t *submit_bio_start;
-	int mirror_num;
-
-	/* Optional parameter for submit_bio_start used by direct io */
-	u64 dio_file_offset;
-	struct btrfs_work work;
-	blk_status_t status;
-};
-
 /*
  * Lockdep class keys for extent_buffer->lock's in this root.  For a given
  * eb, the lockdep key is determined by the btrfs_root it belongs to and
@@ -691,18 +674,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 	return ret;
 }
 
-static void run_one_async_start(struct btrfs_work *work)
-{
-	struct async_submit_bio *async;
-	blk_status_t ret;
-
-	async = container_of(work, struct  async_submit_bio, work);
-	ret = async->submit_bio_start(async->inode, async->bio,
-				      async->dio_file_offset);
-	if (ret)
-		async->status = ret;
-}
-
 /*
  * In order to insert checksums into the metadata in large chunks, we wait
  * until bio submission time.   All the pages in the bio are checksummed and
@@ -711,72 +682,51 @@ static void run_one_async_start(struct btrfs_work *work)
  * At IO completion time the csums attached on the ordered extent record are
  * inserted into the tree.
  */
-static void run_one_async_done(struct btrfs_work *work)
+static void btrfs_submit_bio_work(struct btrfs_work *work)
 {
-	struct async_submit_bio *async;
-	struct inode *inode;
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
+	struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb);
+	struct bio *bio = &bbio->bio;
 	blk_status_t ret;
 
-	async = container_of(work, struct  async_submit_bio, work);
-	inode = async->inode;
+	/* Ensure the bio doesn't go away while linked into the workqueue */
+	bio_get(bio);
 
 	/* If an error occurred we just want to clean up the bio and move on */
-	if (async->status) {
-		async->bio->bi_status = async->status;
-		bio_endio(async->bio);
+	if (bio->bi_status) {
+		bio_endio(bio);
 		return;
 	}
 
 	/*
-	 * All of the bios that pass through here are from async helpers.
-	 * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's context.
-	 * This changes nothing when cgroups aren't in use.
+	 * Use REQ_CGROUP_PUNT to issue the bio from the owning cgroup's
+	 * context. This changes nothing when cgroups aren't in use.
 	 */
-	async->bio->bi_opf |= REQ_CGROUP_PUNT;
-	ret = btrfs_map_bio(btrfs_sb(inode->i_sb), async->bio, async->mirror_num);
+	bio->bi_opf |= REQ_CGROUP_PUNT;
+	ret = btrfs_map_bio(fs_info, bio, bbio->mirror_num);
 	if (ret) {
-		async->bio->bi_status = ret;
-		bio_endio(async->bio);
+		bio->bi_status = ret;
+		bio_endio(bio);
 	}
 }
 
-static void run_one_async_free(struct btrfs_work *work)
+static void btrfs_submit_bio_done(struct btrfs_work *work)
 {
-	struct async_submit_bio *async;
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
 
-	async = container_of(work, struct  async_submit_bio, work);
-	kfree(async);
+	bio_put(&bbio->bio);
 }
 
-blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
-				 int mirror_num, unsigned long bio_flags,
-				 u64 dio_file_offset,
-				 extent_submit_bio_start_t *submit_bio_start)
+void btrfs_submit_bio_async(struct btrfs_bio *bbio,
+		void (*start)(struct btrfs_work *work))
 {
-	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
-	struct async_submit_bio *async;
+	ASSERT(bbio->end_io_type == BTRFS_ENDIO_NONE);
 
-	async = kmalloc(sizeof(*async), GFP_NOFS);
-	if (!async)
-		return BLK_STS_RESOURCE;
-
-	async->inode = inode;
-	async->bio = bio;
-	async->mirror_num = mirror_num;
-	async->submit_bio_start = submit_bio_start;
-
-	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
-			run_one_async_free);
-
-	async->dio_file_offset = dio_file_offset;
-
-	async->status = 0;
-
-	if (op_is_sync(bio->bi_opf))
-		btrfs_set_work_high_priority(&async->work);
-
-	btrfs_queue_work(fs_info->workers, &async->work);
-	return 0;
+	btrfs_init_work(&bbio->work, start, btrfs_submit_bio_work,
+			btrfs_submit_bio_done);
+	if (op_is_sync(bbio->bio.bi_opf))
+		btrfs_set_work_high_priority(&bbio->work);
+	btrfs_queue_work(btrfs_sb(bbio->inode->i_sb)->workers, &bbio->work);
 }
 
 static blk_status_t btree_csum_one_bio(struct bio *bio)
@@ -797,14 +747,11 @@ static blk_status_t btree_csum_one_bio(struct bio *bio)
 	return errno_to_blk_status(ret);
 }
 
-static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
-					   u64 dio_file_offset)
+static void btree_submit_bio_start(struct btrfs_work *work)
 {
-	/*
-	 * when we're called for a write, we're already in the async
-	 * submission context.  Just jump into btrfs_map_bio
-	 */
-	return btree_csum_one_bio(bio);
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
+
+	bbio->bio.bi_status = btree_csum_one_bio(&bbio->bio);
 }
 
 /*
@@ -827,18 +774,21 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 				       int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t ret;
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-		if (should_async_write(fs_info, BTRFS_I(inode)))
-			return btrfs_wq_submit_bio(inode, bio, mirror_num, 0, 0,
-						   btree_submit_bio_start);
+		if (should_async_write(fs_info, BTRFS_I(inode))) {
+			bbio->mirror_num = mirror_num;
+			btrfs_submit_bio_async(bbio, btree_submit_bio_start);
+			return BLK_STS_OK;
+		}
 		ret = btree_csum_one_bio(bio);
 		if (ret)
 			return ret;
 	} else {
 		/* checksum validation should happen in async threads: */
-		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_METADATA_READ;
+		bbio->end_io_type = BTRFS_ENDIO_WQ_METADATA_READ;
 	}
 
 	return btrfs_map_bio(fs_info, bio, mirror_num);
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index e8900c1b71664..25fe657ebbac1 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -113,12 +113,8 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
 			  int atomic);
 int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
 		      struct btrfs_key *first_key);
-blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
-				 int mirror_num, unsigned long bio_flags,
-				 u64 dio_file_offset,
-				 extent_submit_bio_start_t *submit_bio_start);
-blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio,
-			  int mirror_num);
+void btrfs_submit_bio_async(struct btrfs_bio *bbio,
+		void (*start)(struct btrfs_work *work));
 int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans,
 			      struct btrfs_root *root);
 int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5a5474fac0b28..70d82effe5e37 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2300,17 +2300,19 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode,
 }
 
 /*
- * in order to insert checksums into the metadata in large chunks,
- * we wait until bio submission time.   All the pages in the bio are
- * checksummed and sums are attached onto the ordered extent record.
+ * In order to insert checksums into the metadata in large chunks, we wait until
+ * bio submission time.   All the pages in the bio are checksummed and sums are
+ * attached onto the ordered extent record.
  *
- * At IO completion time the cums attached on the ordered extent record
- * are inserted into the btree
+ * At I/O completion time the csums attached on the ordered extent record are
+ * inserted into the btree.
  */
-static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio,
-					   u64 dio_file_offset)
+static void btrfs_submit_bio_start(struct btrfs_work *work)
 {
-	return btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
+
+	bbio->bio.bi_status =
+		btrfs_csum_one_bio(BTRFS_I(bbio->inode), &bbio->bio, 0, 0);
 }
 
 /*
@@ -2531,8 +2533,9 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			/* csum items have already been cloned */
 			if (btrfs_is_data_reloc_root(bi->root))
 				goto mapit;
-			return btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
-						  0, btrfs_submit_bio_start);
+			bbio->mirror_num = mirror_num;
+			btrfs_submit_bio_async(bbio, btrfs_submit_bio_start);
+			return BLK_STS_OK;
 		}
 		ret = btrfs_csum_one_bio(bi, bio, 0, 0);
 		if (ret)
@@ -7803,11 +7806,12 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode,
 				       finish_ordered_fn, uptodate);
 }
 
-static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
-						     struct bio *bio,
-						     u64 dio_file_offset)
+static void btrfs_submit_bio_start_direct_io(struct btrfs_work *work)
 {
-	return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, 1);
+	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
+
+	bbio->bio.bi_status = btrfs_csum_one_bio(BTRFS_I(bbio->inode),
+			&bbio->bio, bbio->file_offset, 1);
 }
 
 static void btrfs_end_dio_bio(struct bio *bio)
@@ -7841,15 +7845,17 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_inode *bi = BTRFS_I(inode);
 	struct btrfs_dio_private *dip = bio->bi_private;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t ret;
 
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
 			/* See btrfs_submit_data_bio for async submit rules */
-			if (async_submit && !atomic_read(&bi->sync_writers))
-				return btrfs_wq_submit_bio(inode, bio, 0, 0,
-					file_offset,
+			if (async_submit && !atomic_read(&bi->sync_writers)) {
+				btrfs_submit_bio_async(bbio,
 					btrfs_submit_bio_start_direct_io);
+				return BLK_STS_OK;
+			}
 
 			/*
 			 * If we aren't doing async submit, calculate the csum of the
@@ -7860,7 +7866,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 				return ret;
 		}
 	} else {
-		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
+		bbio->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
 		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
 			u64 csum_offset;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 26/40] btrfs: refactor btrfs_map_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (24 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 25/40] btrfs: remove btrfs_wq_submit_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  1:03   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block Christoph Hellwig
                   ` (14 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Use a label for common cleanup, untangle the conditionals for parity
RAID and move all per-stripe handling into submit_stripe_bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 88 ++++++++++++++++++++++------------------------
 1 file changed, 42 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9a1eb1166d72f..1cf0914b33847 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6744,10 +6744,30 @@ static void btrfs_end_bio(struct bio *bio)
 		btrfs_end_bioc(bioc, true);
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
-			      u64 physical, struct btrfs_device *dev)
+static void submit_stripe_bio(struct btrfs_io_context *bioc,
+		struct bio *orig_bio, int dev_nr, bool clone)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
+	u64 physical = bioc->stripes[dev_nr].physical;
+	struct bio *bio;
+
+	if (!dev || !dev->bdev ||
+	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
+	    (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
+	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
+		atomic_inc(&bioc->error);
+		if (atomic_dec_and_test(&bioc->stripes_pending))
+			btrfs_end_bioc(bioc, false);
+		return;
+	}
+
+	if (clone) {
+		bio = btrfs_bio_clone(dev->bdev, orig_bio);
+	} else {
+		bio = orig_bio;
+		bio_set_dev(bio, dev->bdev);
+	}
 
 	bio->bi_private = bioc;
 	btrfs_bio(bio)->device = dev;
@@ -6782,46 +6802,40 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num)
 {
-	struct btrfs_device *dev;
-	struct bio *first_bio = bio;
 	u64 logical = bio->bi_iter.bi_sector << 9;
-	u64 length = 0;
-	u64 map_length;
+	u64 length = bio->bi_iter.bi_size;
+	u64 map_length = length;
 	int ret;
 	int dev_nr;
 	int total_devs;
 	struct btrfs_io_context *bioc = NULL;
 
-	length = bio->bi_iter.bi_size;
-	map_length = length;
-
 	btrfs_bio_counter_inc_blocked(fs_info);
 	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
 				&map_length, &bioc, mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
-	}
+	if (ret)
+		goto out_dec;
 
 	total_devs = bioc->num_stripes;
-	bioc->orig_bio = first_bio;
-	bioc->private = first_bio->bi_private;
-	bioc->end_io = first_bio->bi_end_io;
+	bioc->orig_bio = bio;
+	bioc->private = bio->bi_private;
+	bioc->end_io = bio->bi_end_io;
 	atomic_set(&bioc->stripes_pending, bioc->num_stripes);
 
-	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
-		/* In this case, map_length has been set to the length of
-		   a single stripe; not the whole write */
+	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		/*
+		 * In this case, map_length has been set to the length of a
+		 * single stripe; not the whole write.
+		 */
 		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
 			ret = raid56_parity_write(bio, bioc, map_length);
-		} else {
+			goto out_dec;
+		}
+		if (mirror_num > 1) {
 			ret = raid56_parity_recover(bio, bioc, map_length,
 						    mirror_num, 1);
+			goto out_dec;
 		}
-
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
 	}
 
 	if (map_length < length) {
@@ -6831,29 +6845,11 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		BUG();
 	}
 
-	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
-		dev = bioc->stripes[dev_nr].dev;
-		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
-						   &dev->dev_state) ||
-		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
-		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
-			atomic_inc(&bioc->error);
-			if (atomic_dec_and_test(&bioc->stripes_pending))
-				btrfs_end_bioc(bioc, false);
-			continue;
-		}
-
-		if (dev_nr < total_devs - 1) {
-			bio = btrfs_bio_clone(dev->bdev, first_bio);
-		} else {
-			bio = first_bio;
-			bio_set_dev(bio, dev->bdev);
-		}
-
-		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
-	}
+	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
+		submit_stripe_bio(bioc, bio, dev_nr, dev_nr < total_devs - 1);
+out_dec:
 	btrfs_bio_counter_dec(fs_info);
-	return BLK_STS_OK;
+	return errno_to_blk_status(ret);
 }
 
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (25 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 26/40] btrfs: refactor btrfs_map_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  1:08   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio Christoph Hellwig
                   ` (13 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Clear need_raid_map early instead of repeating the same conditional over
and over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 60 ++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1cf0914b33847..cc9e2565e4b64 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6435,6 +6435,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	map = em->map_lookup;
 
+	if (!(map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) ||
+	    (!need_full_stripe(op) && mirror_num <= 1))
+		need_raid_map = 0;
+
 	*length = geom.len;
 	stripe_len = geom.stripe_len;
 	stripe_nr = geom.stripe_nr;
@@ -6509,37 +6513,32 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 					      dev_replace_is_ongoing);
 			mirror_num = stripe_index - old_stripe_index + 1;
 		}
+	} else if (need_raid_map) {
+		/* push stripe_nr back to the start of the full stripe */
+		stripe_nr = div64_u64(raid56_full_stripe_start,
+				      stripe_len * data_stripes);
 
-	} else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		if (need_raid_map && (need_full_stripe(op) || mirror_num > 1)) {
-			/* push stripe_nr back to the start of the full stripe */
-			stripe_nr = div64_u64(raid56_full_stripe_start,
-					stripe_len * data_stripes);
-
-			/* RAID[56] write or recovery. Return all stripes */
-			num_stripes = map->num_stripes;
-			max_errors = nr_parity_stripes(map);
-
-			*length = map->stripe_len;
-			stripe_index = 0;
-			stripe_offset = 0;
-		} else {
-			/*
-			 * Mirror #0 or #1 means the original data block.
-			 * Mirror #2 is RAID5 parity block.
-			 * Mirror #3 is RAID6 Q block.
-			 */
-			stripe_nr = div_u64_rem(stripe_nr,
-					data_stripes, &stripe_index);
-			if (mirror_num > 1)
-				stripe_index = data_stripes + mirror_num - 2;
+		/* RAID[56] write or recovery. Return all stripes */
+		num_stripes = map->num_stripes;
+		max_errors = nr_parity_stripes(map);
 
-			/* We distribute the parity blocks across stripes */
-			div_u64_rem(stripe_nr + stripe_index, map->num_stripes,
-					&stripe_index);
-			if (!need_full_stripe(op) && mirror_num <= 1)
-				mirror_num = 1;
-		}
+		*length = map->stripe_len;
+		stripe_index = 0;
+		stripe_offset = 0;
+	} else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		/*
+		 * Mirror #0 or #1 means the original data block.
+		 * Mirror #2 is RAID5 parity block.
+		 * Mirror #3 is RAID6 Q block.
+		 */
+		stripe_nr = div_u64_rem(stripe_nr, data_stripes, &stripe_index);
+		if (mirror_num > 1)
+			stripe_index = data_stripes + mirror_num - 2;
+		/* We distribute the parity blocks across stripes */
+		div_u64_rem(stripe_nr + stripe_index, map->num_stripes,
+			    &stripe_index);
+		if (!need_full_stripe(op) && mirror_num <= 1)
+			mirror_num = 1;
 	} else {
 		/*
 		 * after this, stripe_nr is the number of stripes on this
@@ -6581,8 +6580,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 
 	/* Build raid_map */
-	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
-	    (need_full_stripe(op) || mirror_num > 1)) {
+	if (need_raid_map) {
 		u64 tmp;
 		unsigned rot;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (26 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-23  1:14   ` Qu Wenruo
  2022-03-22 15:55 ` [PATCH 29/40] btrfs: do not allocate a btrfs_bio for low-level bios Christoph Hellwig
                   ` (12 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

There is very little of the I/O context that is actually needed for
issuing a bio.  Add the few needed fields to struct btrfs_bio instead.

The stripes array is still allocated on demand when more than a single
I/O is needed, but for single leg I/O (e.g. all reads) there is no
additional memory allocation now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 147 ++++++++++++++++++++++++++++-----------------
 fs/btrfs/volumes.h |  20 ++++--
 2 files changed, 107 insertions(+), 60 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cc9e2565e4b64..cec3f6b9f5c21 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -253,10 +253,9 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
 static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
-			     enum btrfs_map_op op,
-			     u64 logical, u64 *length,
-			     struct btrfs_io_context **bioc_ret,
-			     int mirror_num, int need_raid_map);
+		enum btrfs_map_op op, u64 logical, u64 *length,
+		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
+		int mirror_num, int need_raid_map);
 
 /*
  * Device locking
@@ -5926,7 +5925,6 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 		sizeof(u64) * (total_stripes),
 		GFP_NOFS|__GFP_NOFAIL);
 
-	atomic_set(&bioc->error, 0);
 	refcount_set(&bioc->refs, 1);
 
 	bioc->fs_info = fs_info;
@@ -6128,7 +6126,7 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	ret = __btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
-				logical, &length, &bioc, 0, 0);
+				logical, &length, &bioc, NULL, 0, 0);
 	if (ret) {
 		ASSERT(bioc == NULL);
 		return ret;
@@ -6397,10 +6395,9 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 }
 
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
-			     enum btrfs_map_op op,
-			     u64 logical, u64 *length,
-			     struct btrfs_io_context **bioc_ret,
-			     int mirror_num, int need_raid_map)
+		enum btrfs_map_op op, u64 logical, u64 *length,
+		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
+		int mirror_num, int need_raid_map)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -6566,6 +6563,48 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		tgtdev_indexes = num_stripes;
 	}
 
+	if (need_full_stripe(op))
+		max_errors = btrfs_chunk_max_errors(map);
+
+	if (bbio && !need_raid_map) {
+		int replacement_idx = num_stripes;
+
+		if (num_alloc_stripes > 1) {
+			bbio->stripes = kmalloc_array(num_alloc_stripes,
+					sizeof(*bbio->stripes),
+					GFP_NOFS | __GFP_NOFAIL);
+		} else {
+			bbio->stripes = &bbio->__stripe;
+		}
+
+		atomic_set(&bbio->stripes_pending, num_stripes);
+		for (i = 0; i < num_stripes; i++) {
+			struct btrfs_bio_stripe *s = &bbio->stripes[i];
+
+			s->physical = map->stripes[stripe_index].physical +
+				stripe_offset + stripe_nr * map->stripe_len;
+			s->dev = map->stripes[stripe_index].dev;
+			stripe_index++;
+
+			if (op == BTRFS_MAP_WRITE && dev_replace_is_ongoing &&
+			    dev_replace->tgtdev &&
+			    !is_block_group_to_copy(fs_info, logical) &&
+			    s->dev->devid == dev_replace->srcdev->devid) {
+				struct btrfs_bio_stripe *r =
+					&bbio->stripes[replacement_idx++];
+
+				r->physical = s->physical;
+				r->dev = dev_replace->tgtdev;
+				max_errors++;
+				atomic_inc(&bbio->stripes_pending);
+			}
+		}
+
+		bbio->max_errors = max_errors;
+		bbio->mirror_num = mirror_num;
+		goto out;
+	}
+
 	bioc = alloc_btrfs_io_context(fs_info, num_alloc_stripes, tgtdev_indexes);
 	if (!bioc) {
 		ret = -ENOMEM;
@@ -6601,9 +6640,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		sort_parity_stripes(bioc, num_stripes);
 	}
 
-	if (need_full_stripe(op))
-		max_errors = btrfs_chunk_max_errors(map);
-
 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    need_full_stripe(op)) {
 		handle_ops_on_dev_replace(op, &bioc, dev_replace, logical,
@@ -6646,7 +6682,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 						     length, bioc_ret);
 
 	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
-				 mirror_num, 0);
+				 NULL, mirror_num, 0);
 }
 
 /* For Scrub/replace */
@@ -6654,14 +6690,15 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		     u64 logical, u64 *length,
 		     struct btrfs_io_context **bioc_ret)
 {
-	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
+	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, NULL,
+				 0, 1);
 }
 
-static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_io_context *bioc)
+static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_bio *bbio)
 {
-	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb);
 
-	switch (btrfs_bio(bioc->orig_bio)->end_io_type) {
+	switch (bbio->end_io_type) {
 	case BTRFS_ENDIO_WQ_DATA_READ:
 		return fs_info->endio_workers;
 	case BTRFS_ENDIO_WQ_DATA_WRITE:
@@ -6682,21 +6719,22 @@ static void btrfs_end_bio_work(struct btrfs_work *work)
 	bio_endio(&bbio->bio);
 }
 
-static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
+static void btrfs_end_bbio(struct btrfs_bio *bbio, bool async)
 {
-	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bioc) : NULL;
-	struct bio *bio = bioc->orig_bio;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bbio) : NULL;
+	struct bio *bio = &bbio->bio;
 
-	bbio->mirror_num = bioc->mirror_num;
-	bio->bi_private = bioc->private;
-	bio->bi_end_io = bioc->end_io;
+	bio->bi_private = bbio->private;
+	bio->bi_end_io = bbio->end_io;
+
+	if (bbio->stripes != &bbio->__stripe)
+		kfree(bbio->stripes);
 
 	/*
 	 * Only send an error to the higher layers if it is beyond the tolerance
 	 * threshold.
 	 */
-	if (atomic_read(&bioc->error) > bioc->max_errors)
+	if (atomic_read(&bbio->error) > bbio->max_errors)
 		bio->bi_status = BLK_STS_IOERR;
 	else
 		bio->bi_status = BLK_STS_OK;
@@ -6707,16 +6745,14 @@ static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
 	} else {
 		bio_endio(bio);
 	}
-
-	btrfs_put_bioc(bioc);
 }
 
 static void btrfs_end_bio(struct bio *bio)
 {
-	struct btrfs_io_context *bioc = bio->bi_private;
+	struct btrfs_bio *bbio = bio->bi_private;
 
 	if (bio->bi_status) {
-		atomic_inc(&bioc->error);
+		atomic_inc(&bbio->error);
 		if (bio->bi_status == BLK_STS_IOERR ||
 		    bio->bi_status == BLK_STS_TARGET) {
 			struct btrfs_device *dev = btrfs_bio(bio)->device;
@@ -6734,40 +6770,39 @@ static void btrfs_end_bio(struct bio *bio)
 		}
 	}
 
-	if (bio != bioc->orig_bio)
+	if (bio != &bbio->bio)
 		bio_put(bio);
 
-	btrfs_bio_counter_dec(bioc->fs_info);
-	if (atomic_dec_and_test(&bioc->stripes_pending))
-		btrfs_end_bioc(bioc, true);
+	btrfs_bio_counter_dec(btrfs_sb(bbio->inode->i_sb));
+	if (atomic_dec_and_test(&bbio->stripes_pending))
+		btrfs_end_bbio(bbio, true);
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc,
-		struct bio *orig_bio, int dev_nr, bool clone)
+static void submit_stripe_bio(struct btrfs_bio *bbio, int dev_nr, bool clone)
 {
-	struct btrfs_fs_info *fs_info = bioc->fs_info;
-	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
-	u64 physical = bioc->stripes[dev_nr].physical;
+	struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb);
+	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
+	u64 physical = bbio->stripes[dev_nr].physical;
 	struct bio *bio;
 
 	if (!dev || !dev->bdev ||
 	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
-	    (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
+	    (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE &&
 	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
-		atomic_inc(&bioc->error);
-		if (atomic_dec_and_test(&bioc->stripes_pending))
-			btrfs_end_bioc(bioc, false);
+		atomic_inc(&bbio->error);
+		if (atomic_dec_and_test(&bbio->stripes_pending))
+			btrfs_end_bbio(bbio, false);
 		return;
 	}
 
 	if (clone) {
-		bio = btrfs_bio_clone(dev->bdev, orig_bio);
+		bio = btrfs_bio_clone(dev->bdev, &bbio->bio);
 	} else {
-		bio = orig_bio;
+		bio = &bbio->bio;
 		bio_set_dev(bio, dev->bdev);
 	}
 
-	bio->bi_private = bioc;
+	bio->bi_private = bbio;
 	btrfs_bio(bio)->device = dev;
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
@@ -6800,6 +6835,7 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num)
 {
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
@@ -6809,18 +6845,17 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	struct btrfs_io_context *bioc = NULL;
 
 	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
+	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
+				&bioc, bbio, mirror_num, 1);
 	if (ret)
 		goto out_dec;
 
-	total_devs = bioc->num_stripes;
-	bioc->orig_bio = bio;
-	bioc->private = bio->bi_private;
-	bioc->end_io = bio->bi_end_io;
-	atomic_set(&bioc->stripes_pending, bioc->num_stripes);
+	bbio->private = bio->bi_private;
+	bbio->end_io = bio->bi_end_io;
+
+	if (bioc) {
+		ASSERT(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK);
 
-	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
 		/*
 		 * In this case, map_length has been set to the length of a
 		 * single stripe; not the whole write.
@@ -6834,6 +6869,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 						    mirror_num, 1);
 			goto out_dec;
 		}
+		ASSERT(0);
 	}
 
 	if (map_length < length) {
@@ -6843,8 +6879,9 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 		BUG();
 	}
 
+	total_devs = atomic_read(&bbio->stripes_pending);
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
-		submit_stripe_bio(bioc, bio, dev_nr, dev_nr < total_devs - 1);
+		submit_stripe_bio(bbio, dev_nr, dev_nr < total_devs - 1);
 out_dec:
 	btrfs_bio_counter_dec(fs_info);
 	return errno_to_blk_status(ret);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 51a27180004eb..cd71cd33a9df2 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -323,6 +323,11 @@ enum btrfs_endio_type {
 	BTRFS_ENDIO_WQ_FREE_SPACE_READ,
 };
 
+struct btrfs_bio_stripe {
+	struct btrfs_device *dev;
+	u64 physical;
+};
+
 /*
  * Additional info to pass along bio.
  *
@@ -333,6 +338,16 @@ struct btrfs_bio {
 
 	unsigned int mirror_num;
 	
+	atomic_t stripes_pending;
+	atomic_t error;
+	int max_errors;
+
+	struct btrfs_bio_stripe *stripes;
+	struct btrfs_bio_stripe __stripe;
+
+	bio_end_io_t *end_io;
+	void *private;
+
 	enum btrfs_endio_type end_io_type;
 	struct btrfs_work work;
 
@@ -389,13 +404,8 @@ struct btrfs_io_stripe {
  */
 struct btrfs_io_context {
 	refcount_t refs;
-	atomic_t stripes_pending;
 	struct btrfs_fs_info *fs_info;
 	u64 map_type; /* get from map_lookup->type */
-	bio_end_io_t *end_io;
-	struct bio *orig_bio;
-	void *private;
-	atomic_t error;
 	int max_errors;
 	int num_stripes;
 	int mirror_num;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 29/40] btrfs: do not allocate a btrfs_bio for low-level bios
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (27 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 30/40] iomap: add per-iomap_iter private data Christoph Hellwig
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

The bios submitted from btrfs_map_bio don't interact with the rest
of btrfs.  The only btrfs_bio field they use is the device.  Add a
bbio backpointer to struct btrfs_bio_stripe so that the private data
can point to the stripe, and just use a normal bio allocation for
these low-level bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 13 -------------
 fs/btrfs/extent_io.h |  1 -
 fs/btrfs/volumes.c   | 21 +++++++++++----------
 fs/btrfs/volumes.h   |  5 ++++-
 4 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 116a65787e314..bfd91ed27bd14 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3149,19 +3149,6 @@ struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
 	return bio;
 }
 
-struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio)
-{
-	struct btrfs_bio *bbio;
-	struct bio *new;
-
-	/* Bio allocation backed by a bioset does not fail */
-	new = bio_alloc_clone(bdev, bio, GFP_NOFS, &btrfs_bioset);
-	bbio = btrfs_bio(new);
-	btrfs_bio_init(btrfs_bio(new), btrfs_bio(bio)->inode);
-	bbio->iter = bio->bi_iter;
-	return new;
-}
-
 struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
 		u64 offset, u64 size)
 {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index d5f3d9692ea29..3f0cb1ef5fdff 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -279,7 +279,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  u32 bits_to_clear, unsigned long page_ops);
 struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
 		unsigned int opf);
-struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio);
 struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
 		u64 offset, u64 size);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cec3f6b9f5c21..7392b9f2a3323 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6749,23 +6749,21 @@ static void btrfs_end_bbio(struct btrfs_bio *bbio, bool async)
 
 static void btrfs_end_bio(struct bio *bio)
 {
-	struct btrfs_bio *bbio = bio->bi_private;
+	struct btrfs_bio_stripe *stripe = bio->bi_private;
+	struct btrfs_bio *bbio = stripe->bbio;
 
 	if (bio->bi_status) {
 		atomic_inc(&bbio->error);
 		if (bio->bi_status == BLK_STS_IOERR ||
 		    bio->bi_status == BLK_STS_TARGET) {
-			struct btrfs_device *dev = btrfs_bio(bio)->device;
-
-			ASSERT(dev->bdev);
 			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
-				btrfs_dev_stat_inc_and_print(dev,
+				btrfs_dev_stat_inc_and_print(stripe->dev,
 						BTRFS_DEV_STAT_WRITE_ERRS);
 			else if (!(bio->bi_opf & REQ_RAHEAD))
-				btrfs_dev_stat_inc_and_print(dev,
+				btrfs_dev_stat_inc_and_print(stripe->dev,
 						BTRFS_DEV_STAT_READ_ERRS);
 			if (bio->bi_opf & REQ_PREFLUSH)
-				btrfs_dev_stat_inc_and_print(dev,
+				btrfs_dev_stat_inc_and_print(stripe->dev,
 						BTRFS_DEV_STAT_FLUSH_ERRS);
 		}
 	}
@@ -6796,14 +6794,17 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, int dev_nr, bool clone)
 	}
 
 	if (clone) {
-		bio = btrfs_bio_clone(dev->bdev, &bbio->bio);
+		bio = bio_alloc_clone(dev->bdev, &bbio->bio, GFP_NOFS,
+				      &fs_bio_set);
 	} else {
 		bio = &bbio->bio;
 		bio_set_dev(bio, dev->bdev);
+		btrfs_bio(bio)->device = dev;
 	}
 
-	bio->bi_private = bbio;
-	btrfs_bio(bio)->device = dev;
+	bbio->stripes[dev_nr].bbio = bbio;
+	bio->bi_private = &bbio->stripes[dev_nr];
+
 	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
 	/*
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index cd71cd33a9df2..5b0e7602434b0 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -325,7 +325,10 @@ enum btrfs_endio_type {
 
 struct btrfs_bio_stripe {
 	struct btrfs_device *dev;
-	u64 physical;
+	union {
+		u64 physical;			/* block mapping */
+		struct btrfs_bio *bbio;		/* for end I/O */
+	};
 };
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 30/40] iomap: add per-iomap_iter private data
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (28 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 29/40] btrfs: do not allocate a btrfs_bio for low-level bios Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 31/40] iomap: add a new ->iomap_iter operation Christoph Hellwig
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Allow the file system to keep state across all iterations.  For now,
only wire it up for direct I/O, as that is where there is an immediate
need for it.
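
As a rough illustration of the intended use (not part of this patch;
the my_* names below are made up): the caller stashes a pointer in
iocb->private before calling iomap_dio_rw, and __iomap_dio_rw copies it
into iomap_iter.private, where every ->iomap_begin / ->iomap_end call
for this I/O can find it:

struct my_dio_state {
        unsigned long   flags;  /* whatever per-I/O state is needed */
};

static int my_dio_iomap_begin(struct inode *inode, loff_t pos,
                loff_t length, unsigned int flags, struct iomap *iomap,
                struct iomap *srcmap)
{
        struct iomap_iter *iter =
                container_of(iomap, struct iomap_iter, iomap);
        struct my_dio_state *state = iter->private;

        /* state is the same object for all iterations of this I/O */
        return my_fill_iomap(inode, pos, length, state, iomap);
}

static ssize_t my_dio_rw(struct kiocb *iocb, struct iov_iter *from)
{
        struct my_dio_state state = { };

        iocb->private = &state;         /* picked up by __iomap_dio_rw */
        return iomap_dio_rw(iocb, from, &my_dio_iomap_ops, &my_dio_ops,
                            0, 0);
}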

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/direct-io.c  | 3 +++
 include/linux/iomap.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index a434b1829545d..63ee37b40fd8f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -477,6 +477,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		.pos		= iocb->ki_pos,
 		.len		= iov_iter_count(iter),
 		.flags		= IOMAP_DIRECT,
+		.private	= iocb->private,
 	};
 	loff_t end = iomi.pos + iomi.len - 1, ret = 0;
 	bool wait_for_completion =
@@ -504,6 +505,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->submit.waiter = current;
 	dio->submit.poll_bio = NULL;
 
+	WRITE_ONCE(iocb->private, NULL);
+
 	if (iov_iter_rw(iter) == READ) {
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 97a3a2edb5850..3cc5ee01066d0 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -188,6 +188,7 @@ struct iomap_iter {
 	unsigned flags;
 	struct iomap iomap;
 	struct iomap srcmap;
+	void *private;
 };
 
 int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 31/40] iomap: add a new ->iomap_iter operation
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (29 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 30/40] iomap: add per-iomap_iter private data Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 32/40] iomap: optionally allocate dio bios from a file system bio_set Christoph Hellwig
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

This new operation combines ->iomap_begin, ->iomap_end and the actual
advancing of the iter into a single operation.  Matthew Wilcox originally
proposed this kind of interface to eventually allow inlining most of
the iteration and avoiding an indirect call.  But it also gives the file
system more control, e.g. to keep a little more state across multiple
iterations, which is something we'll need to improve the btrfs direct
I/O code.
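
For illustration, an ->iomap_iter implementation is expected to follow
roughly the shape below, built from the now-exported
iomap_iter_advance() and iomap_iter_done() helpers (my_iomap_begin and
my_iomap_end stand in for the file system's own mapping code; the
btrfs version appears later in this series):

static int my_iomap_iter(struct iomap_iter *iter)
{
        int ret;

        /* finish the mapping from the previous iteration, if any */
        if (iter->iomap.length) {
                ret = my_iomap_end(iter);
                if (ret < 0 && !iter->processed)
                        return ret;
        }

        /* advance pos/len past what was already processed */
        ret = iomap_iter_advance(iter);
        if (ret <= 0)
                return ret;

        /* look up the next mapping */
        ret = my_iomap_begin(iter);
        if (ret < 0)
                return ret;
        iomap_iter_done(iter);
        return 1;       /* keep iterating */
}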

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/iter.c       | 13 +++++++++----
 include/linux/iomap.h | 17 +++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c
index a1c7592d2aded..0bb22f2586e77 100644
--- a/fs/iomap/iter.c
+++ b/fs/iomap/iter.c
@@ -7,7 +7,7 @@
 #include <linux/iomap.h>
 #include "trace.h"
 
-static inline int iomap_iter_advance(struct iomap_iter *iter)
+int iomap_iter_advance(struct iomap_iter *iter)
 {
 	/* handle the previous iteration (if any) */
 	if (iter->iomap.length) {
@@ -27,8 +27,9 @@ static inline int iomap_iter_advance(struct iomap_iter *iter)
 	memset(&iter->srcmap, 0, sizeof(iter->srcmap));
 	return 1;
 }
+EXPORT_SYMBOL_GPL(iomap_iter_advance);
 
-static inline void iomap_iter_done(struct iomap_iter *iter)
+void iomap_iter_done(struct iomap_iter *iter)
 {
 	WARN_ON_ONCE(iter->iomap.offset > iter->pos);
 	WARN_ON_ONCE(iter->iomap.length == 0);
@@ -38,6 +39,7 @@ static inline void iomap_iter_done(struct iomap_iter *iter)
 	if (iter->srcmap.type != IOMAP_HOLE)
 		trace_iomap_iter_srcmap(iter->inode, &iter->srcmap);
 }
+EXPORT_SYMBOL_GPL(iomap_iter_done);
 
 /**
  * iomap_iter - iterate over a ranges in a file
@@ -58,10 +60,13 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops)
 {
 	int ret;
 
+	if (ops->iomap_iter)
+		return ops->iomap_iter(iter);
+
 	if (iter->iomap.length && ops->iomap_end) {
 		ret = ops->iomap_end(iter->inode, iter->pos, iomap_length(iter),
-				iter->processed > 0 ? iter->processed : 0,
-				iter->flags, &iter->iomap);
+				iomap_processed(iter), iter->flags,
+				&iter->iomap);
 		if (ret < 0 && !iter->processed)
 			return ret;
 	}
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 3cc5ee01066d0..494f530aa8bf8 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -14,6 +14,7 @@ struct address_space;
 struct fiemap_extent_info;
 struct inode;
 struct iomap_dio;
+struct iomap_iter;
 struct iomap_writepage_ctx;
 struct iov_iter;
 struct kiocb;
@@ -148,6 +149,8 @@ struct iomap_page_ops {
 #endif /* CONFIG_FS_DAX */
 
 struct iomap_ops {
+	int (*iomap_iter)(struct iomap_iter *iter);
+
 	/*
 	 * Return the existing mapping at pos, or reserve space starting at
 	 * pos for up to length, as long as we can do it as a single mapping.
@@ -208,6 +211,17 @@ static inline u64 iomap_length(const struct iomap_iter *iter)
 	return min(iter->len, end - iter->pos);
 }
 
+/**
+ * iomap_processed - amount of data processed by the previous iomap iteration
+ * @iter: iteration structure
+ */
+static inline u64 iomap_processed(const struct iomap_iter *iter)
+{
+	if (iter->processed <= 0)
+		return 0;
+	return iter->processed;
+}
+
 /**
  * iomap_iter_srcmap - return the source map for the current iomap iteration
  * @i: iteration structure
@@ -224,6 +238,9 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
 	return &i->iomap;
 }
 
+int iomap_iter_advance(struct iomap_iter *iter);
+void iomap_iter_done(struct iomap_iter *iter);
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 32/40] iomap: optionally allocate dio bios from a file system bio_set
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (30 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 31/40] iomap: add a new ->iomap_iter operation Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:55 ` [PATCH 33/40] iomap: add a hint to ->submit_io if there is more I/O coming Christoph Hellwig
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Allow the file system to provide a specific bio_set for allocating
direct I/O bios.  This lets file systems that use the ->submit_io
hook stash away additional per-bio information for their own use.

To make use of this additional space, the file system also needs to
hook into the completion path and thus override the ->bi_end_io
callback.  Export iomap_dio_bio_end_io so that the file system can
call back into the core iomap direct I/O code once its own completion
handling is done.
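
A hedged sketch of how a file system might wire this up (everything
prefixed my_ is a made-up name): size the bio_set with enough front_pad
for the per-bio state, point iomap_dio_ops->bio_set at it, and have the
overriding ->bi_end_io hand the bio back to iomap_dio_bio_end_io() once
the file system is done with it:

struct my_dio_bio {
        u64             csum_state;     /* extra per-bio state */
        struct bio      bio;            /* must be the last member */
};

/*
 * Initialized elsewhere with bioset_init(&my_dio_bioset, BIO_POOL_SIZE,
 * offsetof(struct my_dio_bio, bio), BIOSET_NEED_BVECS).
 */
static struct bio_set my_dio_bioset;

static void my_dio_end_io(struct bio *bio)
{
        struct my_dio_bio *mbio =
                container_of(bio, struct my_dio_bio, bio);

        my_verify_csums(mbio);
        /* bio->bi_private still points to the iomap_dio */
        iomap_dio_bio_end_io(bio);
}

static void my_submit_io(const struct iomap_iter *iter, struct bio *bio,
                loff_t file_offset)
{
        bio->bi_end_io = my_dio_end_io;
        submit_bio(bio);
}

static const struct iomap_dio_ops my_dio_ops = {
        .submit_io      = my_submit_io,
        .bio_set        = &my_dio_bioset,
};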

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/iomap/direct-io.c  | 18 ++++++++++++++----
 include/linux/iomap.h |  3 +++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 63ee37b40fd8f..392ee8fe1f8c3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -50,6 +50,15 @@ struct iomap_dio {
 	};
 };
 
+static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
+		struct iomap_dio *dio, unsigned short nr_vecs, unsigned int opf)
+{
+	if (dio->dops && dio->dops->bio_set)
+		return bio_alloc_bioset(iter->iomap.bdev, nr_vecs, opf,
+					GFP_KERNEL, dio->dops->bio_set);
+	return bio_alloc(iter->iomap.bdev, nr_vecs, opf, GFP_KERNEL);
+}
+
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
 		struct iomap_dio *dio, struct bio *bio, loff_t pos)
 {
@@ -143,7 +152,7 @@ static inline void iomap_dio_set_error(struct iomap_dio *dio, int ret)
 	cmpxchg(&dio->error, 0, ret);
 }
 
-static void iomap_dio_bio_end_io(struct bio *bio)
+void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
@@ -175,15 +184,16 @@ static void iomap_dio_bio_end_io(struct bio *bio)
 		bio_put(bio);
 	}
 }
+EXPORT_SYMBOL_GPL(iomap_dio_bio_end_io);
 
 static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 		loff_t pos, unsigned len)
 {
 	struct page *page = ZERO_PAGE(0);
-	int flags = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 
-	bio = bio_alloc(iter->iomap.bdev, 1, REQ_OP_WRITE | flags, GFP_KERNEL);
+	bio = iomap_dio_alloc_bio(iter, dio, 1,
+			REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
 	bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
 	bio->bi_private = dio;
 	bio->bi_end_io = iomap_dio_bio_end_io;
@@ -307,7 +317,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 			goto out;
 		}
 
-		bio = bio_alloc(iomap->bdev, nr_pages, bio_opf, GFP_KERNEL);
+		bio = iomap_dio_alloc_bio(iter, dio, nr_pages, bio_opf);
 		bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
 		bio->bi_ioprio = dio->iocb->ki_ioprio;
 		bio->bi_private = dio;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 494f530aa8bf8..5648753973de0 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -341,6 +341,8 @@ struct iomap_dio_ops {
 		      unsigned flags);
 	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
 		          loff_t file_offset);
+
+	struct bio_set *bio_set;
 };
 
 /*
@@ -370,6 +372,7 @@ struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, size_t done_before);
 ssize_t iomap_dio_complete(struct iomap_dio *dio);
+void iomap_dio_bio_end_io(struct bio *bio);
 
 #ifdef CONFIG_SWAP
 struct file;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 33/40] iomap: add a hint to ->submit_io if there is more I/O coming
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (31 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 32/40] iomap: optionally allocate dio bios from a file system bio_set Christoph Hellwig
@ 2022-03-22 15:55 ` Christoph Hellwig
  2022-03-22 15:56 ` [PATCH 34/40] btrfs: add a btrfs_dio_rw wrapper Christoph Hellwig
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:55 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Btrfs would like to optimize checksum offloading to threads depending on
whether there is more I/O to come.  Pass that information to the
->submit_io method.
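
For illustration, a ->submit_io implementation could use the new
argument roughly like this (the my_* helpers are hypothetical; this is
not the btrfs code):

static void my_submit_io(const struct iomap_iter *iter, struct bio *bio,
                loff_t file_offset, bool more)
{
        /*
         * If more bios will follow for this direct I/O, offload
         * checksumming to a worker thread so submission can continue;
         * for the last (or only) bio do the work inline and avoid the
         * offload overhead.
         */
        if (more)
                my_csum_async_and_submit(bio);
        else
                my_csum_inline_and_submit(bio);
}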

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c      | 2 +-
 fs/iomap/direct-io.c  | 8 ++++----
 include/linux/iomap.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 70d82effe5e37..2eb7e730c2afc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7917,7 +7917,7 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 }
 
 static void btrfs_submit_direct(const struct iomap_iter *iter,
-		struct bio *dio_bio, loff_t file_offset)
+		struct bio *dio_bio, loff_t file_offset, bool more)
 {
 	struct inode *inode = iter->inode;
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 392ee8fe1f8c3..3f18b04d73cde 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -60,7 +60,7 @@ static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
 }
 
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
-		struct iomap_dio *dio, struct bio *bio, loff_t pos)
+		struct iomap_dio *dio, struct bio *bio, loff_t pos, bool more)
 {
 	atomic_inc(&dio->ref);
 
@@ -70,7 +70,7 @@ static void iomap_dio_submit_bio(const struct iomap_iter *iter,
 	}
 
 	if (dio->dops && dio->dops->submit_io)
-		dio->dops->submit_io(iter, bio, pos);
+		dio->dops->submit_io(iter, bio, pos, more);
 	else
 		submit_bio(bio);
 }
@@ -200,7 +200,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 
 	get_page(page);
 	__bio_add_page(bio, page, len, 0);
-	iomap_dio_submit_bio(iter, dio, bio, pos);
+	iomap_dio_submit_bio(iter, dio, bio, pos, false);
 }
 
 /*
@@ -353,7 +353,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 */
 		if (nr_pages)
 			dio->iocb->ki_flags &= ~IOCB_HIPRI;
-		iomap_dio_submit_bio(iter, dio, bio, pos);
+		iomap_dio_submit_bio(iter, dio, bio, pos, nr_pages);
 		pos += n;
 	} while (nr_pages);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 5648753973de0..c4a2fa441e4f9 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -340,7 +340,7 @@ struct iomap_dio_ops {
 	int (*end_io)(struct kiocb *iocb, ssize_t size, int error,
 		      unsigned flags);
 	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
-		          loff_t file_offset);
+		          loff_t file_offset, bool more);
 
 	struct bio_set *bio_set;
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 34/40] btrfs: add a btrfs_dio_rw wrapper
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (32 preceding siblings ...)
  2022-03-22 15:55 ` [PATCH 33/40] iomap: add a hint to ->submit_io if there is more I/O coming Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-22 15:56 ` [PATCH 35/40] btrfs: allocate dio_data on stack Christoph Hellwig
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Add a wrapper around iomap_dio_rw that keeps the direct I/O internals
isolated in inode.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/ctree.h |  4 ++--
 fs/btrfs/file.c  |  6 ++----
 fs/btrfs/inode.c | 11 +++++++++--
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c22a24ca81652..196f308e3e0d7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3255,9 +3255,9 @@ int btrfs_writepage_cow_fixup(struct page *page);
 void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 					  struct page *page, u64 start,
 					  u64 end, bool uptodate);
+ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		     size_t done_before);
 extern const struct dentry_operations btrfs_dentry_operations;
-extern const struct iomap_ops btrfs_dio_iomap_ops;
-extern const struct iomap_dio_ops btrfs_dio_ops;
 
 /* Inode locking type flags, by default the exclusive lock is taken */
 #define BTRFS_ILOCK_SHARED	(1U << 0)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index a0179cc62913b..752ef6ecb311d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1967,8 +1967,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	 */
 again:
 	from->nofault = true;
-	err = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
-			   IOMAP_DIO_PARTIAL, written);
+	err = btrfs_dio_rw(iocb, from, written);
 	from->nofault = false;
 
 	/* No increment (+=) because iomap returns a cumulative value. */
@@ -3719,8 +3718,7 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
 	 */
 	pagefault_disable();
 	to->nofault = true;
-	ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
-			   IOMAP_DIO_PARTIAL, read);
+	ret = btrfs_dio_rw(iocb, to, read);
 	to->nofault = false;
 	pagefault_enable();
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2eb7e730c2afc..ab3ff4747266a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8050,15 +8050,22 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	btrfs_dio_private_put(dip);
 }
 
-const struct iomap_ops btrfs_dio_iomap_ops = {
+static const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_begin            = btrfs_dio_iomap_begin,
 	.iomap_end              = btrfs_dio_iomap_end,
 };
 
-const struct iomap_dio_ops btrfs_dio_ops = {
+static const struct iomap_dio_ops btrfs_dio_ops = {
 	.submit_io		= btrfs_submit_direct,
 };
 
+ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		size_t done_before)
+{
+	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
+			   IOMAP_DIO_PARTIAL, done_before);
+}
+
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			u64 start, u64 len)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 35/40] btrfs: allocate dio_data on stack
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (33 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 34/40] btrfs: add a btrfs_dio_rw wrapper Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-22 15:56 ` [PATCH 36/40] btrfs: implement ->iomap_iter Christoph Hellwig
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Make use of the new iomap_iter->private field to avoid a memory
allocation per iomap range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ab3ff4747266a..adcd392caa78e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7511,10 +7511,11 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 		loff_t length, unsigned int flags, struct iomap *iomap,
 		struct iomap *srcmap)
 {
+	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
-	struct btrfs_dio_data *dio_data = NULL;
+	struct btrfs_dio_data *dio_data = iter->private;
 	u64 lockstart, lockend;
 	const bool write = !!(flags & IOMAP_WRITE);
 	int ret = 0;
@@ -7541,21 +7542,15 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 			return ret;
 	}
 
-	dio_data = kzalloc(sizeof(*dio_data), GFP_NOFS);
-	if (!dio_data)
-		return -ENOMEM;
-
-	iomap->private = dio_data;
-
+	dio_data->submitted = 0;
+	dio_data->data_reserved = NULL;
 
 	/*
 	 * If this errors out it's because we couldn't invalidate pagecache for
 	 * this range and we need to fallback to buffered.
 	 */
-	if (lock_extent_direct(inode, lockstart, lockend, &cached_state, write)) {
-		ret = -ENOTBLK;
-		goto err;
-	}
+	if (lock_extent_direct(inode, lockstart, lockend, &cached_state, write))
+		return -ENOTBLK;
 
 	em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, start, len);
 	if (IS_ERR(em)) {
@@ -7664,24 +7659,22 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 unlock_err:
 	unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
 			     &cached_state);
-err:
-	kfree(dio_data);
-
 	return ret;
 }
 
 static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 		ssize_t written, unsigned int flags, struct iomap *iomap)
 {
-	int ret = 0;
-	struct btrfs_dio_data *dio_data = iomap->private;
+	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
+	struct btrfs_dio_data *dio_data = iter->private;
 	size_t submitted = dio_data->submitted;
 	const bool write = !!(flags & IOMAP_WRITE);
+	int ret = 0;
 
 	if (!write && (iomap->type == IOMAP_HOLE)) {
 		/* If reading from a hole, unlock and return */
 		unlock_extent(&BTRFS_I(inode)->io_tree, pos, pos + length - 1);
-		goto out;
+		return 0;
 	}
 
 	if (submitted < length) {
@@ -7698,10 +7691,6 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 
 	if (write)
 		extent_changeset_free(dio_data->data_reserved);
-out:
-	kfree(dio_data);
-	iomap->private = NULL;
-
 	return ret;
 }
 
@@ -7935,7 +7924,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	int ret;
 	blk_status_t status;
 	struct btrfs_io_geometry geom;
-	struct btrfs_dio_data *dio_data = iter->iomap.private;
+	struct btrfs_dio_data *dio_data = iter->private;
 	struct extent_map *em = NULL;
 
 	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
@@ -8062,6 +8051,9 @@ static const struct iomap_dio_ops btrfs_dio_ops = {
 ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		size_t done_before)
 {
+	struct btrfs_dio_data data;
+
+	iocb->private = &data;
 	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
 			   IOMAP_DIO_PARTIAL, done_before);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 36/40] btrfs: implement ->iomap_iter
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (34 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 35/40] btrfs: allocate dio_data on stack Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-22 15:56 ` [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper Christoph Hellwig
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Switch from the separate ->iomap_begin and ->iomap_end methods to
->iomap_iter to allow for greater control over the iteration in
subsequent patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/inode.c | 69 ++++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index adcd392caa78e..d4faed31d36a4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7507,17 +7507,18 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	return ret;
 }
 
-static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
-		loff_t length, unsigned int flags, struct iomap *iomap,
-		struct iomap *srcmap)
+static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
 {
-	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
+	struct inode *inode = iter->inode;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	loff_t start = iter->pos;
+	loff_t length = iter->len;
+	struct iomap *iomap = &iter->iomap;
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = iter->private;
 	u64 lockstart, lockend;
-	const bool write = !!(flags & IOMAP_WRITE);
+	bool write = (iter->flags & IOMAP_WRITE);
 	int ret = 0;
 	u64 len = length;
 	bool unlock_extents = false;
@@ -7602,7 +7603,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	 * which we return back to our caller - we should only return EIOCBQUEUED
 	 * after we have submitted bios for all the extents in the range.
 	 */
-	if ((flags & IOMAP_NOWAIT) && len < length) {
+	if ((iter->flags & IOMAP_NOWAIT) && len < length) {
 		free_extent_map(em);
 		ret = -EAGAIN;
 		goto unlock_err;
@@ -7662,30 +7663,28 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
 	return ret;
 }
 
-static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
-		ssize_t written, unsigned int flags, struct iomap *iomap)
+static int btrfs_dio_iomap_end(struct iomap_iter *iter)
 {
-	struct iomap_iter *iter = container_of(iomap, struct iomap_iter, iomap);
 	struct btrfs_dio_data *dio_data = iter->private;
-	size_t submitted = dio_data->submitted;
-	const bool write = !!(flags & IOMAP_WRITE);
+	struct btrfs_inode *bi = BTRFS_I(iter->inode);
+	bool write = (iter->flags & IOMAP_WRITE);
+	loff_t length = iomap_length(iter);
+	loff_t pos = iter->pos;
 	int ret = 0;
 
-	if (!write && (iomap->type == IOMAP_HOLE)) {
+	if (!write && iter->iomap.type == IOMAP_HOLE) {
 		/* If reading from a hole, unlock and return */
-		unlock_extent(&BTRFS_I(inode)->io_tree, pos, pos + length - 1);
+		unlock_extent(&bi->io_tree, pos, pos + length - 1);
 		return 0;
 	}
 
-	if (submitted < length) {
-		pos += submitted;
-		length -= submitted;
+	if (dio_data->submitted < length) {
+		pos += dio_data->submitted;
+		length -= dio_data->submitted;
 		if (write)
-			__endio_write_update_ordered(BTRFS_I(inode), pos,
-					length, false);
+			__endio_write_update_ordered(bi, pos, length, false);
 		else
-			unlock_extent(&BTRFS_I(inode)->io_tree, pos,
-				      pos + length - 1);
+			unlock_extent(&bi->io_tree, pos, pos + length - 1);
 		ret = -ENOTBLK;
 	}
 
@@ -7694,6 +7693,31 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
+static int btrfs_dio_iomap_iter(struct iomap_iter *iter)
+{
+	int ret;
+
+	if (iter->iomap.length) {
+		ret = btrfs_dio_iomap_end(iter);
+		if (ret < 0 && !iter->processed)
+			return ret;
+	}
+
+	ret = iomap_iter_advance(iter);
+	if (ret <= 0)
+		return ret;
+
+	ret = btrfs_dio_iomap_begin(iter);
+	if (ret < 0)
+		return ret;
+	iomap_iter_done(iter);
+	return 1;
+}
+
+static const struct iomap_ops btrfs_dio_iomap_ops = {
+	.iomap_iter		= btrfs_dio_iomap_iter,
+};
+
 static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
 {
 	/*
@@ -8039,11 +8063,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	btrfs_dio_private_put(dip);
 }
 
-static const struct iomap_ops btrfs_dio_iomap_ops = {
-	.iomap_begin            = btrfs_dio_iomap_begin,
-	.iomap_end              = btrfs_dio_iomap_end,
-};
-
 static const struct iomap_dio_ops btrfs_dio_ops = {
 	.submit_io		= btrfs_submit_direct,
 };
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (35 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 36/40] btrfs: implement ->iomap_iter Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-23  1:23   ` Qu Wenruo
  2022-03-22 15:56 ` [PATCH 38/40] btrfs: return a blk_status_t from btrfs_repair_one_sector Christoph Hellwig
                   ` (3 subsequent siblings)
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel
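
The new helper folds the btrfs_get_chunk_map() + btrfs_get_io_geometry()
sequence open coded by several callers into a single call that returns the
first stripe's block device and the number of bytes left before the next
stripe boundary.  A rough caller sketch based on the hunks below (the
surrounding variables are hypothetical, not taken from one specific
caller):

	struct block_device *bdev;
	u64 len;

	bdev = btrfs_get_stripe_info(fs_info, btrfs_op(bio), logical,
				     fs_info->sectorsize, &len);
	if (IS_ERR(bdev))
		return PTR_ERR(bdev);
	/* bdev: stripe 0 device, len: bytes before the stripe boundary */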

---
 fs/btrfs/compression.c | 26 ++++++++-----------------
 fs/btrfs/extent_io.c   | 24 ++++++++---------------
 fs/btrfs/inode.c       | 32 ++++++++++--------------------
 fs/btrfs/volumes.c     | 44 +++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/volumes.h     | 20 ++-----------------
 5 files changed, 69 insertions(+), 77 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ae6f986058c75..fca025c327a7e 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -445,10 +445,9 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 					u64 *next_stripe_start)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
-	struct btrfs_io_geometry geom;
-	struct extent_map *em;
+	struct block_device *bdev;
 	struct bio *bio;
-	int ret;
+	u64 len;
 
 	bio = btrfs_bio_alloc(cb->inode, BIO_MAX_VECS, opf);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
@@ -459,23 +458,14 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	else
 		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
-	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
-	if (IS_ERR(em)) {
-		bio_put(bio);
-		return ERR_CAST(em);
-	}
+	bdev = btrfs_get_stripe_info(fs_info, btrfs_op(bio), disk_bytenr,
+			      fs_info->sectorsize, &len);
+	if (IS_ERR(bdev))
+		return ERR_CAST(bdev);
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
-		bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev);
-
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		bio_put(bio);
-		return ERR_PTR(ret);
-	}
-	*next_stripe_start = disk_bytenr + geom.len;
-
+		bio_set_dev(bio, bdev);
+	*next_stripe_start = disk_bytenr + len;
 	return bio;
 }
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bfd91ed27bd14..10fc5e4dd14a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3235,11 +3235,10 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 			       struct btrfs_inode *inode, u64 file_offset)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_io_geometry geom;
 	struct btrfs_ordered_extent *ordered;
-	struct extent_map *em;
 	u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
-	int ret;
+	struct block_device *bdev;
+	u64 len;
 
 	/*
 	 * Pages for compressed extent are never submitted to disk directly,
@@ -3253,19 +3252,12 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 		bio_ctrl->len_to_stripe_boundary = U32_MAX;
 		return 0;
 	}
-	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
-				    logical, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		return ret;
-	}
-	if (geom.len > U32_MAX)
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
-	else
-		bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
+
+	bdev = btrfs_get_stripe_info(fs_info, btrfs_op(bio_ctrl->bio), logical,
+			      fs_info->sectorsize, &len);
+	if (IS_ERR(bdev))
+		return PTR_ERR(bdev);
+	bio_ctrl->len_to_stripe_boundary = min(len, (u64)U32_MAX);
 
 	if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
 		bio_ctrl->len_to_oe_boundary = U32_MAX;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d4faed31d36a4..3f7e1779ff19f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7944,12 +7944,9 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	u64 submit_len;
 	u64 clone_offset = 0;
 	u64 clone_len;
-	u64 logical;
-	int ret;
 	blk_status_t status;
-	struct btrfs_io_geometry geom;
 	struct btrfs_dio_data *dio_data = iter->private;
-	struct extent_map *em = NULL;
+	u64 len;
 
 	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
 	if (!dip) {
@@ -7978,21 +7975,16 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	submit_len = dio_bio->bi_iter.bi_size;
 
 	do {
-		logical = start_sector << 9;
-		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
-		if (IS_ERR(em)) {
-			status = errno_to_blk_status(PTR_ERR(em));
-			em = NULL;
-			goto out_err_em;
-		}
-		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
-					    logical, &geom);
-		if (ret) {
-			status = errno_to_blk_status(ret);
-			goto out_err_em;
+		struct block_device *bdev;
+
+		bdev = btrfs_get_stripe_info(fs_info, btrfs_op(dio_bio),
+				      start_sector << 9, submit_len, &len);
+		if (IS_ERR(bdev)) {
+			status = errno_to_blk_status(PTR_ERR(bdev));
+			goto out_err;
 		}
 
-		clone_len = min(submit_len, geom.len);
+		clone_len = min(submit_len, len);
 		ASSERT(clone_len <= UINT_MAX);
 
 		/*
@@ -8044,20 +8036,16 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 			bio_put(bio);
 			if (submit_len > 0)
 				refcount_dec(&dip->refs);
-			goto out_err_em;
+			goto out_err;
 		}
 
 		dio_data->submitted += clone_len;
 		clone_offset += clone_len;
 		start_sector += clone_len >> 9;
 		file_offset += clone_len;
-
-		free_extent_map(em);
 	} while (submit_len > 0);
 	return;
 
-out_err_em:
-	free_extent_map(em);
 out_err:
 	dip->dio_bio->bi_status = status;
 	btrfs_dio_private_put(dip);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7392b9f2a3323..f70bb3569a7ae 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6301,6 +6301,21 @@ static bool need_full_stripe(enum btrfs_map_op op)
 	return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
 }
 
+struct btrfs_io_geometry {
+	/* remaining bytes before crossing a stripe */
+	u64 len;
+	/* offset of logical address in chunk */
+	u64 offset;
+	/* length of single IO stripe */
+	u64 stripe_len;
+	/* number of stripe where address falls */
+	u64 stripe_nr;
+	/* offset of address in stripe */
+	u64 stripe_offset;
+	/* offset of raid56 stripe into the chunk */
+	u64 raid56_stripe_offset;
+};
+
 /*
  * Calculate the geometry of a particular (address, len) tuple. This
  * information is used to calculate how big a particular bio can get before it
@@ -6315,9 +6330,10 @@ static bool need_full_stripe(enum btrfs_map_op op)
  * Returns < 0 in case a chunk for the given logical address cannot be found,
  * usually shouldn't happen unless @logical is corrupted, 0 otherwise.
  */
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom)
+static int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info,
+		struct extent_map *em,
+		enum btrfs_map_op op, u64 logical,
+		struct btrfs_io_geometry *io_geom)
 {
 	struct map_lookup *map;
 	u64 len;
@@ -6394,6 +6410,28 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 	return 0;
 }
 
+struct block_device *btrfs_get_stripe_info(struct btrfs_fs_info *fs_info,
+		enum btrfs_map_op op, u64 logical, u64 len, u64 *lenp)
+{
+	struct btrfs_io_geometry geom;
+	struct block_device *bdev;
+	struct extent_map *em;
+	int ret;
+
+	em = btrfs_get_chunk_map(fs_info, logical, len);
+	if (IS_ERR(em))
+		return ERR_CAST(em);
+
+	bdev = em->map_lookup->stripes[0].dev->bdev;
+
+	ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom);
+	free_extent_map(em);
+	if (ret < 0)
+		return ERR_PTR(ret);
+	*lenp = geom.len;
+	return bdev;
+}
+
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		enum btrfs_map_op op, u64 logical, u64 *length,
 		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5b0e7602434b0..c6425760f69da 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -17,21 +17,6 @@ extern struct mutex uuid_mutex;
 
 #define BTRFS_STRIPE_LEN	SZ_64K
 
-struct btrfs_io_geometry {
-	/* remaining bytes before crossing a stripe */
-	u64 len;
-	/* offset of logical address in chunk */
-	u64 offset;
-	/* length of single IO stripe */
-	u64 stripe_len;
-	/* number of stripe where address falls */
-	u64 stripe_nr;
-	/* offset of address in stripe */
-	u64 stripe_offset;
-	/* offset of raid56 stripe into the chunk */
-	u64 raid56_stripe_offset;
-};
-
 /*
  * Use sequence counter to get consistent device stat data on
  * 32-bit processors.
@@ -520,9 +505,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		     u64 logical, u64 *length,
 		     struct btrfs_io_context **bioc_ret);
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom);
+struct block_device *btrfs_get_stripe_info(struct btrfs_fs_info *fs_info,
+		enum btrfs_map_op op, u64 logical, u64 length, u64 *lenp);
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 38/40] btrfs: return a blk_status_t from btrfs_repair_one_sector
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (36 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-22 15:56 ` [PATCH 39/40] btrfs: pass private data and end_io handler to btrfs_repair_one_sector Christoph Hellwig
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

This is what the submit hook returns and what the callers want anyway.
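
The conversion round trip this removes is visible in the hunks below;
schematically (an illustration, not a literal quote of the old code):

	/* in btrfs_repair_one_sector(), status comes from the submit hook */
	return blk_status_to_errno(status);	/* blk_status_t -> errno ... */

	/* ... and in the direct I/O caller */
	err = errno_to_blk_status(ret);		/* ... errno -> blk_status_t again */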

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 14 +++++++-------
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/inode.c     |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 10fc5e4dd14a3..2fdb5d7dd51e1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2626,7 +2626,7 @@ static bool btrfs_check_repairable(struct inode *inode,
 	return true;
 }
 
-int btrfs_repair_one_sector(struct inode *inode,
+blk_status_t btrfs_repair_one_sector(struct inode *inode,
 			    struct bio *failed_bio, u32 bio_offset,
 			    struct page *page, unsigned int pgoff,
 			    u64 start, int failed_mirror,
@@ -2649,12 +2649,12 @@ int btrfs_repair_one_sector(struct inode *inode,
 
 	failrec = btrfs_get_io_failure_record(inode, start);
 	if (IS_ERR(failrec))
-		return PTR_ERR(failrec);
+		return errno_to_blk_status(PTR_ERR(failrec));
 
 
 	if (!btrfs_check_repairable(inode, failrec, failed_mirror)) {
 		free_io_failure(failure_tree, tree, failrec);
-		return -EIO;
+		return BLK_STS_IOERR;
 	}
 
 	repair_bio = btrfs_bio_alloc(inode, 1, REQ_OP_READ);
@@ -2685,7 +2685,7 @@ int btrfs_repair_one_sector(struct inode *inode,
 		free_io_failure(failure_tree, tree, failrec);
 		bio_put(repair_bio);
 	}
-	return blk_status_to_errno(status);
+	return status;
 }
 
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
@@ -2725,7 +2725,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	const u32 sectorsize = fs_info->sectorsize;
 	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
-	int error = 0;
+	blk_status_t error = BLK_STS_OK;
 	int i;
 
 	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
@@ -2744,7 +2744,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
 		const unsigned int offset = i * sectorsize;
 		struct extent_state *cached = NULL;
 		bool uptodate = false;
-		int ret;
+		blk_status_t ret;
 
 		if (!(error_bitmap & (1U << i))) {
 			/*
@@ -2786,7 +2786,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
 				start + offset + sectorsize - 1,
 				&cached);
 	}
-	return errno_to_blk_status(error);
+	return error;
 }
 
 /* lots and lots of room for performance fixes in the end_bio funcs */
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 3f0cb1ef5fdff..0239b26d5170a 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -303,7 +303,7 @@ struct io_failure_record {
 	int failed_mirror;
 };
 
-int btrfs_repair_one_sector(struct inode *inode,
+blk_status_t btrfs_repair_one_sector(struct inode *inode,
 			    struct bio *failed_bio, u32 bio_offset,
 			    struct page *page, unsigned int pgoff,
 			    u64 start, int failed_mirror,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3f7e1779ff19f..93b3ef48cea2f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7794,14 +7794,14 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 						 btrfs_ino(BTRFS_I(inode)),
 						 pgoff);
 			} else {
-				int ret;
+				blk_status_t ret;
 
 				ret = btrfs_repair_one_sector(inode, &bbio->bio,
 						bio_offset, bvec.bv_page, pgoff,
 						start, bbio->mirror_num,
 						submit_dio_repair_bio);
 				if (ret)
-					err = errno_to_blk_status(ret);
+					err = ret;
 			}
 			ASSERT(bio_offset + sectorsize > bio_offset);
 			bio_offset += sectorsize;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 39/40] btrfs: pass private data and end_io handler to btrfs_repair_one_sector
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (37 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 38/40] btrfs: return a blk_status_t from btrfs_repair_one_sector Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-23  1:28   ` Qu Wenruo
  2022-03-24  0:57   ` Sweet Tea Dorminy
  2022-03-22 15:56 ` [PATCH 40/40] btrfs: use the iomap direct I/O bio directly Christoph Hellwig
  2022-03-22 17:46 ` RFC: cleanup btrfs bio handling Johannes Thumshirn
  40 siblings, 2 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Allow the caller to control what happens when the repair bio completes.
This will be needed to streamline the direct I/O path.
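
With the two new arguments a caller can hang its own context and
completion handler off the repair bio instead of inheriting
bi_private/bi_end_io from the failed bio.  A minimal sketch (the context
structure and handler are made-up names, not part of this series):

	static void my_repair_end_io(struct bio *bio)
	{
		struct my_repair_ctx *ctx = bio->bi_private;

		/* account for the finished repair I/O, then drop the bio */
		my_repair_done(ctx, bio->bi_status);
		bio_put(bio);
	}

	ret = btrfs_repair_one_sector(inode, failed_bio, bio_offset, page,
				      pgoff, start, failed_mirror,
				      submit_bio_hook, ctx, my_repair_end_io);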

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 15 ++++++++-------
 fs/btrfs/extent_io.h |  8 ++++----
 fs/btrfs/inode.c     |  4 +++-
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 2fdb5d7dd51e1..5a1447db28228 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2627,10 +2627,10 @@ static bool btrfs_check_repairable(struct inode *inode,
 }
 
 blk_status_t btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook)
+		struct bio *failed_bio, u32 bio_offset, struct page *page,
+		unsigned int pgoff, u64 start, int failed_mirror,
+		submit_bio_hook_t *submit_bio_hook,
+		void *bi_private, void (*bi_end_io)(struct bio *bio))
 {
 	struct io_failure_record *failrec;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -2660,9 +2660,9 @@ blk_status_t btrfs_repair_one_sector(struct inode *inode,
 	repair_bio = btrfs_bio_alloc(inode, 1, REQ_OP_READ);
 	repair_bbio = btrfs_bio(repair_bio);
 	repair_bbio->file_offset = start;
-	repair_bio->bi_end_io = failed_bio->bi_end_io;
 	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-	repair_bio->bi_private = failed_bio->bi_private;
+	repair_bio->bi_private = bi_private;
+	repair_bio->bi_end_io = bi_end_io;
 
 	if (failed_bbio->csum) {
 		const u32 csum_size = fs_info->csum_size;
@@ -2758,7 +2758,8 @@ static blk_status_t submit_read_repair(struct inode *inode,
 		ret = btrfs_repair_one_sector(inode, failed_bio,
 				bio_offset + offset,
 				page, pgoff + offset, start + offset,
-				failed_mirror, btrfs_submit_data_bio);
+				failed_mirror, btrfs_submit_data_bio,
+				failed_bio->bi_private, failed_bio->bi_end_io);
 		if (!ret) {
 			/*
 			 * We have submitted the read repair, the page release
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0239b26d5170a..54e54269cfdba 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -304,10 +304,10 @@ struct io_failure_record {
 };
 
 blk_status_t btrfs_repair_one_sector(struct inode *inode,
-			    struct bio *failed_bio, u32 bio_offset,
-			    struct page *page, unsigned int pgoff,
-			    u64 start, int failed_mirror,
-			    submit_bio_hook_t *submit_bio_hook);
+		struct bio *failed_bio, u32 bio_offset, struct page *page,
+		unsigned int pgoff, u64 start, int failed_mirror,
+		submit_bio_hook_t *submit_bio_hook,
+		void *bi_private, void (*bi_end_io)(struct bio *bio));
 
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 93b3ef48cea2f..e25d9d860c679 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7799,7 +7799,9 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 				ret = btrfs_repair_one_sector(inode, &bbio->bio,
 						bio_offset, bvec.bv_page, pgoff,
 						start, bbio->mirror_num,
-						submit_dio_repair_bio);
+						submit_dio_repair_bio,
+						bbio->bio.bi_private,
+						bbio->bio.bi_end_io);
 				if (ret)
 					err = ret;
 			}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (38 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 39/40] btrfs: pass private data and end_io handler to btrfs_repair_one_sector Christoph Hellwig
@ 2022-03-22 15:56 ` Christoph Hellwig
  2022-03-23  1:39   ` Qu Wenruo
  2022-03-22 17:46 ` RFC: cleanup btrfs bio handling Johannes Thumshirn
  40 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:56 UTC (permalink / raw)
  To: Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

Make the iomap code allocate btrfs dios by setting the bio_set field,
and then feed these directly into btrfs_map_bio.

For this to work iomap_begin needs to report a range that only contains
a single chunk, and thus is changed to a two level iteration.

This needs another extra field in struct btrfs_bio.  We could overlay
it with other fields not used after I/O submission, or split out a
new btrfs_dio_bio for the file_offset, iter and repair_refs, but
compared to the overall saving of the series this is a minor detail.

The per-iomap csum lookup is gone for now as well.  At least for
small I/Os this just creates a lot of overhead; for large I/O
we could look into optimizing it in one form or another, but I'd
love to see a reproducer where it actually matters first.  With the
state as of this patch the direct I/O bio submission is so close
to the buffered one that they could be unified with very little
work, so diverging again would be a bit counterproductive.  OTOH
if the optimization is indeed very useful we should do it in a way
that also works for buffered reads.
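
The hookup on the btrfs side is just the new ->bio_set field set in
btrfs_dio_ops below; conceptually the iomap direct I/O code then allocates
its bios from that bio_set, along the lines of (a sketch of the idea, not
the actual fs/iomap change; dops is the iomap_dio_ops the filesystem
passed in):

	bio = bio_alloc_bioset(iomap->bdev, nr_pages, opf, GFP_KERNEL,
			       dops->bio_set ?: &fs_bio_set);

so every bio handed to ->submit_io is sized for, and can be used as, a
struct btrfs_bio without cloning or a separate btrfs_dio_private
allocation.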

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/btrfs_inode.h |  25 ---
 fs/btrfs/ctree.h       |   1 -
 fs/btrfs/extent_io.c   |  22 +-
 fs/btrfs/extent_io.h   |   4 +-
 fs/btrfs/inode.c       | 451 ++++++++++++++++-------------------------
 fs/btrfs/volumes.h     |   1 +
 6 files changed, 184 insertions(+), 320 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b3e46aabc3d86..a3199020f0001 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -346,31 +346,6 @@ static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
 	return ret;
 }
 
-struct btrfs_dio_private {
-	struct inode *inode;
-
-	/*
-	 * Since DIO can use anonymous page, we cannot use page_offset() to
-	 * grab the file offset, thus need a dedicated member for file offset.
-	 */
-	u64 file_offset;
-	u64 disk_bytenr;
-	/* Used for bio::bi_size */
-	u32 bytes;
-
-	/*
-	 * References to this structure. There is one reference per in-flight
-	 * bio plus one while we're still setting up.
-	 */
-	refcount_t refs;
-
-	/* dio_bio came from fs/direct-io.c */
-	struct bio *dio_bio;
-
-	/* Array of checksums */
-	u8 csums[];
-};
-
 /*
  * btrfs_inode_item stores flags in a u64, btrfs_inode stores them in two
  * separate u32s. These two functions convert between the two representations.
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 196f308e3e0d7..64ef20b84f694 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3136,7 +3136,6 @@ int btrfs_del_orphan_item(struct btrfs_trans_handle *trans,
 int btrfs_find_orphan_item(struct btrfs_root *root, u64 offset);
 
 /* file-item.c */
-struct btrfs_dio_private;
 int btrfs_del_csums(struct btrfs_trans_handle *trans,
 		    struct btrfs_root *root, u64 bytenr, u64 len);
 blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5a1447db28228..f705e4ec9b961 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -31,7 +31,7 @@
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
-static struct bio_set btrfs_bioset;
+struct bio_set btrfs_bioset;
 
 static inline bool extent_state_in_tree(const struct extent_state *state)
 {
@@ -3150,26 +3150,6 @@ struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
 	return bio;
 }
 
-struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
-		u64 offset, u64 size)
-{
-	struct bio *bio;
-	struct btrfs_bio *bbio;
-
-	ASSERT(offset <= UINT_MAX && size <= UINT_MAX);
-
-	/* this will never fail when it's backed by a bioset */
-	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
-	ASSERT(bio);
-
-	bbio = btrfs_bio(bio);
-	btrfs_bio_init(btrfs_bio(bio), inode);
-
-	bio_trim(bio, offset >> 9, size >> 9);
-	bbio->iter = bio->bi_iter;
-	return bio;
-}
-
 /**
  * Attempt to add a page to bio
  *
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 54e54269cfdba..b416531721dfb 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -279,8 +279,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  u32 bits_to_clear, unsigned long page_ops);
 struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
 		unsigned int opf);
-struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
-		u64 offset, u64 size);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
@@ -323,4 +321,6 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info);
 #define btrfs_extent_buffer_leak_debug_check(fs_info)	do {} while (0)
 #endif
 
+extern struct bio_set btrfs_bioset;
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e25d9d860c679..6ea6ef214abdb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -62,8 +62,8 @@ struct btrfs_iget_args {
 };
 
 struct btrfs_dio_data {
-	ssize_t submitted;
 	struct extent_changeset *data_reserved;
+	struct iomap extent;
 };
 
 static const struct inode_operations btrfs_dir_inode_operations;
@@ -7507,16 +7507,16 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
 	return ret;
 }
 
-static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
+static int btrfs_dio_iomap_begin_extent(struct iomap_iter *iter)
 {
 	struct inode *inode = iter->inode;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	loff_t start = iter->pos;
 	loff_t length = iter->len;
-	struct iomap *iomap = &iter->iomap;
 	struct extent_map *em;
 	struct extent_state *cached_state = NULL;
 	struct btrfs_dio_data *dio_data = iter->private;
+	struct iomap *iomap = &dio_data->extent;
 	u64 lockstart, lockend;
 	bool write = (iter->flags & IOMAP_WRITE);
 	int ret = 0;
@@ -7543,7 +7543,6 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
 			return ret;
 	}
 
-	dio_data->submitted = 0;
 	dio_data->data_reserved = NULL;
 
 	/*
@@ -7647,14 +7646,12 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
 		iomap->type = IOMAP_MAPPED;
 	}
 	iomap->offset = start;
-	iomap->bdev = fs_info->fs_devices->latest_dev->bdev;
 	iomap->length = len;
 
 	if (write && btrfs_use_zone_append(BTRFS_I(inode), em->block_start))
 		iomap->flags |= IOMAP_F_ZONE_APPEND;
 
 	free_extent_map(em);
-
 	return 0;
 
 unlock_err:
@@ -7663,53 +7660,95 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
 	return ret;
 }
 
-static int btrfs_dio_iomap_end(struct iomap_iter *iter)
+static void btrfs_dio_unlock_remaining_extent(struct btrfs_inode *bi,
+		u64 pos, u64 len, u64 processed, bool write)
 {
-	struct btrfs_dio_data *dio_data = iter->private;
+	if (write)
+		__endio_write_update_ordered(bi, pos + processed,
+				len - processed, false);
+	else
+		unlock_extent(&bi->io_tree, pos + processed,
+				pos + len - 1);
+}
+
+static int btrfs_dio_iomap_begin_chunk(struct iomap_iter *iter)
+{
+	struct btrfs_fs_info *fs_info = btrfs_sb(iter->inode->i_sb);
 	struct btrfs_inode *bi = BTRFS_I(iter->inode);
-	bool write = (iter->flags & IOMAP_WRITE);
-	loff_t length = iomap_length(iter);
-	loff_t pos = iter->pos;
-	int ret = 0;
+	struct btrfs_dio_data *dio_data = iter->private;
+	struct block_device *bdev;
+	u64 len;
+
+	iter->iomap = dio_data->extent;
 
-	if (!write && iter->iomap.type == IOMAP_HOLE) {
-		/* If reading from a hole, unlock and return */
-		unlock_extent(&bi->io_tree, pos, pos + length - 1);
+	if (dio_data->extent.type != IOMAP_MAPPED)
 		return 0;
+
+	bdev = btrfs_get_stripe_info(fs_info, (iter->flags & IOMAP_WRITE) ?
+			BTRFS_MAP_WRITE : BTRFS_MAP_READ,
+			iter->iomap.addr, iter->iomap.length, &len);
+	if (WARN_ON_ONCE(IS_ERR(bdev))) {
+		btrfs_dio_unlock_remaining_extent(bi, dio_data->extent.offset,
+						  dio_data->extent.length, 0,
+						  iter->flags & IOMAP_WRITE);
+		return PTR_ERR(bdev);
 	}
 
-	if (dio_data->submitted < length) {
-		pos += dio_data->submitted;
-		length -= dio_data->submitted;
-		if (write)
-			__endio_write_update_ordered(bi, pos, length, false);
-		else
-			unlock_extent(&bi->io_tree, pos, pos + length - 1);
-		ret = -ENOTBLK;
+	iter->iomap.bdev = bdev;
+	iter->iomap.length = min(iter->iomap.length, len);
+	return 0;
+}
+
+static bool btrfs_dio_iomap_end(struct iomap_iter *iter)
+{
+	struct btrfs_inode *bi = BTRFS_I(iter->inode);
+	struct btrfs_dio_data *dio_data = iter->private;
+	struct iomap *extent = &dio_data->extent;
+	loff_t processed = iomap_processed(iter);
+	loff_t length = iomap_length(iter);
+
+	if (iter->iomap.type == IOMAP_HOLE) {
+		ASSERT(!(iter->flags & IOMAP_WRITE));
+
+		/* If reading from a hole, unlock the whole range here */
+		unlock_extent(&bi->io_tree, iter->pos, iter->pos + length - 1);
+	} else if (processed < length) {
+		btrfs_dio_unlock_remaining_extent(bi, extent->offset,
+						  extent->length, processed,
+						  iter->flags & IOMAP_WRITE);
+	} else if (iter->pos + processed < extent->offset + extent->length) {
+		extent->offset += processed;
+		extent->addr += processed;
+		extent->length -= processed;
+		return true;
 	}
 
-	if (write)
+	if (iter->flags & IOMAP_WRITE)
 		extent_changeset_free(dio_data->data_reserved);
-	return ret;
+	return false;
 }
 
 static int btrfs_dio_iomap_iter(struct iomap_iter *iter)
 {
+	bool keep_extent = false;
 	int ret;
 
-	if (iter->iomap.length) {
-		ret = btrfs_dio_iomap_end(iter);
-		if (ret < 0 && !iter->processed)
-			return ret;
-	}
+	if (iter->iomap.length)
+		keep_extent = btrfs_dio_iomap_end(iter);
 
 	ret = iomap_iter_advance(iter);
 	if (ret <= 0)
 		return ret;
 
-	ret = btrfs_dio_iomap_begin(iter);
-	if (ret < 0)
+	if (!keep_extent) {
+		ret = btrfs_dio_iomap_begin_extent(iter);
+		if (ret < 0)
+			return ret;
+	}
+	ret = btrfs_dio_iomap_begin_chunk(iter);
+	if (ret < 0) 
 		return ret;
+
 	iomap_iter_done(iter);
 	return 1;
 }
@@ -7718,54 +7757,40 @@ static const struct iomap_ops btrfs_dio_iomap_ops = {
 	.iomap_iter		= btrfs_dio_iomap_iter,
 };
 
-static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
+static void btrfs_end_read_dio_bio(struct btrfs_bio *bbio,
+		struct btrfs_bio *main_bbio);
+
+static void btrfs_dio_repair_end_io(struct bio *bio)
 {
-	/*
-	 * This implies a barrier so that stores to dio_bio->bi_status before
-	 * this and loads of dio_bio->bi_status after this are fully ordered.
-	 */
-	if (!refcount_dec_and_test(&dip->refs))
-		return;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_inode *bi = BTRFS_I(bbio->inode);
+	struct btrfs_bio *failed_bbio = bio->bi_private;
 
-	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
-		__endio_write_update_ordered(BTRFS_I(dip->inode),
-					     dip->file_offset,
-					     dip->bytes,
-					     !dip->dio_bio->bi_status);
-	} else {
-		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
-			      dip->file_offset,
-			      dip->file_offset + dip->bytes - 1);
+	if (bio->bi_status) {
+		btrfs_warn(bi->root->fs_info,
+			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
+			   btrfs_ino(bi), bio_op(bio), bio->bi_opf,
+			   bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
+			   bio->bi_status);
 	}
+	btrfs_end_read_dio_bio(bbio, failed_bbio);
 
-	bio_endio(dip->dio_bio);
-	kfree(dip);
+	bio_put(bio);
 }
 
 static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 					  int mirror_num,
 					  unsigned long bio_flags)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	blk_status_t ret;
-
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
-
 	btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_WRITE;
-
-	refcount_inc(&dip->refs);
-	ret = btrfs_map_bio(fs_info, bio, mirror_num);
-	if (ret)
-		refcount_dec(&dip->refs);
-	return ret;
+	return btrfs_map_bio(btrfs_sb(inode->i_sb), bio, mirror_num);
 }
 
-static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
-					     struct btrfs_bio *bbio,
-					     const bool uptodate)
+static void btrfs_end_read_dio_bio(struct btrfs_bio *this_bbio,
+		struct btrfs_bio *main_bbio)
 {
-	struct inode *inode = dip->inode;
+	struct inode *inode = main_bbio->inode;
 	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
 	const u32 sectorsize = fs_info->sectorsize;
 	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
@@ -7773,20 +7798,22 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
 	struct bio_vec bvec;
 	struct bvec_iter iter;
+	bool uptodate = !this_bbio->bio.bi_status;
 	u32 bio_offset = 0;
-	blk_status_t err = BLK_STS_OK;
 
-	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
+	main_bbio->bio.bi_status = BLK_STS_OK;
+
+	__bio_for_each_segment(bvec, &this_bbio->bio, iter, this_bbio->iter) {
 		unsigned int i, nr_sectors, pgoff;
 
 		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
 		pgoff = bvec.bv_offset;
 		for (i = 0; i < nr_sectors; i++) {
-			u64 start = bbio->file_offset + bio_offset;
+			u64 start = this_bbio->file_offset + bio_offset;
 
 			ASSERT(pgoff < PAGE_SIZE);
 			if (uptodate &&
-			    (!csum || !check_data_csum(inode, bbio,
+			    (!csum || !check_data_csum(inode, this_bbio,
 						       bio_offset, bvec.bv_page,
 						       pgoff, start))) {
 				clean_io_failure(fs_info, failure_tree, io_tree,
@@ -7796,21 +7823,56 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
 			} else {
 				blk_status_t ret;
 
-				ret = btrfs_repair_one_sector(inode, &bbio->bio,
-						bio_offset, bvec.bv_page, pgoff,
-						start, bbio->mirror_num,
+				atomic_inc(&main_bbio->repair_refs);
+				ret = btrfs_repair_one_sector(inode,
+						&this_bbio->bio, bio_offset,
+						bvec.bv_page, pgoff, start,
+						this_bbio->mirror_num,
 						submit_dio_repair_bio,
-						bbio->bio.bi_private,
-						bbio->bio.bi_end_io);
-				if (ret)
-					err = ret;
+						main_bbio,
+						btrfs_dio_repair_end_io);
+				if (ret) {
+					main_bbio->bio.bi_status = ret;
+					atomic_dec(&main_bbio->repair_refs);
+				}
 			}
 			ASSERT(bio_offset + sectorsize > bio_offset);
 			bio_offset += sectorsize;
 			pgoff += sectorsize;
 		}
 	}
-	return err;
+
+	if (atomic_dec_and_test(&main_bbio->repair_refs)) {
+		unlock_extent(&BTRFS_I(inode)->io_tree, main_bbio->file_offset,
+			main_bbio->file_offset + main_bbio->iter.bi_size - 1);
+		iomap_dio_bio_end_io(&main_bbio->bio);
+	}
+}
+
+static void btrfs_dio_bio_end_io(struct bio *bio)
+{
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_inode *bi = BTRFS_I(bbio->inode);
+
+	if (bio->bi_status) {
+		btrfs_warn(bi->root->fs_info,
+			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
+			   btrfs_ino(bi), bio_op(bio), bio->bi_opf,
+			   bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
+			   bio->bi_status);
+	}
+
+	if (bio_op(bio) == REQ_OP_READ) {
+		atomic_set(&bbio->repair_refs, 1);
+		btrfs_end_read_dio_bio(bbio, bbio);
+	} else {
+		btrfs_record_physical_zoned(bbio->inode, bbio->file_offset,
+					    bio);
+		__endio_write_update_ordered(bi, bbio->file_offset,
+					     bbio->iter.bi_size,
+					     !bio->bi_status);
+		iomap_dio_bio_end_io(bio);
+	}
 }
 
 static void __endio_write_update_ordered(struct btrfs_inode *inode,
@@ -7829,47 +7891,47 @@ static void btrfs_submit_bio_start_direct_io(struct btrfs_work *work)
 			&bbio->bio, bbio->file_offset, 1);
 }
 
-static void btrfs_end_dio_bio(struct bio *bio)
+/*
+ * If we are submitting more than one bio, submit them all asynchronously.  The
+ * exception is RAID 5 or 6, as asynchronous checksums make it difficult to
+ * collect full stripe writes.
+ */
+static bool btrfs_dio_allow_async_write(struct btrfs_fs_info *fs_info,
+		struct btrfs_inode *bi)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
-	blk_status_t err = bio->bi_status;
-
-	if (err)
-		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
-			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
-			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
-			   bio->bi_opf, bio->bi_iter.bi_sector,
-			   bio->bi_iter.bi_size, err);
-
-	if (bio_op(bio) == REQ_OP_READ)
-		err = btrfs_check_read_dio_bio(dip, bbio, !err);
-
-	if (err)
-		dip->dio_bio->bi_status = err;
-
-	btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio);
-
-	bio_put(bio);
-	btrfs_dio_private_put(dip);
+	if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK)
+		return false;
+	if (atomic_read(&bi->sync_writers))
+		return false;
+	return true;
 }
 
-static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
-		struct inode *inode, u64 file_offset, int async_submit)
+static void btrfs_dio_submit_io(const struct iomap_iter *iter,
+		struct bio *bio, loff_t file_offset, bool more)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_inode *bi = BTRFS_I(inode);
-	struct btrfs_dio_private *dip = bio->bi_private;
+	struct btrfs_fs_info *fs_info = btrfs_sb(iter->inode->i_sb);
+	struct btrfs_inode *bi = BTRFS_I(iter->inode);
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t ret;
 
+	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->inode = iter->inode;
+	bbio->file_offset = file_offset;
+	bbio->iter = bio->bi_iter;
+	bio->bi_end_io = btrfs_dio_bio_end_io;
+
 	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
+		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+			ret = extract_ordered_extent(bi, bio, file_offset);
+			if (ret)
+				goto out_err;
+		}
+
 		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
-			/* See btrfs_submit_data_bio for async submit rules */
-			if (async_submit && !atomic_read(&bi->sync_writers)) {
+			if (more && btrfs_dio_allow_async_write(fs_info, bi)) {
 				btrfs_submit_bio_async(bbio,
 					btrfs_submit_bio_start_direct_io);
-				return BLK_STS_OK;
+				return;
 			}
 
 			/*
@@ -7878,189 +7940,36 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 			 */
 			ret = btrfs_csum_one_bio(bi, bio, file_offset, 1);
 			if (ret)
-				return ret;
+				goto out_err;
 		}
 	} else {
 		bbio->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
 
-		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
-			u64 csum_offset;
-
-			csum_offset = file_offset - dip->file_offset;
-			csum_offset >>= fs_info->sectorsize_bits;
-			csum_offset *= fs_info->csum_size;
-			btrfs_bio(bio)->csum = dip->csums + csum_offset;
-		}
-	}
-
-	return btrfs_map_bio(fs_info, bio, 0);
-}
-
-/*
- * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
- * or ordered extents whether or not we submit any bios.
- */
-static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
-							  struct inode *inode,
-							  loff_t file_offset)
-{
-	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
-	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
-	size_t dip_size;
-	struct btrfs_dio_private *dip;
-
-	dip_size = sizeof(*dip);
-	if (!write && csum) {
-		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-		size_t nblocks;
-
-		nblocks = dio_bio->bi_iter.bi_size >> fs_info->sectorsize_bits;
-		dip_size += fs_info->csum_size * nblocks;
-	}
-
-	dip = kzalloc(dip_size, GFP_NOFS);
-	if (!dip)
-		return NULL;
-
-	dip->inode = inode;
-	dip->file_offset = file_offset;
-	dip->bytes = dio_bio->bi_iter.bi_size;
-	dip->disk_bytenr = dio_bio->bi_iter.bi_sector << 9;
-	dip->dio_bio = dio_bio;
-	refcount_set(&dip->refs, 1);
-	return dip;
-}
-
-static void btrfs_submit_direct(const struct iomap_iter *iter,
-		struct bio *dio_bio, loff_t file_offset, bool more)
-{
-	struct inode *inode = iter->inode;
-	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
-			     BTRFS_BLOCK_GROUP_RAID56_MASK);
-	struct btrfs_dio_private *dip;
-	struct bio *bio;
-	u64 start_sector;
-	int async_submit = 0;
-	u64 submit_len;
-	u64 clone_offset = 0;
-	u64 clone_len;
-	blk_status_t status;
-	struct btrfs_dio_data *dio_data = iter->private;
-	u64 len;
-
-	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
-	if (!dip) {
-		if (!write) {
-			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
-				file_offset + dio_bio->bi_iter.bi_size - 1);
-		}
-		dio_bio->bi_status = BLK_STS_RESOURCE;
-		bio_endio(dio_bio);
-		return;
-	}
-
-	if (!write) {
-		/*
-		 * Load the csums up front to reduce csum tree searches and
-		 * contention when submitting bios.
-		 *
-		 * If we have csums disabled this will do nothing.
-		 */
-		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
-		if (status != BLK_STS_OK)
+		ret = btrfs_lookup_bio_sums(iter->inode, bio, NULL);
+		if (ret)
 			goto out_err;
 	}
 
-	start_sector = dio_bio->bi_iter.bi_sector;
-	submit_len = dio_bio->bi_iter.bi_size;
-
-	do {
-		struct block_device *bdev;
-
-		bdev = btrfs_get_stripe_info(fs_info, btrfs_op(dio_bio),
-				      start_sector << 9, submit_len, &len);
-		if (IS_ERR(bdev)) {
-			status = errno_to_blk_status(PTR_ERR(bdev));
-			goto out_err;
-		}
-
-		clone_len = min(submit_len, len);
-		ASSERT(clone_len <= UINT_MAX);
-
-		/*
-		 * This will never fail as it's passing GPF_NOFS and
-		 * the allocation is backed by btrfs_bioset.
-		 */
-		bio = btrfs_bio_clone_partial(inode, dio_bio, clone_offset,
-					      clone_len);
-		bio->bi_private = dip;
-		bio->bi_end_io = btrfs_end_dio_bio;
-		btrfs_bio(bio)->file_offset = file_offset;
-
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			status = extract_ordered_extent(BTRFS_I(inode), bio,
-							file_offset);
-			if (status) {
-				bio_put(bio);
-				goto out_err;
-			}
-		}
-
-		ASSERT(submit_len >= clone_len);
-		submit_len -= clone_len;
-
-		/*
-		 * Increase the count before we submit the bio so we know
-		 * the end IO handler won't happen before we increase the
-		 * count. Otherwise, the dip might get freed before we're
-		 * done setting it up.
-		 *
-		 * We transfer the initial reference to the last bio, so we
-		 * don't need to increment the reference count for the last one.
-		 */
-		if (submit_len > 0) {
-			refcount_inc(&dip->refs);
-			/*
-			 * If we are submitting more than one bio, submit them
-			 * all asynchronously. The exception is RAID 5 or 6, as
-			 * asynchronous checksums make it difficult to collect
-			 * full stripe writes.
-			 */
-			if (!raid56)
-				async_submit = 1;
-		}
-
-		status = btrfs_submit_dio_bio(bio, inode, file_offset,
-						async_submit);
-		if (status) {
-			bio_put(bio);
-			if (submit_len > 0)
-				refcount_dec(&dip->refs);
-			goto out_err;
-		}
+	ret = btrfs_map_bio(fs_info, bio, 0);
+	if (ret)
+		goto out_err;
 
-		dio_data->submitted += clone_len;
-		clone_offset += clone_len;
-		start_sector += clone_len >> 9;
-		file_offset += clone_len;
-	} while (submit_len > 0);
 	return;
 
 out_err:
-	dip->dio_bio->bi_status = status;
-	btrfs_dio_private_put(dip);
+	bio->bi_status = ret;
+	bio_endio(bio);
 }
 
 static const struct iomap_dio_ops btrfs_dio_ops = {
-	.submit_io		= btrfs_submit_direct,
+	.submit_io		= btrfs_dio_submit_io,
+	.bio_set		= &btrfs_bioset,
 };
 
 ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		size_t done_before)
 {
-	struct btrfs_dio_data data;
+	struct btrfs_dio_data data = {};
 
 	iocb->private = &data;
 	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c6425760f69da..e9d775398141b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -341,6 +341,7 @@ struct btrfs_bio {
 
 	/* for direct I/O */
 	u64 file_offset;
+	atomic_t repair_refs;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: RFC: cleanup btrfs bio handling
  2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
                   ` (39 preceding siblings ...)
  2022-03-22 15:56 ` [PATCH 40/40] btrfs: use the iomap direct I/O bio directly Christoph Hellwig
@ 2022-03-22 17:46 ` Johannes Thumshirn
  40 siblings, 0 replies; 81+ messages in thread
From: Johannes Thumshirn @ 2022-03-22 17:46 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel

On 22/03/2022 16:56, Christoph Hellwig wrote:
> All this is pretty rough.  It survices a xfstests auto group run on
> a default file system config, though.
> 
> The tree is based on Jens' for-next tree as it started with the bio
> cleanups, and will need a rebase once 5.18-rc1 is out.

JFYI I've run this through xfstests on zoned null_blk here as well
and it's looking good so far. I.e. no noticeable regressions found.

I'll look deeper into the series (just had a quick fly over by now)
tomorrow with fresh eyes.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 02/40] btrfs: fix direct I/O read repair for split bios
  2022-03-22 15:55 ` [PATCH 02/40] btrfs: fix direct I/O read repair for split bios Christoph Hellwig
@ 2022-03-22 23:59   ` Qu Wenruo
  2022-03-23  6:03     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-22 23:59 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> When a bio is split in btrfs_submit_direct, dip->file_offset contains
> the file offset for the first bio.  But this means the start value used
> in btrfs_check_read_dio_bio is incorrect for subsequent bios.  Add
> a file_offset field to struct btrfs_bio to pass along the correct offset.
>
> Given that check_data_csum only uses start for an error message, this
> means problems with this miscalculation will only show up when I/O
> fails or checksums mismatch.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Personally speaking, I really hate to add a DIO-specific value into btrfs_bio.

Hopefully we can later turn that btrfs_bio::file_offset into some union for
other usages.

Despite the extra memory usage, it looks good.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/extent_io.c |  1 +
>   fs/btrfs/inode.c     | 13 +++++--------
>   fs/btrfs/volumes.h   |  3 +++
>   3 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e9fa0f6d605ee..7ca4e9b80f023 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2662,6 +2662,7 @@ int btrfs_repair_one_sector(struct inode *inode,
>
>   	repair_bio = btrfs_bio_alloc(1);
>   	repair_bbio = btrfs_bio(repair_bio);
> +	repair_bbio->file_offset = start;
>   	repair_bio->bi_opf = REQ_OP_READ;
>   	repair_bio->bi_end_io = failed_bio->bi_end_io;
>   	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3ef8b63bb1b5c..93f00e9150ed0 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7773,8 +7773,6 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
>   	struct bio_vec bvec;
>   	struct bvec_iter iter;
> -	const u64 orig_file_offset = dip->file_offset;
> -	u64 start = orig_file_offset;
>   	u32 bio_offset = 0;
>   	blk_status_t err = BLK_STS_OK;
>
> @@ -7784,6 +7782,8 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
>   		pgoff = bvec.bv_offset;
>   		for (i = 0; i < nr_sectors; i++) {
> +			u64 start = bbio->file_offset + bio_offset;
> +
>   			ASSERT(pgoff < PAGE_SIZE);
>   			if (uptodate &&
>   			    (!csum || !check_data_csum(inode, bbio,
> @@ -7796,17 +7796,13 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   			} else {
>   				int ret;
>
> -				ASSERT((start - orig_file_offset) < UINT_MAX);
> -				ret = btrfs_repair_one_sector(inode,
> -						&bbio->bio,
> -						start - orig_file_offset,
> -						bvec.bv_page, pgoff,
> +				ret = btrfs_repair_one_sector(inode, &bbio->bio,
> +						bio_offset, bvec.bv_page, pgoff,
>   						start, bbio->mirror_num,
>   						submit_dio_repair_bio);
>   				if (ret)
>   					err = errno_to_blk_status(ret);
>   			}
> -			start += sectorsize;
>   			ASSERT(bio_offset + sectorsize > bio_offset);
>   			bio_offset += sectorsize;
>   			pgoff += sectorsize;
> @@ -8009,6 +8005,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
>   		bio->bi_private = dip;
>   		bio->bi_end_io = btrfs_end_dio_bio;
> +		btrfs_bio(bio)->file_offset = file_offset;
>
>   		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
>   			status = extract_ordered_extent(BTRFS_I(inode), bio,
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 005c9e2a491a1..c22148bebc2f5 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -323,6 +323,9 @@ struct btrfs_fs_devices {
>   struct btrfs_bio {
>   	unsigned int mirror_num;
>
> +	/* for direct I/O */
> +	u64 file_offset;
> +
>   	/* @device is for stripe IO submission. */
>   	struct btrfs_device *device;
>   	u8 *csum;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices
  2022-03-22 15:55 ` [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices Christoph Hellwig
@ 2022-03-23  0:00   ` Qu Wenruo
  2022-03-23  6:04     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:00 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> When a bio is split in btrfs_submit_direct, dip->file_offset contains
> the file offset for the first bio.  But this means the start value used
> in btrfs_end_dio_bio to record the write location for zone devices is
> incorrect for subsequent bios.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Maybe this would be better folded into the previous patch?

It looks good to me though.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/inode.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 93f00e9150ed0..325e773c6e880 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7829,6 +7829,7 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
>   static void btrfs_end_dio_bio(struct bio *bio)
>   {
>   	struct btrfs_dio_private *dip = bio->bi_private;
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
>   	blk_status_t err = bio->bi_status;
>
>   	if (err)
> @@ -7839,12 +7840,12 @@ static void btrfs_end_dio_bio(struct bio *bio)
>   			   bio->bi_iter.bi_size, err);
>
>   	if (bio_op(bio) == REQ_OP_READ)
> -		err = btrfs_check_read_dio_bio(dip, btrfs_bio(bio), !err);
> +		err = btrfs_check_read_dio_bio(dip, bbio, !err);
>
>   	if (err)
>   		dip->dio_bio->bi_status = err;
>
> -	btrfs_record_physical_zoned(dip->inode, dip->file_offset, bio);
> +	btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio);
>
>   	bio_put(bio);
>   	btrfs_dio_private_put(dip);

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 06/40] btrfs: split submit_bio from btrfsic checking
  2022-03-22 15:55 ` [PATCH 06/40] btrfs: split submit_bio from btrfsic checking Christoph Hellwig
@ 2022-03-23  0:04   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:04 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Require a separate call to the integrity checking helpers from the
> actual bio submission.

All-in for this!

I can't remember how many times I've felt embarrassed when the bio
submission is done inside a function which, by its name, should only do
sanity checks.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/check-integrity.c | 14 +-------------
>   fs/btrfs/check-integrity.h |  8 ++++----
>   fs/btrfs/disk-io.c         |  6 ++++--
>   fs/btrfs/extent_io.c       |  3 ++-
>   fs/btrfs/scrub.c           | 12 ++++++++----
>   fs/btrfs/volumes.c         |  3 ++-
>   6 files changed, 21 insertions(+), 25 deletions(-)
>
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 9efc1feb6cb08..49f9954f1438f 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -2703,7 +2703,7 @@ static void btrfsic_check_flush_bio(struct bio *bio,
>   	}
>   }
>
> -static void __btrfsic_submit_bio(struct bio *bio)
> +void btrfsic_check_bio(struct bio *bio)
>   {
>   	struct btrfsic_dev_state *dev_state;
>
> @@ -2725,18 +2725,6 @@ static void __btrfsic_submit_bio(struct bio *bio)
>   	mutex_unlock(&btrfsic_mutex);
>   }
>
> -void btrfsic_submit_bio(struct bio *bio)
> -{
> -	__btrfsic_submit_bio(bio);
> -	submit_bio(bio);
> -}
> -
> -int btrfsic_submit_bio_wait(struct bio *bio)
> -{
> -	__btrfsic_submit_bio(bio);
> -	return submit_bio_wait(bio);
> -}
> -
>   int btrfsic_mount(struct btrfs_fs_info *fs_info,
>   		  struct btrfs_fs_devices *fs_devices,
>   		  int including_extent_data, u32 print_mask)
> diff --git a/fs/btrfs/check-integrity.h b/fs/btrfs/check-integrity.h
> index bcc730a06cb58..ed115e0f2ebbd 100644
> --- a/fs/btrfs/check-integrity.h
> +++ b/fs/btrfs/check-integrity.h
> @@ -7,11 +7,11 @@
>   #define BTRFS_CHECK_INTEGRITY_H
>
>   #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
> -void btrfsic_submit_bio(struct bio *bio);
> -int btrfsic_submit_bio_wait(struct bio *bio);
> +void btrfsic_check_bio(struct bio *bio);
>   #else
> -#define btrfsic_submit_bio submit_bio
> -#define btrfsic_submit_bio_wait submit_bio_wait
> +static inline void btrfsic_check_bio(struct bio *bio)
> +{
> +}
>   #endif
>
>   int btrfsic_mount(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index c245e1b131964..9b8ee74144910 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4048,7 +4048,8 @@ static int write_dev_supers(struct btrfs_device *device,
>   		if (i == 0 && !btrfs_test_opt(device->fs_info, NOBARRIER))
>   			bio->bi_opf |= REQ_FUA;
>
> -		btrfsic_submit_bio(bio);
> +		btrfsic_check_bio(bio);
> +		submit_bio(bio);
>
>   		if (btrfs_advance_sb_log(device, i))
>   			errors++;
> @@ -4161,7 +4162,8 @@ static void write_dev_flush(struct btrfs_device *device)
>   	init_completion(&device->flush_wait);
>   	bio->bi_private = &device->flush_wait;
>
> -	btrfsic_submit_bio(bio);
> +	btrfsic_check_bio(bio);
> +	submit_bio(bio);
>   	set_bit(BTRFS_DEV_STATE_FLUSH_SENT, &device->dev_state);
>   }
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index e789676373ab0..1a39b9ffdd180 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2370,7 +2370,8 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
>   	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
>   	bio_add_page(bio, page, length, pg_offset);
>
> -	if (btrfsic_submit_bio_wait(bio)) {
> +	btrfsic_check_bio(bio);
> +	if (submit_bio_wait(bio)) {
>   		/* try to remap that extent elsewhere? */
>   		btrfs_bio_counter_dec(fs_info);
>   		bio_put(bio);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 2e9a322773f28..605ecc675ba7c 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1479,7 +1479,8 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
>   		bio->bi_iter.bi_sector = spage->physical >> 9;
>   		bio->bi_opf = REQ_OP_READ;
>
> -		if (btrfsic_submit_bio_wait(bio)) {
> +		btrfsic_check_bio(bio);
> +		if (submit_bio_wait(bio)) {
>   			spage->io_error = 1;
>   			sblock->no_io_error_seen = 0;
>   		}
> @@ -1565,7 +1566,8 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
>   			return -EIO;
>   		}
>
> -		if (btrfsic_submit_bio_wait(bio)) {
> +		btrfsic_check_bio(bio);
> +		if (submit_bio_wait(bio)) {
>   			btrfs_dev_stat_inc_and_print(spage_bad->dev,
>   				BTRFS_DEV_STAT_WRITE_ERRS);
>   			atomic64_inc(&fs_info->dev_replace.num_write_errors);
> @@ -1723,7 +1725,8 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
>   	 * orders the requests before sending them to the driver which
>   	 * doubled the write performance on spinning disks when measured
>   	 * with Linux 3.5 */
> -	btrfsic_submit_bio(sbio->bio);
> +	btrfsic_check_bio(sbio->bio);
> +	submit_bio(sbio->bio);
>
>   	if (btrfs_is_zoned(sctx->fs_info))
>   		sctx->write_pointer = sbio->physical + sbio->page_count *
> @@ -2057,7 +2060,8 @@ static void scrub_submit(struct scrub_ctx *sctx)
>   	sbio = sctx->bios[sctx->curr];
>   	sctx->curr = -1;
>   	scrub_pending_bio_inc(sctx);
> -	btrfsic_submit_bio(sbio->bio);
> +	btrfsic_check_bio(sbio->bio);
> +	submit_bio(sbio->bio);
>   }
>
>   static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b07d382d53a86..bfa8e825e5047 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6755,7 +6755,8 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
>
>   	btrfs_bio_counter_inc_noblocked(fs_info);
>
> -	btrfsic_submit_bio(bio);
> +	btrfsic_check_bio(bio);
> +	submit_bio(bio);
>   }
>
>   static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 08/40] btrfs: simplify repair_io_failure
  2022-03-22 15:55 ` [PATCH 08/40] btrfs: simplify repair_io_failure Christoph Hellwig
@ 2022-03-23  0:06   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:06 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> The I/O in repair_io_failure is synchronous and doesn't need a btrfs_bio,
> so just use an on-stack bio.  Also clean up the error handling to use goto
> labels and not discard the actual return values.

Didn't even know we could use an on-stack bio.
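
For anyone else who hasn't used on-stack bios before, the pattern in
this patch boils down to roughly:

	struct bio_vec bvec;
	struct bio bio;

	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
	bio.bi_iter.bi_sector = sector;
	__bio_add_page(&bio, page, length, pg_offset);

	btrfsic_check_bio(&bio);
	ret = submit_bio_wait(&bio);
	bio_uninit(&bio);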

Looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 52 ++++++++++++++++++++------------------------
>   1 file changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1a39b9ffdd180..be523581c0ac1 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2307,12 +2307,13 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
>   			     u64 length, u64 logical, struct page *page,
>   			     unsigned int pg_offset, int mirror_num)
>   {
> -	struct bio *bio;
>   	struct btrfs_device *dev;
> +	struct bio_vec bvec;
> +	struct bio bio;
>   	u64 map_length = 0;
>   	u64 sector;
>   	struct btrfs_io_context *bioc = NULL;
> -	int ret;
> +	int ret = 0;
>
>   	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
>   	BUG_ON(!mirror_num);
> @@ -2320,8 +2321,6 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
>   	if (btrfs_repair_one_zone(fs_info, logical))
>   		return 0;
>
> -	bio = btrfs_bio_alloc(1);
> -	bio->bi_iter.bi_size = 0;
>   	map_length = length;
>
>   	/*
> @@ -2339,53 +2338,50 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
>   		 */
>   		ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
>   				      &map_length, &bioc, 0);
> -		if (ret) {
> -			btrfs_bio_counter_dec(fs_info);
> -			bio_put(bio);
> -			return -EIO;
> -		}
> +		if (ret)
> +			goto out_counter_dec;
>   		ASSERT(bioc->mirror_num == 1);
>   	} else {
>   		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
>   				      &map_length, &bioc, mirror_num);
> -		if (ret) {
> -			btrfs_bio_counter_dec(fs_info);
> -			bio_put(bio);
> -			return -EIO;
> -		}
> +		if (ret)
> +			goto out_counter_dec;
>   		BUG_ON(mirror_num != bioc->mirror_num);
>   	}
>
>   	sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9;
> -	bio->bi_iter.bi_sector = sector;
>   	dev = bioc->stripes[bioc->mirror_num - 1].dev;
>   	btrfs_put_bioc(bioc);
> +
>   	if (!dev || !dev->bdev ||
>   	    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) {
> -		btrfs_bio_counter_dec(fs_info);
> -		bio_put(bio);
> -		return -EIO;
> +		ret = -EIO;
> +		goto out_counter_dec;
>   	}
> -	bio_set_dev(bio, dev->bdev);
> -	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC;
> -	bio_add_page(bio, page, length, pg_offset);
>
> -	btrfsic_check_bio(bio);
> -	if (submit_bio_wait(bio)) {
> +	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
> +	bio.bi_iter.bi_sector = sector;
> +	__bio_add_page(&bio, page, length, pg_offset);
> +
> +	btrfsic_check_bio(&bio);
> +	ret = submit_bio_wait(&bio);
> +	if (ret) {
>   		/* try to remap that extent elsewhere? */
> -		btrfs_bio_counter_dec(fs_info);
> -		bio_put(bio);
>   		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
> -		return -EIO;
> +		goto out_bio_uninit;
>   	}
>
>   	btrfs_info_rl_in_rcu(fs_info,
>   		"read error corrected: ino %llu off %llu (dev %s sector %llu)",
>   				  ino, start,
>   				  rcu_str_deref(dev->name), sector);
> +	ret = 0;
> +
> +out_bio_uninit:
> +	bio_uninit(&bio);
> +out_counter_dec:
>   	btrfs_bio_counter_dec(fs_info);
> -	bio_put(bio);
> -	return 0;
> +	return ret;
>   }
>
>   int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 09/40] btrfs: simplify scrub_recheck_block
  2022-03-22 15:55 ` [PATCH 09/40] btrfs: simplify scrub_recheck_block Christoph Hellwig
@ 2022-03-23  0:10   ` Qu Wenruo
  2022-03-23  6:05     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:10 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> The I/O in scrub_recheck_block is synchronous and doesn't need a btrfs_bio,
> so just use an on-stack bio.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Just an unrelated question below.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/scrub.c | 18 ++++++++----------
>   1 file changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 605ecc675ba7c..508c91e26b6e9 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1462,8 +1462,9 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
>   		return scrub_recheck_block_on_raid56(fs_info, sblock);
>
>   	for (page_num = 0; page_num < sblock->page_count; page_num++) {
> -		struct bio *bio;
>   		struct scrub_page *spage = sblock->pagev[page_num];
> +		struct bio bio;
> +		struct bio_vec bvec;
>
>   		if (spage->dev->bdev == NULL) {
>   			spage->io_error = 1;
> @@ -1472,20 +1473,17 @@ static void scrub_recheck_block(struct btrfs_fs_info *fs_info,
>   		}
>
>   		WARN_ON(!spage->page);
> -		bio = btrfs_bio_alloc(1);
> -		bio_set_dev(bio, spage->dev->bdev);
> -
> -		bio_add_page(bio, spage->page, fs_info->sectorsize, 0);
> -		bio->bi_iter.bi_sector = spage->physical >> 9;
> -		bio->bi_opf = REQ_OP_READ;
> +		bio_init(&bio, spage->dev->bdev, &bvec, 1, REQ_OP_READ);
> +		__bio_add_page(&bio, spage->page, fs_info->sectorsize, 0);

Can we come up with a better name for __bio_add_page()?

With more on-stack bio usage like this, a name like __bio_add_page()
really is a little awkward.
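
For reference, as far as I can tell the double-underscore version is
just the "caller already reserved a slot, so this cannot fail" variant:

	/* bio_add_page() may fail when the bio is full; it returns the bytes added */
	if (bio_add_page(bio, page, len, offset) != len)
		return -EIO;

	/* __bio_add_page() never fails, but the caller must have reserved the bio_vec */
	__bio_add_page(bio, page, len, offset);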

Thanks,
Qu

> +		bio.bi_iter.bi_sector = spage->physical >> 9;
>
> -		btrfsic_check_bio(bio);
> -		if (submit_bio_wait(bio)) {
> +		btrfsic_check_bio(&bio);
> +		if (submit_bio_wait(&bio)) {
>   			spage->io_error = 1;
>   			sblock->no_io_error_seen = 0;
>   		}
>
> -		bio_put(bio);
> +		bio_uninit(&bio);
>   	}
>
>   	if (sblock->no_io_error_seen)

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy
  2022-03-22 15:55 ` [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy Christoph Hellwig
@ 2022-03-23  0:12   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:12 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> The I/O in scrub_repair_page_from_good_copy is synchronous and doesn't
> need a btrfs_bio, so just use an on-stack bio.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/scrub.c | 23 +++++++++--------------
>   1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 508c91e26b6e9..bb9382c02714f 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1544,7 +1544,8 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
>   	BUG_ON(spage_good->page == NULL);
>   	if (force_write || sblock_bad->header_error ||
>   	    sblock_bad->checksum_error || spage_bad->io_error) {
> -		struct bio *bio;
> +		struct bio bio;
> +		struct bio_vec bvec;
>   		int ret;
>
>   		if (!spage_bad->dev->bdev) {
> @@ -1553,26 +1554,20 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
>   			return -EIO;
>   		}
>
> -		bio = btrfs_bio_alloc(1);
> -		bio_set_dev(bio, spage_bad->dev->bdev);
> -		bio->bi_iter.bi_sector = spage_bad->physical >> 9;
> -		bio->bi_opf = REQ_OP_WRITE;
> +		bio_init(&bio, spage_bad->dev->bdev, &bvec, 1, REQ_OP_WRITE);
> +		bio.bi_iter.bi_sector = spage_bad->physical >> 9;
> +		__bio_add_page(&bio, spage_good->page, sectorsize, 0);
>
> -		ret = bio_add_page(bio, spage_good->page, sectorsize, 0);
> -		if (ret != sectorsize) {
> -			bio_put(bio);
> -			return -EIO;
> -		}
> +		btrfsic_check_bio(&bio);
> +		ret = submit_bio_wait(&bio);
> +		bio_uninit(&bio);
>
> -		btrfsic_check_bio(bio);
> -		if (submit_bio_wait(bio)) {
> +		if (ret) {
>   			btrfs_dev_stat_inc_and_print(spage_bad->dev,
>   				BTRFS_DEV_STAT_WRITE_ERRS);
>   			atomic64_inc(&fs_info->dev_replace.num_write_errors);
> -			bio_put(bio);
>   			return -EIO;
>   		}
> -		bio_put(bio);
>   	}
>
>   	return 0;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios
  2022-03-22 15:55 ` [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios Christoph Hellwig
@ 2022-03-23  0:16   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:16 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Except for the spurious initialization of ->device just after allocation
> nothing uses the btrfs_bio, so just allocate a normal bio without extra
> data.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

The RAID56 layer doesn't need to bother with special things like
mirror_num or checksums, as it sits below the logical layer.

So this is completely fine, and it saves quite a bit of the cleanup I
was going to do for RAID56.

Reviewed-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/raid56.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 2f1f7ca27acd5..a0d65f4b2b258 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1103,11 +1103,8 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
>   	}
>
>   	/* put a new bio on the list */
> -	bio = btrfs_bio_alloc(bio_max_len >> PAGE_SHIFT ?: 1);
> -	btrfs_bio(bio)->device = stripe->dev;
> -	bio->bi_iter.bi_size = 0;
> -	bio_set_dev(bio, stripe->dev->bdev);
> -	bio->bi_opf = opf;
> +	bio = bio_alloc(stripe->dev->bdev, max(bio_max_len >> PAGE_SHIFT, 1UL),
> +			opf, GFP_NOFS);
>   	bio->bi_iter.bi_sector = disk_start >> 9;
>   	bio->bi_private = rbio;
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios
  2022-03-22 15:55 ` [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios Christoph Hellwig
@ 2022-03-23  0:18   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:18 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> All the scrub bios go straight to the block device or the raid56 code,
> none of which looks at the btrfs_bio.

Exactly!

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/scrub.c | 47 ++++++++++++++++++-----------------------------
>   1 file changed, 18 insertions(+), 29 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index bb9382c02714f..250d271b02341 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1415,8 +1415,8 @@ static void scrub_recheck_block_on_raid56(struct btrfs_fs_info *fs_info,
>   	if (!first_page->dev->bdev)
>   		goto out;
>
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS);
> -	bio_set_dev(bio, first_page->dev->bdev);
> +	bio = bio_alloc(first_page->dev->bdev, BIO_MAX_VECS, REQ_OP_READ,
> +			GFP_NOFS);
>
>   	for (page_num = 0; page_num < sblock->page_count; page_num++) {
>   		struct scrub_page *spage = sblock->pagev[page_num];
> @@ -1649,8 +1649,6 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
>   	}
>   	sbio = sctx->wr_curr_bio;
>   	if (sbio->page_count == 0) {
> -		struct bio *bio;
> -
>   		ret = fill_writer_pointer_gap(sctx,
>   					      spage->physical_for_dev_replace);
>   		if (ret) {
> @@ -1661,17 +1659,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
>   		sbio->physical = spage->physical_for_dev_replace;
>   		sbio->logical = spage->logical;
>   		sbio->dev = sctx->wr_tgtdev;
> -		bio = sbio->bio;
> -		if (!bio) {
> -			bio = btrfs_bio_alloc(sctx->pages_per_bio);
> -			sbio->bio = bio;
> +		if (!sbio->bio) {
> +			sbio->bio = bio_alloc(sbio->dev->bdev,
> +					      sctx->pages_per_bio,
> +					      REQ_OP_WRITE, GFP_NOFS);
>   		}
> -
> -		bio->bi_private = sbio;
> -		bio->bi_end_io = scrub_wr_bio_end_io;
> -		bio_set_dev(bio, sbio->dev->bdev);
> -		bio->bi_iter.bi_sector = sbio->physical >> 9;
> -		bio->bi_opf = REQ_OP_WRITE;
> +		sbio->bio->bi_private = sbio;
> +		sbio->bio->bi_end_io = scrub_wr_bio_end_io;
> +		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
>   		sbio->status = 0;
>   	} else if (sbio->physical + sbio->page_count * sectorsize !=
>   		   spage->physical_for_dev_replace ||
> @@ -1712,7 +1707,6 @@ static void scrub_wr_submit(struct scrub_ctx *sctx)
>
>   	sbio = sctx->wr_curr_bio;
>   	sctx->wr_curr_bio = NULL;
> -	WARN_ON(!sbio->bio->bi_bdev);
>   	scrub_pending_bio_inc(sctx);
>   	/* process all writes in a single worker thread. Then the block layer
>   	 * orders the requests before sending them to the driver which
> @@ -2084,22 +2078,17 @@ static int scrub_add_page_to_rd_bio(struct scrub_ctx *sctx,
>   	}
>   	sbio = sctx->bios[sctx->curr];
>   	if (sbio->page_count == 0) {
> -		struct bio *bio;
> -
>   		sbio->physical = spage->physical;
>   		sbio->logical = spage->logical;
>   		sbio->dev = spage->dev;
> -		bio = sbio->bio;
> -		if (!bio) {
> -			bio = btrfs_bio_alloc(sctx->pages_per_bio);
> -			sbio->bio = bio;
> +		if (!sbio->bio) {
> +			sbio->bio = bio_alloc(sbio->dev->bdev,
> +					      sctx->pages_per_bio,
> +					      REQ_OP_READ, GFP_NOFS);
>   		}
> -
> -		bio->bi_private = sbio;
> -		bio->bi_end_io = scrub_bio_end_io;
> -		bio_set_dev(bio, sbio->dev->bdev);
> -		bio->bi_iter.bi_sector = sbio->physical >> 9;
> -		bio->bi_opf = REQ_OP_READ;
> +		sbio->bio->bi_private = sbio;
> +		sbio->bio->bi_end_io = scrub_bio_end_io;
> +		sbio->bio->bi_iter.bi_sector = sbio->physical >> 9;
>   		sbio->status = 0;
>   	} else if (sbio->physical + sbio->page_count * sectorsize !=
>   		   spage->physical ||
> @@ -2215,7 +2204,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
>   		goto bioc_out;
>   	}
>
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS);
> +	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
>   	bio->bi_iter.bi_sector = logical >> 9;
>   	bio->bi_private = sblock;
>   	bio->bi_end_io = scrub_missing_raid56_end_io;
> @@ -2831,7 +2820,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
>   	if (ret || !bioc || !bioc->raid_map)
>   		goto bioc_out;
>
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS);
> +	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
>   	bio->bi_iter.bi_sector = sparity->logic_start >> 9;
>   	bio->bi_private = sparity;
>   	bio->bi_end_io = scrub_parity_bio_endio;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair
  2022-03-22 15:55 ` [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair Christoph Hellwig
@ 2022-03-23  0:20   ` Qu Wenruo
  2022-03-23  6:06     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:20 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> submit_bio_hooks is always set to btrfs_submit_data_bio, so just remove
> it.
>

The same as my recent cleanup for it.

https://lore.kernel.org/linux-btrfs/9e29ec4e546249018679224518a465d0240912b0.1647841657.git.wqu@suse.com/T/#u

Although I did some extra renaming there, since submit_read_repair()
only works for data reads.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 8 +++-----
>   1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 88d3a46e89a51..238252f86d5ad 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2721,8 +2721,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
>   				      struct bio *failed_bio, u32 bio_offset,
>   				      struct page *page, unsigned int pgoff,
>   				      u64 start, u64 end, int failed_mirror,
> -				      unsigned int error_bitmap,
> -				      submit_bio_hook_t *submit_bio_hook)
> +				      unsigned int error_bitmap)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	const u32 sectorsize = fs_info->sectorsize;
> @@ -2760,7 +2759,7 @@ static blk_status_t submit_read_repair(struct inode *inode,
>   		ret = btrfs_repair_one_sector(inode, failed_bio,
>   				bio_offset + offset,
>   				page, pgoff + offset, start + offset,
> -				failed_mirror, submit_bio_hook);
> +				failed_mirror, btrfs_submit_data_bio);
>   		if (!ret) {
>   			/*
>   			 * We have submitted the read repair, the page release
> @@ -3075,8 +3074,7 @@ static void end_bio_extent_readpage(struct bio *bio)
>   			 */
>   			submit_read_repair(inode, bio, bio_offset, page,
>   					   start - page_offset(page), start,
> -					   end, mirror, error_bitmap,
> -					   btrfs_submit_data_bio);
> +					   end, mirror, error_bitmap);
>
>   			ASSERT(bio_offset + len > bio_offset);
>   			bio_offset += len;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 18/40] btrfs: move more work into btrfs_end_bioc
  2022-03-22 15:55 ` [PATCH 18/40] btrfs: move more work into btrfs_end_bioc Christoph Hellwig
@ 2022-03-23  0:29   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:29 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Assign ->mirror_num and ->bi_status in btrfs_end_bioc instead of
> duplicating the logic in the callers.  Also remove the bio argument as
> it always must be bioc->orig_bio and the now pointless bioc_error that
> did nothing but assign bi_sector to the same value just sampled in the
> caller.

Reviewed-by: Qu Wenruo <wqu@suse.com>

It may be better to rename @first_bio or the @bio parameter, as it took
me several seconds to realize that @bio gets reused for the cloned
RAID1*/DUP bio submission.

Thanks,
Qu

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/volumes.c | 68 ++++++++++++++--------------------------------
>   1 file changed, 20 insertions(+), 48 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 4dd54b80dac81..9d1f8c27eff33 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6659,19 +6659,29 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
>   }
>
> -static inline void btrfs_end_bioc(struct btrfs_io_context *bioc, struct bio *bio)
> +static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
>   {
> +	struct bio *bio = bioc->orig_bio;
> +
> +	btrfs_bio(bio)->mirror_num = bioc->mirror_num;
>   	bio->bi_private = bioc->private;
>   	bio->bi_end_io = bioc->end_io;
> -	bio_endio(bio);
>
> +	/*
> +	 * Only send an error to the higher layers if it is beyond the tolerance
> +	 * threshold.
> +	 */
> +	if (atomic_read(&bioc->error) > bioc->max_errors)
> +		bio->bi_status = BLK_STS_IOERR;
> +	else
> +		bio->bi_status = BLK_STS_OK;
> +	bio_endio(bio);
>   	btrfs_put_bioc(bioc);
>   }
>
>   static void btrfs_end_bio(struct bio *bio)
>   {
>   	struct btrfs_io_context *bioc = bio->bi_private;
> -	int is_orig_bio = 0;
>
>   	if (bio->bi_status) {
>   		atomic_inc(&bioc->error);
> @@ -6692,35 +6702,12 @@ static void btrfs_end_bio(struct bio *bio)
>   		}
>   	}
>
> -	if (bio == bioc->orig_bio)
> -		is_orig_bio = 1;
> +	if (bio != bioc->orig_bio)
> +		bio_put(bio);
>
>   	btrfs_bio_counter_dec(bioc->fs_info);
> -
> -	if (atomic_dec_and_test(&bioc->stripes_pending)) {
> -		if (!is_orig_bio) {
> -			bio_put(bio);
> -			bio = bioc->orig_bio;
> -		}
> -
> -		btrfs_bio(bio)->mirror_num = bioc->mirror_num;
> -		/* only send an error to the higher layers if it is
> -		 * beyond the tolerance of the btrfs bio
> -		 */
> -		if (atomic_read(&bioc->error) > bioc->max_errors) {
> -			bio->bi_status = BLK_STS_IOERR;
> -		} else {
> -			/*
> -			 * this bio is actually up to date, we didn't
> -			 * go over the max number of errors
> -			 */
> -			bio->bi_status = BLK_STS_OK;
> -		}
> -
> -		btrfs_end_bioc(bioc, bio);
> -	} else if (!is_orig_bio) {
> -		bio_put(bio);
> -	}
> +	if (atomic_dec_and_test(&bioc->stripes_pending))
> +		btrfs_end_bioc(bioc);
>   }
>
>   static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
> @@ -6758,23 +6745,6 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
>   	submit_bio(bio);
>   }
>
> -static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
> -{
> -	atomic_inc(&bioc->error);
> -	if (atomic_dec_and_test(&bioc->stripes_pending)) {
> -		/* Should be the original bio. */
> -		WARN_ON(bio != bioc->orig_bio);
> -
> -		btrfs_bio(bio)->mirror_num = bioc->mirror_num;
> -		bio->bi_iter.bi_sector = logical >> 9;
> -		if (atomic_read(&bioc->error) > bioc->max_errors)
> -			bio->bi_status = BLK_STS_IOERR;
> -		else
> -			bio->bi_status = BLK_STS_OK;
> -		btrfs_end_bioc(bioc, bio);
> -	}
> -}
> -
>   blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   			   int mirror_num)
>   {
> @@ -6833,7 +6803,9 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   						   &dev->dev_state) ||
>   		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
>   		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
> -			bioc_error(bioc, first_bio, logical);
> +			atomic_inc(&bioc->error);
> +			if (atomic_dec_and_test(&bioc->stripes_pending))
> +				btrfs_end_bioc(bioc);
>   			continue;
>   		}
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio
  2022-03-22 15:55 ` [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio Christoph Hellwig
@ 2022-03-23  0:34   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:34 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Remove the unused bio_flags argument and clean up the code flow to be
> straightforward.

That flag is a legacy from when we used a function pointer for both data
and metadata bio submission.

After commit 953651eb308f ("btrfs: factor out helper adding a page to
bio") and commit 1b36294a6cd5 ("btrfs: call submit_bio_hook directly for
metadata pages") we got rid of the hook and no longer need the extra
@bio_flags.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/disk-io.c   | 42 ++++++++++++++++--------------------------
>   fs/btrfs/disk-io.h   |  2 +-
>   fs/btrfs/extent_io.c |  2 +-
>   3 files changed, 18 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index dc497e17dcd06..f43c9ab86e617 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -890,6 +890,10 @@ static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio,
>   	return btree_csum_one_bio(bio);
>   }
>
> +/*
> + * Check if metadata writes should be submitted by async threads so that
> + * checksumming can happen in parallel across all CPUs.
> + */
>   static bool should_async_write(struct btrfs_fs_info *fs_info,
>   			     struct btrfs_inode *bi)
>   {
> @@ -903,41 +907,27 @@ static bool should_async_write(struct btrfs_fs_info *fs_info,
>   }
>
>   blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
> -				       int mirror_num, unsigned long bio_flags)
> +				       int mirror_num)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	blk_status_t ret;
>
> -	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
> -		/*
> -		 * called for a read, do the setup so that checksum validation
> -		 * can happen in the async kernel threads
> -		 */
> -		ret = btrfs_bio_wq_end_io(fs_info, bio,
> -					  BTRFS_WQ_ENDIO_METADATA);
> -		if (ret)
> -			goto out_w_error;
> -		ret = btrfs_map_bio(fs_info, bio, mirror_num);
> -	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
> +	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
> +		if (should_async_write(fs_info, BTRFS_I(inode)))
> +			return btrfs_wq_submit_bio(inode, bio, mirror_num, 0, 0,
> +						   btree_submit_bio_start);
>   		ret = btree_csum_one_bio(bio);
>   		if (ret)
> -			goto out_w_error;
> -		ret = btrfs_map_bio(fs_info, bio, mirror_num);
> +			return ret;
>   	} else {
> -		/*
> -		 * kthread helpers are used to submit writes so that
> -		 * checksumming can happen in parallel across all CPUs
> -		 */
> -		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, 0,
> -					  0, btree_submit_bio_start);
> +		/* checksum validation should happen in async threads: */
> +		ret = btrfs_bio_wq_end_io(fs_info, bio,
> +					  BTRFS_WQ_ENDIO_METADATA);
> +		if (ret)
> +			return ret;
>   	}
>
> -	if (ret)
> -		goto out_w_error;
> -	return 0;
> -
> -out_w_error:
> -	return ret;
> +	return btrfs_map_bio(fs_info, bio, mirror_num);
>   }
>
>   #ifdef CONFIG_MIGRATION
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 2364a30cd9e32..afe3bb96616c9 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -87,7 +87,7 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
>   				   struct page *page, u64 start, u64 end,
>   				   int mirror);
>   blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
> -				       int mirror_num, unsigned long bio_flags);
> +				       int mirror_num);
>   #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>   struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
>   #endif
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 238252f86d5ad..58ef0f4fca361 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -179,7 +179,7 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num,
>   					    bio_flags);
>   	else
>   		ret = btrfs_submit_metadata_bio(tree->private_data, bio,
> -						mirror_num, bio_flags);
> +						mirror_num);
>
>   	if (ret) {
>   		bio->bi_status = ret;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio
  2022-03-22 15:55 ` [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio Christoph Hellwig
@ 2022-03-23  0:44   ` Qu Wenruo
  2022-03-23  6:08     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:44 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Clean up the code flow to be straightforward.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/inode.c | 85 +++++++++++++++++++++---------------------------
>   1 file changed, 37 insertions(+), 48 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 325e773c6e880..a54b7fd4658d0 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2511,67 +2511,56 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	struct btrfs_root *root = BTRFS_I(inode)->root;
> -	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
> -	blk_status_t ret = 0;
> -	int skip_sum;
> -	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
> -
> -	skip_sum = (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
> -		test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
> -
> -	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
> -		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
> +	struct btrfs_inode *bi = BTRFS_I(inode);
> +	blk_status_t ret;
>
>   	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> -		struct page *page = bio_first_bvec_all(bio)->bv_page;
> -		loff_t file_offset = page_offset(page);
> -
> -		ret = extract_ordered_extent(BTRFS_I(inode), bio, file_offset);
> +		ret = extract_ordered_extent(bi, bio,
> +				page_offset(bio_first_bvec_all(bio)->bv_page));
>   		if (ret)
> -			goto out;
> +			return ret;
>   	}
>
> -	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
> +	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
> +		if ((bi->flags & BTRFS_INODE_NODATASUM) ||
> +		    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
> +			goto mapit;
> +
> +		if (!atomic_read(&bi->sync_writers)) {
> +			/* csum items have already been cloned */
> +			if (btrfs_is_data_reloc_root(bi->root))
> +				goto mapit;
> +			return btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
> +						  0, btrfs_submit_bio_start);
> +		}
> +		ret = btrfs_csum_one_bio(bi, bio, 0, 0);
> +		if (ret)
> +			return ret;

Previously we would also call bio_endio() on the bio here; are we
missing the endio call on it now?


> +	} else {
> +		enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
> +
> +		if (btrfs_is_free_space_inode(bi))
> +			metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
> +
>   		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
>   		if (ret)
> -			goto out;
> +			return ret;
>
> -		if (bio_flags & EXTENT_BIO_COMPRESSED) {
> -			ret = btrfs_submit_compressed_read(inode, bio,
> +		if (bio_flags & EXTENT_BIO_COMPRESSED)
> +			return btrfs_submit_compressed_read(inode, bio,
>   							   mirror_num,
>   							   bio_flags);
> -			goto out;

Previously the out label also ended the I/O for the bio.
Now we don't.

> -		} else {
> -			/*
> -			 * Lookup bio sums does extra checks around whether we
> -			 * need to csum or not, which is why we ignore skip_sum
> -			 * here.
> -			 */
> -			ret = btrfs_lookup_bio_sums(inode, bio, NULL);
> -			if (ret)
> -				goto out;
> -		}
> -		goto mapit;
> -	} else if (async && !skip_sum) {
> -		/* csum items have already been cloned */
> -		if (btrfs_is_data_reloc_root(root))
> -			goto mapit;
> -		/* we're doing a write, do the async checksumming */
> -		ret = btrfs_wq_submit_bio(inode, bio, mirror_num, bio_flags,
> -					  0, btrfs_submit_bio_start);
> -		goto out;
> -	} else if (!skip_sum) {
> -		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, 0, 0);
> +
> +		/*
> +		 * Lookup bio sums does extra checks around whether we need to
> +		 * csum or not, which is why we ignore skip_sum here.
> +		 */
> +		ret = btrfs_lookup_bio_sums(inode, bio, NULL);
>   		if (ret)
> -			goto out;
> +			return ret;

The same missing endio call in the error path.

Thanks,
Qu
>   	}
> -
>   mapit:
> -	ret = btrfs_map_bio(fs_info, bio, mirror_num);
> -
> -out:
> -	return ret;
> +	return btrfs_map_bio(fs_info, bio, mirror_num);
>   }
>
>   /*

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio
  2022-03-22 15:55 ` [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio Christoph Hellwig
@ 2022-03-23  0:50   ` Qu Wenruo
  2022-03-23  6:09     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:50 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Remove the pointless goto just to return err and clean up the code flow
> to be a little more straightforward.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/inode.c | 59 ++++++++++++++++++++++--------------------------
>   1 file changed, 27 insertions(+), 32 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a54b7fd4658d0..5c9d8e8a98466 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7844,47 +7844,42 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
>   		struct inode *inode, u64 file_offset, int async_submit)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> +	struct btrfs_inode *bi = BTRFS_I(inode);
>   	struct btrfs_dio_private *dip = bio->bi_private;
> -	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
>   	blk_status_t ret;
>
> -	/* Check btrfs_submit_bio_hook() for rules about async submit. */
> -	if (async_submit)
> -		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
> +	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
> +		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
> +			/* See btrfs_submit_data_bio for async submit rules */
> +			if (async_submit && !atomic_read(&bi->sync_writers))
> +				return btrfs_wq_submit_bio(inode, bio, 0, 0,
> +					file_offset,
> +					btrfs_submit_bio_start_direct_io);
>
> -	if (!write) {
> +			/*
> +			 * If we aren't doing async submit, calculate the csum of the
> +			 * bio now.
> +			 */
> +			ret = btrfs_csum_one_bio(bi, bio, file_offset, 1);
> +			if (ret)
> +				return ret;
> +		}
> +	} else {
>   		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
>   		if (ret)
> -			goto err;
> -	}
> -
> -	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
> -		goto map;
> +			return ret;
>
> -	if (write && async_submit) {
> -		ret = btrfs_wq_submit_bio(inode, bio, 0, 0, file_offset,
> -					  btrfs_submit_bio_start_direct_io);
> -		goto err;
> -	} else if (write) {
> -		/*
> -		 * If we aren't doing async submit, calculate the csum of the
> -		 * bio now.
> -		 */
> -		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, 1);
> -		if (ret)
> -			goto err;
> -	} else {
> -		u64 csum_offset;
> +		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
> +			u64 csum_offset;
>
> -		csum_offset = file_offset - dip->file_offset;
> -		csum_offset >>= fs_info->sectorsize_bits;
> -		csum_offset *= fs_info->csum_size;
> -		btrfs_bio(bio)->csum = dip->csums + csum_offset;
> +			csum_offset = file_offset - dip->file_offset;
> +			csum_offset >>= fs_info->sectorsize_bits;
> +			csum_offset *= fs_info->csum_size;
> +			btrfs_bio(bio)->csum = dip->csums + csum_offset;
> +		}
>   	}
> -map:
> -	ret = btrfs_map_bio(fs_info, bio, 0);
> -err:
> -	return ret;
> +
> +	return btrfs_map_bio(fs_info, bio, 0);

Can we just put the btrfs_map_bio() call into each read/write branch?

In fact it's only the shared single btrfs_map_bio() call that still
forces us to use if () {} else {}.

I manually checked the code and it looks fine to me.

Also, since this all hinges on btrfs_op(bio), I would personally add
some extra ASSERT()s to make sure this function only ever sees
MAP_WRITE or MAP_READ, with no other values allowed.
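
Something like this at the top of the function, just as a sketch:

	ASSERT(btrfs_op(bio) == BTRFS_MAP_WRITE ||
	       btrfs_op(bio) == BTRFS_MAP_READ);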

Thanks,
Qu

>   }
>
>   /*

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio
  2022-03-22 15:55 ` [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio Christoph Hellwig
@ 2022-03-23  0:54   ` Qu Wenruo
  2022-03-23  6:11     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:54 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> All the I/O going through the btrfs_bio based path are associated with an
> inode.  Add a pointer to it to simplify a few things soon.  Also pass the
> bio operation to btrfs_bio_alloc given that we have to touch it anyway.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Something I want to avoid is further increasing the size of btrfs_bio.

For buffered uncompressed I/O, we can grab the inode from the first page.
For direct I/O we have bio->bi_private (btrfs_dio_private).
For compressed I/O, it's bio->bi_private again (compressed_bio).

Do the saved lines of code really justify the extra memory usage for
all bios?
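
Rough sketch of what I mean, assuming I'm reading the structs right
(not tested):

	/* buffered uncompressed I/O: the pages hang off the inode's mapping */
	inode = bio_first_page_all(bio)->mapping->host;

	/* direct I/O: bi_private is the btrfs_dio_private */
	inode = ((struct btrfs_dio_private *)bio->bi_private)->inode;

	/* compressed I/O: bi_private is the compressed_bio */
	inode = ((struct compressed_bio *)bio->bi_private)->inode;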

Thanks,
Qu

> ---
>   fs/btrfs/compression.c |  4 +---
>   fs/btrfs/extent_io.c   | 23 ++++++++++++-----------
>   fs/btrfs/extent_io.h   |  6 ++++--
>   fs/btrfs/inode.c       |  3 ++-
>   fs/btrfs/volumes.h     |  2 ++
>   5 files changed, 21 insertions(+), 17 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 71e5b2e9a1ba8..419a09d924290 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -464,10 +464,8 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
>   	struct bio *bio;
>   	int ret;
>
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS);
> -
> +	bio = btrfs_bio_alloc(cb->inode, BIO_MAX_VECS, opf);
>   	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
> -	bio->bi_opf = opf;
>   	bio->bi_private = cb;
>   	bio->bi_end_io = endio_func;
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 58ef0f4fca361..116a65787e314 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2657,10 +2657,9 @@ int btrfs_repair_one_sector(struct inode *inode,
>   		return -EIO;
>   	}
>
> -	repair_bio = btrfs_bio_alloc(1);
> +	repair_bio = btrfs_bio_alloc(inode, 1, REQ_OP_READ);
>   	repair_bbio = btrfs_bio(repair_bio);
>   	repair_bbio->file_offset = start;
> -	repair_bio->bi_opf = REQ_OP_READ;
>   	repair_bio->bi_end_io = failed_bio->bi_end_io;
>   	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
>   	repair_bio->bi_private = failed_bio->bi_private;
> @@ -3128,9 +3127,10 @@ static void end_bio_extent_readpage(struct bio *bio)
>    * new bio by bio_alloc_bioset as it does not initialize the bytes outside of
>    * 'bio' because use of __GFP_ZERO is not supported.
>    */
> -static inline void btrfs_bio_init(struct btrfs_bio *bbio)
> +static inline void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode)
>   {
>   	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
> +	bbio->inode = inode;
>   }
>
>   /*
> @@ -3138,13 +3138,14 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio)
>    *
>    * The bio allocation is backed by bioset and does not fail.
>    */
> -struct bio *btrfs_bio_alloc(unsigned int nr_iovecs)
> +struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
> +		unsigned int opf)
>   {
>   	struct bio *bio;
>
>   	ASSERT(0 < nr_iovecs && nr_iovecs <= BIO_MAX_VECS);
> -	bio = bio_alloc_bioset(NULL, nr_iovecs, 0, GFP_NOFS, &btrfs_bioset);
> -	btrfs_bio_init(btrfs_bio(bio));
> +	bio = bio_alloc_bioset(NULL, nr_iovecs, opf, GFP_NOFS, &btrfs_bioset);
> +	btrfs_bio_init(btrfs_bio(bio), inode);
>   	return bio;
>   }
>
> @@ -3156,12 +3157,13 @@ struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio)
>   	/* Bio allocation backed by a bioset does not fail */
>   	new = bio_alloc_clone(bdev, bio, GFP_NOFS, &btrfs_bioset);
>   	bbio = btrfs_bio(new);
> -	btrfs_bio_init(bbio);
> +	btrfs_bio_init(btrfs_bio(new), btrfs_bio(bio)->inode);
>   	bbio->iter = bio->bi_iter;
>   	return new;
>   }
>
> -struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
> +struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
> +		u64 offset, u64 size)
>   {
>   	struct bio *bio;
>   	struct btrfs_bio *bbio;
> @@ -3173,7 +3175,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
>   	ASSERT(bio);
>
>   	bbio = btrfs_bio(bio);
> -	btrfs_bio_init(bbio);
> +	btrfs_bio_init(btrfs_bio(bio), inode);
>
>   	bio_trim(bio, offset >> 9, size >> 9);
>   	bbio->iter = bio->bi_iter;
> @@ -3308,7 +3310,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
>   	struct bio *bio;
>   	int ret;
>
> -	bio = btrfs_bio_alloc(BIO_MAX_VECS);
> +	bio = btrfs_bio_alloc(&inode->vfs_inode, BIO_MAX_VECS, opf);
>   	/*
>   	 * For compressed page range, its disk_bytenr is always @disk_bytenr
>   	 * passed in, no matter if we have added any range into previous bio.
> @@ -3321,7 +3323,6 @@ static int alloc_new_bio(struct btrfs_inode *inode,
>   	bio_ctrl->bio_flags = bio_flags;
>   	bio->bi_end_io = end_io_func;
>   	bio->bi_private = &inode->io_tree;
> -	bio->bi_opf = opf;
>   	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
>   	if (ret < 0)
>   		goto error;
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 72d86f228c56e..d5f3d9692ea29 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -277,9 +277,11 @@ void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
>   void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
>   				  struct page *locked_page,
>   				  u32 bits_to_clear, unsigned long page_ops);
> -struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
> +struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
> +		unsigned int opf);
>   struct bio *btrfs_bio_clone(struct block_device *bdev, struct bio *bio);
> -struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
> +struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
> +		u64 offset, u64 size);
>
>   void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
>   int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 5c9d8e8a98466..18d54cfedf829 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7987,7 +7987,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   		 * This will never fail as it's passing GPF_NOFS and
>   		 * the allocation is backed by btrfs_bioset.
>   		 */
> -		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
> +		bio = btrfs_bio_clone_partial(inode, dio_bio, clone_offset,
> +					      clone_len);
>   		bio->bi_private = dip;
>   		bio->bi_end_io = btrfs_end_dio_bio;
>   		btrfs_bio(bio)->file_offset = file_offset;
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index c22148bebc2f5..a4f942547002e 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -321,6 +321,8 @@ struct btrfs_fs_devices {
>    * Mostly for btrfs specific features like csum and mirror_num.
>    */
>   struct btrfs_bio {
> +	struct inode *inode;
> +
>   	unsigned int mirror_num;
>
>   	/* for direct I/O */

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 24/40] btrfs: remove btrfs_end_io_wq
  2022-03-22 15:55 ` [PATCH 24/40] btrfs: remove btrfs_end_io_wq Christoph Hellwig
@ 2022-03-23  0:57   ` Qu Wenruo
  2022-03-23  6:11     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  0:57 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Avoid the extra allocation for all read bios by embedding a btrfs_work
> and I/O end type into the btrfs_bio structure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Do we really need to bump the size of btrfs_bio even further?

Especially since a btrfs_bio is allocated for every 64K stripe...

Thanks,
Qu
> ---
>   fs/btrfs/compression.c |  24 +++------
>   fs/btrfs/ctree.h       |   1 -
>   fs/btrfs/disk-io.c     | 112 +----------------------------------------
>   fs/btrfs/disk-io.h     |  10 ----
>   fs/btrfs/inode.c       |  19 +++----
>   fs/btrfs/super.c       |  11 +---
>   fs/btrfs/volumes.c     |  44 ++++++++++++++--
>   fs/btrfs/volumes.h     |  11 ++++
>   8 files changed, 66 insertions(+), 166 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 419a09d924290..ae6f986058c75 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -423,20 +423,6 @@ static void end_compressed_bio_write(struct bio *bio)
>   	bio_put(bio);
>   }
>
> -static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
> -					  struct compressed_bio *cb,
> -					  struct bio *bio, int mirror_num)
> -{
> -	blk_status_t ret;
> -
> -	ASSERT(bio->bi_iter.bi_size);
> -	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
> -	if (ret)
> -		return ret;
> -	ret = btrfs_map_bio(fs_info, bio, mirror_num);
> -	return ret;
> -}
> -
>   /*
>    * Allocate a compressed_bio, which will be used to read/write on-disk
>    * (aka, compressed) * data.
> @@ -468,6 +454,10 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
>   	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
>   	bio->bi_private = cb;
>   	bio->bi_end_io = endio_func;
> +	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
> +		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_WRITE;
> +	else
> +		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
>
>   	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
>   	if (IS_ERR(em)) {
> @@ -594,7 +584,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
>   					goto finish_cb;
>   			}
>
> -			ret = submit_compressed_bio(fs_info, cb, bio, 0);
> +			ASSERT(bio->bi_iter.bi_size);
> +			ret = btrfs_map_bio(fs_info, bio, 0);
>   			if (ret)
>   				goto finish_cb;
>   			bio = NULL;
> @@ -930,7 +921,8 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
>   						  fs_info->sectorsize);
>   			sums += fs_info->csum_size * nr_sectors;
>
> -			ret = submit_compressed_bio(fs_info, cb, comp_bio, mirror_num);
> +			ASSERT(comp_bio->bi_iter.bi_size);
> +			ret = btrfs_map_bio(fs_info, comp_bio, mirror_num);
>   			if (ret)
>   				goto finish_cb;
>   			comp_bio = NULL;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index ebb2d109e8bb2..c22a24ca81652 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -823,7 +823,6 @@ struct btrfs_fs_info {
>   	struct btrfs_workqueue *endio_meta_workers;
>   	struct btrfs_workqueue *endio_raid56_workers;
>   	struct btrfs_workqueue *rmw_workers;
> -	struct btrfs_workqueue *endio_meta_write_workers;
>   	struct btrfs_workqueue *endio_write_workers;
>   	struct btrfs_workqueue *endio_freespace_worker;
>   	struct btrfs_workqueue *caching_workers;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index f43c9ab86e617..bb910b78bbc82 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -51,7 +51,6 @@
>   				 BTRFS_SUPER_FLAG_METADUMP |\
>   				 BTRFS_SUPER_FLAG_METADUMP_V2)
>
> -static void end_workqueue_fn(struct btrfs_work *work);
>   static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
>   static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
>   				      struct btrfs_fs_info *fs_info);
> @@ -64,40 +63,6 @@ static int btrfs_destroy_pinned_extent(struct btrfs_fs_info *fs_info,
>   static int btrfs_cleanup_transaction(struct btrfs_fs_info *fs_info);
>   static void btrfs_error_commit_super(struct btrfs_fs_info *fs_info);
>
> -/*
> - * btrfs_end_io_wq structs are used to do processing in task context when an IO
> - * is complete.  This is used during reads to verify checksums, and it is used
> - * by writes to insert metadata for new file extents after IO is complete.
> - */
> -struct btrfs_end_io_wq {
> -	struct bio *bio;
> -	bio_end_io_t *end_io;
> -	void *private;
> -	struct btrfs_fs_info *info;
> -	blk_status_t status;
> -	enum btrfs_wq_endio_type metadata;
> -	struct btrfs_work work;
> -};
> -
> -static struct kmem_cache *btrfs_end_io_wq_cache;
> -
> -int __init btrfs_end_io_wq_init(void)
> -{
> -	btrfs_end_io_wq_cache = kmem_cache_create("btrfs_end_io_wq",
> -					sizeof(struct btrfs_end_io_wq),
> -					0,
> -					SLAB_MEM_SPREAD,
> -					NULL);
> -	if (!btrfs_end_io_wq_cache)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
> -void __cold btrfs_end_io_wq_exit(void)
> -{
> -	kmem_cache_destroy(btrfs_end_io_wq_cache);
> -}
> -
>   static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info)
>   {
>   	if (fs_info->csum_shash)
> @@ -726,54 +691,6 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
>   	return ret;
>   }
>
> -static void end_workqueue_bio(struct bio *bio)
> -{
> -	struct btrfs_end_io_wq *end_io_wq = bio->bi_private;
> -	struct btrfs_fs_info *fs_info;
> -	struct btrfs_workqueue *wq;
> -
> -	fs_info = end_io_wq->info;
> -	end_io_wq->status = bio->bi_status;
> -
> -	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
> -		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
> -			wq = fs_info->endio_meta_write_workers;
> -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
> -			wq = fs_info->endio_freespace_worker;
> -		else
> -			wq = fs_info->endio_write_workers;
> -	} else {
> -		if (end_io_wq->metadata)
> -			wq = fs_info->endio_meta_workers;
> -		else
> -			wq = fs_info->endio_workers;
> -	}
> -
> -	btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
> -	btrfs_queue_work(wq, &end_io_wq->work);
> -}
> -
> -blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
> -			enum btrfs_wq_endio_type metadata)
> -{
> -	struct btrfs_end_io_wq *end_io_wq;
> -
> -	end_io_wq = kmem_cache_alloc(btrfs_end_io_wq_cache, GFP_NOFS);
> -	if (!end_io_wq)
> -		return BLK_STS_RESOURCE;
> -
> -	end_io_wq->private = bio->bi_private;
> -	end_io_wq->end_io = bio->bi_end_io;
> -	end_io_wq->info = info;
> -	end_io_wq->status = 0;
> -	end_io_wq->bio = bio;
> -	end_io_wq->metadata = metadata;
> -
> -	bio->bi_private = end_io_wq;
> -	bio->bi_end_io = end_workqueue_bio;
> -	return 0;
> -}
> -
>   static void run_one_async_start(struct btrfs_work *work)
>   {
>   	struct async_submit_bio *async;
> @@ -921,10 +838,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
>   			return ret;
>   	} else {
>   		/* checksum validation should happen in async threads: */
> -		ret = btrfs_bio_wq_end_io(fs_info, bio,
> -					  BTRFS_WQ_ENDIO_METADATA);
> -		if (ret)
> -			return ret;
> +		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_METADATA_READ;
>   	}
>
>   	return btrfs_map_bio(fs_info, bio, mirror_num);
> @@ -1888,25 +1802,6 @@ struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
>   	return root;
>   }
>
> -/*
> - * called by the kthread helper functions to finally call the bio end_io
> - * functions.  This is where read checksum verification actually happens
> - */
> -static void end_workqueue_fn(struct btrfs_work *work)
> -{
> -	struct bio *bio;
> -	struct btrfs_end_io_wq *end_io_wq;
> -
> -	end_io_wq = container_of(work, struct btrfs_end_io_wq, work);
> -	bio = end_io_wq->bio;
> -
> -	bio->bi_status = end_io_wq->status;
> -	bio->bi_private = end_io_wq->private;
> -	bio->bi_end_io = end_io_wq->end_io;
> -	bio_endio(bio);
> -	kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
> -}
> -
>   static int cleaner_kthread(void *arg)
>   {
>   	struct btrfs_root *root = arg;
> @@ -2219,7 +2114,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
>   	 * queues can do metadata I/O operations.
>   	 */
>   	btrfs_destroy_workqueue(fs_info->endio_meta_workers);
> -	btrfs_destroy_workqueue(fs_info->endio_meta_write_workers);
>   }
>
>   static void free_root_extent_buffers(struct btrfs_root *root)
> @@ -2404,9 +2298,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
>   	fs_info->endio_meta_workers =
>   		btrfs_alloc_workqueue(fs_info, "endio-meta", flags,
>   				      max_active, 4);
> -	fs_info->endio_meta_write_workers =
> -		btrfs_alloc_workqueue(fs_info, "endio-meta-write", flags,
> -				      max_active, 2);
>   	fs_info->endio_raid56_workers =
>   		btrfs_alloc_workqueue(fs_info, "endio-raid56", flags,
>   				      max_active, 4);
> @@ -2429,7 +2320,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
>   	if (!(fs_info->workers && fs_info->delalloc_workers &&
>   	      fs_info->flush_workers &&
>   	      fs_info->endio_workers && fs_info->endio_meta_workers &&
> -	      fs_info->endio_meta_write_workers &&
>   	      fs_info->endio_write_workers && fs_info->endio_raid56_workers &&
>   	      fs_info->endio_freespace_worker && fs_info->rmw_workers &&
>   	      fs_info->caching_workers && fs_info->fixup_workers &&
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index afe3bb96616c9..e8900c1b71664 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -17,12 +17,6 @@
>    */
>   #define BTRFS_BDEV_BLOCKSIZE	(4096)
>
> -enum btrfs_wq_endio_type {
> -	BTRFS_WQ_ENDIO_DATA,
> -	BTRFS_WQ_ENDIO_METADATA,
> -	BTRFS_WQ_ENDIO_FREE_SPACE,
> -};
> -
>   static inline u64 btrfs_sb_offset(int mirror)
>   {
>   	u64 start = SZ_16K;
> @@ -119,8 +113,6 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
>   			  int atomic);
>   int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid, int level,
>   		      struct btrfs_key *first_key);
> -blk_status_t btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
> -			enum btrfs_wq_endio_type metadata);
>   blk_status_t btrfs_wq_submit_bio(struct inode *inode, struct bio *bio,
>   				 int mirror_num, unsigned long bio_flags,
>   				 u64 dio_file_offset,
> @@ -144,8 +136,6 @@ int btree_lock_page_hook(struct page *page, void *data,
>   int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
>   int btrfs_get_free_objectid(struct btrfs_root *root, u64 *objectid);
>   int btrfs_init_root_free_objectid(struct btrfs_root *root);
> -int __init btrfs_end_io_wq_init(void);
> -void __cold btrfs_end_io_wq_exit(void);
>
>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>   void btrfs_set_buffer_lockdep_class(u64 objectid,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 18d54cfedf829..5a5474fac0b28 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2512,6 +2512,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	struct btrfs_inode *bi = BTRFS_I(inode);
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
>   	blk_status_t ret;
>
>   	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> @@ -2537,14 +2538,10 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>   		if (ret)
>   			return ret;
>   	} else {
> -		enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
> -
>   		if (btrfs_is_free_space_inode(bi))
> -			metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
> -
> -		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
> -		if (ret)
> -			return ret;
> +			bbio->end_io_type = BTRFS_ENDIO_WQ_FREE_SPACE_READ;
> +		else
> +			bbio->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
>
>   		if (bio_flags & EXTENT_BIO_COMPRESSED)
>   			return btrfs_submit_compressed_read(inode, bio,
> @@ -7739,9 +7736,7 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
>
>   	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
>
> -	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
> -	if (ret)
> -		return ret;
> +	btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_WRITE;
>
>   	refcount_inc(&dip->refs);
>   	ret = btrfs_map_bio(fs_info, bio, mirror_num);
> @@ -7865,9 +7860,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
>   				return ret;
>   		}
>   	} else {
> -		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
> -		if (ret)
> -			return ret;
> +		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
>
>   		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
>   			u64 csum_offset;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 4d947ba32da9d..33dedca4f0862 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -1835,8 +1835,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
>   	btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
>   	btrfs_workqueue_set_max(fs_info->endio_workers, new_pool_size);
>   	btrfs_workqueue_set_max(fs_info->endio_meta_workers, new_pool_size);
> -	btrfs_workqueue_set_max(fs_info->endio_meta_write_workers,
> -				new_pool_size);
>   	btrfs_workqueue_set_max(fs_info->endio_write_workers, new_pool_size);
>   	btrfs_workqueue_set_max(fs_info->endio_freespace_worker, new_pool_size);
>   	btrfs_workqueue_set_max(fs_info->delayed_workers, new_pool_size);
> @@ -2593,13 +2591,9 @@ static int __init init_btrfs_fs(void)
>   	if (err)
>   		goto free_delayed_ref;
>
> -	err = btrfs_end_io_wq_init();
> -	if (err)
> -		goto free_prelim_ref;
> -
>   	err = btrfs_interface_init();
>   	if (err)
> -		goto free_end_io_wq;
> +		goto free_prelim_ref;
>
>   	btrfs_print_mod_info();
>
> @@ -2615,8 +2609,6 @@ static int __init init_btrfs_fs(void)
>
>   unregister_ioctl:
>   	btrfs_interface_exit();
> -free_end_io_wq:
> -	btrfs_end_io_wq_exit();
>   free_prelim_ref:
>   	btrfs_prelim_ref_exit();
>   free_delayed_ref:
> @@ -2654,7 +2646,6 @@ static void __exit exit_btrfs_fs(void)
>   	extent_state_cache_exit();
>   	extent_io_exit();
>   	btrfs_interface_exit();
> -	btrfs_end_io_wq_exit();
>   	unregister_filesystem(&btrfs_fs_type);
>   	btrfs_exit_sysfs();
>   	btrfs_cleanup_fs_uuids();
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9d1f8c27eff33..9a1eb1166d72f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6659,11 +6659,38 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
>   }
>
> -static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
> +static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_io_context *bioc)
>   {
> +	struct btrfs_fs_info *fs_info = bioc->fs_info;
> +
> +	switch (btrfs_bio(bioc->orig_bio)->end_io_type) {
> +	case BTRFS_ENDIO_WQ_DATA_READ:
> +		return fs_info->endio_workers;
> +	case BTRFS_ENDIO_WQ_DATA_WRITE:
> +		return fs_info->endio_write_workers;
> +	case BTRFS_ENDIO_WQ_METADATA_READ:
> +		return fs_info->endio_meta_workers;
> +	case BTRFS_ENDIO_WQ_FREE_SPACE_READ:
> +		return fs_info->endio_freespace_worker;
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +static void btrfs_end_bio_work(struct btrfs_work *work)
> +{
> +	struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, work);
> +
> +	bio_endio(&bbio->bio);
> +}
> +
> +static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
> +{
> +	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bioc) : NULL;
>   	struct bio *bio = bioc->orig_bio;
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
>
> -	btrfs_bio(bio)->mirror_num = bioc->mirror_num;
> +	bbio->mirror_num = bioc->mirror_num;
>   	bio->bi_private = bioc->private;
>   	bio->bi_end_io = bioc->end_io;
>
> @@ -6675,7 +6702,14 @@ static inline void btrfs_end_bioc(struct btrfs_io_context *bioc)
>   		bio->bi_status = BLK_STS_IOERR;
>   	else
>   		bio->bi_status = BLK_STS_OK;
> -	bio_endio(bio);
> +
> +	if (wq) {
> +		btrfs_init_work(&bbio->work, btrfs_end_bio_work, NULL, NULL);
> +		btrfs_queue_work(wq, &bbio->work);
> +	} else {
> +		bio_endio(bio);
> +	}
> +
>   	btrfs_put_bioc(bioc);
>   }
>
> @@ -6707,7 +6741,7 @@ static void btrfs_end_bio(struct bio *bio)
>
>   	btrfs_bio_counter_dec(bioc->fs_info);
>   	if (atomic_dec_and_test(&bioc->stripes_pending))
> -		btrfs_end_bioc(bioc);
> +		btrfs_end_bioc(bioc, true);
>   }
>
>   static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
> @@ -6805,7 +6839,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
>   			atomic_inc(&bioc->error);
>   			if (atomic_dec_and_test(&bioc->stripes_pending))
> -				btrfs_end_bioc(bioc);
> +				btrfs_end_bioc(bioc, false);
>   			continue;
>   		}
>
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index a4f942547002e..51a27180004eb 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -315,6 +315,14 @@ struct btrfs_fs_devices {
>   				- 2 * sizeof(struct btrfs_chunk))	\
>   				/ sizeof(struct btrfs_stripe) + 1)
>
> +enum btrfs_endio_type {
> +	BTRFS_ENDIO_NONE = 0,
> +	BTRFS_ENDIO_WQ_DATA_READ,
> +	BTRFS_ENDIO_WQ_DATA_WRITE,
> +	BTRFS_ENDIO_WQ_METADATA_READ,
> +	BTRFS_ENDIO_WQ_FREE_SPACE_READ,
> +};
> +
>   /*
>    * Additional info to pass along bio.
>    *
> @@ -324,6 +332,9 @@ struct btrfs_bio {
>   	struct inode *inode;
>
>   	unsigned int mirror_num;
> +
> +	enum btrfs_endio_type end_io_type;
> +	struct btrfs_work work;
>
>   	/* for direct I/O */
>   	u64 file_offset;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 26/40] btrfs: refactor btrfs_map_bio
  2022-03-22 15:55 ` [PATCH 26/40] btrfs: refactor btrfs_map_bio Christoph Hellwig
@ 2022-03-23  1:03   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:03 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Use a label for common cleanup, untangle the conditionals for parity
> RAID and move all per-stripe handling into submit_stripe_bio.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/volumes.c | 88 ++++++++++++++++++++++------------------------
>   1 file changed, 42 insertions(+), 46 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9a1eb1166d72f..1cf0914b33847 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6744,10 +6744,30 @@ static void btrfs_end_bio(struct bio *bio)
>   		btrfs_end_bioc(bioc, true);
>   }
>
> -static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
> -			      u64 physical, struct btrfs_device *dev)
> +static void submit_stripe_bio(struct btrfs_io_context *bioc,
> +		struct bio *orig_bio, int dev_nr, bool clone)
>   {
>   	struct btrfs_fs_info *fs_info = bioc->fs_info;
> +	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
> +	u64 physical = bioc->stripes[dev_nr].physical;
> +	struct bio *bio;
> +
> +	if (!dev || !dev->bdev ||
> +	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
> +	    (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
> +	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
> +		atomic_inc(&bioc->error);
> +		if (atomic_dec_and_test(&bioc->stripes_pending))
> +			btrfs_end_bioc(bioc, false);
> +		return;
> +	}
> +
> +	if (clone) {
> +		bio = btrfs_bio_clone(dev->bdev, orig_bio);
> +	} else {
> +		bio = orig_bio;
> +		bio_set_dev(bio, dev->bdev);
> +	}
>
>   	bio->bi_private = bioc;
>   	btrfs_bio(bio)->device = dev;
> @@ -6782,46 +6802,40 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
>   blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   			   int mirror_num)
>   {
> -	struct btrfs_device *dev;
> -	struct bio *first_bio = bio;
>   	u64 logical = bio->bi_iter.bi_sector << 9;
> -	u64 length = 0;
> -	u64 map_length;
> +	u64 length = bio->bi_iter.bi_size;
> +	u64 map_length = length;
>   	int ret;
>   	int dev_nr;
>   	int total_devs;
>   	struct btrfs_io_context *bioc = NULL;
>
> -	length = bio->bi_iter.bi_size;
> -	map_length = length;
> -
>   	btrfs_bio_counter_inc_blocked(fs_info);
>   	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
>   				&map_length, &bioc, mirror_num, 1);
> -	if (ret) {
> -		btrfs_bio_counter_dec(fs_info);
> -		return errno_to_blk_status(ret);
> -	}
> +	if (ret)
> +		goto out_dec;
>
>   	total_devs = bioc->num_stripes;
> -	bioc->orig_bio = first_bio;
> -	bioc->private = first_bio->bi_private;
> -	bioc->end_io = first_bio->bi_end_io;
> +	bioc->orig_bio = bio;
> +	bioc->private = bio->bi_private;
> +	bioc->end_io = bio->bi_end_io;
>   	atomic_set(&bioc->stripes_pending, bioc->num_stripes);
>
> -	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
> -	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
> -		/* In this case, map_length has been set to the length of
> -		   a single stripe; not the whole write */
> +	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
> +		/*
> +		 * In this case, map_length has been set to the length of a
> +		 * single stripe; not the whole write.
> +		 */
>   		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
>   			ret = raid56_parity_write(bio, bioc, map_length);
> -		} else {
> +			goto out_dec;
> +		}
> +		if (mirror_num > 1) {
>   			ret = raid56_parity_recover(bio, bioc, map_length,
>   						    mirror_num, 1);
> +			goto out_dec;
>   		}
> -
> -		btrfs_bio_counter_dec(fs_info);
> -		return errno_to_blk_status(ret);

Can we add an extra comment on the fall-through cases?

The read path still needs to go through the regular routine.
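
Something like this (just a sketch against the refactored hunk above,
not compiled) would make the fall through explicit:

	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
			ret = raid56_parity_write(bio, bioc, map_length);
			goto out_dec;
		}
		if (mirror_num > 1) {
			ret = raid56_parity_recover(bio, bioc, map_length,
						    mirror_num, 1);
			goto out_dec;
		}
		/*
		 * RAID56 reads of the original copy (mirror_num <= 1) only
		 * touch a single data stripe, so fall through to the
		 * regular per-stripe submission below.
		 */
	}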

Otherwise looks good to me.

Thanks,
Qu

>   	}
>
>   	if (map_length < length) {
> @@ -6831,29 +6845,11 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   		BUG();
>   	}
>
> -	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
> -		dev = bioc->stripes[dev_nr].dev;
> -		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
> -						   &dev->dev_state) ||
> -		    (btrfs_op(first_bio) == BTRFS_MAP_WRITE &&
> -		    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
> -			atomic_inc(&bioc->error);
> -			if (atomic_dec_and_test(&bioc->stripes_pending))
> -				btrfs_end_bioc(bioc, false);
> -			continue;
> -		}
> -
> -		if (dev_nr < total_devs - 1) {
> -			bio = btrfs_bio_clone(dev->bdev, first_bio);
> -		} else {
> -			bio = first_bio;
> -			bio_set_dev(bio, dev->bdev);
> -		}
> -
> -		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
> -	}
> +	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
> +		submit_stripe_bio(bioc, bio, dev_nr, dev_nr < total_devs - 1);
> +out_dec:
>   	btrfs_bio_counter_dec(fs_info);
> -	return BLK_STS_OK;
> +	return errno_to_blk_status(ret);
>   }
>
>   static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block
  2022-03-22 15:55 ` [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block Christoph Hellwig
@ 2022-03-23  1:08   ` Qu Wenruo
  2022-03-23  6:13     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:08 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> Clear need_raid_map early instead of repeating the same conditional over
> and over.

I had a more comprehensive cleanup, but only for scrub:
https://lore.kernel.org/linux-btrfs/cover.1646984153.git.wqu@suse.com/

All profiles are split into 3 categories:

- Simple mirror
   Single, DUP, RAID1*.

- Simple stripe
   RAID0 and RAID10.
   Inside each data stripe, it's then just a simple mirror.

- RAID56
   And for the mirror_num == 0/1 cases, inside one data stripe it's
   still a simple mirror.

Maybe we can follow the same ideas here?
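
Roughly (a sketch only; the enum and helper below are made up for
illustration, not taken from that series):

	enum btrfs_chunk_category {
		BTRFS_CHUNK_SIMPLE_MIRROR,	/* SINGLE, DUP, RAID1* */
		BTRFS_CHUNK_SIMPLE_STRIPE,	/* RAID0, RAID10 */
		BTRFS_CHUNK_RAID56,
	};

	static enum btrfs_chunk_category btrfs_chunk_category(u64 map_type)
	{
		if (map_type & BTRFS_BLOCK_GROUP_RAID56_MASK)
			return BTRFS_CHUNK_RAID56;
		if (map_type & (BTRFS_BLOCK_GROUP_RAID0 |
				BTRFS_BLOCK_GROUP_RAID10))
			return BTRFS_CHUNK_SIMPLE_STRIPE;
		return BTRFS_CHUNK_SIMPLE_MIRROR;
	}

__btrfs_map_block() could then switch on the category once instead of
open-coding the profile checks in every branch.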

Thanks,
Qu


>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/volumes.c | 60 ++++++++++++++++++++++------------------------
>   1 file changed, 29 insertions(+), 31 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 1cf0914b33847..cc9e2565e4b64 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6435,6 +6435,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>
>   	map = em->map_lookup;
>
> +	if (!(map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) ||
> +	    (!need_full_stripe(op) && mirror_num <= 1))
> +		need_raid_map = 0;
> +
>   	*length = geom.len;
>   	stripe_len = geom.stripe_len;
>   	stripe_nr = geom.stripe_nr;
> @@ -6509,37 +6513,32 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   					      dev_replace_is_ongoing);
>   			mirror_num = stripe_index - old_stripe_index + 1;
>   		}
> +	} else if (need_raid_map) {
> +		/* push stripe_nr back to the start of the full stripe */
> +		stripe_nr = div64_u64(raid56_full_stripe_start,
> +				      stripe_len * data_stripes);
>
> -	} else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
> -		if (need_raid_map && (need_full_stripe(op) || mirror_num > 1)) {
> -			/* push stripe_nr back to the start of the full stripe */
> -			stripe_nr = div64_u64(raid56_full_stripe_start,
> -					stripe_len * data_stripes);
> -
> -			/* RAID[56] write or recovery. Return all stripes */
> -			num_stripes = map->num_stripes;
> -			max_errors = nr_parity_stripes(map);
> -
> -			*length = map->stripe_len;
> -			stripe_index = 0;
> -			stripe_offset = 0;
> -		} else {
> -			/*
> -			 * Mirror #0 or #1 means the original data block.
> -			 * Mirror #2 is RAID5 parity block.
> -			 * Mirror #3 is RAID6 Q block.
> -			 */
> -			stripe_nr = div_u64_rem(stripe_nr,
> -					data_stripes, &stripe_index);
> -			if (mirror_num > 1)
> -				stripe_index = data_stripes + mirror_num - 2;
> +		/* RAID[56] write or recovery. Return all stripes */
> +		num_stripes = map->num_stripes;
> +		max_errors = nr_parity_stripes(map);
>
> -			/* We distribute the parity blocks across stripes */
> -			div_u64_rem(stripe_nr + stripe_index, map->num_stripes,
> -					&stripe_index);
> -			if (!need_full_stripe(op) && mirror_num <= 1)
> -				mirror_num = 1;
> -		}
> +		*length = map->stripe_len;
> +		stripe_index = 0;
> +		stripe_offset = 0;
> +	} else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
> +		/*
> +		 * Mirror #0 or #1 means the original data block.
> +		 * Mirror #2 is RAID5 parity block.
> +		 * Mirror #3 is RAID6 Q block.
> +		 */
> +		stripe_nr = div_u64_rem(stripe_nr, data_stripes, &stripe_index);
> +		if (mirror_num > 1)
> +			stripe_index = data_stripes + mirror_num - 2;
> +		/* We distribute the parity blocks across stripes */
> +		div_u64_rem(stripe_nr + stripe_index, map->num_stripes,
> +			    &stripe_index);
> +		if (!need_full_stripe(op) && mirror_num <= 1)
> +			mirror_num = 1;
>   	} else {
>   		/*
>   		 * after this, stripe_nr is the number of stripes on this
> @@ -6581,8 +6580,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   	}
>
>   	/* Build raid_map */
> -	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
> -	    (need_full_stripe(op) || mirror_num > 1)) {
> +	if (need_raid_map) {
>   		u64 tmp;
>   		unsigned rot;
>

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio
  2022-03-22 15:55 ` [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio Christoph Hellwig
@ 2022-03-23  1:14   ` Qu Wenruo
  2022-03-23  6:13     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:14 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:55, Christoph Hellwig wrote:
> There is very little of the I/O context that is actually needed for
> issuing a bio.  Add the few needed fields to struct btrfs_bio instead.
>
> The stripes array is still allocated on demand when more than a single
> I/O is needed, but for single leg I/O (e.g. all reads) there is no
> additional memory allocation now.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Really not a fan of enlarging btrfs_bio again and again.

Especially for the btrfs_bio_stripe and btrfs_bio_stripe * part.

Considering how many bytes we already waste for the SINGLE profile, now
we also need an extra pointer which we will never really use.
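
For the single-stripe case the pointer is pure overhead since it always
just points at the inline stripe.  One way to avoid it (sketch only,
would need e.g. the stripe count to tell the two cases apart):

	union {
		struct btrfs_bio_stripe __stripe;	/* single stripe I/O */
		struct btrfs_bio_stripe *stripes;	/* multi-stripe I/O */
	};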

Thanks,
Qu


> ---
>   fs/btrfs/volumes.c | 147 ++++++++++++++++++++++++++++-----------------
>   fs/btrfs/volumes.h |  20 ++++--
>   2 files changed, 107 insertions(+), 60 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index cc9e2565e4b64..cec3f6b9f5c21 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -253,10 +253,9 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
>   static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev);
>   static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
>   static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> -			     enum btrfs_map_op op,
> -			     u64 logical, u64 *length,
> -			     struct btrfs_io_context **bioc_ret,
> -			     int mirror_num, int need_raid_map);
> +		enum btrfs_map_op op, u64 logical, u64 *length,
> +		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
> +		int mirror_num, int need_raid_map);
>
>   /*
>    * Device locking
> @@ -5926,7 +5925,6 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
>   		sizeof(u64) * (total_stripes),
>   		GFP_NOFS|__GFP_NOFAIL);
>
> -	atomic_set(&bioc->error, 0);
>   	refcount_set(&bioc->refs, 1);
>
>   	bioc->fs_info = fs_info;
> @@ -6128,7 +6126,7 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
>   	int ret = 0;
>
>   	ret = __btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
> -				logical, &length, &bioc, 0, 0);
> +				logical, &length, &bioc, NULL, 0, 0);
>   	if (ret) {
>   		ASSERT(bioc == NULL);
>   		return ret;
> @@ -6397,10 +6395,9 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
>   }
>
>   static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
> -			     enum btrfs_map_op op,
> -			     u64 logical, u64 *length,
> -			     struct btrfs_io_context **bioc_ret,
> -			     int mirror_num, int need_raid_map)
> +		enum btrfs_map_op op, u64 logical, u64 *length,
> +		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
> +		int mirror_num, int need_raid_map)
>   {
>   	struct extent_map *em;
>   	struct map_lookup *map;
> @@ -6566,6 +6563,48 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   		tgtdev_indexes = num_stripes;
>   	}
>
> +	if (need_full_stripe(op))
> +		max_errors = btrfs_chunk_max_errors(map);
> +
> +	if (bbio && !need_raid_map) {
> +		int replacement_idx = num_stripes;
> +
> +		if (num_alloc_stripes > 1) {
> +			bbio->stripes = kmalloc_array(num_alloc_stripes,
> +					sizeof(*bbio->stripes),
> +					GFP_NOFS | __GFP_NOFAIL);
> +		} else {
> +			bbio->stripes = &bbio->__stripe;
> +		}
> +
> +		atomic_set(&bbio->stripes_pending, num_stripes);
> +		for (i = 0; i < num_stripes; i++) {
> +			struct btrfs_bio_stripe *s = &bbio->stripes[i];
> +
> +			s->physical = map->stripes[stripe_index].physical +
> +				stripe_offset + stripe_nr * map->stripe_len;
> +			s->dev = map->stripes[stripe_index].dev;
> +			stripe_index++;
> +
> +			if (op == BTRFS_MAP_WRITE && dev_replace_is_ongoing &&
> +			    dev_replace->tgtdev &&
> +			    !is_block_group_to_copy(fs_info, logical) &&
> +			    s->dev->devid == dev_replace->srcdev->devid) {
> +				struct btrfs_bio_stripe *r =
> +					&bbio->stripes[replacement_idx++];
> +
> +				r->physical = s->physical;
> +				r->dev = dev_replace->tgtdev;
> +				max_errors++;
> +				atomic_inc(&bbio->stripes_pending);
> +			}
> +		}
> +
> +		bbio->max_errors = max_errors;
> +		bbio->mirror_num = mirror_num;
> +		goto out;
> +	}
> +
>   	bioc = alloc_btrfs_io_context(fs_info, num_alloc_stripes, tgtdev_indexes);
>   	if (!bioc) {
>   		ret = -ENOMEM;
> @@ -6601,9 +6640,6 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   		sort_parity_stripes(bioc, num_stripes);
>   	}
>
> -	if (need_full_stripe(op))
> -		max_errors = btrfs_chunk_max_errors(map);
> -
>   	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
>   	    need_full_stripe(op)) {
>   		handle_ops_on_dev_replace(op, &bioc, dev_replace, logical,
> @@ -6646,7 +6682,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   						     length, bioc_ret);
>
>   	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
> -				 mirror_num, 0);
> +				 NULL, mirror_num, 0);
>   }
>
>   /* For Scrub/replace */
> @@ -6654,14 +6690,15 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		     u64 logical, u64 *length,
>   		     struct btrfs_io_context **bioc_ret)
>   {
> -	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
> +	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, NULL,
> +				 0, 1);
>   }
>
> -static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_io_context *bioc)
> +static struct btrfs_workqueue *btrfs_end_io_wq(struct btrfs_bio *bbio)
>   {
> -	struct btrfs_fs_info *fs_info = bioc->fs_info;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb);
>
> -	switch (btrfs_bio(bioc->orig_bio)->end_io_type) {
> +	switch (bbio->end_io_type) {
>   	case BTRFS_ENDIO_WQ_DATA_READ:
>   		return fs_info->endio_workers;
>   	case BTRFS_ENDIO_WQ_DATA_WRITE:
> @@ -6682,21 +6719,22 @@ static void btrfs_end_bio_work(struct btrfs_work *work)
>   	bio_endio(&bbio->bio);
>   }
>
> -static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
> +static void btrfs_end_bbio(struct btrfs_bio *bbio, bool async)
>   {
> -	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bioc) : NULL;
> -	struct bio *bio = bioc->orig_bio;
> -	struct btrfs_bio *bbio = btrfs_bio(bio);
> +	struct btrfs_workqueue *wq = async ? btrfs_end_io_wq(bbio) : NULL;
> +	struct bio *bio = &bbio->bio;
>
> -	bbio->mirror_num = bioc->mirror_num;
> -	bio->bi_private = bioc->private;
> -	bio->bi_end_io = bioc->end_io;
> +	bio->bi_private = bbio->private;
> +	bio->bi_end_io = bbio->end_io;
> +
> +	if (bbio->stripes != &bbio->__stripe)
> +		kfree(bbio->stripes);
>
>   	/*
>   	 * Only send an error to the higher layers if it is beyond the tolerance
>   	 * threshold.
>   	 */
> -	if (atomic_read(&bioc->error) > bioc->max_errors)
> +	if (atomic_read(&bbio->error) > bbio->max_errors)
>   		bio->bi_status = BLK_STS_IOERR;
>   	else
>   		bio->bi_status = BLK_STS_OK;
> @@ -6707,16 +6745,14 @@ static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
>   	} else {
>   		bio_endio(bio);
>   	}
> -
> -	btrfs_put_bioc(bioc);
>   }
>
>   static void btrfs_end_bio(struct bio *bio)
>   {
> -	struct btrfs_io_context *bioc = bio->bi_private;
> +	struct btrfs_bio *bbio = bio->bi_private;
>
>   	if (bio->bi_status) {
> -		atomic_inc(&bioc->error);
> +		atomic_inc(&bbio->error);
>   		if (bio->bi_status == BLK_STS_IOERR ||
>   		    bio->bi_status == BLK_STS_TARGET) {
>   			struct btrfs_device *dev = btrfs_bio(bio)->device;
> @@ -6734,40 +6770,39 @@ static void btrfs_end_bio(struct bio *bio)
>   		}
>   	}
>
> -	if (bio != bioc->orig_bio)
> +	if (bio != &bbio->bio)
>   		bio_put(bio);
>
> -	btrfs_bio_counter_dec(bioc->fs_info);
> -	if (atomic_dec_and_test(&bioc->stripes_pending))
> -		btrfs_end_bioc(bioc, true);
> +	btrfs_bio_counter_dec(btrfs_sb(bbio->inode->i_sb));
> +	if (atomic_dec_and_test(&bbio->stripes_pending))
> +		btrfs_end_bbio(bbio, true);
>   }
>
> -static void submit_stripe_bio(struct btrfs_io_context *bioc,
> -		struct bio *orig_bio, int dev_nr, bool clone)
> +static void submit_stripe_bio(struct btrfs_bio *bbio, int dev_nr, bool clone)
>   {
> -	struct btrfs_fs_info *fs_info = bioc->fs_info;
> -	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
> -	u64 physical = bioc->stripes[dev_nr].physical;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb);
> +	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
> +	u64 physical = bbio->stripes[dev_nr].physical;
>   	struct bio *bio;
>
>   	if (!dev || !dev->bdev ||
>   	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
> -	    (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
> +	    (btrfs_op(&bbio->bio) == BTRFS_MAP_WRITE &&
>   	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
> -		atomic_inc(&bioc->error);
> -		if (atomic_dec_and_test(&bioc->stripes_pending))
> -			btrfs_end_bioc(bioc, false);
> +		atomic_inc(&bbio->error);
> +		if (atomic_dec_and_test(&bbio->stripes_pending))
> +			btrfs_end_bbio(bbio, false);
>   		return;
>   	}
>
>   	if (clone) {
> -		bio = btrfs_bio_clone(dev->bdev, orig_bio);
> +		bio = btrfs_bio_clone(dev->bdev, &bbio->bio);
>   	} else {
> -		bio = orig_bio;
> +		bio = &bbio->bio;
>   		bio_set_dev(bio, dev->bdev);
>   	}
>
> -	bio->bi_private = bioc;
> +	bio->bi_private = bbio;
>   	btrfs_bio(bio)->device = dev;
>   	bio->bi_end_io = btrfs_end_bio;
>   	bio->bi_iter.bi_sector = physical >> 9;
> @@ -6800,6 +6835,7 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
>   blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   			   int mirror_num)
>   {
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
>   	u64 logical = bio->bi_iter.bi_sector << 9;
>   	u64 length = bio->bi_iter.bi_size;
>   	u64 map_length = length;
> @@ -6809,18 +6845,17 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   	struct btrfs_io_context *bioc = NULL;
>
>   	btrfs_bio_counter_inc_blocked(fs_info);
> -	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
> -				&map_length, &bioc, mirror_num, 1);
> +	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
> +				&bioc, bbio, mirror_num, 1);
>   	if (ret)
>   		goto out_dec;
>
> -	total_devs = bioc->num_stripes;
> -	bioc->orig_bio = bio;
> -	bioc->private = bio->bi_private;
> -	bioc->end_io = bio->bi_end_io;
> -	atomic_set(&bioc->stripes_pending, bioc->num_stripes);
> +	bbio->private = bio->bi_private;
> +	bbio->end_io = bio->bi_end_io;
> +
> +	if (bioc) {
> +		ASSERT(bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK);
>
> -	if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
>   		/*
>   		 * In this case, map_length has been set to the length of a
>   		 * single stripe; not the whole write.
> @@ -6834,6 +6869,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   						    mirror_num, 1);
>   			goto out_dec;
>   		}
> +		ASSERT(0);
>   	}
>
>   	if (map_length < length) {
> @@ -6843,8 +6879,9 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
>   		BUG();
>   	}
>
> +	total_devs = atomic_read(&bbio->stripes_pending);
>   	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
> -		submit_stripe_bio(bioc, bio, dev_nr, dev_nr < total_devs - 1);
> +		submit_stripe_bio(bbio, dev_nr, dev_nr < total_devs - 1);
>   out_dec:
>   	btrfs_bio_counter_dec(fs_info);
>   	return errno_to_blk_status(ret);
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 51a27180004eb..cd71cd33a9df2 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -323,6 +323,11 @@ enum btrfs_endio_type {
>   	BTRFS_ENDIO_WQ_FREE_SPACE_READ,
>   };
>
> +struct btrfs_bio_stripe {
> +	struct btrfs_device *dev;
> +	u64 physical;
> +};
> +
>   /*
>    * Additional info to pass along bio.
>    *
> @@ -333,6 +338,16 @@ struct btrfs_bio {
>
>   	unsigned int mirror_num;
>
> +	atomic_t stripes_pending;
> +	atomic_t error;
> +	int max_errors;
> +
> +	struct btrfs_bio_stripe *stripes;
> +	struct btrfs_bio_stripe __stripe;
> +
> +	bio_end_io_t *end_io;
> +	void *private;
> +
>   	enum btrfs_endio_type end_io_type;
>   	struct btrfs_work work;
>
> @@ -389,13 +404,8 @@ struct btrfs_io_stripe {
>    */
>   struct btrfs_io_context {
>   	refcount_t refs;
> -	atomic_t stripes_pending;
>   	struct btrfs_fs_info *fs_info;
>   	u64 map_type; /* get from map_lookup->type */
> -	bio_end_io_t *end_io;
> -	struct bio *orig_bio;
> -	void *private;
> -	atomic_t error;
>   	int max_errors;
>   	int num_stripes;
>   	int mirror_num;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper
  2022-03-22 15:56 ` [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper Christoph Hellwig
@ 2022-03-23  1:23   ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:23 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:56, Christoph Hellwig wrote:
> ---

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>   fs/btrfs/compression.c | 26 ++++++++-----------------
>   fs/btrfs/extent_io.c   | 24 ++++++++---------------
>   fs/btrfs/inode.c       | 32 ++++++++++--------------------
>   fs/btrfs/volumes.c     | 44 +++++++++++++++++++++++++++++++++++++++---
>   fs/btrfs/volumes.h     | 20 ++-----------------
>   5 files changed, 69 insertions(+), 77 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index ae6f986058c75..fca025c327a7e 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -445,10 +445,9 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
>   					u64 *next_stripe_start)
>   {
>   	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
> -	struct btrfs_io_geometry geom;
> -	struct extent_map *em;
> +	struct block_device *bdev;
>   	struct bio *bio;
> -	int ret;
> +	u64 len;
>
>   	bio = btrfs_bio_alloc(cb->inode, BIO_MAX_VECS, opf);
>   	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
> @@ -459,23 +458,14 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
>   	else
>   		btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
>
> -	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
> -	if (IS_ERR(em)) {
> -		bio_put(bio);
> -		return ERR_CAST(em);
> -	}
> +	bdev = btrfs_get_stripe_info(fs_info, btrfs_op(bio), disk_bytenr,
> +			      fs_info->sectorsize, &len);
> +	if (IS_ERR(bdev))
> +		return ERR_CAST(bdev);
>
>   	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
> -		bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev);
> -
> -	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom);
> -	free_extent_map(em);
> -	if (ret < 0) {
> -		bio_put(bio);
> -		return ERR_PTR(ret);
> -	}
> -	*next_stripe_start = disk_bytenr + geom.len;
> -
> +		bio_set_dev(bio, bdev);
> +	*next_stripe_start = disk_bytenr + len;
>   	return bio;
>   }
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index bfd91ed27bd14..10fc5e4dd14a3 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3235,11 +3235,10 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
>   			       struct btrfs_inode *inode, u64 file_offset)
>   {
>   	struct btrfs_fs_info *fs_info = inode->root->fs_info;
> -	struct btrfs_io_geometry geom;
>   	struct btrfs_ordered_extent *ordered;
> -	struct extent_map *em;
>   	u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
> -	int ret;
> +	struct block_device *bdev;
> +	u64 len;
>
>   	/*
>   	 * Pages for compressed extent are never submitted to disk directly,
> @@ -3253,19 +3252,12 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
>   		bio_ctrl->len_to_stripe_boundary = U32_MAX;
>   		return 0;
>   	}
> -	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
> -	if (IS_ERR(em))
> -		return PTR_ERR(em);
> -	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
> -				    logical, &geom);
> -	free_extent_map(em);
> -	if (ret < 0) {
> -		return ret;
> -	}
> -	if (geom.len > U32_MAX)
> -		bio_ctrl->len_to_stripe_boundary = U32_MAX;
> -	else
> -		bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
> +
> +	bdev = btrfs_get_stripe_info(fs_info, btrfs_op(bio_ctrl->bio), logical,
> +			      fs_info->sectorsize, &len);
> +	if (IS_ERR(bdev))
> +		return PTR_ERR(bdev);
> +	bio_ctrl->len_to_stripe_boundary = min(len, (u64)U32_MAX);
>
>   	if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
>   		bio_ctrl->len_to_oe_boundary = U32_MAX;
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index d4faed31d36a4..3f7e1779ff19f 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7944,12 +7944,9 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   	u64 submit_len;
>   	u64 clone_offset = 0;
>   	u64 clone_len;
> -	u64 logical;
> -	int ret;
>   	blk_status_t status;
> -	struct btrfs_io_geometry geom;
>   	struct btrfs_dio_data *dio_data = iter->private;
> -	struct extent_map *em = NULL;
> +	u64 len;
>
>   	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
>   	if (!dip) {
> @@ -7978,21 +7975,16 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   	submit_len = dio_bio->bi_iter.bi_size;
>
>   	do {
> -		logical = start_sector << 9;
> -		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> -		if (IS_ERR(em)) {
> -			status = errno_to_blk_status(PTR_ERR(em));
> -			em = NULL;
> -			goto out_err_em;
> -		}
> -		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> -					    logical, &geom);
> -		if (ret) {
> -			status = errno_to_blk_status(ret);
> -			goto out_err_em;
> +		struct block_device *bdev;
> +
> +		bdev = btrfs_get_stripe_info(fs_info, btrfs_op(dio_bio),
> +				      start_sector << 9, submit_len, &len);
> +		if (IS_ERR(bdev)) {
> +			status = errno_to_blk_status(PTR_ERR(bdev));
> +			goto out_err;
>   		}
>
> -		clone_len = min(submit_len, geom.len);
> +		clone_len = min(submit_len, len);
>   		ASSERT(clone_len <= UINT_MAX);
>
>   		/*
> @@ -8044,20 +8036,16 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
>   			bio_put(bio);
>   			if (submit_len > 0)
>   				refcount_dec(&dip->refs);
> -			goto out_err_em;
> +			goto out_err;
>   		}
>
>   		dio_data->submitted += clone_len;
>   		clone_offset += clone_len;
>   		start_sector += clone_len >> 9;
>   		file_offset += clone_len;
> -
> -		free_extent_map(em);
>   	} while (submit_len > 0);
>   	return;
>
> -out_err_em:
> -	free_extent_map(em);
>   out_err:
>   	dip->dio_bio->bi_status = status;
>   	btrfs_dio_private_put(dip);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7392b9f2a3323..f70bb3569a7ae 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6301,6 +6301,21 @@ static bool need_full_stripe(enum btrfs_map_op op)
>   	return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
>   }
>
> +struct btrfs_io_geometry {
> +	/* remaining bytes before crossing a stripe */
> +	u64 len;
> +	/* offset of logical address in chunk */
> +	u64 offset;
> +	/* length of single IO stripe */
> +	u64 stripe_len;
> +	/* number of stripe where address falls */
> +	u64 stripe_nr;
> +	/* offset of address in stripe */
> +	u64 stripe_offset;
> +	/* offset of raid56 stripe into the chunk */
> +	u64 raid56_stripe_offset;
> +};
> +
>   /*
>    * Calculate the geometry of a particular (address, len) tuple. This
>    * information is used to calculate how big a particular bio can get before it
> @@ -6315,9 +6330,10 @@ static bool need_full_stripe(enum btrfs_map_op op)
>    * Returns < 0 in case a chunk for the given logical address cannot be found,
>    * usually shouldn't happen unless @logical is corrupted, 0 otherwise.
>    */
> -int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
> -			  enum btrfs_map_op op, u64 logical,
> -			  struct btrfs_io_geometry *io_geom)
> +static int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info,
> +		struct extent_map *em,
> +		enum btrfs_map_op op, u64 logical,
> +		struct btrfs_io_geometry *io_geom)
>   {
>   	struct map_lookup *map;
>   	u64 len;
> @@ -6394,6 +6410,28 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
>   	return 0;
>   }
>
> +struct block_device *btrfs_get_stripe_info(struct btrfs_fs_info *fs_info,
> +		enum btrfs_map_op op, u64 logical, u64 len, u64 *lenp)
> +{
> +	struct btrfs_io_geometry geom;
> +	struct block_device *bdev;
> +	struct extent_map *em;
> +	int ret;
> +
> +	em = btrfs_get_chunk_map(fs_info, logical, len);
> +	if (IS_ERR(em))
> +		return ERR_CAST(em);
> +
> +	bdev = em->map_lookup->stripes[0].dev->bdev;
> +
> +	ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom);
> +	free_extent_map(em);
> +	if (ret < 0)
> +		return ERR_PTR(ret);
> +	*lenp = geom.len;
> +	return bdev;
> +}
> +
>   static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>   		enum btrfs_map_op op, u64 logical, u64 *length,
>   		struct btrfs_io_context **bioc_ret, struct btrfs_bio *bbio,
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 5b0e7602434b0..c6425760f69da 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -17,21 +17,6 @@ extern struct mutex uuid_mutex;
>
>   #define BTRFS_STRIPE_LEN	SZ_64K
>
> -struct btrfs_io_geometry {
> -	/* remaining bytes before crossing a stripe */
> -	u64 len;
> -	/* offset of logical address in chunk */
> -	u64 offset;
> -	/* length of single IO stripe */
> -	u64 stripe_len;
> -	/* number of stripe where address falls */
> -	u64 stripe_nr;
> -	/* offset of address in stripe */
> -	u64 stripe_offset;
> -	/* offset of raid56 stripe into the chunk */
> -	u64 raid56_stripe_offset;
> -};
> -
>   /*
>    * Use sequence counter to get consistent device stat data on
>    * 32-bit processors.
> @@ -520,9 +505,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
>   		     u64 logical, u64 *length,
>   		     struct btrfs_io_context **bioc_ret);
> -int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map,
> -			  enum btrfs_map_op op, u64 logical,
> -			  struct btrfs_io_geometry *io_geom);
> +struct block_device *btrfs_get_stripe_info(struct btrfs_fs_info *fs_info,
> +		enum btrfs_map_op op, u64 logical, u64 length, u64 *lenp);
>   int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
>   int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
>   struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector
  2022-03-22 15:56 ` [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector Christoph Hellwig
@ 2022-03-23  1:28   ` Qu Wenruo
  2022-03-23  6:15     ` Christoph Hellwig
  2022-03-24  0:57   ` Sweet Tea Dorminy
  1 sibling, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:28 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:56, Christoph Hellwig wrote:
> Allow the caller to control what happens when the repair bio completes.
> This will be needed to streamline the direct I/O path.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 15 ++++++++-------
>   fs/btrfs/extent_io.h |  8 ++++----
>   fs/btrfs/inode.c     |  4 +++-
>   3 files changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 2fdb5d7dd51e1..5a1447db28228 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2627,10 +2627,10 @@ static bool btrfs_check_repairable(struct inode *inode,
>   }
>
>   blk_status_t btrfs_repair_one_sector(struct inode *inode,
> -			    struct bio *failed_bio, u32 bio_offset,
> -			    struct page *page, unsigned int pgoff,
> -			    u64 start, int failed_mirror,
> -			    submit_bio_hook_t *submit_bio_hook)
> +		struct bio *failed_bio, u32 bio_offset, struct page *page,
> +		unsigned int pgoff, u64 start, int failed_mirror,
> +		submit_bio_hook_t *submit_bio_hook,
> +		void *bi_private, void (*bi_end_io)(struct bio *bio))

Not a big fan of extra parameters for a function which already has
enough...

And I always have a question about repair (aka reading from an extra
copy).

Can't we just make the repair part synchronous, instead of putting
everything into yet another endio callback?  We could then wait for the
read and the re-check in the same context.

That would streamline the workload far more than this does.

And I don't think users would complain about btrfs being slow on reads
while it is correcting corrupted data.
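
Very roughly, something like this (pseudo-code only;
btrfs_read_one_sector_sync() is a helper name I'm making up here,
check_data_csum() is the existing one):

	/* pseudo-code sketch of a synchronous repair loop */
	for (mirror = 1; mirror <= num_copies; mirror++) {
		if (mirror == failed_mirror)
			continue;
		if (btrfs_read_one_sector_sync(inode, page, pgoff, start,
					       mirror))
			continue;
		if (!check_data_csum(inode, bbio, bio_offset, page, pgoff,
				     start))
			return 0;	/* good copy found, write it back */
	}
	return BLK_STS_IOERR;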

Thanks,
Qu
>   {
>   	struct io_failure_record *failrec;
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> @@ -2660,9 +2660,9 @@ blk_status_t btrfs_repair_one_sector(struct inode *inode,
>   	repair_bio = btrfs_bio_alloc(inode, 1, REQ_OP_READ);
>   	repair_bbio = btrfs_bio(repair_bio);
>   	repair_bbio->file_offset = start;
> -	repair_bio->bi_end_io = failed_bio->bi_end_io;
>   	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
> -	repair_bio->bi_private = failed_bio->bi_private;
> +	repair_bio->bi_private = bi_private;
> +	repair_bio->bi_end_io = bi_end_io;
>
>   	if (failed_bbio->csum) {
>   		const u32 csum_size = fs_info->csum_size;
> @@ -2758,7 +2758,8 @@ static blk_status_t submit_read_repair(struct inode *inode,
>   		ret = btrfs_repair_one_sector(inode, failed_bio,
>   				bio_offset + offset,
>   				page, pgoff + offset, start + offset,
> -				failed_mirror, btrfs_submit_data_bio);
> +				failed_mirror, btrfs_submit_data_bio,
> +				failed_bio->bi_private, failed_bio->bi_end_io);
>   		if (!ret) {
>   			/*
>   			 * We have submitted the read repair, the page release
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 0239b26d5170a..54e54269cfdba 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -304,10 +304,10 @@ struct io_failure_record {
>   };
>
>   blk_status_t btrfs_repair_one_sector(struct inode *inode,
> -			    struct bio *failed_bio, u32 bio_offset,
> -			    struct page *page, unsigned int pgoff,
> -			    u64 start, int failed_mirror,
> -			    submit_bio_hook_t *submit_bio_hook);
> +		struct bio *failed_bio, u32 bio_offset, struct page *page,
> +		unsigned int pgoff, u64 start, int failed_mirror,
> +		submit_bio_hook_t *submit_bio_hook,
> +		void *bi_private, void (*bi_end_io)(struct bio *bio));
>
>   #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>   bool find_lock_delalloc_range(struct inode *inode,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 93b3ef48cea2f..e25d9d860c679 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7799,7 +7799,9 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   				ret = btrfs_repair_one_sector(inode, &bbio->bio,
>   						bio_offset, bvec.bv_page, pgoff,
>   						start, bbio->mirror_num,
> -						submit_dio_repair_bio);
> +						submit_dio_repair_bio,
> +						bbio->bio.bi_private,
> +						bbio->bio.bi_end_io);
>   				if (ret)
>   					err = ret;
>   			}

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-22 15:56 ` [PATCH 40/40] btrfs: use the iomap direct I/O bio directly Christoph Hellwig
@ 2022-03-23  1:39   ` Qu Wenruo
  2022-03-23  6:17     ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  1:39 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel



On 2022/3/22 23:56, Christoph Hellwig wrote:
> Make the iomap code allocate btrfs dios by setting the bio_set field,
> and then feed these directly into btrfs_map_dio.
>
> For this to work iomap_begin needs to report a range that only contains
> a single chunk, and thus is changed to a two level iteration.
>
> This needs another extra field in struct btrfs_dio.  We could overlay
> it with other fields not used after I/O submission, or split out a
> new btrfs_dio_bio for the file_offset, iter and repair_refs, but
> compared to the overall saving of the series this is a minor detail.
>
> The per-iomap csum lookup is gone for now as well.  At least for
> small I/Os this just creates a lot of overhead, but for large I/O
> we could look into optimizing this in one form or another, but I'd
> love to see a reproducer where it actually matters first.  With the
> state as of this patch the direct I/O bio submission is so close
> to the buffered one that they could be unified with very little
> work, so diverging again would be a bit counterproductive.  OTOH
> if the optimization is indeed very useful we should do it in a way
> that also works for buffered reads.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I'm not familiar with iomap, so I can be totally wrong, but isn't the
idea of iomap to separate more code from the fs?

My (mostly poor) understanding of iomap is that it just crafts a bio
(with the help of fs callbacks).

btrfs or stacked drivers can then do whatever they want to split/clone
the bio; as long as the original bio gets fulfilled, how btrfs/stacked
drivers do the work is their own business.

I'm really not sure if it's a good idea to expose the btrfs-internal
bio_set just for iomap.

Personally I didn't see much of a problem with cloning an iomap bio; it
only costs the extra memory of a btrfs_bio, which was pretty small
previously.

Or did I miss something specific to iomap?

Thanks,
Qu
> ---
>   fs/btrfs/btrfs_inode.h |  25 ---
>   fs/btrfs/ctree.h       |   1 -
>   fs/btrfs/extent_io.c   |  22 +-
>   fs/btrfs/extent_io.h   |   4 +-
>   fs/btrfs/inode.c       | 451 ++++++++++++++++-------------------------
>   fs/btrfs/volumes.h     |   1 +
>   6 files changed, 184 insertions(+), 320 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index b3e46aabc3d86..a3199020f0001 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -346,31 +346,6 @@ static inline bool btrfs_inode_in_log(struct btrfs_inode *inode, u64 generation)
>   	return ret;
>   }
>
> -struct btrfs_dio_private {
> -	struct inode *inode;
> -
> -	/*
> -	 * Since DIO can use anonymous page, we cannot use page_offset() to
> -	 * grab the file offset, thus need a dedicated member for file offset.
> -	 */
> -	u64 file_offset;
> -	u64 disk_bytenr;
> -	/* Used for bio::bi_size */
> -	u32 bytes;
> -
> -	/*
> -	 * References to this structure. There is one reference per in-flight
> -	 * bio plus one while we're still setting up.
> -	 */
> -	refcount_t refs;
> -
> -	/* dio_bio came from fs/direct-io.c */
> -	struct bio *dio_bio;
> -
> -	/* Array of checksums */
> -	u8 csums[];
> -};
> -
>   /*
>    * btrfs_inode_item stores flags in a u64, btrfs_inode stores them in two
>    * separate u32s. These two functions convert between the two representations.
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 196f308e3e0d7..64ef20b84f694 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3136,7 +3136,6 @@ int btrfs_del_orphan_item(struct btrfs_trans_handle *trans,
>   int btrfs_find_orphan_item(struct btrfs_root *root, u64 offset);
>
>   /* file-item.c */
> -struct btrfs_dio_private;
>   int btrfs_del_csums(struct btrfs_trans_handle *trans,
>   		    struct btrfs_root *root, u64 bytenr, u64 len);
>   blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst);
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 5a1447db28228..f705e4ec9b961 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -31,7 +31,7 @@
>
>   static struct kmem_cache *extent_state_cache;
>   static struct kmem_cache *extent_buffer_cache;
> -static struct bio_set btrfs_bioset;
> +struct bio_set btrfs_bioset;
>
>   static inline bool extent_state_in_tree(const struct extent_state *state)
>   {
> @@ -3150,26 +3150,6 @@ struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
>   	return bio;
>   }
>
> -struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
> -		u64 offset, u64 size)
> -{
> -	struct bio *bio;
> -	struct btrfs_bio *bbio;
> -
> -	ASSERT(offset <= UINT_MAX && size <= UINT_MAX);
> -
> -	/* this will never fail when it's backed by a bioset */
> -	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
> -	ASSERT(bio);
> -
> -	bbio = btrfs_bio(bio);
> -	btrfs_bio_init(btrfs_bio(bio), inode);
> -
> -	bio_trim(bio, offset >> 9, size >> 9);
> -	bbio->iter = bio->bi_iter;
> -	return bio;
> -}
> -
>   /**
>    * Attempt to add a page to bio
>    *
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 54e54269cfdba..b416531721dfb 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -279,8 +279,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
>   				  u32 bits_to_clear, unsigned long page_ops);
>   struct bio *btrfs_bio_alloc(struct inode *inode, unsigned int nr_iovecs,
>   		unsigned int opf);
> -struct bio *btrfs_bio_clone_partial(struct inode *inode, struct bio *orig,
> -		u64 offset, u64 size);
>
>   void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
>   int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
> @@ -323,4 +321,6 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info);
>   #define btrfs_extent_buffer_leak_debug_check(fs_info)	do {} while (0)
>   #endif
>
> +extern struct bio_set btrfs_bioset;
> +
>   #endif
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e25d9d860c679..6ea6ef214abdb 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -62,8 +62,8 @@ struct btrfs_iget_args {
>   };
>
>   struct btrfs_dio_data {
> -	ssize_t submitted;
>   	struct extent_changeset *data_reserved;
> +	struct iomap extent;
>   };
>
>   static const struct inode_operations btrfs_dir_inode_operations;
> @@ -7507,16 +7507,16 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
>   	return ret;
>   }
>
> -static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
> +static int btrfs_dio_iomap_begin_extent(struct iomap_iter *iter)
>   {
>   	struct inode *inode = iter->inode;
>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   	loff_t start = iter->pos;
>   	loff_t length = iter->len;
> -	struct iomap *iomap = &iter->iomap;
>   	struct extent_map *em;
>   	struct extent_state *cached_state = NULL;
>   	struct btrfs_dio_data *dio_data = iter->private;
> +	struct iomap *iomap = &dio_data->extent;
>   	u64 lockstart, lockend;
>   	bool write = (iter->flags & IOMAP_WRITE);
>   	int ret = 0;
> @@ -7543,7 +7543,6 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
>   			return ret;
>   	}
>
> -	dio_data->submitted = 0;
>   	dio_data->data_reserved = NULL;
>
>   	/*
> @@ -7647,14 +7646,12 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
>   		iomap->type = IOMAP_MAPPED;
>   	}
>   	iomap->offset = start;
> -	iomap->bdev = fs_info->fs_devices->latest_dev->bdev;
>   	iomap->length = len;
>
>   	if (write && btrfs_use_zone_append(BTRFS_I(inode), em->block_start))
>   		iomap->flags |= IOMAP_F_ZONE_APPEND;
>
>   	free_extent_map(em);
> -
>   	return 0;
>
>   unlock_err:
> @@ -7663,53 +7660,95 @@ static int btrfs_dio_iomap_begin(struct iomap_iter *iter)
>   	return ret;
>   }
>
> -static int btrfs_dio_iomap_end(struct iomap_iter *iter)
> +static void btrfs_dio_unlock_remaining_extent(struct btrfs_inode *bi,
> +		u64 pos, u64 len, u64 processed, bool write)
>   {
> -	struct btrfs_dio_data *dio_data = iter->private;
> +	if (write)
> +		__endio_write_update_ordered(bi, pos + processed,
> +				len - processed, false);
> +	else
> +		unlock_extent(&bi->io_tree, pos + processed,
> +				pos + len - 1);
> +}
> +
> +static int btrfs_dio_iomap_begin_chunk(struct iomap_iter *iter)
> +{
> +	struct btrfs_fs_info *fs_info = btrfs_sb(iter->inode->i_sb);
>   	struct btrfs_inode *bi = BTRFS_I(iter->inode);
> -	bool write = (iter->flags & IOMAP_WRITE);
> -	loff_t length = iomap_length(iter);
> -	loff_t pos = iter->pos;
> -	int ret = 0;
> +	struct btrfs_dio_data *dio_data = iter->private;
> +	struct block_device *bdev;
> +	u64 len;
> +
> +	iter->iomap = dio_data->extent;
>
> -	if (!write && iter->iomap.type == IOMAP_HOLE) {
> -		/* If reading from a hole, unlock and return */
> -		unlock_extent(&bi->io_tree, pos, pos + length - 1);
> +	if (dio_data->extent.type != IOMAP_MAPPED)
>   		return 0;
> +
> +	bdev = btrfs_get_stripe_info(fs_info, (iter->flags & IOMAP_WRITE) ?
> +			BTRFS_MAP_WRITE : BTRFS_MAP_READ,
> +			iter->iomap.addr, iter->iomap.length, &len);
> +	if (WARN_ON_ONCE(IS_ERR(bdev))) {
> +		btrfs_dio_unlock_remaining_extent(bi, dio_data->extent.offset,
> +						  dio_data->extent.length, 0,
> +						  iter->flags & IOMAP_WRITE);
> +		return PTR_ERR(bdev);
>   	}
>
> -	if (dio_data->submitted < length) {
> -		pos += dio_data->submitted;
> -		length -= dio_data->submitted;
> -		if (write)
> -			__endio_write_update_ordered(bi, pos, length, false);
> -		else
> -			unlock_extent(&bi->io_tree, pos, pos + length - 1);
> -		ret = -ENOTBLK;
> +	iter->iomap.bdev = bdev;
> +	iter->iomap.length = min(iter->iomap.length, len);
> +	return 0;
> +}
> +
> +static bool btrfs_dio_iomap_end(struct iomap_iter *iter)
> +{
> +	struct btrfs_inode *bi = BTRFS_I(iter->inode);
> +	struct btrfs_dio_data *dio_data = iter->private;
> +	struct iomap *extent = &dio_data->extent;
> +	loff_t processed = iomap_processed(iter);
> +	loff_t length = iomap_length(iter);
> +
> +	if (iter->iomap.type == IOMAP_HOLE) {
> +		ASSERT(!(iter->flags & IOMAP_WRITE));
> +
> +		/* If reading from a hole, unlock the whole range here */
> +		unlock_extent(&bi->io_tree, iter->pos, iter->pos + length - 1);
> +	} else if (processed < length) {
> +		btrfs_dio_unlock_remaining_extent(bi, extent->offset,
> +						  extent->length, processed,
> +						  iter->flags & IOMAP_WRITE);
> +	} else if (iter->pos + processed < extent->offset + extent->length) {
> +		extent->offset += processed;
> +		extent->addr += processed;
> +		extent->length -= processed;
> +		return true;
>   	}
>
> -	if (write)
> +	if (iter->flags & IOMAP_WRITE)
>   		extent_changeset_free(dio_data->data_reserved);
> -	return ret;
> +	return false;
>   }
>
>   static int btrfs_dio_iomap_iter(struct iomap_iter *iter)
>   {
> +	bool keep_extent = false;
>   	int ret;
>
> -	if (iter->iomap.length) {
> -		ret = btrfs_dio_iomap_end(iter);
> -		if (ret < 0 && !iter->processed)
> -			return ret;
> -	}
> +	if (iter->iomap.length)
> +		keep_extent = btrfs_dio_iomap_end(iter);
>
>   	ret = iomap_iter_advance(iter);
>   	if (ret <= 0)
>   		return ret;
>
> -	ret = btrfs_dio_iomap_begin(iter);
> -	if (ret < 0)
> +	if (!keep_extent) {
> +		ret = btrfs_dio_iomap_begin_extent(iter);
> +		if (ret < 0)
> +			return ret;
> +	}
> +	ret = btrfs_dio_iomap_begin_chunk(iter);
> +	if (ret < 0)
>   		return ret;
> +
>   	iomap_iter_done(iter);
>   	return 1;
>   }
> @@ -7718,54 +7757,40 @@ static const struct iomap_ops btrfs_dio_iomap_ops = {
>   	.iomap_iter		= btrfs_dio_iomap_iter,
>   };
>
> -static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
> +static void btrfs_end_read_dio_bio(struct btrfs_bio *bbio,
> +		struct btrfs_bio *main_bbio);
> +
> +static void btrfs_dio_repair_end_io(struct bio *bio)
>   {
> -	/*
> -	 * This implies a barrier so that stores to dio_bio->bi_status before
> -	 * this and loads of dio_bio->bi_status after this are fully ordered.
> -	 */
> -	if (!refcount_dec_and_test(&dip->refs))
> -		return;
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
> +	struct btrfs_inode *bi = BTRFS_I(bbio->inode);
> +	struct btrfs_bio *failed_bbio = bio->bi_private;
>
> -	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
> -		__endio_write_update_ordered(BTRFS_I(dip->inode),
> -					     dip->file_offset,
> -					     dip->bytes,
> -					     !dip->dio_bio->bi_status);
> -	} else {
> -		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
> -			      dip->file_offset,
> -			      dip->file_offset + dip->bytes - 1);
> +	if (bio->bi_status) {
> +		btrfs_warn(bi->root->fs_info,
> +			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
> +			   btrfs_ino(bi), bio_op(bio), bio->bi_opf,
> +			   bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> +			   bio->bi_status);
>   	}
> +	btrfs_end_read_dio_bio(bbio, failed_bbio);
>
> -	bio_endio(dip->dio_bio);
> -	kfree(dip);
> +	bio_put(bio);
>   }
>
>   static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
>   					  int mirror_num,
>   					  unsigned long bio_flags)
>   {
> -	struct btrfs_dio_private *dip = bio->bi_private;
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	blk_status_t ret;
> -
>   	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
> -
>   	btrfs_bio(bio)->end_io_type = BTRFS_ENDIO_WQ_DATA_WRITE;
> -
> -	refcount_inc(&dip->refs);
> -	ret = btrfs_map_bio(fs_info, bio, mirror_num);
> -	if (ret)
> -		refcount_dec(&dip->refs);
> -	return ret;
> +	return btrfs_map_bio(btrfs_sb(inode->i_sb), bio, mirror_num);
>   }
>
> -static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
> -					     struct btrfs_bio *bbio,
> -					     const bool uptodate)
> +static void btrfs_end_read_dio_bio(struct btrfs_bio *this_bbio,
> +		struct btrfs_bio *main_bbio)
>   {
> -	struct inode *inode = dip->inode;
> +	struct inode *inode = main_bbio->inode;
>   	struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
>   	const u32 sectorsize = fs_info->sectorsize;
>   	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
> @@ -7773,20 +7798,22 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
>   	struct bio_vec bvec;
>   	struct bvec_iter iter;
> +	bool uptodate = !this_bbio->bio.bi_status;
>   	u32 bio_offset = 0;
> -	blk_status_t err = BLK_STS_OK;
>
> -	__bio_for_each_segment(bvec, &bbio->bio, iter, bbio->iter) {
> +	main_bbio->bio.bi_status = BLK_STS_OK;
> +
> +	__bio_for_each_segment(bvec, &this_bbio->bio, iter, this_bbio->iter) {
>   		unsigned int i, nr_sectors, pgoff;
>
>   		nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec.bv_len);
>   		pgoff = bvec.bv_offset;
>   		for (i = 0; i < nr_sectors; i++) {
> -			u64 start = bbio->file_offset + bio_offset;
> +			u64 start = this_bbio->file_offset + bio_offset;
>
>   			ASSERT(pgoff < PAGE_SIZE);
>   			if (uptodate &&
> -			    (!csum || !check_data_csum(inode, bbio,
> +			    (!csum || !check_data_csum(inode, this_bbio,
>   						       bio_offset, bvec.bv_page,
>   						       pgoff, start))) {
>   				clean_io_failure(fs_info, failure_tree, io_tree,
> @@ -7796,21 +7823,56 @@ static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip,
>   			} else {
>   				blk_status_t ret;
>
> -				ret = btrfs_repair_one_sector(inode, &bbio->bio,
> -						bio_offset, bvec.bv_page, pgoff,
> -						start, bbio->mirror_num,
> +				atomic_inc(&main_bbio->repair_refs);
> +				ret = btrfs_repair_one_sector(inode,
> +						&this_bbio->bio, bio_offset,
> +						bvec.bv_page, pgoff, start,
> +						this_bbio->mirror_num,
>   						submit_dio_repair_bio,
> -						bbio->bio.bi_private,
> -						bbio->bio.bi_end_io);
> -				if (ret)
> -					err = ret;
> +						main_bbio,
> +						btrfs_dio_repair_end_io);
> +				if (ret) {
> +					main_bbio->bio.bi_status = ret;
> +					atomic_dec(&main_bbio->repair_refs);
> +				}
>   			}
>   			ASSERT(bio_offset + sectorsize > bio_offset);
>   			bio_offset += sectorsize;
>   			pgoff += sectorsize;
>   		}
>   	}
> -	return err;
> +
> +	if (atomic_dec_and_test(&main_bbio->repair_refs)) {
> +		unlock_extent(&BTRFS_I(inode)->io_tree, main_bbio->file_offset,
> +			main_bbio->file_offset + main_bbio->iter.bi_size - 1);
> +		iomap_dio_bio_end_io(&main_bbio->bio);
> +	}
> +}
> +
> +static void btrfs_dio_bio_end_io(struct bio *bio)
> +{
> +	struct btrfs_bio *bbio = btrfs_bio(bio);
> +	struct btrfs_inode *bi = BTRFS_I(bbio->inode);
> +
> +	if (bio->bi_status) {
> +		btrfs_warn(bi->root->fs_info,
> +			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
> +			   btrfs_ino(bi), bio_op(bio), bio->bi_opf,
> +			   bio->bi_iter.bi_sector, bio->bi_iter.bi_size,
> +			   bio->bi_status);
> +	}
> +
> +	if (bio_op(bio) == REQ_OP_READ) {
> +		atomic_set(&bbio->repair_refs, 1);
> +		btrfs_end_read_dio_bio(bbio, bbio);
> +	} else {
> +		btrfs_record_physical_zoned(bbio->inode, bbio->file_offset,
> +					    bio);
> +		__endio_write_update_ordered(bi, bbio->file_offset,
> +					     bbio->iter.bi_size,
> +					     !bio->bi_status);
> +		iomap_dio_bio_end_io(bio);
> +	}
>   }
>
>   static void __endio_write_update_ordered(struct btrfs_inode *inode,
> @@ -7829,47 +7891,47 @@ static void btrfs_submit_bio_start_direct_io(struct btrfs_work *work)
>   			&bbio->bio, bbio->file_offset, 1);
>   }
>
> -static void btrfs_end_dio_bio(struct bio *bio)
> +/*
> + * If we are submitting more than one bio, submit them all asynchronously.  The
> + * exception is RAID 5 or 6, as asynchronous checksums make it difficult to
> + * collect full stripe writes.
> + */
> +static bool btrfs_dio_allow_async_write(struct btrfs_fs_info *fs_info,
> +		struct btrfs_inode *bi)
>   {
> -	struct btrfs_dio_private *dip = bio->bi_private;
> -	struct btrfs_bio *bbio = btrfs_bio(bio);
> -	blk_status_t err = bio->bi_status;
> -
> -	if (err)
> -		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
> -			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
> -			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
> -			   bio->bi_opf, bio->bi_iter.bi_sector,
> -			   bio->bi_iter.bi_size, err);
> -
> -	if (bio_op(bio) == REQ_OP_READ)
> -		err = btrfs_check_read_dio_bio(dip, bbio, !err);
> -
> -	if (err)
> -		dip->dio_bio->bi_status = err;
> -
> -	btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio);
> -
> -	bio_put(bio);
> -	btrfs_dio_private_put(dip);
> +	if (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK)
> +		return false;
> +	if (atomic_read(&bi->sync_writers))
> +		return false;
> +	return true;
>   }
>
> -static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
> -		struct inode *inode, u64 file_offset, int async_submit)
> +static void btrfs_dio_submit_io(const struct iomap_iter *iter,
> +		struct bio *bio, loff_t file_offset, bool more)
>   {
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	struct btrfs_inode *bi = BTRFS_I(inode);
> -	struct btrfs_dio_private *dip = bio->bi_private;
> +	struct btrfs_fs_info *fs_info = btrfs_sb(iter->inode->i_sb);
> +	struct btrfs_inode *bi = BTRFS_I(iter->inode);
>   	struct btrfs_bio *bbio = btrfs_bio(bio);
>   	blk_status_t ret;
>
> +	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
> +	bbio->inode = iter->inode;
> +	bbio->file_offset = file_offset;
> +	bbio->iter = bio->bi_iter;
> +	bio->bi_end_io = btrfs_dio_bio_end_io;
> +
>   	if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
> +		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> +			ret = extract_ordered_extent(bi, bio, file_offset);
> +			if (ret)
> +				goto out_err;
> +		}
> +
>   		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
> -			/* See btrfs_submit_data_bio for async submit rules */
> -			if (async_submit && !atomic_read(&bi->sync_writers)) {
> +			if (more && btrfs_dio_allow_async_write(fs_info, bi)) {
>   				btrfs_submit_bio_async(bbio,
>   					btrfs_submit_bio_start_direct_io);
> -				return BLK_STS_OK;
> +				return;
>   			}
>
>   			/*
> @@ -7878,189 +7940,36 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
>   			 */
>   			ret = btrfs_csum_one_bio(bi, bio, file_offset, 1);
>   			if (ret)
> -				return ret;
> +				goto out_err;
>   		}
>   	} else {
>   		bbio->end_io_type = BTRFS_ENDIO_WQ_DATA_READ;
>
> -		if (!(bi->flags & BTRFS_INODE_NODATASUM)) {
> -			u64 csum_offset;
> -
> -			csum_offset = file_offset - dip->file_offset;
> -			csum_offset >>= fs_info->sectorsize_bits;
> -			csum_offset *= fs_info->csum_size;
> -			btrfs_bio(bio)->csum = dip->csums + csum_offset;
> -		}
> -	}
> -
> -	return btrfs_map_bio(fs_info, bio, 0);
> -}
> -
> -/*
> - * If this succeeds, the btrfs_dio_private is responsible for cleaning up locked
> - * or ordered extents whether or not we submit any bios.
> - */
> -static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
> -							  struct inode *inode,
> -							  loff_t file_offset)
> -{
> -	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
> -	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
> -	size_t dip_size;
> -	struct btrfs_dio_private *dip;
> -
> -	dip_size = sizeof(*dip);
> -	if (!write && csum) {
> -		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -		size_t nblocks;
> -
> -		nblocks = dio_bio->bi_iter.bi_size >> fs_info->sectorsize_bits;
> -		dip_size += fs_info->csum_size * nblocks;
> -	}
> -
> -	dip = kzalloc(dip_size, GFP_NOFS);
> -	if (!dip)
> -		return NULL;
> -
> -	dip->inode = inode;
> -	dip->file_offset = file_offset;
> -	dip->bytes = dio_bio->bi_iter.bi_size;
> -	dip->disk_bytenr = dio_bio->bi_iter.bi_sector << 9;
> -	dip->dio_bio = dio_bio;
> -	refcount_set(&dip->refs, 1);
> -	return dip;
> -}
> -
> -static void btrfs_submit_direct(const struct iomap_iter *iter,
> -		struct bio *dio_bio, loff_t file_offset, bool more)
> -{
> -	struct inode *inode = iter->inode;
> -	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
> -	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> -	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
> -			     BTRFS_BLOCK_GROUP_RAID56_MASK);
> -	struct btrfs_dio_private *dip;
> -	struct bio *bio;
> -	u64 start_sector;
> -	int async_submit = 0;
> -	u64 submit_len;
> -	u64 clone_offset = 0;
> -	u64 clone_len;
> -	blk_status_t status;
> -	struct btrfs_dio_data *dio_data = iter->private;
> -	u64 len;
> -
> -	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
> -	if (!dip) {
> -		if (!write) {
> -			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
> -				file_offset + dio_bio->bi_iter.bi_size - 1);
> -		}
> -		dio_bio->bi_status = BLK_STS_RESOURCE;
> -		bio_endio(dio_bio);
> -		return;
> -	}
> -
> -	if (!write) {
> -		/*
> -		 * Load the csums up front to reduce csum tree searches and
> -		 * contention when submitting bios.
> -		 *
> -		 * If we have csums disabled this will do nothing.
> -		 */
> -		status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums);
> -		if (status != BLK_STS_OK)
> +		ret = btrfs_lookup_bio_sums(iter->inode, bio, NULL);
> +		if (ret)
>   			goto out_err;
>   	}
>
> -	start_sector = dio_bio->bi_iter.bi_sector;
> -	submit_len = dio_bio->bi_iter.bi_size;
> -
> -	do {
> -		struct block_device *bdev;
> -
> -		bdev = btrfs_get_stripe_info(fs_info, btrfs_op(dio_bio),
> -				      start_sector << 9, submit_len, &len);
> -		if (IS_ERR(bdev)) {
> -			status = errno_to_blk_status(PTR_ERR(bdev));
> -			goto out_err;
> -		}
> -
> -		clone_len = min(submit_len, len);
> -		ASSERT(clone_len <= UINT_MAX);
> -
> -		/*
> -		 * This will never fail as it's passing GPF_NOFS and
> -		 * the allocation is backed by btrfs_bioset.
> -		 */
> -		bio = btrfs_bio_clone_partial(inode, dio_bio, clone_offset,
> -					      clone_len);
> -		bio->bi_private = dip;
> -		bio->bi_end_io = btrfs_end_dio_bio;
> -		btrfs_bio(bio)->file_offset = file_offset;
> -
> -		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> -			status = extract_ordered_extent(BTRFS_I(inode), bio,
> -							file_offset);
> -			if (status) {
> -				bio_put(bio);
> -				goto out_err;
> -			}
> -		}
> -
> -		ASSERT(submit_len >= clone_len);
> -		submit_len -= clone_len;
> -
> -		/*
> -		 * Increase the count before we submit the bio so we know
> -		 * the end IO handler won't happen before we increase the
> -		 * count. Otherwise, the dip might get freed before we're
> -		 * done setting it up.
> -		 *
> -		 * We transfer the initial reference to the last bio, so we
> -		 * don't need to increment the reference count for the last one.
> -		 */
> -		if (submit_len > 0) {
> -			refcount_inc(&dip->refs);
> -			/*
> -			 * If we are submitting more than one bio, submit them
> -			 * all asynchronously. The exception is RAID 5 or 6, as
> -			 * asynchronous checksums make it difficult to collect
> -			 * full stripe writes.
> -			 */
> -			if (!raid56)
> -				async_submit = 1;
> -		}
> -
> -		status = btrfs_submit_dio_bio(bio, inode, file_offset,
> -						async_submit);
> -		if (status) {
> -			bio_put(bio);
> -			if (submit_len > 0)
> -				refcount_dec(&dip->refs);
> -			goto out_err;
> -		}
> +	ret = btrfs_map_bio(fs_info, bio, 0);
> +	if (ret)
> +		goto out_err;
>
> -		dio_data->submitted += clone_len;
> -		clone_offset += clone_len;
> -		start_sector += clone_len >> 9;
> -		file_offset += clone_len;
> -	} while (submit_len > 0);
>   	return;
>
>   out_err:
> -	dip->dio_bio->bi_status = status;
> -	btrfs_dio_private_put(dip);
> +	bio->bi_status = ret;
> +	bio_endio(bio);
>   }
>
>   static const struct iomap_dio_ops btrfs_dio_ops = {
> -	.submit_io		= btrfs_submit_direct,
> +	.submit_io		= btrfs_dio_submit_io,
> +	.bio_set		= &btrfs_bioset,
>   };
>
>   ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>   		size_t done_before)
>   {
> -	struct btrfs_dio_data data;
> +	struct btrfs_dio_data data = {};
>
>   	iocb->private = &data;
>   	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index c6425760f69da..e9d775398141b 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -341,6 +341,7 @@ struct btrfs_bio {
>
>   	/* for direct I/O */
>   	u64 file_offset;
> +	atomic_t repair_refs;
>
>   	/* @device is for stripe IO submission. */
>   	struct btrfs_device *device;

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 02/40] btrfs: fix direct I/O read repair for split bios
  2022-03-22 23:59   ` Qu Wenruo
@ 2022-03-23  6:03     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:03 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 07:59:15AM +0800, Qu Wenruo wrote:
> Personally speaking, I really hate to add DIO specific value into btrfs_bio.
>
> Hopes we can later turn that btrfs_bio::file_offset into some union for
> other usages.

Agreed.  There are two avenues we could explore:

 - creative use of unions.  Especially with some of my later patches
   adding more fields to struct btrfs_bio this could become possible.
 - adding a btrfs_dio_bio that contains a btrfs_bio.  By the end of
   this series another field (repair_refs) is added, and iter is only
   used for direct I/O, so this might be worthwhile.  But then again
   I think we could eventually kill off iter as well.

So we should eventually do something, but for a non-invasive fix
like this just adding the field for now seems like the safest
approach.
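
To make the second option a bit more concrete, here is a rough sketch of
what such a containment could look like (the type and field names are made
up for illustration, this is not part of the series):

/*
 * Hypothetical sketch only: keep the direct I/O-only state out of the
 * common btrfs_bio by wrapping it in a DIO-specific container.  The
 * end_io handlers get back to it with container_of().
 */
struct btrfs_dio_bio {
	u64			file_offset;
	struct bvec_iter	iter;
	atomic_t		repair_refs;

	/*
	 * Must stay last: btrfs_bio embeds the struct bio, and bios
	 * allocated from a bio_set carry their private data as front_pad
	 * in front of the bio itself.
	 */
	struct btrfs_bio	bbio;
};

static inline struct btrfs_dio_bio *btrfs_dio_bio(struct btrfs_bio *bbio)
{
	return container_of(bbio, struct btrfs_dio_bio, bbio);
}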

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices
  2022-03-23  0:00   ` Qu Wenruo
@ 2022-03-23  6:04     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:04 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:00:48AM +0800, Qu Wenruo wrote:
>> When a bio is split in btrfs_submit_direct, dip->file_offset contains
>> the file offset for the first bio.  But this means the start value used
>> in btrfs_end_dio_bio to record the write location for zoned devices is
>> incorrect for subsequent bios.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Maybe better to be folded with previous patch?

Well, it fixes a separate issue.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 09/40] btrfs: simplify scrub_recheck_block
  2022-03-23  0:10   ` Qu Wenruo
@ 2022-03-23  6:05     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:05 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:10:51AM +0800, Qu Wenruo wrote:
>>   		}
>>
>>   		WARN_ON(!spage->page);
>> -		bio = btrfs_bio_alloc(1);
>> -		bio_set_dev(bio, spage->dev->bdev);
>> -
>> -		bio_add_page(bio, spage->page, fs_info->sectorsize, 0);
>> -		bio->bi_iter.bi_sector = spage->physical >> 9;
>> -		bio->bi_opf = REQ_OP_READ;
>> +		bio_init(&bio, spage->dev->bdev, &bvec, 1, REQ_OP_READ);
>> +		__bio_add_page(&bio, spage->page, fs_info->sectorsize, 0);
>
> Can we make the naming for __bio_add_page() better?
>
> With more on-stack bio usage, such __bio_add_page() is really a little
> embarrassing.

__bio_add_page is really just a micro-optimized version of
bio_add_page for single-page users like this.  To be honest we should
probably just stop using it, and I should not have added it here.
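
For comparison, a minimal sketch of the same on-stack pattern using plain
bio_add_page() instead (read_one_sector() is a made-up helper for
illustration, not an existing btrfs function):

/*
 * Synchronous single-page read using an on-stack bio.  The bio was
 * initialized with room for exactly one bio_vec and is still empty,
 * so bio_add_page() cannot fail here.
 */
static int read_one_sector(struct block_device *bdev, struct page *page,
			   u32 sectorsize, u64 physical)
{
	struct bio_vec bvec;
	struct bio bio;
	int ret;

	bio_init(&bio, bdev, &bvec, 1, REQ_OP_READ);
	bio.bi_iter.bi_sector = physical >> SECTOR_SHIFT;
	bio_add_page(&bio, page, sectorsize, 0);

	ret = submit_bio_wait(&bio);
	bio_uninit(&bio);
	return ret;
}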

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair
  2022-03-23  0:20   ` Qu Wenruo
@ 2022-03-23  6:06     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:06 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:20:41AM +0800, Qu Wenruo wrote:
>
>
> On 2022/3/22 23:55, Christoph Hellwig wrote:
>> submit_bio_hooks is always set to btrfs_submit_data_bio, so just remove
>> it.
>>
>
> The same as my recent cleanup for it.
>
> https://lore.kernel.org/linux-btrfs/9e29ec4e546249018679224518a465d0240912b0.1647841657.git.wqu@suse.com/T/#u
>
> Although I did extra renaming as submit_read_repair() only works for
> data read.
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>

I'm fine doing either version.  This was just getting in the way of
other repair changes, which is why we probably both came up with it.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio
  2022-03-23  0:44   ` Qu Wenruo
@ 2022-03-23  6:08     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:08 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:44:19AM +0800, Qu Wenruo wrote:
> Previously we would also call bio_endio() on the bio, do we miss the
> endio call on it?

The first patch in the series moves the bio_endio call into one of the
callers of this function instead.  That being said mainline has a
different fix from Josef that makes the endio call on errors conditional,
so this part of the series will require some rework.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio
  2022-03-23  0:50   ` Qu Wenruo
@ 2022-03-23  6:09     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:09 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:50:44AM +0800, Qu Wenruo wrote:
> Can we just put btrfs_map_bio() call into each read/write branch?

We could - that being said, I find this version a lot cleaner as it
has a self-explanatory flow.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio
  2022-03-23  0:54   ` Qu Wenruo
@ 2022-03-23  6:11     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:11 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:54:21AM +0800, Qu Wenruo wrote:
> Something I want to avoid is to futher increasing the size of btrfs_bio.
>
> For buffered uncompressed IO, we can grab the inode from the first page.
> For direct IO we have bio->bi_private (btrfs_dio_private).
> For compressed IO, it's bio->bi_private again (compressed_bio).
>
> Do the saved code lines really validate the memory usage for all bios?

This isn't about the saved lines.  It allows us to remove the async submit
and completion container structures that both point to an inode, and
later on the dio_private structure.  So overall it actually is a major
memory saving.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 24/40] btrfs: remove btrfs_end_io_wq
  2022-03-23  0:57   ` Qu Wenruo
@ 2022-03-23  6:11     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:11 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 08:57:03AM +0800, Qu Wenruo wrote:
>
>
> On 2022/3/22 23:55, Christoph Hellwig wrote:
>> Avoid the extra allocation for all read bios by embedding a btrfs_work
>> and I/O end type into the btrfs_bio structure.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Do we really need to bump the size of btrfs_bio furthermore?
>
> Especially btrfs_bio is allocated for each 64K stripe...

One of the async submission or completion contexts is allocated for
almost every btrfs_bio.  So overall this reduces the memory usage
(at least together with the rest of the series).

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block
  2022-03-23  1:08   ` Qu Wenruo
@ 2022-03-23  6:13     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:13 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 09:08:44AM +0800, Qu Wenruo wrote:
>
>
> On 2022/3/22 23:55, Christoph Hellwig wrote:
>> Clear need_raid_map early instead of repeating the same conditional over
>> and over.
>
> I had a more comprehensive cleanup, but only for scrub:
> https://lore.kernel.org/linux-btrfs/cover.1646984153.git.wqu@suse.com/

I'll take a look.  I mostly wanted to avoid checking the same conditional
yet again for the fast path added later in this series.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio
  2022-03-23  1:14   ` Qu Wenruo
@ 2022-03-23  6:13     ` Christoph Hellwig
  2022-03-23  6:59       ` Qu Wenruo
  0 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:13 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 09:14:06AM +0800, Qu Wenruo wrote:
> Really not a fan of enlarging btrfs_bio again and again.
>
> Especially for the btrfs_bio_stripe and btrfs_bio_stripe * part.
>
> Considering how many bytes we waste for SINGLE profile, now we need an
> extra pointer which we will never really use.

How do we waste memory?  We stop allocating the btrfs_io_context now,
which can be quite big.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector
  2022-03-23  1:28   ` Qu Wenruo
@ 2022-03-23  6:15     ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:15 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 09:28:31AM +0800, Qu Wenruo wrote:
> Not a big fan of extra parameters for a function which already had enough...
>
> And I always have a question on repair (aka read from extra copy).
>
> Can't we just make the repair part to be synchronous?

I'll defer that to people who know the btrfs code better.  This basically
means we will block for a long time in the end_io workqueue, which could
have adverse consequences.

It would however simplify a lot of things in a very nice way.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-23  1:39   ` Qu Wenruo
@ 2022-03-23  6:17     ` Christoph Hellwig
  2022-03-23  8:02       ` Qu Wenruo
  0 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  6:17 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 09:39:24AM +0800, Qu Wenruo wrote:
> Not familar with iomap thus I can be totally wrong, but isn't the idea
> of iomap to separate more code from fs?

Well, to share more code, which requires a certain abstraction, yes.

> I'm really not sure if it's a good idea to expose btrfs internal bio_set
> just for iomap.

We don't.  iomap still purely operates on the generic bio.  It just
allocates additional space for btrfs to use after ->submit_io is called.
Just like how e.g. VFS inodes can come with extra space for file
system use.
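
The mechanism is the same one btrfs already relies on for its own bio
allocations: the bio_set is initialized with a front_pad covering the
fs-private part, so a bio handed out by iomap can be mapped straight back to
a btrfs_bio.  Roughly like this (the init wrapper name is illustrative):

static int __init btrfs_bioset_init(void)
{
	/*
	 * front_pad = offsetof(struct btrfs_bio, bio) reserves the
	 * btrfs-private part in front of every struct bio allocated
	 * from this bio_set; btrfs_bio(bio) is then just container_of().
	 */
	return bioset_init(&btrfs_bioset, BIO_POOL_SIZE,
			   offsetof(struct btrfs_bio, bio),
			   BIOSET_NEED_BVECS);
}

iomap then only needs to know which bio_set to allocate from, which is what
the new ->bio_set field in struct iomap_dio_ops provides.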

> Personally speaking I didn't see much problem of cloning an iomap bio,
> it only causes extra memory of btrfs_bio, which is pretty small previously.

It is yet another pointless memory allocation in something considered very
much a fast path.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio
  2022-03-23  6:13     ` Christoph Hellwig
@ 2022-03-23  6:59       ` Qu Wenruo
  2022-03-23  7:10         ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  6:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Josef Bacik, David Sterba, Qu Wenruo, Naohiro Aota, linux-btrfs,
	linux-fsdevel



On 2022/3/23 14:13, Christoph Hellwig wrote:
> On Wed, Mar 23, 2022 at 09:14:06AM +0800, Qu Wenruo wrote:
>> Really not a fan of enlarging btrfs_bio again and again.
>>
>> Especially for the btrfs_bio_stripe and btrfs_bio_stripe * part.
>>
>> Considering how many bytes we waste for SINGLE profile, now we need an
>> extra pointer which we will never really use.
>
> How do we waste memory?  We stop allocating the btrfs_io_context now
> which can be quite big.

Don't we waste the embedded __stripe if we choose to use the pointer one?
And vice versa.


And for the SINGLE profile, we don't really need btrfs_bio_stripe at all: we
can take a fast path that just sets bdev and bi_sector and submits, without
even overriding the bio's endio/private.
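
Something like the following is what I have in mind (hypothetical sketch,
not tested; map_single() stands in for whatever chunk lookup we would use):

static blk_status_t btrfs_map_bio_single(struct btrfs_fs_info *fs_info,
					 struct bio *bio)
{
	struct btrfs_device *dev;
	u64 physical;

	/* map_single() is a placeholder for the chunk tree lookup */
	dev = map_single(fs_info, bio->bi_iter.bi_sector << SECTOR_SHIFT,
			 &physical);
	if (IS_ERR(dev))
		return errno_to_blk_status(PTR_ERR(dev));

	/*
	 * Remap to the physical location and submit directly, without any
	 * per-stripe state and without touching bi_end_io/bi_private.
	 */
	bio_set_dev(bio, dev->bdev);
	bio->bi_iter.bi_sector = physical >> SECTOR_SHIFT;
	submit_bio(bio);
	return BLK_STS_OK;
}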

Thanks,
Qu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio
  2022-03-23  6:59       ` Qu Wenruo
@ 2022-03-23  7:10         ` Christoph Hellwig
  0 siblings, 0 replies; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  7:10 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 02:59:10PM +0800, Qu Wenruo wrote:
>> How do we waste memory?  We stop allocating the btrfs_io_context now
>> which can be quite big.
>
> Doesn't we waste the embedded __stripe if we choose to use the pointer one?
> And vice versus.
>
> And for SINGLE profile, we don't really need btrfs_bio_stripe at all, we
> can fast-path just setting bdev and bi_sector, and submit without even
> overriding its endio/private.

Yes, the 16 bytes of the embedded stripe are wasted for I/O that doesn't
use it.  But all reads use it, which is typically the majority of all
I/O and the most performance critical one.  If the "waste" is a concern
we can split out a separate btrfs_read_bio.  Chance are that it will
just use slack by the time this all settles with a bunch of other
members that can be removed or unioned.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-23  6:17     ` Christoph Hellwig
@ 2022-03-23  8:02       ` Qu Wenruo
  2022-03-23  8:11         ` Christoph Hellwig
  0 siblings, 1 reply; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  8:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Josef Bacik, David Sterba, Qu Wenruo, Naohiro Aota, linux-btrfs,
	linux-fsdevel



On 2022/3/23 14:17, Christoph Hellwig wrote:
> On Wed, Mar 23, 2022 at 09:39:24AM +0800, Qu Wenruo wrote:
>> Not familar with iomap thus I can be totally wrong, but isn't the idea
>> of iomap to separate more code from fs?
>
> Well, to share more code, which requires a certain abstraction, yes.
>
>> I'm really not sure if it's a good idea to expose btrfs internal bio_set
>> just for iomap.
>
> We don't.  iomap still purely operates on the generic bio.  It just
> allocates additional space for btrfs to use after ->submit_io is called.
> Just like how e.g. VFS inodes can come with extra space for file
> system use.

OK, so it's just that the higher layer pre-allocates those structures.

A little curious whether there will be other users of this besides btrfs.

I guess for XFS/EXT4 they don't need any extra space and can just submit
the generic bio directly to their devices?

>
>> Personally speaking I didn't see much problem of cloning an iomap bio,
>> it only causes extra memory of btrfs_bio, which is pretty small previously.
>
> It is yet another pointless memory allocation in something considered very
> much a fast path.

Another concern is that this behavior mostly means we don't split the
generic bio.
Otherwise we would still need to allocate the btrfs-specific memory for
the new bio.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-23  8:02       ` Qu Wenruo
@ 2022-03-23  8:11         ` Christoph Hellwig
  2022-03-23  8:36           ` Qu Wenruo
  0 siblings, 1 reply; 81+ messages in thread
From: Christoph Hellwig @ 2022-03-23  8:11 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo,
	Naohiro Aota, linux-btrfs, linux-fsdevel

On Wed, Mar 23, 2022 at 04:02:34PM +0800, Qu Wenruo wrote:
> A little curious if there will be other users of this other than btrfs.
>
> I guess for XFS/EXT4 they don't need any extra space and can just submit
> the generic bio directly to their devices?

For normal I/O, yes.  But if we want to use Zone Append we'll basically
need to use this kind of hook everywhere. 

>>> Personally speaking I didn't see much problem of cloning an iomap bio,
>>> it only causes extra memory of btrfs_bio, which is pretty small previously.
>>
>> It is yet another pointless memory allocation in something considered very
>> much a fast path.
>
> Another concern is, this behavior mostly means we don't split the
> generic bio.
> Or we still need to allocate memory for the btrfs specific memory for
> the new bio.

With the current series we never split it, yes.  I'm relatively new
to btrfs, so why would we want to split the bio anyway?

As far as I can tell this is only done for parity raid, and maybe
we could actually do the split for those with just this scheme.

I.e. do what you are doing in your series in btrfs_map_bio and
allow cloning partial bios there, which should still be possible
there.  We'd still need the high-level btrfs_bio to contain the
mapping and various end I/O infrastructure like the btrfs_work.
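
Very roughly, something along these lines (a sketch only; btrfs_max_io_len()
is a stand-in for whatever helper returns the distance to the next stripe
boundary, and the completion accounting across the fragments is left out):

static blk_status_t btrfs_map_and_split_bio(struct btrfs_fs_info *fs_info,
					    struct bio *bio, int mirror_num)
{
	while (true) {
		u64 logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
		u64 max_len = btrfs_max_io_len(fs_info, btrfs_op(bio), logical);
		struct bio *split;
		blk_status_t ret;

		/* The remainder fits into one stripe, submit it as-is. */
		if (bio->bi_iter.bi_size <= max_len)
			return btrfs_map_bio(fs_info, bio, mirror_num);

		/*
		 * Split off the part up to the stripe boundary.  Allocating
		 * from btrfs_bioset keeps a btrfs_bio in front of every
		 * fragment for the per-stripe bookkeeping.
		 */
		split = bio_split(bio, max_len >> SECTOR_SHIFT, GFP_NOFS,
				  &btrfs_bioset);
		if (!split)
			return BLK_STS_RESOURCE;
		ret = btrfs_map_bio(fs_info, split, mirror_num);
		if (ret)
			return ret;
	}
}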


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 40/40] btrfs: use the iomap direct I/O bio directly
  2022-03-23  8:11         ` Christoph Hellwig
@ 2022-03-23  8:36           ` Qu Wenruo
  0 siblings, 0 replies; 81+ messages in thread
From: Qu Wenruo @ 2022-03-23  8:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Josef Bacik, David Sterba, Qu Wenruo, Naohiro Aota, linux-btrfs,
	linux-fsdevel



On 2022/3/23 16:11, Christoph Hellwig wrote:
> On Wed, Mar 23, 2022 at 04:02:34PM +0800, Qu Wenruo wrote:
>> A little curious if there will be other users of this other than btrfs.
>>
>> I guess for XFS/EXT4 they don't need any extra space and can just submit
>> the generic bio directly to their devices?
>
> For normal I/O, yes.  But if we want to use Zone Append we'll basically
> need to use this kind of hook everywhere.
>
>>>> Personally speaking I didn't see much problem of cloning an iomap bio,
>>>> it only causes extra memory of btrfs_bio, which is pretty small previously.
>>>
>>> It is yet another pointless memory allocation in something considered very
>>> much a fast path.
>>
>> Another concern is, this behavior mostly means we don't split the
>> generic bio.
>> Or we still need to allocate memory for the btrfs specific memory for
>> the new bio.
>
> With the current series we never split it, yes.  I'm relatively new
> to btrfs, so why would we want to split the bio anyway?

Two reasons.  One is to make the iomap callbacks easier: they won't need to
bother with the stripe boundary anymore if we can do the bio split inside
btrfs.  (This is also why I want to get rid of the zone boundary check, but
I don't have any clue on that yet.)

The other one is purely about design: we want better layer separation.
Things like stripe boundaries belong to the chunk layer, so the split should
also happen when we map the bio, not in the read/write path.

This is exactly what the stacked drivers do, and I really hope to follow
that path.

>
> As far as I can tell this is only done for parity raid, and maybe
> we could actually do the split for those with just this scheme.
>
> I.e. do what you are doing in your series in btrfs_map_bio and
> allow to clone partial bios there, which should still be possible
> threre.

I guess yes, we are probably still able to do that.

Just to mention that we cannot really get rid of the memory allocation.

>  We'd still need the high-level btrfs_bio to contain the
> mapping and various end I/O infrastructure like the btrfs_work.
>
Then it looks fine to me now.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector
  2022-03-22 15:56 ` [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector Christoph Hellwig
  2022-03-23  1:28   ` Qu Wenruo
@ 2022-03-24  0:57   ` Sweet Tea Dorminy
  1 sibling, 0 replies; 81+ messages in thread
From: Sweet Tea Dorminy @ 2022-03-24  0:57 UTC (permalink / raw)
  To: Christoph Hellwig, Josef Bacik, David Sterba, Qu Wenruo
  Cc: Naohiro Aota, linux-btrfs, linux-fsdevel


> +		void *bi_private, void (*bi_end_io)(struct bio *bio))
Maybe the last parameter can just be "bio_end_io_t bi_end_io"?
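
For reference, bio_end_io_t is already typedef'd in include/linux/blk_types.h,
so the tail of the prototype could read roughly as below (abridged, parameter
list approximate):

typedef void (bio_end_io_t) (struct bio *);	/* from blk_types.h */

blk_status_t btrfs_repair_one_sector(struct inode *inode,
				     struct bio *failed_bio, u32 bio_offset,
				     struct page *page, unsigned int pgoff,
				     u64 start, int failed_mirror,
				     submit_bio_hook_t *submit_bio_hook,
				     void *bi_private,
				     bio_end_io_t *bi_end_io);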

^ permalink raw reply	[flat|nested] 81+ messages in thread


Thread overview: 81+ messages
2022-03-22 15:55 RFC: cleanup btrfs bio handling Christoph Hellwig
2022-03-22 15:55 ` [PATCH 01/40] btrfs: fix submission hook error handling in btrfs_repair_one_sector Christoph Hellwig
2022-03-22 15:55 ` [PATCH 02/40] btrfs: fix direct I/O read repair for split bios Christoph Hellwig
2022-03-22 23:59   ` Qu Wenruo
2022-03-23  6:03     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 03/40] btrfs: fix direct I/O writes for split bios on zoned devices Christoph Hellwig
2022-03-23  0:00   ` Qu Wenruo
2022-03-23  6:04     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 04/40] btrfs: fix and document the zoned device choice in alloc_new_bio Christoph Hellwig
2022-03-22 15:55 ` [PATCH 05/40] btrfs: refactor __btrfsic_submit_bio Christoph Hellwig
2022-03-22 15:55 ` [PATCH 06/40] btrfs: split submit_bio from btrfsic checking Christoph Hellwig
2022-03-23  0:04   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 07/40] btrfs: simplify btrfsic_read_block Christoph Hellwig
2022-03-22 15:55 ` [PATCH 08/40] btrfs: simplify repair_io_failure Christoph Hellwig
2022-03-23  0:06   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 09/40] btrfs: simplify scrub_recheck_block Christoph Hellwig
2022-03-23  0:10   ` Qu Wenruo
2022-03-23  6:05     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 10/40] btrfs: simplify scrub_repair_page_from_good_copy Christoph Hellwig
2022-03-23  0:12   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 11/40] btrfs: move the call to bio_set_dev out of submit_stripe_bio Christoph Hellwig
2022-03-22 15:55 ` [PATCH 12/40] btrfs: pass a block_device to btrfs_bio_clone Christoph Hellwig
2022-03-22 15:55 ` [PATCH 13/40] btrfs: initialize ->bi_opf and ->bi_private in rbio_add_io_page Christoph Hellwig
2022-03-22 15:55 ` [PATCH 14/40] btrfs: don't allocate a btrfs_bio for raid56 per-stripe bios Christoph Hellwig
2022-03-23  0:16   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 15/40] btrfs: don't allocate a btrfs_bio for scrub bios Christoph Hellwig
2022-03-23  0:18   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 16/40] btrfs: stop using the btrfs_bio saved iter in index_rbio_pages Christoph Hellwig
2022-03-22 15:55 ` [PATCH 17/40] btrfs: remove the submit_bio_hook argument to submit_read_repair Christoph Hellwig
2022-03-23  0:20   ` Qu Wenruo
2022-03-23  6:06     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 18/40] btrfs: move more work into btrfs_end_bioc Christoph Hellwig
2022-03-23  0:29   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 19/40] btrfs: defer I/O completion based on the btrfs_raid_bio Christoph Hellwig
2022-03-22 15:55 ` [PATCH 20/40] btrfs: cleanup btrfs_submit_metadata_bio Christoph Hellwig
2022-03-23  0:34   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 21/40] btrfs: cleanup btrfs_submit_data_bio Christoph Hellwig
2022-03-23  0:44   ` Qu Wenruo
2022-03-23  6:08     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 22/40] btrfs: cleanup btrfs_submit_dio_bio Christoph Hellwig
2022-03-23  0:50   ` Qu Wenruo
2022-03-23  6:09     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 23/40] btrfs: store an inode pointer in struct btrfs_bio Christoph Hellwig
2022-03-23  0:54   ` Qu Wenruo
2022-03-23  6:11     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 24/40] btrfs: remove btrfs_end_io_wq Christoph Hellwig
2022-03-23  0:57   ` Qu Wenruo
2022-03-23  6:11     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 25/40] btrfs: remove btrfs_wq_submit_bio Christoph Hellwig
2022-03-22 15:55 ` [PATCH 26/40] btrfs: refactor btrfs_map_bio Christoph Hellwig
2022-03-23  1:03   ` Qu Wenruo
2022-03-22 15:55 ` [PATCH 27/40] btrfs: clean up the raid map handling __btrfs_map_block Christoph Hellwig
2022-03-23  1:08   ` Qu Wenruo
2022-03-23  6:13     ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 28/40] btrfs: do not allocate a btrfs_io_context in btrfs_map_bio Christoph Hellwig
2022-03-23  1:14   ` Qu Wenruo
2022-03-23  6:13     ` Christoph Hellwig
2022-03-23  6:59       ` Qu Wenruo
2022-03-23  7:10         ` Christoph Hellwig
2022-03-22 15:55 ` [PATCH 29/40] btrfs: do not allocate a btrfs_bio for low-level bios Christoph Hellwig
2022-03-22 15:55 ` [PATCH 30/40] iomap: add per-iomap_iter private data Christoph Hellwig
2022-03-22 15:55 ` [PATCH 31/40] iomap: add a new ->iomap_iter operation Christoph Hellwig
2022-03-22 15:55 ` [PATCH 32/40] iomap: optionally allocate dio bios from a file system bio_set Christoph Hellwig
2022-03-22 15:55 ` [PATCH 33/40] iomap: add a hint to ->submit_io if there is more I/O coming Christoph Hellwig
2022-03-22 15:56 ` [PATCH 34/40] btrfs: add a btrfs_dio_rw wrapper Christoph Hellwig
2022-03-22 15:56 ` [PATCH 35/40] btrfs: allocate dio_data on stack Christoph Hellwig
2022-03-22 15:56 ` [PATCH 36/40] btrfs: implement ->iomap_iter Christoph Hellwig
2022-03-22 15:56 ` [PATCH 37/40] btrfs: add a btrfs_get_stripe_info helper Christoph Hellwig
2022-03-23  1:23   ` Qu Wenruo
2022-03-22 15:56 ` [PATCH 38/40] btrfs: return a blk_status_t from btrfs_repair_one_sector Christoph Hellwig
2022-03-22 15:56 ` [PATCH 39/40] btrfs: pass private data end end_io handler to btrfs_repair_one_sector Christoph Hellwig
2022-03-23  1:28   ` Qu Wenruo
2022-03-23  6:15     ` Christoph Hellwig
2022-03-24  0:57   ` Sweet Tea Dorminy
2022-03-22 15:56 ` [PATCH 40/40] btrfs: use the iomap direct I/O bio directly Christoph Hellwig
2022-03-23  1:39   ` Qu Wenruo
2022-03-23  6:17     ` Christoph Hellwig
2022-03-23  8:02       ` Qu Wenruo
2022-03-23  8:11         ` Christoph Hellwig
2022-03-23  8:36           ` Qu Wenruo
2022-03-22 17:46 ` RFC: cleanup btrfs bio handling Johannes Thumshirn
