* [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
@ 2022-03-14  9:07 Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
                   ` (18 more replies)
  0 siblings, 19 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This patchset can be fetched from this branch:

https://github.com/adam900710/linux/tree/bio_split

[CHANGELOG]
RFC->v1:
- Better patch split
  Patches 01~06 are refactors/cleanups/preparations.
  Patches 07~13 do the conversion while still handling both the old and
  new bio split timings.
  Finally, patches 14~16 convert the bio split call sites one by one to
  the newer facility.
  The final patch is just a small cleanup.

- Various bug fixes
  During the full fstests run, various stupid bugs were exposed and
  fixed.

v2:
- Fix the error paths for allocated but never submitted bios
  There are many error paths where we allocate a bio but it goes to
  bio_endio() directly without going through btrfs_map_bio().
  The new ASSERT()s in endio functions require a populated btrfs_bio::iter,
  thus for such bios we still need to call btrfs_bio_save_iter() to
  populate btrfs_bio::iter and prevent those ASSERT()s from triggering.

- Fix scrub_stripe_index_and_offset() which abuses stripe_len and
  mapped_length

v3:
- Rebased to latest misc-next
  This adds an extra patch to remove the btrfs_get_io_geometry() call
  used by encoded read.

[BACKGROUND]

Currently btrfs never uses bio_split() to split its bio against RAID
stripe boundaries.

Instead, inside btrfs we check our stripe boundary every time we allocate
a new bio, and ensure the new bio never crosses stripe boundaries.

Goldwyn is currently using btrfs_bio_clone() to clone the bio from iomap,
then using the cloned bio to do buffered read/write.

With this patchset, integrating that later iomap work will be much
easier.

[PROBLEMS]

Although this behavior works fine, it's against the common practice used in
stacked drivers, and is making the effort to convert to iomap harder.

There is also a hidden burden: every time we allocate a new bio, we use
BIO_MAX_VECS, but since we know the boundaries, for RAID0/RAID10 we can
only fit at most 16 pages (fixed 64K stripe size, and 4K page size),
leaving most of the 256 slots we allocated unused.
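
The slot arithmetic above can be checked with a small userspace sketch.
The constants come from the paragraph above (64K stripe, 4K page, 256
slots); the helper names are made up for illustration and are not btrfs
functions.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN_64K	(64u * 1024u)	/* fixed stripe length, per the text */
#define PAGE_SIZE_4K	(4u * 1024u)	/* assumed 4K page size */
#define MAX_BVEC_SLOTS	256u		/* slots allocated per bio, per the text */

/* Hypothetical helper: how many whole pages fit into one stripe. */
static inline uint32_t max_pages_per_stripe(uint32_t stripe_len,
					    uint32_t page_size)
{
	assert(page_size != 0 && stripe_len % page_size == 0);
	return stripe_len / page_size;
}

/* Hypothetical helper: slots that can never be used by a stripe-bound bio. */
static inline uint32_t unused_bvec_slots(uint32_t stripe_len,
					 uint32_t page_size,
					 uint32_t max_slots)
{
	uint32_t used = max_pages_per_stripe(stripe_len, page_size);

	return used >= max_slots ? 0 : max_slots - used;
}
```

With these numbers only 16 of the 256 slots can ever be filled, so 240
stay unused for every stripe-bound bio.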

[CHALLENGES]

This patchset attempts to improve the situation by moving the bio split
to btrfs_map_bio() time, so upper layers no longer need to bother with
splitting bios against RAID stripes or even chunk boundaries.

But there are several challenges:

- Conflicts in various endio functions
  We want to keep the existing granularity instead of using chained endio,
  thus the involved endio functions must be able to handle split bios.

  Although most endio functions already do their work independently of
  the bio size, they do not yet fully handle split bios.

  This patchset converts them to use the saved bi_iter and only iterate
  the split range instead of the whole bio.
  This change involves 3 types of IO:

  * Buffered IO
    Including both data and metadata
  * Direct IO
  * Compressed IO

  Their endio functions need different levels of updates to handle split
  bios.

  Furthermore, there is another endio function, end_workqueue_bio(), which
  can't handle split bios at all, thus we change the timing so that
  btrfs_bio_wq_end_io() is only called after the bio has been split.

- Checksum verification
  Currently we rely on btrfs_bio::csum to contain the checksum for the
  whole bio.
  If a bio gets split, csum will no longer point to the correct
  location for the split bio.

  This can be solved by introducing btrfs_bio::offset_to_original, and
  using that new member to calculate where we should read csum from.

  The parent bio still has btrfs_bio::csum for the whole bio,
  thus it can still free it correctly.

- Independent endio for each split bio
  Unlike stacked drivers, for RAID10 btrfs needs to make a best effort to
  read every sector, to handle the following case (X means bad, either
  unable to read or failing checksum verification; V means good):

  Dev 1 (missing) | D1 (X) |
  Dev 2 (OK)      | D1 (V) |
  Dev 3 (OK)      | D2 (V) |
  Dev 4 (OK)      | D2 (X) |

  In the above RAID10 case, dev1 is missing, and although dev4 is fine,
  its D2 sector is corrupted (by bit rot or whatever).

  If we use bio_chain(), the read bio covering both D1 and D2 will be
  split, and since D1 on dev1 is missing, the whole D1 and D2 read will be
  marked as an error, thus we will try to read from dev2 and dev4.

  But D2 on dev4 has a csum mismatch, so we can only rebuild D1 and D2
  correctly from dev2:D1 and dev3:D2.

  This patchset resolves this by saving bi_iter into btrfs_bio::iter, and
  using that at endio time to iterate only the split part of a bio.
  Other than this, the existing read/write page endio functions can handle
  split bios without problems.

- Bad RAID56 naming/functionality
  Quite a few RAID56 call sites rely on specific behavior of
  __btrfs_map_block(), like returning @map_length as stripe_len rather
  than the real mapped length.

  This is handled by some small cleanups specific to RAID56.
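
For the checksum verification item above, one way to picture the csum
lookup for a split bio is the userspace sketch below. It assumes one
checksum of csum_size bytes per sector and that offset_to_original is a
sector-aligned byte offset into the original bio. The helper name is
hypothetical, and the actual patches may compute this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset into the parent's csum array at which
 * the checksums of a split bio start.
 *
 * Assumption: the parent csum array holds one csum_size-byte checksum per
 * sector, in the same order as the sectors of the original bio.
 */
static inline size_t split_csum_offset(uint64_t offset_to_original,
				       uint32_t sectorsize,
				       uint32_t csum_size)
{
	assert(sectorsize != 0 && offset_to_original % sectorsize == 0);
	return (size_t)(offset_to_original / sectorsize) * csum_size;
}
```

For example, with 4K sectors and 4-byte checksums, a split bio starting
64K into the original bio would read its checksums starting 64 bytes into
the parent's array, while the parent keeps ownership of the array itself.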

Qu Wenruo (18):
  btrfs: update an stale comment on btrfs_submit_bio_hook()
  btrfs: save bio::bi_iter into btrfs_bio::iter before any endio
  btrfs: use correct bio size for error message in btrfs_end_dio_bio()
  btrfs: refactor btrfs_map_bio()
  btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
  btrfs: replace btrfs_dio_private::refs with
    btrfs_dio_private::pending_bytes
  btrfs: introduce btrfs_bio_split() helper
  btrfs: make data buffered read path to handle split bio properly
  btrfs: make data buffered write endio function to be split bio
    compatible
  btrfs: make metadata write endio functions to be split bio compatible
  btrfs: make dec_and_test_compressed_bio() to be split bio compatible
  btrfs: return proper mapped length for RAID56 profiles in
    __btrfs_map_block()
  btrfs: allow btrfs_map_bio() to split bio according to chunk stripe
    boundaries
  btrfs: remove buffered IO stripe boundary calculation
  btrfs: remove stripe boundary calculation for compressed IO
  btrfs: remove the stripe boundary calculation for direct IO
  btrfs: remove the stripe boundary calcluation for encoded IO
  btrfs: unexport btrfs_get_io_geometry()

 fs/btrfs/btrfs_inode.h |  10 +-
 fs/btrfs/compression.c |  70 +++----------
 fs/btrfs/disk-io.c     |  11 +-
 fs/btrfs/extent_io.c   | 214 ++++++++++++++++++++++++++------------
 fs/btrfs/extent_io.h   |   2 +
 fs/btrfs/inode.c       | 230 +++++++++++++++--------------------------
 fs/btrfs/raid56.c      |  14 ++-
 fs/btrfs/raid56.h      |   2 +-
 fs/btrfs/scrub.c       |  14 +--
 fs/btrfs/volumes.c     | 144 +++++++++++++++++++-------
 fs/btrfs/volumes.h     |  74 +++++++++++--
 11 files changed, 452 insertions(+), 333 deletions(-)

-- 
2.35.1



* [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-15  7:43   ` Christoph Hellwig
  2022-03-14  9:07 ` [PATCH v3 02/18] btrfs: save bio::bi_iter into btrfs_bio::iter before any endio Qu Wenruo
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This function has been renamed to btrfs_submit_data_bio(). Update the
comment and add an extra reason why btrfs_submit_dio_bio() doesn't
completely follow the same rules as btrfs_submit_data_bio().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2e7143ff5523..dded46291637 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7893,7 +7893,13 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
-	/* Check btrfs_submit_bio_hook() for rules about async submit. */
+	/*
+	 * Check btrfs_submit_data_bio() for rules about async submit.
+	 *
+	 * The only exception is for RAID56, when there are more than one bios
+	 * to submit, async submit seems to make it harder to collect csums
+	 * for the full stripe.
+	 */
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
-- 
2.35.1



* [PATCH v3 02/18] btrfs: save bio::bi_iter into btrfs_bio::iter before any endio
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 03/18] btrfs: use correct bio size for error message in btrfs_end_dio_bio() Qu Wenruo
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

Currently btrfs_bio::iter is only utilized by direct IO.

But later we will utilize btrfs_bio::iter to record the original
bi_iter, for all endio functions to iterate the original range.

Thus this patch will introduce a new helper, btrfs_bio_save_iter(), to
save bi_iter into btrfs_bio::iter.

All paths that can lead to a bio_endio() call need such a
btrfs_bio_save_iter() call.

In the most common case, there will be a btrfs_map_bio() call to handle
submitted bios.

For the other error paths, we need to call btrfs_bio_save_iter()
manually, or later endio functions will ASSERT() on an empty
btrfs_bio::iter.
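
Why the copy has to be taken before submission can be illustrated with a
small userspace model. All struct and function names below are
hypothetical stand-ins, not the kernel types; the only assumption is that
lower layers advance the live iterator as they consume the bio.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bvec_iter. */
struct iter_model {
	uint64_t sector;	/* current start, in 512-byte sectors */
	uint32_t size;		/* remaining bytes */
};

/* Stand-in for struct bio: only the live iterator is modelled. */
struct bio_model {
	struct iter_model iter;
};

/* Stand-in for struct btrfs_bio: the bio plus the saved iterator copy. */
struct bbio_model {
	struct bio_model bio;
	struct iter_model saved;
};

/* Take the snapshot, mirroring what btrfs_bio_save_iter() does. */
static inline void bbio_model_save_iter(struct bbio_model *bbio)
{
	bbio->saved = bbio->bio.iter;
}

/* Model lower layers consuming @bytes of the bio. */
static inline void bio_model_advance(struct bio_model *bio, uint32_t bytes)
{
	assert(bytes <= bio->iter.size && bytes % 512 == 0);
	bio->iter.sector += bytes >> 9;
	bio->iter.size -= bytes;
}
```

Once the live iterator has been fully advanced its size is 0, while the
saved copy still describes the range that was originally submitted.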

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c |  3 +++
 fs/btrfs/disk-io.c     |  2 ++
 fs/btrfs/extent_io.c   |  7 +++++++
 fs/btrfs/raid56.c      |  2 ++
 fs/btrfs/volumes.c     |  1 +
 fs/btrfs/volumes.h     | 17 +++++++++++++++++
 6 files changed, 32 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index be476f094300..1515c3c507a6 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -879,6 +879,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	/* include any pages we added in add_ra-bio_pages */
 	cb->len = bio->bi_iter.bi_size;
 
+	/* Save bi_iter so that end_bio_extent_readpage() won't freak out. */
+	btrfs_bio_save_iter(btrfs_bio(bio));
+
 	while (cur_disk_byte < disk_bytenr + compressed_len) {
 		u64 offset = cur_disk_byte - disk_bytenr;
 		unsigned int index = offset >> PAGE_SHIFT;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 09693ab4fde0..258ff67631e3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -824,6 +824,7 @@ static void run_one_async_done(struct btrfs_work *work)
 	/* If an error occurred we just want to clean up the bio and move on */
 	if (async->status) {
 		async->bio->bi_status = async->status;
+		btrfs_bio_save_iter(btrfs_bio(async->bio));
 		bio_endio(async->bio);
 		return;
 	}
@@ -956,6 +957,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 
 out_w_error:
 	bio->bi_status = ret;
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	bio_endio(bio);
 	return ret;
 }
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 78486bbd1ac9..cd23ea793838 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -174,6 +174,11 @@ int __must_check submit_one_bio(struct bio *bio, int mirror_num,
 
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bio->bi_iter.bi_size);
+	/*
+	 * This is for later endio on errors, as later endio functions will
+	 * rely on btrfs_bio::iter.
+	 */
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	if (is_data_inode(tree->private_data))
 		ret = btrfs_submit_data_bio(tree->private_data, bio, mirror_num,
 					    bio_flags);
@@ -191,6 +196,7 @@ static void end_write_bio(struct extent_page_data *epd, int ret)
 
 	if (bio) {
 		bio->bi_status = errno_to_blk_status(ret);
+		btrfs_bio_save_iter(btrfs_bio(bio));
 		bio_endio(bio);
 		epd->bio_ctrl.bio = NULL;
 	}
@@ -3357,6 +3363,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 error:
 	bio_ctrl->bio = NULL;
 	bio->bi_status = errno_to_blk_status(ret);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	bio_endio(bio);
 	return ret;
 }
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0e239a4c3b26..13e726c88a81 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 		return PTR_ERR(rbio);
 	}
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 	rbio->operation = BTRFS_RBIO_WRITE;
 
@@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 
 	rbio->operation = BTRFS_RBIO_READ_REBUILD;
 	bio_list_add(&rbio->bio_list, bio);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	rbio->bio_list_bytes = bio->bi_iter.bi_size;
 
 	rbio->faila = find_logical_bio_stripe(rbio, bio);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1be7cb2f955f..9bc48a8368e8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6786,6 +6786,7 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	map_length = length;
 
 	btrfs_bio_counter_inc_blocked(fs_info);
+	btrfs_bio_save_iter(btrfs_bio(bio));
 	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
 				&map_length, &bioc, mirror_num, 1);
 	if (ret) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bd297f23d19e..d600419fe6a5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -332,6 +332,12 @@ struct btrfs_bio {
 	struct btrfs_device *device;
 	u8 *csum;
 	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+	/*
+	 * Saved bio::bi_iter before submission.
+	 *
+	 * This allows us to iterate the cloned/split bio properly, as at
+	 * endio time bio::bi_iter is no longer reliable.
+	 */
 	struct bvec_iter iter;
 
 	/*
@@ -354,6 +360,17 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 	}
 }
 
+/*
+ * To save bbio::bio->bi_iter into bbio::iter so for callers who need the
+ * original bi_iter can access the original part of the bio.
+ * This is especially important for the incoming split btrfs_bio, which needs
+ * to call its endio for and only for the split range.
+ */
+static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio)
+{
+	bbio->iter = bbio->bio.bi_iter;
+}
+
 struct btrfs_io_stripe {
 	struct btrfs_device *dev;
 	u64 physical;
-- 
2.35.1



* [PATCH v3 03/18] btrfs: use correct bio size for error message in btrfs_end_dio_bio()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 02/18] btrfs: save bio::bi_iter into btrfs_bio::iter before any endio Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 04/18] btrfs: refactor btrfs_map_bio() Qu Wenruo
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

At endio time, bio->bi_iter is no longer valid (in some cases it is
still valid, but that is never guaranteed).

Thus if we really want to get the full size of the bio, we have to
iterate its bvecs.

In btrfs_end_dio_bio(), when we hit an error, we grab the bio size from
bi_iter, which can be wrong.

Fix it by iterating the bvecs and calculating the bio size.
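
The idea of summing segment lengths instead of trusting iterator state
can be sketched in userspace as follows. struct seg_model and
segs_total_size() are hypothetical names used only for illustration; the
patch itself uses bio_for_each_segment_all().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bio_vec: only the length matters here. */
struct seg_model {
	uint32_t len;
};

/* Sum the lengths of all segments, independent of any iterator state. */
static uint32_t segs_total_size(const struct seg_model *segs, size_t nr_segs)
{
	uint32_t total = 0;
	size_t i;

	assert(segs != NULL || nr_segs == 0);
	for (i = 0; i < nr_segs; i++)
		total += segs[i].len;
	return total;
}
```

Because the sum only depends on the segment array, it gives the same
answer whether or not the live iterator has already been advanced.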

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dded46291637..dbcb4ae9e06d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7866,12 +7866,19 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	struct btrfs_dio_private *dip = bio->bi_private;
 	blk_status_t err = bio->bi_status;
 
-	if (err)
+	if (err) {
+		struct bvec_iter_all iter_all;
+		struct bio_vec *bvec;
+		u32 bi_size = 0;
+
+		bio_for_each_segment_all(bvec, bio, iter_all)
+			bi_size += bvec->bv_len;
+
 		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
 			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
 			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
-			   bio->bi_opf, bio->bi_iter.bi_sector,
-			   bio->bi_iter.bi_size, err);
+			   bio->bi_opf, bio->bi_iter.bi_sector, bi_size, err);
+	}
 
 	if (bio_op(bio) == REQ_OP_READ)
 		err = btrfs_check_read_dio_bio(dip, btrfs_bio(bio), !err);
-- 
2.35.1



* [PATCH v3 04/18] btrfs: refactor btrfs_map_bio()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 03/18] btrfs: use correct bio size for error message in btrfs_end_dio_bio() Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 05/18] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio() Qu Wenruo
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

Currently in btrfs_map_bio() we call __btrfs_map_block(), then use the
returned bioc to submit the real stripes.

This is fine if we're only going to handle one bio at a time.

For the incoming bio split at btrfs_map_bio() time, we want to handle
several different bios, thus we introduce a new helper,
submit_one_mapped_range(), to handle the submission part, making it much
easier to run in a loop.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 67 ++++++++++++++++++++++++++++------------------
 1 file changed, 41 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9bc48a8368e8..10a5db07836b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6769,30 +6769,15 @@ static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logic
 	}
 }
 
-blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num)
+static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bio,
+				   struct btrfs_io_context *bioc, u64 map_length,
+				   int mirror_num)
 {
-	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
-	u64 logical = bio->bi_iter.bi_sector << 9;
-	u64 length = 0;
-	u64 map_length;
-	int ret;
-	int dev_nr;
+	u64 logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	int total_devs;
-	struct btrfs_io_context *bioc = NULL;
-
-	length = bio->bi_iter.bi_size;
-	map_length = length;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	btrfs_bio_save_iter(btrfs_bio(bio));
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
-	}
+	int dev_nr;
+	int ret;
 
 	total_devs = bioc->num_stripes;
 	bioc->orig_bio = first_bio;
@@ -6811,18 +6796,19 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 						    mirror_num, 1);
 		}
 
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
+		return ret;
 	}
 
-	if (map_length < length) {
+	if (map_length < bio->bi_iter.bi_size) {
 		btrfs_crit(fs_info,
-			   "mapping failed logical %llu bio len %llu len %llu",
-			   logical, length, map_length);
+			   "mapping failed logical %llu bio len %u len %llu",
+			   logical, bio->bi_iter.bi_size, map_length);
 		BUG();
 	}
 
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
+		struct btrfs_device *dev;
+
 		dev = bioc->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
@@ -6839,6 +6825,35 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
 	}
+	return 0;
+}
+
+blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			   int mirror_num)
+{
+	u64 logical = bio->bi_iter.bi_sector << 9;
+	u64 length = 0;
+	u64 map_length;
+	int ret;
+	struct btrfs_io_context *bioc = NULL;
+
+	length = bio->bi_iter.bi_size;
+	map_length = length;
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	btrfs_bio_save_iter(btrfs_bio(bio));
+	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
+				&map_length, &bioc, mirror_num, 1);
+	if (ret) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
+
+	ret = submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num);
+	if (ret < 0) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
 	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
 }
-- 
2.35.1



* [PATCH v3 05/18] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (3 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 04/18] btrfs: refactor btrfs_map_bio() Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 06/18] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes Qu Wenruo
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This is a preparation patch for the incoming chunk mapping layer bio
split.

Function btrfs_bio_wq_end_io() remaps bio::bi_private and
bio::bi_end_io so that the real endio function will be executed in a
workqueue.

The problem is, the remapped bio::bi_private points to newly allocated
memory, and after the original endio has executed, that memory is freed.

This will not work well with split bios.

So this patch moves all btrfs_bio_wq_end_io() calls into one helper
function, btrfs_bio_final_endio_remap(), and calls that helper in
submit_stripe_bio().

This refactor also unifies all data bio behaviors.

Before this patch, compressed bios, whether read or write, are always
delayed using a workqueue.

However, all data write operations are already delayed using ordered
extents, and metadata writes don't need any delayed execution.

Thus this patch makes compressed bios follow the same data
read/write behavior.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c |  4 +---
 fs/btrfs/disk-io.c     |  9 +--------
 fs/btrfs/inode.c       | 20 +++++---------------
 fs/btrfs/volumes.c     | 41 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h     |  9 ++++++++-
 5 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1515c3c507a6..5c9b28f2f034 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -430,10 +430,8 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 {
 	blk_status_t ret;
 
+	btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 	ASSERT(bio->bi_iter.bi_size);
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	return ret;
 }
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 258ff67631e3..d18871876318 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -928,14 +928,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	blk_status_t ret;
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		/*
-		 * called for a read, do the setup so that checksum validation
-		 * can happen in the async kernel threads
-		 */
-		ret = btrfs_bio_wq_end_io(fs_info, bio,
-					  BTRFS_WQ_ENDIO_METADATA);
-		if (ret)
-			goto out_w_error;
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_METADATA;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
 		ret = btree_csum_one_bio(bio);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index dbcb4ae9e06d..08b1c1b05d1f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2517,7 +2517,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
+	enum btrfs_wq_endio_type endio_type = BTRFS_WQ_ENDIO_DATA;
 	blk_status_t ret = 0;
 	int skip_sum;
 	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
@@ -2526,7 +2526,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
 
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
-		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
+		endio_type = BTRFS_WQ_ENDIO_FREE_SPACE;
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 		struct page *page = bio_first_bvec_all(bio)->bv_page;
@@ -2538,10 +2538,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	}
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
-		if (ret)
-			goto out;
-
+		btrfs_bio(bio)->endio_type = endio_type;
 		if (bio_flags & EXTENT_BIO_COMPRESSED) {
 			/*
 			 * btrfs_submit_compressed_read will handle completing
@@ -7781,10 +7778,6 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
-
 	refcount_inc(&dip->refs);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	if (ret)
@@ -7910,11 +7903,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
-	if (!write) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-		if (ret)
-			goto err;
-	}
+	if (!write)
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		goto map;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 10a5db07836b..d2b9cba1e5fd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6717,10 +6717,31 @@ static void btrfs_end_bio(struct bio *bio)
 	}
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
-			      u64 physical, struct btrfs_device *dev)
+/*
+ * Endio remaps which can't handle cloned bio needs to go here.
+ *
+ * Currently it's only btrfs_bio_wq_end_io().
+ */
+static int btrfs_bio_final_endio_remap(struct btrfs_fs_info *fs_info,
+				       struct bio *bio)
+{
+	blk_status_t sts;
+
+	/* For write bio, we don't need to put their endio into wq */
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		return 0;
+
+	sts = btrfs_bio_wq_end_io(fs_info, bio, btrfs_bio(bio)->endio_type);
+	if (sts != BLK_STS_OK)
+		return blk_status_to_errno(sts);
+	return 0;
+}
+
+static int submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
+			     u64 physical, struct btrfs_device *dev)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	int ret;
 
 	bio->bi_private = bioc;
 	btrfs_bio(bio)->device = dev;
@@ -6747,9 +6768,14 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 		dev->devid, bio->bi_iter.bi_size);
 	bio_set_dev(bio, dev->bdev);
 
-	btrfs_bio_counter_inc_noblocked(fs_info);
+	/* Do the final endio remap if needed */
+	ret = btrfs_bio_final_endio_remap(fs_info, bio);
+	if (ret < 0)
+		return ret;
 
+	btrfs_bio_counter_inc_noblocked(fs_info);
 	btrfsic_submit_bio(bio);
+	return ret;
 }
 
 static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
@@ -6823,9 +6849,16 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi
 		else
 			bio = first_bio;
 
-		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
+		ret = submit_stripe_bio(bioc, bio,
+					bioc->stripes[dev_nr].physical, dev);
+		if (ret < 0)
+			goto error;
 	}
 	return 0;
+error:
+	for (; dev_nr < total_devs; dev_nr++)
+		bioc_error(bioc, first_bio, logical);
+	return ret;
 }
 
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index d600419fe6a5..6f5519241971 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -326,7 +326,14 @@ struct btrfs_fs_devices {
  * Mostly for btrfs specific features like csum and mirror_num.
  */
 struct btrfs_bio {
-	unsigned int mirror_num;
+	u16 mirror_num;
+
+	/*
+	 * To tell which workqueue the bio's endio should be executed in.
+	 *
+	 * Only for read bios.
+	 */
+	u16 endio_type;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
-- 
2.35.1



* [PATCH v3 06/18] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (4 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 05/18] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio() Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 07/18] btrfs: introduce btrfs_bio_split() helper Qu Wenruo
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This mostly follows the behavior of compressed_bio::pending_sectors.

The point here is that dip::refs is not split-bio friendly: if a bio has
its bi_private = dip and the bio gets split, we can easily underflow
dip::refs.

By using the same kind of size-based accounting as compressed_bio (here
counted in bytes), dio can handle both unsplit and split bios.

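The byte-based accounting can be modelled in userspace with C11 atomics.
This is only a sketch: the struct and function names are hypothetical,
atomic_fetch_sub() stands in for the kernel's atomic_sub_and_test(), and
the error flag is made atomic here although the patch uses a plain bool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fields this patch adds to btrfs_dio_private. */
struct dio_private_model {
	uint32_t bytes;			/* total size of the DIO bio */
	atomic_uint pending_bytes;	/* bytes still under IO or unsubmitted */
	atomic_bool errors;		/* set once any part hit an error */
};

static void dio_model_init(struct dio_private_model *dip, uint32_t bytes)
{
	dip->bytes = bytes;
	atomic_init(&dip->pending_bytes, bytes);
	atomic_init(&dip->errors, false);
}

/*
 * Account for @bytes of finished IO. Returns true only for the caller
 * that completes the last pending bytes, no matter how many pieces the
 * original bio was split into.
 */
static bool dio_model_dec_and_test(struct dio_private_model *dip, bool error,
				   uint32_t bytes)
{
	assert(bytes <= dip->bytes);
	if (error)
		atomic_store(&dip->errors, true);
	/* atomic_fetch_sub() returns the value before the subtraction. */
	return atomic_fetch_sub(&dip->pending_bytes, bytes) == bytes;
}
```

A bio split into two halves then finishes exactly once: the first half
returns false, the second returns true, and an error on either half is
remembered for the final completion.
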
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/btrfs_inode.h | 10 +++----
 fs/btrfs/inode.c       | 67 +++++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 47e72d72f7d0..709d6840aada 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -396,11 +396,11 @@ struct btrfs_dio_private {
 	/* Used for bio::bi_size */
 	u32 bytes;
 
-	/*
-	 * References to this structure. There is one reference per in-flight
-	 * bio plus one while we're still setting up.
-	 */
-	refcount_t refs;
+	/* Hit any error for the whole DIO bio */
+	bool errors;
+
+	/* How many bytes are still under IO or not submitted */
+	atomic_t pending_bytes;
 
 	/* dio_bio came from fs/direct-io.c */
 	struct bio *dio_bio;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 08b1c1b05d1f..bcbb47ca473e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7744,20 +7744,28 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length,
 	return ret;
 }
 
-static void btrfs_dio_private_put(struct btrfs_dio_private *dip)
+static bool dec_and_test_dio_private(struct btrfs_dio_private *dip, bool error,
+				     u32 bytes)
 {
-	/*
-	 * This implies a barrier so that stores to dio_bio->bi_status before
-	 * this and loads of dio_bio->bi_status after this are fully ordered.
-	 */
-	if (!refcount_dec_and_test(&dip->refs))
+	ASSERT(bytes <= dip->bytes);
+	ASSERT(bytes <= atomic_read(&dip->pending_bytes));
+
+	if (error)
+		dip->errors = true;
+	return atomic_sub_and_test(bytes, &dip->pending_bytes);
+}
+
+static void dio_private_finish(struct btrfs_dio_private *dip, bool error,
+			       u32 bytes)
+{
+	if (!dec_and_test_dio_private(dip, error, bytes))
 		return;
 
 	if (btrfs_op(dip->dio_bio) == BTRFS_MAP_WRITE) {
 		__endio_write_update_ordered(BTRFS_I(dip->inode),
 					     dip->file_offset,
 					     dip->bytes,
-					     !dip->dio_bio->bi_status);
+					     !dip->errors);
 	} else {
 		unlock_extent(&BTRFS_I(dip->inode)->io_tree,
 			      dip->file_offset,
@@ -7778,10 +7786,10 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	refcount_inc(&dip->refs);
+	atomic_add(bio->bi_iter.bi_size, &dip->pending_bytes);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	if (ret)
-		refcount_dec(&dip->refs);
+		atomic_sub(bio->bi_iter.bi_size, &dip->pending_bytes);
 	return ret;
 }
 
@@ -7857,20 +7865,20 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
 static void btrfs_end_dio_bio(struct bio *bio)
 {
 	struct btrfs_dio_private *dip = bio->bi_private;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
+	u32 bi_size = 0;
 	blk_status_t err = bio->bi_status;
 
-	if (err) {
-		struct bvec_iter_all iter_all;
-		struct bio_vec *bvec;
-		u32 bi_size = 0;
-
-		bio_for_each_segment_all(bvec, bio, iter_all)
-			bi_size += bvec->bv_len;
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter)
+		bi_size += bvec.bv_len;
 
+	if (err) {
 		btrfs_warn(BTRFS_I(dip->inode)->root->fs_info,
 			   "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d",
 			   btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio),
 			   bio->bi_opf, bio->bi_iter.bi_sector, bi_size, err);
+		dip->errors = true;
 	}
 
 	if (bio_op(bio) == REQ_OP_READ)
@@ -7882,7 +7890,7 @@ static void btrfs_end_dio_bio(struct bio *bio)
 	btrfs_record_physical_zoned(dip->inode, dip->file_offset, bio);
 
 	bio_put(bio);
-	btrfs_dio_private_put(dip);
+	dio_private_finish(dip, err, bi_size);
 }
 
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
@@ -7941,7 +7949,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
  */
 static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 							  struct inode *inode,
-							  loff_t file_offset)
+							  loff_t file_offset,
+							  u32 length)
 {
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
 	const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM);
@@ -7961,12 +7970,12 @@ static struct btrfs_dio_private *btrfs_create_dio_private(struct bio *dio_bio,
 	if (!dip)
 		return NULL;
 
+	atomic_set(&dip->pending_bytes, length);
 	dip->inode = inode;
 	dip->file_offset = file_offset;
 	dip->bytes = dio_bio->bi_iter.bi_size;
 	dip->disk_bytenr = dio_bio->bi_iter.bi_sector << 9;
 	dip->dio_bio = dio_bio;
-	refcount_set(&dip->refs, 1);
 	return dip;
 }
 
@@ -7980,6 +7989,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
+	const u32 length = dio_bio->bi_iter.bi_size;
+	u32 submitted_bytes = 0;
 	u64 start_sector;
 	int async_submit = 0;
 	u64 submit_len;
@@ -7992,7 +8003,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	struct btrfs_dio_data *dio_data = iter->iomap.private;
 	struct extent_map *em = NULL;
 
-	dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
+	dip = btrfs_create_dio_private(dio_bio, inode, file_offset, length);
 	if (!dip) {
 		if (!write) {
 			unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
@@ -8002,7 +8013,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		bio_endio(dio_bio);
 		return;
 	}
-
 	if (!write) {
 		/*
 		 * Load the csums up front to reduce csum tree searches and
@@ -8056,17 +8066,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		ASSERT(submit_len >= clone_len);
 		submit_len -= clone_len;
 
-		/*
-		 * Increase the count before we submit the bio so we know
-		 * the end IO handler won't happen before we increase the
-		 * count. Otherwise, the dip might get freed before we're
-		 * done setting it up.
-		 *
-		 * We transfer the initial reference to the last bio, so we
-		 * don't need to increment the reference count for the last one.
-		 */
 		if (submit_len > 0) {
-			refcount_inc(&dip->refs);
 			/*
 			 * If we are submitting more than one bio, submit them
 			 * all asynchronously. The exception is RAID 5 or 6, as
@@ -8081,11 +8081,10 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 						async_submit);
 		if (status) {
 			bio_put(bio);
-			if (submit_len > 0)
-				refcount_dec(&dip->refs);
 			goto out_err_em;
 		}
 
+		submitted_bytes += clone_len;
 		dio_data->submitted += clone_len;
 		clone_offset += clone_len;
 		start_sector += clone_len >> 9;
@@ -8099,7 +8098,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 	free_extent_map(em);
 out_err:
 	dip->dio_bio->bi_status = status;
-	btrfs_dio_private_put(dip);
+	dio_private_finish(dip, status, length - submitted_bytes);
 }
 
 const struct iomap_ops btrfs_dio_iomap_ops = {
-- 
2.35.1



* [PATCH v3 07/18] btrfs: introduce btrfs_bio_split() helper
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (5 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 06/18] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 08/18] btrfs: make data buffered read path to handle split bio properly Qu Wenruo
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This new function handles the split of a btrfs bio, to cooperate with
the upcoming bio split at chunk mapping time.

This patch will introduce the following new members and functions:

- btrfs_bio::offset_to_original
  Since btrfs_bio::csum still stores the checksums for the original
  logical bytenr, we need to know the offset between the current
  (advanced) bio and the original logical bytenr.

  Thus we need this new member.
  It fits into the existing hole between btrfs_bio::mirror_num and
  btrfs_bio::device, so it should not increase the memory usage of
  btrfs_bio.

- btrfs_bio::parent and btrfs_bio::orig_endio
  To record where the parent bio is and the original endio function.

- btrfs_bio::is_split_bio
  To distinguish bios created by btrfs_bio_split() from those created
  by btrfs_bio_clone*().

  A cloned bio still has its csum pointing to the correct memory,
  while a split bio must rely on its parent bbio to grab the csum pointer.

- split_bio_endio()
  Just calls the original endio function, then calls bio_endio() on
  the parent bio.
  This ensures the parent bio is freed after all split bios.

- btrfs_bio_split()
  Splits the original bio into two. The behavior is pretty much the same
  as bio_split(), just with extra btrfs specific setup.

Currently there is no caller utilizing the above new members/functions
yet.
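
The bookkeeping done at split time can be sketched in userspace as
follows. This is only a model under stated assumptions: struct
bbio_sketch and bbio_sketch_split() are hypothetical names, the
"remaining" field merely stands in for the effect of
bio_inc_remaining(), and no real bvecs are involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the btrfs_bio fields touched by a split. */
struct bbio_sketch {
	unsigned int offset_to_original; /* bytes from original logical bytenr */
	unsigned int size;		 /* bytes covered by this bio */
	int is_split;			 /* set only on the split-off piece */
	struct bbio_sketch *parent;	 /* set only on the split-off piece */
	int remaining;			 /* models the parent's remaining count */
};

/* Split the first @bytes off @src and return the new front piece. */
static struct bbio_sketch bbio_sketch_split(struct bbio_sketch *src,
					    unsigned int bytes)
{
	struct bbio_sketch new_piece;

	assert(!src->is_split);
	assert(bytes < src->size);

	/* The parent must not complete before the split piece does. */
	src->remaining++;

	/* The front piece inherits the old offset. */
	new_piece.offset_to_original = src->offset_to_original;
	new_piece.size = bytes;
	new_piece.is_split = 1;
	new_piece.parent = src;
	new_piece.remaining = 1;

	/* The parent advances past the bytes that were split off. */
	src->offset_to_original += bytes;
	src->size -= bytes;
	return new_piece;
}
```

Splitting a 12K parent twice by 4K yields pieces at offsets 0 and 4K,
while the parent ends up covering the last 4K at offset 8K.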

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 82 +++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h |  2 ++
 fs/btrfs/volumes.h   | 43 +++++++++++++++++++++--
 3 files changed, 123 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cd23ea793838..e3ae20058cea 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3010,7 +3010,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	int ret;
 	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
@@ -3194,6 +3193,87 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 	return bio;
 }
 
+/*
+ * A very simple wrapper to call original endio function and then
+ * call bio_endio() on the parent bio to decrease its bi_remaining count.
+ */
+static void split_bio_endio(struct bio *bio)
+{
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+	/* After endio bbio could be freed, thus grab the info before endio */
+	struct bio *parent = bbio->parent;
+
+	/*
+	 * BIO_CLONED can even be set for our parent bio (DIO clones
+	 * the initial bio, then uses the cloned one for IO).
+	 * So here we don't check BIO_CLONED for parent.
+	 */
+	ASSERT(bio_flagged(bio, BIO_CLONED) && bbio->is_split_bio);
+	ASSERT(parent && !btrfs_bio(parent)->is_split_bio);
+
+	bio->bi_end_io = bbio->orig_endio;
+	bio_endio(bio);
+	bio_endio(parent);
+}
+
+/*
+ * Pretty much like bio_split(), caller needs to ensure @src is not freed
+ * before the newly allocated bio, as the new bio is relying on @src for
+ * its bvecs.
+ */
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes)
+{
+	struct bio *new;
+	struct btrfs_bio *src_bbio = btrfs_bio(src);
+	struct btrfs_bio *new_bbio;
+	const unsigned int old_offset = src_bbio->offset_to_original;
+
+	/* Src should not be split */
+	ASSERT(!src_bbio->is_split_bio);
+	ASSERT(IS_ALIGNED(bytes, fs_info->sectorsize));
+	ASSERT(bytes < src->bi_iter.bi_size);
+
+	/*
+	 * We're in fact chaining the new bio to the parent, but we still want
+	 * to have independent bi_private/bi_endio, thus we need to manually
+	 * increase the remaining for the source, just like bio_chain().
+	 */
+	bio_inc_remaining(src);
+
+	/* Bioset backed split should not fail */
+	new = bio_split(src, bytes >> SECTOR_SHIFT, GFP_NOFS, &btrfs_bioset);
+	new_bbio = btrfs_bio(new);
+	new_bbio->offset_to_original = old_offset;
+	new_bbio->iter = new->bi_iter;
+	new_bbio->orig_endio = src->bi_end_io;
+	new_bbio->parent = src;
+	new_bbio->endio_type = src_bbio->endio_type;
+	new_bbio->is_split_bio = 1;
+	new->bi_end_io = split_bio_endio;
+
+	/*
+	 * This is very tricky, as if any endio has extra refcount on
+	 * bi_private, we will be screwed up.
+	 *
+	 * We workaround this hacky behavior by reviewing all the involved
+	 * endio stacks. Making sure only split-safe endio remap are called.
+	 *
+	 * Split-unsafe endio remap like btrfs_bio_wq_end_io() will be called
+	 * after btrfs_bio_split().
+	 */
+	new->bi_private = src->bi_private;
+
+	src_bbio->offset_to_original += bytes;
+
+	/*
+	 * For direct IO, @src is a cloned bio thus bbio::iter still points to
+	 * the full bio. Need to update it too.
+	 */
+	src_bbio->iter = src->bi_iter;
+	return new;
+}
+
 /**
  * Attempt to add a page to bio
  *
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 151e9da5da2d..493c2cd96424 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -280,6 +280,8 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
 struct bio *btrfs_bio_clone(struct bio *bio);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6f5519241971..c73d2fbf80a7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -330,15 +330,52 @@ struct btrfs_bio {
 
 	/*
 	 * To tell which workqueue the bio's endio should be exeucted in.
+	 * This member is to make sure btrfs_bio_wq_end_io() is the last
+	 * endio remap in the stack.
 	 *
 	 * Only for read bios.
 	 */
-	u16 endio_type;
+	u8 endio_type;
+
+	/*
+	 * To tell if this btrfs bio is split or just cloned.
+	 * Both btrfs_bio_clone*() and btrfs_bio_split() will make bbio->bio
+	 * to have BIO_CLONED flag.
+	 * But cloned bio still has its bbio::csum pointed to correct memory,
+	 * unlike split bio relies on its parent bbio to grab csum.
+	 *
+	 * Thus we need this extra flag to distinguish those cloned bios.
+	 */
+	u8 is_split_bio;
+
+	/*
+	 * Records the offset we're from the original bio.
+	 *
+	 * Since btrfs_bio can be split, but our csum is always for the
+	 * original logical bytenr, we need a way to know the bytes offset
+	 * from the original logical bytenr to do proper csum verification.
+	 */
+	unsigned int offset_to_original;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
-	u8 *csum;
-	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+
+	union {
+		/*
+		 * For the parent bio recording the csum for the original
+		 * logical bytenr
+		 */
+		struct {
+			u8 *csum;
+			u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+		};
+
+		/* For child (split) bio to record where its parent is */
+		struct {
+			struct bio *parent;
+			bio_end_io_t *orig_endio;
+		};
+	};
 	/*
 	 * Saved bio::bi_iter before submission.
 	 *
-- 
2.35.1



* [PATCH v3 08/18] btrfs: make data buffered read path to handle split bio properly
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (6 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 07/18] btrfs: introduce btrfs_bio_split() helper Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 09/18] btrfs: make data buffered write endio function to be split bio compatible Qu Wenruo
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This involves the following modifications:

- Use __bio_for_each_segment() instead of bio_for_each_segment_all()
  bio_for_each_segment_all() will iterate all bvecs, even if they are
  not referenced by the current bi_iter.

  Change it to a __bio_for_each_segment() call so we won't have endio
  called on the same range by both split and parent bios, and it can
  handle both split and unsplit bios.

- Make check_data_csum() take bbio->offset_to_original into
  consideration
  Since a btrfs bio can be split now, both split and original bios can
  start at some offset from the original logical bytenr.

  Take btrfs_bio::offset_to_original into consideration to get the
  correct checksum offset.

- Remove the BIO_CLONED ASSERT() in submit_read_repair()
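
The checksum offset arithmetic can be sketched as below. The helper
name csum_byte_offset() is hypothetical; it only mirrors how the
per-sector checksum index is derived from offset_to_original plus the
offset inside the current bio, assuming both are sector aligned.

```c
#include <stddef.h>

/*
 * Byte offset into the original checksum array for a sector that sits
 * @bio_offset bytes into a bio which itself starts @offset_to_original
 * bytes after the original logical bytenr.
 */
static size_t csum_byte_offset(unsigned int offset_to_original,
			       unsigned int bio_offset,
			       unsigned int sectorsize_bits,
			       unsigned int csum_size)
{
	size_t sectors = (size_t)(offset_to_original >> sectorsize_bits) +
			 (bio_offset >> sectorsize_bits);

	return sectors * csum_size;
}
```

With 4K sectors (sectorsize_bits = 12) and 4-byte checksums, the second
sector of a bio that starts 8K into the original range uses the
checksum at byte offset 12, while the same sector of an unsplit bio
uses byte offset 4.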

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 34 ++++++++++++++++++----------------
 fs/btrfs/inode.c     | 23 +++++++++++++++++++++--
 fs/btrfs/volumes.h   |  3 ++-
 3 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e3ae20058cea..240bdeec2346 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2739,11 +2739,8 @@ static blk_status_t submit_read_repair(struct inode *inode,
 	/* We're here because we had some read errors or csum mismatch */
 	ASSERT(error_bitmap);
 
-	/*
-	 * We only get called on buffered IO, thus page must be mapped and bio
-	 * must not be cloned.
-	 */
-	ASSERT(page->mapping && !bio_flagged(failed_bio, BIO_CLONED));
+	/* We only get called on buffered IO, thus page must be mapped */
+	ASSERT(page->mapping);
 
 	/* Iterate through all the sectors in the range */
 	for (i = 0; i < nr_bits; i++) {
@@ -2997,7 +2994,8 @@ static struct extent_buffer *find_extent_buffer_readpage(
  */
 static void end_bio_extent_readpage(struct bio *bio)
 {
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
@@ -3008,11 +3006,15 @@ static void end_bio_extent_readpage(struct bio *bio)
 	u32 bio_offset = 0;
 	int mirror;
 	int ret;
-	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	/*
+	 * We should have saved the original bi_iter, and then start iterating
+	 * using that saved iter, as at endio time bi_iter is not reliable.
+	 */
+	ASSERT(bbio->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, bbio->iter) {
 		bool uptodate = !bio->bi_status;
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
@@ -3035,19 +3037,19 @@ static void end_bio_extent_readpage(struct bio *bio)
 		 * for unaligned offsets, and an error if they don't add up to
 		 * a full sector.
 		 */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page read in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_offset + bvec.bv_len,
 				     sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page read with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
-		len = bvec->bv_len;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
+		len = bvec.bv_len;
 
 		mirror = bbio->mirror_num;
 		if (likely(uptodate)) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bcbb47ca473e..d5f4c102bab3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3241,6 +3241,24 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode,
 				       finish_ordered_fn, uptodate);
 }
 
+static u8 *bbio_get_real_csum(struct btrfs_fs_info *fs_info,
+			      struct btrfs_bio *bbio)
+{
+	u8 *ret;
+
+	/* Split bbio needs to grab csum from its parent */
+	if (bbio->is_split_bio)
+		ret = btrfs_bio(bbio->parent)->csum;
+	else
+		ret = bbio->csum;
+
+	if (ret == NULL)
+		return ret;
+
+	return ret + (bbio->offset_to_original >> fs_info->sectorsize_bits) *
+		     fs_info->csum_size;
+}
+
 /*
  * check_data_csum - verify checksum of one sector of uncompressed data
  * @inode:	inode
@@ -3268,7 +3286,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
 	ASSERT(pgoff + len <= PAGE_SIZE);
 
 	offset_sectors = bio_offset >> fs_info->sectorsize_bits;
-	csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size;
+	csum_expected = bbio_get_real_csum(fs_info, bbio) +
+			offset_sectors * csum_size;
 
 	kaddr = kmap_atomic(page);
 	shash->tfm = fs_info->csum_shash;
@@ -3326,7 +3345,7 @@ unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
 	 * Normally this should be covered by above check for compressed read
 	 * or the next check for NODATASUM.  Just do a quicker exit here.
 	 */
-	if (bbio->csum == NULL)
+	if (bbio_get_real_csum(fs_info, bbio) == NULL)
 		return 0;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c73d2fbf80a7..5496b8750e28 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -398,7 +398,8 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 
 static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 {
-	if (bbio->csum != bbio->csum_inline) {
+	/* Only free the csum if we're not a split bio */
+	if (!bbio->is_split_bio && bbio->csum != bbio->csum_inline) {
 		kfree(bbio->csum);
 		bbio->csum = NULL;
 	}
-- 
2.35.1



* [PATCH v3 09/18] btrfs: make data buffered write endio function to be split bio compatible
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (7 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 08/18] btrfs: make data buffered read path to handle split bio properly Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 10/18] btrfs: make metadata write endio functions " Qu Wenruo
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

We only need to change the bio_for_each_segment_all() call to a
__bio_for_each_segment() call, using btrfs_bio::iter as the initial
bi_iter.

Now the endio function can handle both split and unsplit bios well.
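
Why the saved iterator matters can be shown with a toy model. The types
seg_sketch and iter_sketch and the helper segs_visited() are
hypothetical; they only imitate the idea of walking the segments that
an iterator covers, not the kernel's bvec_iter implementation.

```c
/* One segment of a bio, reduced to its length in bytes. */
struct seg_sketch {
	unsigned int len;
};

/* Reduced iterator: where to start and how many bytes are covered. */
struct iter_sketch {
	unsigned int idx;	/* index of the current segment */
	unsigned int bvec_done;	/* bytes already consumed in that segment */
	unsigned int size;	/* bytes left to iterate */
};

/* Count the segments touched when iterating from a saved iterator. */
static unsigned int segs_visited(const struct seg_sketch *segs,
				 struct iter_sketch iter)
{
	unsigned int visited = 0;

	while (iter.size) {
		unsigned int len = segs[iter.idx].len - iter.bvec_done;

		if (len > iter.size)
			len = iter.size;
		visited++;
		iter.size -= len;
		iter.bvec_done += len;
		/* Move on once the current segment is fully consumed. */
		if (iter.bvec_done == segs[iter.idx].len) {
			iter.idx++;
			iter.bvec_done = 0;
		}
	}
	return visited;
}
```

For three 4K segments, a split piece covering the first 4K visits one
segment and the parent covering the rest visits two, whereas walking
the whole segment array from either bio would visit all three and thus
process the same range twice.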

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 240bdeec2346..a758a5acb8fb 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2830,31 +2830,31 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
 static void end_bio_extent_writepage(struct bio *bio)
 {
 	int error = blk_status_to_errno(bio->bi_status);
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	u64 start;
 	u64 end;
-	struct bvec_iter_all iter_all;
 	bool first_bvec = true;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+	ASSERT(btrfs_bio(bio)->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) {
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 
 		/* Our read/write should always be sector aligned. */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page write in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_len, sectorsize))
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_len, sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page write with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
 
 		if (first_bvec) {
 			btrfs_record_physical_zoned(inode, start, bio);
@@ -2863,7 +2863,7 @@ static void end_bio_extent_writepage(struct bio *bio)
 
 		end_extent_writepage(page, error, start, end);
 
-		btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
+		btrfs_page_clear_writeback(fs_info, page, start, bvec.bv_len);
 	}
 
 	bio_put(bio);
-- 
2.35.1



* [PATCH v3 10/18] btrfs: make metadata write endio functions to be split bio compatible
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (8 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 09/18] btrfs: make data buffered write endio function to be split bio compatible Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 11/18] btrfs: make dec_and_test_compressed_bio() " Qu Wenruo
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

Two modifications are needed:

- Convert to __bio_for_each_segment()
  bio_for_each_segment_all() should not be called on a cloned bio, as
  it will iterate over ranges which no longer belong to the split bio.

- Avoid bio_first_page_all() for end_bio_subpage_eb_writepage()
  bio_first_page_all() will use the original bvec, thus on a cloned bio
  it will trigger a WARN_ON().

  Introduce a helper, bio_to_fs_info(), to grab fs_info from a bio.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 43 +++++++++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a758a5acb8fb..e8c298572d3e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4482,6 +4482,21 @@ static struct extent_buffer *find_extent_buffer_nolock(
 	return NULL;
 }
 
+/*
+ * Since the bio can be cloned, we can no longer use bio_first_*_all()
+ * calls to grab a page.
+ * Thus here we introduce such helper to grab page and fs_info correctly.
+ */
+static struct btrfs_fs_info *bio_to_fs_info(struct bio *bio)
+{
+	struct bio_vec bvec;
+
+	bvec = bio_iter_iovec(bio, btrfs_bio(bio)->iter);
+
+	ASSERT(bvec.bv_page->mapping);
+	return btrfs_sb(bvec.bv_page->mapping->host->i_sb);
+}
+
 /*
  * The endio function for subpage extent buffer write.
  *
@@ -4491,20 +4506,20 @@ static struct extent_buffer *find_extent_buffer_nolock(
 static void end_bio_subpage_eb_writepage(struct bio *bio)
 {
 	struct btrfs_fs_info *fs_info;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
 
-	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
+	fs_info = bio_to_fs_info(bio);
 	ASSERT(fs_info->sectorsize < PAGE_SIZE);
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
-		u64 bvec_start = page_offset(page) + bvec->bv_offset;
-		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+	ASSERT(btrfs_bio(bio)->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) {
+		struct page *page = bvec.bv_page;
+		u64 bvec_start = page_offset(page) + bvec.bv_offset;
+		u64 bvec_end = bvec_start + bvec.bv_len - 1;
 		u64 cur_bytenr = bvec_start;
 
-		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+		ASSERT(IS_ALIGNED(bvec.bv_len, fs_info->nodesize));
 
 		/* Iterate through all extent buffers in the range */
 		while (cur_bytenr <= bvec_end) {
@@ -4547,14 +4562,14 @@ static void end_bio_subpage_eb_writepage(struct bio *bio)
 
 static void end_bio_extent_buffer_writepage(struct bio *bio)
 {
-	struct bio_vec *bvec;
 	struct extent_buffer *eb;
+	struct bvec_iter iter;
+	struct bio_vec bvec;
 	int done;
-	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+	ASSERT(btrfs_bio(bio)->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) {
+		struct page *page = bvec.bv_page;
 
 		eb = (struct extent_buffer *)page->private;
 		BUG_ON(!eb);
-- 
2.35.1



* [PATCH v3 11/18] btrfs: make dec_and_test_compressed_bio() to be split bio compatible
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (9 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 10/18] btrfs: make metadata write endio functions " Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 12/18] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() Qu Wenruo
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

The compression read and write endio functions all rely on
dec_and_test_compressed_bio() to determine if they are the last bio.

So here we only need to convert the bio_for_each_segment_all() call into
__bio_for_each_segment() so that the compression read/write endio
functions will handle both split and unsplit bios well.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 5c9b28f2f034..e9b0887c03a9 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -205,18 +205,14 @@ static int check_compressed_csum(struct btrfs_inode *inode, struct bio *bio,
 static bool dec_and_test_compressed_bio(struct compressed_bio *cb, struct bio *bio)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
+	struct bio_vec bvec;
+	struct bvec_iter iter;
 	unsigned int bi_size = 0;
 	bool last_io = false;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
 
-	/*
-	 * At endio time, bi_iter.bi_size doesn't represent the real bio size.
-	 * Thus here we have to iterate through all segments to grab correct
-	 * bio size.
-	 */
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		bi_size += bvec->bv_len;
+	ASSERT(btrfs_bio(bio)->iter.bi_size);
+	__bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter)
+		bi_size += bvec.bv_len;
 
 	if (bio->bi_status)
 		cb->status = bio->bi_status;
-- 
2.35.1



* [PATCH v3 12/18] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (10 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 11/18] btrfs: make dec_and_test_compressed_bio() " Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 13/18] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries Qu Wenruo
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

For profiles other than RAID56, __btrfs_map_block() returns @map_length
as min(stripe_end, logical + *length), which is also the same result
as btrfs_get_io_geometry().

But for RAID56, __btrfs_map_block() returns @map_length as stripe_len.

This strange behavior is going to hurt the upcoming bio split at
btrfs_map_bio() time, as we will use @map_length as the bio split size.

Fix this behavior by:

- Return @map_length by the same calculation as other profiles

- Save stripe_len into btrfs_io_context

- Pass btrfs_io_context::stripe_len to raid56_*() functions

- Update raid56_*() functions to make their stripe_len parameter more
  explicit

- Update scrub_stripe_index_and_offset() to properly name its
  parameters

- Add extra ASSERT()s to make sure the passed stripe_len is correct
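
A simplified model of clamping the mapped length to the stripe end is
sketched below. mapped_length_sketch() is a hypothetical helper, and
it assumes stripe boundaries fall on multiples of stripe_len counted
from the chunk start; the real __btrfs_map_block() has more
profile-specific handling than this.

```c
#include <stdint.h>

/* Assumed fixed stripe length of 64K for this sketch. */
#define STRIPE_LEN_SKETCH (64u * 1024u)

/*
 * Length that can be mapped starting at @chunk_offset without crossing
 * the end of the stripe that contains @chunk_offset.
 */
static uint64_t mapped_length_sketch(uint64_t chunk_offset, uint64_t length,
				     uint64_t stripe_len)
{
	uint64_t stripe_end = (chunk_offset / stripe_len + 1) * stripe_len;
	uint64_t end = chunk_offset + length;

	return (end < stripe_end ? end : stripe_end) - chunk_offset;
}
```

In this model a 128K request at offset 0 is mapped for 64K only, and a
16K request starting 60K into a stripe is mapped for 4K, which is the
size a bio would be split at.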

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c  | 12 ++++++++++--
 fs/btrfs/raid56.h  |  2 +-
 fs/btrfs/scrub.c   | 14 ++++++++------
 fs/btrfs/volumes.c | 13 ++++++++++---
 fs/btrfs/volumes.h |  1 +
 5 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 13e726c88a81..d35cfd750b76 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -969,6 +969,8 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	int stripe_npages = DIV_ROUND_UP(stripe_len, PAGE_SIZE);
 	void *p;
 
+	ASSERT(stripe_len == BTRFS_STRIPE_LEN);
+
 	rbio = kzalloc(sizeof(*rbio) +
 		       sizeof(*rbio->stripe_pages) * num_pages +
 		       sizeof(*rbio->bio_pages) * num_pages +
@@ -1725,6 +1727,9 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc,
 	struct blk_plug_cb *cb;
 	int ret;
 
+	/* Currently we only support fixed stripe len */
+	ASSERT(stripe_len == BTRFS_STRIPE_LEN);
+
 	rbio = alloc_rbio(fs_info, bioc, stripe_len);
 	if (IS_ERR(rbio)) {
 		btrfs_put_bioc(bioc);
@@ -2122,6 +2127,9 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 	struct btrfs_raid_bio *rbio;
 	int ret;
 
+	/* Currently we only support fixed stripe len */
+	ASSERT(stripe_len == BTRFS_STRIPE_LEN);
+
 	if (generic_io) {
 		ASSERT(bioc->mirror_num == mirror_num);
 		btrfs_bio(bio)->mirror_num = mirror_num;
@@ -2671,12 +2679,12 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 
 struct btrfs_raid_bio *
 raid56_alloc_missing_rbio(struct bio *bio, struct btrfs_io_context *bioc,
-			  u64 length)
+			  u64 stripe_len)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_raid_bio *rbio;
 
-	rbio = alloc_rbio(fs_info, bioc, length);
+	rbio = alloc_rbio(fs_info, bioc, stripe_len);
 	if (IS_ERR(rbio))
 		return NULL;
 
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 72c00fc284b5..7322dcae4498 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -46,7 +46,7 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
 
 struct btrfs_raid_bio *
 raid56_alloc_missing_rbio(struct bio *bio, struct btrfs_io_context *bioc,
-			  u64 length);
+			  u64 stripe_len);
 void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio);
 
 int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 11089568b287..6b3f4149883f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1222,13 +1222,15 @@ static inline int scrub_nr_raid_mirrors(struct btrfs_io_context *bioc)
 
 static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
 						 u64 *raid_map,
-						 u64 mapped_length,
+						 u64 stripe_len,
 						 int nstripes, int mirror,
 						 int *stripe_index,
 						 u64 *stripe_offset)
 {
 	int i;
 
+	ASSERT(stripe_len == BTRFS_STRIPE_LEN);
+
 	if (map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
 		/* RAID5/6 */
 		for (i = 0; i < nstripes; i++) {
@@ -1237,7 +1239,7 @@ static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
 				continue;
 
 			if (logical >= raid_map[i] &&
-			    logical < raid_map[i] + mapped_length)
+			    logical < raid_map[i] + stripe_len)
 				break;
 		}
 
@@ -1342,7 +1344,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 			scrub_stripe_index_and_offset(logical,
 						      bioc->map_type,
 						      bioc->raid_map,
-						      mapped_length,
+						      bioc->stripe_len,
 						      bioc->num_stripes -
 						      bioc->num_tgtdevs,
 						      mirror_index,
@@ -1394,7 +1396,7 @@ static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info,
 
 	mirror_num = spage->sblock->pagev[0]->mirror_num;
 	ret = raid56_parity_recover(bio, spage->recover->bioc,
-				    spage->recover->map_length,
+				    spage->recover->bioc->stripe_len,
 				    mirror_num, 0);
 	if (ret)
 		return ret;
@@ -2223,7 +2225,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	bio->bi_private = sblock;
 	bio->bi_end_io = scrub_missing_raid56_end_io;
 
-	rbio = raid56_alloc_missing_rbio(bio, bioc, length);
+	rbio = raid56_alloc_missing_rbio(bio, bioc, bioc->stripe_len);
 	if (!rbio)
 		goto rbio_out;
 
@@ -2839,7 +2841,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 	bio->bi_private = sparity;
 	bio->bi_end_io = scrub_parity_bio_endio;
 
-	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, length,
+	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, bioc->stripe_len,
 					      sparity->scrub_dev,
 					      sparity->dbitmap,
 					      sparity->nsectors);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d2b9cba1e5fd..e4e688b31c90 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6043,6 +6043,7 @@ static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
 		ret = -ENOMEM;
 		goto out;
 	}
+	bioc->stripe_len = map->stripe_len;
 
 	for (i = 0; i < num_stripes; i++) {
 		bioc->stripes[i].physical =
@@ -6398,6 +6399,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 {
 	struct extent_map *em;
 	struct map_lookup *map;
+	const u64 orig_length = *length;
 	u64 stripe_offset;
 	u64 stripe_nr;
 	u64 stripe_len;
@@ -6419,6 +6421,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	ASSERT(bioc_ret);
 	ASSERT(op != BTRFS_MAP_DISCARD);
+	ASSERT(orig_length);
 
 	em = btrfs_get_chunk_map(fs_info, logical, *length);
 	ASSERT(!IS_ERR(em));
@@ -6514,7 +6517,10 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 			num_stripes = map->num_stripes;
 			max_errors = nr_parity_stripes(map);
 
-			*length = map->stripe_len;
+			/* Return the length to the full stripe end */
+			*length = min(raid56_full_stripe_start + em->start +
+				      data_stripes * stripe_len,
+				      logical + orig_length) - logical;
 			stripe_index = 0;
 			stripe_offset = 0;
 		} else {
@@ -6566,6 +6572,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		ret = -ENOMEM;
 		goto out;
 	}
+	bioc->stripe_len = map->stripe_len;
 
 	for (i = 0; i < num_stripes; i++) {
 		bioc->stripes[i].physical = map->stripes[stripe_index].physical +
@@ -6816,9 +6823,9 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi
 		/* In this case, map_length has been set to the length of
 		   a single stripe; not the whole write */
 		if (btrfs_op(bio) == BTRFS_MAP_WRITE) {
-			ret = raid56_parity_write(bio, bioc, map_length);
+			ret = raid56_parity_write(bio, bioc, bioc->stripe_len);
 		} else {
-			ret = raid56_parity_recover(bio, bioc, map_length,
+			ret = raid56_parity_recover(bio, bioc, bioc->stripe_len,
 						    mirror_num, 1);
 		}
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5496b8750e28..410617cb7533 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -447,6 +447,7 @@ struct btrfs_io_context {
 	struct bio *orig_bio;
 	void *private;
 	atomic_t error;
+	u32 stripe_len;
 	int max_errors;
 	int num_stripes;
 	int mirror_num;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 13/18] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (11 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 12/18] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 14/18] btrfs: remove buffered IO stripe boundary calculation Qu Wenruo
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

With the new btrfs_bio_split() helper, we are able to split a bio
according to chunk stripe boundaries at btrfs_map_bio() time.

However, since bios are currently still split at
buffered/compressed/encoded/direct IO time, this ability is not yet
utilized.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 50 +++++++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e4e688b31c90..403aa371c11f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6871,30 +6871,44 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num)
 {
-	u64 logical = bio->bi_iter.bi_sector << 9;
-	u64 length = 0;
-	u64 map_length;
+	const u64 orig_logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	const unsigned int orig_length = bio->bi_iter.bi_size;
+	const enum btrfs_map_op op = btrfs_op(bio);
+	u64 cur_logical = orig_logical;
 	int ret;
-	struct btrfs_io_context *bioc = NULL;
 
-	length = bio->bi_iter.bi_size;
-	map_length = length;
+	while (cur_logical < orig_logical + orig_length) {
+		u64 map_length = orig_logical + orig_length - cur_logical;
+		struct btrfs_io_context *bioc = NULL;
+		struct bio *cur_bio;
 
-	btrfs_bio_counter_inc_blocked(fs_info);
-	btrfs_bio_save_iter(btrfs_bio(bio));
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
-	}
+		btrfs_bio_save_iter(btrfs_bio(bio));
+		ret = __btrfs_map_block(fs_info, op, cur_logical, &map_length,
+					&bioc, mirror_num, 1);
+		if (ret)
+			return errno_to_blk_status(ret);
 
-	ret = submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num);
-	if (ret < 0) {
+		if (cur_logical + map_length < orig_logical + orig_length) {
+			/*
+			 * For now zoned write should never cross stripe
+			 * boundary
+			 */
+			ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND);
+
+			/* Split the bio */
+			cur_bio = btrfs_bio_split(fs_info, bio, map_length);
+		} else {
+			/* Use the existing bio directly */
+			cur_bio = bio;
+		}
+		btrfs_bio_counter_inc_blocked(fs_info);
+		ret = submit_one_mapped_range(fs_info, cur_bio, bioc,
+					      map_length, mirror_num);
 		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
+		if (ret < 0)
+			return errno_to_blk_status(ret);
+		cur_logical += map_length;
 	}
-	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 14/18] btrfs: remove buffered IO stripe boundary calculation
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (12 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 13/18] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 15/18] btrfs: remove stripe boundary calculation for compressed IO Qu Wenruo
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This removes btrfs_bio_ctrl::len_to_stripe_boundary, so that buffered
IO no longer limits its bio size according to stripe length.

This moves the bio split to btrfs_map_bio() for all buffered IO.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 24 ++----------------------
 1 file changed, 2 insertions(+), 22 deletions(-)
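
To illustrate the effect on btrfs_bio_add_page(), here is a small userspace sketch comparing the old and new real_size calculation. The function names real_size_old() and real_size_new() are made up for this note; the parameters mirror the btrfs_bio_ctrl members and local variables used in the hunk below, and the sketch assumes bio_size never exceeds the applicable limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel function. */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Before: limited by both the ordered extent and the stripe boundary. */
static uint32_t real_size_old(uint32_t len_to_oe_boundary,
			      uint32_t len_to_stripe_boundary,
			      uint32_t bio_size, uint32_t size)
{
	const uint32_t limit = min_u32(len_to_oe_boundary,
				       len_to_stripe_boundary);

	assert(bio_size <= limit);
	return min_u32(limit - bio_size, size);
}

/* After: only the ordered extent boundary limits the bio size. */
static uint32_t real_size_new(uint32_t len_to_oe_boundary,
			      uint32_t bio_size, uint32_t size)
{
	assert(bio_size <= len_to_oe_boundary);
	return min_u32(len_to_oe_boundary - bio_size, size);
}
```

With len_to_oe_boundary at U32_MAX, a bio that already holds 60 KiB and an 8 KiB add request, the old calculation stops at the 64 KiB stripe boundary and accepts only 4 KiB, while the new one accepts the full 8 KiB and leaves the split to btrfs_map_bio().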

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e8c298572d3e..d52defdc08e3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3308,7 +3308,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 	ASSERT(bio);
 	/* The limit should be calculated when bio_ctrl->bio is allocated */
-	ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary);
+	ASSERT(bio_ctrl->len_to_oe_boundary);
 	if (bio_ctrl->bio_flags != bio_flags)
 		return 0;
 
@@ -3319,9 +3319,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	if (!contig)
 		return 0;
 
-	real_size = min(bio_ctrl->len_to_oe_boundary,
-			bio_ctrl->len_to_stripe_boundary) - bio_size;
-	real_size = min(real_size, size);
+	real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size);
 
 	/*
 	 * If real_size is 0, never call bio_add_*_page(), as even size is 0,
@@ -3341,12 +3339,8 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 			       struct btrfs_inode *inode, u64 file_offset)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_io_geometry geom;
 	struct btrfs_ordered_extent *ordered;
-	struct extent_map *em;
 	u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
-	int ret;
 
 	/*
 	 * Pages for compressed extent are never submitted to disk directly,
@@ -3357,22 +3351,8 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 	 */
 	if (bio_ctrl->bio_flags & EXTENT_BIO_COMPRESSED) {
 		bio_ctrl->len_to_oe_boundary = U32_MAX;
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
 		return 0;
 	}
-	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
-				    logical, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		return ret;
-	}
-	if (geom.len > U32_MAX)
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
-	else
-		bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
 
 	if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
 		bio_ctrl->len_to_oe_boundary = U32_MAX;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 15/18] btrfs: remove stripe boundary calculation for compressed IO
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (13 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 14/18] btrfs: remove buffered IO stripe boundary calculation Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 16/18] btrfs: remove the stripe boundary calculation for direct IO Qu Wenruo
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

For compressed IO, we calculate the next stripe start inside
alloc_compressed_bio().

Now that btrfs_map_bio() can handle the bio split, we no longer need
to calculate the boundary.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/compression.c | 49 +++++-------------------------------------
 1 file changed, 5 insertions(+), 44 deletions(-)
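
After this change, the per-iteration size in the compressed read/write loops is bounded only by the page boundary and the end of the compressed range. A minimal sketch of that calculation, assuming a 4 KiB page size (SKETCH_PAGE_SIZE and compressed_real_size() are names invented for this note, and @offset is the byte offset into the compressed range):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical helper, not a kernel function. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Bytes to add to the bio in one loop iteration:
 * - never cross a page boundary
 * - never go past the end of the compressed data
 */
static uint32_t compressed_real_size(uint64_t offset, uint64_t compressed_len)
{
	uint64_t real_size;

	assert(offset < compressed_len);
	real_size = min_u64(UINT32_MAX,
			    SKETCH_PAGE_SIZE - offset % SKETCH_PAGE_SIZE);
	real_size = min_u64(real_size, compressed_len - offset);
	return (uint32_t)real_size;
}
```

For a 12 KiB compressed extent, every iteration adds one full page regardless of where a stripe boundary falls inside the extent, since the stripe term is gone from the calculation.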

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e9b0887c03a9..0bf694038c61 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -443,21 +443,15 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
  *                      from or written to.
  * @endio_func:         The endio function to call after the IO for compressed data
  *                      is finished.
- * @next_stripe_start:  Return value of logical bytenr of where next stripe starts.
- *                      Let the caller know to only fill the bio up to the stripe
- *                      boundary.
  */
 
 
 static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
-					unsigned int opf, bio_end_io_t endio_func,
-					u64 *next_stripe_start)
+					unsigned int opf, bio_end_io_t endio_func)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
-	struct btrfs_io_geometry geom;
 	struct extent_map *em;
 	struct bio *bio;
-	int ret;
 
 	bio = btrfs_bio_alloc(BIO_MAX_VECS);
 
@@ -474,14 +468,7 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND)
 		bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev);
-
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom);
 	free_extent_map(em);
-	if (ret < 0) {
-		bio_put(bio);
-		return ERR_PTR(ret);
-	}
-	*next_stripe_start = disk_bytenr + geom.len;
 
 	return bio;
 }
@@ -508,7 +495,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	struct bio *bio = NULL;
 	struct compressed_bio *cb;
 	u64 cur_disk_bytenr = disk_start;
-	u64 next_stripe_start;
 	blk_status_t ret;
 	int skip_sum = inode->flags & BTRFS_INODE_NODATASUM;
 	const bool use_append = btrfs_use_zone_append(inode, disk_start);
@@ -542,28 +528,19 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 		/* Allocate new bio if submitted or not yet allocated */
 		if (!bio) {
 			bio = alloc_compressed_bio(cb, cur_disk_bytenr,
-				bio_op | write_flags, end_compressed_bio_write,
-				&next_stripe_start);
+				bio_op | write_flags, end_compressed_bio_write);
 			if (IS_ERR(bio)) {
 				ret = errno_to_blk_status(PTR_ERR(bio));
 				bio = NULL;
 				goto finish_cb;
 			}
 		}
-		/*
-		 * We should never reach next_stripe_start start as we will
-		 * submit comp_bio when reach the boundary immediately.
-		 */
-		ASSERT(cur_disk_bytenr != next_stripe_start);
-
 		/*
 		 * We have various limits on the real read size:
-		 * - stripe boundary
 		 * - page boundary
 		 * - compressed length boundary
 		 */
-		real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_bytenr);
-		real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset));
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
@@ -578,9 +555,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			submit = true;
 
 		cur_disk_bytenr += added;
-		/* Reached stripe boundary */
-		if (cur_disk_bytenr == next_stripe_start)
-			submit = true;
 
 		/* Finished the range */
 		if (cur_disk_bytenr == disk_start + compressed_len)
@@ -800,7 +774,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	struct bio *comp_bio = NULL;
 	const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 cur_disk_byte = disk_bytenr;
-	u64 next_stripe_start;
 	u64 file_offset;
 	u64 em_len;
 	u64 em_start;
@@ -887,27 +860,19 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		/* Allocate new bio if submitted or not yet allocated */
 		if (!comp_bio) {
 			comp_bio = alloc_compressed_bio(cb, cur_disk_byte,
-					REQ_OP_READ, end_compressed_bio_read,
-					&next_stripe_start);
+					REQ_OP_READ, end_compressed_bio_read);
 			if (IS_ERR(comp_bio)) {
 				ret = errno_to_blk_status(PTR_ERR(comp_bio));
 				comp_bio = NULL;
 				goto finish_cb;
 			}
 		}
-		/*
-		 * We should never reach next_stripe_start start as we will
-		 * submit comp_bio when reach the boundary immediately.
-		 */
-		ASSERT(cur_disk_byte != next_stripe_start);
 		/*
 		 * We have various limit on the real read size:
-		 * - stripe boundary
 		 * - page boundary
 		 * - compressed length boundary
 		 */
-		real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_byte);
-		real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset));
+		real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset));
 		real_size = min_t(u64, real_size, compressed_len - offset);
 		ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize));
 
@@ -919,10 +884,6 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		ASSERT(added == real_size);
 		cur_disk_byte += added;
 
-		/* Reached stripe boundary, need to submit */
-		if (cur_disk_byte == next_stripe_start)
-			submit = true;
-
 		/* Has finished the range, need to submit */
 		if (cur_disk_byte == disk_bytenr + compressed_len)
 			submit = true;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 16/18] btrfs: remove the stripe boundary calculation for direct IO
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (14 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 15/18] btrfs: remove stripe boundary calculation for compressed IO Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 17/18] btrfs: remove the stripe boundary calcluation for encoded IO Qu Wenruo
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_submit_direct() we have a do {} while () loop to handle the bio
split due to stripe boundary.

Since btrfs_map_bio() can handle it for us now, there is no need to
manually do the split anymore.

Also, since we don't need to split the bio, there is no special check
for RAID56 anymore. Make btrfs_submit_dio_bio() follow the same rule as
btrfs_submit_data_bio() for async submit.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 113 ++++++++++-------------------------------------
 1 file changed, 24 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d5f4c102bab3..66a6cb5d9572 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7913,22 +7913,16 @@ static void btrfs_end_dio_bio(struct bio *bio)
 }
 
 static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
-		struct inode *inode, u64 file_offset, int async_submit)
+		struct inode *inode, u64 file_offset)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_dio_private *dip = bio->bi_private;
 	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
+	bool async_submit;
 	blk_status_t ret;
 
-	/*
-	 * Check btrfs_submit_data_bio() for rules about async submit.
-	 *
-	 * The only exception is for RAID56, when there are more than one bios
-	 * to submit, async submit seems to make it harder to collect csums
-	 * for the full stripe.
-	 */
-	if (async_submit)
-		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
+	/* Check btrfs_submit_data_bio() for rules about async submit. */
+	async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
 	if (!write)
 		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
@@ -8002,25 +7996,12 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		struct bio *dio_bio, loff_t file_offset)
 {
 	struct inode *inode = iter->inode;
+	struct btrfs_dio_data *dio_data = iter->iomap.private;
 	const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE);
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	const bool raid56 = (btrfs_data_alloc_profile(fs_info) &
-			     BTRFS_BLOCK_GROUP_RAID56_MASK);
 	struct btrfs_dio_private *dip;
 	struct bio *bio;
 	const u32 length = dio_bio->bi_iter.bi_size;
-	u32 submitted_bytes = 0;
-	u64 start_sector;
-	int async_submit = 0;
-	u64 submit_len;
-	u64 clone_offset = 0;
-	u64 clone_len;
-	u64 logical;
-	int ret;
 	blk_status_t status;
-	struct btrfs_io_geometry geom;
-	struct btrfs_dio_data *dio_data = iter->iomap.private;
-	struct extent_map *em = NULL;
 
 	dip = btrfs_create_dio_private(dio_bio, inode, file_offset, length);
 	if (!dip) {
@@ -8044,80 +8025,34 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 			goto out_err;
 	}
 
-	start_sector = dio_bio->bi_iter.bi_sector;
-	submit_len = dio_bio->bi_iter.bi_size;
-
-	do {
-		logical = start_sector << 9;
-		em = btrfs_get_chunk_map(fs_info, logical, submit_len);
-		if (IS_ERR(em)) {
-			status = errno_to_blk_status(PTR_ERR(em));
-			em = NULL;
-			goto out_err_em;
-		}
-		ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
-					    logical, &geom);
-		if (ret) {
-			status = errno_to_blk_status(ret);
-			goto out_err_em;
-		}
-
-		clone_len = min(submit_len, geom.len);
-		ASSERT(clone_len <= UINT_MAX);
-
-		/*
-		 * This will never fail as it's passing GPF_NOFS and
-		 * the allocation is backed by btrfs_bioset.
-		 */
-		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
-		bio->bi_private = dip;
-		bio->bi_end_io = btrfs_end_dio_bio;
-
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			status = extract_ordered_extent(BTRFS_I(inode), bio,
-							file_offset);
-			if (status) {
-				bio_put(bio);
-				goto out_err;
-			}
-		}
-
-		ASSERT(submit_len >= clone_len);
-		submit_len -= clone_len;
+	/*
+	 * This will never fail as it's passing GPF_NOFS and
+	 * the allocation is backed by btrfs_bioset.
+	 */
+	bio = btrfs_bio_clone(dio_bio);
+	bio->bi_private = dip;
+	bio->bi_end_io = btrfs_end_dio_bio;
 
-		if (submit_len > 0) {
-			/*
-			 * If we are submitting more than one bio, submit them
-			 * all asynchronously. The exception is RAID 5 or 6, as
-			 * asynchronous checksums make it difficult to collect
-			 * full stripe writes.
-			 */
-			if (!raid56)
-				async_submit = 1;
-		}
 
-		status = btrfs_submit_dio_bio(bio, inode, file_offset,
-						async_submit);
+	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		status = extract_ordered_extent(BTRFS_I(inode), bio,
+						file_offset);
 		if (status) {
 			bio_put(bio);
-			goto out_err_em;
+			goto out_err;
 		}
-
-		submitted_bytes += clone_len;
-		dio_data->submitted += clone_len;
-		clone_offset += clone_len;
-		start_sector += clone_len >> 9;
-		file_offset += clone_len;
-
-		free_extent_map(em);
-	} while (submit_len > 0);
+	}
+	status = btrfs_submit_dio_bio(bio, inode, file_offset);
+	if (status) {
+		bio_put(bio);
+		goto out_err;
+	}
+	dio_data->submitted += length;
 	return;
 
-out_err_em:
-	free_extent_map(em);
 out_err:
 	dip->dio_bio->bi_status = status;
-	dio_private_finish(dip, status, length - submitted_bytes);
+	dio_private_finish(dip, status, length);
 }
 
 const struct iomap_ops btrfs_dio_iomap_ops = {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 17/18] btrfs: remove the stripe boundary calcluation for encoded IO
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (15 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 16/18] btrfs: remove the stripe boundary calculation for direct IO Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-14  9:07 ` [PATCH v3 18/18] btrfs: unexport btrfs_get_io_geometry() Qu Wenruo
  2022-03-22 15:44 ` [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Christoph Hellwig
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

In btrfs_encoded_read_regular_fill_pages(), we have a loop to handle the
bio split due to stripe boundary.

Since btrfs_map_bio() can handle it for us now, there is no need to
manually do the split anymore.

Just remove the related btrfs_get_io_geometry() call inside
btrfs_encoded_read_regular_fill_pages().

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/inode.c | 20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 66a6cb5d9572..ecf039c272fc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10331,7 +10331,6 @@ static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 						 u64 disk_io_size,
 						 struct page **pages)
 {
-	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_encoded_read_private priv = {
 		.inode = inode,
 		.file_offset = file_offset,
@@ -10340,7 +10339,6 @@ static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 	};
 	unsigned long i = 0;
 	u64 cur = 0;
-	int ret;
 
 	init_waitqueue_head(&priv.wait);
 	/*
@@ -10348,25 +10346,9 @@ static int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 	 * necessary.
 	 */
 	while (cur < disk_io_size) {
-		struct extent_map *em;
-		struct btrfs_io_geometry geom;
 		struct bio *bio = NULL;
-		u64 remaining;
+		u64 remaining = disk_io_size - cur;
 
-		em = btrfs_get_chunk_map(fs_info, disk_bytenr + cur,
-					 disk_io_size - cur);
-		if (IS_ERR(em)) {
-			ret = PTR_ERR(em);
-		} else {
-			ret = btrfs_get_io_geometry(fs_info, em, BTRFS_MAP_READ,
-						    disk_bytenr + cur, &geom);
-			free_extent_map(em);
-		}
-		if (ret) {
-			WRITE_ONCE(priv.status, errno_to_blk_status(ret));
-			break;
-		}
-		remaining = min(geom.len, disk_io_size - cur);
 		while (bio || remaining) {
 			size_t bytes = min_t(u64, remaining, PAGE_SIZE);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 18/18] btrfs: unexport btrfs_get_io_geometry()
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (16 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 17/18] btrfs: remove the stripe boundary calcluation for encoded IO Qu Wenruo
@ 2022-03-14  9:07 ` Qu Wenruo
  2022-03-22 15:44 ` [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Christoph Hellwig
  18 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-03-14  9:07 UTC (permalink / raw)
  To: linux-btrfs

This function provides a lighter-weight version of btrfs_map_block(),
returning just enough info without filling in everything that
btrfs_map_block() does.

But that function is only used for stripe boundary calculation, and now
that stripe boundary calculation is all handled inside btrfs_map_bio(),
there is no need to export it anymore.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 8 ++++----
 fs/btrfs/volumes.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 403aa371c11f..301491429e37 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6312,9 +6312,9 @@ static bool need_full_stripe(enum btrfs_map_op op)
  * Returns < 0 in case a chunk for the given logical address cannot be found,
  * usually shouldn't happen unless @logical is corrupted, 0 otherwise.
  */
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom)
+static int get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
+			   enum btrfs_map_op op, u64 logical,
+			   struct btrfs_io_geometry *io_geom)
 {
 	struct map_lookup *map;
 	u64 len;
@@ -6426,7 +6426,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	em = btrfs_get_chunk_map(fs_info, logical, *length);
 	ASSERT(!IS_ERR(em));
 
-	ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom);
+	ret = get_io_geometry(fs_info, em, op, logical, &geom);
 	if (ret < 0)
 		return ret;
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 410617cb7533..9259c1a4cf73 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -559,9 +559,6 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		     u64 logical, u64 *length,
 		     struct btrfs_io_context **bioc_ret);
-int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map,
-			  enum btrfs_map_op op, u64 logical,
-			  struct btrfs_io_geometry *io_geom);
 int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook()
  2022-03-14  9:07 ` [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
@ 2022-03-15  7:43   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2022-03-15  7:43 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Mon, Mar 14, 2022 at 05:07:14PM +0800, Qu Wenruo wrote:
> This function is renamed to btrfs_submit_data_bio(), update the comment
> and add extra reason why it doesn't completely follow the same rule in
> btrfs_submit_data_bio().

Looks good, I actually have something similar in a local tree:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
                   ` (17 preceding siblings ...)
  2022-03-14  9:07 ` [PATCH v3 18/18] btrfs: unexport btrfs_get_io_geometry() Qu Wenruo
@ 2022-03-22 15:44 ` Christoph Hellwig
  2022-03-22 23:45   ` Qu Wenruo
  18 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2022-03-22 15:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

I spent some time looking over this series and I think while it has
some nice cleanups, it also goes fundamentally in the wrong direction.

The way bios are used is that the file system always builds bios
to its own limits, like extents, and lower drivers split them up if
needed.  By building the bigger bios in btrfs a lot of the completion
handling gets much more complicated.

I had actually started a series a bit ago to clean up the btrfs bio
usage bottom up, taking advantage of the newer bios interfaces.  I've
spent some of my vacation time last week to finish this off and also
add a few iomap improvements so that btrfs doesn't need to clone the
iomap dio bios above btrfs_map_bio either.  I'll send it out in a bit.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-03-22 15:44 ` [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Christoph Hellwig
@ 2022-03-22 23:45   ` Qu Wenruo
  2022-04-20 20:11     ` David Sterba
  0 siblings, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2022-03-22 23:45 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs



On 2022/3/22 23:44, Christoph Hellwig wrote:
> I spent some time looking over this series and I think while it has
> some nice cleanups, it also goes fundamentally in the wrong direction.

Well, at least I got some review; that's always good news.

>
> The way bios are used is that the file system always builds bios
> to its own limits like extents,

That part is not changed. We still have extent limits.

> lower drivers split them up if needed.
> By building the bigger bios in btrfs a lot of the completion handling
> gets much more complicated.

The "bigger" bios are still limited by extents.

The work here is to mimic the stacked driver behavior.

For dm-raid0, the fs only splits its bio at its extent boundary, then
the dm driver further splits it according to the stripe boundary.

IMHO this behavior is simpler, and has better layer separation.
Thus it's worth doing the same in btrfs.
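
As a rough illustration of that second, lower-layer split, here is a
userspace sketch (not kernel code) that cuts one logical byte range into
pieces that never cross a stripe boundary. The struct, the helper name and
the stripe length passed in are all assumptions made for the example; the
real code would operate on struct bio rather than on plain offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one contiguous piece of a larger I/O range. */
struct io_piece {
	uint64_t start;	/* logical byte offset of the piece */
	uint64_t len;	/* length of the piece in bytes */
};

/*
 * Hypothetical helper: split [start, start + len) so that no piece
 * crosses a multiple of stripe_len.
 *
 * At most max_pieces entries are written to pieces[]; the return value
 * is the total number of pieces the range needs, which may be larger.
 * Returns 0 for an invalid (zero) stripe_len.
 */
static size_t split_at_stripe_boundary(uint64_t start, uint64_t len,
				       uint64_t stripe_len,
				       struct io_piece *pieces,
				       size_t max_pieces)
{
	size_t nr = 0;

	if (stripe_len == 0)
		return 0;

	while (len > 0) {
		/* Bytes left before the next stripe boundary. */
		uint64_t room = stripe_len - (start % stripe_len);
		uint64_t cur = len < room ? len : room;

		if (nr < max_pieces) {
			pieces[nr].start = start;
			pieces[nr].len = cur;
		}
		nr++;
		start += cur;
		len -= cur;
	}
	return nr;
}
```

With an example stripe length of 64 KiB, a 96 KiB range starting at offset
48 KiB comes out as three pieces of 16 KiB, 64 KiB and 16 KiB, while the
upper layer only ever had to respect the extent boundary.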

But you're correct about the complexity.
In fact I also found that we don't really need anything as complex as
what I did in that series.

One thing to improve is the split bio handling.
We can do checksum verification later, after all data has been read from
disk.

Then we don't need the complex split bio handling.
>
> I had actually started a series a bit ago to clean up the btrfs bio
> usage bottom up, taking advantage of the newer bio interfaces.  I
> spent some of my vacation time last week to finish this off and also
> add a few iomap improvements so that btrfs doesn't need to clone the
> iomap dio bios above btrfs_map_bio either.  I'll send it out in a bit.

Let me steal some tricks from your series.

Thanks,
Qu

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-03-22 23:45   ` Qu Wenruo
@ 2022-04-20 20:11     ` David Sterba
  2022-04-20 23:04       ` Qu Wenruo
  2022-04-21  5:27       ` Christoph Hellwig
  0 siblings, 2 replies; 27+ messages in thread
From: David Sterba @ 2022-04-20 20:11 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs

On Wed, Mar 23, 2022 at 07:45:31AM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/3/22 23:44, Christoph Hellwig wrote:
> > I spent some time looking over this series and I think while it has
> > some nice cleanups, it also goes fundamentally in the wrong direction.
> 
> > Well, at least I got some review; that's always good news.

So this whole series will be dropped and replaced by Christoph's
patches, however if there's anything useful left please send it
separately later on. Thanks.

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-04-20 20:11     ` David Sterba
@ 2022-04-20 23:04       ` Qu Wenruo
  2022-04-21  5:28         ` Christoph Hellwig
  2022-04-21  5:27       ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Qu Wenruo @ 2022-04-20 23:04 UTC (permalink / raw)
  To: dsterba, Christoph Hellwig, Qu Wenruo, linux-btrfs



On 2022/4/21 04:11, David Sterba wrote:
> On Wed, Mar 23, 2022 at 07:45:31AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2022/3/22 23:44, Christoph Hellwig wrote:
>>> I spent some time looking over this series and I think while it has
>>> some nice cleanups, it also goes fundamentally in the wrong direction.
>>
>> Well, at least I got some review; that's always good news.
>
> So this whole series will be dropped and replaced by Christoph's
> patches, however if there's anything useful left please send it
> separately later on. Thanks.

I'll refresh the patchset, still keeping the core idea of splitting bio at
btrfs_map_bio() time, but also taking some ideas from Christoph to further
improve the patchset.

Thanks,
Qu

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-04-20 20:11     ` David Sterba
  2022-04-20 23:04       ` Qu Wenruo
@ 2022-04-21  5:27       ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2022-04-21  5:27 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Christoph Hellwig, Qu Wenruo, linux-btrfs

On Wed, Apr 20, 2022 at 10:11:58PM +0200, David Sterba wrote:
> On Wed, Mar 23, 2022 at 07:45:31AM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2022/3/22 23:44, Christoph Hellwig wrote:
> > > I spent some time looking over this series and I think while it has
> > > some nice cleanups, it also goes fundamentally in the wrong direction.
> > 
> > > Well, at least I got some review; that's always good news.
> 
> So this whole series will be dropped and replaced by Christoph's
> patches, however if there's anything useful left please send it
> separately later on. Thanks.

There are some good ideas in here actually.  I think we'll come up
with a fusion variant eventually.

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-04-20 23:04       ` Qu Wenruo
@ 2022-04-21  5:28         ` Christoph Hellwig
  2022-04-21  7:08           ` Qu Wenruo
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2022-04-21  5:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Christoph Hellwig, Qu Wenruo, linux-btrfs

On Thu, Apr 21, 2022 at 07:04:25AM +0800, Qu Wenruo wrote:
> I'll refresh the patchset, still keeping the core idea of splitting bio at
> btrfs_map_bio() time, but also taking some ideas from Christoph to further
> improve the patchset.

I'll have about two more batches of patches that don't touch this at
all.  I have a bunch of ideas on how to deal with the splitting after
that and will contact you about them.

* Re: [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time
  2022-04-21  5:28         ` Christoph Hellwig
@ 2022-04-21  7:08           ` Qu Wenruo
  0 siblings, 0 replies; 27+ messages in thread
From: Qu Wenruo @ 2022-04-21  7:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: dsterba, Qu Wenruo, linux-btrfs



On 2022/4/21 13:28, Christoph Hellwig wrote:
> On Thu, Apr 21, 2022 at 07:04:25AM +0800, Qu Wenruo wrote:
>> I'll refresh the patchset, still keeping the core idea of splitting bio at
>> btrfs_map_bio() time, but also taking some ideas from Christoph to further
>> improve the patchset.
>
> I'll have about two more batches of patches that don't touch this at
> all.  I have a bunch of ideas on how to deal with the splitting after
> that and will contact you about them.

I'm not in a hurry, especially considering there is really no direct
user of the delayed bio split feature.
(It will make a lot of things easier, but has no decisive effect yet.)

So I'm looking forward to the new ideas.

Although personally speaking, the zoned-related code is a bigger concern
to me.

If we can also do the zone split at that delayed point, that would be
awesome.

Thanks,
Qu

Thread overview: 27+ messages
2022-03-14  9:07 [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 01/18] btrfs: update an stale comment on btrfs_submit_bio_hook() Qu Wenruo
2022-03-15  7:43   ` Christoph Hellwig
2022-03-14  9:07 ` [PATCH v3 02/18] btrfs: save bio::bi_iter into btrfs_bio::iter before any endio Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 03/18] btrfs: use correct bio size for error message in btrfs_end_dio_bio() Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 04/18] btrfs: refactor btrfs_map_bio() Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 05/18] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio() Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 06/18] btrfs: replace btrfs_dio_private::refs with btrfs_dio_private::pending_bytes Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 07/18] btrfs: introduce btrfs_bio_split() helper Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 08/18] btrfs: make data buffered read path to handle split bio properly Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 09/18] btrfs: make data buffered write endio function to be split bio compatible Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 10/18] btrfs: make metadata write endio functions " Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 11/18] btrfs: make dec_and_test_compressed_bio() " Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 12/18] btrfs: return proper mapped length for RAID56 profiles in __btrfs_map_block() Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 13/18] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 14/18] btrfs: remove buffered IO stripe boundary calculation Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 15/18] btrfs: remove stripe boundary calculation for compressed IO Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 16/18] btrfs: remove the stripe boundary calculation for direct IO Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 17/18] btrfs: remove the stripe boundary calcluation for encoded IO Qu Wenruo
2022-03-14  9:07 ` [PATCH v3 18/18] btrfs: unexport btrfs_get_io_geometry() Qu Wenruo
2022-03-22 15:44 ` [PATCH v3 00/18] btrfs: split bio at btrfs_map_bio() time Christoph Hellwig
2022-03-22 23:45   ` Qu Wenruo
2022-04-20 20:11     ` David Sterba
2022-04-20 23:04       ` Qu Wenruo
2022-04-21  5:28         ` Christoph Hellwig
2022-04-21  7:08           ` Qu Wenruo
2022-04-21  5:27       ` Christoph Hellwig
