All of lore.kernel.org
 help / color / mirror / Atom feed
* btrfs I/O completion cleanup and single device I/O optimizations v2
@ 2022-07-13  6:13 Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 01/11] btrfs: don't call bioset_integrity_create for btrfs_bioset Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Hi all,

this series cleans up the btrfs_bio API, most prominently by splitting
the end_io handler for the highlevel bio from the low-level bio
bi_end_io, which are really confusingly coupled in the current code.
Once that is done it then optimizes the bio submission to not allocate
a btrfs_io_context for I/Os tht just go to a single device.

This series sits on top of the for-next tree that has the "fix read
repair on compressed extents v2" series included.  To make everyones life
easier a git tree is also available:

    git://git.infradead.org/users/hch/misc.git btrfs-bio-api-cleanup

Gitweb:

    http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-bio-api-cleanup

Note that I've picked up the reviews and tested tags from the last
posting for the patches that are relatively unchainged or just split,
so they might appear a little inconsistent.

Changes since v1:
 - add two previously submitted patches skipped from an earlier
   series
 - merged one of those patches with one from this series
 - split one of the patches in this series into three
 - improve various commit logs

Diffstat:
 compression.c    |   43 ++-----
 ctree.h          |    1 
 dev-replace.c    |    5 
 disk-io.c        |   16 +-
 extent-io-tree.h |    4 
 extent_io.c      |  117 +++----------------
 extent_io.h      |    3 
 inode.c          |   57 ++++-----
 raid56.c         |   45 +------
 raid56.h         |    4 
 scrub.c          |    7 -
 super.c          |    6 
 volumes.c        |  337 ++++++++++++++++++++++++++++++++++++++-----------------
 volumes.h        |   20 ++-
 14 files changed, 340 insertions(+), 325 deletions(-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] btrfs: don't call bioset_integrity_create for btrfs_bioset
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 02/11] btrfs: move btrfs_bio allocation to volumes.c Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

btrfs never uses bio integrity data itself, so don't allocate
the integrity pools for btrfs_bioset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent_io.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index db95de608ce3f..35c609aaeda4a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -255,14 +255,8 @@ int __init extent_io_init(void)
 			BIOSET_NEED_BVECS))
 		goto free_buffer_cache;
 
-	if (bioset_integrity_create(&btrfs_bioset, BIO_POOL_SIZE))
-		goto free_bioset;
-
 	return 0;
 
-free_bioset:
-	bioset_exit(&btrfs_bioset);
-
 free_buffer_cache:
 	kmem_cache_destroy(extent_buffer_cache);
 	extent_buffer_cache = NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/11] btrfs: move btrfs_bio allocation to volumes.c
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 01/11] btrfs: don't call bioset_integrity_create for btrfs_bioset Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 03/11] btrfs: pass the operation to btrfs_bio_alloc Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

volumes.c is the place that implements the storage layer using the
btrfs_bio structure, so move the bio_set and allocation helpers there
as well.

To make up for the new initialization boilerplate, merge the two
init/exit helpers in extent_io.c into a single one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/extent-io-tree.h |  4 +--
 fs/btrfs/extent_io.c      | 74 ++++-----------------------------------
 fs/btrfs/extent_io.h      |  3 --
 fs/btrfs/super.c          |  6 ++--
 fs/btrfs/volumes.c        | 58 ++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h        |  3 ++
 6 files changed, 72 insertions(+), 76 deletions(-)

diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index c3eb52dbe61cc..819763ace0aca 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -96,8 +96,8 @@ struct extent_state {
 #endif
 };
 
-int __init extent_state_cache_init(void);
-void __cold extent_state_cache_exit(void);
+int __init volumes_init(void);
+void __cold volumes_exit(void);
 
 void extent_io_tree_init(struct btrfs_fs_info *fs_info,
 			 struct extent_io_tree *tree, unsigned int owner,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 35c609aaeda4a..f3a7614333cb7 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -33,7 +33,6 @@
 
 static struct kmem_cache *extent_state_cache;
 static struct kmem_cache *extent_buffer_cache;
-static struct bio_set btrfs_bioset;
 
 static inline bool extent_state_in_tree(const struct extent_state *state)
 {
@@ -232,41 +231,23 @@ static void submit_write_bio(struct extent_page_data *epd, int ret)
 	}
 }
 
-int __init extent_state_cache_init(void)
+int __init extent_io_init(void)
 {
 	extent_state_cache = kmem_cache_create("btrfs_extent_state",
 			sizeof(struct extent_state), 0,
 			SLAB_MEM_SPREAD, NULL);
 	if (!extent_state_cache)
 		return -ENOMEM;
-	return 0;
-}
 
-int __init extent_io_init(void)
-{
 	extent_buffer_cache = kmem_cache_create("btrfs_extent_buffer",
 			sizeof(struct extent_buffer), 0,
 			SLAB_MEM_SPREAD, NULL);
-	if (!extent_buffer_cache)
+	if (!extent_buffer_cache) {
+		kmem_cache_destroy(extent_state_cache);
 		return -ENOMEM;
-
-	if (bioset_init(&btrfs_bioset, BIO_POOL_SIZE,
-			offsetof(struct btrfs_bio, bio),
-			BIOSET_NEED_BVECS))
-		goto free_buffer_cache;
+	}
 
 	return 0;
-
-free_buffer_cache:
-	kmem_cache_destroy(extent_buffer_cache);
-	extent_buffer_cache = NULL;
-	return -ENOMEM;
-}
-
-void __cold extent_state_cache_exit(void)
-{
-	btrfs_extent_state_leak_debug_check();
-	kmem_cache_destroy(extent_state_cache);
 }
 
 void __cold extent_io_exit(void)
@@ -277,7 +258,8 @@ void __cold extent_io_exit(void)
 	 */
 	rcu_barrier();
 	kmem_cache_destroy(extent_buffer_cache);
-	bioset_exit(&btrfs_bioset);
+	btrfs_extent_state_leak_debug_check();
+	kmem_cache_destroy(extent_state_cache);
 }
 
 /*
@@ -3157,50 +3139,6 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array)
 	return 0;
 }
 
-/*
- * Initialize the members up to but not including 'bio'. Use after allocating a
- * new bio by bio_alloc_bioset as it does not initialize the bytes outside of
- * 'bio' because use of __GFP_ZERO is not supported.
- */
-static inline void btrfs_bio_init(struct btrfs_bio *bbio)
-{
-	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
-}
-
-/*
- * Allocate a btrfs_io_bio, with @nr_iovecs as maximum number of iovecs.
- *
- * The bio allocation is backed by bioset and does not fail.
- */
-struct bio *btrfs_bio_alloc(unsigned int nr_iovecs)
-{
-	struct bio *bio;
-
-	ASSERT(0 < nr_iovecs && nr_iovecs <= BIO_MAX_VECS);
-	bio = bio_alloc_bioset(NULL, nr_iovecs, 0, GFP_NOFS, &btrfs_bioset);
-	btrfs_bio_init(btrfs_bio(bio));
-	return bio;
-}
-
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
-{
-	struct bio *bio;
-	struct btrfs_bio *bbio;
-
-	ASSERT(offset <= UINT_MAX && size <= UINT_MAX);
-
-	/* this will never fail when it's backed by a bioset */
-	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
-	ASSERT(bio);
-
-	bbio = btrfs_bio(bio);
-	btrfs_bio_init(bbio);
-
-	bio_trim(bio, offset >> 9, size >> 9);
-	bbio->iter = bio->bi_iter;
-	return bio;
-}
-
 /**
  * Attempt to add a page to bio
  *
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 9dec34c009e91..75194aee6a42e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -60,7 +60,6 @@ enum {
 struct btrfs_bio;
 struct btrfs_root;
 struct btrfs_inode;
-struct btrfs_io_bio;
 struct btrfs_fs_info;
 struct io_failure_record;
 struct extent_io_tree;
@@ -242,8 +241,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 				  u32 bits_to_clear, unsigned long page_ops);
 
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
-struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4c7089b1681b3..96b22ee7b12ce 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2664,7 +2664,7 @@ static int __init init_btrfs_fs(void)
 	if (err)
 		goto free_cachep;
 
-	err = extent_state_cache_init();
+	err = volumes_init();
 	if (err)
 		goto free_extent_io;
 
@@ -2723,7 +2723,7 @@ static int __init init_btrfs_fs(void)
 free_extent_map:
 	extent_map_exit();
 free_extent_state_cache:
-	extent_state_cache_exit();
+	volumes_exit();
 free_extent_io:
 	extent_io_exit();
 free_cachep:
@@ -2744,7 +2744,7 @@ static void __exit exit_btrfs_fs(void)
 	btrfs_prelim_ref_exit();
 	ordered_data_exit();
 	extent_map_exit();
-	extent_state_cache_exit();
+	volumes_exit();
 	extent_io_exit();
 	btrfs_interface_exit();
 	unregister_filesystem(&btrfs_fs_type);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index bf4e140f6bfc7..6825f62364c29 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -34,6 +34,8 @@
 #include "discard.h"
 #include "zoned.h"
 
+static struct bio_set btrfs_bioset;
+
 #define BTRFS_BLOCK_GROUP_STRIPE_MASK	(BTRFS_BLOCK_GROUP_RAID0 | \
 					 BTRFS_BLOCK_GROUP_RAID10 | \
 					 BTRFS_BLOCK_GROUP_RAID56_MASK)
@@ -6609,6 +6611,48 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
 }
 
+/*
+ * Initialize a btrfs_bio structure.  This skips the embedded bio itself as it
+ * is already initialized by the block layer.
+ */
+static inline void btrfs_bio_init(struct btrfs_bio *bbio)
+{
+	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+}
+
+/*
+ * Allocate a btrfs_bio structure.  The btrfs_bio is the main I/O container for
+ * btrfs, and is used for all I/O submitted through btrfs_submit_bio.
+ *
+ * Just like the underlying bio_alloc_bioset it will no fail as it is backed by
+ * a mempool.
+ */
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs)
+{
+	struct bio *bio;
+
+	bio = bio_alloc_bioset(NULL, nr_vecs, 0, GFP_NOFS, &btrfs_bioset);
+	btrfs_bio_init(btrfs_bio(bio));
+	return bio;
+}
+
+struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
+{
+	struct bio *bio;
+	struct btrfs_bio *bbio;
+
+	ASSERT(offset <= UINT_MAX && size <= UINT_MAX);
+
+	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
+	bbio = btrfs_bio(bio);
+	btrfs_bio_init(bbio);
+
+	bio_trim(bio, offset >> 9, size >> 9);
+	bbio->iter = bio->bi_iter;
+	return bio;
+
+}
+
 static struct workqueue_struct *btrfs_end_io_wq(struct btrfs_io_context *bioc)
 {
 	if (bioc->orig_bio->bi_opf & REQ_META)
@@ -8294,3 +8338,17 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical)
 
 	return true;
 }
+
+int __init volumes_init(void)
+{
+	if (bioset_init(&btrfs_bioset, BIO_POOL_SIZE,
+			offsetof(struct btrfs_bio, bio),
+			BIOSET_NEED_BVECS))
+		return -ENOMEM;
+	return 0;
+}
+
+void __cold volumes_exit(void)
+{
+	bioset_exit(&btrfs_bioset);
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5639961b3626f..fb57b50fb60b1 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -393,6 +393,9 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 	return container_of(bio, struct btrfs_bio, bio);
 }
 
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs);
+struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+
 static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 {
 	if (bbio->csum != bbio->csum_inline) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/11] btrfs: pass the operation to btrfs_bio_alloc
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 01/11] btrfs: don't call bioset_integrity_create for btrfs_bioset Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 02/11] btrfs: move btrfs_bio allocation to volumes.c Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Pass the operation to btrfs_bio_alloc, matching what bio_alloc_bioset
set does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/compression.c | 3 +--
 fs/btrfs/extent_io.c   | 6 ++----
 fs/btrfs/inode.c       | 3 +--
 fs/btrfs/volumes.c     | 4 ++--
 fs/btrfs/volumes.h     | 2 +-
 5 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b6fb8b0682bae..b2bc605ec73a9 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -344,10 +344,9 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf);
 
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
-	bio->bi_opf = opf;
 	bio->bi_private = cb;
 	bio->bi_end_io = endio_func;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index f3a7614333cb7..667b4f439a089 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2629,10 +2629,9 @@ int btrfs_repair_one_sector(struct inode *inode, struct btrfs_bio *failed_bbio,
 		return -EIO;
 	}
 
-	repair_bio = btrfs_bio_alloc(1);
+	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ);
 	repair_bbio = btrfs_bio(repair_bio);
 	repair_bbio->file_offset = start;
-	repair_bio->bi_opf = REQ_OP_READ;
 	repair_bio->bi_end_io = failed_bio->bi_end_io;
 	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
 	repair_bio->bi_private = failed_bio->bi_private;
@@ -3267,7 +3266,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf);
 	/*
 	 * For compressed page range, its disk_bytenr is always @disk_bytenr
 	 * passed in, no matter if we have added any range into previous bio.
@@ -3279,7 +3278,6 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	bio_ctrl->bio = bio;
 	bio_ctrl->compress_type = compress_type;
 	bio->bi_end_io = end_io_func;
-	bio->bi_opf = opf;
 	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 	if (ret < 0)
 		goto error;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index c91b68bcbee23..18284f6105e7c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -10478,12 +10478,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 			size_t bytes = min_t(u64, remaining, PAGE_SIZE);
 
 			if (!bio) {
-				bio = btrfs_bio_alloc(BIO_MAX_VECS);
+				bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ);
 				bio->bi_iter.bi_sector =
 					(disk_bytenr + cur) >> SECTOR_SHIFT;
 				bio->bi_end_io = btrfs_encoded_read_endio;
 				bio->bi_private = &priv;
-				bio->bi_opf = REQ_OP_READ;
 			}
 
 			if (!bytes ||
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6825f62364c29..ce86ab6207eab 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6627,11 +6627,11 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio)
  * Just like the underlying bio_alloc_bioset it will no fail as it is backed by
  * a mempool.
  */
-struct bio *btrfs_bio_alloc(unsigned int nr_vecs)
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf)
 {
 	struct bio *bio;
 
-	bio = bio_alloc_bioset(NULL, nr_vecs, 0, GFP_NOFS, &btrfs_bioset);
+	bio = bio_alloc_bioset(NULL, nr_vecs, opf, GFP_NOFS, &btrfs_bioset);
 	btrfs_bio_init(btrfs_bio(bio));
 	return bio;
 }
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index fb57b50fb60b1..bd108c7ed1ac3 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -393,7 +393,7 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 	return container_of(bio, struct btrfs_bio, bio);
 }
 
-struct bio *btrfs_bio_alloc(unsigned int nr_vecs);
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
 
 static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 03/11] btrfs: pass the operation to btrfs_bio_alloc Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:27   ` Johannes Thumshirn
  2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Stop grabbing an extra bio_counter reference for each clone bio in a
mirrored write and instead just release the one original reference in
btrfs_end_bioc once all the bios for a single btfs_bio have completed
instead of at the end of btrfs_submit_bio once all bios have been
submitted.

This means the reference is now carried by the "upper" btrfs_bio only
instead of each lower bio.

Also remove the now unused btrfs_bio_counter_inc_noblocked helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.h       | 1 -
 fs/btrfs/dev-replace.c | 5 -----
 fs/btrfs/volumes.c     | 6 ++----
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 68a10c515dd65..1d33bec9c6b72 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3996,7 +3996,6 @@ static inline void btrfs_init_full_stripe_locks_tree(
 
 /* dev-replace.c */
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info);
-void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info);
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount);
 
 static inline void btrfs_bio_counter_dec(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index f43196a893ca3..eabf6de361c68 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -1289,11 +1289,6 @@ int __pure btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace)
 	return 1;
 }
 
-void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
-{
-	percpu_counter_inc(&fs_info->dev_replace.bio_counter);
-}
-
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
 	percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ce86ab6207eab..3f5672b362202 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6673,6 +6673,8 @@ static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
 	struct bio *orig_bio = bioc->orig_bio;
 	struct btrfs_bio *bbio = btrfs_bio(orig_bio);
 
+	btrfs_bio_counter_dec(bioc->fs_info);
+
 	bbio->mirror_num = bioc->mirror_num;
 	orig_bio->bi_private = bioc->private;
 	orig_bio->bi_end_io = bioc->end_io;
@@ -6720,7 +6722,6 @@ static void btrfs_end_bio(struct bio *bio)
 	if (bio != bioc->orig_bio)
 		bio_put(bio);
 
-	btrfs_bio_counter_dec(bioc->fs_info);
 	if (atomic_dec_and_test(&bioc->stripes_pending))
 		btrfs_end_bioc(bioc, true);
 }
@@ -6775,8 +6776,6 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 		(unsigned long)dev->bdev->bd_dev, rcu_str_deref(dev->name),
 		dev->devid, bio->bi_iter.bi_size);
 
-	btrfs_bio_counter_inc_noblocked(fs_info);
-
 	btrfsic_check_bio(bio);
 	submit_bio(bio);
 }
@@ -6828,7 +6827,6 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 
 		submit_stripe_bio(bioc, bio, dev_nr, should_clone);
 	}
-	btrfs_bio_counter_dec(fs_info);
 }
 
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/11] btrfs: remove bioc->stripes_pending
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:31   ` Johannes Thumshirn
                     ` (2 more replies)
  2022-07-13  6:13 ` [PATCH 06/11] btrfs: properly abstract the parity raid bio handling Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 3 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

The stripes_pending in the btrfs_io_context counts number of inflight
low-level bios for an upper btrfs_bio.  For reads this is generally
one as reads are never cloned, while for writes we can trivially use
the bio remaining mechanisms that is used for chained bios.

To be able to make use of that mechanism, split out a separate trivial
end_io handler for the cloned bios that does a minimal amount of error
tracking and which then calls bio_endio on the original bio to transfer
control to that, with the remaining counter making sure it is completed
last.  This then allows to merge btrfs_end_bioc into the original bio
bi_end_io handler.  To make this all work all error handling needs to
happen through the bi_end_io handler, which requires a small amount
of reshuffling in submit_stripe_bio so that the bio is cloned already
by the time the suitability of the device is checked.

This reduces the size of the btrfs_io_context and prepares splitting
the btrfs_bio at the stripe boundary.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 94 ++++++++++++++++++++++++----------------------
 fs/btrfs/volumes.h |  1 -
 2 files changed, 50 insertions(+), 45 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3f5672b362202..92e15e523451a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5893,7 +5893,6 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 		sizeof(u64) * (total_stripes),
 		GFP_NOFS|__GFP_NOFAIL);
 
-	atomic_set(&bioc->error, 0);
 	refcount_set(&bioc->refs, 1);
 
 	bioc->fs_info = fs_info;
@@ -6650,7 +6649,21 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 	bio_trim(bio, offset >> 9, size >> 9);
 	bbio->iter = bio->bi_iter;
 	return bio;
+}
+
+static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
+{
+	if (!dev || !dev->bdev)
+		return;
+	if (bio->bi_status != BLK_STS_IOERR && bio->bi_status != BLK_STS_TARGET)
+		return;
 
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
+	if (!(bio->bi_opf & REQ_RAHEAD))
+		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS);
+	if (bio->bi_opf & REQ_PREFLUSH)
+		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_FLUSH_ERRS);
 }
 
 static struct workqueue_struct *btrfs_end_io_wq(struct btrfs_io_context *bioc)
@@ -6668,62 +6681,54 @@ static void btrfs_end_bio_work(struct work_struct *work)
 	bio_endio(&bbio->bio);
 }
 
-static void btrfs_end_bioc(struct btrfs_io_context *bioc, bool async)
+static void btrfs_end_bio(struct bio *bio)
 {
-	struct bio *orig_bio = bioc->orig_bio;
-	struct btrfs_bio *bbio = btrfs_bio(orig_bio);
+	struct btrfs_io_stripe *stripe = bio->bi_private;
+	struct btrfs_io_context *bioc = stripe->bioc;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 
+	if (bio->bi_status) {
+		atomic_inc(&bioc->error);
+		btrfs_log_dev_io_error(bio, stripe->dev);
+	}
+
 	bbio->mirror_num = bioc->mirror_num;
-	orig_bio->bi_private = bioc->private;
-	orig_bio->bi_end_io = bioc->end_io;
+	bio->bi_end_io = bioc->end_io;
+	bio->bi_private = bioc->private;
 
 	/*
 	 * Only send an error to the higher layers if it is beyond the tolerance
 	 * threshold.
 	 */
 	if (atomic_read(&bioc->error) > bioc->max_errors)
-		orig_bio->bi_status = BLK_STS_IOERR;
+		bio->bi_status = BLK_STS_IOERR;
 	else
-		orig_bio->bi_status = BLK_STS_OK;
+		bio->bi_status = BLK_STS_OK;
 
-	if (btrfs_op(orig_bio) == BTRFS_MAP_READ && async) {
+	if (btrfs_op(bio) == BTRFS_MAP_READ) {
 		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
 		queue_work(btrfs_end_io_wq(bioc), &bbio->end_io_work);
 	} else {
-		bio_endio(orig_bio);
+		bio_endio(bio);
 	}
 
 	btrfs_put_bioc(bioc);
 }
 
-static void btrfs_end_bio(struct bio *bio)
+static void btrfs_clone_write_end_io(struct bio *bio)
 {
 	struct btrfs_io_stripe *stripe = bio->bi_private;
-	struct btrfs_io_context *bioc = stripe->bioc;
 
 	if (bio->bi_status) {
-		atomic_inc(&bioc->error);
-		if (bio->bi_status == BLK_STS_IOERR ||
-		    bio->bi_status == BLK_STS_TARGET) {
-			if (btrfs_op(bio) == BTRFS_MAP_WRITE)
-				btrfs_dev_stat_inc_and_print(stripe->dev,
-						BTRFS_DEV_STAT_WRITE_ERRS);
-			else if (!(bio->bi_opf & REQ_RAHEAD))
-				btrfs_dev_stat_inc_and_print(stripe->dev,
-						BTRFS_DEV_STAT_READ_ERRS);
-			if (bio->bi_opf & REQ_PREFLUSH)
-				btrfs_dev_stat_inc_and_print(stripe->dev,
-						BTRFS_DEV_STAT_FLUSH_ERRS);
-		}
+		atomic_inc(&stripe->bioc->error);
+		btrfs_log_dev_io_error(bio, stripe->dev);
 	}
 
-	if (bio != bioc->orig_bio)
-		bio_put(bio);
-
-	if (atomic_dec_and_test(&bioc->stripes_pending))
-		btrfs_end_bioc(bioc, true);
+	/* Pass on cotrol to the original bio this one was cloned from */
+	bio_endio(stripe->bioc->orig_bio);
+	bio_put(bio);
 }
 
 static void submit_stripe_bio(struct btrfs_io_context *bioc,
@@ -6734,28 +6739,30 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 	u64 physical = bioc->stripes[dev_nr].physical;
 	struct bio *bio;
 
-	if (!dev || !dev->bdev ||
-	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
-	    (btrfs_op(orig_bio) == BTRFS_MAP_WRITE &&
-	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
-		atomic_inc(&bioc->error);
-		if (atomic_dec_and_test(&bioc->stripes_pending))
-			btrfs_end_bioc(bioc, false);
-		return;
-	}
-
 	if (clone) {
-		bio = bio_alloc_clone(dev->bdev, orig_bio, GFP_NOFS, &fs_bio_set);
+		bio = bio_alloc_clone(NULL, orig_bio, GFP_NOFS, &fs_bio_set);
+		bio_inc_remaining(orig_bio);
+		bio->bi_end_io = btrfs_clone_write_end_io;
 	} else {
 		bio = orig_bio;
-		bio_set_dev(bio, dev->bdev);
 		btrfs_bio(bio)->device = dev;
+		bio->bi_end_io = btrfs_end_bio;
 	}
 
 	bioc->stripes[dev_nr].bioc = bioc;
 	bio->bi_private = &bioc->stripes[dev_nr];
-	bio->bi_end_io = btrfs_end_bio;
 	bio->bi_iter.bi_sector = physical >> 9;
+
+	if (!dev || !dev->bdev ||
+	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
+	    (btrfs_op(bio) == BTRFS_MAP_WRITE &&
+	     !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) {
+		bio_io_error(bio);
+		return;
+	}
+
+	bio_set_dev(bio, dev->bdev);
+
 	/*
 	 * For zone append writing, bi_sector must point the beginning of the
 	 * zone
@@ -6804,7 +6811,6 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	bioc->orig_bio = bio;
 	bioc->private = bio->bi_private;
 	bioc->end_io = bio->bi_end_io;
-	atomic_set(&bioc->stripes_pending, total_devs);
 
 	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
 	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index bd108c7ed1ac3..9e984a9922c59 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -454,7 +454,6 @@ struct btrfs_discard_stripe {
  */
 struct btrfs_io_context {
 	refcount_t refs;
-	atomic_t stripes_pending;
 	struct btrfs_fs_info *fs_info;
 	u64 map_type; /* get from map_lookup->type */
 	bio_end_io_t *end_io;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/11] btrfs: properly abstract the parity raid bio handling
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:13 ` [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler Christoph Hellwig
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

The parity raid write/recover functionality is currently not very well
abstracted from the bio submission and completion handling in volumes.c:

 - the raid56 code directly completes the original btrfs_bio fed into
   btrfs_submit_bio instead of dispatching back to volumes.c
 - the raid56 code consumes the bioc and bio_counter references taken
   by volumes.c, which also leads to ugly special casing of the calls
   from the scrub code into the raid56 code

To fix this up supply a bi_end_io handler that calls back into the
volumes.c machinery, which then puts the bioc, decrements the bio_counter
and completes the original bio, and updates the scrub code to also
take ownership of the bioc and bio_counter in all cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid56.c  | 45 +++++++--------------------------------------
 fs/btrfs/raid56.h  |  4 +---
 fs/btrfs/scrub.c   |  7 +++++--
 fs/btrfs/volumes.c | 18 +++++++++++++++++-
 4 files changed, 30 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1afe32d5ab017..d767814653249 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -275,7 +275,6 @@ static void merge_rbio(struct btrfs_raid_bio *dest,
 	/* Also inherit the bitmaps from @victim. */
 	bitmap_or(&dest->dbitmap, &victim->dbitmap, &dest->dbitmap,
 		  dest->stripe_nsectors);
-	dest->generic_bio_cnt += victim->generic_bio_cnt;
 	bio_list_init(&victim->bio_list);
 }
 
@@ -814,8 +813,6 @@ static void rbio_orig_end_io(struct btrfs_raid_bio *rbio, blk_status_t err)
 	struct bio *cur = bio_list_get(&rbio->bio_list);
 	struct bio *extra;
 
-	if (rbio->generic_bio_cnt)
-		btrfs_bio_counter_sub(rbio->bioc->fs_info, rbio->generic_bio_cnt);
 	/*
 	 * Clear the data bitmap, as the rbio may be cached for later usage.
 	 * do this before before unlock_stripe() so there will be no new bio
@@ -946,6 +943,7 @@ static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info,
 	spin_lock_init(&rbio->bio_list_lock);
 	INIT_LIST_HEAD(&rbio->stripe_cache);
 	INIT_LIST_HEAD(&rbio->hash_list);
+	btrfs_get_bioc(bioc);
 	rbio->bioc = bioc;
 	rbio->nr_pages = num_pages;
 	rbio->nr_sectors = num_sectors;
@@ -1813,15 +1811,12 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
 
 	rbio = alloc_rbio(fs_info, bioc);
 	if (IS_ERR(rbio)) {
-		btrfs_put_bioc(bioc);
 		ret = PTR_ERR(rbio);
-		goto out_dec_counter;
+		goto fail;
 	}
 	rbio->operation = BTRFS_RBIO_WRITE;
 	rbio_add_bio(rbio, bio);
 
-	rbio->generic_bio_cnt = 1;
-
 	/*
 	 * don't plug on full rbios, just get them out the door
 	 * as quickly as we can
@@ -1829,7 +1824,7 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
 	if (rbio_is_full(rbio)) {
 		ret = full_stripe_write(rbio);
 		if (ret)
-			goto out_dec_counter;
+			goto fail;
 		return;
 	}
 
@@ -1844,13 +1839,12 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc)
 	} else {
 		ret = __raid56_parity_write(rbio);
 		if (ret)
-			goto out_dec_counter;
+			goto fail;
 	}
 
 	return;
 
-out_dec_counter:
-	btrfs_bio_counter_dec(fs_info);
+fail:
 	bio->bi_status = errno_to_blk_status(ret);
 	bio_endio(bio);
 }
@@ -2198,18 +2192,11 @@ static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
  * of the drive.
  */
 void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
-			   int mirror_num, bool generic_io)
+			   int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_raid_bio *rbio;
 
-	if (generic_io) {
-		ASSERT(bioc->mirror_num == mirror_num);
-		btrfs_bio(bio)->mirror_num = mirror_num;
-	} else {
-		btrfs_get_bioc(bioc);
-	}
-
 	rbio = alloc_rbio(fs_info, bioc);
 	if (IS_ERR(rbio)) {
 		bio->bi_status = errno_to_blk_status(PTR_ERR(rbio));
@@ -2225,14 +2212,11 @@ void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 "%s could not find the bad stripe in raid56 so that we cannot recover any more (bio has logical %llu len %llu, bioc has map_type %llu)",
 			   __func__, bio->bi_iter.bi_sector << 9,
 			   (u64)bio->bi_iter.bi_size, bioc->map_type);
-		kfree(rbio);
+		__free_raid_bio(rbio);
 		bio->bi_status = BLK_STS_IOERR;
 		goto out_end_bio;
 	}
 
-	if (generic_io)
-		rbio->generic_bio_cnt = 1;
-
 	/*
 	 * Loop retry:
 	 * for 'mirror == 2', reconstruct from all other stripes.
@@ -2261,8 +2245,6 @@ void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
 	return;
 
 out_end_bio:
-	btrfs_bio_counter_dec(fs_info);
-	btrfs_put_bioc(bioc);
 	bio_endio(bio);
 }
 
@@ -2326,13 +2308,6 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	ASSERT(i < rbio->real_stripes);
 
 	bitmap_copy(&rbio->dbitmap, dbitmap, stripe_nsectors);
-
-	/*
-	 * We have already increased bio_counter when getting bioc, record it
-	 * so we can free it at rbio_orig_end_io().
-	 */
-	rbio->generic_bio_cnt = 1;
-
 	return rbio;
 }
 
@@ -2772,12 +2747,6 @@ raid56_alloc_missing_rbio(struct bio *bio, struct btrfs_io_context *bioc)
 		return NULL;
 	}
 
-	/*
-	 * When we get bioc, we have already increased bio_counter, record it
-	 * so we can free it at rbio_orig_end_io()
-	 */
-	rbio->generic_bio_cnt = 1;
-
 	return rbio;
 }
 
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 6f48f9e4c8694..91d5c0adad151 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -89,8 +89,6 @@ struct btrfs_raid_bio {
 	 */
 	int bio_list_bytes;
 
-	int generic_bio_cnt;
-
 	refcount_t refs;
 
 	atomic_t stripes_pending;
@@ -166,7 +164,7 @@ static inline int nr_data_stripes(const struct map_lookup *map)
 struct btrfs_device;
 
 void raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc,
-			   int mirror_num, bool generic_io);
+			   int mirror_num);
 void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc);
 
 void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3afe5fa50a631..5f9ecb6d5b997 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1381,7 +1381,7 @@ static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info,
 	bio->bi_private = &done;
 	bio->bi_end_io = scrub_bio_wait_endio;
 	raid56_parity_recover(bio, sector->recover->bioc,
-			      sector->sblock->sectors[0]->mirror_num, false);
+			      sector->sblock->sectors[0]->mirror_num);
 
 	wait_for_completion_io(&done);
 	return blk_status_to_errno(bio->bi_status);
@@ -2102,6 +2102,7 @@ static void scrub_missing_raid56_end_io(struct bio *bio)
 	struct scrub_block *sblock = bio->bi_private;
 	struct btrfs_fs_info *fs_info = sblock->sctx->fs_info;
 
+	btrfs_bio_counter_dec(fs_info);
 	if (bio->bi_status)
 		sblock->no_io_error_seen = 0;
 
@@ -2204,6 +2205,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	scrub_block_get(sblock);
 	scrub_pending_bio_inc(sctx);
 	raid56_submit_missing_rbio(rbio);
+	btrfs_put_bioc(bioc);
 	return;
 
 rbio_out:
@@ -2774,6 +2776,7 @@ static void scrub_parity_bio_endio_worker(struct work_struct *work)
 						    work);
 	struct scrub_ctx *sctx = sparity->sctx;
 
+	btrfs_bio_counter_dec(sctx->fs_info);
 	scrub_free_parity(sparity);
 	scrub_pending_bio_dec(sctx);
 }
@@ -2824,6 +2827,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 					      sparity->scrub_dev,
 					      &sparity->dbitmap,
 					      sparity->nsectors);
+	btrfs_put_bioc(bioc);
 	if (!rbio)
 		goto rbio_out;
 
@@ -2835,7 +2839,6 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 	bio_put(bio);
 bioc_out:
 	btrfs_bio_counter_dec(fs_info);
-	btrfs_put_bioc(bioc);
 	bitmap_or(&sparity->ebitmap, &sparity->ebitmap, &sparity->dbitmap,
 		  sparity->nsectors);
 	spin_lock(&sctx->stat_lock);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 92e15e523451a..a635442c06c3e 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6681,6 +6681,20 @@ static void btrfs_end_bio_work(struct work_struct *work)
 	bio_endio(&bbio->bio);
 }
 
+static void btrfs_raid56_end_io(struct bio *bio)
+{
+	struct btrfs_io_context *bioc = bio->bi_private;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+
+	btrfs_bio_counter_dec(bioc->fs_info);
+	bbio->mirror_num = bioc->mirror_num;
+	bio->bi_end_io = bioc->end_io;
+	bio->bi_private = bioc->private;
+	bio->bi_end_io(bio);
+
+	btrfs_put_bioc(bioc);
+}
+
 static void btrfs_end_bio(struct bio *bio)
 {
 	struct btrfs_io_stripe *stripe = bio->bi_private;
@@ -6814,10 +6828,12 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 
 	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
 	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
+		bio->bi_private = bioc;
+		bio->bi_end_io = btrfs_raid56_end_io;
 		if (btrfs_op(bio) == BTRFS_MAP_WRITE)
 			raid56_parity_write(bio, bioc);
 		else
-			raid56_parity_recover(bio, bioc, mirror_num, true);
+			raid56_parity_recover(bio, bioc, mirror_num);
 		return;
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 06/11] btrfs: properly abstract the parity raid bio handling Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:47   ` Johannes Thumshirn
  2022-07-13  6:13 ` [PATCH 08/11] btrfs: split submit_stripe_bio Christoph Hellwig
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Currently btrfs_bio end I/O handling is a bit of a mess.  The bi_end_io
handler and bi_private pointer of the embedded struct bio are both used
to handle the completion of the high-level btrfs_bio and for the I/O
completion for the low-level device that the embedded bio ends up being
sent to.  To support this bi_end_io and bi_private are saved into the
btrfs_io_context structure and then restored after the bio sent to the
underlying device has completed the actual I/O.

Untangle this by adding an end I/O handler and private data to struct
btrfs_bio for the highlevel btrfs_bio based completions, and leave the
actual bio bi_end_io handler and bi_private pointer entirely to the
low-level device I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/compression.c | 42 +++++++++++++------------------
 fs/btrfs/disk-io.c     | 16 ++++++------
 fs/btrfs/extent_io.c   | 35 +++++++++++++-------------
 fs/btrfs/inode.c       | 56 +++++++++++++++++++-----------------------
 fs/btrfs/volumes.c     | 30 +++++++++++-----------
 fs/btrfs/volumes.h     | 20 ++++++++++++---
 6 files changed, 96 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b2bc605ec73a9..ea6a391d11980 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -152,9 +152,7 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
 	}
 
 	/* Do io completion on the original bio */
-	if (cb->status != BLK_STS_OK)
-		cb->orig_bio->bi_status = cb->status;
-	bio_endio(cb->orig_bio);
+	btrfs_bio_end_io(btrfs_bio(cb->orig_bio), cb->status);
 
 	/* Finally free the cb struct */
 	kfree(cb->compressed_pages);
@@ -166,16 +164,15 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
  * before decompressing it into the original bio and freeing the uncompressed
  * pages.
  */
-static void end_compressed_bio_read(struct bio *bio)
+static void end_compressed_bio_read(struct btrfs_bio *bbio)
 {
-	struct compressed_bio *cb = bio->bi_private;
+	struct compressed_bio *cb = bbio->private;
 	struct inode *inode = cb->inode;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_inode *bi = BTRFS_I(inode);
 	bool csum = !(bi->flags & BTRFS_INODE_NODATASUM) &&
 		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
-	blk_status_t status = bio->bi_status;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
+	blk_status_t status = bbio->bio.bi_status;
 	struct bvec_iter iter;
 	struct bio_vec bv;
 	u32 offset;
@@ -209,7 +206,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	if (refcount_dec_and_test(&cb->pending_ios))
 		finish_compressed_bio_read(cb);
 	btrfs_bio_free_csum(bbio);
-	bio_put(bio);
+	bio_put(&bbio->bio);
 }
 
 /*
@@ -301,20 +298,20 @@ static void btrfs_finish_compressed_write_work(struct work_struct *work)
  * This also calls the writeback end hooks for the file pages so that metadata
  * and checksums can be updated in the file.
  */
-static void end_compressed_bio_write(struct bio *bio)
+static void end_compressed_bio_write(struct btrfs_bio *bbio)
 {
-	struct compressed_bio *cb = bio->bi_private;
+	struct compressed_bio *cb = bbio->private;
 
-	if (bio->bi_status)
-		cb->status = bio->bi_status;
+	if (bbio->bio.bi_status)
+		cb->status = bbio->bio.bi_status;
 
 	if (refcount_dec_and_test(&cb->pending_ios)) {
 		struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
 
-		btrfs_record_physical_zoned(cb->inode, cb->start, bio);
+		btrfs_record_physical_zoned(cb->inode, cb->start, &bbio->bio);
 		queue_work(fs_info->compressed_write_workers, &cb->write_end_work);
 	}
-	bio_put(bio);
+	bio_put(&bbio->bio);
 }
 
 /*
@@ -335,7 +332,8 @@ static void end_compressed_bio_write(struct bio *bio)
 
 
 static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr,
-					unsigned int opf, bio_end_io_t endio_func,
+					unsigned int opf,
+					btrfs_bio_end_io_t endio_func,
 					u64 *next_stripe_start)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb);
@@ -344,11 +342,8 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf);
-
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, endio_func, cb);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
-	bio->bi_private = cb;
-	bio->bi_end_io = endio_func;
 
 	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
 	if (IS_ERR(em)) {
@@ -477,8 +472,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 			if (!skip_sum) {
 				ret = btrfs_csum_one_bio(inode, bio, start, true);
 				if (ret) {
-					bio->bi_status = ret;
-					bio_endio(bio);
+					btrfs_bio_end_io(btrfs_bio(bio), ret);
 					break;
 				}
 			}
@@ -799,8 +793,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 
 			ret = btrfs_lookup_bio_sums(inode, comp_bio, NULL);
 			if (ret) {
-				comp_bio->bi_status = ret;
-				bio_endio(comp_bio);
+				btrfs_bio_end_io(btrfs_bio(comp_bio), ret);
 				break;
 			}
 
@@ -826,8 +819,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	kfree(cb);
 out:
 	free_extent_map(em);
-	bio->bi_status = ret;
-	bio_endio(bio);
+	btrfs_bio_end_io(btrfs_bio(bio), ret);
 	return;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 717c9f6b9ba5b..d62ab276fd6b7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -728,16 +728,14 @@ static void run_one_async_start(struct btrfs_work *work)
  */
 static void run_one_async_done(struct btrfs_work *work)
 {
-	struct async_submit_bio *async;
-	struct inode *inode;
-
-	async = container_of(work, struct  async_submit_bio, work);
-	inode = async->inode;
+	struct async_submit_bio *async =
+		container_of(work, struct  async_submit_bio, work);
+	struct inode *inode = async->inode;
+	struct btrfs_bio *bbio = btrfs_bio(async->bio);
 
 	/* If an error occurred we just want to clean up the bio and move on */
 	if (async->status) {
-		async->bio->bi_status = async->status;
-		bio_endio(async->bio);
+		btrfs_bio_end_io(bbio, async->status);
 		return;
 	}
 
@@ -838,6 +836,7 @@ static bool should_async_write(struct btrfs_fs_info *fs_info,
 void btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+	struct btrfs_bio *bbio = btrfs_bio(bio);
 	blk_status_t ret;
 
 	bio->bi_opf |= REQ_META;
@@ -857,8 +856,7 @@ void btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int mirror_
 
 	ret = btree_csum_one_bio(bio);
 	if (ret) {
-		bio->bi_status = ret;
-		bio_endio(bio);
+		btrfs_bio_end_io(bbio, ret);
 		return;
 	}
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 667b4f439a089..36d604711fe04 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -206,7 +206,7 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 		btrfs_submit_data_read_bio(inode, bio, mirror_num,
 					   bio_ctrl->compress_type);
 
-	/* The bio is owned by the bi_end_io handler now */
+	/* The bio is owned by the end_io handler now */
 	bio_ctrl->bio = NULL;
 }
 
@@ -222,9 +222,8 @@ static void submit_write_bio(struct extent_page_data *epd, int ret)
 
 	if (ret) {
 		ASSERT(ret < 0);
-		bio->bi_status = errno_to_blk_status(ret);
-		bio_endio(bio);
-		/* The bio is owned by the bi_end_io handler now */
+		btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
+		/* The bio is owned by the end_io handler now */
 		epd->bio_ctrl.bio = NULL;
 	} else {
 		submit_one_bio(&epd->bio_ctrl);
@@ -2629,12 +2628,11 @@ int btrfs_repair_one_sector(struct inode *inode, struct btrfs_bio *failed_bbio,
 		return -EIO;
 	}
 
-	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ);
+	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
+				     failed_bbio->private);
 	repair_bbio = btrfs_bio(repair_bio);
 	repair_bbio->file_offset = start;
-	repair_bio->bi_end_io = failed_bio->bi_end_io;
 	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-	repair_bio->bi_private = failed_bio->bi_private;
 
 	if (failed_bbio->csum) {
 		const u32 csum_size = fs_info->csum_size;
@@ -2801,8 +2799,9 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_writepage(struct bio *bio)
+static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 {
+	struct bio *bio = &bbio->bio;
 	int error = blk_status_to_errno(bio->bi_status);
 	struct bio_vec *bvec;
 	u64 start;
@@ -2964,10 +2963,10 @@ static struct extent_buffer *find_extent_buffer_readpage(
  * Scheduling is not allowed, so the extent state tree is expected
  * to have one and only one object corresponding to this IO.
  */
-static void end_bio_extent_readpage(struct bio *bio)
+static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 {
+	struct bio *bio = &bbio->bio;
 	struct bio_vec *bvec;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
 	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
 	/*
@@ -3258,7 +3257,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 			 struct btrfs_bio_ctrl *bio_ctrl,
 			 struct writeback_control *wbc,
 			 unsigned int opf,
-			 bio_end_io_t end_io_func,
+			 btrfs_bio_end_io_t end_io_func,
 			 u64 disk_bytenr, u32 offset, u64 file_offset,
 			 enum btrfs_compression_type compress_type)
 {
@@ -3266,7 +3265,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, end_io_func, NULL);
 	/*
 	 * For compressed page range, its disk_bytenr is always @disk_bytenr
 	 * passed in, no matter if we have added any range into previous bio.
@@ -3277,7 +3276,6 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 		bio->bi_iter.bi_sector = (disk_bytenr + offset) >> SECTOR_SHIFT;
 	bio_ctrl->bio = bio;
 	bio_ctrl->compress_type = compress_type;
-	bio->bi_end_io = end_io_func;
 	ret = calc_bio_boundaries(bio_ctrl, inode, file_offset);
 	if (ret < 0)
 		goto error;
@@ -3316,8 +3314,7 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	return 0;
 error:
 	bio_ctrl->bio = NULL;
-	bio->bi_status = errno_to_blk_status(ret);
-	bio_endio(bio);
+	btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
 	return ret;
 }
 
@@ -3340,7 +3337,7 @@ static int submit_extent_page(unsigned int opf,
 			      struct btrfs_bio_ctrl *bio_ctrl,
 			      struct page *page, u64 disk_bytenr,
 			      size_t size, unsigned long pg_offset,
-			      bio_end_io_t end_io_func,
+			      btrfs_bio_end_io_t end_io_func,
 			      enum btrfs_compression_type compress_type,
 			      bool force_bio_submit)
 {
@@ -4337,8 +4334,9 @@ static struct extent_buffer *find_extent_buffer_nolock(
  * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
  * after all extent buffers in the page has finished their writeback.
  */
-static void end_bio_subpage_eb_writepage(struct bio *bio)
+static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
 {
+	struct bio *bio = &bbio->bio;
 	struct btrfs_fs_info *fs_info;
 	struct bio_vec *bvec;
 	struct bvec_iter_all iter_all;
@@ -4394,8 +4392,9 @@ static void end_bio_subpage_eb_writepage(struct bio *bio)
 	bio_put(bio);
 }
 
-static void end_bio_extent_buffer_writepage(struct bio *bio)
+static void end_bio_extent_buffer_writepage(struct btrfs_bio *bbio)
 {
+	struct bio *bio = &bbio->bio;
 	struct bio_vec *bvec;
 	struct extent_buffer *eb;
 	int done;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 18284f6105e7c..e1b6a5164d035 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2701,8 +2701,10 @@ void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirro
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 		ret = extract_ordered_extent(bi, bio,
 				page_offset(bio_first_bvec_all(bio)->bv_page));
-		if (ret)
-			goto out;
+		if (ret) {
+			btrfs_bio_end_io(btrfs_bio(bio), ret);
+			return;
+		}
 	}
 
 	/*
@@ -2722,16 +2724,12 @@ void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirro
 			return;
 
 		ret = btrfs_csum_one_bio(bi, bio, (u64)-1, false);
-		if (ret)
-			goto out;
+		if (ret) {
+			btrfs_bio_end_io(btrfs_bio(bio), ret);
+			return;
+		}
 	}
 	btrfs_submit_bio(fs_info, bio, mirror_num);
-	return;
-out:
-	if (ret) {
-		bio->bi_status = ret;
-		bio_endio(bio);
-	}
 }
 
 void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio,
@@ -2758,8 +2756,7 @@ void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio,
 	 */
 	ret = btrfs_lookup_bio_sums(inode, bio, NULL);
 	if (ret) {
-		bio->bi_status = ret;
-		bio_endio(bio);
+		btrfs_bio_end_io(btrfs_bio(bio), ret);
 		return;
 	}
 
@@ -7974,7 +7971,7 @@ static void submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 				  int mirror_num,
 				  enum btrfs_compression_type compress_type)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
+	struct btrfs_dio_private *dip = btrfs_bio(bio)->private;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
@@ -8027,10 +8024,10 @@ static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode,
 	return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, false);
 }
 
-static void btrfs_end_dio_bio(struct bio *bio)
+static void btrfs_end_dio_bio(struct btrfs_bio *bbio)
 {
-	struct btrfs_dio_private *dip = bio->bi_private;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_dio_private *dip = bbio->private;
+	struct bio *bio = &bbio->bio;
 	blk_status_t err = bio->bi_status;
 
 	if (err)
@@ -8056,7 +8053,7 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct inode *inode,
 				 u64 file_offset, int async_submit)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_dio_private *dip = bio->bi_private;
+	struct btrfs_dio_private *dip = btrfs_bio(bio)->private;
 	blk_status_t ret;
 
 	/* Save the original iter for read repair */
@@ -8079,8 +8076,7 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct inode *inode,
 		 */
 		ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, false);
 		if (ret) {
-			bio->bi_status = ret;
-			bio_endio(bio);
+			btrfs_bio_end_io(btrfs_bio(bio), ret);
 			return;
 		}
 	} else {
@@ -8163,9 +8159,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter,
 		 * This will never fail as it's passing GPF_NOFS and
 		 * the allocation is backed by btrfs_bioset.
 		 */
-		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len);
-		bio->bi_private = dip;
-		bio->bi_end_io = btrfs_end_dio_bio;
+		bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len,
+					      btrfs_end_dio_bio, dip);
 		btrfs_bio(bio)->file_offset = file_offset;
 
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
@@ -10364,7 +10359,7 @@ struct btrfs_encoded_read_private {
 static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
 					    struct bio *bio, int mirror_num)
 {
-	struct btrfs_encoded_read_private *priv = bio->bi_private;
+	struct btrfs_encoded_read_private *priv = btrfs_bio(bio)->private;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	blk_status_t ret;
 
@@ -10382,7 +10377,7 @@ static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode,
 static blk_status_t btrfs_encoded_read_verify_csum(struct btrfs_bio *bbio)
 {
 	const bool uptodate = (bbio->bio.bi_status == BLK_STS_OK);
-	struct btrfs_encoded_read_private *priv = bbio->bio.bi_private;
+	struct btrfs_encoded_read_private *priv = bbio->private;
 	struct btrfs_inode *inode = priv->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	u32 sectorsize = fs_info->sectorsize;
@@ -10410,10 +10405,9 @@ static blk_status_t btrfs_encoded_read_verify_csum(struct btrfs_bio *bbio)
 	return BLK_STS_OK;
 }
 
-static void btrfs_encoded_read_endio(struct bio *bio)
+static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
 {
-	struct btrfs_encoded_read_private *priv = bio->bi_private;
-	struct btrfs_bio *bbio = btrfs_bio(bio);
+	struct btrfs_encoded_read_private *priv = bbio->private;
 	blk_status_t status;
 
 	status = btrfs_encoded_read_verify_csum(bbio);
@@ -10431,7 +10425,7 @@ static void btrfs_encoded_read_endio(struct bio *bio)
 	if (!atomic_dec_return(&priv->pending))
 		wake_up(&priv->wait);
 	btrfs_bio_free_csum(bbio);
-	bio_put(bio);
+	bio_put(&bbio->bio);
 }
 
 int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
@@ -10478,11 +10472,11 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode,
 			size_t bytes = min_t(u64, remaining, PAGE_SIZE);
 
 			if (!bio) {
-				bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ);
+				bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ,
+						      btrfs_encoded_read_endio,
+						      &priv);
 				bio->bi_iter.bi_sector =
 					(disk_bytenr + cur) >> SECTOR_SHIFT;
-				bio->bi_end_io = btrfs_encoded_read_endio;
-				bio->bi_private = &priv;
 			}
 
 			if (!bytes ||
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a635442c06c3e..7e092c69ee70b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6614,9 +6614,12 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
  * Initialize a btrfs_bio structure.  This skips the embedded bio itself as it
  * is already initialized by the block layer.
  */
-static inline void btrfs_bio_init(struct btrfs_bio *bbio)
+static inline void btrfs_bio_init(struct btrfs_bio *bbio,
+				  btrfs_bio_end_io_t end_io, void *private)
 {
 	memset(bbio, 0, offsetof(struct btrfs_bio, bio));
+	bbio->end_io = end_io;
+	bbio->private = private;
 }
 
 /*
@@ -6626,16 +6629,18 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio)
  * Just like the underlying bio_alloc_bioset it will no fail as it is backed by
  * a mempool.
  */
-struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf)
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf,
+			    btrfs_bio_end_io_t end_io, void *private)
 {
 	struct bio *bio;
 
 	bio = bio_alloc_bioset(NULL, nr_vecs, opf, GFP_NOFS, &btrfs_bioset);
-	btrfs_bio_init(btrfs_bio(bio));
+	btrfs_bio_init(btrfs_bio(bio), end_io, private);
 	return bio;
 }
 
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
+struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
+				    btrfs_bio_end_io_t end_io, void *private)
 {
 	struct bio *bio;
 	struct btrfs_bio *bbio;
@@ -6644,7 +6649,7 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 
 	bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset);
 	bbio = btrfs_bio(bio);
-	btrfs_bio_init(bbio);
+	btrfs_bio_init(bbio, end_io, private);
 
 	bio_trim(bio, offset >> 9, size >> 9);
 	bbio->iter = bio->bi_iter;
@@ -6678,7 +6683,7 @@ static void btrfs_end_bio_work(struct work_struct *work)
 	struct btrfs_bio *bbio =
 		container_of(work, struct btrfs_bio, end_io_work);
 
-	bio_endio(&bbio->bio);
+	bbio->end_io(bbio);
 }
 
 static void btrfs_raid56_end_io(struct bio *bio)
@@ -6688,9 +6693,7 @@ static void btrfs_raid56_end_io(struct bio *bio)
 
 	btrfs_bio_counter_dec(bioc->fs_info);
 	bbio->mirror_num = bioc->mirror_num;
-	bio->bi_end_io = bioc->end_io;
-	bio->bi_private = bioc->private;
-	bio->bi_end_io(bio);
+	bbio->end_io(bbio);
 
 	btrfs_put_bioc(bioc);
 }
@@ -6709,8 +6712,6 @@ static void btrfs_end_bio(struct bio *bio)
 	}
 
 	bbio->mirror_num = bioc->mirror_num;
-	bio->bi_end_io = bioc->end_io;
-	bio->bi_private = bioc->private;
 
 	/*
 	 * Only send an error to the higher layers if it is beyond the tolerance
@@ -6725,7 +6726,7 @@ static void btrfs_end_bio(struct bio *bio)
 		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
 		queue_work(btrfs_end_io_wq(bioc), &bbio->end_io_work);
 	} else {
-		bio_endio(bio);
+		bbio->end_io(bbio);
 	}
 
 	btrfs_put_bioc(bioc);
@@ -6816,15 +6817,12 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 				&map_length, &bioc, mirror_num, 1);
 	if (ret) {
 		btrfs_bio_counter_dec(fs_info);
-		bio->bi_status = errno_to_blk_status(ret);
-		bio_endio(bio);
+		btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
 		return;
 	}
 
 	total_devs = bioc->num_stripes;
 	bioc->orig_bio = bio;
-	bioc->private = bio->bi_private;
-	bioc->end_io = bio->bi_end_io;
 
 	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
 	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 9e984a9922c59..d3667592fec76 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -361,6 +361,8 @@ struct btrfs_fs_devices {
  */
 #define BTRFS_MAX_BIO_SECTORS				(256)
 
+typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio);
+
 /*
  * Additional info to pass along bio.
  *
@@ -378,6 +380,10 @@ struct btrfs_bio {
 	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
 	struct bvec_iter iter;
 
+	/* End I/O information supplied to btrfs_bio_alloc */
+	btrfs_bio_end_io_t end_io;
+	void *private;
+
 	/* For read end I/O handling */
 	struct work_struct end_io_work;
 
@@ -393,8 +399,16 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio)
 	return container_of(bio, struct btrfs_bio, bio);
 }
 
-struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf);
-struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf,
+			    btrfs_bio_end_io_t end_io, void *private);
+struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size,
+				    btrfs_bio_end_io_t end_io, void *private);
+
+static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status)
+{
+	bbio->bio.bi_status = status;
+	bbio->end_io(bbio);
+}
 
 static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio)
 {
@@ -456,9 +470,7 @@ struct btrfs_io_context {
 	refcount_t refs;
 	struct btrfs_fs_info *fs_info;
 	u64 map_type; /* get from map_lookup->type */
-	bio_end_io_t *end_io;
 	struct bio *orig_bio;
-	void *private;
 	atomic_t error;
 	int max_errors;
 	int num_stripes;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 08/11] btrfs: split submit_stripe_bio
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:46   ` Johannes Thumshirn
  2022-07-14 14:02   ` Nikolay Borisov
  2022-07-13  6:13 ` [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Split out a low-level btrfs_submit_dev_bio helper that just submits
the bio without any cloning decisions or setting up the end I/O handler
for future reuse by a different caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 51 +++++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7e092c69ee70b..d5b6777874e3c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6746,28 +6746,8 @@ static void btrfs_clone_write_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc,
-			      struct bio *orig_bio, int dev_nr, bool clone)
+static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
 {
-	struct btrfs_fs_info *fs_info = bioc->fs_info;
-	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
-	u64 physical = bioc->stripes[dev_nr].physical;
-	struct bio *bio;
-
-	if (clone) {
-		bio = bio_alloc_clone(NULL, orig_bio, GFP_NOFS, &fs_bio_set);
-		bio_inc_remaining(orig_bio);
-		bio->bi_end_io = btrfs_clone_write_end_io;
-	} else {
-		bio = orig_bio;
-		btrfs_bio(bio)->device = dev;
-		bio->bi_end_io = btrfs_end_bio;
-	}
-
-	bioc->stripes[dev_nr].bioc = bioc;
-	bio->bi_private = &bioc->stripes[dev_nr];
-	bio->bi_iter.bi_sector = physical >> 9;
-
 	if (!dev || !dev->bdev ||
 	    test_bit(BTRFS_DEV_STATE_MISSING, &dev->dev_state) ||
 	    (btrfs_op(bio) == BTRFS_MAP_WRITE &&
@@ -6783,8 +6763,11 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 	 * zone
 	 */
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
+		u64 physical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
+
 		if (btrfs_dev_is_sequential(dev, physical)) {
-			u64 zone_start = round_down(physical, fs_info->zone_size);
+			u64 zone_start = round_down(physical,
+						    dev->fs_info->zone_size);
 
 			bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT;
 		} else {
@@ -6792,7 +6775,7 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 			bio->bi_opf |= REQ_OP_WRITE;
 		}
 	}
-	btrfs_debug_in_rcu(fs_info,
+	btrfs_debug_in_rcu(dev->fs_info,
 	"%s: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
 		__func__, bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
 		(unsigned long)dev->bdev->bd_dev, rcu_str_deref(dev->name),
@@ -6802,6 +6785,28 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc,
 	submit_bio(bio);
 }
 
+static void submit_stripe_bio(struct btrfs_io_context *bioc,
+			      struct bio *orig_bio, int dev_nr, bool clone)
+{
+	struct bio *bio;
+
+	/* Reuse the bio embedded into the btrfs_bio for the last mirror */
+	if (!clone) {
+		bio = orig_bio;
+		btrfs_bio(bio)->device = bioc->stripes[dev_nr].dev;
+		bio->bi_end_io = btrfs_end_bio;
+	} else {
+		bio = bio_alloc_clone(NULL, orig_bio, GFP_NOFS, &fs_bio_set);
+		bio_inc_remaining(orig_bio);
+		bio->bi_end_io = btrfs_clone_write_end_io;
+	}
+
+	bio->bi_private = &bioc->stripes[dev_nr];
+	bio->bi_iter.bi_sector = bioc->stripes[dev_nr].physical >> SECTOR_SHIFT;
+	bioc->stripes[dev_nr].bioc = bioc;
+	btrfs_submit_dev_bio(bioc->stripes[dev_nr].dev, bio);
+}
+
 void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num)
 {
 	u64 logical = bio->bi_iter.bi_sector << 9;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 08/11] btrfs: split submit_stripe_bio Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  6:49   ` Johannes Thumshirn
  2022-07-14 14:30   ` Nikolay Borisov
  2022-07-13  6:13 ` [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional Christoph Hellwig
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

Remove the orig_bio argument as it can be derived from the bioc, and
the clone argument as it can be calculated from bioc and dev_nr.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/volumes.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d5b6777874e3c..73e538da31bc4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6785,13 +6785,12 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
 	submit_bio(bio);
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc,
-			      struct bio *orig_bio, int dev_nr, bool clone)
+static void submit_stripe_bio(struct btrfs_io_context *bioc, int dev_nr)
 {
-	struct bio *bio;
+	struct bio *orig_bio = bioc->orig_bio, *bio;
 
 	/* Reuse the bio embedded into the btrfs_bio for the last mirror */
-	if (!clone) {
+	if (dev_nr == bioc->num_stripes - 1) {
 		bio = orig_bio;
 		btrfs_bio(bio)->device = bioc->stripes[dev_nr].dev;
 		bio->bi_end_io = btrfs_end_bio;
@@ -6847,11 +6846,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 		BUG();
 	}
 
-	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
-		const bool should_clone = (dev_nr < total_devs - 1);
-
-		submit_stripe_bio(bioc, bio, dev_nr, should_clone);
-	}
+	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
+		submit_stripe_bio(bioc, dev_nr);
 }
 
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-07-13  9:58   ` Johannes Thumshirn
  2022-07-13  6:13 ` [PATCH 11/11] btrfs: stop allocation a btrfs_io_context for simple I/O Christoph Hellwig
  2022-09-06 13:12 ` btrfs I/O completion cleanup and single device I/O optimizations v2 David Sterba
  11 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

There is no need for most of the btrfs_io_context when doing I/O to a
single device.  To support such I/O without the extra btrfs_io_context
allocation, turn the mirror_num argument into a pointer so that it can
be used to output the selected mirror number, and add an optional
argument that points to a btrfs_io_stripe structure, which will be
filled with a single extent if provided by the caller.  In that case
the btrfs_io_context allocation can be skipped as all information for
the single device I/O is provided in the mirror_num argument and the
on-stack btrfs_io_stripe.  A caller that makes use of this new
argument will be added in the next commit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 60 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 46 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73e538da31bc4..6824634786c2c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -250,10 +250,10 @@ static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
 static void btrfs_dev_stat_print_on_error(struct btrfs_device *dev);
 static void btrfs_dev_stat_print_on_load(struct btrfs_device *device);
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
-			     enum btrfs_map_op op,
-			     u64 logical, u64 *length,
+			     enum btrfs_map_op op, u64 logical, u64 *length,
 			     struct btrfs_io_context **bioc_ret,
-			     int mirror_num, int need_raid_map);
+			     struct btrfs_io_stripe *smap,
+			     int *mirror_num_p, int need_raid_map);
 
 /*
  * Device locking
@@ -6088,7 +6088,7 @@ static int get_extra_mirror_from_replace(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	ret = __btrfs_map_block(fs_info, BTRFS_MAP_GET_READ_MIRRORS,
-				logical, &length, &bioc, 0, 0);
+				logical, &length, &bioc, NULL, NULL, 0);
 	if (ret) {
 		ASSERT(bioc == NULL);
 		return ret;
@@ -6347,11 +6347,19 @@ int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 	return 0;
 }
 
+static void set_stripe(struct btrfs_io_stripe *dst, struct map_lookup *map,
+		       u32 stripe_index, u64 stripe_offset, u64 stripe_nr)
+{
+	dst->dev = map->stripes[stripe_index].dev;
+	dst->physical = map->stripes[stripe_index].physical +
+			stripe_offset + stripe_nr * map->stripe_len;
+}
+
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
-			     enum btrfs_map_op op,
-			     u64 logical, u64 *length,
+			     enum btrfs_map_op op, u64 logical, u64 *length,
 			     struct btrfs_io_context **bioc_ret,
-			     int mirror_num, int need_raid_map)
+			     struct btrfs_io_stripe *smap,
+			     int *mirror_num_p, int need_raid_map)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
@@ -6362,6 +6370,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	int data_stripes;
 	int i;
 	int ret = 0;
+	int mirror_num = mirror_num_p ? *mirror_num_p : 0;
 	int num_stripes;
 	int max_errors = 0;
 	int tgtdev_indexes = 0;
@@ -6522,6 +6531,29 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		tgtdev_indexes = num_stripes;
 	}
 
+	/*
+	 * If this I/O maps to a single device, try to return the device and
+	 * physical block information on the stack instead of allocating an
+	 * I/O context structure.
+	 */
+	if (smap && num_alloc_stripes == 1 &&
+	    !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
+	    (!need_full_stripe(op) || !dev_replace_is_ongoing ||
+	     !dev_replace->tgtdev)) {
+		if (unlikely(patch_the_first_stripe_for_dev_replace)) {
+			smap->dev = dev_replace->tgtdev;
+			smap->physical = physical_to_patch_in_first_stripe;
+			*mirror_num_p = map->num_stripes + 1;
+		} else {
+			set_stripe(smap, map, stripe_index, stripe_offset,
+				   stripe_nr);
+			*mirror_num_p = mirror_num;
+		}
+		*bioc_ret = NULL;
+		ret = 0;
+		goto out;
+	}
+
 	bioc = alloc_btrfs_io_context(fs_info, num_alloc_stripes, tgtdev_indexes);
 	if (!bioc) {
 		ret = -ENOMEM;
@@ -6529,9 +6561,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 
 	for (i = 0; i < num_stripes; i++) {
-		bioc->stripes[i].physical = map->stripes[stripe_index].physical +
-			stripe_offset + stripe_nr * map->stripe_len;
-		bioc->stripes[i].dev = map->stripes[stripe_index].dev;
+		set_stripe(&bioc->stripes[i], map, stripe_index, stripe_offset,
+			   stripe_nr);
 		stripe_index++;
 	}
 
@@ -6599,7 +6630,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		      struct btrfs_io_context **bioc_ret, int mirror_num)
 {
 	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
-				 mirror_num, 0);
+				 NULL, &mirror_num, 0);
 }
 
 /* For Scrub/replace */
@@ -6607,7 +6638,8 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		     u64 logical, u64 *length,
 		     struct btrfs_io_context **bioc_ret)
 {
-	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret, 0, 1);
+	return __btrfs_map_block(fs_info, op, logical, length, bioc_ret,
+				 NULL, NULL, 1);
 }
 
 /*
@@ -6817,8 +6849,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	struct btrfs_io_context *bioc = NULL;
 
 	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
+	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
+				&bioc, NULL, &mirror_num, 1);
 	if (ret) {
 		btrfs_bio_counter_dec(fs_info);
 		btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 11/11] btrfs: stop allocation a btrfs_io_context for simple I/O
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional Christoph Hellwig
@ 2022-07-13  6:13 ` Christoph Hellwig
  2022-09-06 13:12 ` btrfs I/O completion cleanup and single device I/O optimizations v2 David Sterba
  11 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:13 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, Johannes Thumshirn, linux-btrfs

The I/O context structure is only used to pass the btrfs_device to
the end I/O handler for I/Os that go to a single device.

Stop allocating the I/O context for these cases by passing the optional
btrfs_io_stripe argument to __btrfs_map_block to query the mapping
information and then using a fast path submission and I/O completion
handler.  As the old btrfs_io_context based I/O submission path is
only used for mirrored writes, rename the functions to make that
clear and stop setting the btrfs_bio device and mirror_num field
that is only used for reads.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/volumes.c | 94 ++++++++++++++++++++++++++++------------------
 1 file changed, 57 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 6824634786c2c..35164be4159d4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6703,11 +6703,12 @@ static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_FLUSH_ERRS);
 }
 
-static struct workqueue_struct *btrfs_end_io_wq(struct btrfs_io_context *bioc)
+static struct workqueue_struct *btrfs_end_io_wq(struct btrfs_fs_info *fs_info,
+						struct bio *bio)
 {
-	if (bioc->orig_bio->bi_opf & REQ_META)
-		return bioc->fs_info->endio_meta_workers;
-	return bioc->fs_info->endio_workers;
+	if (bio->bi_opf & REQ_META)
+		return fs_info->endio_meta_workers;
+	return fs_info->endio_workers;
 }
 
 static void btrfs_end_bio_work(struct work_struct *work)
@@ -6718,6 +6719,24 @@ static void btrfs_end_bio_work(struct work_struct *work)
 	bbio->end_io(bbio);
 }
 
+static void btrfs_simple_end_io(struct bio *bio)
+{
+	struct btrfs_fs_info *fs_info = bio->bi_private;
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+
+	btrfs_bio_counter_dec(fs_info);
+
+	if (bio->bi_status)
+		btrfs_log_dev_io_error(bio, bbio->device);
+
+	if (bio_op(bio) == REQ_OP_READ) {
+		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
+		queue_work(btrfs_end_io_wq(fs_info, bio), &bbio->end_io_work);
+	} else {
+		bbio->end_io(bbio);
+	}
+}
+
 static void btrfs_raid56_end_io(struct bio *bio)
 {
 	struct btrfs_io_context *bioc = bio->bi_private;
@@ -6730,7 +6749,7 @@ static void btrfs_raid56_end_io(struct bio *bio)
 	btrfs_put_bioc(bioc);
 }
 
-static void btrfs_end_bio(struct bio *bio)
+static void btrfs_orig_write_end_io(struct bio *bio)
 {
 	struct btrfs_io_stripe *stripe = bio->bi_private;
 	struct btrfs_io_context *bioc = stripe->bioc;
@@ -6743,8 +6762,6 @@ static void btrfs_end_bio(struct bio *bio)
 		btrfs_log_dev_io_error(bio, stripe->dev);
 	}
 
-	bbio->mirror_num = bioc->mirror_num;
-
 	/*
 	 * Only send an error to the higher layers if it is beyond the tolerance
 	 * threshold.
@@ -6754,13 +6771,7 @@ static void btrfs_end_bio(struct bio *bio)
 	else
 		bio->bi_status = BLK_STS_OK;
 
-	if (btrfs_op(bio) == BTRFS_MAP_READ) {
-		INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work);
-		queue_work(btrfs_end_io_wq(bioc), &bbio->end_io_work);
-	} else {
-		bbio->end_io(bbio);
-	}
-
+	bbio->end_io(bbio);
 	btrfs_put_bioc(bioc);
 }
 
@@ -6817,15 +6828,16 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
 	submit_bio(bio);
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc, int dev_nr)
+static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr)
 {
 	struct bio *orig_bio = bioc->orig_bio, *bio;
 
+	ASSERT(bio_op(orig_bio) != REQ_OP_READ);
+
 	/* Reuse the bio embedded into the btrfs_bio for the last mirror */
 	if (dev_nr == bioc->num_stripes - 1) {
 		bio = orig_bio;
-		btrfs_bio(bio)->device = bioc->stripes[dev_nr].dev;
-		bio->bi_end_io = btrfs_end_bio;
+		bio->bi_end_io = btrfs_orig_write_end_io;
 	} else {
 		bio = bio_alloc_clone(NULL, orig_bio, GFP_NOFS, &fs_bio_set);
 		bio_inc_remaining(orig_bio);
@@ -6843,34 +6855,19 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	u64 logical = bio->bi_iter.bi_sector << 9;
 	u64 length = bio->bi_iter.bi_size;
 	u64 map_length = length;
-	int ret;
-	int dev_nr;
-	int total_devs;
 	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_io_stripe smap;
+	int ret;
 
 	btrfs_bio_counter_inc_blocked(fs_info);
 	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length,
-				&bioc, NULL, &mirror_num, 1);
+				&bioc, &smap, &mirror_num, 1);
 	if (ret) {
 		btrfs_bio_counter_dec(fs_info);
 		btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret));
 		return;
 	}
 
-	total_devs = bioc->num_stripes;
-	bioc->orig_bio = bio;
-
-	if ((bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) &&
-	    ((btrfs_op(bio) == BTRFS_MAP_WRITE) || (mirror_num > 1))) {
-		bio->bi_private = bioc;
-		bio->bi_end_io = btrfs_raid56_end_io;
-		if (btrfs_op(bio) == BTRFS_MAP_WRITE)
-			raid56_parity_write(bio, bioc);
-		else
-			raid56_parity_recover(bio, bioc, mirror_num);
-		return;
-	}
-
 	if (map_length < length) {
 		btrfs_crit(fs_info,
 			   "mapping failed logical %llu bio len %llu len %llu",
@@ -6878,8 +6875,31 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 		BUG();
 	}
 
-	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
-		submit_stripe_bio(bioc, dev_nr);
+	if (!bioc) {
+		/* Single mirror read/write fast path */
+		btrfs_bio(bio)->mirror_num = mirror_num;
+		btrfs_bio(bio)->device = smap.dev;
+		bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT;
+		bio->bi_private = fs_info;
+		bio->bi_end_io = btrfs_simple_end_io;
+		btrfs_submit_dev_bio(smap.dev, bio);
+	} else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+		/* Parity RAID write or read recovery */
+		bio->bi_private = bioc;
+		bio->bi_end_io = btrfs_raid56_end_io;
+		if (bio_op(bio) == REQ_OP_READ)
+			raid56_parity_recover(bio, bioc, mirror_num);
+		else
+			raid56_parity_write(bio, bioc);
+	} else {
+		/* Write to multiple mirrors */
+		int total_devs = bioc->num_stripes;
+		int dev_nr;
+
+		bioc->orig_bio = bio;
+		for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
+			btrfs_submit_mirrored_bio(bioc, dev_nr);
+	}
 }
 
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios
  2022-07-13  6:13 ` [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios Christoph Hellwig
@ 2022-07-13  6:27   ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:27 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] btrfs: remove bioc->stripes_pending
  2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
@ 2022-07-13  6:31   ` Johannes Thumshirn
  2022-07-13  6:41   ` Johannes Thumshirn
  2022-07-14 12:15   ` Nikolay Borisov
  2 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:31 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] btrfs: remove bioc->stripes_pending
  2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
  2022-07-13  6:31   ` Johannes Thumshirn
@ 2022-07-13  6:41   ` Johannes Thumshirn
  2022-07-14 12:15   ` Nikolay Borisov
  2 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:41 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

On 13.07.22 08:14, Christoph Hellwig wrote:
> +	/* Pass on cotrol to the original bio this one was cloned from */
          control ~^
Sorry for not noticing this earlier

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] btrfs: split submit_stripe_bio
  2022-07-13  6:13 ` [PATCH 08/11] btrfs: split submit_stripe_bio Christoph Hellwig
@ 2022-07-13  6:46   ` Johannes Thumshirn
  2022-07-13  6:50     ` Christoph Hellwig
  2022-07-14 14:02   ` Nikolay Borisov
  1 sibling, 1 reply; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:46 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

On 13.07.22 08:14, Christoph Hellwig wrote:
> +static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)>  {

Nit:	struct btrfs_fs_info *fs_info = dev->fs_info;

so we don't need dev->fs_info for the round_down() and 
btrfs_debug_in_rcu() calls.

> -	struct btrfs_fs_info *fs_info = bioc->fs_info;
> -	struct btrfs_device *dev = bioc->stripes[dev_nr].dev;
> -	u64 physical = bioc->stripes[dev_nr].physical;
> -	struct bio *bio;


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler
  2022-07-13  6:13 ` [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler Christoph Hellwig
@ 2022-07-13  6:47   ` Johannes Thumshirn
  2022-07-13  6:51     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:47 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

On 13.07.22 08:14, Christoph Hellwig wrote:
> Currently btrfs_bio end I/O handling is a bit of a mess.  The bi_end_io
> handler and bi_private pointer of the embedded struct bio are both used
> to handle the completion of the high-level btrfs_bio and for the I/O
> completion for the low-level device that the embedded bio ends up being
> sent to.  To support this bi_end_io and bi_private are saved into the
> btrfs_io_context structure and then restored after the bio sent to the
> underlying device has completed the actual I/O.
> 
> Untangle this by adding an end I/O handler and private data to struct
> btrfs_bio for the highlevel btrfs_bio based completions, and leave the
> actual bio bi_end_io handler and bi_private pointer entirely to the
> low-level device I/O.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Tested-by: Nikolay Borisov <nborisov@suse.com>
> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


JFYI this one doesn't apply cleanly on today's misc-next.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention
  2022-07-13  6:13 ` [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention Christoph Hellwig
@ 2022-07-13  6:49   ` Johannes Thumshirn
  2022-07-14 14:30   ` Nikolay Borisov
  1 sibling, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:49 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] btrfs: split submit_stripe_bio
  2022-07-13  6:46   ` Johannes Thumshirn
@ 2022-07-13  6:50     ` Christoph Hellwig
  2022-07-13  6:52       ` Johannes Thumshirn
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Nikolay Borisov, linux-btrfs

On Wed, Jul 13, 2022 at 06:46:41AM +0000, Johannes Thumshirn wrote:
> On 13.07.22 08:14, Christoph Hellwig wrote:
> > +static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)>  {
> 
> Nit:	struct btrfs_fs_info *fs_info = dev->fs_info;
> 
> so we don't need dev->fs_info for the round_down() and 
> btrfs_debug_in_rcu() calls.

We can't assign it here because it is before the NULL check for dev,
and assigning it later just to save a second dereference does not
seem all that useful.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler
  2022-07-13  6:47   ` Johannes Thumshirn
@ 2022-07-13  6:51     ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13  6:51 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Nikolay Borisov, linux-btrfs

On Wed, Jul 13, 2022 at 06:47:18AM +0000, Johannes Thumshirn wrote:
> JFYI this one doesn't apply cleanly on today's misc-next.

Yes, as stated in the cover letter this is against the for-next
branch that has the compressed repair fixes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] btrfs: split submit_stripe_bio
  2022-07-13  6:50     ` Christoph Hellwig
@ 2022-07-13  6:52       ` Johannes Thumshirn
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  6:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Nikolay Borisov, linux-btrfs

On 13.07.22 08:51, Christoph Hellwig wrote:
> On Wed, Jul 13, 2022 at 06:46:41AM +0000, Johannes Thumshirn wrote:
>> On 13.07.22 08:14, Christoph Hellwig wrote:
>>> +static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)>  {
>>
>> Nit:	struct btrfs_fs_info *fs_info = dev->fs_info;
>>
>> so we don't need dev->fs_info for the round_down() and 
>> btrfs_debug_in_rcu() calls.
> 
> We can't assign it here because it is before the NULL check for dev,
> and assigning it later just to save a second dereference does not
> seem all that useful.
> 

Damn missed this, time for another coffee I guess

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional
  2022-07-13  6:13 ` [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional Christoph Hellwig
@ 2022-07-13  9:58   ` Johannes Thumshirn
  2022-07-13 11:18     ` Christoph Hellwig
  2022-07-15  8:30     ` Nikolay Borisov
  0 siblings, 2 replies; 30+ messages in thread
From: Johannes Thumshirn @ 2022-07-13  9:58 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Nikolay Borisov, linux-btrfs

On 13.07.22 08:14, Christoph Hellwig wrote:
> +	/*
> +	 * If this I/O maps to a single device, try to return the device and
> +	 * physical block information on the stack instead of allocating an
> +	 * I/O context structure.
> +	 */
> +	if (smap && num_alloc_stripes == 1 &&
> +	    !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
> +	    (!need_full_stripe(op) || !dev_replace_is_ongoing ||
> +	     !dev_replace->tgtdev)) {
> +		if (unlikely(patch_the_first_stripe_for_dev_replace)) {
> +			smap->dev = dev_replace->tgtdev;
> +			smap->physical = physical_to_patch_in_first_stripe;
> +			*mirror_num_p = map->num_stripes + 1;
> +		} else {
> +			set_stripe(smap, map, stripe_index, stripe_offset,
> +				   stripe_nr);
> +			*mirror_num_p = mirror_num;
> +		}
> +		*bioc_ret = NULL;
> +		ret = 0;
> +		goto out;
> +	}
> +


Could be my changes on top, but in order to get RAID0 with a raid-stripe-tree
working I needed the following change:

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a7886e421153..344d2cf941a7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6506,7 +6506,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
         * physical block information on the stack instead of allocating an
         * I/O context structure.
         */
-       if (smap && num_alloc_stripes == 1 &&
+       if (smap && num_alloc_stripes == 1 && !(map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) &&
            !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
            (!need_full_stripe(op) || !dev_replace_is_ongoing ||
             !dev_replace->tgtdev)) {


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional
  2022-07-13  9:58   ` Johannes Thumshirn
@ 2022-07-13 11:18     ` Christoph Hellwig
  2022-07-15  8:30     ` Nikolay Borisov
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2022-07-13 11:18 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Nikolay Borisov, linux-btrfs

On Wed, Jul 13, 2022 at 09:58:15AM +0000, Johannes Thumshirn wrote:
> Could be my changes on top, but in order to get RAID0 with a raid-stripe-tree
> working I needed the following change:

For the existing raid0 code we can one call to __btfs_map_block, so this
is not needed.  I think it should go into your series and be restricted
to declustered raid0 only.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] btrfs: remove bioc->stripes_pending
  2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
  2022-07-13  6:31   ` Johannes Thumshirn
  2022-07-13  6:41   ` Johannes Thumshirn
@ 2022-07-14 12:15   ` Nikolay Borisov
  2 siblings, 0 replies; 30+ messages in thread
From: Nikolay Borisov @ 2022-07-14 12:15 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Johannes Thumshirn, linux-btrfs



On 13.07.22 г. 9:13 ч., Christoph Hellwig wrote:
> The stripes_pending in the btrfs_io_context counts number of inflight
> low-level bios for an upper btrfs_bio.  For reads this is generally
> one as reads are never cloned, while for writes we can trivially use
> the bio remaining mechanisms that is used for chained bios.
> 
> To be able to make use of that mechanism, split out a separate trivial
> end_io handler for the cloned bios that does a minimal amount of error
> tracking and which then calls bio_endio on the original bio to transfer
> control to that, with the remaining counter making sure it is completed
> last.  This then allows to merge btrfs_end_bioc into the original bio
> bi_end_io handler.  To make this all work all error handling needs to
> happen through the bi_end_io handler, which requires a small amount
> of reshuffling in submit_stripe_bio so that the bio is cloned already
> by the time the suitability of the device is checked.
> 
> This reduces the size of the btrfs_io_context and prepares splitting
> the btrfs_bio at the stripe boundary.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 08/11] btrfs: split submit_stripe_bio
  2022-07-13  6:13 ` [PATCH 08/11] btrfs: split submit_stripe_bio Christoph Hellwig
  2022-07-13  6:46   ` Johannes Thumshirn
@ 2022-07-14 14:02   ` Nikolay Borisov
  1 sibling, 0 replies; 30+ messages in thread
From: Nikolay Borisov @ 2022-07-14 14:02 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Johannes Thumshirn, linux-btrfs



On 13.07.22 г. 9:13 ч., Christoph Hellwig wrote:
> Split out a low-level btrfs_submit_dev_bio helper that just submits
> the bio without any cloning decisions or setting up the end I/O handler
> for future reuse by a different caller.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention
  2022-07-13  6:13 ` [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention Christoph Hellwig
  2022-07-13  6:49   ` Johannes Thumshirn
@ 2022-07-14 14:30   ` Nikolay Borisov
  1 sibling, 0 replies; 30+ messages in thread
From: Nikolay Borisov @ 2022-07-14 14:30 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba
  Cc: Johannes Thumshirn, linux-btrfs



On 13.07.22 г. 9:13 ч., Christoph Hellwig wrote:
> Remove the orig_bio argument as it can be derived from the bioc, and
> the clone argument as it can be calculated from bioc and dev_nr.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>   fs/btrfs/volumes.c | 14 +++++---------
>   1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d5b6777874e3c..73e538da31bc4 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6785,13 +6785,12 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio)
>   	submit_bio(bio);
>   }
>   
> -static void submit_stripe_bio(struct btrfs_io_context *bioc,
> -			      struct bio *orig_bio, int dev_nr, bool clone)
> +static void submit_stripe_bio(struct btrfs_io_context *bioc, int dev_nr)
>   {
> -	struct bio *bio;
> +	struct bio *orig_bio = bioc->orig_bio, *bio;
>   
>   	/* Reuse the bio embedded into the btrfs_bio for the last mirror */
> -	if (!clone) {
> +	if (dev_nr == bioc->num_stripes - 1) {

nit: I think this warrants a comment that the last bios being submitted 
is always the orig one.

>   		bio = orig_bio;
>   		btrfs_bio(bio)->device = bioc->stripes[dev_nr].dev;
>   		bio->bi_end_io = btrfs_end_bio;
> @@ -6847,11 +6846,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
>   		BUG();
>   	}
>   
> -	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
> -		const bool should_clone = (dev_nr < total_devs - 1);
> -
> -		submit_stripe_bio(bioc, bio, dev_nr, should_clone);
> -	}
> +	for (dev_nr = 0; dev_nr < total_devs; dev_nr++)
> +		submit_stripe_bio(bioc, dev_nr);
>   }
>   
>   static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional
  2022-07-13  9:58   ` Johannes Thumshirn
  2022-07-13 11:18     ` Christoph Hellwig
@ 2022-07-15  8:30     ` Nikolay Borisov
  1 sibling, 0 replies; 30+ messages in thread
From: Nikolay Borisov @ 2022-07-15  8:30 UTC (permalink / raw)
  To: Johannes Thumshirn, Christoph Hellwig, Chris Mason, Josef Bacik,
	David Sterba
  Cc: linux-btrfs



On 13.07.22 г. 12:58 ч., Johannes Thumshirn wrote:
> On 13.07.22 08:14, Christoph Hellwig wrote:
>> +	/*
>> +	 * If this I/O maps to a single device, try to return the device and
>> +	 * physical block information on the stack instead of allocating an
>> +	 * I/O context structure.
>> +	 */
>> +	if (smap && num_alloc_stripes == 1 &&
>> +	    !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
>> +	    (!need_full_stripe(op) || !dev_replace_is_ongoing ||
>> +	     !dev_replace->tgtdev)) {
>> +		if (unlikely(patch_the_first_stripe_for_dev_replace)) {
>> +			smap->dev = dev_replace->tgtdev;
>> +			smap->physical = physical_to_patch_in_first_stripe;
>> +			*mirror_num_p = map->num_stripes + 1;
>> +		} else {
>> +			set_stripe(smap, map, stripe_index, stripe_offset,
>> +				   stripe_nr);
>> +			*mirror_num_p = mirror_num;
>> +		}
>> +		*bioc_ret = NULL;
>> +		ret = 0;
>> +		goto out;
>> +	}
>> +
> 
> 
> Could be my changes on top, but in order to get RAID0 with a raid-stripe-tree
> working I needed the following change:
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a7886e421153..344d2cf941a7 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6506,7 +6506,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>           * physical block information on the stack instead of allocating an
>           * I/O context structure.
>           */
> -       if (smap && num_alloc_stripes == 1 &&
> +       if (smap && num_alloc_stripes == 1 && !(map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) &&

nit: BLOCK_GROUP_STRIPE_MASK already contains BLOCK_GROUP_RAID56_MASK so 
BTRFS_BLOCK_GROUP_RAID56_MASK can be replaced. But I agree with hch that 
this change should come with your series.

>              !((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) && mirror_num > 1) &&
>              (!need_full_stripe(op) || !dev_replace_is_ongoing ||
>               !dev_replace->tgtdev)) {
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: btrfs I/O completion cleanup and single device I/O optimizations v2
  2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2022-07-13  6:13 ` [PATCH 11/11] btrfs: stop allocation a btrfs_io_context for simple I/O Christoph Hellwig
@ 2022-09-06 13:12 ` David Sterba
  2022-09-07  9:08   ` Christoph Hellwig
  11 siblings, 1 reply; 30+ messages in thread
From: David Sterba @ 2022-09-06 13:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chris Mason, Josef Bacik, David Sterba, Nikolay Borisov,
	Johannes Thumshirn, linux-btrfs

On Wed, Jul 13, 2022 at 08:13:48AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series cleans up the btrfs_bio API, most prominently by splitting
> the end_io handler for the highlevel bio from the low-level bio
> bi_end_io, which are really confusingly coupled in the current code.
> Once that is done it then optimizes the bio submission to not allocate
> a btrfs_io_context for I/Os tht just go to a single device.
> 
> This series sits on top of the for-next tree that has the "fix read
> repair on compressed extents v2" series included.  To make everyones life
> easier a git tree is also available:
> 
>     git://git.infradead.org/users/hch/misc.git btrfs-bio-api-cleanup
> 
> Gitweb:
> 
>     http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/btrfs-bio-api-cleanup
> 
> Note that I've picked up the reviews and tested tags from the last
> posting for the patches that are relatively unchainged or just split,
> so they might appear a little inconsistent.
> 
> Changes since v1:
>  - add two previously submitted patches skipped from an earlier
>    series
>  - merged one of those patches with one from this series
>  - split one of the patches in this series into three
>  - improve various commit logs

This patchset has been merged a few weeks back, I wanted to reply
regarding a few things. The changes touch critical code, there are
cleanups but doing functional changes but the changelogs are IMHO
insufficient.  The code is not touched often and the changelog is the
only thing left that's supposed to hold some code documentation,
explanations and descriptions of new logic. I'm not counting the obvious
and fairly straightfowrad changes.

Let me point out one example, the 5/11 "btrfs: remove
bioc->stripes_pending". The subject gives a false idea what's going on.
There was a member removed, yes but the whole logic regarding bios is
changed, it's now using the bio chaining, as mentioned in the first
paragraph at least. Other patches state in words what the code does, not
explaining why.

Each such patch requires a more thorough review and rewriting or
updating the changelog. I do that for free for irregular contributors,
otherwise I'd rather see we're on the same page.

I'm also worried when none of the Reviewed-by come with a comment
regarding the changelogs. Any code can be understood after enough time,
but we need to track that so we can refer to it in the future.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: btrfs I/O completion cleanup and single device I/O optimizations v2
  2022-09-06 13:12 ` btrfs I/O completion cleanup and single device I/O optimizations v2 David Sterba
@ 2022-09-07  9:08   ` Christoph Hellwig
  2022-09-09 12:34     ` David Sterba
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2022-09-07  9:08 UTC (permalink / raw)
  To: David Sterba
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba,
	Nikolay Borisov, Johannes Thumshirn, linux-btrfs

On Tue, Sep 06, 2022 at 03:12:01PM +0200, David Sterba wrote:
> Let me point out one example, the 5/11 "btrfs: remove
> bioc->stripes_pending". The subject gives a false idea what's going on.
> There was a member removed, yes but the whole logic regarding bios is
> changed, it's now using the bio chaining, as mentioned in the first
> paragraph at least. Other patches state in words what the code does, not
> explaining why.

Maybe I'm just too familiar with the block code to not agree with you,
as to me these changes are totally obvious.  But as I often point out:
if an experienced developers asks questions that seem obvious to me
my changelogs are not good enough.  But asking for that after the series
has been out for a month and a half is a bit silly.  Please comment
on these patches in a somewhat timely fashion if you think the
changelogs are not good, and I will update them in a timely fashion.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: btrfs I/O completion cleanup and single device I/O optimizations v2
  2022-09-07  9:08   ` Christoph Hellwig
@ 2022-09-09 12:34     ` David Sterba
  0 siblings, 0 replies; 30+ messages in thread
From: David Sterba @ 2022-09-09 12:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: David Sterba, Chris Mason, Josef Bacik, David Sterba,
	Nikolay Borisov, Johannes Thumshirn, linux-btrfs

On Wed, Sep 07, 2022 at 11:08:26AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 06, 2022 at 03:12:01PM +0200, David Sterba wrote:
> > Let me point out one example, the 5/11 "btrfs: remove
> > bioc->stripes_pending". The subject gives a false idea what's going on.
> > There was a member removed, yes but the whole logic regarding bios is
> > changed, it's now using the bio chaining, as mentioned in the first
> > paragraph at least. Other patches state in words what the code does, not
> > explaining why.
> 
> Maybe I'm just too familiar with the block code to not agree with you,
> as to me these changes are totally obvious.  But as I often point out:
> if an experienced developers asks questions that seem obvious to me
> my changelogs are not good enough.

I think that even if you as patch author see it obvious it's a good
thing to give some summary or hint for understanding. The changelog
quality varies, there are some stellar examples that explain lots of
the mechanics that probably just the patch author understands that well.
We want that for code documentation and to spread knowledge, ideally.

Speaking about block code, as it's an interface for btrfs, in a few
occasions I had to read some commits and got the impression that I'm not
supposed to know.

> But asking for that after the series
> has been out for a month and a half is a bit silly.  Please comment
> on these patches in a somewhat timely fashion if you think the
> changelogs are not good, and I will update them in a timely fashion.

Well, the summer development cycle is like that, I wanted to reply
earlier but it was not the most important thing I had to do at that
time. Some changes happen in git because that's what gets tested and
pushed as people depend on it.  The mail can wait.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-09-09 12:40 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-13  6:13 btrfs I/O completion cleanup and single device I/O optimizations v2 Christoph Hellwig
2022-07-13  6:13 ` [PATCH 01/11] btrfs: don't call bioset_integrity_create for btrfs_bioset Christoph Hellwig
2022-07-13  6:13 ` [PATCH 02/11] btrfs: move btrfs_bio allocation to volumes.c Christoph Hellwig
2022-07-13  6:13 ` [PATCH 03/11] btrfs: pass the operation to btrfs_bio_alloc Christoph Hellwig
2022-07-13  6:13 ` [PATCH 04/11] btrfs: don't take a bio_counter reference for cloned bios Christoph Hellwig
2022-07-13  6:27   ` Johannes Thumshirn
2022-07-13  6:13 ` [PATCH 05/11] btrfs: remove bioc->stripes_pending Christoph Hellwig
2022-07-13  6:31   ` Johannes Thumshirn
2022-07-13  6:41   ` Johannes Thumshirn
2022-07-14 12:15   ` Nikolay Borisov
2022-07-13  6:13 ` [PATCH 06/11] btrfs: properly abstract the parity raid bio handling Christoph Hellwig
2022-07-13  6:13 ` [PATCH 07/11] btrfs: give struct btrfs_bio a real end_io handler Christoph Hellwig
2022-07-13  6:47   ` Johannes Thumshirn
2022-07-13  6:51     ` Christoph Hellwig
2022-07-13  6:13 ` [PATCH 08/11] btrfs: split submit_stripe_bio Christoph Hellwig
2022-07-13  6:46   ` Johannes Thumshirn
2022-07-13  6:50     ` Christoph Hellwig
2022-07-13  6:52       ` Johannes Thumshirn
2022-07-14 14:02   ` Nikolay Borisov
2022-07-13  6:13 ` [PATCH 09/11] btrfs: simplify the submit_stripe_bio calling convention Christoph Hellwig
2022-07-13  6:49   ` Johannes Thumshirn
2022-07-14 14:30   ` Nikolay Borisov
2022-07-13  6:13 ` [PATCH 10/11] btrfs: make the btrfs_io_context allocation in __btrfs_map_block optional Christoph Hellwig
2022-07-13  9:58   ` Johannes Thumshirn
2022-07-13 11:18     ` Christoph Hellwig
2022-07-15  8:30     ` Nikolay Borisov
2022-07-13  6:13 ` [PATCH 11/11] btrfs: stop allocation a btrfs_io_context for simple I/O Christoph Hellwig
2022-09-06 13:12 ` btrfs I/O completion cleanup and single device I/O optimizations v2 David Sterba
2022-09-07  9:08   ` Christoph Hellwig
2022-09-09 12:34     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.