All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] block layer patches for bcachefs
@ 2023-05-25 21:48 Kent Overstreet
  2023-05-25 21:48 ` [PATCH 1/7] block: Add some exports " Kent Overstreet
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe; +Cc: Kent Overstreet, linux-block, linux-fsdevel

Jens, here's the full series of block layer patches needed for bcachefs:

Some of these (added exports, zero_fill_bio_iter?) can probably go with
the bcachefs pull and I'm just including here for completeness. The main
ones are the bio_iter patches, and the __invalidate_super() patch.

The bio_iter series has a new documentation patch.

I would still like the __invalidate_super() patch to get some review
(from VFS people? unclear who owns this).

Thanks,
Kent

Kent Overstreet (7):
  block: Add some exports for bcachefs
  block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset
  block: Bring back zero_fill_bio_iter
  block: Rework bio_for_each_segment_all()
  block: Rework bio_for_each_folio_all()
  block: Add documentation for bio iterator macros
  block: Don't block on s_umount from __invalidate_super()

 block/bdev.c               |   2 +-
 block/bio.c                |  57 ++++++------
 block/blk-core.c           |   1 +
 block/blk-map.c            |  38 ++++----
 block/blk.h                |   1 -
 block/bounce.c             |  12 +--
 drivers/md/bcache/btree.c  |   8 +-
 drivers/md/dm-crypt.c      |  10 +-
 drivers/md/raid1.c         |   4 +-
 fs/btrfs/disk-io.c         |   4 +-
 fs/btrfs/extent_io.c       |  50 +++++-----
 fs/btrfs/raid56.c          |  14 +--
 fs/crypto/bio.c            |   9 +-
 fs/erofs/zdata.c           |   4 +-
 fs/ext4/page-io.c          |   8 +-
 fs/ext4/readpage.c         |   4 +-
 fs/f2fs/data.c             |  20 ++--
 fs/gfs2/lops.c             |  10 +-
 fs/gfs2/meta_io.c          |   8 +-
 fs/iomap/buffered-io.c     |  14 +--
 fs/mpage.c                 |   4 +-
 fs/squashfs/block.c        |  48 +++++-----
 fs/squashfs/lz4_wrapper.c  |  17 ++--
 fs/squashfs/lzo_wrapper.c  |  17 ++--
 fs/squashfs/xz_wrapper.c   |  19 ++--
 fs/squashfs/zlib_wrapper.c |  18 ++--
 fs/squashfs/zstd_wrapper.c |  19 ++--
 fs/super.c                 |  40 ++++++--
 fs/verity/verify.c         |   9 +-
 include/linux/bio.h        | 186 +++++++++++++++++++++++++------------
 include/linux/blkdev.h     |   1 +
 include/linux/bvec.h       |  70 ++++++++------
 include/linux/fs.h         |   1 +
 33 files changed, 429 insertions(+), 298 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] block: Add some exports for bcachefs
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-25 21:48 ` [PATCH 2/7] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe
  Cc: Kent Overstreet, linux-block, linux-fsdevel, Kent Overstreet

From: Kent Overstreet <kent.overstreet@gmail.com>

 - bio_set_pages_dirty(), bio_check_pages_dirty() - dio path
 - blk_status_to_str() - error messages
 - bio_add_folio() - this should definitely be exported for everyone,
   it's the modern version of bio_add_page()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
---
 block/bio.c            | 3 +++
 block/blk-core.c       | 1 +
 block/blk.h            | 1 -
 include/linux/blkdev.h | 1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/bio.c b/block/bio.c
index fd11614bba..1e75840d17 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1159,6 +1159,7 @@ bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
 		return false;
 	return bio_add_page(bio, &folio->page, len, off) > 0;
 }
+EXPORT_SYMBOL(bio_add_folio);
 
 void __bio_release_pages(struct bio *bio, bool mark_dirty)
 {
@@ -1480,6 +1481,7 @@ void bio_set_pages_dirty(struct bio *bio)
 			set_page_dirty_lock(bvec->bv_page);
 	}
 }
+EXPORT_SYMBOL_GPL(bio_set_pages_dirty);
 
 /*
  * bio_check_pages_dirty() will check that all the BIO's pages are still dirty.
@@ -1539,6 +1541,7 @@ void bio_check_pages_dirty(struct bio *bio)
 	spin_unlock_irqrestore(&bio_dirty_lock, flags);
 	schedule_work(&bio_dirty_work);
 }
+EXPORT_SYMBOL_GPL(bio_check_pages_dirty);
 
 static inline bool bio_remaining_done(struct bio *bio)
 {
diff --git a/block/blk-core.c b/block/blk-core.c
index 42926e6cb8..f19bcc684b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -205,6 +205,7 @@ const char *blk_status_to_str(blk_status_t status)
 		return "<null>";
 	return blk_errors[idx].name;
 }
+EXPORT_SYMBOL_GPL(blk_status_to_str);
 
 /**
  * blk_sync_queue - cancel any pending callbacks on a queue
diff --git a/block/blk.h b/block/blk.h
index cc4e8873df..cc04dc73e9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -259,7 +259,6 @@ static inline void blk_integrity_del(struct gendisk *disk)
 
 unsigned long blk_rq_timeout(unsigned long timeout);
 void blk_add_timer(struct request *req);
-const char *blk_status_to_str(blk_status_t status);
 
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 941304f174..7cac183112 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -867,6 +867,7 @@ extern const char *blk_op_str(enum req_op op);
 
 int blk_status_to_errno(blk_status_t status);
 blk_status_t errno_to_blk_status(int errno);
+const char *blk_status_to_str(blk_status_t status);
 
 /* only poll the hardware once, don't continue until a completion was found */
 #define BLK_POLL_ONESHOT		(1 << 0)
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
  2023-05-25 21:48 ` [PATCH 1/7] block: Add some exports " Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-25 21:48 ` [PATCH 3/7] block: Bring back zero_fill_bio_iter Kent Overstreet
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe; +Cc: Kent Overstreet, linux-block, linux-fsdevel

bio_iov_iter_get_pages() trims the IO based on the block size of the
block device the IO will be issued to.

However, bcachefs is a multi device filesystem; when we're creating the
bio we don't yet know which block device the bio will be submitted to -
we have to handle the alignment checks elsewhere.

Thus this is needed to avoid a null ptr deref.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
---
 block/bio.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1e75840d17..e74a04ea14 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1245,7 +1245,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	struct page **pages = (struct page **)bv;
 	ssize_t size, left;
 	unsigned len, i = 0;
-	size_t offset, trim;
+	size_t offset;
 	int ret = 0;
 
 	/*
@@ -1274,10 +1274,12 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 
 	nr_pages = DIV_ROUND_UP(offset + size, PAGE_SIZE);
 
-	trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
-	iov_iter_revert(iter, trim);
+	if (bio->bi_bdev) {
+		size_t trim = size & (bdev_logical_block_size(bio->bi_bdev) - 1);
+		iov_iter_revert(iter, trim);
+		size -= trim;
+	}
 
-	size -= trim;
 	if (unlikely(!size)) {
 		ret = -EFAULT;
 		goto out;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] block: Bring back zero_fill_bio_iter
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
  2023-05-25 21:48 ` [PATCH 1/7] block: Add some exports " Kent Overstreet
  2023-05-25 21:48 ` [PATCH 2/7] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-25 21:48 ` [PATCH 4/7] block: Rework bio_for_each_segment_all() Kent Overstreet
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe
  Cc: Kent Overstreet, linux-block, linux-fsdevel, Kent Overstreet

From: Kent Overstreet <kent.overstreet@gmail.com>

This reverts the commit that deleted it; it's used by bcachefs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
---
 block/bio.c         | 6 +++---
 include/linux/bio.h | 7 ++++++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index e74a04ea14..70b5c987bc 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -606,15 +606,15 @@ struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(bio_kmalloc);
 
-void zero_fill_bio(struct bio *bio)
+void zero_fill_bio_iter(struct bio *bio, struct bvec_iter start)
 {
 	struct bio_vec bv;
 	struct bvec_iter iter;
 
-	bio_for_each_segment(bv, bio, iter)
+	__bio_for_each_segment(bv, bio, iter, start)
 		memzero_bvec(&bv);
 }
-EXPORT_SYMBOL(zero_fill_bio);
+EXPORT_SYMBOL(zero_fill_bio_iter);
 
 /**
  * bio_truncate - truncate the bio to small size of @new_size
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d766be7152..3536f28c05 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -484,7 +484,12 @@ extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
 extern void bio_copy_data(struct bio *dst, struct bio *src);
 extern void bio_free_pages(struct bio *bio);
 void guard_bio_eod(struct bio *bio);
-void zero_fill_bio(struct bio *bio);
+void zero_fill_bio_iter(struct bio *bio, struct bvec_iter iter);
+
+static inline void zero_fill_bio(struct bio *bio)
+{
+	zero_fill_bio_iter(bio, bio->bi_iter);
+}
 
 static inline void bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] block: Rework bio_for_each_segment_all()
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
                   ` (2 preceding siblings ...)
  2023-05-25 21:48 ` [PATCH 3/7] block: Bring back zero_fill_bio_iter Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-25 21:48 ` [PATCH 5/7] block: Rework bio_for_each_folio_all() Kent Overstreet
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe
  Cc: Kent Overstreet, linux-block, linux-fsdevel, Ming Lei, Phillip Lougher

This patch reworks bio_for_each_segment_all() to be more inline with how
the other bio iterators work:

 - bio_iter_all_peek() now returns a synthesized bio_vec; we don't stash
   one in the iterator and pass a pointer to it - bad. This way makes it
   clearer what's a constructed value vs. a reference to something
   pre-existing, and it also will help with cleaning up and
   consolidating code with bio_for_each_folio_all().

 - We now provide bio_for_each_segment_all_continue(), for squashfs:
   this makes their code clearer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
---
 block/bio.c                | 38 ++++++++++++------------
 block/blk-map.c            | 38 ++++++++++++------------
 block/bounce.c             | 12 ++++----
 drivers/md/bcache/btree.c  |  8 ++---
 drivers/md/dm-crypt.c      | 10 +++----
 drivers/md/raid1.c         |  4 +--
 fs/btrfs/disk-io.c         |  4 +--
 fs/btrfs/extent_io.c       | 50 +++++++++++++++----------------
 fs/btrfs/raid56.c          | 14 ++++-----
 fs/erofs/zdata.c           |  4 +--
 fs/ext4/page-io.c          |  8 ++---
 fs/ext4/readpage.c         |  4 +--
 fs/f2fs/data.c             | 20 ++++++-------
 fs/gfs2/lops.c             | 10 +++----
 fs/gfs2/meta_io.c          |  8 ++---
 fs/mpage.c                 |  4 +--
 fs/squashfs/block.c        | 48 +++++++++++++++++-------------
 fs/squashfs/lz4_wrapper.c  | 17 ++++++-----
 fs/squashfs/lzo_wrapper.c  | 17 ++++++-----
 fs/squashfs/xz_wrapper.c   | 19 ++++++------
 fs/squashfs/zlib_wrapper.c | 18 ++++++-----
 fs/squashfs/zstd_wrapper.c | 19 ++++++------
 include/linux/bio.h        | 34 ++++++++++++++++-----
 include/linux/bvec.h       | 61 ++++++++++++++++++++++----------------
 24 files changed, 256 insertions(+), 213 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 70b5c987bc..f2845d4e47 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1163,13 +1163,13 @@ EXPORT_SYMBOL(bio_add_folio);
 
 void __bio_release_pages(struct bio *bio, bool mark_dirty)
 {
-	struct bvec_iter_all iter_all;
-	struct bio_vec *bvec;
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		if (mark_dirty && !PageCompound(bvec->bv_page))
-			set_page_dirty_lock(bvec->bv_page);
-		put_page(bvec->bv_page);
+	bio_for_each_segment_all(bvec, bio, iter) {
+		if (mark_dirty && !PageCompound(bvec.bv_page))
+			set_page_dirty_lock(bvec.bv_page);
+		put_page(bvec.bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1436,11 +1436,11 @@ EXPORT_SYMBOL(bio_copy_data);
 
 void bio_free_pages(struct bio *bio)
 {
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all)
-		__free_page(bvec->bv_page);
+	bio_for_each_segment_all(bvec, bio, iter)
+		__free_page(bvec.bv_page);
 }
 EXPORT_SYMBOL(bio_free_pages);
 
@@ -1475,12 +1475,12 @@ EXPORT_SYMBOL(bio_free_pages);
  */
 void bio_set_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		if (!PageCompound(bvec->bv_page))
-			set_page_dirty_lock(bvec->bv_page);
+	bio_for_each_segment_all(bvec, bio, iter) {
+		if (!PageCompound(bvec.bv_page))
+			set_page_dirty_lock(bvec.bv_page);
 	}
 }
 EXPORT_SYMBOL_GPL(bio_set_pages_dirty);
@@ -1524,12 +1524,12 @@ static void bio_dirty_fn(struct work_struct *work)
 
 void bio_check_pages_dirty(struct bio *bio)
 {
-	struct bio_vec *bvec;
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 	unsigned long flags;
-	struct bvec_iter_all iter_all;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		if (!PageDirty(bvec->bv_page) && !PageCompound(bvec->bv_page))
+	bio_for_each_segment_all(bvec, bio, iter) {
+		if (!PageDirty(bvec.bv_page) && !PageCompound(bvec.bv_page))
 			goto defer;
 	}
 
diff --git a/block/blk-map.c b/block/blk-map.c
index 9137d16cec..5774a9e467 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -46,21 +46,21 @@ static struct bio_map_data *bio_alloc_map_data(struct iov_iter *data,
  */
 static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
 {
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all bv_iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	bio_for_each_segment_all(bvec, bio, bv_iter) {
 		ssize_t ret;
 
-		ret = copy_page_from_iter(bvec->bv_page,
-					  bvec->bv_offset,
-					  bvec->bv_len,
+		ret = copy_page_from_iter(bvec.bv_page,
+					  bvec.bv_offset,
+					  bvec.bv_len,
 					  iter);
 
 		if (!iov_iter_count(iter))
 			break;
 
-		if (ret < bvec->bv_len)
+		if (ret < bvec.bv_len)
 			return -EFAULT;
 	}
 
@@ -77,21 +77,21 @@ static int bio_copy_from_iter(struct bio *bio, struct iov_iter *iter)
  */
 static int bio_copy_to_iter(struct bio *bio, struct iov_iter iter)
 {
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all bv_iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	bio_for_each_segment_all(bvec, bio, bv_iter) {
 		ssize_t ret;
 
-		ret = copy_page_to_iter(bvec->bv_page,
-					bvec->bv_offset,
-					bvec->bv_len,
+		ret = copy_page_to_iter(bvec.bv_page,
+					bvec.bv_offset,
+					bvec.bv_len,
 					&iter);
 
 		if (!iov_iter_count(&iter))
 			break;
 
-		if (ret < bvec->bv_len)
+		if (ret < bvec.bv_len)
 			return -EFAULT;
 	}
 
@@ -442,12 +442,12 @@ static void bio_copy_kern_endio(struct bio *bio)
 static void bio_copy_kern_endio_read(struct bio *bio)
 {
 	char *p = bio->bi_private;
-	struct bio_vec *bvec;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		memcpy_from_bvec(p, bvec);
-		p += bvec->bv_len;
+	bio_for_each_segment_all(bvec, bio, iter) {
+		memcpy_from_bvec(p, &bvec);
+		p += bvec.bv_len;
 	}
 
 	bio_copy_kern_endio(bio);
diff --git a/block/bounce.c b/block/bounce.c
index 7cfcb242f9..e701832d76 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -102,18 +102,18 @@ static void copy_to_high_bio_irq(struct bio *to, struct bio *from)
 static void bounce_end_io(struct bio *bio)
 {
 	struct bio *bio_orig = bio->bi_private;
-	struct bio_vec *bvec, orig_vec;
+	struct bio_vec bvec, orig_vec;
 	struct bvec_iter orig_iter = bio_orig->bi_iter;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all iter;
 
 	/*
 	 * free up bounce indirect pages used
 	 */
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	bio_for_each_segment_all(bvec, bio, iter) {
 		orig_vec = bio_iter_iovec(bio_orig, orig_iter);
-		if (bvec->bv_page != orig_vec.bv_page) {
-			dec_zone_page_state(bvec->bv_page, NR_BOUNCE);
-			mempool_free(bvec->bv_page, &page_pool);
+		if (bvec.bv_page != orig_vec.bv_page) {
+			dec_zone_page_state(bvec.bv_page, NR_BOUNCE);
+			mempool_free(bvec.bv_page, &page_pool);
 		}
 		bio_advance_iter(bio_orig, &orig_iter, orig_vec.bv_len);
 	}
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 147c493a98..98ce12b239 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -373,12 +373,12 @@ static void do_btree_node_write(struct btree *b)
 		       bset_sector_offset(&b->keys, i));
 
 	if (!bch_bio_alloc_pages(b->bio, __GFP_NOWARN|GFP_NOWAIT)) {
-		struct bio_vec *bv;
+		struct bio_vec bv;
 		void *addr = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1));
-		struct bvec_iter_all iter_all;
+		struct bvec_iter_all iter;
 
-		bio_for_each_segment_all(bv, b->bio, iter_all) {
-			memcpy(page_address(bv->bv_page), addr, PAGE_SIZE);
+		bio_for_each_segment_all(bv, b->bio, iter) {
+			memcpy(page_address(bv.bv_page), addr, PAGE_SIZE);
 			addr += PAGE_SIZE;
 		}
 
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3ba53dc3cc..166bb4fdb4 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1713,12 +1713,12 @@ static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size)
 
 static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone)
 {
-	struct bio_vec *bv;
-	struct bvec_iter_all iter_all;
+	struct bvec_iter_all iter;
+	struct bio_vec bv;
 
-	bio_for_each_segment_all(bv, clone, iter_all) {
-		BUG_ON(!bv->bv_page);
-		mempool_free(bv->bv_page, &cc->page_pool);
+	bio_for_each_segment_all(bv, clone, iter) {
+		BUG_ON(!bv.bv_page);
+		mempool_free(bv.bv_page, &cc->page_pool);
 	}
 }
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 68a9e2d998..4f58cae37e 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2188,7 +2188,7 @@ static void process_checks(struct r1bio *r1_bio)
 		blk_status_t status = sbio->bi_status;
 		struct page **ppages = get_resync_pages(pbio)->pages;
 		struct page **spages = get_resync_pages(sbio)->pages;
-		struct bio_vec *bi;
+		struct bio_vec bi;
 		int page_len[RESYNC_PAGES] = { 0 };
 		struct bvec_iter_all iter_all;
 
@@ -2198,7 +2198,7 @@ static void process_checks(struct r1bio *r1_bio)
 		sbio->bi_status = 0;
 
 		bio_for_each_segment_all(bi, sbio, iter_all)
-			page_len[j++] = bi->bv_len;
+			page_len[j++] = bi.bv_len;
 
 		if (!status) {
 			for (j = vcnt; j-- ; ) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9e1596bb20..92b3396c15 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3804,12 +3804,12 @@ ALLOW_ERROR_INJECTION(open_ctree, ERRNO);
 static void btrfs_end_super_write(struct bio *bio)
 {
 	struct btrfs_device *device = bio->bi_private;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 	struct page *page;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		page = bvec->bv_page;
+		page = bvec.bv_page;
 
 		if (bio->bi_status) {
 			btrfs_warn_rl_in_rcu(device->fs_info,
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 40300e8e5f..5796c99ea1 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -581,34 +581,34 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
 	int error = blk_status_to_errno(bio->bi_status);
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	u64 start;
 	u64 end;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
 
 		/* Our read/write should always be sector aligned. */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page write in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_len, sectorsize))
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_len, sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page write with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
 
 		end_extent_writepage(page, error, start, end);
 
-		btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len);
+		btrfs_page_clear_writeback(fs_info, page, start, bvec.bv_len);
 	}
 
 	bio_put(bio);
@@ -736,7 +736,7 @@ static struct extent_buffer *find_extent_buffer_readpage(
 static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct processed_extent processed = { 0 };
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
@@ -749,7 +749,7 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		bool uptodate = !bio->bi_status;
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
@@ -769,19 +769,19 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		 * for unaligned offsets, and an error if they don't add up to
 		 * a full sector.
 		 */
-		if (!IS_ALIGNED(bvec->bv_offset, sectorsize))
+		if (!IS_ALIGNED(bvec.bv_offset, sectorsize))
 			btrfs_err(fs_info,
 		"partial page read in btrfs with offset %u and length %u",
-				  bvec->bv_offset, bvec->bv_len);
-		else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+				  bvec.bv_offset, bvec.bv_len);
+		else if (!IS_ALIGNED(bvec.bv_offset + bvec.bv_len,
 				     sectorsize))
 			btrfs_info(fs_info,
 		"incomplete page read with offset %u and length %u",
-				   bvec->bv_offset, bvec->bv_len);
+				   bvec.bv_offset, bvec.bv_len);
 
-		start = page_offset(page) + bvec->bv_offset;
-		end = start + bvec->bv_len - 1;
-		len = bvec->bv_len;
+		start = page_offset(page) + bvec.bv_offset;
+		end = start + bvec.bv_len - 1;
+		len = bvec.bv_len;
 
 		mirror = bbio->mirror_num;
 		if (uptodate && !is_data_inode(inode) &&
@@ -1993,7 +1993,7 @@ static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
 	struct btrfs_fs_info *fs_info;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
@@ -2001,12 +2001,12 @@ static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
-		u64 bvec_start = page_offset(page) + bvec->bv_offset;
-		u64 bvec_end = bvec_start + bvec->bv_len - 1;
+		struct page *page = bvec.bv_page;
+		u64 bvec_start = page_offset(page) + bvec.bv_offset;
+		u64 bvec_end = bvec_start + bvec.bv_len - 1;
 		u64 cur_bytenr = bvec_start;
 
-		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
+		ASSERT(IS_ALIGNED(bvec.bv_len, fs_info->nodesize));
 
 		/* Iterate through all extent buffers in the range */
 		while (cur_bytenr <= bvec_end) {
@@ -2050,14 +2050,14 @@ static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
 static void end_bio_extent_buffer_writepage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct extent_buffer *eb;
 	int done;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 
 		eb = (struct extent_buffer *)page->private;
 		BUG_ON(!eb);
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 642828c1b2..39d8101541 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1388,7 +1388,7 @@ static struct sector_ptr *find_stripe_sector(struct btrfs_raid_bio *rbio,
 static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
 {
 	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -1397,9 +1397,9 @@ static void set_bio_pages_uptodate(struct btrfs_raid_bio *rbio, struct bio *bio)
 		struct sector_ptr *sector;
 		int pgoff;
 
-		for (pgoff = bvec->bv_offset; pgoff - bvec->bv_offset < bvec->bv_len;
+		for (pgoff = bvec.bv_offset; pgoff - bvec.bv_offset < bvec.bv_len;
 		     pgoff += sectorsize) {
-			sector = find_stripe_sector(rbio, bvec->bv_page, pgoff);
+			sector = find_stripe_sector(rbio, bvec.bv_page, pgoff);
 			ASSERT(sector);
 			if (sector)
 				sector->uptodate = 1;
@@ -1453,7 +1453,7 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
 {
 	struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
 	int total_sector_nr = get_bio_sector_nr(rbio, bio);
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	/* No data csum for the whole stripe, no need to verify. */
@@ -1467,8 +1467,8 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		int bv_offset;
 
-		for (bv_offset = bvec->bv_offset;
-		     bv_offset < bvec->bv_offset + bvec->bv_len;
+		for (bv_offset = bvec.bv_offset;
+		     bv_offset < bvec.bv_offset + bvec.bv_len;
 		     bv_offset += fs_info->sectorsize, total_sector_nr++) {
 			u8 csum_buf[BTRFS_CSUM_SIZE];
 			u8 *expected_csum = rbio->csum_buf +
@@ -1479,7 +1479,7 @@ static void verify_bio_data_sectors(struct btrfs_raid_bio *rbio,
 			if (!test_bit(total_sector_nr, rbio->csum_bitmap))
 				continue;
 
-			ret = btrfs_check_sector_csum(fs_info, bvec->bv_page,
+			ret = btrfs_check_sector_csum(fs_info, bvec.bv_page,
 				bv_offset, csum_buf, expected_csum);
 			if (ret < 0)
 				set_bit(total_sector_nr, rbio->error_bitmap);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index f1708c77a9..1fd0f01d11 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1651,11 +1651,11 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
 {
 	struct z_erofs_decompressqueue *q = bio->bi_private;
 	blk_status_t err = bio->bi_status;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 
 		DBG_BUGON(PageUptodate(page));
 		DBG_BUGON(z_erofs_page_is_invalidated(page));
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1e4db96a04..81a1cc4518 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -99,15 +99,15 @@ static void buffer_io_error(struct buffer_head *bh)
 
 static void ext4_finish_bio(struct bio *bio)
 {
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct page *bounce_page = NULL;
 		struct buffer_head *bh, *head;
-		unsigned bio_start = bvec->bv_offset;
-		unsigned bio_end = bio_start + bvec->bv_len;
+		unsigned bio_start = bvec.bv_offset;
+		unsigned bio_end = bio_start + bvec.bv_len;
 		unsigned under_io = 0;
 		unsigned long flags;
 
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index c61dc8a7c0..ce42b3d5c9 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -69,11 +69,11 @@ struct bio_post_read_ctx {
 static void __read_end_io(struct bio *bio)
 {
 	struct page *page;
-	struct bio_vec *bv;
+	struct bio_vec bv;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, bio, iter_all) {
-		page = bv->bv_page;
+		page = bv.bv_page;
 
 		if (bio->bi_status)
 			ClearPageUptodate(page);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 06b552a0ab..e44bd8586f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -139,12 +139,12 @@ struct bio_post_read_ctx {
  */
 static void f2fs_finish_read_bio(struct bio *bio, bool in_task)
 {
-	struct bio_vec *bv;
+	struct bio_vec bv;
 	struct bvec_iter_all iter_all;
 	struct bio_post_read_ctx *ctx = bio->bi_private;
 
 	bio_for_each_segment_all(bv, bio, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bv.bv_page;
 
 		if (f2fs_is_compressed_page(page)) {
 			if (ctx && !ctx->decompression_attempted)
@@ -189,11 +189,11 @@ static void f2fs_verify_bio(struct work_struct *work)
 	 * as those were handled separately by f2fs_end_read_compressed_page().
 	 */
 	if (may_have_compressed_pages) {
-		struct bio_vec *bv;
+		struct bio_vec bv;
 		struct bvec_iter_all iter_all;
 
 		bio_for_each_segment_all(bv, bio, iter_all) {
-			struct page *page = bv->bv_page;
+			struct page *page = bv.bv_page;
 
 			if (!f2fs_is_compressed_page(page) &&
 			    !fsverity_verify_page(page)) {
@@ -241,13 +241,13 @@ static void f2fs_verify_and_finish_bio(struct bio *bio, bool in_task)
 static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx,
 		bool in_task)
 {
-	struct bio_vec *bv;
+	struct bio_vec bv;
 	struct bvec_iter_all iter_all;
 	bool all_compressed = true;
 	block_t blkaddr = ctx->fs_blkaddr;
 
 	bio_for_each_segment_all(bv, ctx->bio, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bv.bv_page;
 
 		if (f2fs_is_compressed_page(page))
 			f2fs_end_read_compressed_page(page, false, blkaddr,
@@ -327,7 +327,7 @@ static void f2fs_read_end_io(struct bio *bio)
 static void f2fs_write_end_io(struct bio *bio)
 {
 	struct f2fs_sb_info *sbi;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	iostat_update_and_unbind_ctx(bio);
@@ -337,7 +337,7 @@ static void f2fs_write_end_io(struct bio *bio)
 		bio->bi_status = BLK_STS_IOERR;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		enum count_type type = WB_DATA_TYPE(page);
 
 		if (page_private_dummy(page)) {
@@ -583,7 +583,7 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 static bool __has_merged_page(struct bio *bio, struct inode *inode,
 						struct page *page, nid_t ino)
 {
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	if (!bio)
@@ -593,7 +593,7 @@ static bool __has_merged_page(struct bio *bio, struct inode *inode,
 		return true;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *target = bvec->bv_page;
+		struct page *target = bvec.bv_page;
 
 		if (fscrypt_is_bounce_page(target)) {
 			target = fscrypt_pagecache_page(target);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 1902413d5d..7f62fe8eb7 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -202,7 +202,7 @@ static void gfs2_end_log_write_bh(struct gfs2_sbd *sdp,
 static void gfs2_end_log_write(struct bio *bio)
 {
 	struct gfs2_sbd *sdp = bio->bi_private;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct page *page;
 	struct bvec_iter_all iter_all;
 
@@ -217,9 +217,9 @@ static void gfs2_end_log_write(struct bio *bio)
 	}
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		page = bvec->bv_page;
+		page = bvec.bv_page;
 		if (page_has_buffers(page))
-			gfs2_end_log_write_bh(sdp, bvec, bio->bi_status);
+			gfs2_end_log_write_bh(sdp, &bvec, bio->bi_status);
 		else
 			mempool_free(page, gfs2_page_pool);
 	}
@@ -395,11 +395,11 @@ static void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page)
 static void gfs2_end_log_read(struct bio *bio)
 {
 	struct page *page;
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		page = bvec->bv_page;
+		page = bvec.bv_page;
 		if (bio->bi_status) {
 			int err = blk_status_to_errno(bio->bi_status);
 
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 924361fa51..832572784e 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -193,15 +193,15 @@ struct buffer_head *gfs2_meta_new(struct gfs2_glock *gl, u64 blkno)
 
 static void gfs2_meta_read_endio(struct bio *bio)
 {
-	struct bio_vec *bvec;
+	struct bio_vec bvec;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
+		struct page *page = bvec.bv_page;
 		struct buffer_head *bh = page_buffers(page);
-		unsigned int len = bvec->bv_len;
+		unsigned int len = bvec.bv_len;
 
-		while (bh_offset(bh) < bvec->bv_offset)
+		while (bh_offset(bh) < bvec.bv_offset)
 			bh = bh->b_this_page;
 		do {
 			struct buffer_head *next = bh->b_this_page;
diff --git a/fs/mpage.c b/fs/mpage.c
index 22b9de5ddd..49505456ba 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -45,11 +45,11 @@
  */
 static void mpage_end_io(struct bio *bio)
 {
-	struct bio_vec *bv;
+	struct bio_vec bv;
 	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment_all(bv, bio, iter_all) {
-		struct page *page = bv->bv_page;
+		struct page *page = bv.bv_page;
 		page_endio(page, bio_op(bio),
 			   blk_status_to_errno(bio->bi_status));
 	}
diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index bed3bb8b27..83e8b44518 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -35,30 +35,33 @@ static int copy_bio_to_actor(struct bio *bio,
 			     int offset, int req_length)
 {
 	void *actor_addr;
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 	int copied_bytes = 0;
 	int actor_offset = 0;
+	int bytes_to_copy;
 
 	squashfs_actor_nobuff(actor);
 	actor_addr = squashfs_first_page(actor);
 
-	if (WARN_ON_ONCE(!bio_next_segment(bio, &iter_all)))
-		return 0;
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
 
-	while (copied_bytes < req_length) {
-		int bytes_to_copy = min_t(int, bvec->bv_len - offset,
+	while (copied_bytes < req_length &&
+	       iter.idx < bio->bi_vcnt) {
+		bvec = bio_iter_all_peek(bio, &iter);
+
+		bytes_to_copy = min_t(int, bvec.bv_len,
 					  PAGE_SIZE - actor_offset);
 
 		bytes_to_copy = min_t(int, bytes_to_copy,
 				      req_length - copied_bytes);
 		if (!IS_ERR(actor_addr))
-			memcpy(actor_addr + actor_offset, bvec_virt(bvec) +
-					offset, bytes_to_copy);
+			memcpy(actor_addr + actor_offset, bvec_virt(&bvec),
+			       bytes_to_copy);
 
 		actor_offset += bytes_to_copy;
 		copied_bytes += bytes_to_copy;
-		offset += bytes_to_copy;
 
 		if (actor_offset >= PAGE_SIZE) {
 			actor_addr = squashfs_next_page(actor);
@@ -66,11 +69,8 @@ static int copy_bio_to_actor(struct bio *bio,
 				break;
 			actor_offset = 0;
 		}
-		if (offset >= bvec->bv_len) {
-			if (!bio_next_segment(bio, &iter_all))
-				break;
-			offset = 0;
-		}
+
+		bio_iter_all_advance(bio, &iter, bytes_to_copy);
 	}
 	squashfs_finish_page(actor);
 	return copied_bytes;
@@ -159,8 +159,10 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length,
 		 * Metadata block.
 		 */
 		const u8 *data;
-		struct bvec_iter_all iter_all = {};
-		struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+		struct bvec_iter_all iter;
+		struct bio_vec bvec;
+
+		bvec_iter_all_init(&iter);
 
 		if (index + 2 > msblk->bytes_used) {
 			res = -EIO;
@@ -170,21 +172,25 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length,
 		if (res)
 			goto out;
 
-		if (WARN_ON_ONCE(!bio_next_segment(bio, &iter_all))) {
+		bvec = bio_iter_all_peek(bio, &iter);
+
+		if (WARN_ON_ONCE(!bvec.bv_len)) {
 			res = -EIO;
 			goto out_free_bio;
 		}
 		/* Extract the length of the metadata block */
-		data = bvec_virt(bvec);
+		data = bvec_virt(&bvec);
 		length = data[offset];
-		if (offset < bvec->bv_len - 1) {
+		if (offset < bvec.bv_len - 1) {
 			length |= data[offset + 1] << 8;
 		} else {
-			if (WARN_ON_ONCE(!bio_next_segment(bio, &iter_all))) {
+			bio_iter_all_advance(bio, &iter, bvec.bv_len);
+
+			if (WARN_ON_ONCE(!bvec.bv_len)) {
 				res = -EIO;
 				goto out_free_bio;
 			}
-			data = bvec_virt(bvec);
+			data = bvec_virt(&bvec);
 			length |= data[0] << 8;
 		}
 		bio_free_pages(bio);
diff --git a/fs/squashfs/lz4_wrapper.c b/fs/squashfs/lz4_wrapper.c
index 49797729f1..bd0dd787d2 100644
--- a/fs/squashfs/lz4_wrapper.c
+++ b/fs/squashfs/lz4_wrapper.c
@@ -92,20 +92,23 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	struct bio *bio, int offset, int length,
 	struct squashfs_page_actor *output)
 {
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 	struct squashfs_lz4 *stream = strm;
 	void *buff = stream->input, *data;
 	int bytes = length, res;
 
-	while (bio_next_segment(bio, &iter_all)) {
-		int avail = min(bytes, ((int)bvec->bv_len) - offset);
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
 
-		data = bvec_virt(bvec);
-		memcpy(buff, data + offset, avail);
+	bio_for_each_segment_all_continue(bvec, bio, iter) {
+		unsigned avail = min_t(unsigned, bytes, bvec.bv_len);
+
+		memcpy(buff, bvec_virt(&bvec), avail);
 		buff += avail;
 		bytes -= avail;
-		offset = 0;
+		if (!bytes)
+			break;
 	}
 
 	res = LZ4_decompress_safe(stream->input, stream->output,
diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c
index d216aeefa8..bccfcfa12e 100644
--- a/fs/squashfs/lzo_wrapper.c
+++ b/fs/squashfs/lzo_wrapper.c
@@ -66,21 +66,24 @@ static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	struct bio *bio, int offset, int length,
 	struct squashfs_page_actor *output)
 {
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
+	struct bio_vec bvec;
 	struct squashfs_lzo *stream = strm;
 	void *buff = stream->input, *data;
 	int bytes = length, res;
 	size_t out_len = output->length;
 
-	while (bio_next_segment(bio, &iter_all)) {
-		int avail = min(bytes, ((int)bvec->bv_len) - offset);
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
 
-		data = bvec_virt(bvec);
-		memcpy(buff, data + offset, avail);
+	bio_for_each_segment_all_continue(bvec, bio, iter) {
+		unsigned avail = min_t(unsigned, bytes, bvec.bv_len);
+
+		memcpy(buff, bvec_virt(&bvec), avail);
 		buff += avail;
 		bytes -= avail;
-		offset = 0;
+		if (!bytes)
+			break;
 	}
 
 	res = lzo1x_decompress_safe(stream->input, (size_t)length,
diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c
index 6c49481a2f..6cf0e11e3b 100644
--- a/fs/squashfs/xz_wrapper.c
+++ b/fs/squashfs/xz_wrapper.c
@@ -120,8 +120,7 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	struct bio *bio, int offset, int length,
 	struct squashfs_page_actor *output)
 {
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
 	int total = 0, error = 0;
 	struct squashfs_xz *stream = strm;
 
@@ -136,26 +135,28 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
 		goto finish;
 	}
 
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
+
 	for (;;) {
 		enum xz_ret xz_err;
 
 		if (stream->buf.in_pos == stream->buf.in_size) {
-			const void *data;
-			int avail;
+			struct bio_vec bvec = bio_iter_all_peek(bio, &iter);
+			unsigned avail = min_t(unsigned, length, bvec.bv_len);
 
-			if (!bio_next_segment(bio, &iter_all)) {
+			if (iter.idx >= bio->bi_vcnt) {
 				/* XZ_STREAM_END must be reached. */
 				error = -EIO;
 				break;
 			}
 
-			avail = min(length, ((int)bvec->bv_len) - offset);
-			data = bvec_virt(bvec);
 			length -= avail;
-			stream->buf.in = data + offset;
+			stream->buf.in = bvec_virt(&bvec);
 			stream->buf.in_size = avail;
 			stream->buf.in_pos = 0;
-			offset = 0;
+
+			bio_iter_all_advance(bio, &iter, avail);
 		}
 
 		if (stream->buf.out_pos == stream->buf.out_size) {
diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c
index cbb7afe7bc..981ca5e410 100644
--- a/fs/squashfs/zlib_wrapper.c
+++ b/fs/squashfs/zlib_wrapper.c
@@ -53,8 +53,7 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	struct bio *bio, int offset, int length,
 	struct squashfs_page_actor *output)
 {
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
 	int zlib_init = 0, error = 0;
 	z_stream *stream = strm;
 
@@ -67,25 +66,28 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
 		goto finish;
 	}
 
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
+
 	for (;;) {
 		int zlib_err;
 
 		if (stream->avail_in == 0) {
-			const void *data;
+			struct bio_vec bvec = bio_iter_all_peek(bio, &iter);
 			int avail;
 
-			if (!bio_next_segment(bio, &iter_all)) {
+			if (iter.idx >= bio->bi_vcnt) {
 				/* Z_STREAM_END must be reached. */
 				error = -EIO;
 				break;
 			}
 
-			avail = min(length, ((int)bvec->bv_len) - offset);
-			data = bvec_virt(bvec);
+			avail = min_t(unsigned, length, bvec.bv_len);
 			length -= avail;
-			stream->next_in = data + offset;
+			stream->next_in = bvec_virt(&bvec);
 			stream->avail_in = avail;
-			offset = 0;
+
+			bio_iter_all_advance(bio, &iter, avail);
 		}
 
 		if (stream->avail_out == 0) {
diff --git a/fs/squashfs/zstd_wrapper.c b/fs/squashfs/zstd_wrapper.c
index 0e407c4d8b..658e5d462a 100644
--- a/fs/squashfs/zstd_wrapper.c
+++ b/fs/squashfs/zstd_wrapper.c
@@ -68,8 +68,7 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
 	int error = 0;
 	zstd_in_buffer in_buf = { NULL, 0, 0 };
 	zstd_out_buffer out_buf = { NULL, 0, 0 };
-	struct bvec_iter_all iter_all = {};
-	struct bio_vec *bvec = bvec_init_iter_all(&iter_all);
+	struct bvec_iter_all iter;
 
 	stream = zstd_init_dstream(wksp->window_size, wksp->mem, wksp->mem_size);
 
@@ -85,25 +84,27 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
 		goto finish;
 	}
 
+	bvec_iter_all_init(&iter);
+	bio_iter_all_advance(bio, &iter, offset);
+
 	for (;;) {
 		size_t zstd_err;
 
 		if (in_buf.pos == in_buf.size) {
-			const void *data;
-			int avail;
+			struct bio_vec bvec = bio_iter_all_peek(bio, &iter);
+			unsigned avail = min_t(unsigned, length, bvec.bv_len);
 
-			if (!bio_next_segment(bio, &iter_all)) {
+			if (iter.idx >= bio->bi_vcnt) {
 				error = -EIO;
 				break;
 			}
 
-			avail = min(length, ((int)bvec->bv_len) - offset);
-			data = bvec_virt(bvec);
 			length -= avail;
-			in_buf.src = data + offset;
+			in_buf.src = bvec_virt(&bvec);
 			in_buf.size = avail;
 			in_buf.pos = 0;
-			offset = 0;
+
+			bio_iter_all_advance(bio, &iter, avail);
 		}
 
 		if (out_buf.pos == out_buf.size) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 3536f28c05..f86c7190c3 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -78,22 +78,40 @@ static inline void *bio_data(struct bio *bio)
 	return NULL;
 }
 
-static inline bool bio_next_segment(const struct bio *bio,
-				    struct bvec_iter_all *iter)
+static inline struct bio_vec bio_iter_all_peek(const struct bio *bio,
+					       struct bvec_iter_all *iter)
 {
-	if (iter->idx >= bio->bi_vcnt)
-		return false;
+	if (WARN_ON(iter->idx >= bio->bi_vcnt))
+		return (struct bio_vec) { NULL };
 
-	bvec_advance(&bio->bi_io_vec[iter->idx], iter);
-	return true;
+	return bvec_iter_all_peek(bio->bi_io_vec, iter);
+}
+
+static inline void bio_iter_all_advance(const struct bio *bio,
+					struct bvec_iter_all *iter,
+					unsigned bytes)
+{
+	bvec_iter_all_advance(bio->bi_io_vec, iter, bytes);
+
+	WARN_ON(iter->idx > bio->bi_vcnt ||
+		(iter->idx == bio->bi_vcnt && iter->done));
 }
 
+#define bio_for_each_segment_all_continue(bvl, bio, iter)		\
+	for (;								\
+	     iter.idx < bio->bi_vcnt &&					\
+		((bvl = bio_iter_all_peek(bio, &iter)), true);		\
+	     bio_iter_all_advance((bio), &iter, bvl.bv_len))
+
 /*
  * drivers should _never_ use the all version - the bio may have been split
  * before it got to the driver and the driver won't own all of it
  */
-#define bio_for_each_segment_all(bvl, bio, iter) \
-	for (bvl = bvec_init_iter_all(&iter); bio_next_segment((bio), &iter); )
+#define bio_for_each_segment_all(bvl, bio, iter)			\
+	for (bvec_iter_all_init(&iter);					\
+	     iter.idx < (bio)->bi_vcnt &&				\
+		((bvl = bio_iter_all_peek((bio), &iter)), true);		\
+	     bio_iter_all_advance((bio), &iter, bvl.bv_len))
 
 static inline void bio_advance_iter(const struct bio *bio,
 				    struct bvec_iter *iter, unsigned int bytes)
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 555aae5448..635fb54143 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -85,12 +85,6 @@ struct bvec_iter {
 						   current bvec */
 } __packed;
 
-struct bvec_iter_all {
-	struct bio_vec	bv;
-	int		idx;
-	unsigned	done;
-};
-
 /*
  * various member access, note that bio_data should of course not be used
  * on highmem page vectors
@@ -184,7 +178,10 @@ static inline void bvec_iter_advance_single(const struct bio_vec *bv,
 		((bvl = bvec_iter_bvec((bio_vec), (iter))), 1);	\
 	     bvec_iter_advance_single((bio_vec), &(iter), (bvl).bv_len))
 
-/* for iterating one bio from start to end */
+/*
+ * bvec_iter_all: for advancing over a bio as it was originally created, but
+ * with the usual bio_for_each_segment interface - nonstandard, do not use:
+ */
 #define BVEC_ITER_ALL_INIT (struct bvec_iter)				\
 {									\
 	.bi_sector	= 0,						\
@@ -193,33 +190,45 @@ static inline void bvec_iter_advance_single(const struct bio_vec *bv,
 	.bi_bvec_done	= 0,						\
 }
 
-static inline struct bio_vec *bvec_init_iter_all(struct bvec_iter_all *iter_all)
+/*
+ * bvec_iter_all: for advancing over individual pages in a bio, as it was when
+ * it was first created:
+ */
+struct bvec_iter_all {
+	int		idx;
+	unsigned	done;
+};
+
+static inline void bvec_iter_all_init(struct bvec_iter_all *iter_all)
 {
 	iter_all->done = 0;
 	iter_all->idx = 0;
+}
 
-	return &iter_all->bv;
+static inline struct bio_vec bvec_iter_all_peek(const struct bio_vec *bvec,
+						struct bvec_iter_all *iter)
+{
+	struct bio_vec bv = bvec[iter->idx];
+
+	bv.bv_offset	+= iter->done;
+	bv.bv_len	-= iter->done;
+
+	bv.bv_page	+= bv.bv_offset >> PAGE_SHIFT;
+	bv.bv_offset	&= ~PAGE_MASK;
+	bv.bv_len	= min_t(unsigned, PAGE_SIZE - bv.bv_offset, bv.bv_len);
+
+	return bv;
 }
 
-static inline void bvec_advance(const struct bio_vec *bvec,
-				struct bvec_iter_all *iter_all)
+static inline void bvec_iter_all_advance(const struct bio_vec *bvec,
+					 struct bvec_iter_all *iter,
+					 unsigned bytes)
 {
-	struct bio_vec *bv = &iter_all->bv;
-
-	if (iter_all->done) {
-		bv->bv_page++;
-		bv->bv_offset = 0;
-	} else {
-		bv->bv_page = bvec->bv_page + (bvec->bv_offset >> PAGE_SHIFT);
-		bv->bv_offset = bvec->bv_offset & ~PAGE_MASK;
-	}
-	bv->bv_len = min_t(unsigned int, PAGE_SIZE - bv->bv_offset,
-			   bvec->bv_len - iter_all->done);
-	iter_all->done += bv->bv_len;
+	iter->done += bytes;
 
-	if (iter_all->done == bvec->bv_len) {
-		iter_all->idx++;
-		iter_all->done = 0;
+	while (iter->done && iter->done >= bvec[iter->idx].bv_len) {
+		iter->done -= bvec[iter->idx].bv_len;
+		iter->idx++;
 	}
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] block: Rework bio_for_each_folio_all()
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
                   ` (3 preceding siblings ...)
  2023-05-25 21:48 ` [PATCH 4/7] block: Rework bio_for_each_segment_all() Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-26  0:36   ` Dave Chinner
  2023-05-25 21:48 ` [PATCH 6/7] block: Add documentation for bio iterator macros Kent Overstreet
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe
  Cc: Kent Overstreet, linux-block, linux-fsdevel, Matthew Wilcox

This reimplements bio_for_each_folio_all() on top of the newly-reworked
bvec_iter_all, and since it's now trivial we also provide
bio_for_each_folio.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-block@vger.kernel.org
---
 fs/crypto/bio.c        |  9 +++--
 fs/iomap/buffered-io.c | 14 ++++---
 fs/verity/verify.c     |  9 +++--
 include/linux/bio.h    | 91 +++++++++++++++++++++---------------------
 include/linux/bvec.h   | 15 +++++--
 5 files changed, 75 insertions(+), 63 deletions(-)

diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index d57d0a020f..6469861add 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -30,11 +30,12 @@
  */
 bool fscrypt_decrypt_bio(struct bio *bio)
 {
-	struct folio_iter fi;
+	struct bvec_iter_all iter;
+	struct folio_vec fv;
 
-	bio_for_each_folio_all(fi, bio) {
-		int err = fscrypt_decrypt_pagecache_blocks(fi.folio, fi.length,
-							   fi.offset);
+	bio_for_each_folio_all(fv, bio, iter) {
+		int err = fscrypt_decrypt_pagecache_blocks(fv.fv_folio, fv.fv_len,
+							   fv.fv_offset);
 
 		if (err) {
 			bio->bi_status = errno_to_blk_status(err);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 6f4c97a6d7..60661c87d5 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -187,10 +187,11 @@ static void iomap_finish_folio_read(struct folio *folio, size_t offset,
 static void iomap_read_end_io(struct bio *bio)
 {
 	int error = blk_status_to_errno(bio->bi_status);
-	struct folio_iter fi;
+	struct bvec_iter_all iter;
+	struct folio_vec fv;
 
-	bio_for_each_folio_all(fi, bio)
-		iomap_finish_folio_read(fi.folio, fi.offset, fi.length, error);
+	bio_for_each_folio_all(fv, bio, iter)
+		iomap_finish_folio_read(fv.fv_folio, fv.fv_offset, fv.fv_len, error);
 	bio_put(bio);
 }
 
@@ -1328,7 +1329,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 	u32 folio_count = 0;
 
 	for (bio = &ioend->io_inline_bio; bio; bio = next) {
-		struct folio_iter fi;
+		struct bvec_iter_all iter;
+		struct folio_vec fv;
 
 		/*
 		 * For the last bio, bi_private points to the ioend, so we
@@ -1340,8 +1342,8 @@ iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 			next = bio->bi_private;
 
 		/* walk all folios in bio, ending page IO on them */
-		bio_for_each_folio_all(fi, bio) {
-			iomap_finish_folio_write(inode, fi.folio, fi.length,
+		bio_for_each_folio_all(fv, bio, iter) {
+			iomap_finish_folio_write(inode, fv.fv_folio, fv.fv_len,
 					error);
 			folio_count++;
 		}
diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index e250822275..b111ab0102 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -340,7 +340,8 @@ void fsverity_verify_bio(struct bio *bio)
 	struct inode *inode = bio_first_page_all(bio)->mapping->host;
 	struct fsverity_info *vi = inode->i_verity_info;
 	struct ahash_request *req;
-	struct folio_iter fi;
+	struct bvec_iter_all iter;
+	struct folio_vec fv;
 	unsigned long max_ra_pages = 0;
 
 	/* This allocation never fails, since it's mempool-backed. */
@@ -359,9 +360,9 @@ void fsverity_verify_bio(struct bio *bio)
 		max_ra_pages = bio->bi_iter.bi_size >> (PAGE_SHIFT + 2);
 	}
 
-	bio_for_each_folio_all(fi, bio) {
-		if (!verify_data_blocks(inode, vi, req, fi.folio, fi.length,
-					fi.offset, max_ra_pages)) {
+	bio_for_each_folio_all(fv, bio, iter) {
+		if (!verify_data_blocks(inode, vi, req, fv.fv_folio, fv.fv_len,
+					fv.fv_offset, max_ra_pages)) {
 			bio->bi_status = BLK_STS_IOERR;
 			break;
 		}
diff --git a/include/linux/bio.h b/include/linux/bio.h
index f86c7190c3..7ced281734 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -169,6 +169,42 @@ static inline void bio_advance(struct bio *bio, unsigned int nbytes)
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
+struct folio_vec {
+	struct folio	*fv_folio;
+	size_t		fv_offset;
+	size_t		fv_len;
+};
+
+static inline struct folio_vec biovec_to_foliovec(struct bio_vec bv)
+{
+
+	struct folio *folio	= page_folio(bv.bv_page);
+	size_t offset		= (folio_page_idx(folio, bv.bv_page) << PAGE_SHIFT) +
+		bv.bv_offset;
+	size_t len = min_t(size_t, folio_size(folio) - offset, bv.bv_len);
+
+	return (struct folio_vec) {
+		.fv_folio	= folio,
+		.fv_offset	= offset,
+		.fv_len		= len,
+	};
+}
+
+static inline struct folio_vec bio_iter_iovec_folio(struct bio *bio,
+						    struct bvec_iter iter)
+{
+	return biovec_to_foliovec(bio_iter_iovec(bio, iter));
+}
+
+#define __bio_for_each_folio(bvl, bio, iter, start)			\
+	for (iter = (start);						\
+	     (iter).bi_size &&						\
+		((bvl = bio_iter_iovec_folio((bio), (iter))), 1);	\
+	     bio_advance_iter_single((bio), &(iter), (bvl).fv_len))
+
+#define bio_for_each_folio(bvl, bio, iter)				\
+	__bio_for_each_folio(bvl, bio, iter, (bio)->bi_iter)
+
 #define __bio_for_each_bvec(bvl, bio, iter, start)		\
 	for (iter = (start);						\
 	     (iter).bi_size &&						\
@@ -277,59 +313,22 @@ static inline struct bio_vec *bio_last_bvec_all(struct bio *bio)
 	return &bio->bi_io_vec[bio->bi_vcnt - 1];
 }
 
-/**
- * struct folio_iter - State for iterating all folios in a bio.
- * @folio: The current folio we're iterating.  NULL after the last folio.
- * @offset: The byte offset within the current folio.
- * @length: The number of bytes in this iteration (will not cross folio
- *	boundary).
- */
-struct folio_iter {
-	struct folio *folio;
-	size_t offset;
-	size_t length;
-	/* private: for use by the iterator */
-	struct folio *_next;
-	size_t _seg_count;
-	int _i;
-};
-
-static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio,
-				   int i)
-{
-	struct bio_vec *bvec = bio_first_bvec_all(bio) + i;
-
-	fi->folio = page_folio(bvec->bv_page);
-	fi->offset = bvec->bv_offset +
-			PAGE_SIZE * (bvec->bv_page - &fi->folio->page);
-	fi->_seg_count = bvec->bv_len;
-	fi->length = min(folio_size(fi->folio) - fi->offset, fi->_seg_count);
-	fi->_next = folio_next(fi->folio);
-	fi->_i = i;
-}
-
-static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
+static inline struct folio_vec bio_folio_iter_all_peek(const struct bio *bio,
+						       const struct bvec_iter_all *iter)
 {
-	fi->_seg_count -= fi->length;
-	if (fi->_seg_count) {
-		fi->folio = fi->_next;
-		fi->offset = 0;
-		fi->length = min(folio_size(fi->folio), fi->_seg_count);
-		fi->_next = folio_next(fi->folio);
-	} else if (fi->_i + 1 < bio->bi_vcnt) {
-		bio_first_folio(fi, bio, fi->_i + 1);
-	} else {
-		fi->folio = NULL;
-	}
+	return biovec_to_foliovec(__bvec_iter_all_peek(bio->bi_io_vec, iter));
 }
 
 /**
  * bio_for_each_folio_all - Iterate over each folio in a bio.
- * @fi: struct folio_iter which is updated for each folio.
+ * @fi: struct bio_folio_iter_all which is updated for each folio.
  * @bio: struct bio to iterate over.
  */
-#define bio_for_each_folio_all(fi, bio)				\
-	for (bio_first_folio(&fi, bio, 0); fi.folio; bio_next_folio(&fi, bio))
+#define bio_for_each_folio_all(fv, bio, iter)				\
+	for (bvec_iter_all_init(&iter);					\
+	     iter.idx < bio->bi_vcnt &&					\
+		((fv = bio_folio_iter_all_peek(bio, &iter)), true);	\
+	     bio_iter_all_advance((bio), &iter, fv.fv_len))
 
 enum bip_flags {
 	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index 635fb54143..d238f959e3 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -205,18 +205,27 @@ static inline void bvec_iter_all_init(struct bvec_iter_all *iter_all)
 	iter_all->idx = 0;
 }
 
-static inline struct bio_vec bvec_iter_all_peek(const struct bio_vec *bvec,
-						struct bvec_iter_all *iter)
+static inline struct bio_vec __bvec_iter_all_peek(const struct bio_vec *bvec,
+						  const struct bvec_iter_all *iter)
 {
 	struct bio_vec bv = bvec[iter->idx];
 
+	BUG_ON(iter->done >= bv.bv_len);
+
 	bv.bv_offset	+= iter->done;
 	bv.bv_len	-= iter->done;
 
 	bv.bv_page	+= bv.bv_offset >> PAGE_SHIFT;
 	bv.bv_offset	&= ~PAGE_MASK;
-	bv.bv_len	= min_t(unsigned, PAGE_SIZE - bv.bv_offset, bv.bv_len);
+	return bv;
+}
+
+static inline struct bio_vec bvec_iter_all_peek(const struct bio_vec *bvec,
+						const struct bvec_iter_all *iter)
+{
+	struct bio_vec bv = __bvec_iter_all_peek(bvec, iter);
 
+	bv.bv_len = min_t(unsigned, PAGE_SIZE - bv.bv_offset, bv.bv_len);
 	return bv;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] block: Add documentation for bio iterator macros
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
                   ` (4 preceding siblings ...)
  2023-05-25 21:48 ` [PATCH 5/7] block: Rework bio_for_each_folio_all() Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-25 21:48 ` [PATCH 7/7] block: Don't block on s_umount from __invalidate_super() Kent Overstreet
  2023-05-26 14:35 ` [PATCH 0/7] block layer patches for bcachefs Jens Axboe
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe; +Cc: Kent Overstreet, linux-block, linux-fsdevel, Ming Lei

We've now got 3x2 interfaces for iterating over bios: by page, by bvec,
or by folio, and variants that iterate over what bi_iter points to, or
the entire bio as created by the filesystem/originator.

This adds more detailed kerneldoc comments for each variant.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org
---
 include/linux/bio.h | 54 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7ced281734..e9d4d9e776 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -103,9 +103,14 @@ static inline void bio_iter_all_advance(const struct bio *bio,
 		((bvl = bio_iter_all_peek(bio, &iter)), true);		\
 	     bio_iter_all_advance((bio), &iter, bvl.bv_len))
 
-/*
- * drivers should _never_ use the all version - the bio may have been split
- * before it got to the driver and the driver won't own all of it
+/**
+ * bio_for_each_segment_all - iterate over single pages in a bio
+ *
+ * Like other _all versions, this is for the filesystem, or the owner/creator of
+ * a bio; it iterates over the original contents of a bio.
+ *
+ * Drivers that are working with bios that were submitted to them should not use
+ * the _all version.
  */
 #define bio_for_each_segment_all(bvl, bio, iter)			\
 	for (bvec_iter_all_init(&iter);					\
@@ -166,6 +171,13 @@ static inline void bio_advance(struct bio *bio, unsigned int nbytes)
 		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
 	     bio_advance_iter_single((bio), &(iter), (bvl).bv_len))
 
+/**
+ * bio_for_each_segment - iterate over single pages in a bio
+ *
+ * Like other non-_all versions, this iterates over what bio->bi_iter currently
+ * points to. This version is for drivers, where the bio may have previously
+ * been split or cloned.
+ */
 #define bio_for_each_segment(bvl, bio, iter)				\
 	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
 
@@ -202,6 +214,13 @@ static inline struct folio_vec bio_iter_iovec_folio(struct bio *bio,
 		((bvl = bio_iter_iovec_folio((bio), (iter))), 1);	\
 	     bio_advance_iter_single((bio), &(iter), (bvl).fv_len))
 
+/**
+ * bio_for_each_folio - iterate over folios within a bio
+ *
+ * Like other non-_all versions, this iterates over what bio->bi_iter currently
+ * points to. This version is for drivers, where the bio may have previously
+ * been split or cloned.
+ */
 #define bio_for_each_folio(bvl, bio, iter)				\
 	__bio_for_each_folio(bvl, bio, iter, (bio)->bi_iter)
 
@@ -211,13 +230,30 @@ static inline struct folio_vec bio_iter_iovec_folio(struct bio *bio,
 		((bvl = mp_bvec_iter_bvec((bio)->bi_io_vec, (iter))), 1); \
 	     bio_advance_iter_single((bio), &(iter), (bvl).bv_len))
 
-/* iterate over multi-page bvec */
+/**
+ * bio_for_each_bvec - iterate over bvecs within a bio
+ *
+ * This version iterates over entire bio_vecs, which will be a range of
+ * contiguous pages.
+ *
+ * Like other non-_all versions, this iterates over what bio->bi_iter currently
+ * points to. This version is for drivers, where the bio may have previously
+ * been split or cloned.
+ */
 #define bio_for_each_bvec(bvl, bio, iter)			\
 	__bio_for_each_bvec(bvl, bio, iter, (bio)->bi_iter)
 
 /*
- * Iterate over all multi-page bvecs. Drivers shouldn't use this version for the
- * same reasons as bio_for_each_segment_all().
+ * bio_for_each_bvec_all - iterate over bvecs within a bio
+ *
+ * This version iterates over entire bio_vecs, which will be a range of
+ * contiguous pages.
+ *
+ * Like other _all versions, this is for the filesystem, or the owner/creator of
+ * a bio; it iterates over the original contents of a bio.
+ *
+ * Drivers that are working with bios that were submitted to them should not use
+ * the _all version.
  */
 #define bio_for_each_bvec_all(bvl, bio, i)		\
 	for (i = 0, bvl = bio_first_bvec_all(bio);	\
@@ -323,6 +359,12 @@ static inline struct folio_vec bio_folio_iter_all_peek(const struct bio *bio,
  * bio_for_each_folio_all - Iterate over each folio in a bio.
  * @fi: struct bio_folio_iter_all which is updated for each folio.
  * @bio: struct bio to iterate over.
+ *
+ * Like other _all versions, this is for the filesystem, or the owner/creator of
+ * a bio; it iterates over the original contents of a bio.
+ *
+ * Drivers that are working with bios that were submitted to them should not use
+ * the _all version.
  */
 #define bio_for_each_folio_all(fv, bio, iter)				\
 	for (bvec_iter_all_init(&iter);					\
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] block: Don't block on s_umount from __invalidate_super()
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
                   ` (5 preceding siblings ...)
  2023-05-25 21:48 ` [PATCH 6/7] block: Add documentation for bio iterator macros Kent Overstreet
@ 2023-05-25 21:48 ` Kent Overstreet
  2023-05-26 14:35 ` [PATCH 0/7] block layer patches for bcachefs Jens Axboe
  7 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-25 21:48 UTC (permalink / raw)
  To: linux-kernel, axboe
  Cc: Kent Overstreet, linux-block, linux-fsdevel, Alexander Viro,
	Christian Brauner

__invalidate_super() is used to flush any filesystem mounted on a
device, generally on some sort of media change event.

However, when unmounting a filesystem and closing the underlying block
devices, we can deadlock if the block driver then calls
__invalidate_device() (e.g. because the block device goes away when it
is no longer in use).

This happens with bcachefs on top of loopback, and can be triggered by
fstests generic/042:

  put_super
    -> blkdev_put
    -> lo_release
    -> disk_force_media_change
    -> __invalidate_device
    -> get_super

This isn't inherently specific to bcachefs - it hasn't shown up with
other filesystems before because most other filesystems use the sget()
mechanism for opening/closing block devices (and enforcing exclusion),
however sget() has its own downsides and weird/sketchy behaviour w.r.t.
block device open lifetime - if that ever gets fixed more code will run
into this issue.

The __invalidate_device() call here is really a best effort "I just
yanked the device for a mounted filesystem, please try not to lose my
data" - if it's ever actually needed the user has already done something
crazy, and we probably shouldn't make things worse by deadlocking.
Switching to a trylock seems in keeping with what the code is trying to
do.

If we ever get revoke() at the block layer, perhaps we would look at
rearchitecting to use that instead.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
---
 block/bdev.c       |  2 +-
 fs/super.c         | 40 +++++++++++++++++++++++++++++++---------
 include/linux/fs.h |  1 +
 3 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 1795c7d4b9..743e969b7b 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -922,7 +922,7 @@ EXPORT_SYMBOL(lookup_bdev);
 
 int __invalidate_device(struct block_device *bdev, bool kill_dirty)
 {
-	struct super_block *sb = get_super(bdev);
+	struct super_block *sb = try_get_super(bdev);
 	int res = 0;
 
 	if (sb) {
diff --git a/fs/super.c b/fs/super.c
index 04bc62ab7d..a2decce02f 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -791,14 +791,7 @@ void iterate_supers_type(struct file_system_type *type,
 
 EXPORT_SYMBOL(iterate_supers_type);
 
-/**
- * get_super - get the superblock of a device
- * @bdev: device to get the superblock for
- *
- * Scans the superblock list and finds the superblock of the file system
- * mounted on the device given. %NULL is returned if no match is found.
- */
-struct super_block *get_super(struct block_device *bdev)
+static struct super_block *__get_super(struct block_device *bdev, bool try)
 {
 	struct super_block *sb;
 
@@ -813,7 +806,12 @@ struct super_block *get_super(struct block_device *bdev)
 		if (sb->s_bdev == bdev) {
 			sb->s_count++;
 			spin_unlock(&sb_lock);
-			down_read(&sb->s_umount);
+
+			if (!try)
+				down_read(&sb->s_umount);
+			else if (!down_read_trylock(&sb->s_umount))
+				return NULL;
+
 			/* still alive? */
 			if (sb->s_root && (sb->s_flags & SB_BORN))
 				return sb;
@@ -828,6 +826,30 @@ struct super_block *get_super(struct block_device *bdev)
 	return NULL;
 }
 
+/**
+ * get_super - get the superblock of a device
+ * @bdev: device to get the superblock for
+ *
+ * Scans the superblock list and finds the superblock of the file system
+ * mounted on the device given. %NULL is returned if no match is found.
+ */
+struct super_block *get_super(struct block_device *bdev)
+{
+	return __get_super(bdev, false);
+}
+
+/**
+ * try_get_super - get the superblock of a device, using trylock on sb->s_umount
+ * @bdev: device to get the superblock for
+ *
+ * Scans the superblock list and finds the superblock of the file system
+ * mounted on the device given. %NULL is returned if no match is found.
+ */
+struct super_block *try_get_super(struct block_device *bdev)
+{
+	return __get_super(bdev, true);
+}
+
 /**
  * get_active_super - get an active reference to the superblock of a device
  * @bdev: device to get the superblock for
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7..1a6f951942 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2878,6 +2878,7 @@ extern struct file_system_type *get_filesystem(struct file_system_type *fs);
 extern void put_filesystem(struct file_system_type *fs);
 extern struct file_system_type *get_fs_type(const char *name);
 extern struct super_block *get_super(struct block_device *);
+extern struct super_block *try_get_super(struct block_device *);
 extern struct super_block *get_active_super(struct block_device *bdev);
 extern void drop_super(struct super_block *sb);
 extern void drop_super_exclusive(struct super_block *sb);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] block: Rework bio_for_each_folio_all()
  2023-05-25 21:48 ` [PATCH 5/7] block: Rework bio_for_each_folio_all() Kent Overstreet
@ 2023-05-26  0:36   ` Dave Chinner
  2023-05-26  0:50     ` Kent Overstreet
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Chinner @ 2023-05-26  0:36 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: linux-kernel, axboe, linux-block, linux-fsdevel, Matthew Wilcox

On Thu, May 25, 2023 at 05:48:20PM -0400, Kent Overstreet wrote:
> This reimplements bio_for_each_folio_all() on top of the newly-reworked
> bvec_iter_all, and since it's now trivial we also provide
> bio_for_each_folio.
> 
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: linux-block@vger.kernel.org
> ---
>  fs/crypto/bio.c        |  9 +++--
>  fs/iomap/buffered-io.c | 14 ++++---
>  fs/verity/verify.c     |  9 +++--
>  include/linux/bio.h    | 91 +++++++++++++++++++++---------------------
>  include/linux/bvec.h   | 15 +++++--
>  5 files changed, 75 insertions(+), 63 deletions(-)
....
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index f86c7190c3..7ced281734 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -169,6 +169,42 @@ static inline void bio_advance(struct bio *bio, unsigned int nbytes)
>  #define bio_for_each_segment(bvl, bio, iter)				\
>  	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
>  
> +struct folio_vec {
> +	struct folio	*fv_folio;
> +	size_t		fv_offset;
> +	size_t		fv_len;
> +};

Can we drop the "fv_" variable prefix here? It's just unnecessary
verbosity when we know we have a folio_vec structure. i.e fv->folio
is easier to read and type than fv->fv_folio...

Hmmm, this is probably not a good name considering "struct pagevec" is
something completely different - the equivalent is "struct
folio_batch" but I can see this being confusing for people who
largely expect some symmetry between page<->folio naming
conventions...

Also, why is this in bio.h and not in a mm/folio related header
file?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] block: Rework bio_for_each_folio_all()
  2023-05-26  0:36   ` Dave Chinner
@ 2023-05-26  0:50     ` Kent Overstreet
  0 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-26  0:50 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, axboe, linux-block, linux-fsdevel, Matthew Wilcox

On Fri, May 26, 2023 at 10:36:03AM +1000, Dave Chinner wrote:
> On Thu, May 25, 2023 at 05:48:20PM -0400, Kent Overstreet wrote:
> > This reimplements bio_for_each_folio_all() on top of the newly-reworked
> > bvec_iter_all, and since it's now trivial we also provide
> > bio_for_each_folio.
> > 
> > Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: linux-block@vger.kernel.org
> > ---
> >  fs/crypto/bio.c        |  9 +++--
> >  fs/iomap/buffered-io.c | 14 ++++---
> >  fs/verity/verify.c     |  9 +++--
> >  include/linux/bio.h    | 91 +++++++++++++++++++++---------------------
> >  include/linux/bvec.h   | 15 +++++--
> >  5 files changed, 75 insertions(+), 63 deletions(-)
> ....
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index f86c7190c3..7ced281734 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -169,6 +169,42 @@ static inline void bio_advance(struct bio *bio, unsigned int nbytes)
> >  #define bio_for_each_segment(bvl, bio, iter)				\
> >  	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)
> >  
> > +struct folio_vec {
> > +	struct folio	*fv_folio;
> > +	size_t		fv_offset;
> > +	size_t		fv_len;
> > +};
> 
> Can we drop the "fv_" variable prefix here? It's just unnecessary
> verbosity when we know we have a folio_vec structure. i.e fv->folio
> is easier to read and type than fv->fv_folio...

That's actually one of the things I like about bio/biovec, it's been
handy in the past for grepping and block layer refactorings...

(I would _kill_ for a tool that let me do that kind of type-aware grep.
ctags can in theory produce that kind of an index but I never figured
out how to get vim to use it properly. I believe the lsp-server stuff
that uses the compiler as a backend can do it; I've started using that
stuff for Rust coding and it works amazingly, don't think I've tried it
for struct members - I wonder if that stuff works at all on a codebase
the size of the kernel or just dies...)

> Hmmm, this is probably not a good name considering "struct pagevec" is
> something completely different - the equivalent is "struct
> folio_batch" but I can see this being confusing for people who
> largely expect some symmetry between page<->folio naming
> conventions...

Yeah, good point. folio_seg, perhaps?

(I think Matthew may have already made that suggestion...)

> Also, why is this in bio.h and not in a mm/folio related header
> file?

Is it worth moving it there considering it's only used in bio.h/bvec.h?
Perhaps we could keep it where it's used for now and move it if it gains
more users?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
                   ` (6 preceding siblings ...)
  2023-05-25 21:48 ` [PATCH 7/7] block: Don't block on s_umount from __invalidate_super() Kent Overstreet
@ 2023-05-26 14:35 ` Jens Axboe
  2023-05-26 20:44   ` Kent Overstreet
  7 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2023-05-26 14:35 UTC (permalink / raw)
  To: Kent Overstreet, linux-kernel; +Cc: linux-block, linux-fsdevel

On 5/25/23 3:48 PM, Kent Overstreet wrote:
> Jens, here's the full series of block layer patches needed for bcachefs:
> 
> Some of these (added exports, zero_fill_bio_iter?) can probably go with
> the bcachefs pull and I'm just including here for completeness. The main
> ones are the bio_iter patches, and the __invalidate_super() patch.
> 
> The bio_iter series has a new documentation patch.
> 
> I would still like the __invalidate_super() patch to get some review
> (from VFS people? unclear who owns this).

I wanted to check the code generation for patches 4 and 5, but the
series doesn't seem to apply to current -git nor my for-6.5/block.
There's no base commit in this cover letter either, so what is this
against?

Please send one that applies to for-6.5/block so it's a bit easier
to take a closer look at this.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-26 14:35 ` [PATCH 0/7] block layer patches for bcachefs Jens Axboe
@ 2023-05-26 20:44   ` Kent Overstreet
  2023-05-30 14:22     ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2023-05-26 20:44 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block, linux-fsdevel

On Fri, May 26, 2023 at 08:35:23AM -0600, Jens Axboe wrote:
> On 5/25/23 3:48 PM, Kent Overstreet wrote:
> > Jens, here's the full series of block layer patches needed for bcachefs:
> > 
> > Some of these (added exports, zero_fill_bio_iter?) can probably go with
> > the bcachefs pull and I'm just including here for completeness. The main
> > ones are the bio_iter patches, and the __invalidate_super() patch.
> > 
> > The bio_iter series has a new documentation patch.
> > 
> > I would still like the __invalidate_super() patch to get some review
> > (from VFS people? unclear who owns this).
> 
> I wanted to check the code generation for patches 4 and 5, but the
> series doesn't seem to apply to current -git nor my for-6.5/block.
> There's no base commit in this cover letter either, so what is this
> against?
> 
> Please send one that applies to for-6.5/block so it's a bit easier
> to take a closer look at this.

Here you go:
git pull https://evilpiepirate.org/git/bcachefs.git block-for-bcachefs

changed folio_vec to folio_seg, build tested it and just pointed my CI
at it, results will be showing up for xfs at
https://evilpiepirate.org/~testdashboard/ci?branch=block-for-bcachefs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-26 20:44   ` Kent Overstreet
@ 2023-05-30 14:22     ` Jens Axboe
  2023-05-30 16:06       ` Kent Overstreet
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2023-05-30 14:22 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, linux-block, linux-fsdevel

On 5/26/23 2:44?PM, Kent Overstreet wrote:
> On Fri, May 26, 2023 at 08:35:23AM -0600, Jens Axboe wrote:
>> On 5/25/23 3:48?PM, Kent Overstreet wrote:
>>> Jens, here's the full series of block layer patches needed for bcachefs:
>>>
>>> Some of these (added exports, zero_fill_bio_iter?) can probably go with
>>> the bcachefs pull and I'm just including here for completeness. The main
>>> ones are the bio_iter patches, and the __invalidate_super() patch.
>>>
>>> The bio_iter series has a new documentation patch.
>>>
>>> I would still like the __invalidate_super() patch to get some review
>>> (from VFS people? unclear who owns this).
>>
>> I wanted to check the code generation for patches 4 and 5, but the
>> series doesn't seem to apply to current -git nor my for-6.5/block.
>> There's no base commit in this cover letter either, so what is this
>> against?
>>
>> Please send one that applies to for-6.5/block so it's a bit easier
>> to take a closer look at this.
> 
> Here you go:
> git pull https://evilpiepirate.org/git/bcachefs.git block-for-bcachefs

Thanks

The re-exporting of helpers is somewhat odd - why is bcachefs special
here and needs these, while others do not?

But the main issue for me are the iterator changes, which mostly just
seems like unnecessary churn. What's the justification for these? The
commit messages don;t really have any. Doesn't seem like much of a
simplification, and in fact it's more code than before and obviously
more stack usage as well.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-30 14:22     ` Jens Axboe
@ 2023-05-30 16:06       ` Kent Overstreet
  2023-05-30 16:50         ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2023-05-30 16:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block, linux-fsdevel

On Tue, May 30, 2023 at 08:22:50AM -0600, Jens Axboe wrote:
> On 5/26/23 2:44?PM, Kent Overstreet wrote:
> > On Fri, May 26, 2023 at 08:35:23AM -0600, Jens Axboe wrote:
> >> On 5/25/23 3:48?PM, Kent Overstreet wrote:
> >>> Jens, here's the full series of block layer patches needed for bcachefs:
> >>>
> >>> Some of these (added exports, zero_fill_bio_iter?) can probably go with
> >>> the bcachefs pull and I'm just including here for completeness. The main
> >>> ones are the bio_iter patches, and the __invalidate_super() patch.
> >>>
> >>> The bio_iter series has a new documentation patch.
> >>>
> >>> I would still like the __invalidate_super() patch to get some review
> >>> (from VFS people? unclear who owns this).
> >>
> >> I wanted to check the code generation for patches 4 and 5, but the
> >> series doesn't seem to apply to current -git nor my for-6.5/block.
> >> There's no base commit in this cover letter either, so what is this
> >> against?
> >>
> >> Please send one that applies to for-6.5/block so it's a bit easier
> >> to take a closer look at this.
> > 
> > Here you go:
> > git pull https://evilpiepirate.org/git/bcachefs.git block-for-bcachefs
> 
> Thanks
> 
> The re-exporting of helpers is somewhat odd - why is bcachefs special
> here and needs these, while others do not?

It's not iomap based.

> But the main issue for me are the iterator changes, which mostly just
> seems like unnecessary churn. What's the justification for these? The
> commit messages don;t really have any. Doesn't seem like much of a
> simplification, and in fact it's more code than before and obviously
> more stack usage as well.

I need bio_for_each_folio().

The approach taken by the bcachefs IO paths is to first build up bios,
then walk the extents btree to determine where to send them, splitting
as needed.

For reading into the page cache we additionally need to initialize our
private state based on what we're reading from that says what's on disk
(unallocated, reservation, or normal allocation) and how many replicas.
This is used for both i_blocks accounting and for deciding when we need
to get a disk reservation. Since we're doing this post split, it needs
bio_for_each_folio, not the _all variant.

Yes, the iterator changes are a bit more code - but it's split up into
better helpers now, the pointer arithmetic before was a bit dense; I
found the result to be more readable. I'm surprised at more stack usage;
I would have expected _less_ for bio_for_each_page_all() since it gets
rid of a pointer into the bvec_iter_all. How did you measure that?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-30 16:06       ` Kent Overstreet
@ 2023-05-30 16:50         ` Jens Axboe
  2023-05-30 23:30           ` Kent Overstreet
  2023-06-04 23:38           ` Kent Overstreet
  0 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2023-05-30 16:50 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, linux-block, linux-fsdevel

On 5/30/23 10:06?AM, Kent Overstreet wrote:
> On Tue, May 30, 2023 at 08:22:50AM -0600, Jens Axboe wrote:
>> On 5/26/23 2:44?PM, Kent Overstreet wrote:
>>> On Fri, May 26, 2023 at 08:35:23AM -0600, Jens Axboe wrote:
>>>> On 5/25/23 3:48?PM, Kent Overstreet wrote:
>>>>> Jens, here's the full series of block layer patches needed for bcachefs:
>>>>>
>>>>> Some of these (added exports, zero_fill_bio_iter?) can probably go with
>>>>> the bcachefs pull and I'm just including here for completeness. The main
>>>>> ones are the bio_iter patches, and the __invalidate_super() patch.
>>>>>
>>>>> The bio_iter series has a new documentation patch.
>>>>>
>>>>> I would still like the __invalidate_super() patch to get some review
>>>>> (from VFS people? unclear who owns this).
>>>>
>>>> I wanted to check the code generation for patches 4 and 5, but the
>>>> series doesn't seem to apply to current -git nor my for-6.5/block.
>>>> There's no base commit in this cover letter either, so what is this
>>>> against?
>>>>
>>>> Please send one that applies to for-6.5/block so it's a bit easier
>>>> to take a closer look at this.
>>>
>>> Here you go:
>>> git pull https://evilpiepirate.org/git/bcachefs.git block-for-bcachefs
>>
>> Thanks
>>
>> The re-exporting of helpers is somewhat odd - why is bcachefs special
>> here and needs these, while others do not?
> 
> It's not iomap based.
> 
>> But the main issue for me are the iterator changes, which mostly just
>> seems like unnecessary churn. What's the justification for these? The
>> commit messages don;t really have any. Doesn't seem like much of a
>> simplification, and in fact it's more code than before and obviously
>> more stack usage as well.
> 
> I need bio_for_each_folio().
> 
> The approach taken by the bcachefs IO paths is to first build up bios,
> then walk the extents btree to determine where to send them, splitting
> as needed.
> 
> For reading into the page cache we additionally need to initialize our
> private state based on what we're reading from that says what's on
> disk (unallocated, reservation, or normal allocation) and how many
> replicas. This is used for both i_blocks accounting and for deciding
> when we need to get a disk reservation. Since we're doing this post
> split, it needs bio_for_each_folio, not the _all variant.
> 
> Yes, the iterator changes are a bit more code - but it's split up into
> better helpers now, the pointer arithmetic before was a bit dense; I
> found the result to be more readable. I'm surprised at more stack
> usage; I would have expected _less_ for bio_for_each_page_all() since
> it gets rid of a pointer into the bvec_iter_all. How did you measure
> that?

Sorry typo, I meant text. Just checked stack and it looks identical, but
things like blk-map grows ~6% more text, and bio ~3%. Didn't check all
of them, but at least those two are consistent across x86-64 and
aarch64. Ditto on the data front. Need to take a closer look at where
exactly that is coming from, and what that looks like.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-30 16:50         ` Jens Axboe
@ 2023-05-30 23:30           ` Kent Overstreet
  2023-06-04 23:38           ` Kent Overstreet
  1 sibling, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-05-30 23:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block, linux-fsdevel

On Tue, May 30, 2023 at 10:50:55AM -0600, Jens Axboe wrote:
> Sorry typo, I meant text. Just checked stack and it looks identical, but
> things like blk-map grows ~6% more text, and bio ~3%. Didn't check all
> of them, but at least those two are consistent across x86-64 and
> aarch64. Ditto on the data front. Need to take a closer look at where
> exactly that is coming from, and what that looks like.

Weird - when I looked kernel text size was unchanged, it was data that
increased with the earlier version of the patch due to a new WARN_ON().
I'll have another look.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-05-30 16:50         ` Jens Axboe
  2023-05-30 23:30           ` Kent Overstreet
@ 2023-06-04 23:38           ` Kent Overstreet
  2023-06-05 16:49             ` Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Kent Overstreet @ 2023-06-04 23:38 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block, linux-fsdevel

On Tue, May 30, 2023 at 10:50:55AM -0600, Jens Axboe wrote:
> Sorry typo, I meant text. Just checked stack and it looks identical, but
> things like blk-map grows ~6% more text, and bio ~3%. Didn't check all
> of them, but at least those two are consistent across x86-64 and
> aarch64. Ditto on the data front. Need to take a closer look at where
> exactly that is coming from, and what that looks like.

A good chunk of that is because I added warnings and assertions for
e.g. running past the end of the bvec array. These bugs are rare and
shouldn't happen with normal iterator usage (e.g. the bio_for_each_*
macros), but I'd like to keep them as a debug mode thing.

But we don't yet have CONFIG_BLOCK_DEBUG - perhaps we should.

With those out, I see a code size decrease in bio.c, which makes sense -
gcc ought to be able to generate slightly better code when it's dealing
with pure values, provided everything is inlined and there's no aliasing
considerations.

Onto blk-map.c:

bio_copy_kern_endio_read() increases in code size, but if I change
memcpy_from_bvec() to take the bvec by val instead of by ref it's
basically the same code size. There's no disadvantage to changing
memcpy_from_bvec() to pass by val.

bio_copy_(to|from)_iter() is a wtf, though - gcc is now spilling the
constructed bvec to the stack; my best guess is it's a register pressure
thing (but we shouldn't be short registers here!).

So, since the fastpath stuff in bio.c gets smaller and blk-map.c is not
exactly fastpath stuff I'm not inclined to fight with gcc on this one -
let me know if that works for you.

Branch is updated - I split out the new assertions into a separate patch
that adds CONFIG_BLK_DEBUG, and another patch for mempcy_(to|from)_bio()
for a small code size decrease.

https://evilpiepirate.org/git/bcachefs.git/log/?h=block-for-bcachefs
or
git pull http://evilpiepirate.org/git/bcachefs.git block-for-bcachefs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-06-04 23:38           ` Kent Overstreet
@ 2023-06-05 16:49             ` Jens Axboe
  2023-06-05 21:26               ` Kent Overstreet
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2023-06-05 16:49 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: linux-kernel, linux-block, linux-fsdevel

On 6/4/23 5:38?PM, Kent Overstreet wrote:
> On Tue, May 30, 2023 at 10:50:55AM -0600, Jens Axboe wrote:
>> Sorry typo, I meant text. Just checked stack and it looks identical, but
>> things like blk-map grows ~6% more text, and bio ~3%. Didn't check all
>> of them, but at least those two are consistent across x86-64 and
>> aarch64. Ditto on the data front. Need to take a closer look at where
>> exactly that is coming from, and what that looks like.
> 
> A good chunk of that is because I added warnings and assertions for
> e.g. running past the end of the bvec array. These bugs are rare and
> shouldn't happen with normal iterator usage (e.g. the bio_for_each_*
> macros), but I'd like to keep them as a debug mode thing.
> 
> But we don't yet have CONFIG_BLOCK_DEBUG - perhaps we should.

Let's split those out then, especially as we don't have a BLOCK_DEBUG
option right now.

> With those out, I see a code size decrease in bio.c, which makes sense -
> gcc ought to be able to generate slightly better code when it's dealing
> with pure values, provided everything is inlined and there's no aliasing
> considerations.
> 
> Onto blk-map.c:
> 
> bio_copy_kern_endio_read() increases in code size, but if I change
> memcpy_from_bvec() to take the bvec by val instead of by ref it's
> basically the same code size. There's no disadvantage to changing
> memcpy_from_bvec() to pass by val.
> 
> bio_copy_(to|from)_iter() is a wtf, though - gcc is now spilling the
> constructed bvec to the stack; my best guess is it's a register pressure
> thing (but we shouldn't be short registers here!).
> 
> So, since the fastpath stuff in bio.c gets smaller and blk-map.c is not
> exactly fastpath stuff I'm not inclined to fight with gcc on this one -
> let me know if that works for you.
> 
> Branch is updated - I split out the new assertions into a separate patch
> that adds CONFIG_BLK_DEBUG, and another patch for mempcy_(to|from)_bio()
> for a small code size decrease.
> 
> https://evilpiepirate.org/git/bcachefs.git/log/?h=block-for-bcachefs
> or
> git pull http://evilpiepirate.org/git/bcachefs.git block-for-bcachefs

Cn you resend just the iterator changes in their current form? The
various re-exports are a separate discussion, I think we should focus on
the iterator bits first.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] block layer patches for bcachefs
  2023-06-05 16:49             ` Jens Axboe
@ 2023-06-05 21:26               ` Kent Overstreet
  0 siblings, 0 replies; 19+ messages in thread
From: Kent Overstreet @ 2023-06-05 21:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-block, linux-fsdevel

On Mon, Jun 05, 2023 at 10:49:37AM -0600, Jens Axboe wrote:
> On 6/4/23 5:38?PM, Kent Overstreet wrote:
> > On Tue, May 30, 2023 at 10:50:55AM -0600, Jens Axboe wrote:
> >> Sorry typo, I meant text. Just checked stack and it looks identical, but
> >> things like blk-map grows ~6% more text, and bio ~3%. Didn't check all
> >> of them, but at least those two are consistent across x86-64 and
> >> aarch64. Ditto on the data front. Need to take a closer look at where
> >> exactly that is coming from, and what that looks like.
> > 
> > A good chunk of that is because I added warnings and assertions for
> > e.g. running past the end of the bvec array. These bugs are rare and
> > shouldn't happen with normal iterator usage (e.g. the bio_for_each_*
> > macros), but I'd like to keep them as a debug mode thing.
> > 
> > But we don't yet have CONFIG_BLOCK_DEBUG - perhaps we should.
> 
> Let's split those out then, especially as we don't have a BLOCK_DEBUG
> option right now.

Already did that; there's a patch in the branch that adds
CONFIG_BLK_DEBUG with the new assertions.

> Cn you resend just the iterator changes in their current form? The
> various re-exports are a separate discussion, I think we should focus on
> the iterator bits first.

They're up in that branch with the iterator changes first now; I'll mail
them out too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-06-05 21:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-25 21:48 [PATCH 0/7] block layer patches for bcachefs Kent Overstreet
2023-05-25 21:48 ` [PATCH 1/7] block: Add some exports " Kent Overstreet
2023-05-25 21:48 ` [PATCH 2/7] block: Allow bio_iov_iter_get_pages() with bio->bi_bdev unset Kent Overstreet
2023-05-25 21:48 ` [PATCH 3/7] block: Bring back zero_fill_bio_iter Kent Overstreet
2023-05-25 21:48 ` [PATCH 4/7] block: Rework bio_for_each_segment_all() Kent Overstreet
2023-05-25 21:48 ` [PATCH 5/7] block: Rework bio_for_each_folio_all() Kent Overstreet
2023-05-26  0:36   ` Dave Chinner
2023-05-26  0:50     ` Kent Overstreet
2023-05-25 21:48 ` [PATCH 6/7] block: Add documentation for bio iterator macros Kent Overstreet
2023-05-25 21:48 ` [PATCH 7/7] block: Don't block on s_umount from __invalidate_super() Kent Overstreet
2023-05-26 14:35 ` [PATCH 0/7] block layer patches for bcachefs Jens Axboe
2023-05-26 20:44   ` Kent Overstreet
2023-05-30 14:22     ` Jens Axboe
2023-05-30 16:06       ` Kent Overstreet
2023-05-30 16:50         ` Jens Axboe
2023-05-30 23:30           ` Kent Overstreet
2023-06-04 23:38           ` Kent Overstreet
2023-06-05 16:49             ` Jens Axboe
2023-06-05 21:26               ` Kent Overstreet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.