* simplify extent_buffer reading and writing
@ 2023-03-09  9:05 Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 01/20] btrfs: mark extent_buffer_under_io static Christoph Hellwig
                   ` (19 more replies)
  0 siblings, 20 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Hi all,

currently reading and writing of extent_buffers is very complicated, as it
tries to work in a page-oriented way.  Switch as much as possible to working
based on the extent_buffer object to simplify the code.

I suspect that in the long run switching to dedicated object-based writeback
and reclaim, similar to the XFS buffer cache, would be a good idea, but as
that involves pretty big behavior changes it is better left for a
separate series.

Diffstat:
 compression.c |    4 
 compression.h |    2 
 disk-io.c     |  276 +++------------------------
 disk-io.h     |    5 
 extent_io.c   |  589 ++++++++++++++--------------------------------------------
 extent_io.h   |    3 
 6 files changed, 192 insertions(+), 687 deletions(-)


* [PATCH 01/20] btrfs: mark extent_buffer_under_io static
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 11:06   ` Johannes Thumshirn
  2023-03-10  7:26   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer Christoph Hellwig
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

extent_buffer_under_io is only used in extent_io.c, so mark it static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 2 +-
 fs/btrfs/extent_io.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1221f699ffc596..302af9b01bda2a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3420,7 +3420,7 @@ static void __free_extent_buffer(struct extent_buffer *eb)
 	kmem_cache_free(extent_buffer_cache, eb);
 }
 
-int extent_buffer_under_io(const struct extent_buffer *eb)
+static int extent_buffer_under_io(const struct extent_buffer *eb)
 {
 	return (atomic_read(&eb->io_pages) ||
 		test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4341ad978fb8e4..342412d37a7b4b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -265,7 +265,6 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
 bool set_extent_buffer_dirty(struct extent_buffer *eb);
 void set_extent_buffer_uptodate(struct extent_buffer *eb);
 void clear_extent_buffer_uptodate(struct extent_buffer *eb);
-int extent_buffer_under_io(const struct extent_buffer *eb);
 void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
 void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
-- 
2.39.2



* [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 01/20] btrfs: mark extent_buffer_under_io static Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 11:10   ` Johannes Thumshirn
  2023-03-10  7:27   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate Christoph Hellwig
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Setting the buffer uptodate in a function that is named as a validation
helper is a bit confusing.  Move the call from validate_extent_buffer to
the one of its two callers that didn't already have a duplicate call
to set_extent_buffer_uptodate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bb864cf2eed60f..7d766eaef4aee7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -590,9 +590,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 	if (found_level > 0 && btrfs_check_node(eb))
 		ret = -EIO;
 
-	if (!ret)
-		set_extent_buffer_uptodate(eb);
-	else
+	if (ret)
 		btrfs_err(fs_info,
 		"read time tree block corruption detected on logical %llu mirror %u",
 			  eb->start, eb->read_mirror);
@@ -684,6 +682,8 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
 		goto err;
 	}
 	ret = validate_extent_buffer(eb, &bbio->parent_check);
+	if (!ret)
+		set_extent_buffer_uptodate(eb);
 err:
 	if (ret) {
 		/*
-- 
2.39.2



* [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 01/20] btrfs: mark extent_buffer_under_io static Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 11:17   ` Johannes Thumshirn
  2023-03-10  7:28   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 04/20] btrfs: always read the entire extent_buffer Christoph Hellwig
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

verify_parent_transid is only called by btrfs_buffer_uptodate, which
confusingly inverts the return value.  Merge the two functions and
reflow the parent_transid check so that the error handling is in a branch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 46 +++++++++++++++-------------------------------
 1 file changed, 15 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7d766eaef4aee7..d03b431b07781c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -110,32 +110,33 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
  * detect blocks that either didn't get written at all or got written
  * in the wrong place.
  */
-static int verify_parent_transid(struct extent_io_tree *io_tree,
-				 struct extent_buffer *eb, u64 parent_transid,
-				 int atomic)
+int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
+			  int atomic)
 {
+	struct inode *btree_inode = eb->pages[0]->mapping->host;
+	struct extent_io_tree *io_tree = &BTRFS_I(btree_inode)->io_tree;
 	struct extent_state *cached_state = NULL;
-	int ret;
+	int ret = 1;
 
-	if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
+	if (!extent_buffer_uptodate(eb))
 		return 0;
 
+	if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
+		return 1;
+
 	if (atomic)
 		return -EAGAIN;
 
 	lock_extent(io_tree, eb->start, eb->start + eb->len - 1, &cached_state);
-	if (extent_buffer_uptodate(eb) &&
-	    btrfs_header_generation(eb) == parent_transid) {
-		ret = 0;
-		goto out;
-	}
-	btrfs_err_rl(eb->fs_info,
+	if (!extent_buffer_uptodate(eb) ||
+	    btrfs_header_generation(eb) != parent_transid) {
+		btrfs_err_rl(eb->fs_info,
 "parent transid verify failed on logical %llu mirror %u wanted %llu found %llu",
 			eb->start, eb->read_mirror,
 			parent_transid, btrfs_header_generation(eb));
-	ret = 1;
-	clear_extent_buffer_uptodate(eb);
-out:
+		clear_extent_buffer_uptodate(eb);
+		ret = 0;
+	}
 	unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
 		      &cached_state);
 	return ret;
@@ -4638,23 +4639,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
 	btrfs_close_devices(fs_info->fs_devices);
 }
 
-int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
-			  int atomic)
-{
-	int ret;
-	struct inode *btree_inode = buf->pages[0]->mapping->host;
-
-	ret = extent_buffer_uptodate(buf);
-	if (!ret)
-		return ret;
-
-	ret = verify_parent_transid(&BTRFS_I(btree_inode)->io_tree, buf,
-				    parent_transid, atomic);
-	if (ret == -EAGAIN)
-		return ret;
-	return !ret;
-}
-
 void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 {
 	struct btrfs_fs_info *fs_info = buf->fs_info;
-- 
2.39.2



* [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 11:29   ` Johannes Thumshirn
  2023-03-10  7:35   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 05/20] btrfs: simplify extent buffer reading Christoph Hellwig
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Currently read_extent_buffer_pages skips pages that are already uptodate
when reading in an extent_buffer.  While this reduces the amount of data
read, it increases the number of I/O operations as we now need to do
multiple I/Os when reading an extent buffer with one or more uptodate
pages in the middle of it.  On any modern storage device, be it hard
drives or SSDs, this actually decreases I/O performance, as the extra
per-I/O overhead outweighs the small amount of data saved.  Fortunately
this case is pretty rare as the pages are always initially read together
and then aged the same way.  Besides simplifying the code a bit on its
own, this will allow for major simplifications to the I/O completion
handler later on.

Note that the case where all pages are uptodate is still handled by an
optimized fast path that does not read any data from disk.
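
For reference, the all-uptodate fast path that stays in place looks roughly
like this (condensed sketch of the existing read_extent_buffer_pages logic,
not part of this patch):

  for (i = 0; i < num_pages; i++) {
          page = eb->pages[i];
          if (!PageUptodate(page))
                  all_uptodate = 0;
  }
  if (all_uptodate) {
          /* nothing to read, the whole buffer is already in memory */
          set_extent_buffer_uptodate(eb);
          goto unlock_exit;
  }
  /* otherwise a single read now covers every page of the buffer */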

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 302af9b01bda2a..26d8162bee000d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int locked_pages = 0;
 	int all_uptodate = 1;
 	int num_pages;
-	unsigned long num_reads = 0;
 	struct btrfs_bio_ctrl bio_ctrl = {
 		.opf = REQ_OP_READ,
 		.mirror_num = mirror_num,
@@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	 */
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
-		if (!PageUptodate(page)) {
-			num_reads++;
+		if (!PageUptodate(page))
 			all_uptodate = 0;
-		}
 	}
 
 	if (all_uptodate) {
@@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 
 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_reads);
+	atomic_set(&eb->io_pages, num_pages);
 	/*
 	 * It is possible for release_folio to clear the TREE_REF bit before we
 	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
@@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	for (i = 0; i < num_pages; i++) {
 		page = eb->pages[i];
 
-		if (!PageUptodate(page)) {
-			ClearPageError(page);
-			submit_extent_page(&bio_ctrl, page_offset(page), page,
-					   PAGE_SIZE, 0);
-		} else {
-			unlock_page(page);
-		}
+		ClearPageError(page);
+		submit_extent_page(&bio_ctrl, page_offset(page), page,
+				   PAGE_SIZE, 0);
 	}
 
 	submit_one_bio(&bio_ctrl);
-- 
2.39.2



* [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 04/20] btrfs: always read the entire extent_buffer Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 11:59   ` Johannes Thumshirn
  2023-03-10  7:42   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 06/20] btrfs: remove the mirror_num argument to btrfs_submit_compressed_read Christoph Hellwig
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The btrfs_bio_ctrl machinery is overkill for reading extent_buffers
as we always operate on PAGE_SIZE chunks (or one smaller one for the
subpage case) that are contiguous and are guaranteed to fit into a
single bio.  Replace it with open-coded btrfs_bio_alloc, __bio_add_page
and btrfs_submit_bio calls in a helper function shared between
the subpage and node size >= PAGE_SIZE cases.
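
The shared helper boils down to the following shape (simplified sketch of
the __read_extent_buffer_pages added below, with the subpage special case
and page error clearing elided):

  bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
                         REQ_OP_READ | REQ_META,
                         BTRFS_I(eb->fs_info->btree_inode),
                         end_bio_extent_readpage, NULL);
  bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
  bbio->file_offset = eb->start;
  memcpy(&bbio->parent_check, check, sizeof(*check));
  /* one bio always fits the whole buffer */
  for (i = 0; i < num_extent_pages(eb); i++)
          __bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
  btrfs_submit_bio(bbio, mirror_num);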

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 99 ++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 63 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 26d8162bee000d..5169e73ffea647 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -98,22 +98,12 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
  */
 struct btrfs_bio_ctrl {
 	struct btrfs_bio *bbio;
-	int mirror_num;
 	enum btrfs_compression_type compress_type;
 	u32 len_to_oe_boundary;
 	blk_opf_t opf;
 	btrfs_bio_end_io_t end_io_func;
 	struct writeback_control *wbc;
 
-	/*
-	 * This is for metadata read, to provide the extra needed verification
-	 * info.  This has to be provided for submit_one_bio(), as
-	 * submit_one_bio() can submit a bio if it ends at stripe boundary.  If
-	 * no such parent_check is provided, the metadata can hit false alert at
-	 * endio time.
-	 */
-	struct btrfs_tree_parent_check *parent_check;
-
 	/*
 	 * Tell writepage not to lock the state bits for this range, it still
 	 * does the unlocking.
@@ -124,7 +114,6 @@ struct btrfs_bio_ctrl {
 static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 {
 	struct btrfs_bio *bbio = bio_ctrl->bbio;
-	int mirror_num = bio_ctrl->mirror_num;
 
 	if (!bbio)
 		return;
@@ -132,25 +121,14 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bbio->bio.bi_iter.bi_size);
 
-	if (!is_data_inode(&bbio->inode->vfs_inode)) {
-		if (btrfs_op(&bbio->bio) != BTRFS_MAP_WRITE) {
-			/*
-			 * For metadata read, we should have the parent_check,
-			 * and copy it to bbio for metadata verification.
-			 */
-			ASSERT(bio_ctrl->parent_check);
-			memcpy(&bbio->parent_check,
-			       bio_ctrl->parent_check,
-			       sizeof(struct btrfs_tree_parent_check));
-		}
+	if (!is_data_inode(&bbio->inode->vfs_inode))
 		bbio->bio.bi_opf |= REQ_META;
-	}
 
 	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
 	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
-		btrfs_submit_compressed_read(bbio, mirror_num);
+		btrfs_submit_compressed_read(bbio, 0);
 	else
-		btrfs_submit_bio(bbio, mirror_num);
+		btrfs_submit_bio(bbio, 0);
 
 	/* The bbio is owned by the end_io handler now */
 	bio_ctrl->bbio = NULL;
@@ -4241,6 +4219,36 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
+				       struct btrfs_tree_parent_check *check)
+{
+	int num_pages = num_extent_pages(eb), i;
+	struct btrfs_bio *bbio;
+
+	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	eb->read_mirror = 0;
+	atomic_set(&eb->io_pages, num_pages);
+	check_buffer_tree_ref(eb);
+
+	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
+			       REQ_OP_READ | REQ_META,
+			       BTRFS_I(eb->fs_info->btree_inode),
+			       end_bio_extent_readpage, NULL);
+	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
+	bbio->file_offset = eb->start;
+	memcpy(&bbio->parent_check, check, sizeof(*check));
+	if (eb->fs_info->nodesize < PAGE_SIZE) {
+		__bio_add_page(&bbio->bio, eb->pages[0], eb->len,
+			       eb->start - page_offset(eb->pages[0]));
+	} else {
+		for (i = 0; i < num_pages; i++) {
+			ClearPageError(eb->pages[i]);
+			__bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
+		}
+	}
+	btrfs_submit_bio(bbio, mirror_num);
+}
+
 static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 				      int mirror_num,
 				      struct btrfs_tree_parent_check *check)
@@ -4249,11 +4257,6 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 	struct extent_io_tree *io_tree;
 	struct page *page = eb->pages[0];
 	struct extent_state *cached_state = NULL;
-	struct btrfs_bio_ctrl bio_ctrl = {
-		.opf = REQ_OP_READ,
-		.mirror_num = mirror_num,
-		.parent_check = check,
-	};
 	int ret;
 
 	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
@@ -4281,18 +4284,10 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 		return 0;
 	}
 
-	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, 1);
-	check_buffer_tree_ref(eb);
-	bio_ctrl.end_io_func = end_bio_extent_readpage;
-
 	btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len);
-
 	btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len);
-	submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
-			   eb->start - page_offset(page));
-	submit_one_bio(&bio_ctrl);
+
+	__read_extent_buffer_pages(eb, mirror_num, check);
 	if (wait != WAIT_COMPLETE) {
 		free_extent_state(cached_state);
 		return 0;
@@ -4313,11 +4308,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int locked_pages = 0;
 	int all_uptodate = 1;
 	int num_pages;
-	struct btrfs_bio_ctrl bio_ctrl = {
-		.opf = REQ_OP_READ,
-		.mirror_num = mirror_num,
-		.parent_check = check,
-	};
 
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 		return 0;
@@ -4367,24 +4357,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 		goto unlock_exit;
 	}
 
-	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_pages);
-	/*
-	 * It is possible for release_folio to clear the TREE_REF bit before we
-	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
-	 */
-	check_buffer_tree_ref(eb);
-	bio_ctrl.end_io_func = end_bio_extent_readpage;
-	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
-
-		ClearPageError(page);
-		submit_extent_page(&bio_ctrl, page_offset(page), page,
-				   PAGE_SIZE, 0);
-	}
-
-	submit_one_bio(&bio_ctrl);
+	__read_extent_buffer_pages(eb, mirror_num, check);
 
 	if (wait != WAIT_COMPLETE)
 		return 0;
-- 
2.39.2



* [PATCH 06/20] btrfs: remove the mirror_num argument to btrfs_submit_compressed_read
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 05/20] btrfs: simplify extent buffer reading Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 12:58   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler Christoph Hellwig
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Given that read recovery for data I/O is handled in the storage layer,
the mirror_num argument to btrfs_submit_compressed_read is always 0,
so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/compression.c | 4 ++--
 fs/btrfs/compression.h | 2 +-
 fs/btrfs/extent_io.c   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index c5839d04690d67..cc6bc0c13bbe12 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -499,7 +499,7 @@ static noinline int add_ra_bio_pages(struct inode *inode,
  * After the compressed pages are read, we copy the bytes into the
  * bio we were passed and then call the bio end_io calls
  */
-void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num)
+void btrfs_submit_compressed_read(struct btrfs_bio *bbio)
 {
 	struct btrfs_inode *inode = bbio->inode;
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
@@ -566,7 +566,7 @@ void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num)
 	if (memstall)
 		psi_memstall_leave(&pflags);
 
-	btrfs_submit_bio(&cb->bbio, mirror_num);
+	btrfs_submit_bio(&cb->bbio, 0);
 	return;
 
 out_free_compressed_pages:
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 5d5146e72a860b..8ba8e62b096061 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -94,7 +94,7 @@ void btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 				  blk_opf_t write_flags,
 				  struct cgroup_subsys_state *blkcg_css,
 				  bool writeback);
-void btrfs_submit_compressed_read(struct btrfs_bio *bbio, int mirror_num);
+void btrfs_submit_compressed_read(struct btrfs_bio *bbio);
 
 unsigned int btrfs_compress_str2level(unsigned int type, const char *str);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5169e73ffea647..d60a80572b8ba2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -126,7 +126,7 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 
 	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
 	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
-		btrfs_submit_compressed_read(bbio, 0);
+		btrfs_submit_compressed_read(bbio);
 	else
 		btrfs_submit_bio(bbio, 0);
 
-- 
2.39.2



* [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 06/20] btrfs: remove the mirror_num argument to btrfs_submit_compressed_read Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 13:08   ` Johannes Thumshirn
                     ` (2 more replies)
  2023-03-09  9:05 ` [PATCH 08/20] btrfs: do not try to unlock the extent for non-subpage metadata reads Christoph Hellwig
                   ` (12 subsequent siblings)
  19 siblings, 3 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Now that we always use a single bio to read an extent_buffer, the buffer
can be passed to the end_io handler as private data.  This allows
implementing a much simpler dedicated end I/O handler for metadata
reads.
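
With the extent_buffer passed as the bio's private data the handler no
longer needs any page -> eb lookup, roughly (simplified from the hunk
below, reference counting and per-page state updates elided):

  static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
  {
          struct extent_buffer *eb = bbio->private;
          bool uptodate = !bbio->bio.bi_status;

          if (uptodate &&
              btrfs_validate_extent_buffer(eb, &bbio->parent_check) < 0)
                  uptodate = false;
          if (uptodate)
                  set_extent_buffer_uptodate(eb);
          else
                  clear_extent_buffer_uptodate(eb);
          bio_put(&bbio->bio);
  }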

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c   | 105 +------------------------------------------
 fs/btrfs/disk-io.h   |   5 +--
 fs/btrfs/extent_io.c |  80 +++++++++++++++------------------
 3 files changed, 41 insertions(+), 149 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d03b431b07781c..6795acae476993 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -485,8 +485,8 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
 }
 
 /* Do basic extent buffer checks at read time */
-static int validate_extent_buffer(struct extent_buffer *eb,
-				  struct btrfs_tree_parent_check *check)
+int btrfs_validate_extent_buffer(struct extent_buffer *eb,
+				 struct btrfs_tree_parent_check *check)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	u64 found_start;
@@ -599,107 +599,6 @@ static int validate_extent_buffer(struct extent_buffer *eb,
 	return ret;
 }
 
-static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
-				   int mirror, struct btrfs_tree_parent_check *check)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
-	struct extent_buffer *eb;
-	bool reads_done;
-	int ret = 0;
-
-	ASSERT(check);
-
-	/*
-	 * We don't allow bio merge for subpage metadata read, so we should
-	 * only get one eb for each endio hook.
-	 */
-	ASSERT(end == start + fs_info->nodesize - 1);
-	ASSERT(PagePrivate(page));
-
-	eb = find_extent_buffer(fs_info, start);
-	/*
-	 * When we are reading one tree block, eb must have been inserted into
-	 * the radix tree. If not, something is wrong.
-	 */
-	ASSERT(eb);
-
-	reads_done = atomic_dec_and_test(&eb->io_pages);
-	/* Subpage read must finish in page read */
-	ASSERT(reads_done);
-
-	eb->read_mirror = mirror;
-	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-		ret = -EIO;
-		goto err;
-	}
-	ret = validate_extent_buffer(eb, check);
-	if (ret < 0)
-		goto err;
-
-	set_extent_buffer_uptodate(eb);
-
-	free_extent_buffer(eb);
-	return ret;
-err:
-	/*
-	 * end_bio_extent_readpage decrements io_pages in case of error,
-	 * make sure it has something to decrement.
-	 */
-	atomic_inc(&eb->io_pages);
-	clear_extent_buffer_uptodate(eb);
-	free_extent_buffer(eb);
-	return ret;
-}
-
-int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
-				   struct page *page, u64 start, u64 end,
-				   int mirror)
-{
-	struct extent_buffer *eb;
-	int ret = 0;
-	int reads_done;
-
-	ASSERT(page->private);
-
-	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
-		return validate_subpage_buffer(page, start, end, mirror,
-					       &bbio->parent_check);
-
-	eb = (struct extent_buffer *)page->private;
-
-	/*
-	 * The pending IO might have been the only thing that kept this buffer
-	 * in memory.  Make sure we have a ref for all this other checks
-	 */
-	atomic_inc(&eb->refs);
-
-	reads_done = atomic_dec_and_test(&eb->io_pages);
-	if (!reads_done)
-		goto err;
-
-	eb->read_mirror = mirror;
-	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
-		ret = -EIO;
-		goto err;
-	}
-	ret = validate_extent_buffer(eb, &bbio->parent_check);
-	if (!ret)
-		set_extent_buffer_uptodate(eb);
-err:
-	if (ret) {
-		/*
-		 * our io error hook is going to dec the io pages
-		 * again, we have to make sure it has something
-		 * to decrement
-		 */
-		atomic_inc(&eb->io_pages);
-		clear_extent_buffer_uptodate(eb);
-	}
-	free_extent_buffer(eb);
-
-	return ret;
-}
-
 #ifdef CONFIG_MIGRATION
 static int btree_migrate_folio(struct address_space *mapping,
 		struct folio *dst, struct folio *src, enum migrate_mode mode)
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 4d577233011023..2923b5d7cfca0b 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -84,9 +84,8 @@ void btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info);
 void btrfs_btree_balance_dirty_nodelay(struct btrfs_fs_info *fs_info);
 void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
 				 struct btrfs_root *root);
-int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
-				   struct page *page, u64 start, u64 end,
-				   int mirror);
+int btrfs_validate_extent_buffer(struct extent_buffer *eb,
+				 struct btrfs_tree_parent_check *check);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
 #endif
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d60a80572b8ba2..738fcf5cbc71d6 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -663,35 +663,6 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
 	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
 }
 
-/*
- * Find extent buffer for a givne bytenr.
- *
- * This is for end_bio_extent_readpage(), thus we can't do any unsafe locking
- * in endio context.
- */
-static struct extent_buffer *find_extent_buffer_readpage(
-		struct btrfs_fs_info *fs_info, struct page *page, u64 bytenr)
-{
-	struct extent_buffer *eb;
-
-	/*
-	 * For regular sectorsize, we can use page->private to grab extent
-	 * buffer
-	 */
-	if (fs_info->nodesize >= PAGE_SIZE) {
-		ASSERT(PagePrivate(page) && page->private);
-		return (struct extent_buffer *)page->private;
-	}
-
-	/* For subpage case, we need to lookup buffer radix tree */
-	rcu_read_lock();
-	eb = radix_tree_lookup(&fs_info->buffer_radix,
-			       bytenr >> fs_info->sectorsize_bits);
-	rcu_read_unlock();
-	ASSERT(eb);
-	return eb;
-}
-
 /*
  * after a readpage IO is done, we need to:
  * clear the uptodate bits on error
@@ -713,7 +684,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 	 * larger than UINT_MAX, u32 here is enough.
 	 */
 	u32 bio_offset = 0;
-	int mirror;
 	struct bvec_iter_all iter_all;
 
 	ASSERT(!bio_flagged(bio, BIO_CLONED));
@@ -753,11 +723,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		end = start + bvec->bv_len - 1;
 		len = bvec->bv_len;
 
-		mirror = bbio->mirror_num;
-		if (uptodate && !is_data_inode(inode) &&
-		    btrfs_validate_metadata_buffer(bbio, page, start, end, mirror))
-			uptodate = false;
-
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
@@ -778,13 +743,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 				zero_user_segment(page, zero_start,
 						  offset_in_page(end) + 1);
 			}
-		} else if (!is_data_inode(inode)) {
-			struct extent_buffer *eb;
-
-			eb = find_extent_buffer_readpage(fs_info, page, start);
-			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
-			eb->read_mirror = mirror;
-			atomic_dec(&eb->io_pages);
 		}
 
 		/* Update page status and unlock. */
@@ -4219,6 +4177,42 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 	}
 }
 
+static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
+{
+	struct extent_buffer *eb = bbio->private;
+	bool uptodate = !bbio->bio.bi_status;
+	struct bvec_iter_all iter_all;
+	struct bio_vec *bvec;
+	u32 bio_offset = 0;
+
+	atomic_inc(&eb->refs);
+	eb->read_mirror = bbio->mirror_num;
+
+	if (uptodate &&
+	    btrfs_validate_extent_buffer(eb, &bbio->parent_check) < 0)
+		uptodate = false;
+
+	if (uptodate) {
+		set_extent_buffer_uptodate(eb);
+	} else {
+		clear_extent_buffer_uptodate(eb);
+		set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
+	}
+
+	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
+		atomic_dec(&eb->io_pages);
+		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
+			      bvec->bv_len);
+		bio_offset += bvec->bv_len;
+	}
+
+	unlock_extent(&bbio->inode->io_tree, eb->start,
+		      eb->start + bio_offset - 1, NULL);
+	free_extent_buffer(eb);
+
+	bio_put(&bbio->bio);
+}
+
 static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
 				       struct btrfs_tree_parent_check *check)
 {
@@ -4233,7 +4227,7 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
 	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
 			       REQ_OP_READ | REQ_META,
 			       BTRFS_I(eb->fs_info->btree_inode),
-			       end_bio_extent_readpage, NULL);
+			       extent_buffer_read_end_io, eb);
 	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
 	bbio->file_offset = eb->start;
 	memcpy(&bbio->parent_check, check, sizeof(*check));
-- 
2.39.2



* [PATCH 08/20] btrfs: do not try to unlock the extent for non-subpage metadata reads
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 13:13   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 09/20] btrfs: return bool from lock_extent_buffer_for_io Christoph Hellwig
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Only subpage metadata reads lock the extent.  Don't try to unlock it and
waste cycles in the extent tree lookup for PAGE_SIZE or larger metadata.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 738fcf5cbc71d6..c9121aed06000b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4206,8 +4206,10 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 		bio_offset += bvec->bv_len;
 	}
 
-	unlock_extent(&bbio->inode->io_tree, eb->start,
-		      eb->start + bio_offset - 1, NULL);
+	if (eb->fs_info->nodesize < PAGE_SIZE) {
+		unlock_extent(&bbio->inode->io_tree, eb->start,
+			      eb->start + bio_offset - 1, NULL);
+	}
 	free_extent_buffer(eb);
 
 	bio_put(&bbio->bio);
-- 
2.39.2



* [PATCH 09/20] btrfs: return bool from lock_extent_buffer_for_io
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 08/20] btrfs: do not try to unlock the extent for non-subpage metadata reads Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 13:17   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 10/20] btrfs: submit a writeback bio per extent_buffer Christoph Hellwig
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

lock_extent_buffer_for_io never returns a negative error value, so switch
the return value to a simple bool.  Also remove the noinline_for_stack
annotation given that nothing in lock_extent_buffer_for_io or its callers
is particularly stack hungry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 39 ++++++++++++---------------------------
 1 file changed, 12 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9121aed06000b..1e8670d7fc2e71 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1628,18 +1628,17 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
  *
  * May try to flush write bio if we can't get the lock.
  *
- * Return  0 if the extent buffer doesn't need to be submitted.
- *           (E.g. the extent buffer is not dirty)
- * Return >0 is the extent buffer is submitted to bio.
- * Return <0 if something went wrong, no page is locked.
+ * Return %false if the extent buffer doesn't need to be submitted (e.g. the
+ * extent buffer is not dirty)
+ * Return %true is the extent buffer is submitted to bio.
  */
-static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb,
-			  struct btrfs_bio_ctrl *bio_ctrl)
+static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
+				      struct btrfs_bio_ctrl *bio_ctrl)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	int i, num_pages;
 	int flush = 0;
-	int ret = 0;
+	bool ret = false;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
 		submit_write_bio(bio_ctrl, 0);
@@ -1650,7 +1649,7 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
 		btrfs_tree_unlock(eb);
 		if (bio_ctrl->wbc->sync_mode != WB_SYNC_ALL)
-			return 0;
+			return false;
 		if (!flush) {
 			submit_write_bio(bio_ctrl, 0);
 			flush = 1;
@@ -1677,7 +1676,7 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 		percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
 					 -eb->len,
 					 fs_info->dirty_metadata_batch);
-		ret = 1;
+		ret = true;
 	} else {
 		spin_unlock(&eb->refs_lock);
 	}
@@ -2010,7 +2009,6 @@ static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
 	u64 page_start = page_offset(page);
 	int bit_start = 0;
 	int sectors_per_node = fs_info->nodesize >> fs_info->sectorsize_bits;
-	int ret;
 
 	/* Lock and write each dirty extent buffers in the range */
 	while (bit_start < fs_info->subpage_info->bitmap_nr_bits) {
@@ -2056,25 +2054,13 @@ static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
 		if (!eb)
 			continue;
 
-		ret = lock_extent_buffer_for_io(eb, bio_ctrl);
-		if (ret == 0) {
-			free_extent_buffer(eb);
-			continue;
+		if (lock_extent_buffer_for_io(eb, bio_ctrl)) {
+			write_one_subpage_eb(eb, bio_ctrl);
+			submitted++;
 		}
-		if (ret < 0) {
-			free_extent_buffer(eb);
-			goto cleanup;
-		}
-		write_one_subpage_eb(eb, bio_ctrl);
 		free_extent_buffer(eb);
-		submitted++;
 	}
 	return submitted;
-
-cleanup:
-	/* We hit error, end bio for the submitted extent buffers */
-	submit_write_bio(bio_ctrl, ret);
-	return ret;
 }
 
 /*
@@ -2153,8 +2139,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 
 	*eb_context = eb;
 
-	ret = lock_extent_buffer_for_io(eb, bio_ctrl);
-	if (ret <= 0) {
+	if (!lock_extent_buffer_for_io(eb, bio_ctrl)) {
 		btrfs_revert_meta_write_pointer(cache, eb);
 		if (cache)
 			btrfs_put_block_group(cache);
-- 
2.39.2



* [PATCH 10/20] btrfs: submit a writeback bio per extent_buffer
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 09/20] btrfs: return bool from lock_extent_buffer_for_io Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 13:35   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 11/20] btrfs: move page locking from lock_extent_buffer_for_io to write_one_eb Christoph Hellwig
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Stop trying to cluster writes of multiple extent_buffers into a single
bio.  There is no need for that as the blk_plug mechanism used all the
way up in writeback_inodes_wb gives us the same I/O pattern even with
multiple bios.  Removing the clustering simplifies
lock_extent_buffer_for_io a lot and will also allow passing the eb
as private data to the end I/O handler.
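
The I/O pattern argument relies on the plug that the generic writeback
code holds around the whole pass; conceptually (hypothetical caller
sketch, eb1/eb2 are placeholders and the plugging really happens up in
writeback_inodes_wb, not in btrfs):

  struct blk_plug plug;

  blk_start_plug(&plug);
  write_one_eb(eb1, wbc);     /* each eb now submits its own bio ... */
  write_one_eb(eb2, wbc);
  blk_finish_plug(&plug);     /* ... but they are still dispatched together */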

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 102 ++++++++++++++++---------------------------
 1 file changed, 37 insertions(+), 65 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e8670d7fc2e71..ffcc700ea52196 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1626,41 +1626,24 @@ static void end_extent_buffer_writeback(struct extent_buffer *eb)
 /*
  * Lock extent buffer status and pages for writeback.
  *
- * May try to flush write bio if we can't get the lock.
- *
  * Return %false if the extent buffer doesn't need to be submitted (e.g. the
  * extent buffer is not dirty)
  * Return %true is the extent buffer is submitted to bio.
  */
 static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
-				      struct btrfs_bio_ctrl *bio_ctrl)
+				      struct writeback_control *wbc)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	int i, num_pages;
-	int flush = 0;
 	bool ret = false;
+	int i;
 
-	if (!btrfs_try_tree_write_lock(eb)) {
-		submit_write_bio(bio_ctrl, 0);
-		flush = 1;
-		btrfs_tree_lock(eb);
-	}
-
-	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
+	btrfs_tree_lock(eb);
+	while (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
 		btrfs_tree_unlock(eb);
-		if (bio_ctrl->wbc->sync_mode != WB_SYNC_ALL)
+		if (wbc->sync_mode != WB_SYNC_ALL)
 			return false;
-		if (!flush) {
-			submit_write_bio(bio_ctrl, 0);
-			flush = 1;
-		}
-		while (1) {
-			wait_on_extent_buffer_writeback(eb);
-			btrfs_tree_lock(eb);
-			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
-				break;
-			btrfs_tree_unlock(eb);
-		}
+		wait_on_extent_buffer_writeback(eb);
+		btrfs_tree_lock(eb);
 	}
 
 	/*
@@ -1692,19 +1675,8 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
 	if (!ret || fs_info->nodesize < PAGE_SIZE)
 		return ret;
 
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
-
-		if (!trylock_page(p)) {
-			if (!flush) {
-				submit_write_bio(bio_ctrl, 0);
-				flush = 1;
-			}
-			lock_page(p);
-		}
-	}
-
+	for (i = 0; i < num_extent_pages(eb); i++)
+		lock_page(eb->pages[i]);
 	return ret;
 }
 
@@ -1934,11 +1906,16 @@ static void prepare_eb_write(struct extent_buffer *eb)
  * Page locking is only utilized at minimum to keep the VMM code happy.
  */
 static void write_one_subpage_eb(struct extent_buffer *eb,
-				 struct btrfs_bio_ctrl *bio_ctrl)
+				 struct writeback_control *wbc)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page = eb->pages[0];
 	bool no_dirty_ebs = false;
+	struct btrfs_bio_ctrl bio_ctrl = {
+		.wbc = wbc,
+		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
+		.end_io_func = end_bio_subpage_eb_writepage,
+	};
 
 	prepare_eb_write(eb);
 
@@ -1952,40 +1929,43 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
 	if (no_dirty_ebs)
 		clear_page_dirty_for_io(page);
 
-	bio_ctrl->end_io_func = end_bio_subpage_eb_writepage;
-
-	submit_extent_page(bio_ctrl, eb->start, page, eb->len,
+	submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
 			   eb->start - page_offset(page));
 	unlock_page(page);
+	submit_one_bio(&bio_ctrl);
 	/*
 	 * Submission finished without problem, if no range of the page is
 	 * dirty anymore, we have submitted a page.  Update nr_written in wbc.
 	 */
 	if (no_dirty_ebs)
-		bio_ctrl->wbc->nr_to_write--;
+		wbc->nr_to_write--;
 }
 
 static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
-			struct btrfs_bio_ctrl *bio_ctrl)
+					    struct writeback_control *wbc)
 {
 	u64 disk_bytenr = eb->start;
 	int i, num_pages;
+	struct btrfs_bio_ctrl bio_ctrl = {
+		.wbc = wbc,
+		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
+		.end_io_func = end_bio_extent_buffer_writepage,
+	};
 
 	prepare_eb_write(eb);
 
-	bio_ctrl->end_io_func = end_bio_extent_buffer_writepage;
-
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
-		submit_extent_page(bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
+		submit_extent_page(&bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
 		disk_bytenr += PAGE_SIZE;
-		bio_ctrl->wbc->nr_to_write--;
+		wbc->nr_to_write--;
 		unlock_page(p);
 	}
+	submit_one_bio(&bio_ctrl);
 }
 
 /*
@@ -2002,7 +1982,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
  * Return >=0 for the number of submitted extent buffers.
  * Return <0 for fatal error.
  */
-static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
+static int submit_eb_subpage(struct page *page, struct writeback_control *wbc)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
 	int submitted = 0;
@@ -2054,8 +2034,8 @@ static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
 		if (!eb)
 			continue;
 
-		if (lock_extent_buffer_for_io(eb, bio_ctrl)) {
-			write_one_subpage_eb(eb, bio_ctrl);
+		if (lock_extent_buffer_for_io(eb, wbc)) {
+			write_one_subpage_eb(eb, wbc);
 			submitted++;
 		}
 		free_extent_buffer(eb);
@@ -2083,7 +2063,7 @@ static int submit_eb_subpage(struct page *page, struct btrfs_bio_ctrl *bio_ctrl)
  * previous call.
  * Return <0 for fatal error.
  */
-static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
+static int submit_eb_page(struct page *page, struct writeback_control *wbc,
 			  struct extent_buffer **eb_context)
 {
 	struct address_space *mapping = page->mapping;
@@ -2095,7 +2075,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 		return 0;
 
 	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
-		return submit_eb_subpage(page, bio_ctrl);
+		return submit_eb_subpage(page, wbc);
 
 	spin_lock(&mapping->private_lock);
 	if (!PagePrivate(page)) {
@@ -2128,8 +2108,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 		 * If for_sync, this hole will be filled with
 		 * trasnsaction commit.
 		 */
-		if (bio_ctrl->wbc->sync_mode == WB_SYNC_ALL &&
-		    !bio_ctrl->wbc->for_sync)
+		if (wbc->sync_mode == WB_SYNC_ALL && !wbc->for_sync)
 			ret = -EAGAIN;
 		else
 			ret = 0;
@@ -2139,12 +2118,12 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 
 	*eb_context = eb;
 
-	if (!lock_extent_buffer_for_io(eb, bio_ctrl)) {
+	if (!lock_extent_buffer_for_io(eb, wbc)) {
 		btrfs_revert_meta_write_pointer(cache, eb);
 		if (cache)
 			btrfs_put_block_group(cache);
 		free_extent_buffer(eb);
-		return ret;
+		return 0;
 	}
 	if (cache) {
 		/*
@@ -2153,7 +2132,7 @@ static int submit_eb_page(struct page *page, struct btrfs_bio_ctrl *bio_ctrl,
 		btrfs_schedule_zone_finish_bg(cache, eb);
 		btrfs_put_block_group(cache);
 	}
-	write_one_eb(eb, bio_ctrl);
+	write_one_eb(eb, wbc);
 	free_extent_buffer(eb);
 	return 1;
 }
@@ -2162,11 +2141,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 				   struct writeback_control *wbc)
 {
 	struct extent_buffer *eb_context = NULL;
-	struct btrfs_bio_ctrl bio_ctrl = {
-		.wbc = wbc,
-		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-		.extent_locked = 0,
-	};
 	struct btrfs_fs_info *fs_info = BTRFS_I(mapping->host)->root->fs_info;
 	int ret = 0;
 	int done = 0;
@@ -2208,7 +2182,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 		for (i = 0; i < nr_folios; i++) {
 			struct folio *folio = fbatch.folios[i];
 
-			ret = submit_eb_page(&folio->page, &bio_ctrl, &eb_context);
+			ret = submit_eb_page(&folio->page, wbc, &eb_context);
 			if (ret == 0)
 				continue;
 			if (ret < 0) {
@@ -2269,8 +2243,6 @@ int btree_write_cache_pages(struct address_space *mapping,
 		ret = 0;
 	if (!ret && BTRFS_FS_ERROR(fs_info))
 		ret = -EROFS;
-	submit_write_bio(&bio_ctrl, ret);
-
 	btrfs_zoned_meta_io_unlock(fs_info);
 	return ret;
 }
-- 
2.39.2



* [PATCH 11/20] btrfs: move page locking from lock_extent_buffer_for_io to write_one_eb
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 10/20] btrfs: submit a writeback bio per extent_buffer Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 13:46   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 12/20] btrfs: simplify extent buffer writing Christoph Hellwig
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Locking the pages in lock_extent_buffer_for_io only for the non-subpage
case is very confusing.  Move it to write_one_eb to mirror the subpage
case and simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index ffcc700ea52196..1fc50d1402cef8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1635,7 +1635,6 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	bool ret = false;
-	int i;
 
 	btrfs_tree_lock(eb);
 	while (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
@@ -1663,20 +1662,7 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
 	} else {
 		spin_unlock(&eb->refs_lock);
 	}
-
 	btrfs_tree_unlock(eb);
-
-	/*
-	 * Either we don't need to submit any tree block, or we're submitting
-	 * subpage eb.
-	 * Subpage metadata doesn't use page locking at all, so we can skip
-	 * the page locking.
-	 */
-	if (!ret || fs_info->nodesize < PAGE_SIZE)
-		return ret;
-
-	for (i = 0; i < num_extent_pages(eb); i++)
-		lock_page(eb->pages[i]);
 	return ret;
 }
 
@@ -1958,6 +1944,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
 
+		lock_page(p);
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
 		submit_extent_page(&bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
-- 
2.39.2



* [PATCH 12/20] btrfs: simplify extent buffer writing
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 11/20] btrfs: move page locking from lock_extent_buffer_for_io to write_one_eb Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 14:00   ` Johannes Thumshirn
  2023-03-10  8:34   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler Christoph Hellwig
                   ` (7 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The btrfs_bio_ctrl machinery is overkill for writing extent_buffers
as we always operate on PAGE_SIZE chunks (or one smaller one for the
subpage case) that are contiguous and are guaranteed to fit into a
single bio.  Replace it with open-coded btrfs_bio_alloc, __bio_add_page
and btrfs_submit_bio calls.
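
The write path then mirrors the read side: one btrfs_bio per buffer,
filled page by page (condensed sketch of the new write_one_eb below, page
locking and writeback flagging elided):

  bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
                         REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
                         BTRFS_I(eb->fs_info->btree_inode),
                         end_bio_extent_buffer_writepage, NULL);
  bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
  bbio->file_offset = eb->start;
  for (i = 0; i < num_extent_pages(eb); i++)
          __bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
  btrfs_submit_bio(bbio, 0);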

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1fc50d1402cef8..1d7f48190ee9b9 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -121,9 +121,6 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bbio->bio.bi_iter.bi_size);
 
-	if (!is_data_inode(&bbio->inode->vfs_inode))
-		bbio->bio.bi_opf |= REQ_META;
-
 	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
 	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
 		btrfs_submit_compressed_read(bbio);
@@ -1897,11 +1894,7 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct page *page = eb->pages[0];
 	bool no_dirty_ebs = false;
-	struct btrfs_bio_ctrl bio_ctrl = {
-		.wbc = wbc,
-		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-		.end_io_func = end_bio_subpage_eb_writepage,
-	};
+	struct btrfs_bio *bbio;
 
 	prepare_eb_write(eb);
 
@@ -1915,10 +1908,16 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
 	if (no_dirty_ebs)
 		clear_page_dirty_for_io(page);
 
-	submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
-			   eb->start - page_offset(page));
+	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
+			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
+			       BTRFS_I(eb->fs_info->btree_inode),
+			       end_bio_subpage_eb_writepage, NULL);
+	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
+	bbio->file_offset = eb->start;
+	__bio_add_page(&bbio->bio, page, eb->len, eb->start - page_offset(page));
 	unlock_page(page);
-	submit_one_bio(&bio_ctrl);
+	btrfs_submit_bio(bbio, 0);
+
 	/*
 	 * Submission finished without problem, if no range of the page is
 	 * dirty anymore, we have submitted a page.  Update nr_written in wbc.
@@ -1930,16 +1929,18 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
 static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 					    struct writeback_control *wbc)
 {
-	u64 disk_bytenr = eb->start;
+	struct btrfs_bio *bbio;
 	int i, num_pages;
-	struct btrfs_bio_ctrl bio_ctrl = {
-		.wbc = wbc,
-		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
-		.end_io_func = end_bio_extent_buffer_writepage,
-	};
 
 	prepare_eb_write(eb);
 
+	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
+			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
+			       BTRFS_I(eb->fs_info->btree_inode),
+			       end_bio_extent_buffer_writepage, NULL);
+	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
+	bbio->file_offset = eb->start;
+
 	num_pages = num_extent_pages(eb);
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
@@ -1947,12 +1948,11 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 		lock_page(p);
 		clear_page_dirty_for_io(p);
 		set_page_writeback(p);
-		submit_extent_page(&bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
-		disk_bytenr += PAGE_SIZE;
+		__bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
 		wbc->nr_to_write--;
 		unlock_page(p);
 	}
-	submit_one_bio(&bio_ctrl);
+	btrfs_submit_bio(bbio, 0);
 }
 
 /*
-- 
2.39.2



* [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 12/20] btrfs: simplify extent buffer writing Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 14:10   ` Johannes Thumshirn
  2023-03-10  8:44   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 14/20] btrfs: simplify btree block checksumming Christoph Hellwig
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Now that we always use a single bio to write an extent_buffer, the buffer
can be passed to the end_io handler as private data.  This allows
simplifying the metadata write end I/O handler and merging the subpage
version into the main one with just a single conditional.
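
The single remaining difference between the subpage and regular cases is
how per-page writeback is cleared, roughly (taken from the merged handler
below):

  bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
          struct page *page = bvec->bv_page;

          if (fs_info->nodesize < PAGE_SIZE)
                  btrfs_subpage_clear_writeback(fs_info, page,
                                                eb->start, eb->len);
          else
                  end_page_writeback(page);
  }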

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 112 +++++++++----------------------------------
 1 file changed, 23 insertions(+), 89 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1d7f48190ee9b9..16522bcf5d4a10 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1663,13 +1663,11 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
 	return ret;
 }
 
-static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
+static void set_btree_ioerr(struct extent_buffer *eb)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
 
-	btrfs_page_set_error(fs_info, page, eb->start, eb->len);
-	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
-		return;
+	set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
 
 	/*
 	 * A read may stumble upon this buffer later, make sure that it gets an
@@ -1683,7 +1681,7 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
 	 * return a 0 because we are readonly if we don't modify the err seq for
 	 * the superblock.
 	 */
-	mapping_set_error(page->mapping, -EIO);
+	mapping_set_error(eb->fs_info->btree_inode->i_mapping, -EIO);
 
 	/*
 	 * If writeback for a btree extent that doesn't belong to a log tree
@@ -1758,101 +1756,37 @@ static struct extent_buffer *find_extent_buffer_nolock(
 	return NULL;
 }
 
-/*
- * The endio function for subpage extent buffer write.
- *
- * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
- * after all extent buffers in the page has finished their writeback.
- */
-static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
+static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
 {
-	struct bio *bio = &bbio->bio;
-	struct btrfs_fs_info *fs_info;
-	struct bio_vec *bvec;
+	struct extent_buffer *eb = bbio->private;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	bool uptodate = !bbio->bio.bi_status;
 	struct bvec_iter_all iter_all;
-
-	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
-	ASSERT(fs_info->nodesize < PAGE_SIZE);
-
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
-		struct page *page = bvec->bv_page;
-		u64 bvec_start = page_offset(page) + bvec->bv_offset;
-		u64 bvec_end = bvec_start + bvec->bv_len - 1;
-		u64 cur_bytenr = bvec_start;
-
-		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
-
-		/* Iterate through all extent buffers in the range */
-		while (cur_bytenr <= bvec_end) {
-			struct extent_buffer *eb;
-			int done;
-
-			/*
-			 * Here we can't use find_extent_buffer(), as it may
-			 * try to lock eb->refs_lock, which is not safe in endio
-			 * context.
-			 */
-			eb = find_extent_buffer_nolock(fs_info, cur_bytenr);
-			ASSERT(eb);
-
-			cur_bytenr = eb->start + eb->len;
-
-			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
-			done = atomic_dec_and_test(&eb->io_pages);
-			ASSERT(done);
-
-			if (bio->bi_status ||
-			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
-				ClearPageUptodate(page);
-				set_btree_ioerr(page, eb);
-			}
-
-			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
-						      eb->len);
-			end_extent_buffer_writeback(eb);
-			/*
-			 * free_extent_buffer() will grab spinlock which is not
-			 * safe in endio context. Thus here we manually dec
-			 * the ref.
-			 */
-			atomic_dec(&eb->refs);
-		}
-	}
-	bio_put(bio);
-}
-
-static void end_bio_extent_buffer_writepage(struct btrfs_bio *bbio)
-{
-	struct bio *bio = &bbio->bio;
 	struct bio_vec *bvec;
-	struct extent_buffer *eb;
-	int done;
-	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
-	bio_for_each_segment_all(bvec, bio, iter_all) {
+	if (!uptodate)
+		set_btree_ioerr(eb);
+
+	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
 		struct page *page = bvec->bv_page;
 
-		eb = (struct extent_buffer *)page->private;
-		BUG_ON(!eb);
-		done = atomic_dec_and_test(&eb->io_pages);
+		atomic_dec(&eb->io_pages);
 
-		if (bio->bi_status ||
-		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
+		if (!uptodate) {
 			ClearPageUptodate(page);
-			set_btree_ioerr(page, eb);
+			btrfs_page_set_error(fs_info, page, eb->start, eb->len);
 		}
 
-		end_page_writeback(page);
-
-		if (!done)
-			continue;
+		if (fs_info->nodesize < PAGE_SIZE)
+			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
+						      eb->len);
+		else
+			end_page_writeback(page);
 
-		end_extent_buffer_writeback(eb);
 	}
+	end_extent_buffer_writeback(eb);
 
-	bio_put(bio);
+	bio_put(&bbio->bio);
 }
 
 static void prepare_eb_write(struct extent_buffer *eb)
@@ -1911,7 +1845,7 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
 	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
 			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
 			       BTRFS_I(eb->fs_info->btree_inode),
-			       end_bio_subpage_eb_writepage, NULL);
+			       extent_buffer_write_end_io, eb);
 	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
 	bbio->file_offset = eb->start;
 	__bio_add_page(&bbio->bio, page, eb->len, eb->start - page_offset(page));
@@ -1937,7 +1871,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
 			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
 			       BTRFS_I(eb->fs_info->btree_inode),
-			       end_bio_extent_buffer_writepage, NULL);
+			       extent_buffer_write_end_io, eb);
 	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
 	bbio->file_offset = eb->start;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 14/20] btrfs: simplify btree block checksumming
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (12 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 15:51   ` Johannes Thumshirn
  2023-03-10  8:57   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer Christoph Hellwig
                   ` (5 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The checksumming of btree blocks always operates on the entire
extent_buffer, and because btree blocks are always allocated contiguously
on disk they are never split by btrfs_submit_bio.

Simplify the checksumming code by finding the extent_buffer in the
btrfs_bio private data instead of trying to search through the bio_vec.
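
Condensed from the hunk below (sketch only), the checksumming entry point
now starts from the eb attached to the bbio and merely sanity checks that
the bio covers exactly that one buffer:

  blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
  {
          struct extent_buffer *eb = bbio->private;

          /* btree blocks are always contiguous on disk */
          if (WARN_ON_ONCE(bbio->file_offset != eb->start))
                  return BLK_STS_IOERR;
          if (WARN_ON_ONCE(bbio->bio.bi_iter.bi_size != eb->len))
                  return BLK_STS_IOERR;
          ...
  }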

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 121 ++++++++++-----------------------------------
 1 file changed, 25 insertions(+), 96 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6795acae476993..e007e15e1455e1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -312,12 +312,35 @@ int btrfs_read_extent_buffer(struct extent_buffer *eb,
 	return ret;
 }
 
-static int csum_one_extent_buffer(struct extent_buffer *eb)
+/*
+ * Checksum a dirty tree block before IO.
+ */
+blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
 {
+	struct extent_buffer *eb = bbio->private;
 	struct btrfs_fs_info *fs_info = eb->fs_info;
+	u64 found_start = btrfs_header_bytenr(eb);
 	u8 result[BTRFS_CSUM_SIZE];
 	int ret;
 
+	/*
+	 * Btree blocks are always contiguous on disk.
+	 */
+	if (WARN_ON_ONCE(bbio->file_offset != eb->start))
+		return BLK_STS_IOERR;
+	if (WARN_ON_ONCE(bbio->bio.bi_iter.bi_size != eb->len))
+		return BLK_STS_IOERR;
+
+	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
+		WARN_ON_ONCE(found_start != 0);
+		return BLK_STS_OK;
+	}
+
+	if (WARN_ON_ONCE(found_start != eb->start))
+		return BLK_STS_IOERR;
+	if (WARN_ON_ONCE(!PageUptodate(eb->pages[0])))
+		return BLK_STS_IOERR;
+
 	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
 				    offsetof(struct btrfs_header, fsid),
 				    BTRFS_FSID_SIZE) == 0);
@@ -344,8 +367,7 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
 		goto error;
 	}
 	write_extent_buffer(eb, result, 0, fs_info->csum_size);
-
-	return 0;
+	return BLK_STS_OK;
 
 error:
 	btrfs_print_tree(eb, 0);
@@ -359,99 +381,6 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
 	 */
 	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG) ||
 		btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID);
-	return ret;
-}
-
-/* Checksum all dirty extent buffers in one bio_vec */
-static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
-				      struct bio_vec *bvec)
-{
-	struct page *page = bvec->bv_page;
-	u64 bvec_start = page_offset(page) + bvec->bv_offset;
-	u64 cur;
-	int ret = 0;
-
-	for (cur = bvec_start; cur < bvec_start + bvec->bv_len;
-	     cur += fs_info->nodesize) {
-		struct extent_buffer *eb;
-		bool uptodate;
-
-		eb = find_extent_buffer(fs_info, cur);
-		uptodate = btrfs_subpage_test_uptodate(fs_info, page, cur,
-						       fs_info->nodesize);
-
-		/* A dirty eb shouldn't disappear from buffer_radix */
-		if (WARN_ON(!eb))
-			return -EUCLEAN;
-
-		if (WARN_ON(cur != btrfs_header_bytenr(eb))) {
-			free_extent_buffer(eb);
-			return -EUCLEAN;
-		}
-		if (WARN_ON(!uptodate)) {
-			free_extent_buffer(eb);
-			return -EUCLEAN;
-		}
-
-		ret = csum_one_extent_buffer(eb);
-		free_extent_buffer(eb);
-		if (ret < 0)
-			return ret;
-	}
-	return ret;
-}
-
-/*
- * Checksum a dirty tree block before IO.  This has extra checks to make sure
- * we only fill in the checksum field in the first page of a multi-page block.
- * For subpage extent buffers we need bvec to also read the offset in the page.
- */
-static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
-{
-	struct page *page = bvec->bv_page;
-	u64 start = page_offset(page);
-	u64 found_start;
-	struct extent_buffer *eb;
-
-	if (fs_info->nodesize < PAGE_SIZE)
-		return csum_dirty_subpage_buffers(fs_info, bvec);
-
-	eb = (struct extent_buffer *)page->private;
-	if (page != eb->pages[0])
-		return 0;
-
-	found_start = btrfs_header_bytenr(eb);
-
-	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
-		WARN_ON(found_start != 0);
-		return 0;
-	}
-
-	/*
-	 * Please do not consolidate these warnings into a single if.
-	 * It is useful to know what went wrong.
-	 */
-	if (WARN_ON(found_start != start))
-		return -EUCLEAN;
-	if (WARN_ON(!PageUptodate(page)))
-		return -EUCLEAN;
-
-	return csum_one_extent_buffer(eb);
-}
-
-blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
-{
-	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
-	struct bvec_iter iter;
-	struct bio_vec bv;
-	int ret = 0;
-
-	bio_for_each_segment(bv, &bbio->bio, iter) {
-		ret = csum_dirty_buffer(fs_info, &bv);
-		if (ret)
-			break;
-	}
-
 	return errno_to_blk_status(ret);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (13 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 14/20] btrfs: simplify btree block checksumming Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 16:01   ` Johannes Thumshirn
  2023-03-10  8:53   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 16/20] btrfs: stop using PageError for extent_buffers Christoph Hellwig
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

No need to track the number of pages under I/O now that each
extent_buffer is read and written using a single bio.  For the
read side we need to grab an extra reference for the duration of
the I/O to prevent eviction, though.
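
The resulting reference handling on the read side, condensed from the
hunks below (sketch only):

  /* submission: pin the eb for the duration of the read I/O */
  check_buffer_tree_ref(eb);
  atomic_inc(&eb->refs);
  ...
  btrfs_submit_bio(bbio, mirror_num);

  /* completion (extent_buffer_read_end_io): drop the extra reference */
  free_extent_buffer(eb);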

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 17 +++++------------
 fs/btrfs/extent_io.h |  1 -
 2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 16522bcf5d4a10..24df1247d81d88 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1770,8 +1770,6 @@ static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
 	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
 		struct page *page = bvec->bv_page;
 
-		atomic_dec(&eb->io_pages);
-
 		if (!uptodate) {
 			ClearPageUptodate(page);
 			btrfs_page_set_error(fs_info, page, eb->start, eb->len);
@@ -1796,7 +1794,6 @@ static void prepare_eb_write(struct extent_buffer *eb)
 	unsigned long end;
 
 	clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
-	atomic_set(&eb->io_pages, num_extent_pages(eb));
 
 	/* Set btree blocks beyond nritems with 0 to avoid stale content */
 	nritems = btrfs_header_nritems(eb);
@@ -3236,8 +3233,7 @@ static void __free_extent_buffer(struct extent_buffer *eb)
 
 static int extent_buffer_under_io(const struct extent_buffer *eb)
 {
-	return (atomic_read(&eb->io_pages) ||
-		test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
+	return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
 		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
 }
 
@@ -3374,7 +3370,6 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
 
 	spin_lock_init(&eb->refs_lock);
 	atomic_set(&eb->refs, 1);
-	atomic_set(&eb->io_pages, 0);
 
 	ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
 
@@ -3491,9 +3486,9 @@ static void check_buffer_tree_ref(struct extent_buffer *eb)
 	 * adequately protected by the refcount, but the TREE_REF bit and
 	 * its corresponding reference are not. To protect against this
 	 * class of races, we call check_buffer_tree_ref from the codepaths
-	 * which trigger io after they set eb->io_pages. Note that once io is
-	 * initiated, TREE_REF can no longer be cleared, so that is the
-	 * moment at which any such race is best fixed.
+	 * which trigger io. Note that once io is initiated, TREE_REF can no
+	 * longer be cleared, so that is the moment at which any such race is
+	 * best fixed.
 	 */
 	refs = atomic_read(&eb->refs);
 	if (refs >= 2 && test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags))
@@ -4063,7 +4058,6 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 	struct bio_vec *bvec;
 	u32 bio_offset = 0;
 
-	atomic_inc(&eb->refs);
 	eb->read_mirror = bbio->mirror_num;
 
 	if (uptodate &&
@@ -4078,7 +4072,6 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 	}
 
 	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
-		atomic_dec(&eb->io_pages);
 		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
 			      bvec->bv_len);
 		bio_offset += bvec->bv_len;
@@ -4101,8 +4094,8 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
 
 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
-	atomic_set(&eb->io_pages, num_pages);
 	check_buffer_tree_ref(eb);
+	atomic_inc(&eb->refs);
 
 	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
 			       REQ_OP_READ | REQ_META,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 342412d37a7b4b..12854a2b48f060 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -79,7 +79,6 @@ struct extent_buffer {
 	struct btrfs_fs_info *fs_info;
 	spinlock_t refs_lock;
 	atomic_t refs;
-	atomic_t io_pages;
 	int read_mirror;
 	struct rcu_head rcu_head;
 	pid_t lock_owner;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 16/20] btrfs: stop using PageError for extent_buffers
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (14 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 16:05   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages Christoph Hellwig
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

PageError is only used to limit the uptodate check in
assert_eb_page_uptodate.  But we have a much more useful flag indicating
the exact condition we care about with the EXTENT_BUFFER_WRITE_ERR flag,
so use that instead and help the kernel toward eventually removing
PageError.
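
The check in assert_eb_page_uptodate then boils down to (condensed from
the hunk below):

  if (test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
          return;

  if (fs_info->nodesize < PAGE_SIZE) {
          WARN_ON(!btrfs_subpage_test_uptodate(fs_info, page,
                                               eb->start, eb->len));
  } else {
          WARN_ON(!PageUptodate(page));
  }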

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 24df1247d81d88..88a941c64fc73f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1770,10 +1770,8 @@ static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
 	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
 		struct page *page = bvec->bv_page;
 
-		if (!uptodate) {
+		if (!uptodate)
 			ClearPageUptodate(page);
-			btrfs_page_set_error(fs_info, page, eb->start, eb->len);
-		}
 
 		if (fs_info->nodesize < PAGE_SIZE)
 			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
@@ -4108,10 +4106,8 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
 		__bio_add_page(&bbio->bio, eb->pages[0], eb->len,
 			       eb->start - page_offset(eb->pages[0]));
 	} else {
-		for (i = 0; i < num_pages; i++) {
-			ClearPageError(eb->pages[i]);
+		for (i = 0; i < num_pages; i++)
 			__bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
-		}
 	}
 	btrfs_submit_bio(bbio, mirror_num);
 }
@@ -4151,7 +4147,6 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 		return 0;
 	}
 
-	btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len);
 	btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len);
 
 	__read_extent_buffer_pages(eb, mirror_num, check);
@@ -4393,18 +4388,16 @@ static void assert_eb_page_uptodate(const struct extent_buffer *eb,
 	 * looked up.  We don't want to complain in this case, as the page was
 	 * valid before, we just didn't write it out.  Instead we want to catch
 	 * the case where we didn't actually read the block properly, which
-	 * would have !PageUptodate && !PageError, as we clear PageError before
-	 * reading.
+	 * would have !PageUptodate && !EXTENT_BUFFER_WRITE_ERR.
 	 */
-	if (fs_info->nodesize < PAGE_SIZE) {
-		bool uptodate, error;
+	if (test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
+		return;
 
-		uptodate = btrfs_subpage_test_uptodate(fs_info, page,
-						       eb->start, eb->len);
-		error = btrfs_subpage_test_error(fs_info, page, eb->start, eb->len);
-		WARN_ON(!uptodate && !error);
+	if (fs_info->nodesize < PAGE_SIZE) {
+		WARN_ON(!btrfs_subpage_test_uptodate(fs_info, page,
+						     eb->start, eb->len));
 	} else {
-		WARN_ON(!PageUptodate(page) && !PageError(page));
+		WARN_ON(!PageUptodate(page));
 	}
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (15 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 16/20] btrfs: stop using PageError for extent_buffers Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 16:10   ` Johannes Thumshirn
  2023-03-10  9:08   ` Qu Wenruo
  2023-03-09  9:05 ` [PATCH 18/20] btrfs: stop using lock_extent in btrfs_buffer_uptodate Christoph Hellwig
                   ` (2 subsequent siblings)
  19 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The only place that reads in pages and thus marks them uptodate for
the btree inode is read_extent_buffer_pages.  Which means that either
pages are already uptodate from an old buffer when creating a new
one in alloc_extent_buffer, or they will be updated by a call
to read_extent_buffer_pages.  This means the checks for uptodate
pages in read_extent_buffer_pages and read_extent_buffer_subpage are
superflous and can be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 21 +--------------------
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 88a941c64fc73f..73ea618282d466 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4138,10 +4138,7 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 			return ret;
 	}
 
-	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
-	    PageUptodate(page) ||
-	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
-		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) {
 		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
 			      &cached_state);
 		return 0;
@@ -4168,7 +4165,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 	int i;
 	struct page *page;
 	int locked_pages = 0;
-	int all_uptodate = 1;
 	int num_pages;
 
 	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
@@ -4203,21 +4199,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
 		}
 		locked_pages++;
 	}
-	/*
-	 * We need to firstly lock all pages to make sure that
-	 * the uptodate bit of our pages won't be affected by
-	 * clear_extent_buffer_uptodate().
-	 */
-	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
-		if (!PageUptodate(page))
-			all_uptodate = 0;
-	}
-
-	if (all_uptodate) {
-		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
-		goto unlock_exit;
-	}
 
 	__read_extent_buffer_pages(eb, mirror_num, check);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 18/20] btrfs: stop using lock_extent in btrfs_buffer_uptodate
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (16 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 19/20] btrfs: use per-buffer locking for extent_buffer reading Christoph Hellwig
  2023-03-09  9:05 ` [PATCH 20/20] btrfs: merge write_one_subpage_eb into write_one_eb Christoph Hellwig
  19 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

The only other place that locks extents on the btree inode is
read_extent_buffer_subpage while reading in the partial page for a
buffer.  This means locking the extent in btrfs_buffer_uptodate does not
synchronize with anything on non-subpage file systems, and on subpage
file systems it only waits for a parallel read(-ahead) to finish,
which seems to be counter to what the callers actually expect.
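
For reference, the net result is a plain lock-free check (reconstructed
from the diff below, error message shortened):

  int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
                            int atomic)
  {
          if (!extent_buffer_uptodate(eb))
                  return 0;

          if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
                  return 1;

          if (atomic)
                  return -EAGAIN;

          if (!extent_buffer_uptodate(eb) ||
              btrfs_header_generation(eb) != parent_transid) {
                  btrfs_err_rl(eb->fs_info, "parent transid verify failed ...");
                  clear_extent_buffer_uptodate(eb);
                  return 0;
          }
          return 1;
  }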

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/disk-io.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e007e15e1455e1..2f62852348cc6d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -113,11 +113,6 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
 int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
 			  int atomic)
 {
-	struct inode *btree_inode = eb->pages[0]->mapping->host;
-	struct extent_io_tree *io_tree = &BTRFS_I(btree_inode)->io_tree;
-	struct extent_state *cached_state = NULL;
-	int ret = 1;
-
 	if (!extent_buffer_uptodate(eb))
 		return 0;
 
@@ -127,7 +122,6 @@ int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
 	if (atomic)
 		return -EAGAIN;
 
-	lock_extent(io_tree, eb->start, eb->start + eb->len - 1, &cached_state);
 	if (!extent_buffer_uptodate(eb) ||
 	    btrfs_header_generation(eb) != parent_transid) {
 		btrfs_err_rl(eb->fs_info,
@@ -135,11 +129,9 @@ int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
 			eb->start, eb->read_mirror,
 			parent_transid, btrfs_header_generation(eb));
 		clear_extent_buffer_uptodate(eb);
-		ret = 0;
+		return 0;
 	}
-	unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-		      &cached_state);
-	return ret;
+	return 1;
 }
 
 static bool btrfs_supported_super_csum(u16 csum_type)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 19/20] btrfs: use per-buffer locking for extent_buffer reading
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (17 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 18/20] btrfs: stop using lock_extent in btrfs_buffer_uptodate Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 17:12   ` Johannes Thumshirn
  2023-03-09  9:05 ` [PATCH 20/20] btrfs: merge write_one_subpage_eb into write_one_eb Christoph Hellwig
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Instead of locking and unlocking every page or the extent, just add a
new EXTENT_BUFFER_READING bit that mirrors EXTENT_BUFFER_WRITEBACK
for synchronizing threads trying to read an extent_buffer and to wait
for I/O completion.
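
The synchronization scheme, condensed from the diff below (sketch only):
readers race on the new bit, only the winner submits the I/O, and anyone
who needs the data waits on the same bit:

  /* read_extent_buffer_pages(): only one thread issues the read */
  if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags))
          goto done;
  ...
  btrfs_submit_bio(bbio, mirror_num);
  done:
  if (wait == WAIT_COMPLETE) {
          wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
                         TASK_UNINTERRUPTIBLE);
          if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
                  return -EIO;
  }

  /* extent_buffer_read_end_io(): let waiters know the I/O has finished */
  clear_bit(EXTENT_BUFFER_READING, &eb->bflags);
  smp_mb__after_atomic();
  wake_up_bit(&eb->bflags, EXTENT_BUFFER_READING);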

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 151 +++++++++++--------------------------------
 fs/btrfs/extent_io.h |   1 +
 2 files changed, 38 insertions(+), 114 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 73ea618282d466..e0e0d1bdb43f1a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4051,6 +4051,7 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
 static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 {
 	struct extent_buffer *eb = bbio->private;
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	bool uptodate = !bbio->bio.bi_status;
 	struct bvec_iter_all iter_all;
 	struct bio_vec *bvec;
@@ -4070,26 +4071,49 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
 	}
 
 	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
-		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
-			      bvec->bv_len);
+		u64 start = eb->start + bio_offset;
+		struct page *page = bvec->bv_page;
+		u32 len = bvec->bv_len;
+
+		if (uptodate)
+			btrfs_page_set_uptodate(fs_info, page, start, len);
+		else
+			btrfs_page_clear_uptodate(fs_info, page, start, len);
+
 		bio_offset += bvec->bv_len;
 	}
 
-	if (eb->fs_info->nodesize < PAGE_SIZE) {
-		unlock_extent(&bbio->inode->io_tree, eb->start,
-			      eb->start + bio_offset - 1, NULL);
-	}
+	clear_bit(EXTENT_BUFFER_READING, &eb->bflags);
+	smp_mb__after_atomic();
+	wake_up_bit(&eb->bflags, EXTENT_BUFFER_READING);
 	free_extent_buffer(eb);
 
 	bio_put(&bbio->bio);
 }
 
-static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
-				       struct btrfs_tree_parent_check *check)
+int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
+			     struct btrfs_tree_parent_check *check)
 {
 	int num_pages = num_extent_pages(eb), i;
 	struct btrfs_bio *bbio;
 
+	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
+		return 0;
+
+	/*
+	 * We could have had EXTENT_BUFFER_UPTODATE cleared by the write
+	 * operation, which could potentially still be in flight.  In this case
+	 * we simply want to return an error.
+	 */
+	if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
+		return -EIO;
+
+	/*
+	 * Someone else is already reading the buffer, just wait for it.
+	 */
+	if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags))
+		goto done;
+
 	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
 	eb->read_mirror = 0;
 	check_buffer_tree_ref(eb);
@@ -4110,117 +4134,16 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
 			__bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
 	}
 	btrfs_submit_bio(bbio, mirror_num);
-}
-
-static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
-				      int mirror_num,
-				      struct btrfs_tree_parent_check *check)
-{
-	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct extent_io_tree *io_tree;
-	struct page *page = eb->pages[0];
-	struct extent_state *cached_state = NULL;
-	int ret;
 
-	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
-	ASSERT(PagePrivate(page));
-	ASSERT(check);
-	io_tree = &BTRFS_I(fs_info->btree_inode)->io_tree;
-
-	if (wait == WAIT_NONE) {
-		if (!try_lock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-				     &cached_state))
-			return -EAGAIN;
-	} else {
-		ret = lock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-				  &cached_state);
-		if (ret < 0)
-			return ret;
-	}
-
-	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) {
-		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
-			      &cached_state);
-		return 0;
-	}
-
-	btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len);
-
-	__read_extent_buffer_pages(eb, mirror_num, check);
-	if (wait != WAIT_COMPLETE) {
-		free_extent_state(cached_state);
-		return 0;
-	}
-
-	wait_extent_bit(io_tree, eb->start, eb->start + eb->len - 1,
-			EXTENT_LOCKED, &cached_state);
-	if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
-		return -EIO;
-	return 0;
-}
-
-int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
-			     struct btrfs_tree_parent_check *check)
-{
-	int i;
-	struct page *page;
-	int locked_pages = 0;
-	int num_pages;
-
-	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
-		return 0;
-
-	/*
-	 * We could have had EXTENT_BUFFER_UPTODATE cleared by the write
-	 * operation, which could potentially still be in flight.  In this case
-	 * we simply want to return an error.
-	 */
-	if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)))
-		return -EIO;
-
-	if (eb->fs_info->nodesize < PAGE_SIZE)
-		return read_extent_buffer_subpage(eb, wait, mirror_num, check);
-
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
-		if (wait == WAIT_NONE) {
-			/*
-			 * WAIT_NONE is only utilized by readahead. If we can't
-			 * acquire the lock atomically it means either the eb
-			 * is being read out or under modification.
-			 * Either way the eb will be or has been cached,
-			 * readahead can exit safely.
-			 */
-			if (!trylock_page(page))
-				goto unlock_exit;
-		} else {
-			lock_page(page);
-		}
-		locked_pages++;
-	}
-
-	__read_extent_buffer_pages(eb, mirror_num, check);
-
-	if (wait != WAIT_COMPLETE)
-		return 0;
-
-	for (i = 0; i < num_pages; i++) {
-		page = eb->pages[i];
-		wait_on_page_locked(page);
-		if (!PageUptodate(page))
+done:
+	if (wait == WAIT_COMPLETE) {
+		wait_on_bit_io(&eb->bflags, EXTENT_BUFFER_READING,
+			       TASK_UNINTERRUPTIBLE);
+		if (!test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
 			return -EIO;
 	}
 
 	return 0;
-
-unlock_exit:
-	while (locked_pages > 0) {
-		locked_pages--;
-		page = eb->pages[locked_pages];
-		unlock_page(page);
-	}
-	return 0;
 }
 
 static bool report_eb_range(const struct extent_buffer *eb, unsigned long start,
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 12854a2b48f060..44f63ab18b1888 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -29,6 +29,7 @@ enum {
 	/* write IO error */
 	EXTENT_BUFFER_WRITE_ERR,
 	EXTENT_BUFFER_NO_CHECK,
+	EXTENT_BUFFER_READING,
 };
 
 /* these are flags for __process_pages_contig */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 20/20] btrfs: merge write_one_subpage_eb into write_one_eb
  2023-03-09  9:05 simplify extent_buffer reading and writing Christoph Hellwig
                   ` (18 preceding siblings ...)
  2023-03-09  9:05 ` [PATCH 19/20] btrfs: use per-buffer locking for extent_buffer reading Christoph Hellwig
@ 2023-03-09  9:05 ` Christoph Hellwig
  2023-03-09 17:28   ` Johannes Thumshirn
  19 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09  9:05 UTC (permalink / raw)
  To: Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Most of the code in write_one_subpage_eb and write_one_eb is shared,
so merge the two functions into one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/btrfs/extent_io.c | 77 ++++++++++++++------------------------------
 1 file changed, 25 insertions(+), 52 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e0e0d1bdb43f1a..423a3d7378140e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1813,53 +1813,11 @@ static void prepare_eb_write(struct extent_buffer *eb)
 	}
 }
 
-/*
- * Unlike the work in write_one_eb(), we rely completely on extent locking.
- * Page locking is only utilized at minimum to keep the VMM code happy.
- */
-static void write_one_subpage_eb(struct extent_buffer *eb,
-				 struct writeback_control *wbc)
-{
-	struct btrfs_fs_info *fs_info = eb->fs_info;
-	struct page *page = eb->pages[0];
-	bool no_dirty_ebs = false;
-	struct btrfs_bio *bbio;
-
-	prepare_eb_write(eb);
-
-	/* clear_page_dirty_for_io() in subpage helper needs page locked */
-	lock_page(page);
-	btrfs_subpage_set_writeback(fs_info, page, eb->start, eb->len);
-
-	/* Check if this is the last dirty bit to update nr_written */
-	no_dirty_ebs = btrfs_subpage_clear_and_test_dirty(fs_info, page,
-							  eb->start, eb->len);
-	if (no_dirty_ebs)
-		clear_page_dirty_for_io(page);
-
-	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
-			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
-			       BTRFS_I(eb->fs_info->btree_inode),
-			       extent_buffer_write_end_io, eb);
-	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
-	bbio->file_offset = eb->start;
-	__bio_add_page(&bbio->bio, page, eb->len, eb->start - page_offset(page));
-	unlock_page(page);
-	btrfs_submit_bio(bbio, 0);
-
-	/*
-	 * Submission finished without problem, if no range of the page is
-	 * dirty anymore, we have submitted a page.  Update nr_written in wbc.
-	 */
-	if (no_dirty_ebs)
-		wbc->nr_to_write--;
-}
-
 static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 					    struct writeback_control *wbc)
 {
+	struct btrfs_fs_info *fs_info = eb->fs_info;
 	struct btrfs_bio *bbio;
-	int i, num_pages;
 
 	prepare_eb_write(eb);
 
@@ -1869,17 +1827,32 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
 			       extent_buffer_write_end_io, eb);
 	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
 	bbio->file_offset = eb->start;
-
-	num_pages = num_extent_pages(eb);
-	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
+	if (fs_info->nodesize < PAGE_SIZE) {
+		struct page *p = eb->pages[0];
 
 		lock_page(p);
-		clear_page_dirty_for_io(p);
-		set_page_writeback(p);
-		__bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
-		wbc->nr_to_write--;
+		btrfs_subpage_set_writeback(fs_info, p, eb->start, eb->len);
+		if (btrfs_subpage_clear_and_test_dirty(fs_info, p, eb->start,
+						       eb->len)) {
+			clear_page_dirty_for_io(p);
+			wbc->nr_to_write--;
+		}
+		__bio_add_page(&bbio->bio, p, eb->len,
+			       offset_in_page(eb->start));
 		unlock_page(p);
+	} else {
+		int i;
+
+		for (i = 0; i < num_extent_pages(eb); i++) {
+			struct page *p = eb->pages[i];
+
+			lock_page(p);
+			clear_page_dirty_for_io(p);
+			set_page_writeback(p);
+			__bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
+			wbc->nr_to_write--;
+			unlock_page(p);
+		}
 	}
 	btrfs_submit_bio(bbio, 0);
 }
@@ -1951,7 +1924,7 @@ static int submit_eb_subpage(struct page *page, struct writeback_control *wbc)
 			continue;
 
 		if (lock_extent_buffer_for_io(eb, wbc)) {
-			write_one_subpage_eb(eb, wbc);
+			write_one_eb(eb, wbc);
 			submitted++;
 		}
 		free_extent_buffer(eb);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 01/20] btrfs: mark extent_buffer_under_io static
  2023-03-09  9:05 ` [PATCH 01/20] btrfs: mark extent_buffer_under_io static Christoph Hellwig
@ 2023-03-09 11:06   ` Johannes Thumshirn
  2023-03-10  7:26   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 11:06 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer
  2023-03-09  9:05 ` [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer Christoph Hellwig
@ 2023-03-09 11:10   ` Johannes Thumshirn
  2023-03-10  7:27   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 11:10 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
  2023-03-09  9:05 ` [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate Christoph Hellwig
@ 2023-03-09 11:17   ` Johannes Thumshirn
  2023-03-09 15:21     ` Christoph Hellwig
  2023-03-10  7:28   ` Qu Wenruo
  1 sibling, 1 reply; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 11:17 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:07, Christoph Hellwig wrote:
> verify_parent_transid is only called by btrfs_buffer_uptodate, which
> confusingly inverts the return value.  Merge the two functions and
> reflow the parent_transid so that error handling is in a branch.

This would be a good chance to make btrfs_buffer_uptodate() a bool
function.

Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09  9:05 ` [PATCH 04/20] btrfs: always read the entire extent_buffer Christoph Hellwig
@ 2023-03-09 11:29   ` Johannes Thumshirn
  2023-03-09 15:21     ` Christoph Hellwig
                       ` (2 more replies)
  2023-03-10  7:35   ` Qu Wenruo
  1 sibling, 3 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 11:29 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:07, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer.  While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it.  On any modern storage device, be that hard
> drives or SSDs this actually decreases I/O performance.  Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way.  Besides simplifying the code a bit as-is
> this will allow for major simplifications to the I/O completion handler
> later on.
> 
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.

Can someone smarter than me explain why we need to iterate eb->pages four
times in this function? This doesn't look super efficient to me.

> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>  	 */
>  	for (i = 0; i < num_pages; i++) {
>  		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>  			all_uptodate = 0;
> -		}

I think we could also break out of the loop here. As soon as
one page isn't uptodate we don't care about the fast path anymore.
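
I.e. something along these lines (just a sketch):

  for (i = 0; i < num_pages; i++) {
          page = eb->pages[i];
          if (!PageUptodate(page)) {
                  all_uptodate = 0;
                  break;
          }
  }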



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-09  9:05 ` [PATCH 05/20] btrfs: simplify extent buffer reading Christoph Hellwig
@ 2023-03-09 11:59   ` Johannes Thumshirn
  2023-03-10  7:42   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 11:59 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/20] btrfs: remove the mirror_num argument to btrfs_submit_compressed_read
  2023-03-09  9:05 ` [PATCH 06/20] btrfs: remove the mirror_num argument to btrfs_submit_compressed_read Christoph Hellwig
@ 2023-03-09 12:58   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 12:58 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-09  9:05 ` [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler Christoph Hellwig
@ 2023-03-09 13:08   ` Johannes Thumshirn
  2023-03-10  8:14   ` Qu Wenruo
  2023-03-10  9:30   ` Qu Wenruo
  2 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 13:08 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 08/20] btrfs: do not try to unlock the extent for non-subpage metadata reads
  2023-03-09  9:05 ` [PATCH 08/20] btrfs: do not try to unlock the extent for non-subpage metadata reads Christoph Hellwig
@ 2023-03-09 13:13   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 13:13 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 09/20] btrfs: return bool from lock_extent_buffer_for_io
  2023-03-09  9:05 ` [PATCH 09/20] btrfs: return bool from lock_extent_buffer_for_io Christoph Hellwig
@ 2023-03-09 13:17   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 13:17 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:07, Christoph Hellwig wrote:
> lock_extent_buffer_for_io never returns a negative error value, so switch
> the return value to a simple bool.  Also remove the noinline_for_stack
> annotation given that nothing in lock_extent_buffer_for_io or its callers
> is particularly stack hungry.
 
I _think_ the noinline_for_stack annotation is to have 
lock_extent_buffer_for_io() in a stacktrace.


Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 10/20] btrfs: submit a writeback bio per extent_buffer
  2023-03-09  9:05 ` [PATCH 10/20] btrfs: submit a writeback bio per extent_buffer Christoph Hellwig
@ 2023-03-09 13:35   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 13:35 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 11/20] btrfs: move page locking from lock_extent_buffer_for_io to write_one_eb
  2023-03-09  9:05 ` [PATCH 11/20] btrfs: move page locking from lock_extent_buffer_for_io to write_one_eb Christoph Hellwig
@ 2023-03-09 13:46   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 13:46 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 12/20] btrfs: simplify extent buffer writing
  2023-03-09  9:05 ` [PATCH 12/20] btrfs: simplify extent buffer writing Christoph Hellwig
@ 2023-03-09 14:00   ` Johannes Thumshirn
  2023-03-09 15:22     ` Christoph Hellwig
  2023-03-10  8:34   ` Qu Wenruo
  1 sibling, 1 reply; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 14:00 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Looking at the resulting code, I hope one of the subsequent
patches is merging write_one_subpage_eb() and write_one_eb()
as they're getting quite similar.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler
  2023-03-09  9:05 ` [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler Christoph Hellwig
@ 2023-03-09 14:10   ` Johannes Thumshirn
  2023-03-09 15:22     ` Christoph Hellwig
  2023-03-10  8:44   ` Qu Wenruo
  1 sibling, 1 reply; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 14:10 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:08, Christoph Hellwig wrote:
> +			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
> +						      eb->len);
> +		else
> +			end_page_writeback(page);
>  
> -		end_extent_buffer_writeback(eb);
>  	}
> +	end_extent_buffer_writeback(eb);
>  
> -	bio_put(bio);
> +	bio_put(&bbio->bio);
>  }
>  


Can we merge end_extent_buffer_writeback() here? It's a 3 line function
that only gets called once after this change.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
  2023-03-09 11:17   ` Johannes Thumshirn
@ 2023-03-09 15:21     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09 15:21 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 11:17:47AM +0000, Johannes Thumshirn wrote:
> On 09.03.23 10:07, Christoph Hellwig wrote:
> > verify_parent_transid is only called by btrfs_buffer_uptodate, which
> > confusingly inverts the return value.  Merge the two functions and
> > reflow the parent_transid so that error handling is in a branch.
> 
> This would be a good chance to make btrfs_buffer_uptodate() a bool
> function.

It still can return 0/1/-EAGAIN.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09 11:29   ` Johannes Thumshirn
@ 2023-03-09 15:21     ` Christoph Hellwig
  2023-03-14  6:09     ` Christoph Hellwig
  2023-03-17 23:16     ` David Sterba
  2 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09 15:21 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

We don't, and only a single iteration is left by the end of the series :)

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 12/20] btrfs: simplify extent buffer writing
  2023-03-09 14:00   ` Johannes Thumshirn
@ 2023-03-09 15:22     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09 15:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 02:00:32PM +0000, Johannes Thumshirn wrote:
> Looks good,
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> Looking at the resulting code, I hope one of the subsequent
> patches is merging write_one_subpage_eb() and write_one_eb()
> as they're getting quite similar.

The final patch will do that.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler
  2023-03-09 14:10   ` Johannes Thumshirn
@ 2023-03-09 15:22     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-09 15:22 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 02:10:51PM +0000, Johannes Thumshirn wrote:
> > +	end_extent_buffer_writeback(eb);
> >  
> > -	bio_put(bio);
> > +	bio_put(&bbio->bio);
> >  }
> >  
> 
> 
> Can we merge end_extent_buffer_writeback() here? It's a 3 line function
> that get's only called once after this change.

I can add a patch for that.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 14/20] btrfs: simplify btree block checksumming
  2023-03-09  9:05 ` [PATCH 14/20] btrfs: simplify btree block checksumming Christoph Hellwig
@ 2023-03-09 15:51   ` Johannes Thumshirn
  2023-03-10  8:57   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 15:51 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer
  2023-03-09  9:05 ` [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer Christoph Hellwig
@ 2023-03-09 16:01   ` Johannes Thumshirn
  2023-03-10  8:53   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 16:01 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 16/20] btrfs: stop using PageError for extent_buffers
  2023-03-09  9:05 ` [PATCH 16/20] btrfs: stop using PageError for extent_buffers Christoph Hellwig
@ 2023-03-09 16:05   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 16:05 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages
  2023-03-09  9:05 ` [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages Christoph Hellwig
@ 2023-03-09 16:10   ` Johannes Thumshirn
  2023-03-10  9:08   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 16:10 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:08, Christoph Hellwig wrote:
> The only place that reads in pages and thus marks them uptodate for
> the btree inode is read_extent_buffer_pages.  Which means that either
> pages are already uptodate from an old buffer when creating a new
> one in alloc_extent_buffer, or they will be updated by a call
> to read_extent_buffer_pages.  This means the checks for uptodate
> pages in read_extent_buffer_pages and read_extent_buffer_subpage are
> superflous and can be removed.

superfluous ^

[...]

Otherwise,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 19/20] btrfs: use per-buffer locking for extent_buffer reading
  2023-03-09  9:05 ` [PATCH 19/20] btrfs: use per-buffer locking for extent_buffer reading Christoph Hellwig
@ 2023-03-09 17:12   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 17:12 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

On 09.03.23 10:08, Christoph Hellwig wrote:
> Instead of locking and unlocking every page or the extent, just add a
> new EXTENT_BUFFER_READING bit that mirrors EXTENT_BUFFER_WRITEBACK
> for synchronizing threads trying to read an extent_buffer and to wait
> for I/O completion.
> 

While I like how the resulting code looks, I'll defer reviewing it to
someone who's more familiar with the locking of eb pages.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 20/20] btrfs: merge write_one_subpage_eb into write_one_eb
  2023-03-09  9:05 ` [PATCH 20/20] btrfs: merge write_one_subpage_eb into write_one_eb Christoph Hellwig
@ 2023-03-09 17:28   ` Johannes Thumshirn
  0 siblings, 0 replies; 74+ messages in thread
From: Johannes Thumshirn @ 2023-03-09 17:28 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 01/20] btrfs: mark extent_buffer_under_io static
  2023-03-09  9:05 ` [PATCH 01/20] btrfs: mark extent_buffer_under_io static Christoph Hellwig
  2023-03-09 11:06   ` Johannes Thumshirn
@ 2023-03-10  7:26   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  7:26 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> extent_buffer_under_io is only used in extent_io.c, so mark it static.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 2 +-
>   fs/btrfs/extent_io.h | 1 -
>   2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1221f699ffc596..302af9b01bda2a 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3420,7 +3420,7 @@ static void __free_extent_buffer(struct extent_buffer *eb)
>   	kmem_cache_free(extent_buffer_cache, eb);
>   }
>   
> -int extent_buffer_under_io(const struct extent_buffer *eb)
> +static int extent_buffer_under_io(const struct extent_buffer *eb)
>   {
>   	return (atomic_read(&eb->io_pages) ||
>   		test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 4341ad978fb8e4..342412d37a7b4b 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -265,7 +265,6 @@ void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
>   bool set_extent_buffer_dirty(struct extent_buffer *eb);
>   void set_extent_buffer_uptodate(struct extent_buffer *eb);
>   void clear_extent_buffer_uptodate(struct extent_buffer *eb);
> -int extent_buffer_under_io(const struct extent_buffer *eb);
>   void extent_range_clear_dirty_for_io(struct inode *inode, u64 start, u64 end);
>   void extent_range_redirty_for_io(struct inode *inode, u64 start, u64 end);
>   void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer
  2023-03-09  9:05 ` [PATCH 02/20] btrfs: move setting the buffer uptodate out of validate_extent_buffer Christoph Hellwig
  2023-03-09 11:10   ` Johannes Thumshirn
@ 2023-03-10  7:27   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  7:27 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> Setting the buffer uptodate in a function that is named as a validation
> helper is a it confusing.  Move the call from validate_extent_buffer to
> the one of its two callers that didn't already have a duplicate call
> to set_extent_buffer_uptodate.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/disk-io.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index bb864cf2eed60f..7d766eaef4aee7 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -590,9 +590,7 @@ static int validate_extent_buffer(struct extent_buffer *eb,
>   	if (found_level > 0 && btrfs_check_node(eb))
>   		ret = -EIO;
>   
> -	if (!ret)
> -		set_extent_buffer_uptodate(eb);
> -	else
> +	if (ret)
>   		btrfs_err(fs_info,
>   		"read time tree block corruption detected on logical %llu mirror %u",
>   			  eb->start, eb->read_mirror);
> @@ -684,6 +682,8 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
>   		goto err;
>   	}
>   	ret = validate_extent_buffer(eb, &bbio->parent_check);
> +	if (!ret)
> +		set_extent_buffer_uptodate(eb);
>   err:
>   	if (ret) {
>   		/*

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate
  2023-03-09  9:05 ` [PATCH 03/20] btrfs: merge verify_parent_transid and btrfs_buffer_uptodate Christoph Hellwig
  2023-03-09 11:17   ` Johannes Thumshirn
@ 2023-03-10  7:28   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  7:28 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> verify_parent_transid is only called by btrfs_buffer_uptodate, which
> confusingly inverts the return value.  Merge the two functions and
> reflow the parent_transid so that error handling is in a branch.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>   fs/btrfs/disk-io.c | 46 +++++++++++++++-------------------------------
>   1 file changed, 15 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 7d766eaef4aee7..d03b431b07781c 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -110,32 +110,33 @@ static void csum_tree_block(struct extent_buffer *buf, u8 *result)
>    * detect blocks that either didn't get written at all or got written
>    * in the wrong place.
>    */
> -static int verify_parent_transid(struct extent_io_tree *io_tree,
> -				 struct extent_buffer *eb, u64 parent_transid,
> -				 int atomic)
> +int btrfs_buffer_uptodate(struct extent_buffer *eb, u64 parent_transid,
> +			  int atomic)
>   {
> +	struct inode *btree_inode = eb->pages[0]->mapping->host;
> +	struct extent_io_tree *io_tree = &BTRFS_I(btree_inode)->io_tree;
>   	struct extent_state *cached_state = NULL;
> -	int ret;
> +	int ret = 1;
>   
> -	if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
> +	if (!extent_buffer_uptodate(eb))
>   		return 0;
>   
> +	if (!parent_transid || btrfs_header_generation(eb) == parent_transid)
> +		return 1;
> +
>   	if (atomic)
>   		return -EAGAIN;
>   
>   	lock_extent(io_tree, eb->start, eb->start + eb->len - 1, &cached_state);
> -	if (extent_buffer_uptodate(eb) &&
> -	    btrfs_header_generation(eb) == parent_transid) {
> -		ret = 0;
> -		goto out;
> -	}
> -	btrfs_err_rl(eb->fs_info,
> +	if (!extent_buffer_uptodate(eb) ||
> +	    btrfs_header_generation(eb) != parent_transid) {
> +		btrfs_err_rl(eb->fs_info,
>   "parent transid verify failed on logical %llu mirror %u wanted %llu found %llu",
>   			eb->start, eb->read_mirror,
>   			parent_transid, btrfs_header_generation(eb));
> -	ret = 1;
> -	clear_extent_buffer_uptodate(eb);
> -out:
> +		clear_extent_buffer_uptodate(eb);
> +		ret = 0;
> +	}
>   	unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
>   		      &cached_state);
>   	return ret;
> @@ -4638,23 +4639,6 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info)
>   	btrfs_close_devices(fs_info->fs_devices);
>   }
>   
> -int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
> -			  int atomic)
> -{
> -	int ret;
> -	struct inode *btree_inode = buf->pages[0]->mapping->host;
> -
> -	ret = extent_buffer_uptodate(buf);
> -	if (!ret)
> -		return ret;
> -
> -	ret = verify_parent_transid(&BTRFS_I(btree_inode)->io_tree, buf,
> -				    parent_transid, atomic);
> -	if (ret == -EAGAIN)
> -		return ret;
> -	return !ret;
> -}
> -
>   void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
>   {
>   	struct btrfs_fs_info *fs_info = buf->fs_info;

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09  9:05 ` [PATCH 04/20] btrfs: always read the entire extent_buffer Christoph Hellwig
  2023-03-09 11:29   ` Johannes Thumshirn
@ 2023-03-10  7:35   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  7:35 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> Currently read_extent_buffer_pages skips pages that are already uptodate
> when reading in an extent_buffer.  While this reduces the amount of data
> read, it increases the number of I/O operations as we now need to do
> multiple I/Os when reading an extent buffer with one or more uptodate
> pages in the middle of it.  On any modern storage device, be that hard
> drives or SSDs, this actually decreases I/O performance.  Fortunately
> this case is pretty rare as the pages are always initially read together
> and then aged the same way.
>  Besides simplifying the code a bit as-is,
> this will allow for major simplifications to the I/O completion handler
> later on.
> 
> Note that the case where all pages are uptodate is still handled by an
> optimized fast path that does not read any data from disk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 17 +++++------------
>   1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 302af9b01bda2a..26d8162bee000d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4313,7 +4313,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	int locked_pages = 0;
>   	int all_uptodate = 1;
>   	int num_pages;
> -	unsigned long num_reads = 0;
>   	struct btrfs_bio_ctrl bio_ctrl = {
>   		.opf = REQ_OP_READ,
>   		.mirror_num = mirror_num,
> @@ -4359,10 +4358,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	 */
>   	for (i = 0; i < num_pages; i++) {
>   		page = eb->pages[i];
> -		if (!PageUptodate(page)) {
> -			num_reads++;
> +		if (!PageUptodate(page))
>   			all_uptodate = 0;
> -		}
>   	}
>   
>   	if (all_uptodate) {
> @@ -4372,7 +4369,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   
>   	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
>   	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_reads);
> +	atomic_set(&eb->io_pages, num_pages);
>   	/*
>   	 * It is possible for release_folio to clear the TREE_REF bit before we
>   	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
> @@ -4382,13 +4379,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	for (i = 0; i < num_pages; i++) {
>   		page = eb->pages[i];
>   
> -		if (!PageUptodate(page)) {
> -			ClearPageError(page);
> -			submit_extent_page(&bio_ctrl, page_offset(page), page,
> -					   PAGE_SIZE, 0);
> -		} else {
> -			unlock_page(page);
> -		}
> +		ClearPageError(page);
> +		submit_extent_page(&bio_ctrl, page_offset(page), page,
> +				   PAGE_SIZE, 0);
>   	}
>   
>   	submit_one_bio(&bio_ctrl);

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-09  9:05 ` [PATCH 05/20] btrfs: simplify extent buffer reading Christoph Hellwig
  2023-03-09 11:59   ` Johannes Thumshirn
@ 2023-03-10  7:42   ` Qu Wenruo
  2023-03-10  7:47     ` Christoph Hellwig
  1 sibling, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  7:42 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> The btrfs_bio_ctrl machinery is overkill for reading extent_buffers
> as we always operate on PAGE SIZE chunks (or one smaller one for the
> subpage case) that are contiguous and are guaranteed to fit into a
> single bio.

This is the legacy left by older stripe boundary based bio split code.
(Please note that, metadata crossing stripe boundaries is not ideal and 
is very rare nowadays, but we should still support it).

But now we have btrfs_submit_bio() handling such cases, thus it should 
be fine.

>  Replace it with open coded btrfs_bio_alloc, __bio_add_page
> and btrfs_submit_bio calls in a helper function shared between
> the subpage and node size >= PAGE_SIZE cases.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 99 ++++++++++++++++----------------------------
>   1 file changed, 36 insertions(+), 63 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 26d8162bee000d..5169e73ffea647 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -98,22 +98,12 @@ void btrfs_extent_buffer_leak_debug_check(struct btrfs_fs_info *fs_info)
>    */
>   struct btrfs_bio_ctrl {
>   	struct btrfs_bio *bbio;
> -	int mirror_num;
>   	enum btrfs_compression_type compress_type;
>   	u32 len_to_oe_boundary;
>   	blk_opf_t opf;
>   	btrfs_bio_end_io_t end_io_func;
>   	struct writeback_control *wbc;
>   
> -	/*
> -	 * This is for metadata read, to provide the extra needed verification
> -	 * info.  This has to be provided for submit_one_bio(), as
> -	 * submit_one_bio() can submit a bio if it ends at stripe boundary.  If
> -	 * no such parent_check is provided, the metadata can hit false alert at
> -	 * endio time.
> -	 */
> -	struct btrfs_tree_parent_check *parent_check;
> -
>   	/*
>   	 * Tell writepage not to lock the state bits for this range, it still
>   	 * does the unlocking.
> @@ -124,7 +114,6 @@ struct btrfs_bio_ctrl {
>   static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
>   {
>   	struct btrfs_bio *bbio = bio_ctrl->bbio;
> -	int mirror_num = bio_ctrl->mirror_num;
>   
>   	if (!bbio)
>   		return;
> @@ -132,25 +121,14 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
>   	/* Caller should ensure the bio has at least some range added */
>   	ASSERT(bbio->bio.bi_iter.bi_size);
>   
> -	if (!is_data_inode(&bbio->inode->vfs_inode)) {
> -		if (btrfs_op(&bbio->bio) != BTRFS_MAP_WRITE) {
> -			/*
> -			 * For metadata read, we should have the parent_check,
> -			 * and copy it to bbio for metadata verification.
> -			 */
> -			ASSERT(bio_ctrl->parent_check);
> -			memcpy(&bbio->parent_check,
> -			       bio_ctrl->parent_check,
> -			       sizeof(struct btrfs_tree_parent_check));
> -		}
> +	if (!is_data_inode(&bbio->inode->vfs_inode))
>   		bbio->bio.bi_opf |= REQ_META;
> -	}
>   
>   	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
>   	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
> -		btrfs_submit_compressed_read(bbio, mirror_num);
> +		btrfs_submit_compressed_read(bbio, 0);
>   	else
> -		btrfs_submit_bio(bbio, mirror_num);
> +		btrfs_submit_bio(bbio, 0);
>   
>   	/* The bbio is owned by the end_io handler now */
>   	bio_ctrl->bbio = NULL;
> @@ -4241,6 +4219,36 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
>   	}
>   }
>   
> +static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
> +				       struct btrfs_tree_parent_check *check)
> +{
> +	int num_pages = num_extent_pages(eb), i;
> +	struct btrfs_bio *bbio;
> +
> +	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> +	eb->read_mirror = 0;
> +	atomic_set(&eb->io_pages, num_pages);
> +	check_buffer_tree_ref(eb);
> +
> +	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
> +			       REQ_OP_READ | REQ_META,
> +			       BTRFS_I(eb->fs_info->btree_inode),
> +			       end_bio_extent_readpage, NULL);
> +	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
> +	bbio->file_offset = eb->start;
> +	memcpy(&bbio->parent_check, check, sizeof(*check));
> +	if (eb->fs_info->nodesize < PAGE_SIZE) {
> +		__bio_add_page(&bbio->bio, eb->pages[0], eb->len,
> +			       eb->start - page_offset(eb->pages[0]));
> +	} else {
> +		for (i = 0; i < num_pages; i++) {
> +			ClearPageError(eb->pages[i]);
> +			__bio_add_page(&bbio->bio, eb->pages[i], PAGE_SIZE, 0);
> +		}
> +	}
> +	btrfs_submit_bio(bbio, mirror_num);
> +}
> +
>   static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
>   				      int mirror_num,
>   				      struct btrfs_tree_parent_check *check)
> @@ -4249,11 +4257,6 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
>   	struct extent_io_tree *io_tree;
>   	struct page *page = eb->pages[0];
>   	struct extent_state *cached_state = NULL;
> -	struct btrfs_bio_ctrl bio_ctrl = {
> -		.opf = REQ_OP_READ,
> -		.mirror_num = mirror_num,
> -		.parent_check = check,
> -	};
>   	int ret;
>   
>   	ASSERT(!test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags));
> @@ -4281,18 +4284,10 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
>   		return 0;
>   	}
>   
> -	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> -	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, 1);
> -	check_buffer_tree_ref(eb);
> -	bio_ctrl.end_io_func = end_bio_extent_readpage;
> -
>   	btrfs_subpage_clear_error(fs_info, page, eb->start, eb->len);
> -
>   	btrfs_subpage_start_reader(fs_info, page, eb->start, eb->len);
> -	submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
> -			   eb->start - page_offset(page));
> -	submit_one_bio(&bio_ctrl);
> +
> +	__read_extent_buffer_pages(eb, mirror_num, check);
>   	if (wait != WAIT_COMPLETE) {
>   		free_extent_state(cached_state);
>   		return 0;
> @@ -4313,11 +4308,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	int locked_pages = 0;
>   	int all_uptodate = 1;
>   	int num_pages;
> -	struct btrfs_bio_ctrl bio_ctrl = {
> -		.opf = REQ_OP_READ,
> -		.mirror_num = mirror_num,
> -		.parent_check = check,
> -	};
>   
>   	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
>   		return 0;
> @@ -4367,24 +4357,7 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   		goto unlock_exit;
>   	}
>   
> -	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> -	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_pages);
> -	/*
> -	 * It is possible for release_folio to clear the TREE_REF bit before we
> -	 * set io_pages. See check_buffer_tree_ref for a more detailed comment.
> -	 */
> -	check_buffer_tree_ref(eb);
> -	bio_ctrl.end_io_func = end_bio_extent_readpage;
> -	for (i = 0; i < num_pages; i++) {
> -		page = eb->pages[i];
> -
> -		ClearPageError(page);
> -		submit_extent_page(&bio_ctrl, page_offset(page), page,
> -				   PAGE_SIZE, 0);
> -	}
> -
> -	submit_one_bio(&bio_ctrl);
> +	__read_extent_buffer_pages(eb, mirror_num, check);
>   
>   	if (wait != WAIT_COMPLETE)
>   		return 0;

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  7:42   ` Qu Wenruo
@ 2023-03-10  7:47     ` Christoph Hellwig
  2023-03-10  8:02       ` Qu Wenruo
  2023-03-10 10:54       ` Filipe Manana
  0 siblings, 2 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10  7:47 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 03:42:04PM +0800, Qu Wenruo wrote:
> This is the legacy left by older stripe boundary based bio split code.
> (Please note that, metadata crossing stripe boundaries is not ideal and is 
> very rare nowadays, but we should still support it).

How can metadata cross a stripe boundary?

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  7:47     ` Christoph Hellwig
@ 2023-03-10  8:02       ` Qu Wenruo
  2023-03-10  8:03         ` Christoph Hellwig
  2023-03-10 10:54       ` Filipe Manana
  1 sibling, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2023/3/10 15:47, Christoph Hellwig wrote:
> On Fri, Mar 10, 2023 at 03:42:04PM +0800, Qu Wenruo wrote:
>> This is the legacy left by older stripe boundary based bio split code.
>> (Please note that, metadata crossing stripe boundaries is not ideal and is
>> very rare nowadays, but we should still support it).
> 
> How can metadata cross a stripe boundary?

Like this inside a RAID0 bg:

0	32K	64K	96K	128K
|             |//|//|           |

There is an old chunk allocator behavior where a chunk can start at some
physical bytenr which is only 4K aligned, but not yet STRIPE_LEN
aligned.

In that case, extent allocator can give us some range which crosses 
stripe boundary.

That's also why we have metadata crossing stripe boundary checks in 
btrfs check and scrub.
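
As a rough illustration, a minimal sketch of such a boundary check
(tree_block_crosses_stripe() is a hypothetical helper, not the actual btrfs
check/scrub code; the only assumption is the standard 64K stripe length):

#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STRIPE_LEN	(64 * 1024ULL)

/*
 * A tree block is physically contiguous, so it crosses a stripe boundary
 * iff its first and last bytes land in different 64K stripes.
 */
static bool tree_block_crosses_stripe(uint64_t physical, uint32_t nodesize)
{
	return physical / SKETCH_STRIPE_LEN !=
	       (physical + nodesize - 1) / SKETCH_STRIPE_LEN;
}

For the layout drawn above, a 16K tree block at physical 56K gives stripe 0
for its first byte and stripe 1 for its last, so it straddles the 64K
boundary; that can only happen inside those old, merely 4K-aligned chunks.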

Thanks,
Qu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  8:02       ` Qu Wenruo
@ 2023-03-10  8:03         ` Christoph Hellwig
  2023-03-10  8:07           ` Qu Wenruo
  0 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10  8:03 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:02:02PM +0800, Qu Wenruo wrote:
> Like this inside a RAID0 bg:
>
> 0	32K	64K	96K	128K
> |             |//|//|           |
>
> There is an old chunk allocator behavior where a chunk can start at
> some physical bytenr which is only 4K aligned, but not yet STRIPE_LEN
> aligned.
>
> In that case, extent allocator can give us some range which crosses stripe 
> boundary.

Ewww, ok.  So the metadata isn't required to be naturally aligned.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  8:03         ` Christoph Hellwig
@ 2023-03-10  8:07           ` Qu Wenruo
  2023-03-10  8:15             ` Christoph Hellwig
  0 siblings, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2023/3/10 16:03, Christoph Hellwig wrote:
> On Fri, Mar 10, 2023 at 04:02:02PM +0800, Qu Wenruo wrote:
>> Like this inside a RAID0 bg:
>>
>> 0	32K	64K	96K	128K
>> |             |//|//|           |
>>
>> There is an old chunk allocator behavior where a chunk can start at
>> some physical bytenr which is only 4K aligned, but not yet STRIPE_LEN
>> aligned.
>>
>> In that case, extent allocator can give us some range which crosses stripe
>> boundary.
> 
> Ewww, ok.  So the metadata isn't required to be naturally aligned.

Yes, but you don't need to spend too much time on that.
We haven't hit such case for a long long time.

Unless the fs is super old and never balanced by any currently supported 
LTS kernel, it should be very rare to hit.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-09  9:05 ` [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler Christoph Hellwig
  2023-03-09 13:08   ` Johannes Thumshirn
@ 2023-03-10  8:14   ` Qu Wenruo
  2023-03-10  8:17     ` Christoph Hellwig
  2023-03-10  9:30   ` Qu Wenruo
  2 siblings, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:14 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> Now that we always use a single bio to read an extent_buffer, the buffer
> can be passed to the end_io handler as private data.  This allows
> implementing a much simplified dedicated end I/O handler for metadata
> reads.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

This greatly simplifies the subpage routine.

Although we no longer share the data and metadata read endio function, it's
still a net reduction in code.

But there is a problem related to how we handle the validation.

>   
> +static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
> +{
> +	struct extent_buffer *eb = bbio->private;
> +	bool uptodate = !bbio->bio.bi_status;
> +	struct bvec_iter_all iter_all;
> +	struct bio_vec *bvec;
> +	u32 bio_offset = 0;
> +
> +	atomic_inc(&eb->refs);
> +	eb->read_mirror = bbio->mirror_num;
> +
> +	if (uptodate &&
> +	    btrfs_validate_extent_buffer(eb, &bbio->parent_check) < 0)

Here we call btrfs_validate_extent_buffer() directly.

But in the case that a metadata bio is split, the endio function would 
be called on each split part.

Thus for the first half, we may fail when checking the eb, as the second
half may not have finished yet.

I'm afraid here in the endio function, we still need to call 
btrfs_validate_metadata_buffer(), which will only do the validation 
after all parts of the metadata are properly read.

Thanks,
Qu
> +		uptodate = false;
> +
> +	if (uptodate) {
> +		set_extent_buffer_uptodate(eb);
> +	} else {
> +		clear_extent_buffer_uptodate(eb);
> +		set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> +	}
> +
> +	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
> +		atomic_dec(&eb->io_pages);
> +		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
> +			      bvec->bv_len);
> +		bio_offset += bvec->bv_len;
> +	}
> +
> +	unlock_extent(&bbio->inode->io_tree, eb->start,
> +		      eb->start + bio_offset - 1, NULL);
> +	free_extent_buffer(eb);
> +
> +	bio_put(&bbio->bio);
> +}
> +
>   static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
>   				       struct btrfs_tree_parent_check *check)
>   {
> @@ -4233,7 +4227,7 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
>   	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
>   			       REQ_OP_READ | REQ_META,
>   			       BTRFS_I(eb->fs_info->btree_inode),
> -			       end_bio_extent_readpage, NULL);
> +			       extent_buffer_read_end_io, eb);
>   	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
>   	bbio->file_offset = eb->start;
>   	memcpy(&bbio->parent_check, check, sizeof(*check));

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  8:07           ` Qu Wenruo
@ 2023-03-10  8:15             ` Christoph Hellwig
  2023-03-10  9:14               ` Qu Wenruo
  0 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10  8:15 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:07:42PM +0800, Qu Wenruo wrote:
> Yes, but you don't need to spend too much time on that.
> We haven't hit such case for a long long time.
>
> Unless the fs is super old and never balanced by any currently supported 
> LTS kernel, it should be very rare to hit.

Well, if it is a valid format we'll need to handle it.  And we probably
want a test case to exercise the code path to make sure it doesn't break
when it is so rarely exercised.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-10  8:14   ` Qu Wenruo
@ 2023-03-10  8:17     ` Christoph Hellwig
  2023-03-10  8:30       ` Qu Wenruo
  0 siblings, 1 reply; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10  8:17 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:14:46PM +0800, Qu Wenruo wrote:
> Here we call btrfs_validate_extent_buffer() directly.
>
> But in the case that a metadata bio is split, the endio function would be 
> called on each split part.

No.  bbio->end_io is called on the originally submitted bbio after all
I/O has finished, and is not called multiple times when split.  Without
that all consumers of btrfs_submit_bio would have to know about splitting
and need to be able to deal with cloned bios, which is exactly what I
spent great effort on to avoid (similar to how the block layer works).
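
As a rough illustration of that completion pattern, a minimal sketch with
hypothetical names (the real btrfs_bio code additionally records errors
race-free and runs read repair before completing):

#include <stdatomic.h>

/* Hypothetical stand-in for the originally submitted bio. */
struct orig_bio {
	atomic_int pending;			/* split fragments still in flight */
	int status;				/* first error wins (simplified) */
	void (*end_io)(struct orig_bio *orig);	/* the submitter's handler */
};

/* Runs once per split fragment when its low-level I/O completes. */
static void fragment_end_io(struct orig_bio *orig, int status)
{
	if (status && !orig->status)
		orig->status = status;

	/* Only the last fragment to complete invokes the submitter's end_io. */
	if (atomic_fetch_sub(&orig->pending, 1) == 1)
		orig->end_io(orig);
}

That is the property relied on above: a consumer of btrfs_submit_bio() only
ever sees one completion for the bio it submitted, so the eb validation can
run unconditionally in the end_io handler.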

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-10  8:17     ` Christoph Hellwig
@ 2023-03-10  8:30       ` Qu Wenruo
  0 siblings, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2023/3/10 16:17, Christoph Hellwig wrote:
> On Fri, Mar 10, 2023 at 04:14:46PM +0800, Qu Wenruo wrote:
>> Here we call btrfs_validate_extent_buffer() directly.
>>
>> But in the case that a metadata bio is split, the endio function would be
>> called on each split part.
> 
> No.  bbio->end_io is called on the originally submitted bbio after all
> I/O has finished, and is not called multiple times when split.  Without
> that all consumers of btrfs_submit_bio would have to know about splitting
> and need to be able to deal with cloned bios, which is exactly what I
> spent great effort on to avoid (similar to how the block layer works).

Oh, you avoided the endio call for each split bio but still handle
interleaved RAID0 corruption cases by going through btrfs_check_read_bio(),
and directly submitting repair for the failed sectors.


Then the code is fine, and it makes the life of the endio handler much easier.

Reviewed-by: Qu Wenruo <wqu@suse.com>

And finally I understand why you moved the data read repair part to the
bio layer.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 12/20] btrfs: simplify extent buffer writing
  2023-03-09  9:05 ` [PATCH 12/20] btrfs: simplify extent buffer writing Christoph Hellwig
  2023-03-09 14:00   ` Johannes Thumshirn
@ 2023-03-10  8:34   ` Qu Wenruo
  2023-03-10  8:41     ` Christoph Hellwig
  1 sibling, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:34 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> The btrfs_bio_ctrl machinery is overkill for writing extent_buffers
> as we always operate on PAGE SIZE chunks (or one smaller one for the
> subpage case) that are contiguous and are guaranteed to fit into a
> single bio.  Replace it with open coded btrfs_bio_alloc, __bio_add_page
> and btrfs_submit_bio calls.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Does this also mean the only benefit of manually merging contiguous 
pages into a larger bio, compared to multiple smaller adjacent bios, is 
just memory saving?

Thanks,
Qu
> ---
>   fs/btrfs/extent_io.c | 40 ++++++++++++++++++++--------------------
>   1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1fc50d1402cef8..1d7f48190ee9b9 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -121,9 +121,6 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
>   	/* Caller should ensure the bio has at least some range added */
>   	ASSERT(bbio->bio.bi_iter.bi_size);
>   
> -	if (!is_data_inode(&bbio->inode->vfs_inode))
> -		bbio->bio.bi_opf |= REQ_META;
> -
>   	if (btrfs_op(&bbio->bio) == BTRFS_MAP_READ &&
>   	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
>   		btrfs_submit_compressed_read(bbio);
> @@ -1897,11 +1894,7 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
>   	struct btrfs_fs_info *fs_info = eb->fs_info;
>   	struct page *page = eb->pages[0];
>   	bool no_dirty_ebs = false;
> -	struct btrfs_bio_ctrl bio_ctrl = {
> -		.wbc = wbc,
> -		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
> -		.end_io_func = end_bio_subpage_eb_writepage,
> -	};
> +	struct btrfs_bio *bbio;
>   
>   	prepare_eb_write(eb);
>   
> @@ -1915,10 +1908,16 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
>   	if (no_dirty_ebs)
>   		clear_page_dirty_for_io(page);
>   
> -	submit_extent_page(&bio_ctrl, eb->start, page, eb->len,
> -			   eb->start - page_offset(page));
> +	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
> +			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
> +			       BTRFS_I(eb->fs_info->btree_inode),
> +			       end_bio_subpage_eb_writepage, NULL);
> +	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
> +	bbio->file_offset = eb->start;
> +	__bio_add_page(&bbio->bio, page, eb->len, eb->start - page_offset(page));
>   	unlock_page(page);
> -	submit_one_bio(&bio_ctrl);
> +	btrfs_submit_bio(bbio, 0);
> +
>   	/*
>   	 * Submission finished without problem, if no range of the page is
>   	 * dirty anymore, we have submitted a page.  Update nr_written in wbc.
> @@ -1930,16 +1929,18 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
>   static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
>   					    struct writeback_control *wbc)
>   {
> -	u64 disk_bytenr = eb->start;
> +	struct btrfs_bio *bbio;
>   	int i, num_pages;
> -	struct btrfs_bio_ctrl bio_ctrl = {
> -		.wbc = wbc,
> -		.opf = REQ_OP_WRITE | wbc_to_write_flags(wbc),
> -		.end_io_func = end_bio_extent_buffer_writepage,
> -	};
>   
>   	prepare_eb_write(eb);
>   
> +	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
> +			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
> +			       BTRFS_I(eb->fs_info->btree_inode),
> +			       end_bio_extent_buffer_writepage, NULL);
> +	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
> +	bbio->file_offset = eb->start;
> +
>   	num_pages = num_extent_pages(eb);
>   	for (i = 0; i < num_pages; i++) {
>   		struct page *p = eb->pages[i];
> @@ -1947,12 +1948,11 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
>   		lock_page(p);
>   		clear_page_dirty_for_io(p);
>   		set_page_writeback(p);
> -		submit_extent_page(&bio_ctrl, disk_bytenr, p, PAGE_SIZE, 0);
> -		disk_bytenr += PAGE_SIZE;
> +		__bio_add_page(&bbio->bio, p, PAGE_SIZE, 0);
>   		wbc->nr_to_write--;
>   		unlock_page(p);
>   	}
> -	submit_one_bio(&bio_ctrl);
> +	btrfs_submit_bio(bbio, 0);
>   }
>   
>   /*

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 12/20] btrfs: simplify extent buffer writing
  2023-03-10  8:34   ` Qu Wenruo
@ 2023-03-10  8:41     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10  8:41 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:34:10PM +0800, Qu Wenruo wrote:
> Does this also mean, the only benefit of manually merging continuous pages 
> into a larger bio, compared to multiple smaller adjacent bios, is just 
> memory saving?

That is the only significant saving.  There also is a very minimal amount
of CPU cycles saved as well.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler
  2023-03-09  9:05 ` [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler Christoph Hellwig
  2023-03-09 14:10   ` Johannes Thumshirn
@ 2023-03-10  8:44   ` Qu Wenruo
  2023-03-10 11:47     ` Christoph Hellwig
  1 sibling, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:44 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> Now that we always use a single bio to write an extent_buffer, the buffer
> can be passed to the end_io handler as private data.  This allows
> us to simplify the metadata write end I/O handler, and merge the subpage
> version into the main one with just a single conditional.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 112 +++++++++----------------------------------
>   1 file changed, 23 insertions(+), 89 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1d7f48190ee9b9..16522bcf5d4a10 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1663,13 +1663,11 @@ static bool lock_extent_buffer_for_io(struct extent_buffer *eb,
>   	return ret;
>   }
>   
> -static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
> +static void set_btree_ioerr(struct extent_buffer *eb)
>   {
>   	struct btrfs_fs_info *fs_info = eb->fs_info;
>   
> -	btrfs_page_set_error(fs_info, page, eb->start, eb->len);
> -	if (test_and_set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))
> -		return;
> +	set_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
>   
>   	/*
>   	 * A read may stumble upon this buffer later, make sure that it gets an
> @@ -1683,7 +1681,7 @@ static void set_btree_ioerr(struct page *page, struct extent_buffer *eb)
>   	 * return a 0 because we are readonly if we don't modify the err seq for
>   	 * the superblock.
>   	 */
> -	mapping_set_error(page->mapping, -EIO);
> +	mapping_set_error(eb->fs_info->btree_inode->i_mapping, -EIO);
>   
>   	/*
>   	 * If writeback for a btree extent that doesn't belong to a log tree
> @@ -1758,101 +1756,37 @@ static struct extent_buffer *find_extent_buffer_nolock(
>   	return NULL;
>   }
>   
> -/*
> - * The endio function for subpage extent buffer write.
> - *
> - * Unlike end_bio_extent_buffer_writepage(), we only call end_page_writeback()
> - * after all extent buffers in the page has finished their writeback.
> - */
> -static void end_bio_subpage_eb_writepage(struct btrfs_bio *bbio)
> +static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
>   {
> -	struct bio *bio = &bbio->bio;
> -	struct btrfs_fs_info *fs_info;
> -	struct bio_vec *bvec;
> +	struct extent_buffer *eb = bbio->private;
> +	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	bool uptodate = !bbio->bio.bi_status;
>   	struct bvec_iter_all iter_all;
> -
> -	fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb);
> -	ASSERT(fs_info->nodesize < PAGE_SIZE);
> -
> -	ASSERT(!bio_flagged(bio, BIO_CLONED));
> -	bio_for_each_segment_all(bvec, bio, iter_all) {
> -		struct page *page = bvec->bv_page;
> -		u64 bvec_start = page_offset(page) + bvec->bv_offset;
> -		u64 bvec_end = bvec_start + bvec->bv_len - 1;
> -		u64 cur_bytenr = bvec_start;
> -
> -		ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize));
> -
> -		/* Iterate through all extent buffers in the range */
> -		while (cur_bytenr <= bvec_end) {
> -			struct extent_buffer *eb;
> -			int done;
> -
> -			/*
> -			 * Here we can't use find_extent_buffer(), as it may
> -			 * try to lock eb->refs_lock, which is not safe in endio
> -			 * context.
> -			 */
> -			eb = find_extent_buffer_nolock(fs_info, cur_bytenr);
> -			ASSERT(eb);
> -
> -			cur_bytenr = eb->start + eb->len;
> -
> -			ASSERT(test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags));
> -			done = atomic_dec_and_test(&eb->io_pages);
> -			ASSERT(done);
> -
> -			if (bio->bi_status ||
> -			    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
> -				ClearPageUptodate(page);
> -				set_btree_ioerr(page, eb);
> -			}
> -
> -			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
> -						      eb->len);
> -			end_extent_buffer_writeback(eb);
> -			/*
> -			 * free_extent_buffer() will grab spinlock which is not
> -			 * safe in endio context. Thus here we manually dec
> -			 * the ref.
> -			 */
> -			atomic_dec(&eb->refs);
> -		}
> -	}
> -	bio_put(bio);
> -}
> -
> -static void end_bio_extent_buffer_writepage(struct btrfs_bio *bbio)
> -{
> -	struct bio *bio = &bbio->bio;
>   	struct bio_vec *bvec;
> -	struct extent_buffer *eb;
> -	int done;
> -	struct bvec_iter_all iter_all;
>   
> -	ASSERT(!bio_flagged(bio, BIO_CLONED));
> -	bio_for_each_segment_all(bvec, bio, iter_all) {
> +	if (!uptodate)
> +		set_btree_ioerr(eb);
> +
> +	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
>   		struct page *page = bvec->bv_page;
>   
> -		eb = (struct extent_buffer *)page->private;
> -		BUG_ON(!eb);
> -		done = atomic_dec_and_test(&eb->io_pages);
> +		atomic_dec(&eb->io_pages);
>   
> -		if (bio->bi_status ||
> -		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
> +		if (!uptodate) {
>   			ClearPageUptodate(page);

There seems to be an existing bug in the endio function for subpage.

We should call the btrfs_page_clear_uptodate() helper.
Otherwise the subpage uptodate bitmap and the page uptodate flag would get
out of sync.

> -			set_btree_ioerr(page, eb);
> +			btrfs_page_set_error(fs_info, page, eb->start, eb->len);
>   		}
>   
> -		end_page_writeback(page);
> -
> -		if (!done)
> -			continue;
> +		if (fs_info->nodesize < PAGE_SIZE)
> +			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
> +						      eb->len);
> +		else
> +			end_page_writeback(page);


Here we can just call btrfs_clear_writeback(), which can handle both 
subpage and regular cases.

Thanks,
Qu
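
As a rough illustration of the two suggestions above, a sketch of what the
error path could look like with subpage-aware helpers throughout; the helper
names follow the btrfs_page_*() convention used in the comments, but treat
the exact signatures as assumptions rather than the real API:

/* Sketch only: keep the subpage bitmaps and the page flags in sync. */
static void sketch_eb_write_error_path(struct btrfs_fs_info *fs_info,
				       struct extent_buffer *eb,
				       struct page *page)
{
	/* Clears the subpage uptodate bitmap and PageUptodate together. */
	btrfs_page_clear_uptodate(fs_info, page, eb->start, eb->len);
	btrfs_page_set_error(fs_info, page, eb->start, eb->len);

	/* One call covers both the subpage bitmap and end_page_writeback(). */
	btrfs_page_clear_writeback(fs_info, page, eb->start, eb->len);
}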

>   
> -		end_extent_buffer_writeback(eb);
>   	}
> +	end_extent_buffer_writeback(eb);
>   
> -	bio_put(bio);
> +	bio_put(&bbio->bio);
>   }
>   
>   static void prepare_eb_write(struct extent_buffer *eb)
> @@ -1911,7 +1845,7 @@ static void write_one_subpage_eb(struct extent_buffer *eb,
>   	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
>   			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
>   			       BTRFS_I(eb->fs_info->btree_inode),
> -			       end_bio_subpage_eb_writepage, NULL);
> +			       extent_buffer_write_end_io, eb);
>   	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
>   	bbio->file_offset = eb->start;
>   	__bio_add_page(&bbio->bio, page, eb->len, eb->start - page_offset(page));
> @@ -1937,7 +1871,7 @@ static noinline_for_stack void write_one_eb(struct extent_buffer *eb,
>   	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
>   			       REQ_OP_WRITE | REQ_META | wbc_to_write_flags(wbc),
>   			       BTRFS_I(eb->fs_info->btree_inode),
> -			       end_bio_extent_buffer_writepage, NULL);
> +			       extent_buffer_write_end_io, eb);
>   	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
>   	bbio->file_offset = eb->start;
>   

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer
  2023-03-09  9:05 ` [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer Christoph Hellwig
  2023-03-09 16:01   ` Johannes Thumshirn
@ 2023-03-10  8:53   ` Qu Wenruo
  2023-03-10 11:50     ` Christoph Hellwig
  1 sibling, 1 reply; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:53 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> No need to track the number of pages under I/O now that each
> extent_buffer is read and written using a single bio.  For the
> read side we need to grab an extra reference for the duration of
> the I/O to prevent eviction, though.

But don't we already have an atomic_inc_not_zero(&eb->refs) call in the 
old submit_eb_pages() function?

And I didn't find that function got modified in the series.

Thanks,
Qu

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 17 +++++------------
>   fs/btrfs/extent_io.h |  1 -
>   2 files changed, 5 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 16522bcf5d4a10..24df1247d81d88 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1770,8 +1770,6 @@ static void extent_buffer_write_end_io(struct btrfs_bio *bbio)
>   	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
>   		struct page *page = bvec->bv_page;
>   
> -		atomic_dec(&eb->io_pages);
> -
>   		if (!uptodate) {
>   			ClearPageUptodate(page);
>   			btrfs_page_set_error(fs_info, page, eb->start, eb->len);
> @@ -1796,7 +1794,6 @@ static void prepare_eb_write(struct extent_buffer *eb)
>   	unsigned long end;
>   
>   	clear_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags);
> -	atomic_set(&eb->io_pages, num_extent_pages(eb));
>   
>   	/* Set btree blocks beyond nritems with 0 to avoid stale content */
>   	nritems = btrfs_header_nritems(eb);
> @@ -3236,8 +3233,7 @@ static void __free_extent_buffer(struct extent_buffer *eb)
>   
>   static int extent_buffer_under_io(const struct extent_buffer *eb)
>   {
> -	return (atomic_read(&eb->io_pages) ||
> -		test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
> +	return (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags) ||
>   		test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>   }
>   
> @@ -3374,7 +3370,6 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>   
>   	spin_lock_init(&eb->refs_lock);
>   	atomic_set(&eb->refs, 1);
> -	atomic_set(&eb->io_pages, 0);
>   
>   	ASSERT(len <= BTRFS_MAX_METADATA_BLOCKSIZE);
>   
> @@ -3491,9 +3486,9 @@ static void check_buffer_tree_ref(struct extent_buffer *eb)
>   	 * adequately protected by the refcount, but the TREE_REF bit and
>   	 * its corresponding reference are not. To protect against this
>   	 * class of races, we call check_buffer_tree_ref from the codepaths
> -	 * which trigger io after they set eb->io_pages. Note that once io is
> -	 * initiated, TREE_REF can no longer be cleared, so that is the
> -	 * moment at which any such race is best fixed.
> +	 * which trigger io. Note that once io is initiated, TREE_REF can no
> +	 * longer be cleared, so that is the moment at which any such race is
> +	 * best fixed.
>   	 */
>   	refs = atomic_read(&eb->refs);
>   	if (refs >= 2 && test_bit(EXTENT_BUFFER_TREE_REF, &eb->bflags))
> @@ -4063,7 +4058,6 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
>   	struct bio_vec *bvec;
>   	u32 bio_offset = 0;
>   
> -	atomic_inc(&eb->refs);
>   	eb->read_mirror = bbio->mirror_num;
>   
>   	if (uptodate &&
> @@ -4078,7 +4072,6 @@ static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
>   	}
>   
>   	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
> -		atomic_dec(&eb->io_pages);
>   		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
>   			      bvec->bv_len);
>   		bio_offset += bvec->bv_len;
> @@ -4101,8 +4094,8 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
>   
>   	clear_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
>   	eb->read_mirror = 0;
> -	atomic_set(&eb->io_pages, num_pages);
>   	check_buffer_tree_ref(eb);
> +	atomic_inc(&eb->refs);
>   
>   	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
>   			       REQ_OP_READ | REQ_META,
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 342412d37a7b4b..12854a2b48f060 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -79,7 +79,6 @@ struct extent_buffer {
>   	struct btrfs_fs_info *fs_info;
>   	spinlock_t refs_lock;
>   	atomic_t refs;
> -	atomic_t io_pages;
>   	int read_mirror;
>   	struct rcu_head rcu_head;
>   	pid_t lock_owner;

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 14/20] btrfs: simplify btree block checksumming
  2023-03-09  9:05 ` [PATCH 14/20] btrfs: simplify btree block checksumming Christoph Hellwig
  2023-03-09 15:51   ` Johannes Thumshirn
@ 2023-03-10  8:57   ` Qu Wenruo
  1 sibling, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  8:57 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> The checksumming of btree blocks always operates on the entire
> extent_buffer, and because btree blocks are always allocated contiguously
> on disk they are never split by btrfs_submit_bio.
> 
> Simplify the checksumming code by finding the extent_buffer in the
> btrfs_bio private data instead of trying to search through the bio_vec.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Qu Wenruo <wqu@suse.com>

The whole idea of never merging metadata bios, and letting
btrfs_submit_bio() handle the split, really helps a lot to clean up the eb
grabbing.

Thanks,
Qu

> ---
>   fs/btrfs/disk-io.c | 121 ++++++++++-----------------------------------
>   1 file changed, 25 insertions(+), 96 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 6795acae476993..e007e15e1455e1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -312,12 +312,35 @@ int btrfs_read_extent_buffer(struct extent_buffer *eb,
>   	return ret;
>   }
>   
> -static int csum_one_extent_buffer(struct extent_buffer *eb)
> +/*
> + * Checksum a dirty tree block before IO.
> + */
> +blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
>   {
> +	struct extent_buffer *eb = bbio->private;
>   	struct btrfs_fs_info *fs_info = eb->fs_info;
> +	u64 found_start = btrfs_header_bytenr(eb);
>   	u8 result[BTRFS_CSUM_SIZE];
>   	int ret;
>   
> +	/*
> +	 * Btree blocks are always contiguous on disk.
> +	 */
> +	if (WARN_ON_ONCE(bbio->file_offset != eb->start))
> +		return BLK_STS_IOERR;
> +	if (WARN_ON_ONCE(bbio->bio.bi_iter.bi_size != eb->len))
> +		return BLK_STS_IOERR;
> +
> +	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
> +		WARN_ON_ONCE(found_start != 0);
> +		return BLK_STS_OK;
> +	}
> +
> +	if (WARN_ON_ONCE(found_start != eb->start))
> +		return BLK_STS_IOERR;
> +	if (WARN_ON_ONCE(!PageUptodate(eb->pages[0])))
> +		return BLK_STS_IOERR;
> +
>   	ASSERT(memcmp_extent_buffer(eb, fs_info->fs_devices->metadata_uuid,
>   				    offsetof(struct btrfs_header, fsid),
>   				    BTRFS_FSID_SIZE) == 0);
> @@ -344,8 +367,7 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>   		goto error;
>   	}
>   	write_extent_buffer(eb, result, 0, fs_info->csum_size);
> -
> -	return 0;
> +	return BLK_STS_OK;
>   
>   error:
>   	btrfs_print_tree(eb, 0);
> @@ -359,99 +381,6 @@ static int csum_one_extent_buffer(struct extent_buffer *eb)
>   	 */
>   	WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG) ||
>   		btrfs_header_owner(eb) == BTRFS_TREE_LOG_OBJECTID);
> -	return ret;
> -}
> -
> -/* Checksum all dirty extent buffers in one bio_vec */
> -static int csum_dirty_subpage_buffers(struct btrfs_fs_info *fs_info,
> -				      struct bio_vec *bvec)
> -{
> -	struct page *page = bvec->bv_page;
> -	u64 bvec_start = page_offset(page) + bvec->bv_offset;
> -	u64 cur;
> -	int ret = 0;
> -
> -	for (cur = bvec_start; cur < bvec_start + bvec->bv_len;
> -	     cur += fs_info->nodesize) {
> -		struct extent_buffer *eb;
> -		bool uptodate;
> -
> -		eb = find_extent_buffer(fs_info, cur);
> -		uptodate = btrfs_subpage_test_uptodate(fs_info, page, cur,
> -						       fs_info->nodesize);
> -
> -		/* A dirty eb shouldn't disappear from buffer_radix */
> -		if (WARN_ON(!eb))
> -			return -EUCLEAN;
> -
> -		if (WARN_ON(cur != btrfs_header_bytenr(eb))) {
> -			free_extent_buffer(eb);
> -			return -EUCLEAN;
> -		}
> -		if (WARN_ON(!uptodate)) {
> -			free_extent_buffer(eb);
> -			return -EUCLEAN;
> -		}
> -
> -		ret = csum_one_extent_buffer(eb);
> -		free_extent_buffer(eb);
> -		if (ret < 0)
> -			return ret;
> -	}
> -	return ret;
> -}
> -
> -/*
> - * Checksum a dirty tree block before IO.  This has extra checks to make sure
> - * we only fill in the checksum field in the first page of a multi-page block.
> - * For subpage extent buffers we need bvec to also read the offset in the page.
> - */
> -static int csum_dirty_buffer(struct btrfs_fs_info *fs_info, struct bio_vec *bvec)
> -{
> -	struct page *page = bvec->bv_page;
> -	u64 start = page_offset(page);
> -	u64 found_start;
> -	struct extent_buffer *eb;
> -
> -	if (fs_info->nodesize < PAGE_SIZE)
> -		return csum_dirty_subpage_buffers(fs_info, bvec);
> -
> -	eb = (struct extent_buffer *)page->private;
> -	if (page != eb->pages[0])
> -		return 0;
> -
> -	found_start = btrfs_header_bytenr(eb);
> -
> -	if (test_bit(EXTENT_BUFFER_NO_CHECK, &eb->bflags)) {
> -		WARN_ON(found_start != 0);
> -		return 0;
> -	}
> -
> -	/*
> -	 * Please do not consolidate these warnings into a single if.
> -	 * It is useful to know what went wrong.
> -	 */
> -	if (WARN_ON(found_start != start))
> -		return -EUCLEAN;
> -	if (WARN_ON(!PageUptodate(page)))
> -		return -EUCLEAN;
> -
> -	return csum_one_extent_buffer(eb);
> -}
> -
> -blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio)
> -{
> -	struct btrfs_fs_info *fs_info = bbio->inode->root->fs_info;
> -	struct bvec_iter iter;
> -	struct bio_vec bv;
> -	int ret = 0;
> -
> -	bio_for_each_segment(bv, &bbio->bio, iter) {
> -		ret = csum_dirty_buffer(fs_info, &bv);
> -		if (ret)
> -			break;
> -	}
> -
>   	return errno_to_blk_status(ret);
>   }
>   

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages
  2023-03-09  9:05 ` [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages Christoph Hellwig
  2023-03-09 16:10   ` Johannes Thumshirn
@ 2023-03-10  9:08   ` Qu Wenruo
  2023-03-10 11:54     ` Christoph Hellwig
  2023-03-14  6:12     ` Christoph Hellwig
  1 sibling, 2 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  9:08 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> The only place that reads in pages and thus marks them uptodate for
> the btree inode is read_extent_buffer_pages.  Which means that either
> pages are already uptodate from an old buffer when creating a new
> one in alloc_extent_buffer, or they will be updated by a call
> to read_extent_buffer_pages.  This means the checks for uptodate
> pages in read_extent_buffer_pages and read_extent_buffer_subpage are
> superfluous and can be removed.

Can we rely completely on the EXTENT_BUFFER_UPTODATE flag and get rid of
PageUptodate for metadata?

Or would PageUptodate be shared by some other common routines like
readahead?

Thanks,
Qu
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/extent_io.c | 21 +--------------------
>   1 file changed, 1 insertion(+), 20 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 88a941c64fc73f..73ea618282d466 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4138,10 +4138,7 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
>   			return ret;
>   	}
>   
> -	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ||
> -	    PageUptodate(page) ||
> -	    btrfs_subpage_test_uptodate(fs_info, page, eb->start, eb->len)) {
> -		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> +	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) {
>   		unlock_extent(io_tree, eb->start, eb->start + eb->len - 1,
>   			      &cached_state);
>   		return 0;
> @@ -4168,7 +4165,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   	int i;
>   	struct page *page;
>   	int locked_pages = 0;
> -	int all_uptodate = 1;
>   	int num_pages;
>   
>   	if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
> @@ -4203,21 +4199,6 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num,
>   		}
>   		locked_pages++;
>   	}
> -	/*
> -	 * We need to firstly lock all pages to make sure that
> -	 * the uptodate bit of our pages won't be affected by
> -	 * clear_extent_buffer_uptodate().
> -	 */
> -	for (i = 0; i < num_pages; i++) {
> -		page = eb->pages[i];
> -		if (!PageUptodate(page))
> -			all_uptodate = 0;
> -	}
> -
> -	if (all_uptodate) {
> -		set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> -		goto unlock_exit;
> -	}
>   
>   	__read_extent_buffer_pages(eb, mirror_num, check);
>   

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  8:15             ` Christoph Hellwig
@ 2023-03-10  9:14               ` Qu Wenruo
  0 siblings, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  9:14 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo
  Cc: Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2023/3/10 16:15, Christoph Hellwig wrote:
> On Fri, Mar 10, 2023 at 04:07:42PM +0800, Qu Wenruo wrote:
>> Yes, but you don't need to spend too much time on that.
>> We haven't hit such case for a long long time.
>>
>> Unless the fs is super old and never balanced by any currently supported
>> LTS kernel, it should be very rare to hit.
> 
> Well, if it is a valid format we'll need to handle it.  And we probably
> want a test case to exercise the code path to make sure it doesn't break
> when it is so rarely exercised.

Then we're already in big trouble:

- Fstests doesn't accept binary dumps

- We don't have any good way to create such slightly unaligned chunks
   Older progs may not even compile, and any currently supported LTS
   kernel won't create such a chunk at all...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler
  2023-03-09  9:05 ` [PATCH 07/20] btrfs: simplify the read_extent_buffer end_io handler Christoph Hellwig
  2023-03-09 13:08   ` Johannes Thumshirn
  2023-03-10  8:14   ` Qu Wenruo
@ 2023-03-10  9:30   ` Qu Wenruo
  2 siblings, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10  9:30 UTC (permalink / raw)
  To: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba; +Cc: linux-btrfs



On 2023/3/9 17:05, Christoph Hellwig wrote:
> Now that we always use a single bio to read an extent_buffer, the buffer
> can be passed to the end_io handler as private data.  This allows
> implementing a much simplified dedicated end I/O handler for metadata
> reads.

This greatly simplifies the behavior for subpage.

Looks pretty good.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   fs/btrfs/disk-io.c   | 105 +------------------------------------------
>   fs/btrfs/disk-io.h   |   5 +--
>   fs/btrfs/extent_io.c |  80 +++++++++++++++------------------
>   3 files changed, 41 insertions(+), 149 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index d03b431b07781c..6795acae476993 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -485,8 +485,8 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
>   }
>   
>   /* Do basic extent buffer checks at read time */
> -static int validate_extent_buffer(struct extent_buffer *eb,
> -				  struct btrfs_tree_parent_check *check)
> +int btrfs_validate_extent_buffer(struct extent_buffer *eb,
> +				 struct btrfs_tree_parent_check *check)
>   {
>   	struct btrfs_fs_info *fs_info = eb->fs_info;
>   	u64 found_start;
> @@ -599,107 +599,6 @@ static int validate_extent_buffer(struct extent_buffer *eb,
>   	return ret;
>   }
>   
> -static int validate_subpage_buffer(struct page *page, u64 start, u64 end,
> -				   int mirror, struct btrfs_tree_parent_check *check)
> -{
> -	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
> -	struct extent_buffer *eb;
> -	bool reads_done;
> -	int ret = 0;
> -
> -	ASSERT(check);
> -
> -	/*
> -	 * We don't allow bio merge for subpage metadata read, so we should
> -	 * only get one eb for each endio hook.
> -	 */
> -	ASSERT(end == start + fs_info->nodesize - 1);
> -	ASSERT(PagePrivate(page));
> -
> -	eb = find_extent_buffer(fs_info, start);
> -	/*
> -	 * When we are reading one tree block, eb must have been inserted into
> -	 * the radix tree. If not, something is wrong.
> -	 */
> -	ASSERT(eb);
> -
> -	reads_done = atomic_dec_and_test(&eb->io_pages);
> -	/* Subpage read must finish in page read */
> -	ASSERT(reads_done);
> -
> -	eb->read_mirror = mirror;
> -	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
> -		ret = -EIO;
> -		goto err;
> -	}
> -	ret = validate_extent_buffer(eb, check);
> -	if (ret < 0)
> -		goto err;
> -
> -	set_extent_buffer_uptodate(eb);
> -
> -	free_extent_buffer(eb);
> -	return ret;
> -err:
> -	/*
> -	 * end_bio_extent_readpage decrements io_pages in case of error,
> -	 * make sure it has something to decrement.
> -	 */
> -	atomic_inc(&eb->io_pages);
> -	clear_extent_buffer_uptodate(eb);
> -	free_extent_buffer(eb);
> -	return ret;
> -}
> -
> -int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
> -				   struct page *page, u64 start, u64 end,
> -				   int mirror)
> -{
> -	struct extent_buffer *eb;
> -	int ret = 0;
> -	int reads_done;
> -
> -	ASSERT(page->private);
> -
> -	if (btrfs_sb(page->mapping->host->i_sb)->nodesize < PAGE_SIZE)
> -		return validate_subpage_buffer(page, start, end, mirror,
> -					       &bbio->parent_check);
> -
> -	eb = (struct extent_buffer *)page->private;
> -
> -	/*
> -	 * The pending IO might have been the only thing that kept this buffer
> -	 * in memory.  Make sure we have a ref for all this other checks
> -	 */
> -	atomic_inc(&eb->refs);
> -
> -	reads_done = atomic_dec_and_test(&eb->io_pages);
> -	if (!reads_done)
> -		goto err;
> -
> -	eb->read_mirror = mirror;
> -	if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
> -		ret = -EIO;
> -		goto err;
> -	}
> -	ret = validate_extent_buffer(eb, &bbio->parent_check);
> -	if (!ret)
> -		set_extent_buffer_uptodate(eb);
> -err:
> -	if (ret) {
> -		/*
> -		 * our io error hook is going to dec the io pages
> -		 * again, we have to make sure it has something
> -		 * to decrement
> -		 */
> -		atomic_inc(&eb->io_pages);
> -		clear_extent_buffer_uptodate(eb);
> -	}
> -	free_extent_buffer(eb);
> -
> -	return ret;
> -}
> -
>   #ifdef CONFIG_MIGRATION
>   static int btree_migrate_folio(struct address_space *mapping,
>   		struct folio *dst, struct folio *src, enum migrate_mode mode)
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 4d577233011023..2923b5d7cfca0b 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -84,9 +84,8 @@ void btrfs_btree_balance_dirty(struct btrfs_fs_info *fs_info);
>   void btrfs_btree_balance_dirty_nodelay(struct btrfs_fs_info *fs_info);
>   void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info,
>   				 struct btrfs_root *root);
> -int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio,
> -				   struct page *page, u64 start, u64 end,
> -				   int mirror);
> +int btrfs_validate_extent_buffer(struct extent_buffer *eb,
> +				 struct btrfs_tree_parent_check *check);
>   #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
>   struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info);
>   #endif
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index d60a80572b8ba2..738fcf5cbc71d6 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -663,35 +663,6 @@ static void begin_page_read(struct btrfs_fs_info *fs_info, struct page *page)
>   	btrfs_subpage_start_reader(fs_info, page, page_offset(page), PAGE_SIZE);
>   }
>   
> -/*
> - * Find extent buffer for a givne bytenr.
> - *
> - * This is for end_bio_extent_readpage(), thus we can't do any unsafe locking
> - * in endio context.
> - */
> -static struct extent_buffer *find_extent_buffer_readpage(
> -		struct btrfs_fs_info *fs_info, struct page *page, u64 bytenr)
> -{
> -	struct extent_buffer *eb;
> -
> -	/*
> -	 * For regular sectorsize, we can use page->private to grab extent
> -	 * buffer
> -	 */
> -	if (fs_info->nodesize >= PAGE_SIZE) {
> -		ASSERT(PagePrivate(page) && page->private);
> -		return (struct extent_buffer *)page->private;
> -	}
> -
> -	/* For subpage case, we need to lookup buffer radix tree */
> -	rcu_read_lock();
> -	eb = radix_tree_lookup(&fs_info->buffer_radix,
> -			       bytenr >> fs_info->sectorsize_bits);
> -	rcu_read_unlock();
> -	ASSERT(eb);
> -	return eb;
> -}
> -
>   /*
>    * after a readpage IO is done, we need to:
>    * clear the uptodate bits on error
> @@ -713,7 +684,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
>   	 * larger than UINT_MAX, u32 here is enough.
>   	 */
>   	u32 bio_offset = 0;
> -	int mirror;
>   	struct bvec_iter_all iter_all;
>   
>   	ASSERT(!bio_flagged(bio, BIO_CLONED));
> @@ -753,11 +723,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
>   		end = start + bvec->bv_len - 1;
>   		len = bvec->bv_len;
>   
> -		mirror = bbio->mirror_num;
> -		if (uptodate && !is_data_inode(inode) &&
> -		    btrfs_validate_metadata_buffer(bbio, page, start, end, mirror))
> -			uptodate = false;
> -
>   		if (likely(uptodate)) {
>   			loff_t i_size = i_size_read(inode);
>   			pgoff_t end_index = i_size >> PAGE_SHIFT;
> @@ -778,13 +743,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
>   				zero_user_segment(page, zero_start,
>   						  offset_in_page(end) + 1);
>   			}
> -		} else if (!is_data_inode(inode)) {
> -			struct extent_buffer *eb;
> -
> -			eb = find_extent_buffer_readpage(fs_info, page, start);
> -			set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> -			eb->read_mirror = mirror;
> -			atomic_dec(&eb->io_pages);
>   		}
>   
>   		/* Update page status and unlock. */
> @@ -4219,6 +4177,42 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
>   	}
>   }
>   
> +static void extent_buffer_read_end_io(struct btrfs_bio *bbio)
> +{
> +	struct extent_buffer *eb = bbio->private;
> +	bool uptodate = !bbio->bio.bi_status;
> +	struct bvec_iter_all iter_all;
> +	struct bio_vec *bvec;
> +	u32 bio_offset = 0;
> +
> +	atomic_inc(&eb->refs);
> +	eb->read_mirror = bbio->mirror_num;
> +
> +	if (uptodate &&
> +	    btrfs_validate_extent_buffer(eb, &bbio->parent_check) < 0)
> +		uptodate = false;
> +
> +	if (uptodate) {
> +		set_extent_buffer_uptodate(eb);
> +	} else {
> +		clear_extent_buffer_uptodate(eb);
> +		set_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags);
> +	}
> +
> +	bio_for_each_segment_all(bvec, &bbio->bio, iter_all) {
> +		atomic_dec(&eb->io_pages);
> +		end_page_read(bvec->bv_page, uptodate, eb->start + bio_offset,
> +			      bvec->bv_len);
> +		bio_offset += bvec->bv_len;
> +	}
> +
> +	unlock_extent(&bbio->inode->io_tree, eb->start,
> +		      eb->start + bio_offset - 1, NULL);
> +	free_extent_buffer(eb);
> +
> +	bio_put(&bbio->bio);
> +}
> +
>   static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
>   				       struct btrfs_tree_parent_check *check)
>   {
> @@ -4233,7 +4227,7 @@ static void __read_extent_buffer_pages(struct extent_buffer *eb, int mirror_num,
>   	bbio = btrfs_bio_alloc(INLINE_EXTENT_BUFFER_PAGES,
>   			       REQ_OP_READ | REQ_META,
>   			       BTRFS_I(eb->fs_info->btree_inode),
> -			       end_bio_extent_readpage, NULL);
> +			       extent_buffer_read_end_io, eb);
>   	bbio->bio.bi_iter.bi_sector = eb->start >> SECTOR_SHIFT;
>   	bbio->file_offset = eb->start;
>   	memcpy(&bbio->parent_check, check, sizeof(*check));

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10  7:47     ` Christoph Hellwig
  2023-03-10  8:02       ` Qu Wenruo
@ 2023-03-10 10:54       ` Filipe Manana
  2023-03-10 11:12         ` Qu Wenruo
  1 sibling, 1 reply; 74+ messages in thread
From: Filipe Manana @ 2023-03-10 10:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Qu Wenruo, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 8:02 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Mar 10, 2023 at 03:42:04PM +0800, Qu Wenruo wrote:
> > This is the legacy left by older stripe boundary based bio split code.
> > (Please note that, metadata crossing stripe boundaries is not ideal and is
> > very rare nowadays, but we should still support it).
>
> How can metadata cross a stripe boundary?

Probably with mixed block groups (mkfs.btrfs -O mixed-bg) you can get
that, as block groups are used to allocate both data extents (variable
length) and metadata extents (fixed length).

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 05/20] btrfs: simplify extent buffer reading
  2023-03-10 10:54       ` Filipe Manana
@ 2023-03-10 11:12         ` Qu Wenruo
  0 siblings, 0 replies; 74+ messages in thread
From: Qu Wenruo @ 2023-03-10 11:12 UTC (permalink / raw)
  To: Filipe Manana, Christoph Hellwig
  Cc: Qu Wenruo, Chris Mason, Josef Bacik, David Sterba, linux-btrfs



On 2023/3/10 18:54, Filipe Manana wrote:
> On Fri, Mar 10, 2023 at 8:02 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Fri, Mar 10, 2023 at 03:42:04PM +0800, Qu Wenruo wrote:
>>> This is the legacy left by older stripe boundary based bio split code.
>>> (Please note that, metadata crossing stripe boundaries is not ideal and is
>>> very rare nowadays, but we should still support it).
>>
>> How can metadata cross a stripe boundary?
> 
> Probably with mixed block groups (mkfs.btrfs -O mixed-bg) you can get
> that, as block groups are used to allocate both data extents (variable
> length) and metadata extents (fixed length).

Mixed block groups require the nodesize to be the same as the sectorsize,
thus that's not possible.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 13/20] btrfs: simplify the extent_buffer write end_io handler
  2023-03-10  8:44   ` Qu Wenruo
@ 2023-03-10 11:47     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10 11:47 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:44:37PM +0800, Qu Wenruo wrote:
>> -		BUG_ON(!eb);
>> -		done = atomic_dec_and_test(&eb->io_pages);
>> +		atomic_dec(&eb->io_pages);
>>   -		if (bio->bi_status ||
>> -		    test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags)) {
>> +		if (!uptodate) {
>>   			ClearPageUptodate(page);
>
> There seems to be an existing bug in the endio function for subpage.
>
> We should call the btrfs_page_clear_uptodate() helper.
> Otherwise the subpage uptodate bitmap and the page uptodate flag would get
> out of sync.

Indeed.  I'll add a patch to fix this to the beginning of the series
for the next round.

>> +		if (fs_info->nodesize < PAGE_SIZE)
>> +			btrfs_subpage_clear_writeback(fs_info, page, eb->start,
>> +						      eb->len);
>> +		else
>> +			end_page_writeback(page);
>
>
> Here we can just call btrfs_clear_writeback(), which can handle both 
> subpage and regular cases.

Ok.
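
To spell out the dispatch being discussed, a rough sketch of a combined
helper; the name is made up here for illustration, and the two calls in the
body are simply the ones from the quoted hunk above:

	/*
	 * Illustrative only: clear writeback for the range covered by an
	 * extent_buffer, using the subpage bitmap when nodesize < PAGE_SIZE
	 * and the plain page flag otherwise.
	 */
	static void eb_clear_page_writeback(struct btrfs_fs_info *fs_info,
					    struct page *page, u64 start, u32 len)
	{
		if (fs_info->nodesize < PAGE_SIZE)
			btrfs_subpage_clear_writeback(fs_info, page, start, len);
		else
			end_page_writeback(page);
	}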

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 15/20] btrfs: remove the io_pages field in struct extent_buffer
  2023-03-10  8:53   ` Qu Wenruo
@ 2023-03-10 11:50     ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10 11:50 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 04:53:14PM +0800, Qu Wenruo wrote:
>
>
> On 2023/3/9 17:05, Christoph Hellwig wrote:
>> No need to track the number of pages under I/O now that each
>> extent_buffer is read and written using a single bio.  For the
>> read side we need to grab an extra reference for the duration of
>> the I/O to prevent eviction, though.
>
> But don't we already have an atomic_inc_not_zero(&eb->refs) call in the old 
> submit_eb_pages() function?

I guess you mean submit_eb_page?  That deals with writeback, so it doesn't
help with holding a reference during reads, which this patch adds.

> And I didn't find that function got modified in the series.

It doesn't have to.
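
For readers following the refcounting argument, a minimal sketch of the two
patterns being contrasted; the helper names are invented for illustration,
the fields and primitives are the ones used in the quoted code:

	/*
	 * Writeback side (submit_eb_page): the buffer may already be on its
	 * way out, so only proceed if a reference can still be taken.
	 */
	static bool eb_try_pin_for_writeback(struct extent_buffer *eb)
	{
		return atomic_inc_not_zero(&eb->refs);
	}

	/*
	 * Read side: the caller already holds a reference, so a plain
	 * increment pins the buffer for the duration of the read I/O; the
	 * end_io handler drops it again with free_extent_buffer().
	 */
	static void eb_pin_for_read(struct extent_buffer *eb)
	{
		atomic_inc(&eb->refs);
	}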

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages
  2023-03-10  9:08   ` Qu Wenruo
@ 2023-03-10 11:54     ` Christoph Hellwig
  2023-03-14  6:12     ` Christoph Hellwig
  1 sibling, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-10 11:54 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 05:08:17PM +0800, Qu Wenruo wrote:
>> The only place that reads in pages and thus marks them uptodate for
>> the btree inode is read_extent_buffer_pages.  Which means that either
>> pages are already uptodate from an old buffer when creating a new
>> one in alloc_extent_buffer, or they will be updated by a call
>> to read_extent_buffer_pages.  This means the checks for uptodate
>> pages in read_extent_buffer_pages and read_extent_buffer_subpage are
>> superfluous and can be removed.
>
> Can we rely completely on the EXTENT_BUFFER_UPTODATE flag and get rid of 
> PageUptodate for metadata?
>
> Or would PageUptodate be shared by some other common routines like 
> readahead?

That's a good question.  Not maintaining PageUptodate would simplify
a lot of things.  As btree_aops has no ->readpage or ->readahead,
the filemap read code is never used, which is the biggest dependency
on PageUptodate in the core pagecache code.

So right now I can't think of anything requiring it, but I'd have
to carefully look into that.
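
As a minimal sketch of what relying purely on the buffer-level flag would
look like at a check site; this is essentially what the existing
extent_buffer_uptodate() helper already does, so nothing new is introduced:

	static bool eb_is_uptodate(const struct extent_buffer *eb)
	{
		/* buffer-level state only, no page flags consulted */
		return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
	}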

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09 11:29   ` Johannes Thumshirn
  2023-03-09 15:21     ` Christoph Hellwig
@ 2023-03-14  6:09     ` Christoph Hellwig
  2023-03-17 23:16     ` David Sterba
  2 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-14  6:09 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> > -		if (!PageUptodate(page)) {
> > -			num_reads++;
> > +		if (!PageUptodate(page))
> >  			all_uptodate = 0;
> > -		}
> 
> I think we could also break out of the loop here. As soon as
> one page isn't uptodate we don't care about the fast path anymore.

FYI, I've decided to keep this as-is for the next version as the
code is gone by the end of the series anyway.
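
For completeness, the suggested early exit would look roughly like the
following sketch (not the actual patch, since the loop is removed later in
the series anyway):

	static bool eb_pages_all_uptodate(struct extent_buffer *eb, int num_pages)
	{
		int i;

		for (i = 0; i < num_pages; i++) {
			/* one non-uptodate page rules out the fast path */
			if (!PageUptodate(eb->pages[i]))
				return false;
		}
		return true;
	}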

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 17/20] btrfs: don't check for uptodate pages in read_extent_buffer_pages
  2023-03-10  9:08   ` Qu Wenruo
  2023-03-10 11:54     ` Christoph Hellwig
@ 2023-03-14  6:12     ` Christoph Hellwig
  1 sibling, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-14  6:12 UTC (permalink / raw)
  To: Qu Wenruo
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Fri, Mar 10, 2023 at 05:08:17PM +0800, Qu Wenruo wrote:
> Can we rely completely on the EXTENT_BUFFER_UPTODATE flag and get rid of 
> PageUptodate for metadata?

I looked into this a bit, and I think that, short of stopping to use the
page cache entirely (which I want to get to eventually), this causes
more trouble than it solves.  So I've skipped it for this series,
but eventually there will be another round of massive simplifications
here.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-09 11:29   ` Johannes Thumshirn
  2023-03-09 15:21     ` Christoph Hellwig
  2023-03-14  6:09     ` Christoph Hellwig
@ 2023-03-17 23:16     ` David Sterba
  2023-03-20  5:46       ` Christoph Hellwig
  2 siblings, 1 reply; 74+ messages in thread
From: David Sterba @ 2023-03-17 23:16 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Chris Mason, Josef Bacik, David Sterba, linux-btrfs

On Thu, Mar 09, 2023 at 11:29:54AM +0000, Johannes Thumshirn wrote:
> On 09.03.23 10:07, Christoph Hellwig wrote:
> > Currently read_extent_buffer_pages skips pages that are already uptodate
> > when reading in an extent_buffer.  While this reduces the amount of data
> > read, it increases the number of I/O operations as we now need to do
> > multiple I/Os when reading an extent buffer with one or more uptodate
> > pages in the middle of it.  On any modern storage device, be that hard
> > drives or SSDs this actually decreases I/O performance.  Fortunately
> > this case is pretty rare as the pages are always initially read together
> > and then aged the same way.  Besides simplifying the code a bit as-is
> > this will allow for major simplifications to the I/O completion handler
> > later on.
> > 
> > Note that the case where all pages are uptodate is still handled by an
> > optimized fast path that does not read any data from disk.
> 
> Can someone smarter than me explain why we need to iterate eb->pages four
> times in this function? This doesn't look super efficient to me.

I remember one bug that was solved by splitting the first loop into two:
one locking all the pages first, the other then checking the uptodate
status; the comment explains that.

2571e739677f ("Btrfs: fix memory leak in reading btree blocks")

As you can see in the changelog it's a bit of a convoluted race; the number
of loops should not matter if they're there for correctness reasons.  I
haven't checked the final code, but I doubt it's equivalent, and it may
introduce subtle bugs.
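
A rough sketch of the pattern that commit 2571e739677f introduced, as
described above; the name is illustrative and the trylock handling of the
real function is omitted:

	static bool eb_lock_pages_and_check_uptodate(struct extent_buffer *eb,
						     int num_pages)
	{
		bool all_uptodate = true;
		int i;

		/* first lock every page ... */
		for (i = 0; i < num_pages; i++)
			lock_page(eb->pages[i]);

		/*
		 * ... then sample the uptodate state with the locks held, so
		 * a racing read completion cannot change the answer between
		 * the check and the I/O accounting.
		 */
		for (i = 0; i < num_pages; i++) {
			if (!PageUptodate(eb->pages[i]))
				all_uptodate = false;
		}
		return all_uptodate;
	}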

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 04/20] btrfs: always read the entire extent_buffer
  2023-03-17 23:16     ` David Sterba
@ 2023-03-20  5:46       ` Christoph Hellwig
  0 siblings, 0 replies; 74+ messages in thread
From: Christoph Hellwig @ 2023-03-20  5:46 UTC (permalink / raw)
  To: David Sterba
  Cc: Johannes Thumshirn, Christoph Hellwig, Chris Mason, Josef Bacik,
	David Sterba, linux-btrfs

On Sat, Mar 18, 2023 at 12:16:09AM +0100, David Sterba wrote:
> I remember one bug that was solved by splitting the first loop into two,
> one locking all pages first and then checking the uptodate status, the
> comment is explaining that.
> 
> 2571e739677f ("Btrfs: fix memory leak in reading btree blocks")
> 
> As you can see in the changelog it's a bit convoluted race, the number
> of loops should not matter if they're there for correctness reasons. I
> haven't checked the final code but I doubt it's equivalent and may
> introduce subtle bugs.

The patch we're replying to would actually have fixed that bug on its
own.  The problem it solved was that num_pages could be smaller than the
actual number of pages in the extent_buffer.  With this change we always
read all pages and thus can't end up with a wrong ->io_pages.  In fact a
later patch removes ->io_pages entirely.  Even better, at the end of the
series the locking between different reads moves to the extent_buffer
level, so the above race is fundamentally fixed at a higher level and
there won't be an extra read at all.
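
As a sketch of what "locking at the extent_buffer level" means here; the
flag name and bit number are placeholders, not the identifiers used in the
series:

	/* placeholder flag bit; the series defines its own bit for this */
	#define EB_FLAG_READING		15

	static int eb_read_serialized(struct extent_buffer *eb)
	{
		if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
			return 0;

		if (test_and_set_bit(EB_FLAG_READING, &eb->bflags)) {
			/* another task is already reading this buffer */
			wait_on_bit_io(&eb->bflags, EB_FLAG_READING,
				       TASK_UNINTERRUPTIBLE);
			return test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags) ?
				0 : -EIO;
		}

		/*
		 * We won the race: submit the read bio here.  The end_io
		 * handler clears the flag and wakes the waiters, e.g.:
		 *
		 *	clear_bit(EB_FLAG_READING, &eb->bflags);
		 *	smp_mb__after_atomic();
		 *	wake_up_bit(&eb->bflags, EB_FLAG_READING);
		 */
		return 0;
	}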

^ permalink raw reply	[flat|nested] 74+ messages in thread
