All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases
@ 2022-04-12 12:30 Qu Wenruo
  2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-12 12:30 UTC (permalink / raw)
  To: linux-btrfs

[CHANGELOG]
v2:
- Make submit_one_bio() to return void just in the same patch

- Update the commit message the for 2nd patch
  To mention remaining error path (which is not really possible to
  trigger), to prove the first patch is already fixing all involved
  error paths.

[DESCRIPTION]
When testing my subpage raid56 support, generic/475 is always hanging
with some data page unable to be unlocked.

It turns out that, the hang is not related to the raid56 subpage
support (obviously, as the test case is not utilizing RAID56 at all),
but a recent commit 1784b7d502a9 ("btrfs: handle csum lookup errors
properly on reads") introduced a new error path, and it caught us by
surprise.

The new error path is from btrfs_lookup_bio_sums(), which can now return
error if the csum search failed.

This new error path exposed several problems:

- Double cleanup for submit_one_bio() and its callers
  Bio submission hooks, btrfs_submit_data_bio() and
  btrfs_submit_metadata_bio() will call endio to cleanup on errors.

  But those bio submission hooks will also return error, and
  finally callers of submit_extent_page() will also try to do
  cleanup.

  This will be fixed by the first patch, by always returning 0 for
  submit_one_bio().
  This fix is kept as minimal as possible, to make backport easier.
  The proper conversion to return void will be done in the last patch.

- btrfs_do_readpage() can leave page locked on error
  If submit_extent_page() failed in btrfs_do_readpage(), we only
  cleanup the current range, and leaving the remaining subpage
  range locked.

  This bug is subpage specific, and will not affect regular cases.

  Fix it by cleaning up all the remaining subpage range before
  exiting.

- __extent_writepage_io() can return 0 even it hit some error
  Although we continue writing the remaining ranges, we didn't save
  the first error, causing @ret to be overwritten.
 
  This bug is subpage specific, as for regular cases we only have one
  sector inside the page.

  Fix it by introducing @has_error and @saved_ret.

I manually checked all other submit_extent_page() callers, they all
look fine and won't cause problems like the above.

Finally since submit_one_bio() will always return 0, the final patch
will make it return void, which greatly makes our code cleaner.

But that patch is introducing quite some modifications, not a candidate
for backport, unlike the first 3 patches.

Special thanks to Josef, as my initial bisection points to his patch and
I have no clue why it can cause problems at all.
His hints immediately solved all my questions, and lead to this
patchset.


Qu Wenruo (3):
  btrfs: avoid double clean up when submit_one_bio() failed
  btrfs: fix the error handling for submit_extent_page() for
    btrfs_do_readpage()
  btrfs: return correct error number for __extent_writepage_io()

 fs/btrfs/extent_io.c | 135 ++++++++++++++++++-------------------------
 fs/btrfs/extent_io.h |   3 +-
 fs/btrfs/inode.c     |  13 ++---
 3 files changed, 62 insertions(+), 89 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-12 12:30 [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases Qu Wenruo
@ 2022-04-12 12:30 ` Qu Wenruo
  2022-04-12 20:41   ` David Sterba
  2022-04-15  6:28   ` Christoph Hellwig
  2022-04-12 12:30 ` [PATCH v2 2/3] btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-12 12:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: stable

[BUG]
When running generic/475 with 64K page size and 4K sector size, it has a
very high chance (almost 100%) to hang, with mostly data page locked but
no one is going to unlock it.

[CAUSE]
With commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
reads"), if we failed to lookup checksum due to metadata IO error, we
will return error for btrfs_submit_data_bio().

This will cause the page to be unlocked twice in btrfs_do_readpage():

 btrfs_do_readpage()
 |- submit_extent_page()
 |  |- submit_one_bio()
 |     |- btrfs_submit_data_bio()
 |        |- if (ret) {
 |        |-     bio->bi_status = ret;
 |        |-     bio_endio(bio); }
 |               In the endio function, we will call end_page_read()
 |               and unlock_extent() to cleanup the subpage range.
 |
 |- if (ret) {
 |-        unlock_extent(); end_page_read() }
           Here we unlock the extent and cleanup the subpage range
           again.

For unlock_extent(), it's mostly double unlock safe.

But for end_page_read(), it's not, especially for subpage case,
as for subpage case we will call btrfs_subpage_end_reader() to reduce
the reader number, and use that to number to determine if we need to
unlock the full page.

If double accounted, it can underflow the number and leave the page
locked without anyone to unlock it.

[FIX]
The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
reads") itself is completely fine, it's our existing code not properly
handling the error from bio submission hook properly.

This patch will make submit_one_bio() to return void so that the callers
will never be able to do cleanup when bio submission hook fails.

CC: stable@vger.kernel.org # 5.18+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 114 ++++++++++++++-----------------------------
 fs/btrfs/extent_io.h |   3 +-
 fs/btrfs/inode.c     |  13 +++--
 3 files changed, 44 insertions(+), 86 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 15a429e28174..163aa6dee987 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -166,24 +166,27 @@ static int add_extent_changeset(struct extent_state *state, u32 bits,
 	return ret;
 }
 
-int __must_check submit_one_bio(struct bio *bio, int mirror_num,
-				unsigned long bio_flags)
+void submit_one_bio(struct bio *bio, int mirror_num, unsigned long bio_flags)
 {
-	blk_status_t ret = 0;
 	struct extent_io_tree *tree = bio->bi_private;
 
 	bio->bi_private = NULL;
 
 	/* Caller should ensure the bio has at least some range added */
 	ASSERT(bio->bi_iter.bi_size);
+
 	if (is_data_inode(tree->private_data))
-		ret = btrfs_submit_data_bio(tree->private_data, bio, mirror_num,
+		btrfs_submit_data_bio(tree->private_data, bio, mirror_num,
 					    bio_flags);
 	else
-		ret = btrfs_submit_metadata_bio(tree->private_data, bio,
+		btrfs_submit_metadata_bio(tree->private_data, bio,
 						mirror_num, bio_flags);
-
-	return blk_status_to_errno(ret);
+	/*
+	 * Above submission hooks will handle the error by ending the bio,
+	 * which will do the cleanup properly.
+	 * So here we should not return any error, or the caller of
+	 * submit_extent_page() will do cleanup again, causing problems.
+	 */
 }
 
 /* Cleanup unsubmitted bios */
@@ -204,13 +207,12 @@ static void end_write_bio(struct extent_page_data *epd, int ret)
  * Return 0 if everything is OK.
  * Return <0 for error.
  */
-static int __must_check flush_write_bio(struct extent_page_data *epd)
+static void flush_write_bio(struct extent_page_data *epd)
 {
-	int ret = 0;
 	struct bio *bio = epd->bio_ctrl.bio;
 
 	if (bio) {
-		ret = submit_one_bio(bio, 0, 0);
+		submit_one_bio(bio, 0, 0);
 		/*
 		 * Clean up of epd->bio is handled by its endio function.
 		 * And endio is either triggered by successful bio execution
@@ -220,7 +222,6 @@ static int __must_check flush_write_bio(struct extent_page_data *epd)
 		 */
 		epd->bio_ctrl.bio = NULL;
 	}
-	return ret;
 }
 
 int __init extent_state_cache_init(void)
@@ -3441,10 +3442,8 @@ static int submit_extent_page(unsigned int opf,
 	ASSERT(pg_offset < PAGE_SIZE && size <= PAGE_SIZE &&
 	       pg_offset + size <= PAGE_SIZE);
 	if (force_bio_submit && bio_ctrl->bio) {
-		ret = submit_one_bio(bio_ctrl->bio, mirror_num, bio_ctrl->bio_flags);
+		submit_one_bio(bio_ctrl->bio, mirror_num, bio_ctrl->bio_flags);
 		bio_ctrl->bio = NULL;
-		if (ret < 0)
-			return ret;
 	}
 
 	while (cur < pg_offset + size) {
@@ -3485,11 +3484,9 @@ static int submit_extent_page(unsigned int opf,
 		if (added < size - offset) {
 			/* The bio should contain some page(s) */
 			ASSERT(bio_ctrl->bio->bi_iter.bi_size);
-			ret = submit_one_bio(bio_ctrl->bio, mirror_num,
+			submit_one_bio(bio_ctrl->bio, mirror_num,
 					bio_ctrl->bio_flags);
 			bio_ctrl->bio = NULL;
-			if (ret < 0)
-				return ret;
 		}
 		cur += added;
 	}
@@ -4237,14 +4234,12 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 			  struct extent_page_data *epd)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
-	int i, num_pages, failed_page_nr;
+	int i, num_pages;
 	int flush = 0;
 	int ret = 0;
 
 	if (!btrfs_try_tree_write_lock(eb)) {
-		ret = flush_write_bio(epd);
-		if (ret < 0)
-			return ret;
+		flush_write_bio(epd);
 		flush = 1;
 		btrfs_tree_lock(eb);
 	}
@@ -4254,9 +4249,7 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 		if (!epd->sync_io)
 			return 0;
 		if (!flush) {
-			ret = flush_write_bio(epd);
-			if (ret < 0)
-				return ret;
+			flush_write_bio(epd);
 			flush = 1;
 		}
 		while (1) {
@@ -4303,39 +4296,13 @@ static noinline_for_stack int lock_extent_buffer_for_io(struct extent_buffer *eb
 
 		if (!trylock_page(p)) {
 			if (!flush) {
-				int err;
-
-				err = flush_write_bio(epd);
-				if (err < 0) {
-					ret = err;
-					failed_page_nr = i;
-					goto err_unlock;
-				}
+				flush_write_bio(epd);
 				flush = 1;
 			}
 			lock_page(p);
 		}
 	}
 
-	return ret;
-err_unlock:
-	/* Unlock already locked pages */
-	for (i = 0; i < failed_page_nr; i++)
-		unlock_page(eb->pages[i]);
-	/*
-	 * Clear EXTENT_BUFFER_WRITEBACK and wake up anyone waiting on it.
-	 * Also set back EXTENT_BUFFER_DIRTY so future attempts to this eb can
-	 * be made and undo everything done before.
-	 */
-	btrfs_tree_lock(eb);
-	spin_lock(&eb->refs_lock);
-	set_bit(EXTENT_BUFFER_DIRTY, &eb->bflags);
-	end_extent_buffer_writeback(eb);
-	spin_unlock(&eb->refs_lock);
-	percpu_counter_add_batch(&fs_info->dirty_metadata_bytes, eb->len,
-				 fs_info->dirty_metadata_batch);
-	btrfs_clear_header_flag(eb, BTRFS_HEADER_FLAG_WRITTEN);
-	btrfs_tree_unlock(eb);
 	return ret;
 }
 
@@ -4957,13 +4924,19 @@ int btree_write_cache_pages(struct address_space *mapping,
 	 *   if the fs already has error.
 	 */
 	if (!BTRFS_FS_ERROR(fs_info)) {
-		ret = flush_write_bio(&epd);
+		flush_write_bio(&epd);
 	} else {
 		ret = -EROFS;
 		end_write_bio(&epd, ret);
 	}
 out:
 	btrfs_zoned_meta_io_unlock(fs_info);
+	/*
+	 * We can get ret > 0 from submit_extent_page() indicating how many ebs
+	 * were submitted. Reset it to 0 to avoid false alerts for the caller.
+	 */
+	if (ret > 0)
+		ret = 0;
 	return ret;
 }
 
@@ -5065,8 +5038,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			 * tmpfs file mapping
 			 */
 			if (!trylock_page(page)) {
-				ret = flush_write_bio(epd);
-				BUG_ON(ret < 0);
+				flush_write_bio(epd);
 				lock_page(page);
 			}
 
@@ -5076,10 +5048,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
 			}
 
 			if (wbc->sync_mode != WB_SYNC_NONE) {
-				if (PageWriteback(page)) {
-					ret = flush_write_bio(epd);
-					BUG_ON(ret < 0);
-				}
+				if (PageWriteback(page))
+					flush_write_bio(epd);
 				wait_on_page_writeback(page);
 			}
 
@@ -5119,9 +5089,8 @@ static int extent_write_cache_pages(struct address_space *mapping,
 		 * page in our current bio, and thus deadlock, so flush the
 		 * write bio here.
 		 */
-		ret = flush_write_bio(epd);
-		if (!ret)
-			goto retry;
+		flush_write_bio(epd);
+		goto retry;
 	}
 
 	if (wbc->range_cyclic || (wbc->nr_to_write > 0 && range_whole))
@@ -5147,8 +5116,7 @@ int extent_write_full_page(struct page *page, struct writeback_control *wbc)
 		return ret;
 	}
 
-	ret = flush_write_bio(&epd);
-	ASSERT(ret <= 0);
+	flush_write_bio(&epd);
 	return ret;
 }
 
@@ -5210,7 +5178,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
 	}
 
 	if (!found_error)
-		ret = flush_write_bio(&epd);
+		flush_write_bio(&epd);
 	else
 		end_write_bio(&epd, ret);
 
@@ -5243,7 +5211,7 @@ int extent_writepages(struct address_space *mapping,
 		end_write_bio(&epd, ret);
 		return ret;
 	}
-	ret = flush_write_bio(&epd);
+	flush_write_bio(&epd);
 	return ret;
 }
 
@@ -5266,10 +5234,8 @@ void extent_readahead(struct readahead_control *rac)
 	if (em_cached)
 		free_extent_map(em_cached);
 
-	if (bio_ctrl.bio) {
-		if (submit_one_bio(bio_ctrl.bio, 0, bio_ctrl.bio_flags))
-			return;
-	}
+	if (bio_ctrl.bio)
+		submit_one_bio(bio_ctrl.bio, 0, bio_ctrl.bio_flags);
 }
 
 /*
@@ -6649,12 +6615,8 @@ static int read_extent_buffer_subpage(struct extent_buffer *eb, int wait,
 		atomic_dec(&eb->io_pages);
 	}
 	if (bio_ctrl.bio) {
-		int tmp;
-
-		tmp = submit_one_bio(bio_ctrl.bio, mirror_num, 0);
+		submit_one_bio(bio_ctrl.bio, mirror_num, 0);
 		bio_ctrl.bio = NULL;
-		if (tmp < 0)
-			return tmp;
 	}
 	if (ret || wait != WAIT_COMPLETE)
 		return ret;
@@ -6767,10 +6729,8 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	}
 
 	if (bio_ctrl.bio) {
-		err = submit_one_bio(bio_ctrl.bio, mirror_num, bio_ctrl.bio_flags);
+		submit_one_bio(bio_ctrl.bio, mirror_num, bio_ctrl.bio_flags);
 		bio_ctrl.bio = NULL;
-		if (err)
-			return err;
 	}
 
 	if (ret || wait != WAIT_COMPLETE)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 05253612ce7b..9a283b2358b8 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -178,8 +178,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
 int try_release_extent_mapping(struct page *page, gfp_t mask);
 int try_release_extent_buffer(struct page *page);
 
-int __must_check submit_one_bio(struct bio *bio, int mirror_num,
-				unsigned long bio_flags);
+void submit_one_bio(struct bio *bio, int mirror_num, unsigned long bio_flags);
 int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 		      struct btrfs_bio_ctrl *bio_ctrl,
 		      unsigned int read_flags, u64 *prev_em_start);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 03fa33db1640..62fdcb7576ba 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8158,13 +8158,12 @@ int btrfs_readpage(struct file *file, struct page *page)
 	btrfs_lock_and_flush_ordered_range(inode, start, end, NULL);
 
 	ret = btrfs_do_readpage(page, NULL, &bio_ctrl, 0, NULL);
-	if (bio_ctrl.bio) {
-		int ret2;
-
-		ret2 = submit_one_bio(bio_ctrl.bio, 0, bio_ctrl.bio_flags);
-		if (ret == 0)
-			ret = ret2;
-	}
+	/*
+	 * If btrfs_do_readpage() failed we will want to submit the assembled
+	 * bio to do the cleanup.
+	 */
+	if (bio_ctrl.bio)
+		submit_one_bio(bio_ctrl.bio, 0, bio_ctrl.bio_flags);
 	return ret;
 }
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage()
  2022-04-12 12:30 [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases Qu Wenruo
  2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
@ 2022-04-12 12:30 ` Qu Wenruo
  2022-04-12 12:30 ` [PATCH v2 3/3] btrfs: return correct error number for __extent_writepage_io() Qu Wenruo
  2022-04-12 20:42 ` [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases David Sterba
  3 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-12 12:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: stable

[BUG]
Test case generic/475 have a very high chance (almost 100%) to hit a fs
hang, where a data page will never be unlocked and hang all later
operations.

[CAUSE]
In btrfs_do_readpage(), if we hit an error from submit_extent_page() we
will try to do the cleanup for our current io range, and exit.

This works fine for PAGE_SIZE == sectorsize cases, but not for subpage.

For subpage btrfs_do_readpage() will lock the full page first, which can
contain several different sectors and extents:

 btrfs_do_readpage()
 |- begin_page_read()
 |  |- btrfs_subpage_start_reader();
 |     Now the page will hage PAGE_SIZE / sectorsize reader pending,
 |     and the page is locked.
 |
 |- end_page_read() for different branches
 |  This function will reduce subpage readers, and when readers
 |  reach 0, it will unlock the page.

But when submit_extent_page() failed, we only cleanup the current
io range, while the remaining io range will never be cleaned up, and the
page remains locked forever.

[FIX]
Update the error handling of submit_extent_page() to cleanup all the
remaining subpage range before exiting the loop.

Please note that, now submit_extent_page() can only fail due to
sanity check in alloc_new_bio().

Thus regular IO errors are impossible to trigger the error path.

CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 163aa6dee987..990d8475ba31 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3774,8 +3774,12 @@ int btrfs_do_readpage(struct page *page, struct extent_map **em_cached,
 					 this_bio_flag,
 					 force_bio_submit);
 		if (ret) {
-			unlock_extent(tree, cur, cur + iosize - 1);
-			end_page_read(page, false, cur, iosize);
+			/*
+			 * We have to unlock the remaining range, or the page
+			 * will never be unlocked.
+			 */
+			unlock_extent(tree, cur, end);
+			end_page_read(page, false, cur, end + 1 - cur);
 			goto out;
 		}
 		cur = cur + iosize;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] btrfs: return correct error number for __extent_writepage_io()
  2022-04-12 12:30 [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases Qu Wenruo
  2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
  2022-04-12 12:30 ` [PATCH v2 2/3] btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() Qu Wenruo
@ 2022-04-12 12:30 ` Qu Wenruo
  2022-04-12 20:42 ` [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases David Sterba
  3 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-12 12:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: stable

[BUG]
If we hit an error from submit_extent_page() inside
__extent_writepage_io(), we could still return 0 to the caller, and
even trigger the warning in btrfs_page_assert_not_dirty().

[CAUSE]
In __extent_writepage_io(), if we hit an error from
submit_extent_page(), we will just clean up the range and continue.

This is completely fine for regular PAGE_SIZE == sectorsize, as we can
only hit one sector in one page, thus after the error we're ensured to
exit and @ret will be saved.

But for subpage case, we may have other dirty subpage range in the page,
and in the next loop, we may succeeded submitting the next range.

In that case, @ret will be overwritten, and we return 0 to the caller,
while we have hit some error.

[FIX]
Introduce @has_error and @saved_ret to record the first error we hit, so
we will never forget what error we hit.

CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 990d8475ba31..df4e78ff3b18 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3955,10 +3955,12 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	u64 extent_offset;
 	u64 block_start;
 	struct extent_map *em;
+	int saved_ret = 0;
 	int ret = 0;
 	int nr = 0;
 	u32 opf = REQ_OP_WRITE;
 	const unsigned int write_flags = wbc_to_write_flags(wbc);
+	bool has_error = false;
 	bool compressed;
 
 	ret = btrfs_writepage_cow_fixup(page);
@@ -4008,6 +4010,9 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 		if (IS_ERR(em)) {
 			btrfs_page_set_error(fs_info, page, cur, end - cur + 1);
 			ret = PTR_ERR_OR_ZERO(em);
+			has_error = true;
+			if (!saved_ret)
+				saved_ret = ret;
 			break;
 		}
 
@@ -4071,6 +4076,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 					 end_bio_extent_writepage,
 					 0, 0, false);
 		if (ret) {
+			has_error = true;
+			if (!saved_ret)
+				saved_ret = ret;
+
 			btrfs_page_set_error(fs_info, page, cur, iosize);
 			if (PageWriteback(page))
 				btrfs_page_clear_writeback(fs_info, page, cur,
@@ -4084,8 +4093,10 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode,
 	 * If we finish without problem, we should not only clear page dirty,
 	 * but also empty subpage dirty bits
 	 */
-	if (!ret)
+	if (!has_error)
 		btrfs_page_assert_not_dirty(fs_info, page);
+	else
+		ret = saved_ret;
 	*nr_ret = nr;
 	return ret;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
@ 2022-04-12 20:41   ` David Sterba
  2022-04-12 23:32     ` Qu Wenruo
  2022-04-15  6:28   ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-04-12 20:41 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, stable

On Tue, Apr 12, 2022 at 08:30:13PM +0800, Qu Wenruo wrote:
> [BUG]
> When running generic/475 with 64K page size and 4K sector size, it has a
> very high chance (almost 100%) to hang, with mostly data page locked but
> no one is going to unlock it.
> 
> [CAUSE]
> With commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> reads"), if we failed to lookup checksum due to metadata IO error, we
> will return error for btrfs_submit_data_bio().
> 
> This will cause the page to be unlocked twice in btrfs_do_readpage():
> 
>  btrfs_do_readpage()
>  |- submit_extent_page()
>  |  |- submit_one_bio()
>  |     |- btrfs_submit_data_bio()
>  |        |- if (ret) {
>  |        |-     bio->bi_status = ret;
>  |        |-     bio_endio(bio); }
>  |               In the endio function, we will call end_page_read()
>  |               and unlock_extent() to cleanup the subpage range.
>  |
>  |- if (ret) {
>  |-        unlock_extent(); end_page_read() }
>            Here we unlock the extent and cleanup the subpage range
>            again.
> 
> For unlock_extent(), it's mostly double unlock safe.
> 
> But for end_page_read(), it's not, especially for subpage case,
> as for subpage case we will call btrfs_subpage_end_reader() to reduce
> the reader number, and use that to number to determine if we need to
> unlock the full page.
> 
> If double accounted, it can underflow the number and leave the page
> locked without anyone to unlock it.
> 
> [FIX]
> The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> reads") itself is completely fine, it's our existing code not properly
> handling the error from bio submission hook properly.
> 
> This patch will make submit_one_bio() to return void so that the callers
> will never be able to do cleanup when bio submission hook fails.
> 
> CC: stable@vger.kernel.org # 5.18+

BTW stable tags are only for released kernels, if it's still in some rc
then Fixes: is appropriate.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases
  2022-04-12 12:30 [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases Qu Wenruo
                   ` (2 preceding siblings ...)
  2022-04-12 12:30 ` [PATCH v2 3/3] btrfs: return correct error number for __extent_writepage_io() Qu Wenruo
@ 2022-04-12 20:42 ` David Sterba
  2022-04-14 19:38   ` David Sterba
  3 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-04-12 20:42 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Apr 12, 2022 at 08:30:12PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Make submit_one_bio() to return void just in the same patch
> 
> - Update the commit message the for 2nd patch
>   To mention remaining error path (which is not really possible to
>   trigger), to prove the first patch is already fixing all involved
>   error paths.
> 
> [DESCRIPTION]
> When testing my subpage raid56 support, generic/475 is always hanging
> with some data page unable to be unlocked.
> 
> It turns out that, the hang is not related to the raid56 subpage
> support (obviously, as the test case is not utilizing RAID56 at all),
> but a recent commit 1784b7d502a9 ("btrfs: handle csum lookup errors
> properly on reads") introduced a new error path, and it caught us by
> surprise.
> 
> The new error path is from btrfs_lookup_bio_sums(), which can now return
> error if the csum search failed.
> 
> This new error path exposed several problems:
> 
> - Double cleanup for submit_one_bio() and its callers
>   Bio submission hooks, btrfs_submit_data_bio() and
>   btrfs_submit_metadata_bio() will call endio to cleanup on errors.
> 
>   But those bio submission hooks will also return error, and
>   finally callers of submit_extent_page() will also try to do
>   cleanup.
> 
>   This will be fixed by the first patch, by always returning 0 for
>   submit_one_bio().
>   This fix is kept as minimal as possible, to make backport easier.
>   The proper conversion to return void will be done in the last patch.
> 
> - btrfs_do_readpage() can leave page locked on error
>   If submit_extent_page() failed in btrfs_do_readpage(), we only
>   cleanup the current range, and leaving the remaining subpage
>   range locked.
> 
>   This bug is subpage specific, and will not affect regular cases.
> 
>   Fix it by cleaning up all the remaining subpage range before
>   exiting.
> 
> - __extent_writepage_io() can return 0 even it hit some error
>   Although we continue writing the remaining ranges, we didn't save
>   the first error, causing @ret to be overwritten.
>  
>   This bug is subpage specific, as for regular cases we only have one
>   sector inside the page.
> 
>   Fix it by introducing @has_error and @saved_ret.
> 
> I manually checked all other submit_extent_page() callers, they all
> look fine and won't cause problems like the above.
> 
> Finally since submit_one_bio() will always return 0, the final patch
> will make it return void, which greatly makes our code cleaner.
> 
> But that patch is introducing quite some modifications, not a candidate
> for backport, unlike the first 3 patches.
> 
> Special thanks to Josef, as my initial bisection points to his patch and
> I have no clue why it can cause problems at all.
> His hints immediately solved all my questions, and lead to this
> patchset.
> 
> 
> Qu Wenruo (3):
>   btrfs: avoid double clean up when submit_one_bio() failed
>   btrfs: fix the error handling for submit_extent_page() for
>     btrfs_do_readpage()
>   btrfs: return correct error number for __extent_writepage_io()

Added as topic branch to for-next, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-12 20:41   ` David Sterba
@ 2022-04-12 23:32     ` Qu Wenruo
  2022-04-13 13:46       ` David Sterba
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2022-04-12 23:32 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, stable



On 2022/4/13 04:41, David Sterba wrote:
> On Tue, Apr 12, 2022 at 08:30:13PM +0800, Qu Wenruo wrote:
>> [BUG]
>> When running generic/475 with 64K page size and 4K sector size, it has a
>> very high chance (almost 100%) to hang, with mostly data page locked but
>> no one is going to unlock it.
>>
>> [CAUSE]
>> With commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
>> reads"), if we failed to lookup checksum due to metadata IO error, we
>> will return error for btrfs_submit_data_bio().
>>
>> This will cause the page to be unlocked twice in btrfs_do_readpage():
>>
>>   btrfs_do_readpage()
>>   |- submit_extent_page()
>>   |  |- submit_one_bio()
>>   |     |- btrfs_submit_data_bio()
>>   |        |- if (ret) {
>>   |        |-     bio->bi_status = ret;
>>   |        |-     bio_endio(bio); }
>>   |               In the endio function, we will call end_page_read()
>>   |               and unlock_extent() to cleanup the subpage range.
>>   |
>>   |- if (ret) {
>>   |-        unlock_extent(); end_page_read() }
>>             Here we unlock the extent and cleanup the subpage range
>>             again.
>>
>> For unlock_extent(), it's mostly double unlock safe.
>>
>> But for end_page_read(), it's not, especially for subpage case,
>> as for subpage case we will call btrfs_subpage_end_reader() to reduce
>> the reader number, and use that to number to determine if we need to
>> unlock the full page.
>>
>> If double accounted, it can underflow the number and leave the page
>> locked without anyone to unlock it.
>>
>> [FIX]
>> The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
>> reads") itself is completely fine, it's our existing code not properly
>> handling the error from bio submission hook properly.
>>
>> This patch will make submit_one_bio() to return void so that the callers
>> will never be able to do cleanup when bio submission hook fails.
>>
>> CC: stable@vger.kernel.org # 5.18+
>
> BTW stable tags are only for released kernels, if it's still in some rc
> then Fixes: is appropriate.

The problem is I don't have a good idea on which commit to be fixed.

Commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
reads") is completely fine by itself.

The problem is there for a long long time, but can only be triggered
with IO errors with that newer commit.

Should we really use that commit? It looks like a scapegoat to me...

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-12 23:32     ` Qu Wenruo
@ 2022-04-13 13:46       ` David Sterba
  2022-04-13 23:23         ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: David Sterba @ 2022-04-13 13:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, stable

On Wed, Apr 13, 2022 at 07:32:41AM +0800, Qu Wenruo wrote:
> On 2022/4/13 04:41, David Sterba wrote:
> > On Tue, Apr 12, 2022 at 08:30:13PM +0800, Qu Wenruo wrote:
> >> The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> >> reads") itself is completely fine, it's our existing code not properly
> >> handling the error from bio submission hook properly.
> >>
> >> This patch will make submit_one_bio() to return void so that the callers
> >> will never be able to do cleanup when bio submission hook fails.
> >>
> >> CC: stable@vger.kernel.org # 5.18+
> >
> > BTW stable tags are only for released kernels, if it's still in some rc
> > then Fixes: is appropriate.
> 
> The problem is I don't have a good idea on which commit to be fixed.
> 
> Commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
> reads") is completely fine by itself.
> 
> The problem is there for a long long time, but can only be triggered
> with IO errors with that newer commit.
> 
> Should we really use that commit? It looks like a scapegoat to me...

I see, so it does not make sense to put Fixes: if it's not clearly
caused by the patch, the text description of the problem and references
to patches that could affect is OK.

Still the stable tag should reflect where the fix applies but 5.18
hasn't been released so either it's a typo or you know roughly which
stable kernels should get the fix (5.15, 5.10, etc).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-13 13:46       ` David Sterba
@ 2022-04-13 23:23         ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-13 23:23 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, stable



On 2022/4/13 21:46, David Sterba wrote:
> On Wed, Apr 13, 2022 at 07:32:41AM +0800, Qu Wenruo wrote:
>> On 2022/4/13 04:41, David Sterba wrote:
>>> On Tue, Apr 12, 2022 at 08:30:13PM +0800, Qu Wenruo wrote:
>>>> The commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
>>>> reads") itself is completely fine, it's our existing code not properly
>>>> handling the error from bio submission hook properly.
>>>>
>>>> This patch will make submit_one_bio() to return void so that the callers
>>>> will never be able to do cleanup when bio submission hook fails.
>>>>
>>>> CC: stable@vger.kernel.org # 5.18+
>>>
>>> BTW stable tags are only for released kernels, if it's still in some rc
>>> then Fixes: is appropriate.
>>
>> The problem is I don't have a good idea on which commit to be fixed.
>>
>> Commit 1784b7d502a9 ("btrfs: handle csum lookup errors properly on
>> reads") is completely fine by itself.
>>
>> The problem is there for a long long time, but can only be triggered
>> with IO errors with that newer commit.
>>
>> Should we really use that commit? It looks like a scapegoat to me...
>
> I see, so it does not make sense to put Fixes: if it's not clearly
> caused by the patch, the text description of the problem and references
> to patches that could affect is OK.
>
> Still the stable tag should reflect where the fix applies but 5.18
> hasn't been released so either it's a typo or you know roughly which
> stable kernels should get the fix (5.15, 5.10, etc).

Then I guess we can drop the stable tag.

Before that mentioned commit, btrfs_lookup_bio_csum() will never return
error, thus submit_one_bio() will not really fail (due to IO error).

Although the error path is there for a long long time, we don't have
easy way to trigger the problem.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases
  2022-04-12 20:42 ` [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases David Sterba
@ 2022-04-14 19:38   ` David Sterba
  0 siblings, 0 replies; 15+ messages in thread
From: David Sterba @ 2022-04-14 19:38 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs

On Tue, Apr 12, 2022 at 10:42:27PM +0200, David Sterba wrote:
> On Tue, Apr 12, 2022 at 08:30:12PM +0800, Qu Wenruo wrote:
> > [CHANGELOG]
> > v2:
> > - Make submit_one_bio() to return void just in the same patch
> > 
> > - Update the commit message the for 2nd patch
> >   To mention remaining error path (which is not really possible to
> >   trigger), to prove the first patch is already fixing all involved
> >   error paths.
> > 
> > [DESCRIPTION]
> > When testing my subpage raid56 support, generic/475 is always hanging
> > with some data page unable to be unlocked.
> > 
> > It turns out that, the hang is not related to the raid56 subpage
> > support (obviously, as the test case is not utilizing RAID56 at all),
> > but a recent commit 1784b7d502a9 ("btrfs: handle csum lookup errors
> > properly on reads") introduced a new error path, and it caught us by
> > surprise.
> > 
> > The new error path is from btrfs_lookup_bio_sums(), which can now return
> > error if the csum search failed.
> > 
> > This new error path exposed several problems:
> > 
> > - Double cleanup for submit_one_bio() and its callers
> >   Bio submission hooks, btrfs_submit_data_bio() and
> >   btrfs_submit_metadata_bio() will call endio to cleanup on errors.
> > 
> >   But those bio submission hooks will also return error, and
> >   finally callers of submit_extent_page() will also try to do
> >   cleanup.
> > 
> >   This will be fixed by the first patch, by always returning 0 for
> >   submit_one_bio().
> >   This fix is kept as minimal as possible, to make backport easier.
> >   The proper conversion to return void will be done in the last patch.
> > 
> > - btrfs_do_readpage() can leave page locked on error
> >   If submit_extent_page() failed in btrfs_do_readpage(), we only
> >   cleanup the current range, and leaving the remaining subpage
> >   range locked.
> > 
> >   This bug is subpage specific, and will not affect regular cases.
> > 
> >   Fix it by cleaning up all the remaining subpage range before
> >   exiting.
> > 
> > - __extent_writepage_io() can return 0 even it hit some error
> >   Although we continue writing the remaining ranges, we didn't save
> >   the first error, causing @ret to be overwritten.
> >  
> >   This bug is subpage specific, as for regular cases we only have one
> >   sector inside the page.
> > 
> >   Fix it by introducing @has_error and @saved_ret.
> > 
> > I manually checked all other submit_extent_page() callers, they all
> > look fine and won't cause problems like the above.
> > 
> > Finally since submit_one_bio() will always return 0, the final patch
> > will make it return void, which greatly makes our code cleaner.
> > 
> > But that patch is introducing quite some modifications, not a candidate
> > for backport, unlike the first 3 patches.
> > 
> > Special thanks to Josef, as my initial bisection points to his patch and
> > I have no clue why it can cause problems at all.
> > His hints immediately solved all my questions, and lead to this
> > patchset.
> > 
> > 
> > Qu Wenruo (3):
> >   btrfs: avoid double clean up when submit_one_bio() failed
> >   btrfs: fix the error handling for submit_extent_page() for
> >     btrfs_do_readpage()
> >   btrfs: return correct error number for __extent_writepage_io()
> 
> Added as topic branch to for-next, thanks.

Moved to misc-next, thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
  2022-04-12 20:41   ` David Sterba
@ 2022-04-15  6:28   ` Christoph Hellwig
  2022-04-15  7:02     ` Qu Wenruo
  1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2022-04-15  6:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, stable

Btw, btrfs_submit_compressed_write also seems to do some double cleanups,
even if the pattern is slightly different as the bio is allocated inside
btrfs_submit_compressed_write itself.  Can someone who is more familiar
with that code look into that?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-15  6:28   ` Christoph Hellwig
@ 2022-04-15  7:02     ` Qu Wenruo
  2022-04-15  7:12       ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2022-04-15  7:02 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs, stable



On 2022/4/15 14:28, Christoph Hellwig wrote:
> Btw, btrfs_submit_compressed_write also seems to do some double cleanups,
> even if the pattern is slightly different as the bio is allocated inside
> btrfs_submit_compressed_write itself.  Can someone who is more familiar
> with that code look into that?

I just checked the code, it's indeed causing problems.

If btrfs_csum_one_bio() or submit_compressed_bio() failed (either ENOMEM
or failed some sanity checks), then we have one bio for writing the
compressed data back to disk.

Finish_cb label will call endio on it, which will call:
  -> end_compressed_bio_write()
     -> finish_compressed_bio() (this needs the compressed write bio not
be split)

Then finish_cb tag will also call finish_compressed_bio() directly.
Double freeing cb, and double clearing writeback flags.

The only relief is, regular EIO won't trigger the bug for write path.


But this can not be said to btrfs_submit_compressed_read(), which has
the same problem and can be triggered by EIO error easily.

Do you want to give it a try? Or mind to me fix it?

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-15  7:02     ` Qu Wenruo
@ 2022-04-15  7:12       ` Christoph Hellwig
  2022-04-15  7:14         ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2022-04-15  7:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, stable

On Fri, Apr 15, 2022 at 03:02:41PM +0800, Qu Wenruo wrote:
> But this can not be said to btrfs_submit_compressed_read(), which has
> the same problem and can be triggered by EIO error easily.
> 
> Do you want to give it a try? Or mind to me fix it?

I don't really know the btrfs writeback and compression code very well,
so if you can tackle it that would be great.  I'll review it and will
ask lots of stupid question in exchange :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-15  7:12       ` Christoph Hellwig
@ 2022-04-15  7:14         ` Qu Wenruo
  2022-04-24 23:26           ` Qu Wenruo
  0 siblings, 1 reply; 15+ messages in thread
From: Qu Wenruo @ 2022-04-15  7:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, stable



On 2022/4/15 15:12, Christoph Hellwig wrote:
> On Fri, Apr 15, 2022 at 03:02:41PM +0800, Qu Wenruo wrote:
>> But this can not be said to btrfs_submit_compressed_read(), which has
>> the same problem and can be triggered by EIO error easily.
>>
>> Do you want to give it a try? Or mind to me fix it?
>
> I don't really know the btrfs writeback and compression code very well,
> so if you can tackle it that would be great.  I'll review it and will
> ask lots of stupid question in exchange :)

No stupid questions at all.

Great we have some extra reviewing eyes!

Thanks for your review in advance.
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed
  2022-04-15  7:14         ` Qu Wenruo
@ 2022-04-24 23:26           ` Qu Wenruo
  0 siblings, 0 replies; 15+ messages in thread
From: Qu Wenruo @ 2022-04-24 23:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, stable



On 2022/4/15 15:14, Qu Wenruo wrote:
>
>
> On 2022/4/15 15:12, Christoph Hellwig wrote:
>> On Fri, Apr 15, 2022 at 03:02:41PM +0800, Qu Wenruo wrote:
>>> But this can not be said to btrfs_submit_compressed_read(), which has
>>> the same problem and can be triggered by EIO error easily.
>>>
>>> Do you want to give it a try? Or mind to me fix it?
>>
>> I don't really know the btrfs writeback and compression code very well,
>> so if you can tackle it that would be great.  I'll review it and will
>> ask lots of stupid question in exchange :)
>
> No stupid questions at all.
>
> Great we have some extra reviewing eyes!
>
> Thanks for your review in advance.
> Qu

OK, it turns out things are better than we thought.

For btrfs_submit_compressed_read() and btrfs_submit_compressed_write(),
we have a different (and correct) way handling the endio.

Let's look at btrfs_submit_compressed_read() as an example.

If functions like btrfs_lookup_bio_sums() failed, although we go
finish_cb: label, our @cur_disk_byte is already updated.

Then under finish_cb: label, we first finish the current @comp_bio,
which may:

1. Decrease cb::pending_bytes and it reached 0.
    Call finish_compressed_bio_read() as we're the last pending bytes.

2. Just decrease cb::pending_bytes
    There are still other pending ios.

For case 1, our @cur_disk_bytnr has already reached our range end, thus
we won't do anything but exit early, without manually calling
finish_compressed_bio_read().

For case 2, we continue to wait_var_event() first, to wait all bios
on-the-fly to finish.
As since the pending_sectors will never reach 0, no one is going to
finish the @cb.

Then we're safe to call finish_compressed_bio_read() then.


In fact, those are exact what I fixed in commit 6853c64a6e76 ("btrfs:
handle errors properly inside btrfs_submit_compressed_write()") and
86ccbb4d2a2a ("btrfs: handle errors properly inside
btrfs_submit_compressed_read()").

In fact, when I saw the overkilled usage of ASSERT()s, I should know
it's myself...

So in fact we're safe since v5.16.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-04-24 23:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12 12:30 [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases Qu Wenruo
2022-04-12 12:30 ` [PATCH v2 1/3] btrfs: avoid double clean up when submit_one_bio() failed Qu Wenruo
2022-04-12 20:41   ` David Sterba
2022-04-12 23:32     ` Qu Wenruo
2022-04-13 13:46       ` David Sterba
2022-04-13 23:23         ` Qu Wenruo
2022-04-15  6:28   ` Christoph Hellwig
2022-04-15  7:02     ` Qu Wenruo
2022-04-15  7:12       ` Christoph Hellwig
2022-04-15  7:14         ` Qu Wenruo
2022-04-24 23:26           ` Qu Wenruo
2022-04-12 12:30 ` [PATCH v2 2/3] btrfs: fix the error handling for submit_extent_page() for btrfs_do_readpage() Qu Wenruo
2022-04-12 12:30 ` [PATCH v2 3/3] btrfs: return correct error number for __extent_writepage_io() Qu Wenruo
2022-04-12 20:42 ` [PATCH v2 0/3] btrfs: vairous bug fixes related to generic/475 failure with subpage cases David Sterba
2022-04-14 19:38   ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.