All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/8] Fix error handling on data bio submission
@ 2022-02-18 15:03 Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 1/8] btrfs: make search_csum_tree return 0 if we get -EFBIG Josef Bacik
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

v1->v2:
- addressed the various comments, expanded on the comment about NODATASUM and
  the DATA_RELOC inode since I had to look up why we even had that there.
- Added the various reviewed-by's on the patches I didn't touch.

--- Original email ---

Internally we tried to enable a remediation to automatically re-provision
machines that had checksum errors.  This was based on dmesg scanning, however we
discovered that we were getting transienty csum error messages.  This came down
to getting a transient ENOMEM while trying to lookup checksums while doing a
data read (this was on memory constrained containers).

What we were doing was simply acting like there wasn't a checksum there, which
would print a scary message about missing checksums.  And then we'd do the read,
but because we didn't have a checksum we'd complain about a checksum mismatch.
Neither of these things were actually what was happening, we simply got an EIO
while looking up the checksums.

Fix this by properly returning an error and erroring out the BIO with the
correct error.  This is actually correct, it allows us to skip the IO and also
not erroneously tell the user that their checksums are invalid.

While testing this fix however I uncovered a variety of problems with our error
handling when we submit.  So the first two patches are to fix the main problem I
wanted to fix, and the next 6 are to fix problems that happen when injecting
errors into the checksum lookup path.

With these patches I'm no longer getting csum mismatch errors when I fail to
lookup csums, and I'm also able to survive a xfstests run while randomly
injecting errors into this path.  Thanks,

Josef

Josef Bacik (8):
  btrfs: make search_csum_tree return 0 if we get -EFBIG
  btrfs: handle csum lookup errors properly on reads
  btrfs: check correct bio in finish_compressed_bio_read
  btrfs: remove the bio argument from finish_compressed_bio_read
  btrfs: track compressed bio errors as blk_status_t
  btrfs: do not double complete bio on errors during compressed reads
  btrfs: do not try to repair bio that has no mirror set
  btrfs: do not clean up repair bio if submit fails

 fs/btrfs/compression.c | 58 ++++++++++++++++++++++++------------------
 fs/btrfs/compression.h |  2 +-
 fs/btrfs/extent_io.c   | 25 ++++++++++++------
 fs/btrfs/file-item.c   | 43 ++++++++++++++++++++-----------
 fs/btrfs/inode.c       | 12 ++++++---
 5 files changed, 87 insertions(+), 53 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/8] btrfs: make search_csum_tree return 0 if we get -EFBIG
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads Josef Bacik
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Boris Burkov, Johannes Thumshirn

We can either fail to find a csum entry at all and return -ENOENT, or we
can find a range that is close, but return -EFBIG.  In essence these
both mean the same thing when we are doing a lookup for a csum in an
existing range, we didn't find a csum.  We want to treat both of these
errors the same way, complain loudly that there wasn't a csum.  This
currently happens anyway because we do

	count = search_csum_tree();
	if (count <= 0) {
		// reloc and error handling
	}

however it forces us to incorrectly treat EIO or ENOMEM errors as on
disk corruption.  Fix this by returning 0 if we get either -ENOENT or
-EFBIG from btrfs_lookup_csum() so we can do proper error handling.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file-item.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 9a3de652ada8..efb24cc0b083 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -305,7 +305,7 @@ static int search_csum_tree(struct btrfs_fs_info *fs_info,
 	read_extent_buffer(path->nodes[0], dst, (unsigned long)item,
 			ret * csum_size);
 out:
-	if (ret == -ENOENT)
+	if (ret == -ENOENT || ret == -EFBIG)
 		ret = 0;
 	return ret;
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 1/8] btrfs: make search_csum_tree return 0 if we get -EFBIG Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 16:17   ` David Sterba
  2022-02-18 15:03 ` [PATCH v2 3/8] btrfs: check correct bio in finish_compressed_bio_read Josef Bacik
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Currently any error we get while trying to lookup csums during reads
shows up as a missing csum, and then on the read completion side we spit
out an error saying there was a csum mismatch and we increase the device
corruption count.

However we could have gotten an EIO from the lookup.  We could also be
inside of a memory constrained container and gotten a ENOMEM while
trying to do the read.  In either case we don't want to make this look
like a file system corruption problem, we want to make it look like the
actual error it is.  Capture any negative value, convert it to the
appropriate blk_sts_t, free the csum array if we have one and bail.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/file-item.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index efb24cc0b083..1ce5988297c3 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -368,6 +368,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
+	struct btrfs_bio *bbio = NULL;
 	struct btrfs_path *path;
 	const u32 sectorsize = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
@@ -377,6 +378,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	u8 *csum;
 	const unsigned int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
+	blk_status_t ret = BLK_STS_OK;
 
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
@@ -400,7 +402,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 		return BLK_STS_RESOURCE;
 
 	if (!dst) {
-		struct btrfs_bio *bbio = btrfs_bio(bio);
+		bbio = btrfs_bio(bio);
 
 		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
 			bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
@@ -456,21 +458,32 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 
 		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
 					 search_len, csum_dst);
-		if (count <= 0) {
-			/*
-			 * Either we hit a critical error or we didn't find
-			 * the csum.
-			 * Either way, we put zero into the csums dst, and skip
-			 * to the next sector.
-			 */
+		if (count < 0) {
+			ret = errno_to_blk_status(count);
+			if (bbio)
+				btrfs_bio_free_csum(bbio);
+			break;
+		}
+
+		/*
+		 * We didn't find a csum for this range.  We need to make sure
+		 * we complain loudly about this, because we are not NODATASUM.
+		 *
+		 * However for the DATA_RELOC inode we could potentially be
+		 * relocating data extents for a NODATASUM inode, so the inode
+		 * itself won't be marked with NODATASUM, but the extent we're
+		 * copying is in fact NODATASUM.  If we don't find a csum we
+		 * assume this is the case.
+		 *
+		 * TODO: make the relocation code look up the owning inode and
+		 * see if it's marked as NODATASUM and set EXTENT_NODATASUM
+		 * there, that way if there's corruption and there isn't a
+		 * checksum when we want it we can fail here rather than later.
+		 */
+		if (count == 0) {
 			memset(csum_dst, 0, csum_size);
 			count = 1;
 
-			/*
-			 * For data reloc inode, we need to mark the range
-			 * NODATASUM so that balance won't report false csum
-			 * error.
-			 */
 			if (BTRFS_I(inode)->root->root_key.objectid ==
 			    BTRFS_DATA_RELOC_TREE_OBJECTID) {
 				u64 file_offset;
@@ -491,7 +504,7 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	}
 
 	btrfs_free_path(path);
-	return BLK_STS_OK;
+	return ret;
 }
 
 int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/8] btrfs: check correct bio in finish_compressed_bio_read
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 1/8] btrfs: make search_csum_tree return 0 if we get -EFBIG Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 4/8] btrfs: remove the bio argument from finish_compressed_bio_read Josef Bacik
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Boris Burkov, Johannes Thumshirn

Commit c09abff87f90 ("btrfs: cloned bios must not be iterated by
bio_for_each_segment_all") added ASSERT()'s to make sure we weren't
calling bio_for_each_segment_all() on a RAID5/6 bio.  However it was
checking the bio that the compression code passed in, not the
cb->orig_bio that we actually iterate over, so adjust this ASSERT() to
check the correct bio.

Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/compression.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 71e5b2e9a1ba..a9d458305a08 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -259,7 +259,7 @@ static void finish_compressed_bio_read(struct compressed_bio *cb, struct bio *bi
 		 * We have verified the checksum already, set page checked so
 		 * the end_io handlers know about it
 		 */
-		ASSERT(!bio_flagged(bio, BIO_CLONED));
+		ASSERT(!bio_flagged(cb->orig_bio, BIO_CLONED));
 		bio_for_each_segment_all(bvec, cb->orig_bio, iter_all) {
 			u64 bvec_start = page_offset(bvec->bv_page) +
 					 bvec->bv_offset;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/8] btrfs: remove the bio argument from finish_compressed_bio_read
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (2 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 3/8] btrfs: check correct bio in finish_compressed_bio_read Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 5/8] btrfs: track compressed bio errors as blk_status_t Josef Bacik
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Johannes Thumshirn

This bio is usually one of the compressed bio's, and we don't actually
need it in this function, so remove the argument and stop passing it
around.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/compression.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index a9d458305a08..0112fba345d4 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -234,7 +234,7 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb, struct bio *b
 	return last_io;
 }
 
-static void finish_compressed_bio_read(struct compressed_bio *cb, struct bio *bio)
+static void finish_compressed_bio_read(struct compressed_bio *cb)
 {
 	unsigned int index;
 	struct page *page;
@@ -253,8 +253,6 @@ static void finish_compressed_bio_read(struct compressed_bio *cb, struct bio *bi
 		struct bio_vec *bvec;
 		struct bvec_iter_all iter_all;
 
-		ASSERT(bio);
-		ASSERT(!bio->bi_status);
 		/*
 		 * We have verified the checksum already, set page checked so
 		 * the end_io handlers know about it
@@ -325,7 +323,7 @@ static void end_compressed_bio_read(struct bio *bio)
 csum_failed:
 	if (ret)
 		cb->errors = 1;
-	finish_compressed_bio_read(cb, bio);
+	finish_compressed_bio_read(cb);
 out:
 	bio_put(bio);
 }
@@ -970,7 +968,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	 */
 	ASSERT(refcount_read(&cb->pending_sectors));
 	/* Now we are the only one referring @cb, can finish it safely. */
-	finish_compressed_bio_read(cb, NULL);
+	finish_compressed_bio_read(cb);
 	return ret;
 }
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/8] btrfs: track compressed bio errors as blk_status_t
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (3 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 4/8] btrfs: remove the bio argument from finish_compressed_bio_read Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads Josef Bacik
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Right now we just have a binary "errors" flag, so any error we get on
the compressed bio's gets translated to EIO.  This isn't necessarily a
bad thing, but if we get an ENOMEM it may be nice to know that's what
happened instead of an EIO.  Track our errors as a blk_status_t, and do
the appropriate setting of the errors accordingly.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/compression.c | 28 +++++++++++++++-------------
 fs/btrfs/compression.h |  2 +-
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 0112fba345d4..747920ce1c17 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -219,7 +219,7 @@ static bool dec_and_test_compressed_bio(struct compressed_bio *cb, struct bio *b
 		bi_size += bvec->bv_len;
 
 	if (bio->bi_status)
-		cb->errors = 1;
+		cb->status = bio->bi_status;
 
 	ASSERT(bi_size && bi_size <= cb->compressed_len);
 	last_io = refcount_sub_and_test(bi_size >> fs_info->sectorsize_bits,
@@ -247,8 +247,9 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
 	}
 
 	/* Do io completion on the original bio */
-	if (cb->errors) {
-		bio_io_error(cb->orig_bio);
+	if (cb->status != BLK_STS_OK) {
+		cb->orig_bio->bi_status = cb->status;
+		bio_endio(cb->orig_bio);
 	} else {
 		struct bio_vec *bvec;
 		struct bvec_iter_all iter_all;
@@ -306,7 +307,7 @@ static void end_compressed_bio_read(struct bio *bio)
 	 * Some IO in this cb have failed, just skip checksum as there
 	 * is no way it could be correct.
 	 */
-	if (cb->errors == 1)
+	if (cb->status != BLK_STS_OK)
 		goto csum_failed;
 
 	inode = cb->inode;
@@ -322,7 +323,7 @@ static void end_compressed_bio_read(struct bio *bio)
 
 csum_failed:
 	if (ret)
-		cb->errors = 1;
+		cb->status = errno_to_blk_status(ret);
 	finish_compressed_bio_read(cb);
 out:
 	bio_put(bio);
@@ -340,11 +341,12 @@ static noinline void end_compressed_writeback(struct inode *inode,
 	unsigned long end_index = (cb->start + cb->len - 1) >> PAGE_SHIFT;
 	struct page *pages[16];
 	unsigned long nr_pages = end_index - index + 1;
+	int errno = blk_status_to_errno(cb->status);
 	int i;
 	int ret;
 
-	if (cb->errors)
-		mapping_set_error(inode->i_mapping, -EIO);
+	if (errno)
+		mapping_set_error(inode->i_mapping, errno);
 
 	while (nr_pages > 0) {
 		ret = find_get_pages_contig(inode->i_mapping, index,
@@ -356,7 +358,7 @@ static noinline void end_compressed_writeback(struct inode *inode,
 			continue;
 		}
 		for (i = 0; i < ret; i++) {
-			if (cb->errors)
+			if (errno)
 				SetPageError(pages[i]);
 			btrfs_page_clamp_clear_writeback(fs_info, pages[i],
 							 cb->start, cb->len);
@@ -377,9 +379,9 @@ static void finish_compressed_bio_write(struct compressed_bio *cb)
 	 * Ok, we're the last bio for this extent, step one is to call back
 	 * into the FS and do all the end_io operations.
 	 */
-	btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
-			cb->start, cb->start + cb->len - 1,
-			!cb->errors);
+	btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL, cb->start,
+					     cb->start + cb->len - 1,
+					     cb->status == BLK_STS_OK);
 
 	end_compressed_writeback(inode, cb);
 	/* Note, our inode could be gone now */
@@ -522,7 +524,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start,
 	if (!cb)
 		return BLK_STS_RESOURCE;
 	refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits);
-	cb->errors = 0;
+	cb->status = BLK_STS_OK;
 	cb->inode = &inode->vfs_inode;
 	cb->start = start;
 	cb->len = len;
@@ -829,7 +831,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 		goto out;
 
 	refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits);
-	cb->errors = 0;
+	cb->status = BLK_STS_OK;
 	cb->inode = inode;
 	cb->mirror_num = mirror_num;
 	sums = cb->sums;
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 7dbd14caab01..c224c67f8c5e 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -55,7 +55,7 @@ struct compressed_bio {
 	u8 compress_type;
 
 	/* IO errors */
-	u8 errors;
+	blk_status_t status;
 	int mirror_num;
 
 	/* for reads, this is the bio we are copying the data into */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (4 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 5/8] btrfs: track compressed bio errors as blk_status_t Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 16:34   ` David Sterba
  2022-02-18 15:03 ` [PATCH v2 7/8] btrfs: do not try to repair bio that has no mirror set Josef Bacik
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

I hit some weird panics while fixing up the error handling from
btrfs_lookup_bio_sums().  Turns out the compression path will complete
the bio we use if we set up any of the compression bios and then return
an error, and then btrfs_submit_data_bio() will also call bio_endio() on
the bio.

Fix this by making btrfs_submit_compressed_read() responsible for
calling bio_endio() on the bio if there are any errors.  Currently it
was only doing it if we created the compression bios, otherwise it was
depending on btrfs_submit_data_bio() to do the right thing.  This
creates the above problem, so fix up btrfs_submit_compressed_read() to
always call bio_endio() in case of an error, and then simply return from
btrfs_submit_data_bio() if we had to call
btrfs_submit_compressed_read().

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/compression.c | 20 ++++++++++++++------
 fs/btrfs/inode.c       | 12 ++++++++----
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 747920ce1c17..6dc727d38d5c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -808,7 +808,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	u64 em_len;
 	u64 em_start;
 	struct extent_map *em;
-	blk_status_t ret = BLK_STS_RESOURCE;
+	blk_status_t ret;
 	int faili = 0;
 	u8 *sums;
 
@@ -821,14 +821,18 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	read_lock(&em_tree->lock);
 	em = lookup_extent_mapping(em_tree, file_offset, fs_info->sectorsize);
 	read_unlock(&em_tree->lock);
-	if (!em)
-		return BLK_STS_IOERR;
+	if (!em) {
+		ret = BLK_STS_IOERR;
+		goto out;
+	}
 
 	ASSERT(em->compress_type != BTRFS_COMPRESS_NONE);
 	compressed_len = em->block_len;
 	cb = kmalloc(compressed_bio_size(fs_info, compressed_len), GFP_NOFS);
-	if (!cb)
+	if (!cb) {
+		ret = BLK_STS_RESOURCE;
 		goto out;
+	}
 
 	refcount_set(&cb->pending_sectors, compressed_len >> fs_info->sectorsize_bits);
 	cb->status = BLK_STS_OK;
@@ -851,8 +855,10 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	nr_pages = DIV_ROUND_UP(compressed_len, PAGE_SIZE);
 	cb->compressed_pages = kcalloc(nr_pages, sizeof(struct page *),
 				       GFP_NOFS);
-	if (!cb->compressed_pages)
+	if (!cb->compressed_pages) {
+		ret = BLK_STS_RESOURCE;
 		goto fail1;
+	}
 
 	for (pg_index = 0; pg_index < nr_pages; pg_index++) {
 		cb->compressed_pages[pg_index] = alloc_page(GFP_NOFS);
@@ -938,7 +944,7 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			comp_bio = NULL;
 		}
 	}
-	return 0;
+	return BLK_STS_OK;
 
 fail2:
 	while (faili >= 0) {
@@ -951,6 +957,8 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 	kfree(cb);
 out:
 	free_extent_map(em);
+	bio->bi_status = ret;
+	bio_endio(bio);
 	return ret;
 finish_cb:
 	if (comp_bio) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 24099fe9e120..69fa71186e72 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2542,10 +2542,14 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 			goto out;
 
 		if (bio_flags & EXTENT_BIO_COMPRESSED) {
-			ret = btrfs_submit_compressed_read(inode, bio,
-							   mirror_num,
-							   bio_flags);
-			goto out;
+			/*
+			 * btrfs_submit_compressed_read will handle completing
+			 * the bio if there were any errors, so just return
+			 * here.
+			 */
+			return btrfs_submit_compressed_read(inode, bio,
+							    mirror_num,
+							    bio_flags);
 		} else {
 			/*
 			 * Lookup bio sums does extra checks around whether we
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 7/8] btrfs: do not try to repair bio that has no mirror set
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (5 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 15:03 ` [PATCH v2 8/8] btrfs: do not clean up repair bio if submit fails Josef Bacik
  2022-02-18 16:38 ` [PATCH v2 0/8] Fix error handling on data bio submission David Sterba
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Boris Burkov

If we fail to submit a bio for whatever reason, we may not have setup a
mirror_num for that bio.  This means we shouldn't try to do the repair
workflow, if we do we'll hit an BUG_ON(!failrec->this_mirror) in
clean_io_failure.  Instead simply skip the repair workflow if we have no
mirror set, and add an assert to btrfs_check_repairable() to make it
easier to catch what is happening in the future.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index bda7fa8cf540..29ffb2814e5c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2610,6 +2610,7 @@ static bool btrfs_check_repairable(struct inode *inode,
 	 * a good copy of the failed sector and if we succeed, we have setup
 	 * everything for repair_io_failure to do the rest for us.
 	 */
+	ASSERT(failed_mirror);
 	failrec->failed_mirror = failed_mirror;
 	failrec->this_mirror++;
 	if (failrec->this_mirror == failed_mirror)
@@ -3067,6 +3068,14 @@ static void end_bio_extent_readpage(struct bio *bio)
 			goto readpage_ok;
 
 		if (is_data_inode(inode)) {
+			/*
+			 * If we failed to submit the IO at all we'll have a
+			 * mirror_num == 0, in which case we need to just mark
+			 * the page with an error and unlock it and carry on.
+			 */
+			if (mirror == 0)
+				goto readpage_ok;
+
 			/*
 			 * btrfs_submit_read_repair() will handle all the good
 			 * and bad sectors, we just continue to the next bvec.
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 8/8] btrfs: do not clean up repair bio if submit fails
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (6 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 7/8] btrfs: do not try to repair bio that has no mirror set Josef Bacik
@ 2022-02-18 15:03 ` Josef Bacik
  2022-02-18 16:38 ` [PATCH v2 0/8] Fix error handling on data bio submission David Sterba
  8 siblings, 0 replies; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 15:03 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Boris Burkov

The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of UAF and NULL
pointer deref bugs because we race with the endio function that is
cleaning up the bio.  Instead just return STS_OK as the repair function
has to continue to process the rest of the pages, and the endio for the
repair bio will do the appropriate cleanup for the page that it was
given.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 29ffb2814e5c..16b6820c913d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2640,7 +2640,6 @@ int btrfs_repair_one_sector(struct inode *inode,
 	const int icsum = bio_offset >> fs_info->sectorsize_bits;
 	struct bio *repair_bio;
 	struct btrfs_bio *repair_bbio;
-	blk_status_t status;
 
 	btrfs_debug(fs_info,
 		   "repair read error: read error at %llu", start);
@@ -2679,13 +2678,14 @@ int btrfs_repair_one_sector(struct inode *inode,
 		    "repair read error: submitting new read to mirror %d",
 		    failrec->this_mirror);
 
-	status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
-				 failrec->bio_flags);
-	if (status) {
-		free_io_failure(failure_tree, tree, failrec);
-		bio_put(repair_bio);
-	}
-	return blk_status_to_errno(status);
+	/*
+	 * At this point we have a bio, so any errors from submit_bio_hook()
+	 * will be handled by the endio on the repair_bio, so we can't return an
+	 * error here.
+	 */
+	submit_bio_hook(inode, repair_bio, failrec->this_mirror,
+			failrec->bio_flags);
+	return BLK_STS_OK;
 }
 
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads
  2022-02-18 15:03 ` [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads Josef Bacik
@ 2022-02-18 16:17   ` David Sterba
  2022-02-18 16:28     ` Josef Bacik
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2022-02-18 16:17 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Feb 18, 2022 at 10:03:23AM -0500, Josef Bacik wrote:
> +		/*
> +		 * We didn't find a csum for this range.  We need to make sure
> +		 * we complain loudly about this, because we are not NODATASUM.
> +		 *
> +		 * However for the DATA_RELOC inode we could potentially be
> +		 * relocating data extents for a NODATASUM inode, so the inode
> +		 * itself won't be marked with NODATASUM, but the extent we're
> +		 * copying is in fact NODATASUM.  If we don't find a csum we
> +		 * assume this is the case.
> +		 *
> +		 * TODO: make the relocation code look up the owning inode and
> +		 * see if it's marked as NODATASUM and set EXTENT_NODATASUM
> +		 * there, that way if there's corruption and there isn't a
> +		 * checksum when we want it we can fail here rather than later.

Please don't add TODOs to the code, that's the best place to forget
about them, we've had some unfixed for years.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads
  2022-02-18 16:17   ` David Sterba
@ 2022-02-18 16:28     ` Josef Bacik
  2022-02-18 16:31       ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: Josef Bacik @ 2022-02-18 16:28 UTC (permalink / raw)
  To: dsterba, linux-btrfs, kernel-team

On Fri, Feb 18, 2022 at 05:17:06PM +0100, David Sterba wrote:
> On Fri, Feb 18, 2022 at 10:03:23AM -0500, Josef Bacik wrote:
> > +		/*
> > +		 * We didn't find a csum for this range.  We need to make sure
> > +		 * we complain loudly about this, because we are not NODATASUM.
> > +		 *
> > +		 * However for the DATA_RELOC inode we could potentially be
> > +		 * relocating data extents for a NODATASUM inode, so the inode
> > +		 * itself won't be marked with NODATASUM, but the extent we're
> > +		 * copying is in fact NODATASUM.  If we don't find a csum we
> > +		 * assume this is the case.
> > +		 *
> > +		 * TODO: make the relocation code look up the owning inode and
> > +		 * see if it's marked as NODATASUM and set EXTENT_NODATASUM
> > +		 * there, that way if there's corruption and there isn't a
> > +		 * checksum when we want it we can fail here rather than later.
> 
> Please don't add TODOs to the code, that's the best place to forget
> about them, we've had some unfixed for years.

Do we want to just make a GH issue for it?  If you want to yank that bit out I'm
fine with it.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads
  2022-02-18 16:28     ` Josef Bacik
@ 2022-02-18 16:31       ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2022-02-18 16:31 UTC (permalink / raw)
  To: Josef Bacik; +Cc: dsterba, linux-btrfs, kernel-team

On Fri, Feb 18, 2022 at 11:28:17AM -0500, Josef Bacik wrote:
> On Fri, Feb 18, 2022 at 05:17:06PM +0100, David Sterba wrote:
> > On Fri, Feb 18, 2022 at 10:03:23AM -0500, Josef Bacik wrote:
> > > +		/*
> > > +		 * We didn't find a csum for this range.  We need to make sure
> > > +		 * we complain loudly about this, because we are not NODATASUM.
> > > +		 *
> > > +		 * However for the DATA_RELOC inode we could potentially be
> > > +		 * relocating data extents for a NODATASUM inode, so the inode
> > > +		 * itself won't be marked with NODATASUM, but the extent we're
> > > +		 * copying is in fact NODATASUM.  If we don't find a csum we
> > > +		 * assume this is the case.
> > > +		 *
> > > +		 * TODO: make the relocation code look up the owning inode and
> > > +		 * see if it's marked as NODATASUM and set EXTENT_NODATASUM
> > > +		 * there, that way if there's corruption and there isn't a
> > > +		 * checksum when we want it we can fail here rather than later.
> > 
> > Please don't add TODOs to the code, that's the best place to forget
> > about them, we've had some unfixed for years.
> 
> Do we want to just make a GH issue for it?  If you want to yank that bit out I'm
> fine with it.  Thanks,

I've moved part of the comment and to changelog as a note so it's not
completely lost, an issue makes sense too.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads
  2022-02-18 15:03 ` [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads Josef Bacik
@ 2022-02-18 16:34   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2022-02-18 16:34 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Feb 18, 2022 at 10:03:27AM -0500, Josef Bacik wrote:
> @@ -2542,10 +2542,14 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
>  			goto out;
>  
>  		if (bio_flags & EXTENT_BIO_COMPRESSED) {
> -			ret = btrfs_submit_compressed_read(inode, bio,
> -							   mirror_num,
> -							   bio_flags);
> -			goto out;
> +			/*
> +			 * btrfs_submit_compressed_read will handle completing
> +			 * the bio if there were any errors, so just return
> +			 * here.
> +			 */
> +			return btrfs_submit_compressed_read(inode, bio,
> +							    mirror_num,
> +							    bio_flags);

All the other error branches do a goto, so I'll convert this too so it's
consistent.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/8] Fix error handling on data bio submission
  2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
                   ` (7 preceding siblings ...)
  2022-02-18 15:03 ` [PATCH v2 8/8] btrfs: do not clean up repair bio if submit fails Josef Bacik
@ 2022-02-18 16:38 ` David Sterba
  8 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2022-02-18 16:38 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Fri, Feb 18, 2022 at 10:03:21AM -0500, Josef Bacik wrote:
> v1->v2:
> - addressed the various comments, expanded on the comment about NODATASUM and
>   the DATA_RELOC inode since I had to look up why we even had that there.
> - Added the various reviewed-by's on the patches I didn't touch.
> 
> --- Original email ---
> 
> Internally we tried to enable a remediation to automatically re-provision
> machines that had checksum errors.  This was based on dmesg scanning, however we
> discovered that we were getting transienty csum error messages.  This came down
> to getting a transient ENOMEM while trying to lookup checksums while doing a
> data read (this was on memory constrained containers).
> 
> What we were doing was simply acting like there wasn't a checksum there, which
> would print a scary message about missing checksums.  And then we'd do the read,
> but because we didn't have a checksum we'd complain about a checksum mismatch.
> Neither of these things were actually what was happening, we simply got an EIO
> while looking up the checksums.
> 
> Fix this by properly returning an error and erroring out the BIO with the
> correct error.  This is actually correct, it allows us to skip the IO and also
> not erroneously tell the user that their checksums are invalid.
> 
> While testing this fix however I uncovered a variety of problems with our error
> handling when we submit.  So the first two patches are to fix the main problem I
> wanted to fix, and the next 6 are to fix problems that happen when injecting
> errors into the checksum lookup path.
> 
> With these patches I'm no longer getting csum mismatch errors when I fail to
> lookup csums, and I'm also able to survive a xfstests run while randomly
> injecting errors into this path.  Thanks,
> 
> Josef
> 
> Josef Bacik (8):
>   btrfs: make search_csum_tree return 0 if we get -EFBIG
>   btrfs: handle csum lookup errors properly on reads
>   btrfs: check correct bio in finish_compressed_bio_read
>   btrfs: remove the bio argument from finish_compressed_bio_read
>   btrfs: track compressed bio errors as blk_status_t
>   btrfs: do not double complete bio on errors during compressed reads
>   btrfs: do not try to repair bio that has no mirror set
>   btrfs: do not clean up repair bio if submit fails

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-02-18 16:42 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-18 15:03 [PATCH v2 0/8] Fix error handling on data bio submission Josef Bacik
2022-02-18 15:03 ` [PATCH v2 1/8] btrfs: make search_csum_tree return 0 if we get -EFBIG Josef Bacik
2022-02-18 15:03 ` [PATCH v2 2/8] btrfs: handle csum lookup errors properly on reads Josef Bacik
2022-02-18 16:17   ` David Sterba
2022-02-18 16:28     ` Josef Bacik
2022-02-18 16:31       ` David Sterba
2022-02-18 15:03 ` [PATCH v2 3/8] btrfs: check correct bio in finish_compressed_bio_read Josef Bacik
2022-02-18 15:03 ` [PATCH v2 4/8] btrfs: remove the bio argument from finish_compressed_bio_read Josef Bacik
2022-02-18 15:03 ` [PATCH v2 5/8] btrfs: track compressed bio errors as blk_status_t Josef Bacik
2022-02-18 15:03 ` [PATCH v2 6/8] btrfs: do not double complete bio on errors during compressed reads Josef Bacik
2022-02-18 16:34   ` David Sterba
2022-02-18 15:03 ` [PATCH v2 7/8] btrfs: do not try to repair bio that has no mirror set Josef Bacik
2022-02-18 15:03 ` [PATCH v2 8/8] btrfs: do not clean up repair bio if submit fails Josef Bacik
2022-02-18 16:38 ` [PATCH v2 0/8] Fix error handling on data bio submission David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.