[PATCH v2 0/8] Fix error handling on data bio submission

From: Josef Bacik @ 2022-02-18 15:03 UTC
  To: linux-btrfs, kernel-team

v1->v2:
- addressed the various comments, expanded on the comment about NODATASUM and
  the DATA_RELOC inode since I had to look up why we even had that there.
- Added the various reviewed-by's on the patches I didn't touch.

--- Original email ---

Internally we tried to enable a remediation to automatically re-provision
machines that had checksum errors.  This was based on dmesg scanning; however, we
discovered that we were getting transient csum error messages.  This came down
to getting a transient ENOMEM while trying to look up checksums during a
data read (this was on memory constrained containers).

What we were doing was simply acting like there wasn't a checksum there, which
would print a scary message about missing checksums.  And then we'd do the read,
but because we didn't have a checksum we'd complain about a checksum mismatch.
Neither of these things was actually what was happening; we simply got an ENOMEM
while looking up the checksums.

Fix this by properly returning an error and erroring out the BIO with the
correct error.  This is the right behavior: it lets us skip the IO and avoids
erroneously telling the user that their checksums are invalid.
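
To make the intended behavior concrete, here is a minimal userspace sketch of
the decision described above.  It is not the btrfs code: the names read_action,
read_plan and plan_data_read are hypothetical, and the convention that the
lookup returns a positive count, 0, or a negative errno is an assumption made
for illustration.  The only case taken from this series is that -EFBIG is
treated like "nothing found" (patch 1).

```c
#include <errno.h>

/* Hypothetical outcomes for a data read; these are not btrfs types. */
enum read_action {
	READ_VERIFY_CSUM,	/* csum found: do the IO and verify it */
	READ_NO_CSUM,		/* no csum for this range: do the IO, skip verify */
	READ_FAIL_BIO,		/* lookup failed: skip the IO, fail the bio */
};

struct read_plan {
	enum read_action action;
	int error;		/* negative errno if READ_FAIL_BIO, else 0 */
};

/*
 * lookup_ret models the result of a checksum lookup (assumed convention):
 *   > 0  number of checksums found
 *   0    no checksum found
 *   < 0  negative errno
 */
static struct read_plan plan_data_read(int lookup_ret)
{
	struct read_plan plan = { .action = READ_VERIFY_CSUM, .error = 0 };

	if (lookup_ret > 0)
		return plan;

	/* Patch 1 in this series maps -EFBIG to "nothing found". */
	if (lookup_ret == 0 || lookup_ret == -EFBIG) {
		plan.action = READ_NO_CSUM;
		return plan;
	}

	/*
	 * Any other error (e.g. a transient -ENOMEM) is a failed lookup,
	 * not a missing checksum: propagate it instead of reading the data
	 * and then reporting a bogus mismatch.
	 */
	plan.action = READ_FAIL_BIO;
	plan.error = lookup_ret;
	return plan;
}
```

In this model the old behavior corresponds to folding every negative return
into READ_NO_CSUM, which is what produced the misleading "missing csum" and
"csum mismatch" messages.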

While testing this fix, however, I uncovered a variety of problems with our error
handling when we submit.  So the first two patches fix the main problem I
wanted to fix, and the next 6 fix problems that happen when injecting
errors into the checksum lookup path.

With these patches I'm no longer getting csum mismatch errors when I fail to
look up csums, and I'm also able to survive an xfstests run while randomly
injecting errors into this path.  Thanks,

Josef

Josef Bacik (8):
  btrfs: make search_csum_tree return 0 if we get -EFBIG
  btrfs: handle csum lookup errors properly on reads
  btrfs: check correct bio in finish_compressed_bio_read
  btrfs: remove the bio argument from finish_compressed_bio_read
  btrfs: track compressed bio errors as blk_status_t
  btrfs: do not double complete bio on errors during compressed reads
  btrfs: do not try to repair bio that has no mirror set
  btrfs: do not clean up repair bio if submit fails

 fs/btrfs/compression.c | 58 ++++++++++++++++++++++++------------------
 fs/btrfs/compression.h |  2 +-
 fs/btrfs/extent_io.c   | 25 ++++++++++++------
 fs/btrfs/file-item.c   | 43 ++++++++++++++++++++-----------
 fs/btrfs/inode.c       | 12 ++++++---
 5 files changed, 87 insertions(+), 53 deletions(-)

-- 
2.26.3
