linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 00/12] btrfs: Enhancement to tree block validation
@ 2019-02-15 10:50 Qu Wenruo
  2019-02-15 10:50 ` [PATCH v5 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: Qu Wenruo @ 2019-02-15 10:50 UTC (permalink / raw)
  To: linux-btrfs

Patchset can be fetched from github:
https://github.com/adam900710/linux/tree/write_time_tree_checker
Which is based on v5.0-rc1 tag.
Also there is no conflict rebasing the patchset to misc-next.

This patchset has the following 3 features:
- Tree block validation output enhancement
  * Output validation failure timing (write time or read time)
  * Always output tree block level/key mismatch error message
    This part is already submitted and reviewed.

- Write time tree block validation check
  To catch memory corruption either from hardware or kernel.
  Example output would be:

    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): write time tree block corruption detected
    BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
    BTRFS info (device dm-3): forced readonly
    BTRFS warning (device dm-3): Skipping commit of aborted transaction.
    BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
    BTRFS info (device dm-3): delayed_refs has NO entry

- Better error handling before calling flush_write_bio()
  One hidden reason of calling flush_write_bio() under all cases is,
  flush_write_bio() will trigger endio function and endio function of
  epd->bio will free the bio under all cases.
  So we're in fact abusing flush_write_bio() as cleanup.

  Since now flush_write_bio() has its own return value, we shouldn't call
  flush_write_bio() no-brain, here we introduce proper cleanup helper,
  end_write_bio(). Now we call flush_write_bio() like:
              New                 |           Old
  --------------------------------------------------------------
  ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
  if (ret < 0) {                  | flush_write_bio(&epd);
  	end_write_bio(&epd, ret); | ^^^ submitting half-backed epd->bio?
  	return ret;               | return ret;
  }                               |
  ret = flush_write_bio(&epd);    |
  return ret;                     |

  Above code should be more streamline for the error handling part.

Changelog:
v2:
- Unlock locked pages in lock_extent_buffer_for_io() for error handling.
- Added Reviewed-by tags.

v3:
- Remove duplicated error message.
- Use IS_ENABLED() macro to replace #ifdef.
- Added Reviewed-by tags.

v4:
- Re-organized patch split
  Now each BUG_ON() cleanup has its own patch
- Dig much further into the call sites to eliminate unexpected >0 return
  May be a little paranoid and abuse some ASSERT(), but it should be
  much safer against further code change.
- Fix the false alert caused by balance and memory pressure
  The fix it skip owner checker for non-essential tree at write time.
  Since owner root can't always be reliable, either due to commit root
  created in current transaction or balance + memory pressure.

v5:
- Do proper error-out handling other than relying on flush_write_bio()
  to clean up.
  This has a side effect that no Reviewed-by tags for modified patches.
- New comment for why we don't need to do anything about ebp->bio when
  submit_one_bio() fails.
- Add some Reviewed-by tag.

Qu Wenruo (12):
  btrfs: Always output error message when key/level verification fails
  btrfs: extent_io: Kill the forward declaration of flush_write_bio()
  btrfs: disk-io: Show the timing of corrupted tree block explicitly
  btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  btrfs: extent_io: Handle error better in extent_write_full_page()
  btrfs: extent_io: Handle error better in btree_write_cache_pages()
  btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_write_locked_range()
  btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_writepages()
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c      |  21 +++--
 fs/btrfs/extent_io.c    | 168 ++++++++++++++++++++++++++++------------
 fs/btrfs/tree-checker.c |  24 +++++-
 fs/btrfs/tree-checker.h |   8 ++
 4 files changed, 162 insertions(+), 59 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-02-16  6:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15 10:50 [PATCH v5 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 05/12] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 06/12] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 08/12] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 11/12] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 12/12] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
2019-02-15 13:10 ` [PATCH v5 00/12] btrfs: Enhancement to tree block validation Nikolay Borisov
2019-02-15 13:18   ` Qu Wenruo
2019-02-15 17:19     ` David Sterba
2019-02-16  6:49       ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).