* [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker
@ 2019-03-20  6:27 Qu Wenruo
From: Qu Wenruo @ 2019-03-20  6:27 UTC
  To: linux-btrfs

The patchset can be fetched from GitHub:
https://github.com/adam900710/linux/tree/write_time_tree_checker
It is based on the v5.1-rc1 tag.

This patchset has the following 3 features:
- Tree block validation output enhancement
  * Output validation failure timing (write time or read time)
  * Always output tree block level/key mismatch error message
    This part has already been submitted and reviewed.

- Write time tree block validation check
  This catches memory corruption caused by either hardware or the kernel
  before the tree block is written out.
  Example output would be:

    BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
    BTRFS error (device dm-3): block=1350630375424 write time tree block corruption detected
    BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
    BTRFS info (device dm-3): forced readonly
    BTRFS warning (device dm-3): Skipping commit of aborted transaction.
    BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
    BTRFS info (device dm-3): delayed_refs has NO entry

- Better error handling before calling flush_write_bio()
  One hidden reason for calling flush_write_bio() in all cases is that
  flush_write_bio() triggers the endio function, and the endio function
  of epd->bio always frees the bio.
  So we are in fact abusing flush_write_bio() as cleanup.

  Now that flush_write_bio() has its own return value, we should not call
  it unconditionally. Instead, we introduce a proper cleanup helper,
  end_write_bio(). Now we call flush_write_bio() like:
              New                 |           Old
  --------------------------------------------------------------
  ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
  if (ret < 0) {                  | flush_write_bio(&epd);
      end_write_bio(&epd, ret);   | ^^^ submitting half-baked epd->bio?
      return ret;                 | return ret;
  }                               |
  ret = flush_write_bio(&epd);    |
  return ret;                     |

  The code above should make the error handling more streamlined.
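
  To illustrate the "New" column, here is a minimal userspace sketch of
  the pattern. It is not the kernel code: struct fake_epd, its fields and
  write_pages() are hypothetical stand-ins, and do_some_evil() takes an
  extra "fail" argument only so that both paths can be exercised. The
  point is that a failed preparation step ends the pending bio with the
  error instead of submitting it.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's extent_page_data; NOT the real struct. */
struct fake_epd {
	bool has_bio;    /* a bio has been built but not yet submitted */
	int bio_status;  /* error the bio was ended with (0 = none) */
	bool submitted;  /* the bio was actually sent out */
};

/* Cleanup helper: end the pending bio with an error instead of submitting it. */
static void end_write_bio(struct fake_epd *epd, int ret)
{
	if (epd->has_bio) {
		epd->bio_status = ret;
		epd->has_bio = false;
	}
}

/* Submit the pending bio. It has a return value, so callers must check it. */
static int flush_write_bio(struct fake_epd *epd)
{
	if (epd->has_bio) {
		epd->submitted = true;
		epd->has_bio = false;
	}
	return 0;
}

/* Hypothetical worker: builds a bio, then optionally fails half way. */
static int do_some_evil(struct fake_epd *epd, bool fail)
{
	epd->has_bio = true;
	return fail ? -EIO : 0;
}

/* The "New" calling convention from the table above. */
static int write_pages(struct fake_epd *epd, bool fail)
{
	int ret = do_some_evil(epd, fail);

	if (ret < 0) {
		/* Do not submit a half-built bio, just clean it up. */
		end_write_bio(epd, ret);
		return ret;
	}
	return flush_write_bio(epd);
}
```

  With this sketch, write_pages(&epd, true) returns -EIO and leaves
  epd.submitted false, while write_pages(&epd, false) returns 0 and marks
  the bio as submitted.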

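For readers unfamiliar with the "bad key order" line in the example
output of the write time check, here is a simplified sketch of that kind
of check. It assumes keys are compared as (objectid, type, offset) and
must be strictly ascending within a tree block. struct key, comp_keys()
and find_bad_key_order() are illustrative names for this sketch, not the
tree-checker's actual functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified key in CPU byte order: (objectid, type, offset), as printed in the log. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare by objectid, then type, then offset; returns -1, 0 or 1. */
static int comp_keys(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Return the first slot whose key is not strictly greater than the key in
 * the previous slot, or -1 if all keys are in order.
 */
static int find_bad_key_order(const struct key *keys, size_t nr)
{
	for (size_t i = 1; i < nr; i++) {
		if (comp_keys(&keys[i - 1], &keys[i]) >= 0)
			return (int)i;
	}
	return -1;
}
```

Fed the two keys from the example output, prev (10510212874240 169 0)
followed by current (1714119868416 169 0), the check flags the second
slot because the previous objectid is larger than the current one.
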
Changelog:
v2:
- Unlock locked pages in lock_extent_buffer_for_io() for error handling.
- Added Reviewed-by tags.

v3:
- Remove duplicated error message.
- Use IS_ENABLED() macro to replace #ifdef.
- Added Reviewed-by tags.

v4:
- Re-organized patch split
  Now each BUG_ON() cleanup has its own patch
- Dig much further into the call sites to eliminate unexpected >0 return
  values.
  This may be a little paranoid and abuses ASSERT() in places, but it
  should be much safer against future code changes.
- Fix the false alert caused by balance and memory pressure
  The fix is to skip the owner check for non-essential trees at write
  time, since the owner root can't always be relied upon, either due to a
  commit root created in the current transaction or balance + memory
  pressure.

v5:
- Do proper error-out handling rather than relying on flush_write_bio()
  to clean up.
  As a side effect, the modified patches no longer carry Reviewed-by tags.
- New comment explaining why we don't need to do anything about epd->bio
  when submit_one_bio() fails.
- Add some Reviewed-by tags.

v5.1:
- Add "block=%llu " output for write/read time error line.
- Also output read time error message for fsid/start/level check.

v5.2:
- Fix a missing page_unlock() in error handling

v5.3:
- Rebase to v5.1-rc1 tag

Qu Wenruo (11):
  btrfs: Always output error message when key/level verification fails
  btrfs: disk-io: Show the timing of corrupted tree block explicitly
  btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
  btrfs: extent_io: Handle error better in extent_write_full_page()
  btrfs: extent_io: Handle error better in btree_write_cache_pages()
  btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_write_locked_range()
  btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
  btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
  btrfs: extent_io: Handle error better in extent_writepages()
  btrfs: Do mandatory tree block check before submitting bio

 fs/btrfs/disk-io.c      |  24 ++++++---
 fs/btrfs/extent_io.c    | 109 +++++++++++++++++++++++++++++++++-------
 fs/btrfs/tree-checker.c |  24 +++++++--
 fs/btrfs/tree-checker.h |   8 +++
 4 files changed, 137 insertions(+), 28 deletions(-)

-- 
2.21.0


Thread overview: 24+ messages
2019-03-20  6:27 [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 01/11] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 02/11] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-03-29 14:11   ` David Sterba
2019-03-29 14:18     ` Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 03/11] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-03-20 19:39   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 04/11] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 05/11] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 06/11] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 07/11] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
2019-03-21 13:19   ` David Sterba
2019-03-21 13:45     ` Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 08/11] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-03-21 13:30   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 09/11] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-03-21 14:14   ` David Sterba
2019-03-20  6:27 ` [PATCH v5.3 10/11] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
2019-03-20  6:27 ` [PATCH v5.3 11/11] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
2019-04-03  9:07   ` Qu Wenruo
2019-03-21 14:34 ` [PATCH v5.3 00/11] btrfs: tree-checker: Write time tree checker David Sterba
2019-03-21 23:56   ` Qu Wenruo
2019-03-22 17:52     ` David Sterba
2019-03-25  4:32       ` Qu Wenruo
