linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v5 00/12] btrfs: Enhancement to tree block validation
Date: Fri, 15 Feb 2019 15:10:36 +0200	[thread overview]
Message-ID: <2ebcb996-dd7a-2662-811a-5526fccb03b7@suse.com> (raw)
In-Reply-To: <20190215105044.17619-1-wqu@suse.com>



On 15.02.19 г. 12:50 ч., Qu Wenruo wrote:
> Patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/write_time_tree_checker
> Which is based on v5.0-rc1 tag.
> Also there is no conflict rebasing the patchset to misc-next.
> 
> This patchset has the following 3 features:
> - Tree block validation output enhancement
>   * Output validation failure timing (write time or read time)
>   * Always output tree block level/key mismatch error message
>     This part is already submitted and reviewed.
> 
> - Write time tree block validation check
>   To catch memory corruption either from hardware or kernel.
>   Example output would be:
> 
>     BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
>     BTRFS error (device dm-3): write time tree block corruption detected
This is not good.  Those two error messages should be collapsed into
one. Otherwise it's hard to actually match them up. Better output will
be "Corrupt leaf detected during writing: root=..." and eliminate "write
time tree block corruption detected" line. Is that feasible?

>     BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
>     BTRFS info (device dm-3): forced readonly
>     BTRFS warning (device dm-3): Skipping commit of aborted transaction.
>     BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
>     BTRFS info (device dm-3): delayed_refs has NO entry
> 
> - Better error handling before calling flush_write_bio()
>   One hidden reason of calling flush_write_bio() under all cases is,
>   flush_write_bio() will trigger endio function and endio function of
>   epd->bio will free the bio under all cases.
>   So we're in fact abusing flush_write_bio() as cleanup.
> 
>   Since now flush_write_bio() has its own return value, we shouldn't call
>   flush_write_bio() no-brain, here we introduce proper cleanup helper,
>   end_write_bio(). Now we call flush_write_bio() like:
>               New                 |           Old
>   --------------------------------------------------------------
>   ret = do_some_evil(&epd);       | ret = do_some_evil(&epd);
>   if (ret < 0) {                  | flush_write_bio(&epd);
>   	end_write_bio(&epd, ret); | ^^^ submitting half-backed epd->bio?
>   	return ret;               | return ret;
>   }                               |
>   ret = flush_write_bio(&epd);    |
>   return ret;                     |
> 
>   Above code should be more streamline for the error handling part.
> 
> Changelog:
> v2:
> - Unlock locked pages in lock_extent_buffer_for_io() for error handling.
> - Added Reviewed-by tags.
> 
> v3:
> - Remove duplicated error message.
> - Use IS_ENABLED() macro to replace #ifdef.
> - Added Reviewed-by tags.
> 
> v4:
> - Re-organized patch split
>   Now each BUG_ON() cleanup has its own patch
> - Dig much further into the call sites to eliminate unexpected >0 return
>   May be a little paranoid and abuse some ASSERT(), but it should be
>   much safer against further code change.
> - Fix the false alert caused by balance and memory pressure
>   The fix it skip owner checker for non-essential tree at write time.
>   Since owner root can't always be reliable, either due to commit root
>   created in current transaction or balance + memory pressure.
> 
> v5:
> - Do proper error-out handling other than relying on flush_write_bio()
>   to clean up.
>   This has a side effect that no Reviewed-by tags for modified patches.
> - New comment for why we don't need to do anything about ebp->bio when
>   submit_one_bio() fails.
> - Add some Reviewed-by tag.
> 
> Qu Wenruo (12):
>   btrfs: Always output error message when key/level verification fails
>   btrfs: extent_io: Kill the forward declaration of flush_write_bio()
>   btrfs: disk-io: Show the timing of corrupted tree block explicitly
>   btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up
>   btrfs: extent_io: Handle error better in extent_write_full_page()
>   btrfs: extent_io: Handle error better in btree_write_cache_pages()
>   btrfs: extent_io: Kill the dead branch in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_write_locked_range()
>   btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io()
>   btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages()
>   btrfs: extent_io: Handle error better in extent_writepages()
>   btrfs: Do mandatory tree block check before submitting bio
> 
>  fs/btrfs/disk-io.c      |  21 +++--
>  fs/btrfs/extent_io.c    | 168 ++++++++++++++++++++++++++++------------
>  fs/btrfs/tree-checker.c |  24 +++++-
>  fs/btrfs/tree-checker.h |   8 ++
>  4 files changed, 162 insertions(+), 59 deletions(-)
> 

  parent reply	other threads:[~2019-02-15 13:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-15 10:50 [PATCH v5 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 05/12] btrfs: extent_io: Handle error better in extent_write_full_page() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 06/12] btrfs: extent_io: Handle error better in btree_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 08/12] btrfs: extent_io: Handle error better in extent_write_locked_range() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 11/12] btrfs: extent_io: Handle error better in extent_writepages() Qu Wenruo
2019-02-15 10:50 ` [PATCH v5 12/12] btrfs: Do mandatory tree block check before submitting bio Qu Wenruo
2019-02-15 13:10 ` Nikolay Borisov [this message]
2019-02-15 13:18   ` [PATCH v5 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
2019-02-15 17:19     ` David Sterba
2019-02-16  6:49       ` Qu Wenruo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ebcb996-dd7a-2662-811a-5526fccb03b7@suse.com \
    --to=nborisov@suse.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).