Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: Leonard Lausen <leonard@lausen.nl>
Subject: [PATCH v4 12/12] btrfs: Do mandatory tree block check before submitting bio
Date: Fri, 25 Jan 2019 13:09:25 +0800
Message-ID: <20190125050925.30754-13-wqu@suse.com> (raw)
In-Reply-To: <20190125050925.30754-1-wqu@suse.com>

There are at least 2 reports about memory bit flip sneaking into on-disk
data.

Currently we only have a relaxed check triggered at
btrfs_mark_buffer_dirty() time, as it's not mandatory and only for
CONFIG_BTRFS_FS_CHECK_INTEGRITY enabled build, it doesn't help user to
detect such problem.

This patch will address the hole by triggering comprehensive check on
tree blocks before writing it back to disk.

The design points are:
- Timing of the check: Tree block write hook
  This timing is chosen to reduce the overhead.
  The comprehensive check should be as expensive as csum.
  Doing full check at btrfs_mark_buffer_dirty() is too expensive for end
  user.

- Loose empty leaf check
  Originally for empty leaf, tree-checker will report error if it's not
  a tree root.
  The problem for such check at write time is:
  * False alert for tree root created in current transaction
    In that case, the commit root still needs to be written to disk.
    And since current root can differ from commit root, then it will
    cause false alert.
    This happens for log tree.

  * False alert for relocated tree block
    Relocated tree block can be written to disk due to memory pressure,
    in that case an empty csum tree root can be written to disk and
    cause false alert, since csum root node hasn't been updated.

  Although some more reliable empty leaf check is still kept as is.
  Namely essential trees (e.g. extent, chunk) should never be empty.

The example error output will be something like:
  BTRFS critical (device dm-3): corrupt leaf: root=2 block=1350630375424 slot=68, bad key order, prev (10510212874240 169 0) current (1714119868416 169 0)
  BTRFS error (device dm-3): write time tree block corruption detected
  BTRFS: error (device dm-3) in btrfs_commit_transaction:2220: errno=-5 IO failure (Error while writing out transaction)
  BTRFS info (device dm-3): forced readonly
  BTRFS warning (device dm-3): Skipping commit of aborted transaction.
  BTRFS: error (device dm-3) in cleanup_transaction:1839: errno=-5 IO failure
  BTRFS info (device dm-3): delayed_refs has NO entry

Reported-by: Leonard Lausen <leonard@lausen.nl>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c      |  9 +++++++++
 fs/btrfs/tree-checker.c | 24 +++++++++++++++++++++---
 fs/btrfs/tree-checker.h |  8 ++++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 426e9f450f70..135d84ed72eb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -313,6 +313,15 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			return -EUCLEAN;
 		}
 	} else {
+		if (btrfs_header_level(buf))
+			err = btrfs_check_node(fs_info, buf);
+		else
+			err = btrfs_check_leaf_write(fs_info, buf);
+		if (err < 0) {
+			btrfs_err(fs_info,
+				  "write time tree block corruption detected");
+			return err;
+		}
 		write_extent_buffer(buf, result, 0, csum_size);
 	}
 
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index a62e1e837a89..b8cdaf472031 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -477,7 +477,7 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
 }
 
 static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
-		      bool check_item_data)
+		      bool check_item_data, bool check_empty_leaf)
 {
 	/* No valid key type is 0, so all key should be larger than this key */
 	struct btrfs_key prev_key = {0, 0, 0};
@@ -516,6 +516,18 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 				    owner);
 			return -EUCLEAN;
 		}
+
+		/*
+		 * Skip empty leaf check, mostly for write time tree block
+		 *
+		 * Such skip mostly happens for tree block write time, as
+		 * we can't use @owner as accurate owner indicator.
+		 * Case like balance and new tree block created for commit root
+		 * can break owner check easily.
+		 */
+		if (!check_empty_leaf)
+			return 0;
+
 		key.objectid = owner;
 		key.type = BTRFS_ROOT_ITEM_KEY;
 		key.offset = (u64)-1;
@@ -636,13 +648,19 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
 			  struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, true);
+	return check_leaf(fs_info, leaf, true, true);
 }
 
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf)
 {
-	return check_leaf(fs_info, leaf, false);
+	return check_leaf(fs_info, leaf, false, true);
+}
+
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf)
+{
+	return check_leaf(fs_info, leaf, false, false);
 }
 
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node)
diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h
index ff043275b784..6f8d1b627c53 100644
--- a/fs/btrfs/tree-checker.h
+++ b/fs/btrfs/tree-checker.h
@@ -23,6 +23,14 @@ int btrfs_check_leaf_full(struct btrfs_fs_info *fs_info,
  */
 int btrfs_check_leaf_relaxed(struct btrfs_fs_info *fs_info,
 			     struct extent_buffer *leaf);
+
+/*
+ * Write time specific leaf checker.
+ * Don't check if the empty leaf belongs to a tree root. Mostly for balance
+ * and new tree created in current transaction.
+ */
+int btrfs_check_leaf_write(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *leaf);
 int btrfs_check_node(struct btrfs_fs_info *fs_info, struct extent_buffer *node);
 
 #endif
-- 
2.20.1


      parent reply index

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-25  5:09 [PATCH v4 00/12] btrfs: Enhancement to tree block validation Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 01/12] btrfs: Always output error message when key/level verification fails Qu Wenruo
2019-01-25  9:02   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 02/12] btrfs: extent_io: Kill the forward declaration of flush_write_bio() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 03/12] btrfs: disk-io: Show the timing of corrupted tree block explicitly Qu Wenruo
2019-01-25  9:03   ` Johannes Thumshirn
2019-01-30 14:57   ` David Sterba
2019-01-30 14:59     ` Nikolay Borisov
2019-01-31  0:03       ` Qu Wenruo
2019-01-31 14:20         ` David Sterba
2019-01-31 14:22           ` Nikolay Borisov
2019-02-07 17:27             ` David Sterba
2019-01-25  5:09 ` [PATCH v4 04/12] btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up Qu Wenruo
2019-01-25  9:04   ` Johannes Thumshirn
2019-01-30 15:08   ` David Sterba
2019-01-30 15:19   ` David Sterba
2019-01-31  0:45     ` Qu Wenruo
2019-01-31 14:36       ` David Sterba
2019-02-15  7:18         ` Qu Wenruo
2019-02-15  9:58           ` Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 05/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_full_page() Qu Wenruo
2019-01-25  9:05   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 06/12] btrfs: extent_io: Kill the BUG_ON() in btree_write_cache_pages() Qu Wenruo
2019-01-25  9:11   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 07/12] btrfs: extent_io: Kill the dead branch in extent_write_cache_pages() Qu Wenruo
2019-01-25  9:11   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 08/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_locked_range() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 09/12] btrfs: extent_io: Kill the BUG_ON() in lock_extent_buffer_for_io() Qu Wenruo
2019-01-25  5:09 ` [PATCH v4 10/12] btrfs: extent_io: Kill the BUG_ON() in extent_write_cache_pages() Qu Wenruo
2019-01-25  9:16   ` Johannes Thumshirn
2019-01-25  5:09 ` [PATCH v4 11/12] btrfs: extent_io: Kill the BUG_ON() in extent_writepages() Qu Wenruo
2019-01-25  9:16   ` Johannes Thumshirn
2019-01-25  5:09 ` Qu Wenruo [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190125050925.30754-13-wqu@suse.com \
    --to=wqu@suse.com \
    --cc=leonard@lausen.nl \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox