All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Enhancement for block group/chunk verification
@ 2018-07-03  9:10 Qu Wenruo
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

Can be fetched from github, which is based on v4.18-rc1 tag:
https://github.com/adam900710/linux/tree/tree_checker_enhance

Reported by Xu Wen <wen.xu@gatech.edu>, some crafted btrfs image can
cause unexpected kernel behavior.

All of them are related to block group and chunk, so this patchset will
enhance block group and chunk verification, so kernel can detect them
and error out gracefully (with user friendly error message showing
what's going wrong)

Obvious corruption (don't need to cross check with chunk/block group),
will be addressed by enhanced tree-checker.
(Most crafted images will be caught by tree-checker)

More complex corruption will be addressed mostly at
btrfs_read_block_groups(), doing extra cross reference check for
chunk<->block group mapping.
It may cause extra mount time, but compared to the existing time
consuming block group items search, all added check is done completely
in memory using rb_tree, so it shouldn't add too much overhead.

Qu Wenruo (5):
  btrfs: tree-checker: Verify block_group_item
  btrfs: tree-checker: Detect invalid empty essential tree
  btrfs: relocation: Only remove reloc rb_trees if reloc control has
    been initialized
  btrfs: Check each block group has corresponding chunk at mount time
  btrfs: Verify every chunk has corresponding block group at mount time

 fs/btrfs/extent-tree.c  | 107 +++++++++++++++++++++++++++++++++++--
 fs/btrfs/relocation.c   |  23 ++++----
 fs/btrfs/tree-checker.c | 113 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c      |   2 +-
 fs/btrfs/volumes.h      |   2 +
 5 files changed, 232 insertions(+), 15 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
@ 2018-07-03  9:10 ` Qu Wenruo
  2018-07-04  2:20   ` Gu, Jinxiang
                     ` (2 more replies)
  2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

A crafted image with invalid block group items could make free space cache
code to cause panic.

We could early detect such invalid block group item by checking:
1) Item size
   Fixed value.
2) Block group size (key.offset)
   We have a up limit on block group item (10G)
3) Chunk objectid
   Fixed value.
4) Type
   Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
   No more than 1 bit set for profile type.
5) Used space
   No more than block group size.

This should allow btrfs to detect and refuse to mount the crafted image.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 101 ++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.c      |   2 +-
 fs/btrfs/volumes.h      |   2 +
 3 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 8d40e7dd8c30..1cd735b099df 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -19,6 +19,7 @@
 #include "tree-checker.h"
 #include "disk-io.h"
 #include "compression.h"
+#include "volumes.h"
 
 /*
  * Error message should follow the following format:
@@ -353,6 +354,103 @@ static int check_dir_item(struct btrfs_fs_info *fs_info,
 	return 0;
 }
 
+__printf(4, 5)
+__cold
+static void block_group_err(const struct btrfs_fs_info *fs_info,
+			    const struct extent_buffer *eb, int slot,
+			    const char *fmt, ...)
+{
+	struct btrfs_key key;
+	struct va_format vaf;
+	va_list args;
+
+	btrfs_item_key_to_cpu(eb, &key, slot);
+	va_start(args, fmt);
+
+	vaf.fmt = fmt;
+	vaf.va = &args;
+
+	btrfs_crit(fs_info,
+	"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV",
+		btrfs_header_level(eb) == 0 ? "leaf" : "node",
+		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
+		key.objectid, key.offset, &vaf);
+	va_end(args);
+}
+
+static int check_block_group_item(struct btrfs_fs_info *fs_info,
+				  struct extent_buffer *leaf,
+				  struct btrfs_key *key, int slot)
+{
+	struct btrfs_block_group_item bgi;
+	u32 item_size = btrfs_item_size_nr(leaf, slot);
+	u64 flags;
+	u64 type_flags;
+
+	/*
+	 * Here we don't really care about unalignment since extent allocator
+	 * can handle it.
+	 * We care more about the size, as if one block group is larger than
+	 * maximum size, it's must be some obvious corruption
+	 */
+	if (key->offset > BTRFS_MAX_DATA_CHUNK_SIZE || key->offset == 0) {
+		block_group_err(fs_info, leaf, slot,
+			"invalid block group size, have %llu expect (0, %llu]",
+				key->offset, 10ULL * SZ_1G);
+		return -EUCLEAN;
+	}
+
+	if (item_size != sizeof(bgi)) {
+		block_group_err(fs_info, leaf, slot,
+			"invalid item size, have %u expect %lu",
+				item_size, sizeof(bgi));
+		return -EUCLEAN;
+	}
+
+	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
+			   sizeof(bgi));
+	if (btrfs_block_group_chunk_objectid(&bgi) !=
+	    BTRFS_FIRST_CHUNK_TREE_OBJECTID) {
+		block_group_err(fs_info, leaf, slot,
+		"invalid block group chunk objectid, have %llu expect %llu",
+				btrfs_block_group_chunk_objectid(&bgi),
+				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
+		return -EUCLEAN;
+	}
+
+	if (btrfs_block_group_used(&bgi) > key->offset) {
+		block_group_err(fs_info, leaf, slot,
+			"invalid block group used, have %llu expect [0, %llu)",
+				btrfs_block_group_used(&bgi), key->offset);
+		return -EUCLEAN;
+	}
+
+	flags = btrfs_block_group_flags(&bgi);
+	if (hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1) {
+		block_group_err(fs_info, leaf, slot,
+"invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set",
+			flags & BTRFS_BLOCK_GROUP_PROFILE_MASK,
+			hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK));
+		return -EUCLEAN;
+	}
+
+	type_flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
+	if (type_flags != BTRFS_BLOCK_GROUP_DATA &&
+	    type_flags != BTRFS_BLOCK_GROUP_METADATA &&
+	    type_flags != BTRFS_BLOCK_GROUP_SYSTEM &&
+	    type_flags != (BTRFS_BLOCK_GROUP_METADATA |
+			   BTRFS_BLOCK_GROUP_DATA)) {
+		block_group_err(fs_info, leaf, slot,
+"invalid type flags, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llu or 0x%llx",
+			type_flags, hweight64(type_flags),
+			BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA,
+			BTRFS_BLOCK_GROUP_SYSTEM,
+			BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA);
+		return -EUCLEAN;
+	}
+	return 0;
+}
+
 /*
  * Common point to switch the item-specific validation.
  */
@@ -374,6 +472,9 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
 	case BTRFS_XATTR_ITEM_KEY:
 		ret = check_dir_item(fs_info, leaf, key, slot);
 		break;
+	case BTRFS_BLOCK_GROUP_ITEM_KEY:
+		ret = check_block_group_item(fs_info, leaf, key, slot);
+		break;
 	}
 	return ret;
 }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e034ad9e23b4..b33bf29130b6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4690,7 +4690,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 
 	if (type & BTRFS_BLOCK_GROUP_DATA) {
 		max_stripe_size = SZ_1G;
-		max_chunk_size = 10 * max_stripe_size;
+		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
 		if (!devs_max)
 			devs_max = BTRFS_MAX_DEVS(info);
 	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 5139ec8daf4c..77e6004b6cb9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -11,6 +11,8 @@
 #include <linux/btrfs.h>
 #include "async-thread.h"
 
+#define BTRFS_MAX_DATA_CHUNK_SIZE	(10ULL * SZ_1G)
+
 extern struct mutex uuid_mutex;
 
 #define BTRFS_STRIPE_LEN	SZ_64K
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
@ 2018-07-03  9:10 ` Qu Wenruo
  2018-07-04  3:42   ` Gu, Jinxiang
                     ` (2 more replies)
  2018-07-03  9:10 ` [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

A crafted image has empty root tree block, which will cause later NULL
pointer dereference.

The following trees should never be empty:
1) Tree root
   Must contain at least root items for extent tree, device tree and fs
   tree

2) Chunk tree
   Or we can't even bootstrap.

3) Fs tree
   At least inode item for top level inode (.).

4) Device tree
   Dev extents for chunks

5) Extent tree
   Must has corresponding extent for each chunk.

If any is empty, we are sure the fs is corrupted and no need to mount
it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/tree-checker.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 1cd735b099df..3a307efab46b 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -497,8 +497,20 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
 	 * skip this check for relocation trees.
 	 */
 	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
+		u64 owner = btrfs_header_owner(leaf);
 		struct btrfs_root *check_root;
 
+		/* These trees should never be empty */
+		if (owner == BTRFS_ROOT_TREE_OBJECTID ||
+		    owner == BTRFS_CHUNK_TREE_OBJECTID ||
+		    owner == BTRFS_EXTENT_TREE_OBJECTID ||
+		    owner == BTRFS_DEV_TREE_OBJECTID ||
+		    owner == BTRFS_FS_TREE_OBJECTID) {
+			generic_err(fs_info, leaf, 0,
+			"invalid root, root %llu should never be empty",
+				    owner);
+			return -EUCLEAN;
+		}
 		key.objectid = btrfs_header_owner(leaf);
 		key.type = BTRFS_ROOT_ITEM_KEY;
 		key.offset = (u64)-1;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
  2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
@ 2018-07-03  9:10 ` Qu Wenruo
  2018-07-04  5:23   ` Gu, Jinxiang
  2018-07-04  7:37   ` Gu, Jinxiang
  2018-07-03  9:10 ` [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

Invalid tree reloc tree can cause kernel NULL pointer dereference when
btrfs does some cleanup for reloc roots.

It turns out that fs_info->reloc_ctl can be NULL in
btrfs_recover_relocation() as we allocate relocation control after all
reloc roots are verified.
So when we hit out: tag, we haven't call set_reloc_control() thus
fs_info->reloc_ctl is still NULL.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 879b76fa881a..be94c65bb4d2 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1321,18 +1321,19 @@ static void __del_reloc_root(struct btrfs_root *root)
 	struct mapping_node *node = NULL;
 	struct reloc_control *rc = fs_info->reloc_ctl;
 
-	spin_lock(&rc->reloc_root_tree.lock);
-	rb_node = tree_search(&rc->reloc_root_tree.rb_root,
-			      root->node->start);
-	if (rb_node) {
-		node = rb_entry(rb_node, struct mapping_node, rb_node);
-		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
+	if (rc) {
+		spin_lock(&rc->reloc_root_tree.lock);
+		rb_node = tree_search(&rc->reloc_root_tree.rb_root,
+				      root->node->start);
+		if (rb_node) {
+			node = rb_entry(rb_node, struct mapping_node, rb_node);
+			rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
+		}
+		spin_unlock(&rc->reloc_root_tree.lock);
+		if (!node)
+			return;
+		BUG_ON((struct btrfs_root *)node->data != root);
 	}
-	spin_unlock(&rc->reloc_root_tree.lock);
-
-	if (!node)
-		return;
-	BUG_ON((struct btrfs_root *)node->data != root);
 
 	spin_lock(&fs_info->trans_lock);
 	list_del_init(&root->root_list);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
                   ` (2 preceding siblings ...)
  2018-07-03  9:10 ` [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized Qu Wenruo
@ 2018-07-03  9:10 ` Qu Wenruo
  2018-07-04  5:45   ` Gu, Jinxiang
  2018-07-04  6:02   ` Nikolay Borisov
  2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
  2018-07-04 13:36 ` [PATCH 0/5] Enhancement for block group/chunk verification David Sterba
  5 siblings, 2 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

A crafted btrfs with incorrect chunk<->block group mapping, it could leads
to a lot of unexpected behavior.

Although the crafted image can be catched by block group item checker
added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
crafted a valid enough block group item which can pass above check but
still mismatch with existing chunk, it could cause a lot of undefined
behavior.

This patch will add extra block group -> chunk mapping check, to ensure
we have a completely matching (start, len, flags) chunk for each block
group at mount time.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58c0080..82b446f014b9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10003,6 +10003,41 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
 	return cache;
 }
 
+static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
+			     u64 flags)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	int ret;
+
+	read_lock(&map_tree->map_tree.lock);
+	em = lookup_extent_mapping(&map_tree->map_tree, start, len);
+	read_unlock(&map_tree->map_tree.lock);
+
+	if (!em) {
+		btrfs_err_rl(fs_info,
+	"block group start=%llu len=%llu doesn't have corresponding chunk",
+			     start, len);
+		ret = -ENOENT;
+		goto out;
+	}
+	if (em->start != start || em->len != len ||
+	    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
+	    (flags & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+		btrfs_err_rl(fs_info,
+"block group start=%llu len=%llu flags=0x%llx doesn't match with chunk start=%llu len=%llu flags=0x%llx",
+			     start, len , flags & BTRFS_BLOCK_GROUP_TYPE_MASK,
+			     em->start, em->len, em->map_lookup->type &
+			     BTRFS_BLOCK_GROUP_TYPE_MASK);
+		ret = -EUCLEAN;
+		goto out;
+	}
+	ret = 0;
+out:
+	free_extent_map(em);
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -10036,6 +10071,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		need_clear = 1;
 
 	while (1) {
+		struct btrfs_block_group_item bg;
+		int slot;
+
 		ret = find_first_block_group(info, path, &key);
 		if (ret > 0)
 			break;
@@ -10043,7 +10081,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 			goto error;
 
 		leaf = path->nodes[0];
-		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+		slot = path->slots[0];
+		btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+		read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
+				   sizeof(bg));
+		/*
+		 * Chunk and block group must have 1:1 mapping.
+		 * So there must be a chunk for this block group.
+		 */
+		ret = check_exist_chunk(info, found_key.objectid,
+					found_key.offset,
+					btrfs_block_group_flags(&bg));
+		if (ret < 0)
+			goto error;
 
 		cache = btrfs_create_block_group_cache(info, found_key.objectid,
 						       found_key.offset);
@@ -10068,7 +10119,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 		}
 
 		read_extent_buffer(leaf, &cache->item,
-				   btrfs_item_ptr_offset(leaf, path->slots[0]),
+				   btrfs_item_ptr_offset(leaf, slot),
 				   sizeof(cache->item));
 		cache->flags = btrfs_block_group_flags(&cache->item);
 		if (!mixed &&
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
                   ` (3 preceding siblings ...)
  2018-07-03  9:10 ` [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
@ 2018-07-03  9:10 ` Qu Wenruo
  2018-07-04  6:09   ` Gu, Jinxiang
                     ` (2 more replies)
  2018-07-04 13:36 ` [PATCH 0/5] Enhancement for block group/chunk verification David Sterba
  5 siblings, 3 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-03  9:10 UTC (permalink / raw)
  To: linux-btrfs

If a crafted btrfs has missing block group items, it could cause
unexpected behavior and breaks our expectation on 1:1
chunk<->block group mapping.

Although we added block group -> chunk mapping check, we still need
chunk -> block group mapping check.

This patch will do extra check to ensure each chunk has its
corresponding block group.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b446f014b9..746095034ca2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
 	return ret;
 }
 
+/*
+ * Iterate all chunks and verify each of them has corresponding block group
+ */
+static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	struct btrfs_block_group_cache *bg;
+	u64 start = 0;
+	int ret = 0;
+
+	while (1) {
+		read_lock(&map_tree->map_tree.lock);
+		em = lookup_extent_mapping(&map_tree->map_tree, start,
+					   (u64)-1 - start);
+		read_unlock(&map_tree->map_tree.lock);
+		if (!em)
+			break;
+
+		bg = btrfs_lookup_block_group(fs_info, em->start);
+		if (!bg) {
+			btrfs_err_rl(fs_info,
+	"chunk start=%llu len=%llu doesn't have corresponding block group",
+				     em->start, em->len);
+			ret = -ENOENT;
+			free_extent_map(em);
+			break;
+		}
+		if (bg->key.objectid != em->start ||
+		    bg->key.offset != em->len ||
+		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
+		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+			btrfs_err_rl(fs_info,
+"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
+				em->start, em->len,
+				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
+				bg->key.objectid, bg->key.offset,
+				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
+			ret = -EUCLEAN;
+			free_extent_map(em);
+			btrfs_put_block_group(bg);
+			break;
+		}
+		start = em->start + em->len;
+		free_extent_map(em);
+		btrfs_put_block_group(bg);
+	}
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 
 	btrfs_add_raid_kobjects(info);
 	init_global_block_rsv(info);
-	ret = 0;
+	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
 	return ret;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
@ 2018-07-04  2:20   ` Gu, Jinxiang
  2018-07-04  5:54   ` Nikolay Borisov
  2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  2:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 6664 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
> 
> A crafted image with invalid block group items could make free space cache
> code to cause panic.
> 
> We could early detect such invalid block group item by checking:
> 1) Item size
>    Fixed value.
> 2) Block group size (key.offset)
>    We have a up limit on block group item (10G)
> 3) Chunk objectid
>    Fixed value.
> 4) Type
>    Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
>    No more than 1 bit set for profile type.
> 5) Used space
>    No more than block group size.
> 
> This should allow btrfs to detect and refuse to mount the crafted image.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/tree-checker.c | 101 ++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.c      |   2 +-
>  fs/btrfs/volumes.h      |   2 +
>  3 files changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 8d40e7dd8c30..1cd735b099df 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -19,6 +19,7 @@
>  #include "tree-checker.h"
>  #include "disk-io.h"
>  #include "compression.h"
> +#include "volumes.h"
> 
>  /*
>   * Error message should follow the following format:
> @@ -353,6 +354,103 @@ static int check_dir_item(struct btrfs_fs_info *fs_info,
>  	return 0;
>  }
> 
> +__printf(4, 5)
> +__cold
> +static void block_group_err(const struct btrfs_fs_info *fs_info,
> +			    const struct extent_buffer *eb, int slot,
> +			    const char *fmt, ...)
> +{
> +	struct btrfs_key key;
> +	struct va_format vaf;
> +	va_list args;
> +
> +	btrfs_item_key_to_cpu(eb, &key, slot);
> +	va_start(args, fmt);
> +
> +	vaf.fmt = fmt;
> +	vaf.va = &args;
> +
> +	btrfs_crit(fs_info,
> +	"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV",
> +		btrfs_header_level(eb) == 0 ? "leaf" : "node",
> +		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
> +		key.objectid, key.offset, &vaf);
> +	va_end(args);
> +}
> +
> +static int check_block_group_item(struct btrfs_fs_info *fs_info,
> +				  struct extent_buffer *leaf,
> +				  struct btrfs_key *key, int slot)
> +{
> +	struct btrfs_block_group_item bgi;
> +	u32 item_size = btrfs_item_size_nr(leaf, slot);
> +	u64 flags;
> +	u64 type_flags;
> +
> +	/*
> +	 * Here we don't really care about unalignment since extent allocator
> +	 * can handle it.
> +	 * We care more about the size, as if one block group is larger than
> +	 * maximum size, it's must be some obvious corruption
> +	 */
> +	if (key->offset > BTRFS_MAX_DATA_CHUNK_SIZE || key->offset == 0) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid block group size, have %llu expect (0, %llu]",
> +				key->offset, 10ULL * SZ_1G);
> +		return -EUCLEAN;
> +	}
> +
> +	if (item_size != sizeof(bgi)) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid item size, have %u expect %lu",
> +				item_size, sizeof(bgi));
> +		return -EUCLEAN;
> +	}
> +
> +	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
> +			   sizeof(bgi));
> +	if (btrfs_block_group_chunk_objectid(&bgi) !=
> +	    BTRFS_FIRST_CHUNK_TREE_OBJECTID) {
> +		block_group_err(fs_info, leaf, slot,
> +		"invalid block group chunk objectid, have %llu expect %llu",
> +				btrfs_block_group_chunk_objectid(&bgi),
> +				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> +		return -EUCLEAN;
> +	}
> +
> +	if (btrfs_block_group_used(&bgi) > key->offset) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid block group used, have %llu expect [0, %llu)",
> +				btrfs_block_group_used(&bgi), key->offset);
> +		return -EUCLEAN;
> +	}
> +
> +	flags = btrfs_block_group_flags(&bgi);
> +	if (hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1) {
> +		block_group_err(fs_info, leaf, slot,
> +"invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set",
> +			flags & BTRFS_BLOCK_GROUP_PROFILE_MASK,
> +			hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK));
> +		return -EUCLEAN;
> +	}
> +
> +	type_flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
> +	if (type_flags != BTRFS_BLOCK_GROUP_DATA &&
> +	    type_flags != BTRFS_BLOCK_GROUP_METADATA &&
> +	    type_flags != BTRFS_BLOCK_GROUP_SYSTEM &&
> +	    type_flags != (BTRFS_BLOCK_GROUP_METADATA |
> +			   BTRFS_BLOCK_GROUP_DATA)) {
> +		block_group_err(fs_info, leaf, slot,
> +"invalid type flags, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llu or 0x%llx",
> +			type_flags, hweight64(type_flags),
> +			BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA,
> +			BTRFS_BLOCK_GROUP_SYSTEM,
> +			BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA);
> +		return -EUCLEAN;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Common point to switch the item-specific validation.
>   */
> @@ -374,6 +472,9 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
>  	case BTRFS_XATTR_ITEM_KEY:
>  		ret = check_dir_item(fs_info, leaf, key, slot);
>  		break;
> +	case BTRFS_BLOCK_GROUP_ITEM_KEY:
> +		ret = check_block_group_item(fs_info, leaf, key, slot);
> +		break;
>  	}
>  	return ret;
>  }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e034ad9e23b4..b33bf29130b6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4690,7 +4690,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> 
>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>  		max_stripe_size = SZ_1G;
> -		max_chunk_size = 10 * max_stripe_size;
> +		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>  		if (!devs_max)
>  			devs_max = BTRFS_MAX_DEVS(info);
>  	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 5139ec8daf4c..77e6004b6cb9 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -11,6 +11,8 @@
>  #include <linux/btrfs.h>
>  #include "async-thread.h"
> 
> +#define BTRFS_MAX_DATA_CHUNK_SIZE	(10ULL * SZ_1G)
> +
>  extern struct mutex uuid_mutex;
> 
>  #define BTRFS_STRIPE_LEN	SZ_64K
> --

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
  2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
@ 2018-07-04  3:42   ` Gu, Jinxiang
  2018-07-04  5:56   ` Nikolay Borisov
  2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  3:42 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 2506 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
> 
> A crafted image has empty root tree block, which will cause later NULL
> pointer dereference.
> 
> The following trees should never be empty:
> 1) Tree root
>    Must contain at least root items for extent tree, device tree and fs
>    tree
> 
> 2) Chunk tree
>    Or we can't even bootstrap.
> 
> 3) Fs tree
>    At least inode item for top level inode (.).
> 
> 4) Device tree
>    Dev extents for chunks
> 
> 5) Extent tree
>    Must has corresponding extent for each chunk.
> 
> If any is empty, we are sure the fs is corrupted and no need to mount
> it.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/tree-checker.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1cd735b099df..3a307efab46b 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -497,8 +497,20 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
>  	 * skip this check for relocation trees.
>  	 */
>  	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
> +		u64 owner = btrfs_header_owner(leaf);
>  		struct btrfs_root *check_root;
> 
> +		/* These trees should never be empty */
> +		if (owner == BTRFS_ROOT_TREE_OBJECTID ||
> +		    owner == BTRFS_CHUNK_TREE_OBJECTID ||
> +		    owner == BTRFS_EXTENT_TREE_OBJECTID ||
> +		    owner == BTRFS_DEV_TREE_OBJECTID ||
> +		    owner == BTRFS_FS_TREE_OBJECTID) {
> +			generic_err(fs_info, leaf, 0,
> +			"invalid root, root %llu should never be empty",
> +				    owner);
> +			return -EUCLEAN;
> +		}
>  		key.objectid = btrfs_header_owner(leaf);
>  		key.type = BTRFS_ROOT_ITEM_KEY;
>  		key.offset = (u64)-1;
> --

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
  2018-07-03  9:10 ` [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized Qu Wenruo
@ 2018-07-04  5:23   ` Gu, Jinxiang
  2018-07-04  7:37   ` Gu, Jinxiang
  1 sibling, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  5:23 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 2609 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
> 
> Invalid tree reloc tree can cause kernel NULL pointer dereference when
> btrfs does some cleanup for reloc roots.
> 
> It turns out that fs_info->reloc_ctl can be NULL in
> btrfs_recover_relocation() as we allocate relocation control after all
> reloc roots are verified.
> So when we hit out: tag, we haven't call set_reloc_control() thus
> fs_info->reloc_ctl is still NULL.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/relocation.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 879b76fa881a..be94c65bb4d2 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1321,18 +1321,19 @@ static void __del_reloc_root(struct btrfs_root *root)
>  	struct mapping_node *node = NULL;
>  	struct reloc_control *rc = fs_info->reloc_ctl;
> 
> -	spin_lock(&rc->reloc_root_tree.lock);
> -	rb_node = tree_search(&rc->reloc_root_tree.rb_root,
> -			      root->node->start);
> -	if (rb_node) {
> -		node = rb_entry(rb_node, struct mapping_node, rb_node);
> -		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
> +	if (rc) {
> +		spin_lock(&rc->reloc_root_tree.lock);
> +		rb_node = tree_search(&rc->reloc_root_tree.rb_root,
> +				      root->node->start);
> +		if (rb_node) {
> +			node = rb_entry(rb_node, struct mapping_node, rb_node);
> +			rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
> +		}
> +		spin_unlock(&rc->reloc_root_tree.lock);
> +		if (!node)
> +			return;
> +		BUG_ON((struct btrfs_root *)node->data != root);
>  	}
> -	spin_unlock(&rc->reloc_root_tree.lock);
> -
> -	if (!node)
> -		return;
> -	BUG_ON((struct btrfs_root *)node->data != root);
> 
>  	spin_lock(&fs_info->trans_lock);
>  	list_del_init(&root->root_list);
> --

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
  2018-07-03  9:10 ` [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
@ 2018-07-04  5:45   ` Gu, Jinxiang
  2018-07-05 23:41     ` Qu Wenruo
  2018-07-04  6:02   ` Nikolay Borisov
  1 sibling, 1 reply; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  5:45 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 4565 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
> 
> A crafted btrfs with incorrect chunk<->block group mapping, it could leads
> to a lot of unexpected behavior.
> 
> Although the crafted image can be catched by block group item checker
> added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
> crafted a valid enough block group item which can pass above check but
> still mismatch with existing chunk, it could cause a lot of undefined
> behavior.
> 
> This patch will add extra block group -> chunk mapping check, to ensure
> we have a completely matching (start, len, flags) chunk for each block
> group at mount time.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58c0080..82b446f014b9 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10003,6 +10003,41 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
>  	return cache;
>  }
> 
> +static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
> +			     u64 flags)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	int ret;
> +
> +	read_lock(&map_tree->map_tree.lock);
> +	em = lookup_extent_mapping(&map_tree->map_tree, start, len);
> +	read_unlock(&map_tree->map_tree.lock);
> +
> +	if (!em) {
> +		btrfs_err_rl(fs_info,
> +	"block group start=%llu len=%llu doesn't have corresponding chunk",
> +			     start, len);
> +		ret = -ENOENT;
> +		goto out;
> +	}

This check has been done in find_first_block_group which has been called before
check_exist_chunk be called.

> +	if (em->start != start || em->len != len ||
> +	    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
> +	    (flags & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +		btrfs_err_rl(fs_info,
> +"block group start=%llu len=%llu flags=0x%llx doesn't match with chunk start=%llu len=%llu flags=0x%llx",
> +			     start, len , flags & BTRFS_BLOCK_GROUP_TYPE_MASK,
> +			     em->start, em->len, em->map_lookup->type &
> +			     BTRFS_BLOCK_GROUP_TYPE_MASK);
> +		ret = -EUCLEAN;
> +		goto out;
> +	}
Should this check also be added to find_first_block_group?

> +	ret = 0;
> +out:
> +	free_extent_map(em);
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -10036,6 +10071,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		need_clear = 1;
> 
>  	while (1) {
> +		struct btrfs_block_group_item bg;
> +		int slot;
> +
>  		ret = find_first_block_group(info, path, &key);
>  		if (ret > 0)
>  			break;
> @@ -10043,7 +10081,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  			goto error;
> 
>  		leaf = path->nodes[0];
> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +		slot = path->slots[0];
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
> +				   sizeof(bg));
> +		/*
> +		 * Chunk and block group must have 1:1 mapping.
> +		 * So there must be a chunk for this block group.
> +		 */
> +		ret = check_exist_chunk(info, found_key.objectid,
> +					found_key.offset,
> +					btrfs_block_group_flags(&bg));
> +		if (ret < 0)
> +			goto error;
> 
>  		cache = btrfs_create_block_group_cache(info, found_key.objectid,
>  						       found_key.offset);
> @@ -10068,7 +10119,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		}
> 
>  		read_extent_buffer(leaf, &cache->item,
> -				   btrfs_item_ptr_offset(leaf, path->slots[0]),
> +				   btrfs_item_ptr_offset(leaf, slot),
>  				   sizeof(cache->item));
>  		cache->flags = btrfs_block_group_flags(&cache->item);
>  		if (!mixed &&
> --
> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
  2018-07-04  2:20   ` Gu, Jinxiang
@ 2018-07-04  5:54   ` Nikolay Borisov
  2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Nikolay Borisov @ 2018-07-04  5:54 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On  3.07.2018 12:10, Qu Wenruo wrote:
> A crafted image with invalid block group items could make free space cache
> code to cause panic.
> 
> We could early detect such invalid block group item by checking:
> 1) Item size
>    Fixed value.
> 2) Block group size (key.offset)
>    We have a up limit on block group item (10G)
> 3) Chunk objectid
>    Fixed value.
> 4) Type
>    Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
>    No more than 1 bit set for profile type.
> 5) Used space
>    No more than block group size.
> 
> This should allow btrfs to detect and refuse to mount the crafted image.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
  2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
  2018-07-04  3:42   ` Gu, Jinxiang
@ 2018-07-04  5:56   ` Nikolay Borisov
  2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Nikolay Borisov @ 2018-07-04  5:56 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On  3.07.2018 12:10, Qu Wenruo wrote:
> A crafted image has empty root tree block, which will cause later NULL
> pointer dereference.
> 
> The following trees should never be empty:
> 1) Tree root
>    Must contain at least root items for extent tree, device tree and fs
>    tree
> 
> 2) Chunk tree
>    Or we can't even bootstrap.
> 
> 3) Fs tree
>    At least inode item for top level inode (.).
> 
> 4) Device tree
>    Dev extents for chunks
> 
> 5) Extent tree
>    Must has corresponding extent for each chunk.
> 
> If any is empty, we are sure the fs is corrupted and no need to mount
> it.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/tree-checker.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1cd735b099df..3a307efab46b 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -497,8 +497,20 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
>  	 * skip this check for relocation trees.
>  	 */
>  	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
> +		u64 owner = btrfs_header_owner(leaf);

nit: Since you assign the owner of the leaf to a variable use this
variable when initialising key object below.

>  		struct btrfs_root *check_root;
>  
> +		/* These trees should never be empty */
> +		if (owner == BTRFS_ROOT_TREE_OBJECTID ||
> +		    owner == BTRFS_CHUNK_TREE_OBJECTID ||
> +		    owner == BTRFS_EXTENT_TREE_OBJECTID ||
> +		    owner == BTRFS_DEV_TREE_OBJECTID ||
> +		    owner == BTRFS_FS_TREE_OBJECTID) {
> +			generic_err(fs_info, leaf, 0,
> +			"invalid root, root %llu should never be empty",
> +				    owner);
> +			return -EUCLEAN;
> +		}
>  		key.objectid = btrfs_header_owner(leaf);
>  		key.type = BTRFS_ROOT_ITEM_KEY;
>  		key.offset = (u64)-1;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
  2018-07-03  9:10 ` [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
  2018-07-04  5:45   ` Gu, Jinxiang
@ 2018-07-04  6:02   ` Nikolay Borisov
  1 sibling, 0 replies; 31+ messages in thread
From: Nikolay Borisov @ 2018-07-04  6:02 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On  3.07.2018 12:10, Qu Wenruo wrote:
> A crafted btrfs with incorrect chunk<->block group mapping, it could leads
> to a lot of unexpected behavior.
> 
> Although the crafted image can be catched by block group item checker
> added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
> crafted a valid enough block group item which can pass above check but
> still mismatch with existing chunk, it could cause a lot of undefined
> behavior.
> 
> This patch will add extra block group -> chunk mapping check, to ensure
> we have a completely matching (start, len, flags) chunk for each block
> group at mount time.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

Just one minor nit below.

> ---
>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58c0080..82b446f014b9 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10003,6 +10003,41 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
>  	return cache;
>  }
>  
> +static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
> +			     u64 flags)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	int ret;
> +
> +	read_lock(&map_tree->map_tree.lock);
> +	em = lookup_extent_mapping(&map_tree->map_tree, start, len);
> +	read_unlock(&map_tree->map_tree.lock);
> +
> +	if (!em) {
> +		btrfs_err_rl(fs_info,
> +	"block group start=%llu len=%llu doesn't have corresponding chunk",
> +			     start, len);
> +		ret = -ENOENT;
> +		goto out;
> +	}
> +	if (em->start != start || em->len != len ||
> +	    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
> +	    (flags & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +		btrfs_err_rl(fs_info,
> +"block group start=%llu len=%llu flags=0x%llx doesn't match with chunk start=%llu len=%llu flags=0x%llx",
> +			     start, len , flags & BTRFS_BLOCK_GROUP_TYPE_MASK,
> +			     em->start, em->len, em->map_lookup->type &
> +			     BTRFS_BLOCK_GROUP_TYPE_MASK);
> +		ret = -EUCLEAN;
> +		goto out;
> +	}
> +	ret = 0;

nit: I'd rather the ret be initialised when it's defined, it's changed
only if there is an error so it actually saves a line and makes it
obvious that we start with an assumption that the check should pass.
> +out:
> +	free_extent_map(em);
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -10036,6 +10071,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		need_clear = 1;
>  
>  	while (1) {
> +		struct btrfs_block_group_item bg;
> +		int slot;
> +
>  		ret = find_first_block_group(info, path, &key);
>  		if (ret > 0)
>  			break;
> @@ -10043,7 +10081,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  			goto error;
>  
>  		leaf = path->nodes[0];
> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +		slot = path->slots[0];
> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
> +
> +		read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
> +				   sizeof(bg));
> +		/*
> +		 * Chunk and block group must have 1:1 mapping.
> +		 * So there must be a chunk for this block group.
> +		 */
> +		ret = check_exist_chunk(info, found_key.objectid,
> +					found_key.offset,
> +					btrfs_block_group_flags(&bg));
> +		if (ret < 0)
> +			goto error;
>  
>  		cache = btrfs_create_block_group_cache(info, found_key.objectid,
>  						       found_key.offset);
> @@ -10068,7 +10119,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  		}
>  
>  		read_extent_buffer(leaf, &cache->item,
> -				   btrfs_item_ptr_offset(leaf, path->slots[0]),
> +				   btrfs_item_ptr_offset(leaf, slot),
>  				   sizeof(cache->item));
>  		cache->flags = btrfs_block_group_flags(&cache->item);
>  		if (!mixed &&
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
@ 2018-07-04  6:09   ` Gu, Jinxiang
  2018-07-04  7:08   ` Nikolay Borisov
  2018-07-05 15:18   ` David Sterba
  2 siblings, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  6:09 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 3589 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
> 
> If a crafted btrfs has missing block group items, it could cause
> unexpected behavior and breaks our expectation on 1:1
> chunk<->block group mapping.
> 
> Although we added block group -> chunk mapping check, we still need
> chunk -> block group mapping check.
> 
> This patch will do extra check to ensure each chunk has its
> corresponding block group.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b446f014b9..746095034ca2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>  	return ret;
>  }
> 
> +/*
> + * Iterate all chunks and verify each of them has corresponding block group
> + */
> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	struct btrfs_block_group_cache *bg;
> +	u64 start = 0;
> +	int ret = 0;
> +
> +	while (1) {
> +		read_lock(&map_tree->map_tree.lock);
> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
> +					   (u64)-1 - start);
> +		read_unlock(&map_tree->map_tree.lock);
> +		if (!em)
> +			break;
> +
> +		bg = btrfs_lookup_block_group(fs_info, em->start);
> +		if (!bg) {
> +			btrfs_err_rl(fs_info,
> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
> +				     em->start, em->len);
> +			ret = -ENOENT;
> +			free_extent_map(em);
> +			break;
> +		}
> +		if (bg->key.objectid != em->start ||
> +		    bg->key.offset != em->len ||
> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +			btrfs_err_rl(fs_info,
> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
> +				em->start, em->len,
> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
> +				bg->key.objectid, bg->key.offset,
> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
> +			ret = -EUCLEAN;
> +			free_extent_map(em);
> +			btrfs_put_block_group(bg);
> +			break;
> +		}
> +		start = em->start + em->len;
> +		free_extent_map(em);
> +		btrfs_put_block_group(bg);
> +	}
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
> 
>  	btrfs_add_raid_kobjects(info);
>  	init_global_block_rsv(info);
> -	ret = 0;
> +	ret = check_chunk_block_group_mappings(info);
>  error:
>  	btrfs_free_path(path);
>  	return ret;
> --

Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
  2018-07-04  6:09   ` Gu, Jinxiang
@ 2018-07-04  7:08   ` Nikolay Borisov
  2018-07-04  9:46     ` Qu Wenruo
  2018-07-05 15:18   ` David Sterba
  2 siblings, 1 reply; 31+ messages in thread
From: Nikolay Borisov @ 2018-07-04  7:08 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On  3.07.2018 12:10, Qu Wenruo wrote:
> If a crafted btrfs has missing block group items, it could cause
> unexpected behavior and breaks our expectation on 1:1
> chunk<->block group mapping.
> 
> Although we added block group -> chunk mapping check, we still need
> chunk -> block group mapping check.
> 
> This patch will do extra check to ensure each chunk has its
> corresponding block group.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b446f014b9..746095034ca2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>  	return ret;
>  }
>  
> +/*
> + * Iterate all chunks and verify each of them has corresponding block group
> + */
> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	struct btrfs_block_group_cache *bg;
> +	u64 start = 0;
> +	int ret = 0;
> +
> +	while (1) {
> +		read_lock(&map_tree->map_tree.lock);
> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
> +					   (u64)-1 - start);
len parameter of lookup_extent_mapping eventually ends up in range_end.
Meaning it will just return -1. Why not use just -1 for len. Looking at
the rest of the code this seems to be the convention. But then there are
several places where 1 is passed as well. Hm, in any case a single
number is simpler than an expression.

> +		read_unlock(&map_tree->map_tree.lock);
> +		if (!em)
> +			break;
> +
> +		bg = btrfs_lookup_block_group(fs_info, em->start);
> +		if (!bg) {
> +			btrfs_err_rl(fs_info,
> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
> +				     em->start, em->len);
> +			ret = -ENOENT;
> +			free_extent_map(em);
> +			break;
> +		}
> +		if (bg->key.objectid != em->start ||
> +		    bg->key.offset != em->len ||
> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +			btrfs_err_rl(fs_info,
> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
> +				em->start, em->len,
> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
> +				bg->key.objectid, bg->key.offset,
> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
> +			ret = -EUCLEAN;
> +			free_extent_map(em);
> +			btrfs_put_block_group(bg);
> +			break;
> +		}
> +		start = em->start + em->len;
> +		free_extent_map(em);
> +		btrfs_put_block_group(bg);
> +	}
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  
>  	btrfs_add_raid_kobjects(info);
>  	init_global_block_rsv(info);
> -	ret = 0;
> +	ret = check_chunk_block_group_mappings(info);

Rather than doing that can we just get the count of chunks. Then if we
have as many chunks as BG have been read in and we know the BG -> chunk
mapping check has passed we can assume that chunks also map to BG
without going through all chunks.

>  error:
>  	btrfs_free_path(path);
>  	return ret;
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
  2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
  2018-07-04  2:20   ` Gu, Jinxiang
  2018-07-04  5:54   ` Nikolay Borisov
@ 2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  7:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 6663 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 1/5] btrfs: tree-checker: Verify block_group_item
> 
> A crafted image with invalid block group items could make free space cache
> code to cause panic.
> 
> We could early detect such invalid block group item by checking:
> 1) Item size
>    Fixed value.
> 2) Block group size (key.offset)
>    We have a up limit on block group item (10G)
> 3) Chunk objectid
>    Fixed value.
> 4) Type
>    Only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA.
>    No more than 1 bit set for profile type.
> 5) Used space
>    No more than block group size.
> 
> This should allow btrfs to detect and refuse to mount the crafted image.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/tree-checker.c | 101 ++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.c      |   2 +-
>  fs/btrfs/volumes.h      |   2 +
>  3 files changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 8d40e7dd8c30..1cd735b099df 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -19,6 +19,7 @@
>  #include "tree-checker.h"
>  #include "disk-io.h"
>  #include "compression.h"
> +#include "volumes.h"
> 
>  /*
>   * Error message should follow the following format:
> @@ -353,6 +354,103 @@ static int check_dir_item(struct btrfs_fs_info *fs_info,
>  	return 0;
>  }
> 
> +__printf(4, 5)
> +__cold
> +static void block_group_err(const struct btrfs_fs_info *fs_info,
> +			    const struct extent_buffer *eb, int slot,
> +			    const char *fmt, ...)
> +{
> +	struct btrfs_key key;
> +	struct va_format vaf;
> +	va_list args;
> +
> +	btrfs_item_key_to_cpu(eb, &key, slot);
> +	va_start(args, fmt);
> +
> +	vaf.fmt = fmt;
> +	vaf.va = &args;
> +
> +	btrfs_crit(fs_info,
> +	"corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV",
> +		btrfs_header_level(eb) == 0 ? "leaf" : "node",
> +		btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot,
> +		key.objectid, key.offset, &vaf);
> +	va_end(args);
> +}
> +
> +static int check_block_group_item(struct btrfs_fs_info *fs_info,
> +				  struct extent_buffer *leaf,
> +				  struct btrfs_key *key, int slot)
> +{
> +	struct btrfs_block_group_item bgi;
> +	u32 item_size = btrfs_item_size_nr(leaf, slot);
> +	u64 flags;
> +	u64 type_flags;
> +
> +	/*
> +	 * Here we don't really care about unalignment since extent allocator
> +	 * can handle it.
> +	 * We care more about the size, as if one block group is larger than
> +	 * maximum size, it's must be some obvious corruption
> +	 */
> +	if (key->offset > BTRFS_MAX_DATA_CHUNK_SIZE || key->offset == 0) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid block group size, have %llu expect (0, %llu]",
> +				key->offset, 10ULL * SZ_1G);
> +		return -EUCLEAN;
> +	}
> +
> +	if (item_size != sizeof(bgi)) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid item size, have %u expect %lu",
> +				item_size, sizeof(bgi));
> +		return -EUCLEAN;
> +	}
> +
> +	read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot),
> +			   sizeof(bgi));
> +	if (btrfs_block_group_chunk_objectid(&bgi) !=
> +	    BTRFS_FIRST_CHUNK_TREE_OBJECTID) {
> +		block_group_err(fs_info, leaf, slot,
> +		"invalid block group chunk objectid, have %llu expect %llu",
> +				btrfs_block_group_chunk_objectid(&bgi),
> +				BTRFS_FIRST_CHUNK_TREE_OBJECTID);
> +		return -EUCLEAN;
> +	}
> +
> +	if (btrfs_block_group_used(&bgi) > key->offset) {
> +		block_group_err(fs_info, leaf, slot,
> +			"invalid block group used, have %llu expect [0, %llu)",
> +				btrfs_block_group_used(&bgi), key->offset);
> +		return -EUCLEAN;
> +	}
> +
> +	flags = btrfs_block_group_flags(&bgi);
> +	if (hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1) {
> +		block_group_err(fs_info, leaf, slot,
> +"invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set",
> +			flags & BTRFS_BLOCK_GROUP_PROFILE_MASK,
> +			hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK));
> +		return -EUCLEAN;
> +	}
> +
> +	type_flags = flags & BTRFS_BLOCK_GROUP_TYPE_MASK;
> +	if (type_flags != BTRFS_BLOCK_GROUP_DATA &&
> +	    type_flags != BTRFS_BLOCK_GROUP_METADATA &&
> +	    type_flags != BTRFS_BLOCK_GROUP_SYSTEM &&
> +	    type_flags != (BTRFS_BLOCK_GROUP_METADATA |
> +			   BTRFS_BLOCK_GROUP_DATA)) {
> +		block_group_err(fs_info, leaf, slot,
> +"invalid type flags, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llu or 0x%llx",
> +			type_flags, hweight64(type_flags),
> +			BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA,
> +			BTRFS_BLOCK_GROUP_SYSTEM,
> +			BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA);
> +		return -EUCLEAN;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Common point to switch the item-specific validation.
>   */
> @@ -374,6 +472,9 @@ static int check_leaf_item(struct btrfs_fs_info *fs_info,
>  	case BTRFS_XATTR_ITEM_KEY:
>  		ret = check_dir_item(fs_info, leaf, key, slot);
>  		break;
> +	case BTRFS_BLOCK_GROUP_ITEM_KEY:
> +		ret = check_block_group_item(fs_info, leaf, key, slot);
> +		break;
>  	}
>  	return ret;
>  }
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e034ad9e23b4..b33bf29130b6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4690,7 +4690,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> 
>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>  		max_stripe_size = SZ_1G;
> -		max_chunk_size = 10 * max_stripe_size;
> +		max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE;
>  		if (!devs_max)
>  			devs_max = BTRFS_MAX_DEVS(info);
>  	} else if (type & BTRFS_BLOCK_GROUP_METADATA) {
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 5139ec8daf4c..77e6004b6cb9 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -11,6 +11,8 @@
>  #include <linux/btrfs.h>
>  #include "async-thread.h"
> 
> +#define BTRFS_MAX_DATA_CHUNK_SIZE	(10ULL * SZ_1G)
> +
>  extern struct mutex uuid_mutex;
> 
>  #define BTRFS_STRIPE_LEN	SZ_64K
> --

Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>


> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
  2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
  2018-07-04  3:42   ` Gu, Jinxiang
  2018-07-04  5:56   ` Nikolay Borisov
@ 2018-07-04  7:37   ` Gu, Jinxiang
  2 siblings, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  7:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 2504 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree
> 
> A crafted image has empty root tree block, which will cause later NULL
> pointer dereference.
> 
> The following trees should never be empty:
> 1) Tree root
>    Must contain at least root items for extent tree, device tree and fs
>    tree
> 
> 2) Chunk tree
>    Or we can't even bootstrap.
> 
> 3) Fs tree
>    At least inode item for top level inode (.).
> 
> 4) Device tree
>    Dev extents for chunks
> 
> 5) Extent tree
>    Must has corresponding extent for each chunk.
> 
> If any is empty, we are sure the fs is corrupted and no need to mount
> it.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/tree-checker.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 1cd735b099df..3a307efab46b 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -497,8 +497,20 @@ static int check_leaf(struct btrfs_fs_info *fs_info, struct extent_buffer *leaf,
>  	 * skip this check for relocation trees.
>  	 */
>  	if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) {
> +		u64 owner = btrfs_header_owner(leaf);
>  		struct btrfs_root *check_root;
> 
> +		/* These trees should never be empty */
> +		if (owner == BTRFS_ROOT_TREE_OBJECTID ||
> +		    owner == BTRFS_CHUNK_TREE_OBJECTID ||
> +		    owner == BTRFS_EXTENT_TREE_OBJECTID ||
> +		    owner == BTRFS_DEV_TREE_OBJECTID ||
> +		    owner == BTRFS_FS_TREE_OBJECTID) {
> +			generic_err(fs_info, leaf, 0,
> +			"invalid root, root %llu should never be empty",
> +				    owner);
> +			return -EUCLEAN;
> +		}
>  		key.objectid = btrfs_header_owner(leaf);
>  		key.type = BTRFS_ROOT_ITEM_KEY;
>  		key.offset = (u64)-1;
> --

Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
  2018-07-03  9:10 ` [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized Qu Wenruo
  2018-07-04  5:23   ` Gu, Jinxiang
@ 2018-07-04  7:37   ` Gu, Jinxiang
  1 sibling, 0 replies; 31+ messages in thread
From: Gu, Jinxiang @ 2018-07-04  7:37 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 2607 bytes --]



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
> Sent: Tuesday, July 03, 2018 5:10 PM
> To: linux-btrfs@vger.kernel.org
> Subject: [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized
> 
> Invalid tree reloc tree can cause kernel NULL pointer dereference when
> btrfs does some cleanup for reloc roots.
> 
> It turns out that fs_info->reloc_ctl can be NULL in
> btrfs_recover_relocation() as we allocate relocation control after all
> reloc roots are verified.
> So when we hit out: tag, we haven't call set_reloc_control() thus
> fs_info->reloc_ctl is still NULL.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199833
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/relocation.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 879b76fa881a..be94c65bb4d2 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1321,18 +1321,19 @@ static void __del_reloc_root(struct btrfs_root *root)
>  	struct mapping_node *node = NULL;
>  	struct reloc_control *rc = fs_info->reloc_ctl;
> 
> -	spin_lock(&rc->reloc_root_tree.lock);
> -	rb_node = tree_search(&rc->reloc_root_tree.rb_root,
> -			      root->node->start);
> -	if (rb_node) {
> -		node = rb_entry(rb_node, struct mapping_node, rb_node);
> -		rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
> +	if (rc) {
> +		spin_lock(&rc->reloc_root_tree.lock);
> +		rb_node = tree_search(&rc->reloc_root_tree.rb_root,
> +				      root->node->start);
> +		if (rb_node) {
> +			node = rb_entry(rb_node, struct mapping_node, rb_node);
> +			rb_erase(&node->rb_node, &rc->reloc_root_tree.rb_root);
> +		}
> +		spin_unlock(&rc->reloc_root_tree.lock);
> +		if (!node)
> +			return;
> +		BUG_ON((struct btrfs_root *)node->data != root);
>  	}
> -	spin_unlock(&rc->reloc_root_tree.lock);
> -
> -	if (!node)
> -		return;
> -	BUG_ON((struct btrfs_root *)node->data != root);
> 
>  	spin_lock(&fs_info->trans_lock);
>  	list_del_init(&root->root_list);
> --

Tested-by: Gu Jinxiang <gujx@cn.fujitsu.com>

> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±ý»k~ÏâžØ^n‡r¡ö¦zË\x1aëh™¨è­Ú&£ûàz¿äz¹Þ—ú+€Ê+zf£¢·hšˆ§~†­†Ûiÿÿïêÿ‘êçz_è®\x0fæj:+v‰¨þ)ߣøm

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-04  7:08   ` Nikolay Borisov
@ 2018-07-04  9:46     ` Qu Wenruo
  2018-07-05 23:49       ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-07-04  9:46 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2018年07月04日 15:08, Nikolay Borisov wrote:
> 
> 
> On  3.07.2018 12:10, Qu Wenruo wrote:
>> If a crafted btrfs has missing block group items, it could cause
>> unexpected behavior and breaks our expectation on 1:1
>> chunk<->block group mapping.
>>
>> Although we added block group -> chunk mapping check, we still need
>> chunk -> block group mapping check.
>>
>> This patch will do extra check to ensure each chunk has its
>> corresponding block group.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 82b446f014b9..746095034ca2 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Iterate all chunks and verify each of them has corresponding block group
>> + */
>> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>> +{
>> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>> +	struct extent_map *em;
>> +	struct btrfs_block_group_cache *bg;
>> +	u64 start = 0;
>> +	int ret = 0;
>> +
>> +	while (1) {
>> +		read_lock(&map_tree->map_tree.lock);
>> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
>> +					   (u64)-1 - start);
> len parameter of lookup_extent_mapping eventually ends up in range_end.
> Meaning it will just return -1. Why not use just -1 for len. Looking at
> the rest of the code this seems to be the convention. But then there are
> several places where 1 is passed as well. Hm, in any case a single
> number is simpler than an expression.

I still like to be accurate here, since it's @len, then we should follow
its naming.
Although we have range_end() for correct careless caller, it still
doesn't sound good just passing -1 as @len.

> 
>> +		read_unlock(&map_tree->map_tree.lock);
>> +		if (!em)
>> +			break;
>> +
>> +		bg = btrfs_lookup_block_group(fs_info, em->start);
>> +		if (!bg) {
>> +			btrfs_err_rl(fs_info,
>> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
>> +				     em->start, em->len);
>> +			ret = -ENOENT;
>> +			free_extent_map(em);
>> +			break;
>> +		}
>> +		if (bg->key.objectid != em->start ||
>> +		    bg->key.offset != em->len ||
>> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
>> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
>> +			btrfs_err_rl(fs_info,
>> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
>> +				em->start, em->len,
>> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
>> +				bg->key.objectid, bg->key.offset,
>> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
>> +			ret = -EUCLEAN;
>> +			free_extent_map(em);
>> +			btrfs_put_block_group(bg);
>> +			break;
>> +		}
>> +		start = em->start + em->len;
>> +		free_extent_map(em);
>> +		btrfs_put_block_group(bg);
>> +	}
>> +	return ret;
>> +}
>> +
>>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  {
>>  	struct btrfs_path *path;
>> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  
>>  	btrfs_add_raid_kobjects(info);
>>  	init_global_block_rsv(info);
>> -	ret = 0;
>> +	ret = check_chunk_block_group_mappings(info);
> 
> Rather than doing that can we just get the count of chunks. Then if we
> have as many chunks as BG have been read in and we know the BG -> chunk
> mapping check has passed we can assume that chunks also map to BG
> without going through all chunks.

Nope, just as the checks done in that function, we must ensure not only
the number of bgs/chunks matches, but *each* chunk must have a block
group with the same size, length and type flags.

If we have a block group doesn't match its size/length, it's pretty
possible that the corrupted block group may overlap with other block
groups, causing undefined behavior.
So the same for type flags.

This means the only reliable check is the one used in this and previous
check.
(Check bg->chunk matches, and then check chunk->bg matches, using size +
len + type flags as material)

Thanks,
Qu

> 
>>  error:
>>  	btrfs_free_path(path);
>>  	return ret;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Enhancement for block group/chunk verification
  2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
                   ` (4 preceding siblings ...)
  2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
@ 2018-07-04 13:36 ` David Sterba
  2018-07-05  1:36   ` Qu Wenruo
  5 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-07-04 13:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jul 03, 2018 at 05:10:04PM +0800, Qu Wenruo wrote:
> Can be fetched from github, which is based on v4.18-rc1 tag:
> https://github.com/adam900710/linux/tree/tree_checker_enhance
> 
> Reported by Xu Wen <wen.xu@gatech.edu>, some crafted btrfs image can
> cause unexpected kernel behavior.
> 
> All of them are related to block group and chunk, so this patchset will
> enhance block group and chunk verification, so kernel can detect them
> and error out gracefully (with user friendly error message showing
> what's going wrong)
> 
> Obvious corruption (don't need to cross check with chunk/block group),
> will be addressed by enhanced tree-checker.
> (Most crafted images will be caught by tree-checker)
> 
> More complex corruption will be addressed mostly at
> btrfs_read_block_groups(), doing extra cross reference check for
> chunk<->block group mapping.
> It may cause extra mount time, but compared to the existing time
> consuming block group items search, all added check is done completely
> in memory using rb_tree, so it shouldn't add too much overhead.
> 
> Qu Wenruo (5):
>   btrfs: tree-checker: Verify block_group_item
>   btrfs: tree-checker: Detect invalid empty essential tree
>   btrfs: relocation: Only remove reloc rb_trees if reloc control has
>     been initialized
>   btrfs: Check each block group has corresponding chunk at mount time
>   btrfs: Verify every chunk has corresponding block group at mount time

Patches 1-3 queued, thanks. 4 and 5 have some comments.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Enhancement for block group/chunk verification
  2018-07-04 13:36 ` [PATCH 0/5] Enhancement for block group/chunk verification David Sterba
@ 2018-07-05  1:36   ` Qu Wenruo
  2018-07-05 15:18     ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-07-05  1:36 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1932 bytes --]



On 2018年07月04日 21:36, David Sterba wrote:
> On Tue, Jul 03, 2018 at 05:10:04PM +0800, Qu Wenruo wrote:
>> Can be fetched from github, which is based on v4.18-rc1 tag:
>> https://github.com/adam900710/linux/tree/tree_checker_enhance
>>
>> Reported by Xu Wen <wen.xu@gatech.edu>, some crafted btrfs image can
>> cause unexpected kernel behavior.
>>
>> All of them are related to block group and chunk, so this patchset will
>> enhance block group and chunk verification, so kernel can detect them
>> and error out gracefully (with user friendly error message showing
>> what's going wrong)
>>
>> Obvious corruption (don't need to cross check with chunk/block group),
>> will be addressed by enhanced tree-checker.
>> (Most crafted images will be caught by tree-checker)
>>
>> More complex corruption will be addressed mostly at
>> btrfs_read_block_groups(), doing extra cross reference check for
>> chunk<->block group mapping.
>> It may cause extra mount time, but compared to the existing time
>> consuming block group items search, all added check is done completely
>> in memory using rb_tree, so it shouldn't add too much overhead.
>>
>> Qu Wenruo (5):
>>   btrfs: tree-checker: Verify block_group_item
>>   btrfs: tree-checker: Detect invalid empty essential tree
>>   btrfs: relocation: Only remove reloc rb_trees if reloc control has
>>     been initialized
>>   btrfs: Check each block group has corresponding chunk at mount time
>>   btrfs: Verify every chunk has corresponding block group at mount time
> 
> Patches 1-3 queued, thanks. 4 and 5 have some comments.

Did I miss the comments for 4 and 5?

I only see some discussion on "Link:" tag.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
  2018-07-04  6:09   ` Gu, Jinxiang
  2018-07-04  7:08   ` Nikolay Borisov
@ 2018-07-05 15:18   ` David Sterba
  2018-07-05 23:44     ` Qu Wenruo
  2 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-07-05 15:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
> If a crafted btrfs has missing block group items, it could cause
> unexpected behavior and breaks our expectation on 1:1
> chunk<->block group mapping.
> 
> Although we added block group -> chunk mapping check, we still need
> chunk -> block group mapping check.
> 
> This patch will do extra check to ensure each chunk has its
> corresponding block group.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 82b446f014b9..746095034ca2 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>  	return ret;
>  }
>  
> +/*
> + * Iterate all chunks and verify each of them has corresponding block group
> + */
> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> +	struct extent_map *em;
> +	struct btrfs_block_group_cache *bg;
> +	u64 start = 0;
> +	int ret = 0;
> +
> +	while (1) {
> +		read_lock(&map_tree->map_tree.lock);
> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
> +					   (u64)-1 - start);

This needs a comment.

> +		read_unlock(&map_tree->map_tree.lock);
> +		if (!em)
> +			break;
> +
> +		bg = btrfs_lookup_block_group(fs_info, em->start);
> +		if (!bg) {
> +			btrfs_err_rl(fs_info,
> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
> +				     em->start, em->len);
> +			ret = -ENOENT;

-EUCLEAN ?

> +			free_extent_map(em);
> +			break;
> +		}
> +		if (bg->key.objectid != em->start ||
> +		    bg->key.offset != em->len ||
> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +			btrfs_err_rl(fs_info,

Why is this ratelmited? I'd understand that a fuzzed image will spew a
lot of these errors but for a normal case, it should be ok to print all
the messages.

> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
> +				em->start, em->len,
> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
> +				bg->key.objectid, bg->key.offset,
> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
> +			ret = -EUCLEAN;
> +			free_extent_map(em);
> +			btrfs_put_block_group(bg);
> +			break;
> +		}
> +		start = em->start + em->len;
> +		free_extent_map(em);
> +		btrfs_put_block_group(bg);
> +	}
> +	return ret;
> +}
> +
>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  {
>  	struct btrfs_path *path;
> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>  
>  	btrfs_add_raid_kobjects(info);
>  	init_global_block_rsv(info);
> -	ret = 0;
> +	ret = check_chunk_block_group_mappings(info);
>  error:
>  	btrfs_free_path(path);
>  	return ret;
> -- 
> 2.18.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/5] Enhancement for block group/chunk verification
  2018-07-05  1:36   ` Qu Wenruo
@ 2018-07-05 15:18     ` David Sterba
  0 siblings, 0 replies; 31+ messages in thread
From: David Sterba @ 2018-07-05 15:18 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Jul 05, 2018 at 09:36:47AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月04日 21:36, David Sterba wrote:
> > On Tue, Jul 03, 2018 at 05:10:04PM +0800, Qu Wenruo wrote:
> >> Can be fetched from github, which is based on v4.18-rc1 tag:
> >> https://github.com/adam900710/linux/tree/tree_checker_enhance
> >>
> >> Reported by Xu Wen <wen.xu@gatech.edu>, some crafted btrfs image can
> >> cause unexpected kernel behavior.
> >>
> >> All of them are related to block group and chunk, so this patchset will
> >> enhance block group and chunk verification, so kernel can detect them
> >> and error out gracefully (with user friendly error message showing
> >> what's going wrong)
> >>
> >> Obvious corruption (don't need to cross check with chunk/block group),
> >> will be addressed by enhanced tree-checker.
> >> (Most crafted images will be caught by tree-checker)
> >>
> >> More complex corruption will be addressed mostly at
> >> btrfs_read_block_groups(), doing extra cross reference check for
> >> chunk<->block group mapping.
> >> It may cause extra mount time, but compared to the existing time
> >> consuming block group items search, all added check is done completely
> >> in memory using rb_tree, so it shouldn't add too much overhead.
> >>
> >> Qu Wenruo (5):
> >>   btrfs: tree-checker: Verify block_group_item
> >>   btrfs: tree-checker: Detect invalid empty essential tree
> >>   btrfs: relocation: Only remove reloc rb_trees if reloc control has
> >>     been initialized
> >>   btrfs: Check each block group has corresponding chunk at mount time
> >>   btrfs: Verify every chunk has corresponding block group at mount time
> > 
> > Patches 1-3 queued, thanks. 4 and 5 have some comments.
> 
> Did I miss the comments for 4 and 5?

Gu has a question regarding patch 4 a and 5 now from me too.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
  2018-07-04  5:45   ` Gu, Jinxiang
@ 2018-07-05 23:41     ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-05 23:41 UTC (permalink / raw)
  To: Gu, Jinxiang, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5071 bytes --]



On 2018年07月04日 13:45, Gu, Jinxiang wrote:
> 
> 
>> -----Original Message-----
>> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-owner@vger.kernel.org] On Behalf Of Qu Wenruo
>> Sent: Tuesday, July 03, 2018 5:10 PM
>> To: linux-btrfs@vger.kernel.org
>> Subject: [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time
>>
>> A crafted btrfs with incorrect chunk<->block group mapping, it could leads
>> to a lot of unexpected behavior.
>>
>> Although the crafted image can be catched by block group item checker
>> added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
>> crafted a valid enough block group item which can pass above check but
>> still mismatch with existing chunk, it could cause a lot of undefined
>> behavior.
>>
>> This patch will add extra block group -> chunk mapping check, to ensure
>> we have a completely matching (start, len, flags) chunk for each block
>> group at mount time.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 55 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 53 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 3d9fe58c0080..82b446f014b9 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10003,6 +10003,41 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
>>  	return cache;
>>  }
>>
>> +static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>> +			     u64 flags)
>> +{
>> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>> +	struct extent_map *em;
>> +	int ret;
>> +
>> +	read_lock(&map_tree->map_tree.lock);
>> +	em = lookup_extent_mapping(&map_tree->map_tree, start, len);
>> +	read_unlock(&map_tree->map_tree.lock);
>> +
>> +	if (!em) {
>> +		btrfs_err_rl(fs_info,
>> +	"block group start=%llu len=%llu doesn't have corresponding chunk",
>> +			     start, len);
>> +		ret = -ENOENT;
>> +		goto out;
>> +	}
> 
> This check has been done in find_first_block_group which has been called before
> check_exist_chunk be called.

Oh, yes, find_first_block_group() indeed does this check, so there is no
need for check_exsist_chunk().
> 
>> +	if (em->start != start || em->len != len ||
>> +	    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
>> +	    (flags & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
>> +		btrfs_err_rl(fs_info,
>> +"block group start=%llu len=%llu flags=0x%llx doesn't match with chunk start=%llu len=%llu flags=0x%llx",
>> +			     start, len , flags & BTRFS_BLOCK_GROUP_TYPE_MASK,
>> +			     em->start, em->len, em->map_lookup->type &
>> +			     BTRFS_BLOCK_GROUP_TYPE_MASK);
>> +		ret = -EUCLEAN;
>> +		goto out;
>> +	}
> Should this check also be added to find_first_block_group?

Yep.

Thanks,
Qu

> 
>> +	ret = 0;
>> +out:
>> +	free_extent_map(em);
>> +	return ret;
>> +}
>> +
>>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  {
>>  	struct btrfs_path *path;
>> @@ -10036,6 +10071,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  		need_clear = 1;
>>
>>  	while (1) {
>> +		struct btrfs_block_group_item bg;
>> +		int slot;
>> +
>>  		ret = find_first_block_group(info, path, &key);
>>  		if (ret > 0)
>>  			break;
>> @@ -10043,7 +10081,20 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  			goto error;
>>
>>  		leaf = path->nodes[0];
>> -		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +		slot = path->slots[0];
>> +		btrfs_item_key_to_cpu(leaf, &found_key, slot);
>> +
>> +		read_extent_buffer(leaf, &bg, btrfs_item_ptr_offset(leaf, slot),
>> +				   sizeof(bg));
>> +		/*
>> +		 * Chunk and block group must have 1:1 mapping.
>> +		 * So there must be a chunk for this block group.
>> +		 */
>> +		ret = check_exist_chunk(info, found_key.objectid,
>> +					found_key.offset,
>> +					btrfs_block_group_flags(&bg));
>> +		if (ret < 0)
>> +			goto error;
>>
>>  		cache = btrfs_create_block_group_cache(info, found_key.objectid,
>>  						       found_key.offset);
>> @@ -10068,7 +10119,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  		}
>>
>>  		read_extent_buffer(leaf, &cache->item,
>> -				   btrfs_item_ptr_offset(leaf, path->slots[0]),
>> +				   btrfs_item_ptr_offset(leaf, slot),
>>  				   sizeof(cache->item));
>>  		cache->flags = btrfs_block_group_flags(&cache->item);
>>  		if (!mixed &&
>> --
>> 2.18.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> 
> 
> N嫥叉靣笡y氊b瞂千v豝�)藓{.n�+壏{眓谶�)韰骅w*\x1fjg�\x1e秹殠娸/侁鋤罐枈�2娹櫒璀�&�)摺玜囤\x7f\x1e瓽珴閔�\x0f鎗:+v墾妛鑶佶
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-05 15:18   ` David Sterba
@ 2018-07-05 23:44     ` Qu Wenruo
  2018-07-16 13:16       ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-07-05 23:44 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3844 bytes --]



On 2018年07月05日 23:18, David Sterba wrote:
> On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
>> If a crafted btrfs has missing block group items, it could cause
>> unexpected behavior and breaks our expectation on 1:1
>> chunk<->block group mapping.
>>
>> Although we added block group -> chunk mapping check, we still need
>> chunk -> block group mapping check.
>>
>> This patch will do extra check to ensure each chunk has its
>> corresponding block group.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 82b446f014b9..746095034ca2 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Iterate all chunks and verify each of them has corresponding block group
>> + */
>> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>> +{
>> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>> +	struct extent_map *em;
>> +	struct btrfs_block_group_cache *bg;
>> +	u64 start = 0;
>> +	int ret = 0;
>> +
>> +	while (1) {
>> +		read_lock(&map_tree->map_tree.lock);
>> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
>> +					   (u64)-1 - start);
> 
> This needs a comment.

For the @len part?

> 
>> +		read_unlock(&map_tree->map_tree.lock);
>> +		if (!em)
>> +			break;
>> +
>> +		bg = btrfs_lookup_block_group(fs_info, em->start);
>> +		if (!bg) {
>> +			btrfs_err_rl(fs_info,
>> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
>> +				     em->start, em->len);
>> +			ret = -ENOENT;
> 
> -EUCLEAN ?

Either works for me.

> 
>> +			free_extent_map(em);
>> +			break;
>> +		}
>> +		if (bg->key.objectid != em->start ||
>> +		    bg->key.offset != em->len ||
>> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
>> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
>> +			btrfs_err_rl(fs_info,
> 
> Why is this ratelmited? I'd understand that a fuzzed image will spew a
> lot of these errors but for a normal case, it should be ok to print all
> the messages.

Well, even for fuzzed image it won't trigger twice, the first time it
triggers we will error our, so indeed no need to rate the limit anyway.

Thanks,
Qu

> 
>> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
>> +				em->start, em->len,
>> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
>> +				bg->key.objectid, bg->key.offset,
>> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
>> +			ret = -EUCLEAN;
>> +			free_extent_map(em);
>> +			btrfs_put_block_group(bg);
>> +			break;
>> +		}
>> +		start = em->start + em->len;
>> +		free_extent_map(em);
>> +		btrfs_put_block_group(bg);
>> +	}
>> +	return ret;
>> +}
>> +
>>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  {
>>  	struct btrfs_path *path;
>> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>  
>>  	btrfs_add_raid_kobjects(info);
>>  	init_global_block_rsv(info);
>> -	ret = 0;
>> +	ret = check_chunk_block_group_mappings(info);
>>  error:
>>  	btrfs_free_path(path);
>>  	return ret;
>> -- 
>> 2.18.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-04  9:46     ` Qu Wenruo
@ 2018-07-05 23:49       ` Qu Wenruo
  0 siblings, 0 replies; 31+ messages in thread
From: Qu Wenruo @ 2018-07-05 23:49 UTC (permalink / raw)
  To: Qu Wenruo, Nikolay Borisov, linux-btrfs



On 2018年07月04日 17:46, Qu Wenruo wrote:
> 
> 
> On 2018年07月04日 15:08, Nikolay Borisov wrote:
>>
>>
>> On  3.07.2018 12:10, Qu Wenruo wrote:
>>> If a crafted btrfs has missing block group items, it could cause
>>> unexpected behavior and breaks our expectation on 1:1
>>> chunk<->block group mapping.
>>>
>>> Although we added block group -> chunk mapping check, we still need
>>> chunk -> block group mapping check.
>>>
>>> This patch will do extra check to ensure each chunk has its
>>> corresponding block group.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
>>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 51 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 82b446f014b9..746095034ca2 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * Iterate all chunks and verify each of them has corresponding block group
>>> + */
>>> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>>> +{
>>> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>>> +	struct extent_map *em;
>>> +	struct btrfs_block_group_cache *bg;
>>> +	u64 start = 0;
>>> +	int ret = 0;
>>> +
>>> +	while (1) {
>>> +		read_lock(&map_tree->map_tree.lock);
>>> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
>>> +					   (u64)-1 - start);
>> len parameter of lookup_extent_mapping eventually ends up in range_end.
>> Meaning it will just return -1. Why not use just -1 for len. Looking at
>> the rest of the code this seems to be the convention. But then there are
>> several places where 1 is passed as well. Hm, in any case a single
>> number is simpler than an expression.
> 
> I still like to be accurate here, since it's @len, then we should follow
> its naming.
> Although we have range_end() for correct careless caller, it still
> doesn't sound good just passing -1 as @len.
> 
>>
>>> +		read_unlock(&map_tree->map_tree.lock);
>>> +		if (!em)
>>> +			break;
>>> +
>>> +		bg = btrfs_lookup_block_group(fs_info, em->start);
>>> +		if (!bg) {
>>> +			btrfs_err_rl(fs_info,
>>> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
>>> +				     em->start, em->len);
>>> +			ret = -ENOENT;
>>> +			free_extent_map(em);
>>> +			break;
>>> +		}
>>> +		if (bg->key.objectid != em->start ||
>>> +		    bg->key.offset != em->len ||
>>> +		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
>>> +		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
>>> +			btrfs_err_rl(fs_info,
>>> +"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
>>> +				em->start, em->len,
>>> +				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
>>> +				bg->key.objectid, bg->key.offset,
>>> +				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
>>> +			ret = -EUCLEAN;
>>> +			free_extent_map(em);
>>> +			btrfs_put_block_group(bg);
>>> +			break;
>>> +		}
>>> +		start = em->start + em->len;
>>> +		free_extent_map(em);
>>> +		btrfs_put_block_group(bg);
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>>  int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>>  {
>>>  	struct btrfs_path *path;
>>> @@ -10227,7 +10277,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
>>>  
>>>  	btrfs_add_raid_kobjects(info);
>>>  	init_global_block_rsv(info);
>>> -	ret = 0;
>>> +	ret = check_chunk_block_group_mappings(info);
>>
>> Rather than doing that can we just get the count of chunks. Then if we
>> have as many chunks as BG have been read in and we know the BG -> chunk
>> mapping check has passed we can assume that chunks also map to BG
>> without going through all chunks.
> 
> Nope, just as the checks done in that function, we must ensure not only
> the number of bgs/chunks matches, but *each* chunk must have a block
> group with the same size, length and type flags.

Thanks to Gu's comment, there in find_first_block() we have already done
extra check to ensure every block group we're adding has a corresponding
chunk, thus just doing chunk/bg counting should be able to detect
missing block groups.

I'll try this method to reduce unnecessary block group lookup in next
version.

Thanks,
Qu

> 
> If we have a block group doesn't match its size/length, it's pretty
> possible that the corrupted block group may overlap with other block
> groups, causing undefined behavior.
> So the same for type flags.
> 
> This means the only reliable check is the one used in this and previous
> check.
> (Check bg->chunk matches, and then check chunk->bg matches, using size +
> len + type flags as material)
> 
> Thanks,
> Qu
> 
>>
>>>  error:
>>>  	btrfs_free_path(path);
>>>  	return ret;
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-05 23:44     ` Qu Wenruo
@ 2018-07-16 13:16       ` David Sterba
  2018-07-16 13:57         ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-07-16 13:16 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Jul 06, 2018 at 07:44:37AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月05日 23:18, David Sterba wrote:
> > On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
> >> If a crafted btrfs has missing block group items, it could cause
> >> unexpected behavior and breaks our expectation on 1:1
> >> chunk<->block group mapping.
> >>
> >> Although we added block group -> chunk mapping check, we still need
> >> chunk -> block group mapping check.
> >>
> >> This patch will do extra check to ensure each chunk has its
> >> corresponding block group.
> >>
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> >> Reported-by: Xu Wen <wen.xu@gatech.edu>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 51 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 82b446f014b9..746095034ca2 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
> >>  	return ret;
> >>  }
> >>  
> >> +/*
> >> + * Iterate all chunks and verify each of them has corresponding block group
> >> + */
> >> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> >> +{
> >> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
> >> +	struct extent_map *em;
> >> +	struct btrfs_block_group_cache *bg;
> >> +	u64 start = 0;
> >> +	int ret = 0;
> >> +
> >> +	while (1) {
> >> +		read_lock(&map_tree->map_tree.lock);
> >> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
> >> +					   (u64)-1 - start);
> > 
> > This needs a comment.
> 
> For the @len part?

Yes, for the expression how it's calculated.

> > 
> >> +		read_unlock(&map_tree->map_tree.lock);
> >> +		if (!em)
> >> +			break;
> >> +
> >> +		bg = btrfs_lookup_block_group(fs_info, em->start);
> >> +		if (!bg) {
> >> +			btrfs_err_rl(fs_info,
> >> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
> >> +				     em->start, em->len);
> >> +			ret = -ENOENT;
> > 
> > -EUCLEAN ?
> 
> Either works for me.

That's not just a cosmetic change, there's a semantic difference between
the error codes, I maybe make that more explicit and not expect that this
is obvious.

ENOENT does not make much sense in this context, the caller (mount in
this case) cannot do anything about a code that says 'some internal
structure not found'.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-16 13:16       ` David Sterba
@ 2018-07-16 13:57         ` Qu Wenruo
  2018-07-17 12:33           ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-07-16 13:57 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3685 bytes --]



On 2018年07月16日 21:16, David Sterba wrote:
> On Fri, Jul 06, 2018 at 07:44:37AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018年07月05日 23:18, David Sterba wrote:
>>> On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
>>>> If a crafted btrfs has missing block group items, it could cause
>>>> unexpected behavior and breaks our expectation on 1:1
>>>> chunk<->block group mapping.
>>>>
>>>> Although we added block group -> chunk mapping check, we still need
>>>> chunk -> block group mapping check.
>>>>
>>>> This patch will do extra check to ensure each chunk has its
>>>> corresponding block group.
>>>>
>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
>>>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>>  fs/btrfs/extent-tree.c | 52 +++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 82b446f014b9..746095034ca2 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info *fs_info, u64 start, u64 len,
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Iterate all chunks and verify each of them has corresponding block group
>>>> + */
>>>> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
>>>> +{
>>>> +	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
>>>> +	struct extent_map *em;
>>>> +	struct btrfs_block_group_cache *bg;
>>>> +	u64 start = 0;
>>>> +	int ret = 0;
>>>> +
>>>> +	while (1) {
>>>> +		read_lock(&map_tree->map_tree.lock);
>>>> +		em = lookup_extent_mapping(&map_tree->map_tree, start,
>>>> +					   (u64)-1 - start);
>>>
>>> This needs a comment.
>>
>> For the @len part?
> 
> Yes, for the expression how it's calculated.
> 
>>>
>>>> +		read_unlock(&map_tree->map_tree.lock);
>>>> +		if (!em)
>>>> +			break;
>>>> +
>>>> +		bg = btrfs_lookup_block_group(fs_info, em->start);
>>>> +		if (!bg) {
>>>> +			btrfs_err_rl(fs_info,
>>>> +	"chunk start=%llu len=%llu doesn't have corresponding block group",
>>>> +				     em->start, em->len);
>>>> +			ret = -ENOENT;
>>>
>>> -EUCLEAN ?
>>
>> Either works for me.
> 
> That's not just a cosmetic change, there's a semantic difference between
> the error codes, I maybe make that more explicit and not expect that this
> is obvious.
> 
> ENOENT does not make much sense in this context, the caller (mount in
> this case) cannot do anything about a code that says 'some internal
> structure not found'.

The point here is, if every self-checker should only return -EUCLEAN, it
won't really indicate what's going wrong, except points to some
self-checker (and such self-checkers are growing larger than our
expectation already).

My practice here is, put some human readable and meaningful error
message. No matter what we choose to return, the error message should
tell us what's going wrong.

In this case, I don't really care the return value. If it's explicitly
needed to return -EUCLEAN, I could make all existing checker (from
tree-checker to chunk/bg/dev-extent checker) to return -EUCLEAN if
anything is wrong (and save several "ret = -EUCLEAN" lines).
The return value doesn't really have much meaning nowadays, it's the
error message important now.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-16 13:57         ` Qu Wenruo
@ 2018-07-17 12:33           ` David Sterba
  2018-07-17 13:32             ` Qu Wenruo
  0 siblings, 1 reply; 31+ messages in thread
From: David Sterba @ 2018-07-17 12:33 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Mon, Jul 16, 2018 at 09:57:43PM +0800, Qu Wenruo wrote:
> >>> -EUCLEAN ?
> >>
> >> Either works for me.
> > 
> > That's not just a cosmetic change, there's a semantic difference between
> > the error codes, I maybe make that more explicit and not expect that this
> > is obvious.
> > 
> > ENOENT does not make much sense in this context, the caller (mount in
> > this case) cannot do anything about a code that says 'some internal
> > structure not found'.
> 
> The point here is, if every self-checker should only return -EUCLEAN, it
> won't really indicate what's going wrong, except points to some
> self-checker (and such self-checkers are growing larger than our
> expectation already).
> 
> My practice here is, put some human readable and meaningful error
> message. No matter what we choose to return, the error message should
> tell us what's going wrong.
> 
> In this case, I don't really care the return value. If it's explicitly
> needed to return -EUCLEAN, I could make all existing checker (from
> tree-checker to chunk/bg/dev-extent checker) to return -EUCLEAN if
> anything is wrong (and save several "ret = -EUCLEAN" lines).
> The return value doesn't really have much meaning nowadays, it's the
> error message important now.

Ok, I see what you mean. The message is important as it's otherwise
almost impossible to find where exactly the mount fails.

The error messages perhaps fall into several categories:

1) transient errors, some failure that happens before the filesystem state
   is fully examined

this is namely ENOMEM, or EINTR eg. returned by kthread_run

maybe also a failure on a multi-device filesystem when the devices
haven't been scanned yet

2) clearly some corruption/consistency condtion, with enough information
   available to decide

like a missing tree, most of the tree-checker would fall into this
category

3) same as the previous one, but thre's some extenal condition preventing
   a full check

that's eg. a real EIO after reading a tree block


The error code are IMO important to see how severe the problems are and
what's the expected solution. 2 is for 'check', 3 may need degraded
mount, 1 needs maybe more time to mount again.

With the error messages in place, 2 can be completely covered by
EUCLEAN. I briefly skimmed a few call paths and think that the 3
categories should be enough, but I'm also expecting some exceptions that
can be decided case by case.

The error codes are now not consistent, lots of EUCLEAN are historically
EIO, but before we start cleaning that up we should have at least some
guidelines. Please let me know what you think.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-17 12:33           ` David Sterba
@ 2018-07-17 13:32             ` Qu Wenruo
  2018-07-19 14:22               ` David Sterba
  0 siblings, 1 reply; 31+ messages in thread
From: Qu Wenruo @ 2018-07-17 13:32 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4476 bytes --]



On 2018年07月17日 20:33, David Sterba wrote:
> On Mon, Jul 16, 2018 at 09:57:43PM +0800, Qu Wenruo wrote:
>>>>> -EUCLEAN ?
>>>>
>>>> Either works for me.
>>>
>>> That's not just a cosmetic change, there's a semantic difference between
>>> the error codes, I maybe make that more explicit and not expect that this
>>> is obvious.
>>>
>>> ENOENT does not make much sense in this context, the caller (mount in
>>> this case) cannot do anything about a code that says 'some internal
>>> structure not found'.
>>
>> The point here is, if every self-checker should only return -EUCLEAN, it
>> won't really indicate what's going wrong, except points to some
>> self-checker (and such self-checkers are growing larger than our
>> expectation already).
>>
>> My practice here is, put some human readable and meaningful error
>> message. No matter what we choose to return, the error message should
>> tell us what's going wrong.
>>
>> In this case, I don't really care the return value. If it's explicitly
>> needed to return -EUCLEAN, I could make all existing checker (from
>> tree-checker to chunk/bg/dev-extent checker) to return -EUCLEAN if
>> anything is wrong (and save several "ret = -EUCLEAN" lines).
>> The return value doesn't really have much meaning nowadays, it's the
>> error message important now.
> 
> Ok, I see what you mean. The message is important as it's otherwise
> almost impossible to find where exactly the mount fails.
> 
> The error messages perhaps fall into several categories:
> 
> 1) transient errors, some failure that happens before the filesystem state
>    is fully examined
> 
> this is namely ENOMEM, or EINTR eg. returned by kthread_run

This standard is a little misleading, or did I misunderstand your category?

From the example error number, I could only find ENOMEM so
straightforward for end user/developer that we don't need any error
message to explain them.
Or this category is just for error no need of error message? (or can be
handled by btrfs-progs without any need of user interruption/decision?)

> 
> maybe also a failure on a multi-device filesystem when the devices
> haven't been scanned yet
> 
> 2) clearly some corruption/consistency condtion, with enough information
>    available to decide
> 
> like a missing tree, most of the tree-checker would fall into this
> category

This is pretty clear.

> 
> 3) same as the previous one, but there's some external condition preventing
>    a full check
> 
> that's eg. a real EIO after reading a tree block

That csum mismatch EIO with error message or really some error from
underlying layer like some ATA command failure?

> 
> 
> The error code are IMO important to see how severe the problems are and
> what's the expected solution. 2 is for 'check', 3 may need degraded
> mount, 1 needs maybe more time to mount again.

Category 2 for check is sure.
For other 2 cases it's a little hard to say.
Normally if we really hit some error we don't expect, under most case
the filesystem is already corrupted (e.g. a lot of errors of resuming
balance/mount failure finally turns out to be fs corruption).

If category is determined by the expected solution, most will just fall
into category 2), including most of errors we have in btrfs module
currently.

> 
> With the error messages in place, 2 can be completely covered by
> EUCLEAN. I briefly skimmed a few call paths and think that the 3
> categories should be enough, but I'm also expecting some exceptions that
> can be decided case by case.

For category 2), I think it's pretty clear and practically to use EUCLEAN.

For other categories I'm not really sure.

E.G what happens if we can't find certain backref when running delayed
refs? It's either a kernel bug or a corrupted fs.
Which category should it fit? Category 2? But we don't really know
what's going wrong.
For category 1/3? It won't really be fixed until we fix the bug or the fs.

More details examples would definitely help me understand the category.

> 
> The error codes are now not consistent, lots of EUCLEAN are historically
> EIO, but before we start cleaning that up we should have at least some
> guidelines. Please let me know what you think.
> 
At least for self-verification code it's pretty clear that we should
have error message for what's going wrong and what we expect, with
explicit EUCLEAN error number.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time
  2018-07-17 13:32             ` Qu Wenruo
@ 2018-07-19 14:22               ` David Sterba
  0 siblings, 0 replies; 31+ messages in thread
From: David Sterba @ 2018-07-19 14:22 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Tue, Jul 17, 2018 at 09:32:27PM +0800, Qu Wenruo wrote:
> On 2018年07月17日 20:33, David Sterba wrote:
> > On Mon, Jul 16, 2018 at 09:57:43PM +0800, Qu Wenruo wrote:
> >>>>> -EUCLEAN ?
> >>>>
> >>>> Either works for me.
> >>>
> >>> That's not just a cosmetic change, there's a semantic difference between
> >>> the error codes, I maybe make that more explicit and not expect that this
> >>> is obvious.
> >>>
> >>> ENOENT does not make much sense in this context, the caller (mount in
> >>> this case) cannot do anything about a code that says 'some internal
> >>> structure not found'.
> >>
> >> The point here is, if every self-checker should only return -EUCLEAN, it
> >> won't really indicate what's going wrong, except points to some
> >> self-checker (and such self-checkers are growing larger than our
> >> expectation already).
> >>
> >> My practice here is, put some human readable and meaningful error
> >> message. No matter what we choose to return, the error message should
> >> tell us what's going wrong.
> >>
> >> In this case, I don't really care the return value. If it's explicitly
> >> needed to return -EUCLEAN, I could make all existing checker (from
> >> tree-checker to chunk/bg/dev-extent checker) to return -EUCLEAN if
> >> anything is wrong (and save several "ret = -EUCLEAN" lines).
> >> The return value doesn't really have much meaning nowadays, it's the
> >> error message important now.
> > 
> > Ok, I see what you mean. The message is important as it's otherwise
> > almost impossible to find where exactly the mount fails.
> > 
> > The error messages perhaps fall into several categories:
> > 
> > 1) transient errors, some failure that happens before the filesystem state
> >    is fully examined
> > 
> > this is namely ENOMEM, or EINTR eg. returned by kthread_run
> 
> This standard is a little misleading, or did I misunderstand your category?
> 
> From the example error number, I could only find ENOMEM so
> straightforward for end user/developer that we don't need any error
> message to explain them.
> Or this category is just for error no need of error message? (or can be
> handled by btrfs-progs without any need of user interruption/decision?)

Yes, that's the point that there does not need to be any message. The
error code should be selfexplanatory.

> > maybe also a failure on a multi-device filesystem when the devices
> > haven't been scanned yet
> > 
> > 2) clearly some corruption/consistency condtion, with enough information
> >    available to decide
> > 
> > like a missing tree, most of the tree-checker would fall into this
> > category
> 
> This is pretty clear.
> 
> > 
> > 3) same as the previous one, but there's some external condition preventing
> >    a full check
> > 
> > that's eg. a real EIO after reading a tree block
> 
> That csum mismatch EIO with error message or really some error from
> underlying layer like some ATA command failure?

The idea for 3 is to cover hard errors, like the ATA error or anything
that block layer/driver returns. The checksum mismatch is a soft error
that still can be considered an EIO ("I can't give you the data"). So
there could be another category with the soft errors like csum
mismatches or generation mismatches etc.

> > The error code are IMO important to see how severe the problems are and
> > what's the expected solution. 2 is for 'check', 3 may need degraded
> > mount, 1 needs maybe more time to mount again.
> 
> Category 2 for check is sure.
> For other 2 cases it's a little hard to say.
> Normally if we really hit some error we don't expect, under most case
> the filesystem is already corrupted (e.g. a lot of errors of resuming
> balance/mount failure finally turns out to be fs corruption).
> 
> If category is determined by the expected solution, most will just fall
> into category 2), including most of errors we have in btrfs module
> currently.

Agreed that most things fall to 2 and we can't do much about it other
than try 'check' detect the scope of damage.

> > With the error messages in place, 2 can be completely covered by
> > EUCLEAN. I briefly skimmed a few call paths and think that the 3
> > categories should be enough, but I'm also expecting some exceptions that
> > can be decided case by case.
> 
> For category 2), I think it's pretty clear and practically to use EUCLEAN.
> 
> For other categories I'm not really sure.
> 
> E.G what happens if we can't find certain backref when running delayed
> refs? It's either a kernel bug or a corrupted fs.
> Which category should it fit? Category 2? But we don't really know
> what's going wrong.

If some critical piece of data is missing, like the backref, then it's a
structural bug and it's for 2. The reason why it happened we may not
known at the moment, but it is still a detected corruption. If it turns
out it's a bug, then it'll get fixed and the corruption will not be
detected for the 'bug' reason, but can be for a corruption or memory
bitflip error.

> For category 1/3? It won't really be fixed until we fix the bug or the fs.
> 
> More details examples would definitely help me understand the category.
> 
> > The error codes are now not consistent, lots of EUCLEAN are historically
> > EIO, but before we start cleaning that up we should have at least some
> > guidelines. Please let me know what you think.
> > 
> At least for self-verification code it's pretty clear that we should
> have error message for what's going wrong and what we expect, with
> explicit EUCLEAN error number.

Agreed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-07-19 15:05 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03  9:10 [PATCH 0/5] Enhancement for block group/chunk verification Qu Wenruo
2018-07-03  9:10 ` [PATCH 1/5] btrfs: tree-checker: Verify block_group_item Qu Wenruo
2018-07-04  2:20   ` Gu, Jinxiang
2018-07-04  5:54   ` Nikolay Borisov
2018-07-04  7:37   ` Gu, Jinxiang
2018-07-03  9:10 ` [PATCH 2/5] btrfs: tree-checker: Detect invalid empty essential tree Qu Wenruo
2018-07-04  3:42   ` Gu, Jinxiang
2018-07-04  5:56   ` Nikolay Borisov
2018-07-04  7:37   ` Gu, Jinxiang
2018-07-03  9:10 ` [PATCH 3/5] btrfs: relocation: Only remove reloc rb_trees if reloc control has been initialized Qu Wenruo
2018-07-04  5:23   ` Gu, Jinxiang
2018-07-04  7:37   ` Gu, Jinxiang
2018-07-03  9:10 ` [PATCH 4/5] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
2018-07-04  5:45   ` Gu, Jinxiang
2018-07-05 23:41     ` Qu Wenruo
2018-07-04  6:02   ` Nikolay Borisov
2018-07-03  9:10 ` [PATCH 5/5] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
2018-07-04  6:09   ` Gu, Jinxiang
2018-07-04  7:08   ` Nikolay Borisov
2018-07-04  9:46     ` Qu Wenruo
2018-07-05 23:49       ` Qu Wenruo
2018-07-05 15:18   ` David Sterba
2018-07-05 23:44     ` Qu Wenruo
2018-07-16 13:16       ` David Sterba
2018-07-16 13:57         ` Qu Wenruo
2018-07-17 12:33           ` David Sterba
2018-07-17 13:32             ` Qu Wenruo
2018-07-19 14:22               ` David Sterba
2018-07-04 13:36 ` [PATCH 0/5] Enhancement for block group/chunk verification David Sterba
2018-07-05  1:36   ` Qu Wenruo
2018-07-05 15:18     ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.