linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images
@ 2018-08-01  2:37 Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

The branch can be fetched from the following git repo:
https://github.com/adam900710/linux/tree/tree_checker_enhance

It's based on v4.18-rc1, with 3 patches already merged into misc-next.

This patchset introduced the following enhanced validation check:
1) chunk/block group/dev extent cross check
   Unlike extent tree, such cross check can be implemented pretty easy
   with minimal mount time impact.
   Now the kernel could do chunk/bg/dev extent check as good as btrfs
   check.

2) Locking test to avoid possible deadlock due to extent tree corruption
   Unfortunately, for extent tree we can't do really much cross check.
   Instead we use the selftest from btrfs_tree_lock() to detect and
   avoid deadlock caused by corrupted extent tree.

The 3rd patch "btrfs: Remove unused function btrfs_account_dev_extents_size()"
has also been merged into misc-next.

changelog:
v2:
  Added reviewed-by tags from Gu and Nikolay.
  Address comment from David for the 4th patch
  Address comment from Gu for the 2nd patch.

Qu Wenruo (6):
  btrfs: Check each block group has corresponding chunk at mount time
  btrfs: Verify every chunk has corresponding block group at mount time
  btrfs: Remove unused function btrfs_account_dev_extents_size()
  btrfs: Introduce mount time chunk <-> dev extent mapping check
  btrfs: Exit gracefully when failed to add chunk map
  btrfs: locking: Allow btrfs_tree_lock() to return error to avoid
    deadlock

 fs/btrfs/ctree.c           |  57 ++++++--
 fs/btrfs/disk-io.c         |   7 +
 fs/btrfs/extent-tree.c     | 114 +++++++++++++--
 fs/btrfs/extent_io.c       |   8 +-
 fs/btrfs/free-space-tree.c |   4 +-
 fs/btrfs/locking.c         |  13 +-
 fs/btrfs/locking.h         |   2 +-
 fs/btrfs/qgroup.c          |   4 +-
 fs/btrfs/relocation.c      |  13 +-
 fs/btrfs/tree-log.c        |  14 +-
 fs/btrfs/volumes.c         | 276 +++++++++++++++++++++++++------------
 fs/btrfs/volumes.h         |   4 +-
 12 files changed, 397 insertions(+), 119 deletions(-)

-- 
2.18.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  2:54   ` Su Yue
  2018-08-01  2:37 ` [PATCH v2 2/6] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

A crafted btrfs with incorrect chunk<->block group mapping, it could leads
to a lot of unexpected behavior.

Although the crafted image can be catched by block group item checker
added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
crafted a valid enough block group item which can pass above check but
still mismatch with existing chunk, it could cause a lot of undefined
behavior.

This patch will add extra block group -> chunk mapping check, to ensure
we have a completely matching (start, len, flags) chunk for each block
group at mount time.

Here we reuse the original find_first_block_group(), which is already
doing basic bg -> chunk check, adding more check on start/len and type
flags.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58c0080..63a6b5d36ac1 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9717,6 +9717,8 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 	struct btrfs_key found_key;
 	struct extent_buffer *leaf;
+	struct btrfs_block_group_item bg;
+	u64 flags;
 	int slot;
 
 	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
@@ -9751,8 +9753,33 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,
 			"logical %llu len %llu found bg but no related chunk",
 					  found_key.objectid, found_key.offset);
 				ret = -ENOENT;
+			} else if (em->start != found_key.objectid ||
+				   em->len != found_key.offset) {
+				btrfs_err(fs_info,
+		"block group %llu len %llu mismatch with chunk %llu len %llu",
+					  found_key.objectid, found_key.offset,
+					  em->start, em->len);
+				ret = -EUCLEAN;
 			} else {
-				ret = 0;
+				read_extent_buffer(leaf, &bg,
+					btrfs_item_ptr_offset(leaf, slot),
+					sizeof(bg));
+				flags = btrfs_block_group_flags(&bg) &
+					BTRFS_BLOCK_GROUP_TYPE_MASK;
+
+				if (flags != (em->map_lookup->type &
+					      BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+					btrfs_err(fs_info,
+"block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
+						found_key.objectid,
+						found_key.offset,
+						flags,
+						(BTRFS_BLOCK_GROUP_TYPE_MASK &
+						 em->map_lookup->type));
+					ret = -EUCLEAN;
+				} else {
+					ret = 0;
+				}
 			}
 			free_extent_map(em);
 			goto out;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/6] btrfs: Verify every chunk has corresponding block group at mount time
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 3/6] btrfs: Remove unused function btrfs_account_dev_extents_size() Qu Wenruo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

If a crafted btrfs has missing block group items, it could cause
unexpected behavior and breaks our expectation on 1:1
chunk<->block group mapping.

Although we added block group -> chunk mapping check, we still need
chunk -> block group mapping check.

This patch will do extra check to ensure each chunk has its
corresponding block group.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Gu Jinxiang <gujx@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c | 57 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 63a6b5d36ac1..3578fa5b30ef 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -10030,6 +10030,61 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info,
 	return cache;
 }
 
+
+/*
+ * Iterate all chunks and verify each of them has corresponding block group
+ */
+static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
+	struct extent_map *em;
+	struct btrfs_block_group_cache *bg;
+	u64 start = 0;
+	int ret = 0;
+
+	while (1) {
+		read_lock(&map_tree->map_tree.lock);
+		/*
+		 * lookup_extent_mapping will return the first extent map
+		 * intersects the range, so set @len to 1 is enough to get
+		 * the first chunk.
+		 */
+		em = lookup_extent_mapping(&map_tree->map_tree, start, 1);
+		read_unlock(&map_tree->map_tree.lock);
+		if (!em)
+			break;
+
+		bg = btrfs_lookup_block_group(fs_info, em->start);
+		if (!bg) {
+			btrfs_err(fs_info,
+	"chunk start=%llu len=%llu doesn't have corresponding block group",
+				     em->start, em->len);
+			ret = -EUCLEAN;
+			free_extent_map(em);
+			break;
+		}
+		if (bg->key.objectid != em->start ||
+		    bg->key.offset != em->len ||
+		    (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) !=
+		    (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+			btrfs_err(fs_info,
+"chunk start=%llu len=%llu flags=0x%llx doesn't match with block group start=%llu len=%llu flags=0x%llx",
+				em->start, em->len,
+				em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK,
+				bg->key.objectid, bg->key.offset,
+				bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK);
+			ret = -EUCLEAN;
+			free_extent_map(em);
+			btrfs_put_block_group(bg);
+			break;
+		}
+		start = em->start + em->len;
+		free_extent_map(em);
+		btrfs_put_block_group(bg);
+	}
+	return ret;
+}
+
 int btrfs_read_block_groups(struct btrfs_fs_info *info)
 {
 	struct btrfs_path *path;
@@ -10203,7 +10258,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 
 	btrfs_add_raid_kobjects(info);
 	init_global_block_rsv(info);
-	ret = 0;
+	ret = check_chunk_block_group_mappings(info);
 error:
 	btrfs_free_path(path);
 	return ret;
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/6] btrfs: Remove unused function btrfs_account_dev_extents_size()
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 2/6] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check Qu Wenruo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

This function is never used by any one in kernel, just remove it.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 85 ----------------------------------------------
 fs/btrfs/volumes.h |  2 --
 2 files changed, 87 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b33bf29130b6..e6a8e4aabc66 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1259,91 +1259,6 @@ int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
 	return ret;
 }
 
-/* helper to account the used device space in the range */
-int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
-				   u64 end, u64 *length)
-{
-	struct btrfs_key key;
-	struct btrfs_root *root = device->fs_info->dev_root;
-	struct btrfs_dev_extent *dev_extent;
-	struct btrfs_path *path;
-	u64 extent_end;
-	int ret;
-	int slot;
-	struct extent_buffer *l;
-
-	*length = 0;
-
-	if (start >= device->total_bytes ||
-		test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state))
-		return 0;
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-	path->reada = READA_FORWARD;
-
-	key.objectid = device->devid;
-	key.offset = start;
-	key.type = BTRFS_DEV_EXTENT_KEY;
-
-	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-	if (ret < 0)
-		goto out;
-	if (ret > 0) {
-		ret = btrfs_previous_item(root, path, key.objectid, key.type);
-		if (ret < 0)
-			goto out;
-	}
-
-	while (1) {
-		l = path->nodes[0];
-		slot = path->slots[0];
-		if (slot >= btrfs_header_nritems(l)) {
-			ret = btrfs_next_leaf(root, path);
-			if (ret == 0)
-				continue;
-			if (ret < 0)
-				goto out;
-
-			break;
-		}
-		btrfs_item_key_to_cpu(l, &key, slot);
-
-		if (key.objectid < device->devid)
-			goto next;
-
-		if (key.objectid > device->devid)
-			break;
-
-		if (key.type != BTRFS_DEV_EXTENT_KEY)
-			goto next;
-
-		dev_extent = btrfs_item_ptr(l, slot, struct btrfs_dev_extent);
-		extent_end = key.offset + btrfs_dev_extent_length(l,
-								  dev_extent);
-		if (key.offset <= start && extent_end > end) {
-			*length = end - start + 1;
-			break;
-		} else if (key.offset <= start && extent_end > start)
-			*length += extent_end - start;
-		else if (key.offset > start && extent_end <= end)
-			*length += extent_end - key.offset;
-		else if (key.offset > start && key.offset <= end) {
-			*length += end - key.offset + 1;
-			break;
-		} else if (key.offset > end)
-			break;
-
-next:
-		path->slots[0]++;
-	}
-	ret = 0;
-out:
-	btrfs_free_path(path);
-	return ret;
-}
-
 static int contains_pending_extent(struct btrfs_transaction *transaction,
 				   struct btrfs_device *device,
 				   u64 *start, u64 len)
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 77e6004b6cb9..6d4f38ad9f5c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -384,8 +384,6 @@ static inline enum btrfs_map_op btrfs_op(struct bio *bio)
 	}
 }
 
-int btrfs_account_dev_extents_size(struct btrfs_device *device, u64 start,
-				   u64 end, u64 *length);
 void btrfs_get_bbio(struct btrfs_bio *bbio);
 void btrfs_put_bbio(struct btrfs_bio *bbio);
 int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
                   ` (2 preceding siblings ...)
  2018-08-01  2:37 ` [PATCH v2 3/6] btrfs: Remove unused function btrfs_account_dev_extents_size() Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  3:18   ` Su Yue
  2019-01-14 11:09   ` Filipe Manana
  2018-08-01  2:37 ` [PATCH v2 5/6] btrfs: Exit gracefully when failed to add chunk map Qu Wenruo
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

This patch will introduce chunk <-> dev extent mapping check, to protect
us against invalid dev extents or chunks.

Since chunk mapping is the fundamental infrastructure of btrfs, extra
check at mount time could prevent a lot of unexpected behavior (BUG_ON).

Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c |   7 ++
 fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |   2 +
 3 files changed, 192 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 205092dc9390..068ca7498e94 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3075,6 +3075,13 @@ int open_ctree(struct super_block *sb,
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
 
+	ret = btrfs_verify_dev_extents(fs_info);
+	if (ret) {
+		btrfs_err(fs_info,
+			  "failed to verify dev extents against chunks: %d",
+			  ret);
+		goto fail_block_groups;
+	}
 	ret = btrfs_recover_balance(fs_info);
 	if (ret) {
 		btrfs_err(fs_info, "failed to recover balance: %d", ret);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e6a8e4aabc66..467a589854fa 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6440,6 +6440,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 	map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
 	map->type = btrfs_chunk_type(leaf, chunk);
 	map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
+	map->verified_stripes = 0;
 	for (i = 0; i < num_stripes; i++) {
 		map->stripes[i].physical =
 			btrfs_stripe_offset_nr(leaf, chunk, i);
@@ -7295,3 +7296,185 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
 		fs_devices = fs_devices->seed;
 	}
 }
+
+static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
+{
+	int index = btrfs_bg_flags_to_raid_index(type);
+	int ncopies = btrfs_raid_array[index].ncopies;
+	int data_stripes;
+
+	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case BTRFS_BLOCK_GROUP_RAID5:
+		data_stripes = num_stripes - 1;
+		break;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		data_stripes = num_stripes - 2;
+		break;
+	default:
+		data_stripes = num_stripes / ncopies;
+		break;
+	}
+	return div_u64(chunk_len, data_stripes);
+}
+static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
+				 u64 chunk_offset, u64 devid,
+				 u64 physical_offset, u64 physical_len)
+{
+	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
+	struct extent_map *em;
+	struct map_lookup *map;
+	u64 stripe_len;
+	bool found = false;
+	int ret = 0;
+	int i;
+
+	read_lock(&em_tree->lock);
+	em = lookup_extent_mapping(em_tree, chunk_offset, 1);
+	read_unlock(&em_tree->lock);
+
+	if (!em) {
+		ret = -EUCLEAN;
+		btrfs_err(fs_info,
+		"dev extent (%llu, %llu) doesn't have corresponding chunk",
+			  devid, physical_offset);
+		goto out;
+	}
+
+	map = em->map_lookup;
+	stripe_len = calc_stripe_length(map->type, em->len, map->num_stripes);
+	if (physical_len != stripe_len) {
+		btrfs_err(fs_info,
+"dev extent (%llu, %llu) length doesn't match with chunk %llu, have %llu expect %llu",
+			  devid, physical_offset, em->start, physical_len,
+			  stripe_len);
+		ret = -EUCLEAN;
+		goto out;
+	}
+
+	for (i = 0; i < map->num_stripes; i++) {
+		if (map->stripes[i].dev->devid == devid &&
+		    map->stripes[i].physical == physical_offset) {
+			found = true;
+			if (map->verified_stripes >= map->num_stripes) {
+				btrfs_err(fs_info,
+			"too many dev extent for chunk %llu is detected",
+					  em->start);
+				ret = -EUCLEAN;
+				goto out;
+			}
+			map->verified_stripes++;
+			break;
+		}
+	}
+	if (!found) {
+		ret = -EUCLEAN;
+		btrfs_err(fs_info,
+			"dev extent (%llu, %llu) has no corresponding chunk",
+			devid, physical_offset);
+	}
+out:
+	free_extent_map(em);
+	return ret;
+}
+
+static int verify_chunk_dev_extent_mapping(struct btrfs_fs_info *fs_info)
+{
+	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
+	struct extent_map *em;
+	struct rb_node *node;
+	int ret = 0;
+
+	read_lock(&em_tree->lock);
+	for (node = rb_first(&em_tree->map); node; node = rb_next(node)) {
+		em = rb_entry(node, struct extent_map, rb_node);
+		if (em->map_lookup->num_stripes !=
+		    em->map_lookup->verified_stripes) {
+			btrfs_err(fs_info,
+			"chunk %llu has missing dev extent, have %d expect %d",
+				  em->start, em->map_lookup->verified_stripes,
+				  em->map_lookup->num_stripes);
+			ret = -EUCLEAN;
+			goto out;
+		}
+	}
+out:
+	read_unlock(&em_tree->lock);
+	return ret;
+}
+
+/*
+ * Ensure all dev extents are mapped to correct chunk.
+ * Or later chunk allocation/free would cause unexpected behavior.
+ *
+ * NOTE: This will iterate through the whole device tree, which should be
+ * at the same size level of chunk tree.
+ * This would increase mount time by a tiny fraction.
+ */
+int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_path *path;
+	struct btrfs_root *root = fs_info->dev_root;
+	struct btrfs_key key;
+	int ret = 0;
+
+	key.objectid = 1;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	path->reada = READA_FORWARD;
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+		ret = btrfs_next_item(root, path);
+		if (ret < 0)
+			goto out;
+		/* No dev extents at all? Not good */
+		if (ret > 0) {
+			ret = -EUCLEAN;
+			goto out;
+		}
+	}
+	while (1) {
+		struct extent_buffer *leaf = path->nodes[0];
+		struct btrfs_dev_extent *dext;
+		int slot = path->slots[0];
+		u64 chunk_offset;
+		u64 physical_offset;
+		u64 physical_len;
+		u64 devid;
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.type != BTRFS_DEV_EXTENT_KEY)
+			break;
+		devid = key.objectid;
+		physical_offset = key.offset;
+
+		dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
+		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
+		physical_len = btrfs_dev_extent_length(leaf, dext);
+
+		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
+					    physical_offset, physical_len);
+		if (ret < 0)
+			goto out;
+		ret = btrfs_next_item(root, path);
+		if (ret < 0)
+			goto out;
+		if (ret > 0) {
+			ret = 0;
+			break;
+		}
+	}
+
+	/* Ensure all chunks have corresponding dev extents */
+	ret = verify_chunk_dev_extent_mapping(fs_info);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 6d4f38ad9f5c..4301bf2d0534 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -345,6 +345,7 @@ struct map_lookup {
 	u64 stripe_len;
 	int num_stripes;
 	int sub_stripes;
+	int verified_stripes; /* For mount time dev extent verification */
 	struct btrfs_bio_stripe stripes[];
 };
 
@@ -559,5 +560,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
 					struct btrfs_device *failing_dev);
+int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
 
 #endif
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/6] btrfs: Exit gracefully when failed to add chunk map
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
                   ` (3 preceding siblings ...)
  2018-08-01  2:37 ` [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  2:37 ` [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock Qu Wenruo
  2018-08-02 16:40 ` [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images David Sterba
  6 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

It's completely possible that a crafted btrfs image contains overlapping
chunks.

Although we can't detect such problem by tree-checker, but it's not a
catastrophic problem, current extent map can already detect such problem
and return -EEXIST.

We just only need to exit gracefully so btrfs can refuse to mount the
fs.

Reported-by: Xu Wen <wen.xu@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 467a589854fa..8c281c1e7f36 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6477,10 +6477,14 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
 	write_lock(&map_tree->map_tree.lock);
 	ret = add_extent_mapping(&map_tree->map_tree, em, 0);
 	write_unlock(&map_tree->map_tree.lock);
-	BUG_ON(ret); /* Tree corruption */
+	if (ret < 0) {
+		btrfs_err(fs_info,
+			  "failed to add chunk map, start=%llu len=%llu: %d",
+			  em->start, em->len, ret);
+	}
 	free_extent_map(em);
 
-	return 0;
+	return ret;
 }
 
 static void fill_device_from_item(struct extent_buffer *leaf,
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
                   ` (4 preceding siblings ...)
  2018-08-01  2:37 ` [PATCH v2 5/6] btrfs: Exit gracefully when failed to add chunk map Qu Wenruo
@ 2018-08-01  2:37 ` Qu Wenruo
  2018-08-01  2:55   ` Su Yue
  2018-08-02 16:40 ` [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images David Sterba
  6 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2018-08-01  2:37 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
For certains crafted image, whose csum root leaf has missing backref, if
we try to trigger write with data csum, it could cause deadlock with the
following kernel WARN_ON():
------
WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
Workqueue: btrfs-endio-write btrfs_endio_write_helper
RIP: 0010:btrfs_tree_lock+0x3e2/0x400
Call Trace:
 btrfs_alloc_tree_block+0x39f/0x770
 __btrfs_cow_block+0x285/0x9e0
 btrfs_cow_block+0x191/0x2e0
 btrfs_search_slot+0x492/0x1160
 btrfs_lookup_csum+0xec/0x280
 btrfs_csum_file_blocks+0x2be/0xa60
 add_pending_csums+0xaf/0xf0
 btrfs_finish_ordered_io+0x74b/0xc90
 finish_ordered_fn+0x15/0x20
 normal_work_helper+0xf6/0x500
 btrfs_endio_write_helper+0x12/0x20
 process_one_work+0x302/0x770
 worker_thread+0x81/0x6d0
 kthread+0x180/0x1d0
 ret_from_fork+0x35/0x40
---[ end trace 2e85051acb5f6dc1 ]---
------

[CAUSE]
That crafted image has missing backref for csum tree root leaf.
And when we try to allocate new tree block, since there is no
EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
and use it.

However we have already locked csum tree root leaf by ourselves, thus we
will ourselves to release read/write lock, and causing deadlock.

[FIX]
This patch will allow btrfs_tree_lock() to return error number, so
most callers can exit gracefully.
(Some caller doesn't really expect btrfs_tree_lock() to return error,
and in that case, we use BUG_ON())

Since such problem can only happen in crafted image, we will still
trigger kernel warning, but with a little more meaningful warning
message.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c           | 57 +++++++++++++++++++++++++++++++-------
 fs/btrfs/extent-tree.c     | 28 +++++++++++++++----
 fs/btrfs/extent_io.c       |  8 ++++--
 fs/btrfs/free-space-tree.c |  4 ++-
 fs/btrfs/locking.c         | 13 +++++++--
 fs/btrfs/locking.h         |  2 +-
 fs/btrfs/qgroup.c          |  4 ++-
 fs/btrfs/relocation.c      | 13 +++++++--
 fs/btrfs/tree-log.c        | 14 ++++++++--
 9 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4bc326df472e..6e695f4cd931 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -161,10 +161,12 @@ struct extent_buffer *btrfs_root_node(struct btrfs_root *root)
 struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root)
 {
 	struct extent_buffer *eb;
+	int ret;
 
 	while (1) {
 		eb = btrfs_root_node(root);
-		btrfs_tree_lock(eb);
+		ret = btrfs_tree_lock(eb);
+		BUG_ON(ret < 0);
 		if (eb == root->node)
 			break;
 		btrfs_tree_unlock(eb);
@@ -1584,6 +1586,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 	for (i = start_slot; i <= end_slot; i++) {
 		struct btrfs_key first_key;
 		int close = 1;
+		int ret;
 
 		btrfs_node_key(parent, &disk_key, i);
 		if (!progress_passed && comp_keys(&disk_key, progress) < 0)
@@ -1637,7 +1640,11 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 		if (search_start == 0)
 			search_start = last_block;
 
-		btrfs_tree_lock(cur);
+		ret = btrfs_tree_lock(cur);
+		if (ret < 0) {
+			free_extent_buffer(cur);
+			return ret;
+		}
 		btrfs_set_lock_blocking(cur);
 		err = __btrfs_cow_block(trans, root, cur, parent, i,
 					&cur, search_start,
@@ -1853,7 +1860,11 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 			goto enospc;
 		}
 
-		btrfs_tree_lock(child);
+		ret = btrfs_tree_lock(child);
+		if (ret < 0) {
+			free_extent_buffer(child);
+			goto enospc;
+		}
 		btrfs_set_lock_blocking(child);
 		ret = btrfs_cow_block(trans, root, child, mid, 0, &child);
 		if (ret) {
@@ -1891,7 +1902,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		left = NULL;
 
 	if (left) {
-		btrfs_tree_lock(left);
+		ret = btrfs_tree_lock(left);
+		if (ret < 0) {
+			free_extent_buffer(left);
+			left = NULL;
+			goto enospc;
+		}
 		btrfs_set_lock_blocking(left);
 		wret = btrfs_cow_block(trans, root, left,
 				       parent, pslot - 1, &left);
@@ -1906,7 +1922,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
 		right = NULL;
 
 	if (right) {
-		btrfs_tree_lock(right);
+		ret = btrfs_tree_lock(right);
+		if (ret < 0) {
+			free_extent_buffer(right);
+			right = NULL;
+			goto enospc;
+		}
 		btrfs_set_lock_blocking(right);
 		wret = btrfs_cow_block(trans, root, right,
 				       parent, pslot + 1, &right);
@@ -2069,7 +2090,11 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 	if (left) {
 		u32 left_nr;
 
-		btrfs_tree_lock(left);
+		ret = btrfs_tree_lock(left);
+		if (ret < 0) {
+			free_extent_buffer(left);
+			return ret;
+		}
 		btrfs_set_lock_blocking(left);
 
 		left_nr = btrfs_header_nritems(left);
@@ -2124,7 +2149,11 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
 	if (right) {
 		u32 right_nr;
 
-		btrfs_tree_lock(right);
+		ret = btrfs_tree_lock(right);
+		if (ret < 0) {
+			free_extent_buffer(right);
+			return ret;
+		}
 		btrfs_set_lock_blocking(right);
 
 		right_nr = btrfs_header_nritems(right);
@@ -2874,7 +2903,9 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 					err = btrfs_try_tree_write_lock(b);
 					if (!err) {
 						btrfs_set_path_blocking(p);
-						btrfs_tree_lock(b);
+						ret = btrfs_tree_lock(b);
+						if (ret < 0)
+							goto done;
 						btrfs_clear_path_blocking(p, b,
 								  BTRFS_WRITE_LOCK);
 					}
@@ -3773,7 +3804,9 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (IS_ERR(right))
 		return 1;
 
-	btrfs_tree_lock(right);
+	ret = btrfs_tree_lock(right);
+	if (ret < 0)
+		goto out_free;
 	btrfs_set_lock_blocking(right);
 
 	free_space = btrfs_leaf_free_space(fs_info, right);
@@ -3811,6 +3844,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 				right, free_space, left_nritems, min_slot);
 out_unlock:
 	btrfs_tree_unlock(right);
+out_free:
 	free_extent_buffer(right);
 	return 1;
 }
@@ -4007,7 +4041,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (IS_ERR(left))
 		return 1;
 
-	btrfs_tree_lock(left);
+	ret = btrfs_tree_lock(left);
+	if (ret < 0)
+		goto out_free;
 	btrfs_set_lock_blocking(left);
 
 	free_space = btrfs_leaf_free_space(fs_info, left);
@@ -4037,6 +4073,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 			       max_slot);
 out:
 	btrfs_tree_unlock(left);
+out_free:
 	free_extent_buffer(left);
 	return ret;
 }
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3578fa5b30ef..da615ebc072e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8297,14 +8297,19 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct extent_buffer *buf;
+	int ret;
 
 	buf = btrfs_find_create_tree_block(fs_info, bytenr);
 	if (IS_ERR(buf))
 		return buf;
+	ret = btrfs_tree_lock(buf);
+	if (ret < 0) {
+		free_extent_buffer(buf);
+		return ERR_PTR(ret);
+	}
 
 	btrfs_set_header_generation(buf, trans->transid);
 	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
-	btrfs_tree_lock(buf);
 	clean_tree_block(fs_info, buf);
 	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
 
@@ -8721,7 +8726,9 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
 					       level - 1);
 		reada = 1;
 	}
-	btrfs_tree_lock(next);
+	ret = btrfs_tree_lock(next);
+	if (ret < 0)
+		goto out_free;
 	btrfs_set_lock_blocking(next);
 
 	ret = btrfs_lookup_extent_info(trans, fs_info, bytenr, level - 1, 1,
@@ -8781,7 +8788,9 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
 			free_extent_buffer(next);
 			return -EIO;
 		}
-		btrfs_tree_lock(next);
+		ret = btrfs_tree_lock(next);
+		if (ret < 0)
+			goto out_free;
 		btrfs_set_lock_blocking(next);
 	}
 
@@ -8839,6 +8848,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
 
 out_unlock:
 	btrfs_tree_unlock(next);
+out_free:
 	free_extent_buffer(next);
 
 	return ret;
@@ -8887,7 +8897,8 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 		 */
 		if (!path->locks[level]) {
 			BUG_ON(level == 0);
-			btrfs_tree_lock(eb);
+			ret = btrfs_tree_lock(eb);
+			BUG_ON(ret < 0);
 			btrfs_set_lock_blocking(eb);
 			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
 
@@ -8929,7 +8940,8 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 		/* make block locked assertion in clean_tree_block happy */
 		if (!path->locks[level] &&
 		    btrfs_header_generation(eb) == trans->transid) {
-			btrfs_tree_lock(eb);
+			ret = btrfs_tree_lock(eb);
+			BUG_ON(ret < 0);
 			btrfs_set_lock_blocking(eb);
 			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
 		}
@@ -9107,7 +9119,11 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
 
 		level = btrfs_header_level(root->node);
 		while (1) {
-			btrfs_tree_lock(path->nodes[level]);
+			ret = btrfs_tree_lock(path->nodes[level]);
+			if (ret < 0) {
+				err = ret;
+				goto out_end_trans;
+			}
 			btrfs_set_lock_blocking(path->nodes[level]);
 			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index cce6087d6880..3ba02aac6dd5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3545,7 +3545,9 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 	if (!btrfs_try_tree_write_lock(eb)) {
 		flush = 1;
 		flush_write_bio(epd);
-		btrfs_tree_lock(eb);
+		ret = btrfs_tree_lock(eb);
+		if (ret < 0)
+			return ret;
 	}
 
 	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
@@ -3558,7 +3560,9 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
 		}
 		while (1) {
 			wait_on_extent_buffer_writeback(eb);
-			btrfs_tree_lock(eb);
+			ret = btrfs_tree_lock(eb);
+			if (ret < 0)
+				return ret;
 			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
 				break;
 			btrfs_tree_unlock(eb);
diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index b5950aacd697..948bbc45638a 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -1242,7 +1242,9 @@ int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info)
 
 	list_del(&free_space_root->dirty_list);
 
-	btrfs_tree_lock(free_space_root->node);
+	ret = btrfs_tree_lock(free_space_root->node);
+	if (ret < 0)
+		goto abort;
 	clean_tree_block(fs_info, free_space_root->node);
 	btrfs_tree_unlock(free_space_root->node);
 	btrfs_free_tree_block(trans, free_space_root, free_space_root->node,
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 1da768e5ef75..ca151be026bf 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -224,10 +224,18 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
 /*
  * take a spinning write lock.  This will wait for both
  * blocking readers or writers
+ *
+ * Return 0 if we successfully locked the tree block
+ * Return <0 if some selftest failed (mostly due to extent tree corruption)
  */
-void btrfs_tree_lock(struct extent_buffer *eb)
+int btrfs_tree_lock(struct extent_buffer *eb)
 {
-	WARN_ON(eb->lock_owner == current->pid);
+	if (unlikely(eb->lock_owner == current->pid)) {
+		WARN(1,
+"tree block %llu already locked by current (pid=%d), refuse to lock",
+		     eb->start, current->pid);
+		return -EUCLEAN;
+	}
 again:
 	wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
 	wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
@@ -248,6 +256,7 @@ void btrfs_tree_lock(struct extent_buffer *eb)
 	atomic_inc(&eb->spinning_writers);
 	atomic_inc(&eb->write_locks);
 	eb->lock_owner = current->pid;
+	return 0;
 }
 
 /*
diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
index 29135def468e..7394d7ba2ff9 100644
--- a/fs/btrfs/locking.h
+++ b/fs/btrfs/locking.h
@@ -11,7 +11,7 @@
 #define BTRFS_WRITE_LOCK_BLOCKING 3
 #define BTRFS_READ_LOCK_BLOCKING 4
 
-void btrfs_tree_lock(struct extent_buffer *eb);
+int btrfs_tree_lock(struct extent_buffer *eb);
 void btrfs_tree_unlock(struct extent_buffer *eb);
 
 void btrfs_tree_read_lock(struct extent_buffer *eb);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1874a6d2e6f5..ec4351fd7537 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1040,7 +1040,9 @@ int btrfs_quota_disable(struct btrfs_trans_handle *trans,
 
 	list_del(&quota_root->dirty_list);
 
-	btrfs_tree_lock(quota_root->node);
+	ret = btrfs_tree_lock(quota_root->node);
+	if (ret < 0)
+		goto out;
 	clean_tree_block(fs_info, quota_root->node);
 	btrfs_tree_unlock(quota_root->node);
 	btrfs_free_tree_block(trans, quota_root, quota_root->node, 0, 1);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index be94c65bb4d2..a2fc0bd83a40 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1877,7 +1877,11 @@ int replace_path(struct btrfs_trans_handle *trans,
 				free_extent_buffer(eb);
 				break;
 			}
-			btrfs_tree_lock(eb);
+			ret = btrfs_tree_lock(eb);
+			if (ret < 0) {
+				free_extent_buffer(eb);
+				break;
+			}
 			if (cow) {
 				ret = btrfs_cow_block(trans, dest, eb, parent,
 						      slot, &eb);
@@ -2788,7 +2792,12 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 			err = -EIO;
 			goto next;
 		}
-		btrfs_tree_lock(eb);
+		ret = btrfs_tree_lock(eb);
+		if (ret < 0) {
+			free_extent_buffer(eb);
+			err = ret;
+			goto next;
+		}
 		btrfs_set_lock_blocking(eb);
 
 		if (!node->eb) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index f8220ec02036..173bc15a7cd1 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2588,7 +2588,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
 				}
 
 				if (trans) {
-					btrfs_tree_lock(next);
+					ret = btrfs_tree_lock(next);
+					if (ret < 0) {
+						free_extent_buffer(next);
+						return ret;
+					}
 					btrfs_set_lock_blocking(next);
 					clean_tree_block(fs_info, next);
 					btrfs_wait_tree_block_writeback(next);
@@ -2672,7 +2676,9 @@ static noinline int walk_up_log_tree(struct btrfs_trans_handle *trans,
 				next = path->nodes[*level];
 
 				if (trans) {
-					btrfs_tree_lock(next);
+					ret = btrfs_tree_lock(next);
+					if (ret < 0)
+						return ret;
 					btrfs_set_lock_blocking(next);
 					clean_tree_block(fs_info, next);
 					btrfs_wait_tree_block_writeback(next);
@@ -2754,7 +2760,9 @@ static int walk_log_tree(struct btrfs_trans_handle *trans,
 			next = path->nodes[orig_level];
 
 			if (trans) {
-				btrfs_tree_lock(next);
+				ret = btrfs_tree_lock(next);
+				if (ret < 0)
+					goto out;
 				btrfs_set_lock_blocking(next);
 				clean_tree_block(fs_info, next);
 				btrfs_wait_tree_block_writeback(next);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time
  2018-08-01  2:37 ` [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
@ 2018-08-01  2:54   ` Su Yue
  0 siblings, 0 replies; 14+ messages in thread
From: Su Yue @ 2018-08-01  2:54 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 08/01/2018 10:37 AM, Qu Wenruo wrote:
> A crafted btrfs with incorrect chunk<->block group mapping, it could leads
> to a lot of unexpected behavior.
> 
> Although the crafted image can be catched by block group item checker
> added in "[PATCH] btrfs: tree-checker: Verify block_group_item", if one
> crafted a valid enough block group item which can pass above check but
> still mismatch with existing chunk, it could cause a lot of undefined
> behavior.
> 
> This patch will add extra block group -> chunk mapping check, to ensure
> we have a completely matching (start, len, flags) chunk for each block
> group at mount time.
> 
> Here we reuse the original find_first_block_group(), which is already
> doing basic bg -> chunk check, adding more check on start/len and type
> flags.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>

> ---
>   fs/btrfs/extent-tree.c | 29 ++++++++++++++++++++++++++++-
>   1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58c0080..63a6b5d36ac1 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9717,6 +9717,8 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,
>   	int ret = 0;
>   	struct btrfs_key found_key;
>   	struct extent_buffer *leaf;
> +	struct btrfs_block_group_item bg;
> +	u64 flags;
>   	int slot;
>   
>   	ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> @@ -9751,8 +9753,33 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info,
>   			"logical %llu len %llu found bg but no related chunk",
>   					  found_key.objectid, found_key.offset);
>   				ret = -ENOENT;
> +			} else if (em->start != found_key.objectid ||
> +				   em->len != found_key.offset) {
> +				btrfs_err(fs_info,
> +		"block group %llu len %llu mismatch with chunk %llu len %llu",
> +					  found_key.objectid, found_key.offset,
> +					  em->start, em->len);
> +				ret = -EUCLEAN;
>   			} else {
> -				ret = 0;
> +				read_extent_buffer(leaf, &bg,
> +					btrfs_item_ptr_offset(leaf, slot),
> +					sizeof(bg));
> +				flags = btrfs_block_group_flags(&bg) &
> +					BTRFS_BLOCK_GROUP_TYPE_MASK;
> +
> +				if (flags != (em->map_lookup->type &
> +					      BTRFS_BLOCK_GROUP_TYPE_MASK)) {
> +					btrfs_err(fs_info,
> +"block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx",
> +						found_key.objectid,
> +						found_key.offset,
> +						flags,
> +						(BTRFS_BLOCK_GROUP_TYPE_MASK &
> +						 em->map_lookup->type));
> +					ret = -EUCLEAN;
> +				} else {
> +					ret = 0;
> +				}
>   			}
>   			free_extent_map(em);
>   			goto out;
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock
  2018-08-01  2:37 ` [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock Qu Wenruo
@ 2018-08-01  2:55   ` Su Yue
  0 siblings, 0 replies; 14+ messages in thread
From: Su Yue @ 2018-08-01  2:55 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 08/01/2018 10:37 AM, Qu Wenruo wrote:
> [BUG]
> For certains crafted image, whose csum root leaf has missing backref, if
> we try to trigger write with data csum, it could cause deadlock with the
> following kernel WARN_ON():
> ------
> WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
> CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
> Workqueue: btrfs-endio-write btrfs_endio_write_helper
> RIP: 0010:btrfs_tree_lock+0x3e2/0x400
> Call Trace:
>   btrfs_alloc_tree_block+0x39f/0x770
>   __btrfs_cow_block+0x285/0x9e0
>   btrfs_cow_block+0x191/0x2e0
>   btrfs_search_slot+0x492/0x1160
>   btrfs_lookup_csum+0xec/0x280
>   btrfs_csum_file_blocks+0x2be/0xa60
>   add_pending_csums+0xaf/0xf0
>   btrfs_finish_ordered_io+0x74b/0xc90
>   finish_ordered_fn+0x15/0x20
>   normal_work_helper+0xf6/0x500
>   btrfs_endio_write_helper+0x12/0x20
>   process_one_work+0x302/0x770
>   worker_thread+0x81/0x6d0
>   kthread+0x180/0x1d0
>   ret_from_fork+0x35/0x40
> ---[ end trace 2e85051acb5f6dc1 ]---
> ------
> 
> [CAUSE]
> That crafted image has missing backref for csum tree root leaf.
> And when we try to allocate new tree block, since there is no
> EXTENT/METADATA_ITEM for csum tree root, btrfs consider it's free slot
> and use it.
> 
> However we have already locked csum tree root leaf by ourselves, thus we
> will ourselves to release read/write lock, and causing deadlock.
> 
> [FIX]
> This patch will allow btrfs_tree_lock() to return error number, so
> most callers can exit gracefully.
> (Some caller doesn't really expect btrfs_tree_lock() to return error,
> and in that case, we use BUG_ON())
> 
> Since such problem can only happen in crafted image, we will still
> trigger kernel warning, but with a little more meaningful warning
> message.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>

> ---
>   fs/btrfs/ctree.c           | 57 +++++++++++++++++++++++++++++++-------
>   fs/btrfs/extent-tree.c     | 28 +++++++++++++++----
>   fs/btrfs/extent_io.c       |  8 ++++--
>   fs/btrfs/free-space-tree.c |  4 ++-
>   fs/btrfs/locking.c         | 13 +++++++--
>   fs/btrfs/locking.h         |  2 +-
>   fs/btrfs/qgroup.c          |  4 ++-
>   fs/btrfs/relocation.c      | 13 +++++++--
>   fs/btrfs/tree-log.c        | 14 ++++++++--
>   9 files changed, 115 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4bc326df472e..6e695f4cd931 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -161,10 +161,12 @@ struct extent_buffer *btrfs_root_node(struct btrfs_root *root)
>   struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root)
>   {
>   	struct extent_buffer *eb;
> +	int ret;
>   
>   	while (1) {
>   		eb = btrfs_root_node(root);
> -		btrfs_tree_lock(eb);
> +		ret = btrfs_tree_lock(eb);
> +		BUG_ON(ret < 0);
>   		if (eb == root->node)
>   			break;
>   		btrfs_tree_unlock(eb);
> @@ -1584,6 +1586,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
>   	for (i = start_slot; i <= end_slot; i++) {
>   		struct btrfs_key first_key;
>   		int close = 1;
> +		int ret;
>   
>   		btrfs_node_key(parent, &disk_key, i);
>   		if (!progress_passed && comp_keys(&disk_key, progress) < 0)
> @@ -1637,7 +1640,11 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
>   		if (search_start == 0)
>   			search_start = last_block;
>   
> -		btrfs_tree_lock(cur);
> +		ret = btrfs_tree_lock(cur);
> +		if (ret < 0) {
> +			free_extent_buffer(cur);
> +			return ret;
> +		}
>   		btrfs_set_lock_blocking(cur);
>   		err = __btrfs_cow_block(trans, root, cur, parent, i,
>   					&cur, search_start,
> @@ -1853,7 +1860,11 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   			goto enospc;
>   		}
>   
> -		btrfs_tree_lock(child);
> +		ret = btrfs_tree_lock(child);
> +		if (ret < 0) {
> +			free_extent_buffer(child);
> +			goto enospc;
> +		}
>   		btrfs_set_lock_blocking(child);
>   		ret = btrfs_cow_block(trans, root, child, mid, 0, &child);
>   		if (ret) {
> @@ -1891,7 +1902,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		left = NULL;
>   
>   	if (left) {
> -		btrfs_tree_lock(left);
> +		ret = btrfs_tree_lock(left);
> +		if (ret < 0) {
> +			free_extent_buffer(left);
> +			left = NULL;
> +			goto enospc;
> +		}
>   		btrfs_set_lock_blocking(left);
>   		wret = btrfs_cow_block(trans, root, left,
>   				       parent, pslot - 1, &left);
> @@ -1906,7 +1922,12 @@ static noinline int balance_level(struct btrfs_trans_handle *trans,
>   		right = NULL;
>   
>   	if (right) {
> -		btrfs_tree_lock(right);
> +		ret = btrfs_tree_lock(right);
> +		if (ret < 0) {
> +			free_extent_buffer(right);
> +			right = NULL;
> +			goto enospc;
> +		}
>   		btrfs_set_lock_blocking(right);
>   		wret = btrfs_cow_block(trans, root, right,
>   				       parent, pslot + 1, &right);
> @@ -2069,7 +2090,11 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>   	if (left) {
>   		u32 left_nr;
>   
> -		btrfs_tree_lock(left);
> +		ret = btrfs_tree_lock(left);
> +		if (ret < 0) {
> +			free_extent_buffer(left);
> +			return ret;
> +		}
>   		btrfs_set_lock_blocking(left);
>   
>   		left_nr = btrfs_header_nritems(left);
> @@ -2124,7 +2149,11 @@ static noinline int push_nodes_for_insert(struct btrfs_trans_handle *trans,
>   	if (right) {
>   		u32 right_nr;
>   
> -		btrfs_tree_lock(right);
> +		ret = btrfs_tree_lock(right);
> +		if (ret < 0) {
> +			free_extent_buffer(right);
> +			return ret;
> +		}
>   		btrfs_set_lock_blocking(right);
>   
>   		right_nr = btrfs_header_nritems(right);
> @@ -2874,7 +2903,9 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>   					err = btrfs_try_tree_write_lock(b);
>   					if (!err) {
>   						btrfs_set_path_blocking(p);
> -						btrfs_tree_lock(b);
> +						ret = btrfs_tree_lock(b);
> +						if (ret < 0)
> +							goto done;
>   						btrfs_clear_path_blocking(p, b,
>   								  BTRFS_WRITE_LOCK);
>   					}
> @@ -3773,7 +3804,9 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>   	if (IS_ERR(right))
>   		return 1;
>   
> -	btrfs_tree_lock(right);
> +	ret = btrfs_tree_lock(right);
> +	if (ret < 0)
> +		goto out_free;
>   	btrfs_set_lock_blocking(right);
>   
>   	free_space = btrfs_leaf_free_space(fs_info, right);
> @@ -3811,6 +3844,7 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>   				right, free_space, left_nritems, min_slot);
>   out_unlock:
>   	btrfs_tree_unlock(right);
> +out_free:
>   	free_extent_buffer(right);
>   	return 1;
>   }
> @@ -4007,7 +4041,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>   	if (IS_ERR(left))
>   		return 1;
>   
> -	btrfs_tree_lock(left);
> +	ret = btrfs_tree_lock(left);
> +	if (ret < 0)
> +		goto out_free;
>   	btrfs_set_lock_blocking(left);
>   
>   	free_space = btrfs_leaf_free_space(fs_info, left);
> @@ -4037,6 +4073,7 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>   			       max_slot);
>   out:
>   	btrfs_tree_unlock(left);
> +out_free:
>   	free_extent_buffer(left);
>   	return ret;
>   }
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3578fa5b30ef..da615ebc072e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8297,14 +8297,19 @@ btrfs_init_new_buffer(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>   {
>   	struct btrfs_fs_info *fs_info = root->fs_info;
>   	struct extent_buffer *buf;
> +	int ret;
>   
>   	buf = btrfs_find_create_tree_block(fs_info, bytenr);
>   	if (IS_ERR(buf))
>   		return buf;
> +	ret = btrfs_tree_lock(buf);
> +	if (ret < 0) {
> +		free_extent_buffer(buf);
> +		return ERR_PTR(ret);
> +	}
>   
>   	btrfs_set_header_generation(buf, trans->transid);
>   	btrfs_set_buffer_lockdep_class(root->root_key.objectid, buf, level);
> -	btrfs_tree_lock(buf);
>   	clean_tree_block(fs_info, buf);
>   	clear_bit(EXTENT_BUFFER_STALE, &buf->bflags);
>   
> @@ -8721,7 +8726,9 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>   					       level - 1);
>   		reada = 1;
>   	}
> -	btrfs_tree_lock(next);
> +	ret = btrfs_tree_lock(next);
> +	if (ret < 0)
> +		goto out_free;
>   	btrfs_set_lock_blocking(next);
>   
>   	ret = btrfs_lookup_extent_info(trans, fs_info, bytenr, level - 1, 1,
> @@ -8781,7 +8788,9 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>   			free_extent_buffer(next);
>   			return -EIO;
>   		}
> -		btrfs_tree_lock(next);
> +		ret = btrfs_tree_lock(next);
> +		if (ret < 0)
> +			goto out_free;
>   		btrfs_set_lock_blocking(next);
>   	}
>   
> @@ -8839,6 +8848,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
>   
>   out_unlock:
>   	btrfs_tree_unlock(next);
> +out_free:
>   	free_extent_buffer(next);
>   
>   	return ret;
> @@ -8887,7 +8897,8 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
>   		 */
>   		if (!path->locks[level]) {
>   			BUG_ON(level == 0);
> -			btrfs_tree_lock(eb);
> +			ret = btrfs_tree_lock(eb);
> +			BUG_ON(ret < 0);
>   			btrfs_set_lock_blocking(eb);
>   			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
>   
> @@ -8929,7 +8940,8 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
>   		/* make block locked assertion in clean_tree_block happy */
>   		if (!path->locks[level] &&
>   		    btrfs_header_generation(eb) == trans->transid) {
> -			btrfs_tree_lock(eb);
> +			ret = btrfs_tree_lock(eb);
> +			BUG_ON(ret < 0);
>   			btrfs_set_lock_blocking(eb);
>   			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
>   		}
> @@ -9107,7 +9119,11 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>   
>   		level = btrfs_header_level(root->node);
>   		while (1) {
> -			btrfs_tree_lock(path->nodes[level]);
> +			ret = btrfs_tree_lock(path->nodes[level]);
> +			if (ret < 0) {
> +				err = ret;
> +				goto out_end_trans;
> +			}
>   			btrfs_set_lock_blocking(path->nodes[level]);
>   			path->locks[level] = BTRFS_WRITE_LOCK_BLOCKING;
>   
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index cce6087d6880..3ba02aac6dd5 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3545,7 +3545,9 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
>   	if (!btrfs_try_tree_write_lock(eb)) {
>   		flush = 1;
>   		flush_write_bio(epd);
> -		btrfs_tree_lock(eb);
> +		ret = btrfs_tree_lock(eb);
> +		if (ret < 0)
> +			return ret;
>   	}
>   
>   	if (test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags)) {
> @@ -3558,7 +3560,9 @@ lock_extent_buffer_for_io(struct extent_buffer *eb,
>   		}
>   		while (1) {
>   			wait_on_extent_buffer_writeback(eb);
> -			btrfs_tree_lock(eb);
> +			ret = btrfs_tree_lock(eb);
> +			if (ret < 0)
> +				return ret;
>   			if (!test_bit(EXTENT_BUFFER_WRITEBACK, &eb->bflags))
>   				break;
>   			btrfs_tree_unlock(eb);
> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
> index b5950aacd697..948bbc45638a 100644
> --- a/fs/btrfs/free-space-tree.c
> +++ b/fs/btrfs/free-space-tree.c
> @@ -1242,7 +1242,9 @@ int btrfs_clear_free_space_tree(struct btrfs_fs_info *fs_info)
>   
>   	list_del(&free_space_root->dirty_list);
>   
> -	btrfs_tree_lock(free_space_root->node);
> +	ret = btrfs_tree_lock(free_space_root->node);
> +	if (ret < 0)
> +		goto abort;
>   	clean_tree_block(fs_info, free_space_root->node);
>   	btrfs_tree_unlock(free_space_root->node);
>   	btrfs_free_tree_block(trans, free_space_root, free_space_root->node,
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index 1da768e5ef75..ca151be026bf 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -224,10 +224,18 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
>   /*
>    * take a spinning write lock.  This will wait for both
>    * blocking readers or writers
> + *
> + * Return 0 if we successfully locked the tree block
> + * Return <0 if some selftest failed (mostly due to extent tree corruption)
>    */
> -void btrfs_tree_lock(struct extent_buffer *eb)
> +int btrfs_tree_lock(struct extent_buffer *eb)
>   {
> -	WARN_ON(eb->lock_owner == current->pid);
> +	if (unlikely(eb->lock_owner == current->pid)) {
> +		WARN(1,
> +"tree block %llu already locked by current (pid=%d), refuse to lock",
> +		     eb->start, current->pid);
> +		return -EUCLEAN;
> +	}
>   again:
>   	wait_event(eb->read_lock_wq, atomic_read(&eb->blocking_readers) == 0);
>   	wait_event(eb->write_lock_wq, atomic_read(&eb->blocking_writers) == 0);
> @@ -248,6 +256,7 @@ void btrfs_tree_lock(struct extent_buffer *eb)
>   	atomic_inc(&eb->spinning_writers);
>   	atomic_inc(&eb->write_locks);
>   	eb->lock_owner = current->pid;
> +	return 0;
>   }
>   
>   /*
> diff --git a/fs/btrfs/locking.h b/fs/btrfs/locking.h
> index 29135def468e..7394d7ba2ff9 100644
> --- a/fs/btrfs/locking.h
> +++ b/fs/btrfs/locking.h
> @@ -11,7 +11,7 @@
>   #define BTRFS_WRITE_LOCK_BLOCKING 3
>   #define BTRFS_READ_LOCK_BLOCKING 4
>   
> -void btrfs_tree_lock(struct extent_buffer *eb);
> +int btrfs_tree_lock(struct extent_buffer *eb);
>   void btrfs_tree_unlock(struct extent_buffer *eb);
>   
>   void btrfs_tree_read_lock(struct extent_buffer *eb);
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 1874a6d2e6f5..ec4351fd7537 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -1040,7 +1040,9 @@ int btrfs_quota_disable(struct btrfs_trans_handle *trans,
>   
>   	list_del(&quota_root->dirty_list);
>   
> -	btrfs_tree_lock(quota_root->node);
> +	ret = btrfs_tree_lock(quota_root->node);
> +	if (ret < 0)
> +		goto out;
>   	clean_tree_block(fs_info, quota_root->node);
>   	btrfs_tree_unlock(quota_root->node);
>   	btrfs_free_tree_block(trans, quota_root, quota_root->node, 0, 1);
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index be94c65bb4d2..a2fc0bd83a40 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1877,7 +1877,11 @@ int replace_path(struct btrfs_trans_handle *trans,
>   				free_extent_buffer(eb);
>   				break;
>   			}
> -			btrfs_tree_lock(eb);
> +			ret = btrfs_tree_lock(eb);
> +			if (ret < 0) {
> +				free_extent_buffer(eb);
> +				break;
> +			}
>   			if (cow) {
>   				ret = btrfs_cow_block(trans, dest, eb, parent,
>   						      slot, &eb);
> @@ -2788,7 +2792,12 @@ static int do_relocation(struct btrfs_trans_handle *trans,
>   			err = -EIO;
>   			goto next;
>   		}
> -		btrfs_tree_lock(eb);
> +		ret = btrfs_tree_lock(eb);
> +		if (ret < 0) {
> +			free_extent_buffer(eb);
> +			err = ret;
> +			goto next;
> +		}
>   		btrfs_set_lock_blocking(eb);
>   
>   		if (!node->eb) {
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index f8220ec02036..173bc15a7cd1 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -2588,7 +2588,11 @@ static noinline int walk_down_log_tree(struct btrfs_trans_handle *trans,
>   				}
>   
>   				if (trans) {
> -					btrfs_tree_lock(next);
> +					ret = btrfs_tree_lock(next);
> +					if (ret < 0) {
> +						free_extent_buffer(next);
> +						return ret;
> +					}
>   					btrfs_set_lock_blocking(next);
>   					clean_tree_block(fs_info, next);
>   					btrfs_wait_tree_block_writeback(next);
> @@ -2672,7 +2676,9 @@ static noinline int walk_up_log_tree(struct btrfs_trans_handle *trans,
>   				next = path->nodes[*level];
>   
>   				if (trans) {
> -					btrfs_tree_lock(next);
> +					ret = btrfs_tree_lock(next);
> +					if (ret < 0)
> +						return ret;
>   					btrfs_set_lock_blocking(next);
>   					clean_tree_block(fs_info, next);
>   					btrfs_wait_tree_block_writeback(next);
> @@ -2754,7 +2760,9 @@ static int walk_log_tree(struct btrfs_trans_handle *trans,
>   			next = path->nodes[orig_level];
>   
>   			if (trans) {
> -				btrfs_tree_lock(next);
> +				ret = btrfs_tree_lock(next);
> +				if (ret < 0)
> +					goto out;
>   				btrfs_set_lock_blocking(next);
>   				clean_tree_block(fs_info, next);
>   				btrfs_wait_tree_block_writeback(next);
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check
  2018-08-01  2:37 ` [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check Qu Wenruo
@ 2018-08-01  3:18   ` Su Yue
  2019-01-14 11:09   ` Filipe Manana
  1 sibling, 0 replies; 14+ messages in thread
From: Su Yue @ 2018-08-01  3:18 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 08/01/2018 10:37 AM, Qu Wenruo wrote:
> This patch will introduce chunk <-> dev extent mapping check, to protect
> us against invalid dev extents or chunks.
> 
> Since chunk mapping is the fundamental infrastructure of btrfs, extra
> check at mount time could prevent a lot of unexpected behavior (BUG_ON).
> 
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
> Signed-off-by: Qu Wenruo <wqu@suse.com>

LGTM.
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>

> ---
>   fs/btrfs/disk-io.c |   7 ++
>   fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++++++
>   fs/btrfs/volumes.h |   2 +
>   3 files changed, 192 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 205092dc9390..068ca7498e94 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3075,6 +3075,13 @@ int open_ctree(struct super_block *sb,
>   	fs_info->generation = generation;
>   	fs_info->last_trans_committed = generation;
>   
> +	ret = btrfs_verify_dev_extents(fs_info);
> +	if (ret) {
> +		btrfs_err(fs_info,
> +			  "failed to verify dev extents against chunks: %d",
> +			  ret);
> +		goto fail_block_groups;
> +	}
>   	ret = btrfs_recover_balance(fs_info);
>   	if (ret) {
>   		btrfs_err(fs_info, "failed to recover balance: %d", ret);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e6a8e4aabc66..467a589854fa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6440,6 +6440,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>   	map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
>   	map->type = btrfs_chunk_type(leaf, chunk);
>   	map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
> +	map->verified_stripes = 0;
>   	for (i = 0; i < num_stripes; i++) {
>   		map->stripes[i].physical =
>   			btrfs_stripe_offset_nr(leaf, chunk, i);
> @@ -7295,3 +7296,185 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
>   		fs_devices = fs_devices->seed;
>   	}
>   }
> +
> +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
> +{
> +	int index = btrfs_bg_flags_to_raid_index(type);
> +	int ncopies = btrfs_raid_array[index].ncopies;
> +	int data_stripes;
> +
> +	switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> +	case BTRFS_BLOCK_GROUP_RAID5:
> +		data_stripes = num_stripes - 1;
> +		break;
> +	case BTRFS_BLOCK_GROUP_RAID6:
> +		data_stripes = num_stripes - 2;
> +		break;
> +	default:
> +		data_stripes = num_stripes / ncopies;
> +		break;
> +	}
> +	return div_u64(chunk_len, data_stripes);
> +}
> +static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
> +				 u64 chunk_offset, u64 devid,
> +				 u64 physical_offset, u64 physical_len)
> +{
> +	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
> +	struct extent_map *em;
> +	struct map_lookup *map;
> +	u64 stripe_len;
> +	bool found = false;
> +	int ret = 0;
> +	int i;
> +
> +	read_lock(&em_tree->lock);
> +	em = lookup_extent_mapping(em_tree, chunk_offset, 1);
> +	read_unlock(&em_tree->lock);
> +
> +	if (!em) {
> +		ret = -EUCLEAN;
> +		btrfs_err(fs_info,
> +		"dev extent (%llu, %llu) doesn't have corresponding chunk",
> +			  devid, physical_offset);
> +		goto out;
> +	}
> +
> +	map = em->map_lookup;
> +	stripe_len = calc_stripe_length(map->type, em->len, map->num_stripes);
> +	if (physical_len != stripe_len) {
> +		btrfs_err(fs_info,
> +"dev extent (%llu, %llu) length doesn't match with chunk %llu, have %llu expect %llu",
> +			  devid, physical_offset, em->start, physical_len,
> +			  stripe_len);
> +		ret = -EUCLEAN;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < map->num_stripes; i++) {
> +		if (map->stripes[i].dev->devid == devid &&
> +		    map->stripes[i].physical == physical_offset) {
> +			found = true;
> +			if (map->verified_stripes >= map->num_stripes) {
> +				btrfs_err(fs_info,
> +			"too many dev extent for chunk %llu is detected",
> +					  em->start);
> +				ret = -EUCLEAN;
> +				goto out;
> +			}
> +			map->verified_stripes++;
> +			break;
> +		}
> +	}
> +	if (!found) {
> +		ret = -EUCLEAN;
> +		btrfs_err(fs_info,
> +			"dev extent (%llu, %llu) has no corresponding chunk",
> +			devid, physical_offset);
> +	}
> +out:
> +	free_extent_map(em);
> +	return ret;
> +}
> +
> +static int verify_chunk_dev_extent_mapping(struct btrfs_fs_info *fs_info)
> +{
> +	struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
> +	struct extent_map *em;
> +	struct rb_node *node;
> +	int ret = 0;
> +
> +	read_lock(&em_tree->lock);
> +	for (node = rb_first(&em_tree->map); node; node = rb_next(node)) {
> +		em = rb_entry(node, struct extent_map, rb_node);
> +		if (em->map_lookup->num_stripes !=
> +		    em->map_lookup->verified_stripes) {
> +			btrfs_err(fs_info,
> +			"chunk %llu has missing dev extent, have %d expect %d",
> +				  em->start, em->map_lookup->verified_stripes,
> +				  em->map_lookup->num_stripes);
> +			ret = -EUCLEAN;
> +			goto out;
> +		}
> +	}
> +out:
> +	read_unlock(&em_tree->lock);
> +	return ret;
> +}
> +
> +/*
> + * Ensure all dev extents are mapped to correct chunk.
> + * Or later chunk allocation/free would cause unexpected behavior.
> + *
> + * NOTE: This will iterate through the whole device tree, which should be
> + * at the same size level of chunk tree.
> + * This would increase mount time by a tiny fraction.
> + */
> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
> +{
> +	struct btrfs_path *path;
> +	struct btrfs_root *root = fs_info->dev_root;
> +	struct btrfs_key key;
> +	int ret = 0;
> +
> +	key.objectid = 1;
> +	key.type = BTRFS_DEV_EXTENT_KEY;
> +	key.offset = 0;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	path->reada = READA_FORWARD;
> +	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +	if (ret < 0)
> +		goto out;
> +
> +	if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> +		ret = btrfs_next_item(root, path);
> +		if (ret < 0)
> +			goto out;
> +		/* No dev extents at all? Not good */
> +		if (ret > 0) {
> +			ret = -EUCLEAN;
> +			goto out;
> +		}
> +	}
> +	while (1) {
> +		struct extent_buffer *leaf = path->nodes[0];
> +		struct btrfs_dev_extent *dext;
> +		int slot = path->slots[0];
> +		u64 chunk_offset;
> +		u64 physical_offset;
> +		u64 physical_len;
> +		u64 devid;
> +
> +		btrfs_item_key_to_cpu(leaf, &key, slot);
> +		if (key.type != BTRFS_DEV_EXTENT_KEY)
> +			break;
> +		devid = key.objectid;
> +		physical_offset = key.offset;
> +
> +		dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
> +		chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
> +		physical_len = btrfs_dev_extent_length(leaf, dext);
> +
> +		ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
> +					    physical_offset, physical_len);
> +		if (ret < 0)
> +			goto out;
> +		ret = btrfs_next_item(root, path);
> +		if (ret < 0)
> +			goto out;
> +		if (ret > 0) {
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	/* Ensure all chunks have corresponding dev extents */
> +	ret = verify_chunk_dev_extent_mapping(fs_info);
> +out:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6d4f38ad9f5c..4301bf2d0534 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -345,6 +345,7 @@ struct map_lookup {
>   	u64 stripe_len;
>   	int num_stripes;
>   	int sub_stripes;
> +	int verified_stripes; /* For mount time dev extent verification */
>   	struct btrfs_bio_stripe stripes[];
>   };
>   
> @@ -559,5 +560,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>   void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>   bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>   					struct btrfs_device *failing_dev);
> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
>   
>   #endif
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images
  2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
                   ` (5 preceding siblings ...)
  2018-08-01  2:37 ` [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock Qu Wenruo
@ 2018-08-02 16:40 ` David Sterba
  2018-08-03  0:06   ` Qu Wenruo
  6 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2018-08-02 16:40 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Aug 01, 2018 at 10:37:15AM +0800, Qu Wenruo wrote:
> The branch can be fetched from the following git repo:
> https://github.com/adam900710/linux/tree/tree_checker_enhance
> 
> It's based on v4.18-rc1, with 3 patches already merged into misc-next.
> 
> This patchset introduced the following enhanced validation check:
> 1) chunk/block group/dev extent cross check
>    Unlike extent tree, such cross check can be implemented pretty easy
>    with minimal mount time impact.
>    Now the kernel could do chunk/bg/dev extent check as good as btrfs
>    check.
> 
> 2) Locking test to avoid possible deadlock due to extent tree corruption
>    Unfortunately, for extent tree we can't do really much cross check.
>    Instead we use the selftest from btrfs_tree_lock() to detect and
>    avoid deadlock caused by corrupted extent tree.

Great, thanks.

> The 3rd patch "btrfs: Remove unused function btrfs_account_dev_extents_size()"
> has also been merged into misc-next.
> 
> changelog:
> v2:
>   Added reviewed-by tags from Gu and Nikolay.
>   Address comment from David for the 4th patch
>   Address comment from Gu for the 2nd patch.

Please rather write what did you fix and not who suggested that. There
are many patches and iterations and I don't remember everything.

> 
> Qu Wenruo (6):
>   btrfs: Check each block group has corresponding chunk at mount time
>   btrfs: Verify every chunk has corresponding block group at mount time
>   btrfs: Remove unused function btrfs_account_dev_extents_size()
>   btrfs: Introduce mount time chunk <-> dev extent mapping check
>   btrfs: Exit gracefully when failed to add chunk map

The above merged, with some adjustments of changelogs or the error
messages.

>   btrfs: locking: Allow btrfs_tree_lock() to return error to avoid
>     deadlock

I need to read and understand the explanation you posted, so this patch
is not in misc-next, I'll add it to for-next after I'll do the first
pass review.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images
  2018-08-02 16:40 ` [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images David Sterba
@ 2018-08-03  0:06   ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2018-08-03  0:06 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2604 bytes --]



On 2018年08月03日 00:40, David Sterba wrote:
> On Wed, Aug 01, 2018 at 10:37:15AM +0800, Qu Wenruo wrote:
>> The branch can be fetched from the following git repo:
>> https://github.com/adam900710/linux/tree/tree_checker_enhance
>>
>> It's based on v4.18-rc1, with 3 patches already merged into misc-next.
>>
>> This patchset introduced the following enhanced validation check:
>> 1) chunk/block group/dev extent cross check
>>    Unlike extent tree, such cross check can be implemented pretty easy
>>    with minimal mount time impact.
>>    Now the kernel could do chunk/bg/dev extent check as good as btrfs
>>    check.
>>
>> 2) Locking test to avoid possible deadlock due to extent tree corruption
>>    Unfortunately, for extent tree we can't do really much cross check.
>>    Instead we use the selftest from btrfs_tree_lock() to detect and
>>    avoid deadlock caused by corrupted extent tree.
> 
> Great, thanks.
> 
>> The 3rd patch "btrfs: Remove unused function btrfs_account_dev_extents_size()"
>> has also been merged into misc-next.
>>
>> changelog:
>> v2:
>>   Added reviewed-by tags from Gu and Nikolay.
>>   Address comment from David for the 4th patch
>>   Address comment from Gu for the 2nd patch.
> 
> Please rather write what did you fix and not who suggested that. There
> are many patches and iterations and I don't remember everything.

Sorry for that.

And for this patchset, the changelog should be:
  Added reviewed-by tags.
  Fixed unused variable and Link: tag.
  Moved checks to find_first_block_group() to reduce duplication.

> 
>>
>> Qu Wenruo (6):
>>   btrfs: Check each block group has corresponding chunk at mount time
>>   btrfs: Verify every chunk has corresponding block group at mount time
>>   btrfs: Remove unused function btrfs_account_dev_extents_size()
>>   btrfs: Introduce mount time chunk <-> dev extent mapping check
>>   btrfs: Exit gracefully when failed to add chunk map
> 
> The above merged, with some adjustments of changelogs or the error
> messages.
> 
>>   btrfs: locking: Allow btrfs_tree_lock() to return error to avoid
>>     deadlock
> 
> I need to read and understand the explanation you posted, so this patch
> is not in misc-next, I'll add it to for-next after I'll do the first
> pass review.

Feel free to ask if there is anything unclear.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check
  2018-08-01  2:37 ` [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check Qu Wenruo
  2018-08-01  3:18   ` Su Yue
@ 2019-01-14 11:09   ` Filipe Manana
  2019-01-14 11:28     ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: Filipe Manana @ 2019-01-14 11:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Aug 1, 2018 at 3:39 AM Qu Wenruo <wqu@suse.com> wrote:
>
> This patch will introduce chunk <-> dev extent mapping check, to protect
> us against invalid dev extents or chunks.
>
> Since chunk mapping is the fundamental infrastructure of btrfs, extra
> check at mount time could prevent a lot of unexpected behavior (BUG_ON).
>
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Btw, this makes at least one test case from btrfs-progs fail:

root 17:12:02 /home/fdmanana/git/hub/btrfs-progs/tests ((v4.19.1))>
TEST=021\* ./misc-tests.sh
    [TEST/misc]   021-image-multi-devices
failed: mount /dev/loop2 /home/fdmanana/git/hub/btrfs-progs/tests//mnt
test failed for case 021-image-multi-devices

dmesg/syslog has:

[432229.206699] BTRFS error (device loop0): dev extent physical offset
22020096 devid 1 has no corresponding chunk
[432229.207497] BTRFS error (device loop0): failed to find devid 1
[432229.208281] BTRFS error (device loop0): failed to verify dev
extents against chunks: -117
[432229.246286] BTRFS error (device loop0): open_ctree failed

Thanks.

> ---
>  fs/btrfs/disk-io.c |   7 ++
>  fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/volumes.h |   2 +
>  3 files changed, 192 insertions(+)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 205092dc9390..068ca7498e94 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3075,6 +3075,13 @@ int open_ctree(struct super_block *sb,
>         fs_info->generation = generation;
>         fs_info->last_trans_committed = generation;
>
> +       ret = btrfs_verify_dev_extents(fs_info);
> +       if (ret) {
> +               btrfs_err(fs_info,
> +                         "failed to verify dev extents against chunks: %d",
> +                         ret);
> +               goto fail_block_groups;
> +       }
>         ret = btrfs_recover_balance(fs_info);
>         if (ret) {
>                 btrfs_err(fs_info, "failed to recover balance: %d", ret);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e6a8e4aabc66..467a589854fa 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6440,6 +6440,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>         map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
>         map->type = btrfs_chunk_type(leaf, chunk);
>         map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
> +       map->verified_stripes = 0;
>         for (i = 0; i < num_stripes; i++) {
>                 map->stripes[i].physical =
>                         btrfs_stripe_offset_nr(leaf, chunk, i);
> @@ -7295,3 +7296,185 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
>                 fs_devices = fs_devices->seed;
>         }
>  }
> +
> +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
> +{
> +       int index = btrfs_bg_flags_to_raid_index(type);
> +       int ncopies = btrfs_raid_array[index].ncopies;
> +       int data_stripes;
> +
> +       switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> +       case BTRFS_BLOCK_GROUP_RAID5:
> +               data_stripes = num_stripes - 1;
> +               break;
> +       case BTRFS_BLOCK_GROUP_RAID6:
> +               data_stripes = num_stripes - 2;
> +               break;
> +       default:
> +               data_stripes = num_stripes / ncopies;
> +               break;
> +       }
> +       return div_u64(chunk_len, data_stripes);
> +}
> +static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
> +                                u64 chunk_offset, u64 devid,
> +                                u64 physical_offset, u64 physical_len)
> +{
> +       struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
> +       struct extent_map *em;
> +       struct map_lookup *map;
> +       u64 stripe_len;
> +       bool found = false;
> +       int ret = 0;
> +       int i;
> +
> +       read_lock(&em_tree->lock);
> +       em = lookup_extent_mapping(em_tree, chunk_offset, 1);
> +       read_unlock(&em_tree->lock);
> +
> +       if (!em) {
> +               ret = -EUCLEAN;
> +               btrfs_err(fs_info,
> +               "dev extent (%llu, %llu) doesn't have corresponding chunk",
> +                         devid, physical_offset);
> +               goto out;
> +       }
> +
> +       map = em->map_lookup;
> +       stripe_len = calc_stripe_length(map->type, em->len, map->num_stripes);
> +       if (physical_len != stripe_len) {
> +               btrfs_err(fs_info,
> +"dev extent (%llu, %llu) length doesn't match with chunk %llu, have %llu expect %llu",
> +                         devid, physical_offset, em->start, physical_len,
> +                         stripe_len);
> +               ret = -EUCLEAN;
> +               goto out;
> +       }
> +
> +       for (i = 0; i < map->num_stripes; i++) {
> +               if (map->stripes[i].dev->devid == devid &&
> +                   map->stripes[i].physical == physical_offset) {
> +                       found = true;
> +                       if (map->verified_stripes >= map->num_stripes) {
> +                               btrfs_err(fs_info,
> +                       "too many dev extent for chunk %llu is detected",
> +                                         em->start);
> +                               ret = -EUCLEAN;
> +                               goto out;
> +                       }
> +                       map->verified_stripes++;
> +                       break;
> +               }
> +       }
> +       if (!found) {
> +               ret = -EUCLEAN;
> +               btrfs_err(fs_info,
> +                       "dev extent (%llu, %llu) has no corresponding chunk",
> +                       devid, physical_offset);
> +       }
> +out:
> +       free_extent_map(em);
> +       return ret;
> +}
> +
> +static int verify_chunk_dev_extent_mapping(struct btrfs_fs_info *fs_info)
> +{
> +       struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
> +       struct extent_map *em;
> +       struct rb_node *node;
> +       int ret = 0;
> +
> +       read_lock(&em_tree->lock);
> +       for (node = rb_first(&em_tree->map); node; node = rb_next(node)) {
> +               em = rb_entry(node, struct extent_map, rb_node);
> +               if (em->map_lookup->num_stripes !=
> +                   em->map_lookup->verified_stripes) {
> +                       btrfs_err(fs_info,
> +                       "chunk %llu has missing dev extent, have %d expect %d",
> +                                 em->start, em->map_lookup->verified_stripes,
> +                                 em->map_lookup->num_stripes);
> +                       ret = -EUCLEAN;
> +                       goto out;
> +               }
> +       }
> +out:
> +       read_unlock(&em_tree->lock);
> +       return ret;
> +}
> +
> +/*
> + * Ensure all dev extents are mapped to correct chunk.
> + * Or later chunk allocation/free would cause unexpected behavior.
> + *
> + * NOTE: This will iterate through the whole device tree, which should be
> + * at the same size level of chunk tree.
> + * This would increase mount time by a tiny fraction.
> + */
> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
> +{
> +       struct btrfs_path *path;
> +       struct btrfs_root *root = fs_info->dev_root;
> +       struct btrfs_key key;
> +       int ret = 0;
> +
> +       key.objectid = 1;
> +       key.type = BTRFS_DEV_EXTENT_KEY;
> +       key.offset = 0;
> +
> +       path = btrfs_alloc_path();
> +       if (!path)
> +               return -ENOMEM;
> +
> +       path->reada = READA_FORWARD;
> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +       if (ret < 0)
> +               goto out;
> +
> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> +               ret = btrfs_next_item(root, path);
> +               if (ret < 0)
> +                       goto out;
> +               /* No dev extents at all? Not good */
> +               if (ret > 0) {
> +                       ret = -EUCLEAN;
> +                       goto out;
> +               }
> +       }
> +       while (1) {
> +               struct extent_buffer *leaf = path->nodes[0];
> +               struct btrfs_dev_extent *dext;
> +               int slot = path->slots[0];
> +               u64 chunk_offset;
> +               u64 physical_offset;
> +               u64 physical_len;
> +               u64 devid;
> +
> +               btrfs_item_key_to_cpu(leaf, &key, slot);
> +               if (key.type != BTRFS_DEV_EXTENT_KEY)
> +                       break;
> +               devid = key.objectid;
> +               physical_offset = key.offset;
> +
> +               dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
> +               chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
> +               physical_len = btrfs_dev_extent_length(leaf, dext);
> +
> +               ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
> +                                           physical_offset, physical_len);
> +               if (ret < 0)
> +                       goto out;
> +               ret = btrfs_next_item(root, path);
> +               if (ret < 0)
> +                       goto out;
> +               if (ret > 0) {
> +                       ret = 0;
> +                       break;
> +               }
> +       }
> +
> +       /* Ensure all chunks have corresponding dev extents */
> +       ret = verify_chunk_dev_extent_mapping(fs_info);
> +out:
> +       btrfs_free_path(path);
> +       return ret;
> +}
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6d4f38ad9f5c..4301bf2d0534 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -345,6 +345,7 @@ struct map_lookup {
>         u64 stripe_len;
>         int num_stripes;
>         int sub_stripes;
> +       int verified_stripes; /* For mount time dev extent verification */
>         struct btrfs_bio_stripe stripes[];
>  };
>
> @@ -559,5 +560,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>                                         struct btrfs_device *failing_dev);
> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
>
>  #endif
> --
> 2.18.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check
  2019-01-14 11:09   ` Filipe Manana
@ 2019-01-14 11:28     ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2019-01-14 11:28 UTC (permalink / raw)
  To: fdmanana, Qu Wenruo; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 11973 bytes --]



On 2019/1/14 下午7:09, Filipe Manana wrote:
> On Wed, Aug 1, 2018 at 3:39 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> This patch will introduce chunk <-> dev extent mapping check, to protect
>> us against invalid dev extents or chunks.
>>
>> Since chunk mapping is the fundamental infrastructure of btrfs, extra
>> check at mount time could prevent a lot of unexpected behavior (BUG_ON).
>>
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200407
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Btw, this makes at least one test case from btrfs-progs fail:
> 
> root 17:12:02 /home/fdmanana/git/hub/btrfs-progs/tests ((v4.19.1))>
> TEST=021\* ./misc-tests.sh
>     [TEST/misc]   021-image-multi-devices
> failed: mount /dev/loop2 /home/fdmanana/git/hub/btrfs-progs/tests//mnt
> test failed for case 021-image-multi-devices

That is fixed by the following commits already in devel:
9996feb94d btrfs-progs: misc-tests/021: Do extra btrfs check before mounting
a1a98ee7a8 btrfs-progs: image: Remove all existing dev extents for later
rebuild
e6c1fa297a btrfs-progs: volumes: Refactor btrfs_alloc_dev_extent() into
two functions
9a65b425bb btrfs-progs: image: Fix block group item flags when restoring
multi-device image to single device
ca73162b48 btrfs-progs: image: Refactor fixup_devices() to
fixup_chunks_and_devices()

And they are pretty early detected and merged, just after v4.19.1.

Thanks,
Qu





> 
> dmesg/syslog has:
> 
> [432229.206699] BTRFS error (device loop0): dev extent physical offset
> 22020096 devid 1 has no corresponding chunk
> [432229.207497] BTRFS error (device loop0): failed to find devid 1
> [432229.208281] BTRFS error (device loop0): failed to verify dev
> extents against chunks: -117
> [432229.246286] BTRFS error (device loop0): open_ctree failed
> 
> Thanks.
> 
>> ---
>>  fs/btrfs/disk-io.c |   7 ++
>>  fs/btrfs/volumes.c | 183 +++++++++++++++++++++++++++++++++++++++++++++
>>  fs/btrfs/volumes.h |   2 +
>>  3 files changed, 192 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 205092dc9390..068ca7498e94 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3075,6 +3075,13 @@ int open_ctree(struct super_block *sb,
>>         fs_info->generation = generation;
>>         fs_info->last_trans_committed = generation;
>>
>> +       ret = btrfs_verify_dev_extents(fs_info);
>> +       if (ret) {
>> +               btrfs_err(fs_info,
>> +                         "failed to verify dev extents against chunks: %d",
>> +                         ret);
>> +               goto fail_block_groups;
>> +       }
>>         ret = btrfs_recover_balance(fs_info);
>>         if (ret) {
>>                 btrfs_err(fs_info, "failed to recover balance: %d", ret);
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e6a8e4aabc66..467a589854fa 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -6440,6 +6440,7 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key *key,
>>         map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
>>         map->type = btrfs_chunk_type(leaf, chunk);
>>         map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
>> +       map->verified_stripes = 0;
>>         for (i = 0; i < num_stripes; i++) {
>>                 map->stripes[i].physical =
>>                         btrfs_stripe_offset_nr(leaf, chunk, i);
>> @@ -7295,3 +7296,185 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info)
>>                 fs_devices = fs_devices->seed;
>>         }
>>  }
>> +
>> +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
>> +{
>> +       int index = btrfs_bg_flags_to_raid_index(type);
>> +       int ncopies = btrfs_raid_array[index].ncopies;
>> +       int data_stripes;
>> +
>> +       switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
>> +       case BTRFS_BLOCK_GROUP_RAID5:
>> +               data_stripes = num_stripes - 1;
>> +               break;
>> +       case BTRFS_BLOCK_GROUP_RAID6:
>> +               data_stripes = num_stripes - 2;
>> +               break;
>> +       default:
>> +               data_stripes = num_stripes / ncopies;
>> +               break;
>> +       }
>> +       return div_u64(chunk_len, data_stripes);
>> +}
>> +static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
>> +                                u64 chunk_offset, u64 devid,
>> +                                u64 physical_offset, u64 physical_len)
>> +{
>> +       struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
>> +       struct extent_map *em;
>> +       struct map_lookup *map;
>> +       u64 stripe_len;
>> +       bool found = false;
>> +       int ret = 0;
>> +       int i;
>> +
>> +       read_lock(&em_tree->lock);
>> +       em = lookup_extent_mapping(em_tree, chunk_offset, 1);
>> +       read_unlock(&em_tree->lock);
>> +
>> +       if (!em) {
>> +               ret = -EUCLEAN;
>> +               btrfs_err(fs_info,
>> +               "dev extent (%llu, %llu) doesn't have corresponding chunk",
>> +                         devid, physical_offset);
>> +               goto out;
>> +       }
>> +
>> +       map = em->map_lookup;
>> +       stripe_len = calc_stripe_length(map->type, em->len, map->num_stripes);
>> +       if (physical_len != stripe_len) {
>> +               btrfs_err(fs_info,
>> +"dev extent (%llu, %llu) length doesn't match with chunk %llu, have %llu expect %llu",
>> +                         devid, physical_offset, em->start, physical_len,
>> +                         stripe_len);
>> +               ret = -EUCLEAN;
>> +               goto out;
>> +       }
>> +
>> +       for (i = 0; i < map->num_stripes; i++) {
>> +               if (map->stripes[i].dev->devid == devid &&
>> +                   map->stripes[i].physical == physical_offset) {
>> +                       found = true;
>> +                       if (map->verified_stripes >= map->num_stripes) {
>> +                               btrfs_err(fs_info,
>> +                       "too many dev extent for chunk %llu is detected",
>> +                                         em->start);
>> +                               ret = -EUCLEAN;
>> +                               goto out;
>> +                       }
>> +                       map->verified_stripes++;
>> +                       break;
>> +               }
>> +       }
>> +       if (!found) {
>> +               ret = -EUCLEAN;
>> +               btrfs_err(fs_info,
>> +                       "dev extent (%llu, %llu) has no corresponding chunk",
>> +                       devid, physical_offset);
>> +       }
>> +out:
>> +       free_extent_map(em);
>> +       return ret;
>> +}
>> +
>> +static int verify_chunk_dev_extent_mapping(struct btrfs_fs_info *fs_info)
>> +{
>> +       struct extent_map_tree *em_tree = &fs_info->mapping_tree.map_tree;
>> +       struct extent_map *em;
>> +       struct rb_node *node;
>> +       int ret = 0;
>> +
>> +       read_lock(&em_tree->lock);
>> +       for (node = rb_first(&em_tree->map); node; node = rb_next(node)) {
>> +               em = rb_entry(node, struct extent_map, rb_node);
>> +               if (em->map_lookup->num_stripes !=
>> +                   em->map_lookup->verified_stripes) {
>> +                       btrfs_err(fs_info,
>> +                       "chunk %llu has missing dev extent, have %d expect %d",
>> +                                 em->start, em->map_lookup->verified_stripes,
>> +                                 em->map_lookup->num_stripes);
>> +                       ret = -EUCLEAN;
>> +                       goto out;
>> +               }
>> +       }
>> +out:
>> +       read_unlock(&em_tree->lock);
>> +       return ret;
>> +}
>> +
>> +/*
>> + * Ensure all dev extents are mapped to correct chunk.
>> + * Or later chunk allocation/free would cause unexpected behavior.
>> + *
>> + * NOTE: This will iterate through the whole device tree, which should be
>> + * at the same size level of chunk tree.
>> + * This would increase mount time by a tiny fraction.
>> + */
>> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info)
>> +{
>> +       struct btrfs_path *path;
>> +       struct btrfs_root *root = fs_info->dev_root;
>> +       struct btrfs_key key;
>> +       int ret = 0;
>> +
>> +       key.objectid = 1;
>> +       key.type = BTRFS_DEV_EXTENT_KEY;
>> +       key.offset = 0;
>> +
>> +       path = btrfs_alloc_path();
>> +       if (!path)
>> +               return -ENOMEM;
>> +
>> +       path->reada = READA_FORWARD;
>> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> +       if (ret < 0)
>> +               goto out;
>> +
>> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
>> +               ret = btrfs_next_item(root, path);
>> +               if (ret < 0)
>> +                       goto out;
>> +               /* No dev extents at all? Not good */
>> +               if (ret > 0) {
>> +                       ret = -EUCLEAN;
>> +                       goto out;
>> +               }
>> +       }
>> +       while (1) {
>> +               struct extent_buffer *leaf = path->nodes[0];
>> +               struct btrfs_dev_extent *dext;
>> +               int slot = path->slots[0];
>> +               u64 chunk_offset;
>> +               u64 physical_offset;
>> +               u64 physical_len;
>> +               u64 devid;
>> +
>> +               btrfs_item_key_to_cpu(leaf, &key, slot);
>> +               if (key.type != BTRFS_DEV_EXTENT_KEY)
>> +                       break;
>> +               devid = key.objectid;
>> +               physical_offset = key.offset;
>> +
>> +               dext = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent);
>> +               chunk_offset = btrfs_dev_extent_chunk_offset(leaf, dext);
>> +               physical_len = btrfs_dev_extent_length(leaf, dext);
>> +
>> +               ret = verify_one_dev_extent(fs_info, chunk_offset, devid,
>> +                                           physical_offset, physical_len);
>> +               if (ret < 0)
>> +                       goto out;
>> +               ret = btrfs_next_item(root, path);
>> +               if (ret < 0)
>> +                       goto out;
>> +               if (ret > 0) {
>> +                       ret = 0;
>> +                       break;
>> +               }
>> +       }
>> +
>> +       /* Ensure all chunks have corresponding dev extents */
>> +       ret = verify_chunk_dev_extent_mapping(fs_info);
>> +out:
>> +       btrfs_free_path(path);
>> +       return ret;
>> +}
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 6d4f38ad9f5c..4301bf2d0534 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -345,6 +345,7 @@ struct map_lookup {
>>         u64 stripe_len;
>>         int num_stripes;
>>         int sub_stripes;
>> +       int verified_stripes; /* For mount time dev extent verification */
>>         struct btrfs_bio_stripe stripes[];
>>  };
>>
>> @@ -559,5 +560,6 @@ void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
>>  void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
>>  bool btrfs_check_rw_degradable(struct btrfs_fs_info *fs_info,
>>                                         struct btrfs_device *failing_dev);
>> +int btrfs_verify_dev_extents(struct btrfs_fs_info *fs_info);
>>
>>  #endif
>> --
>> 2.18.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-01-14 11:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-01  2:37 [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images Qu Wenruo
2018-08-01  2:37 ` [PATCH v2 1/6] btrfs: Check each block group has corresponding chunk at mount time Qu Wenruo
2018-08-01  2:54   ` Su Yue
2018-08-01  2:37 ` [PATCH v2 2/6] btrfs: Verify every chunk has corresponding block group " Qu Wenruo
2018-08-01  2:37 ` [PATCH v2 3/6] btrfs: Remove unused function btrfs_account_dev_extents_size() Qu Wenruo
2018-08-01  2:37 ` [PATCH v2 4/6] btrfs: Introduce mount time chunk <-> dev extent mapping check Qu Wenruo
2018-08-01  3:18   ` Su Yue
2019-01-14 11:09   ` Filipe Manana
2019-01-14 11:28     ` Qu Wenruo
2018-08-01  2:37 ` [PATCH v2 5/6] btrfs: Exit gracefully when failed to add chunk map Qu Wenruo
2018-08-01  2:37 ` [PATCH v2 6/6] btrfs: locking: Allow btrfs_tree_lock() to return error to avoid deadlock Qu Wenruo
2018-08-01  2:55   ` Su Yue
2018-08-02 16:40 ` [PATCH v2 0/6] btrfs: Enhanced validation check for fuzzed images David Sterba
2018-08-03  0:06   ` Qu Wenruo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).