linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images
@ 2019-07-25  6:12 Qu Wenruo
  2019-07-25  6:12 ` [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jungyeon Yoon

Another wave of defence enhancment, including:

- Enhanced eb accessors
  Not really needed for the fuzzed images, as 448de471cd4c
  ("btrfs: Check the first key and level for cached extent buffer")
  already fixed half of the reported images.
  Just add a final layer of safe net.

  Just to complain here, two experienced btrfs developer have got
  confused by @start, @len in functions like read_extent_buffer() with
  logical address.
  The best example to solve the confusion is to check the
  read_extent_buffer() call in btree_read_extent_buffer_pages().

  I'm not sure why this confusion happens or even get spread.
  My guess is the extent_buffer::start naming causing the problem.

  If so, I would definitely rename extent_buffer::start to
  extent_buffer::bytenr at any cost.
  Hopes the new commend will address the problem for now.

- BUG_ON() hunt in __btrfs_free_extent()
  Kill BUG_ON()s in __btrfs_free_extent(), replace with error reporting
  and why it shouldn't happen.

  Also add comment on what __btrfs_free_extent() is designed to do, with
  two dump-tree examples for newcomers.

- BUG_ON() hunt in __btrfs_inc_extent_ref()
  Just like __btrfs_free_extent(), but less comment as
  comment for __btrfs_free_extent() should also work for
  __btrfs_inc_extent_ref(), and __btrfs_inc_extent_ref() has a better
  structure than __btrfs_free_extent().

- Defence against unbalanced empty leaf

- Defence against bad key order across two tree blocks

The last two cases can't be rejected by tree-checker and they are all
cross-eb cases.
Thankfully we can reuse existing first_key check against unbalanced
empty leaf, but needs extra check deep into ctree.c for tree block
merging time check.

Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
[ Not to mail bombarding the report, thus only RB tag in cover letter ]

Changelog:
v2:
- Remove duplicated error message in WARN() call.
  Changed to WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG))
  Also move WARN() after btrfs error message.

- Fix a comment error in __btrfs_free_extent()
  It's not adding refs to a tree block, but adding the same refs
  to an existing tree block ref.
  It's impossible a btrfs tree owning the same tree block directly twice.

- Add comment for eb accessors about @start and @len
  If anyone could tell me why such confusion between @start @len and
  logical address is here, I will definitely solve the root cause no
  matter how many codes need to be modified.

- Use bool to replace int where only two values are returned
  Also rename to follow the bool type.

- Remove one unrelated change for the error handler in
  btrfs_inc_extent_ref()

- Add Reviewed-by tag

Qu Wenruo (5):
  btrfs: extent_io: Do extra check for extent buffer read write
    functions
  btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do
    better comment
  btrfs: Detect unbalanced tree with empty leaf before crashing btree
    operations
  btrfs: extent-tree: Kill the BUG_ON() in
    insert_inline_extent_backref()
  btrfs: ctree: Checking key orders before merged tree blocks

 fs/btrfs/ctree.c        |  68 +++++++++++++++++
 fs/btrfs/disk-io.c      |   8 ++
 fs/btrfs/extent-tree.c  | 164 ++++++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.c    |  76 ++++++++++---------
 fs/btrfs/tree-checker.c |   6 ++
 5 files changed, 271 insertions(+), 51 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
@ 2019-07-25  6:12 ` Qu Wenruo
  2019-07-25  6:44   ` Nikolay Borisov
  2019-07-25  6:12 ` [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs

Although we have start, len check for extent buffer reader/write (e.g.
read_extent_buffer()), those checks has its limitations:
- No overflow check
  Values like start = 1024 len = -1024 can still pass the basic
   (start + len) > eb->len check.

- Checks are not consistent
  For read_extent_buffer() we only check (start + len) against eb->len.
  While for memcmp_extent_buffer() we also check start against eb->len.

- Different error reporting mechanism
  We use WARN() in read_extent_buffer() but BUG() in
  memcpy_extent_buffer().

- Still modify memory if the request is obviously wrong
  In read_extent_buffer() even we find (start + len) > eb->len, we still
  call memset(dst, 0, len), which can eaisly cause memory access error
  if start + len overflows.

To address above problems, this patch creates a new common function to
check such access, check_eb_range().
- Add overflow check
  This function checks start, start + len against eb->len and overflow
  check.

- Unified checks

- Unified error reports
  Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
  And also do btrfs_warn() message for non-debug build.

- Exit ASAP if check fails
  No more possible memory corruption.

- Add extra comment for @start @len used in those functions
  Even experienced developers sometimes get confused with the @start
  @len with logical address in those functions.
  I'm not sure what's the cause, maybe it's the extent_buffer::start
  naming.
  For now, just add some comment.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
[ Inspired by above report, the report itself is already addressed ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 76 +++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index db337e53aab3..d44a629e0cce 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5429,6 +5429,28 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	return ret;
 }
 
+/*
+ * Check if the [start, start + len) range is valid before reading/writing
+ * the eb.
+ * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
+ *
+ * Caller should not touch the dst/src memory if this function returns error.
+ */
+static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
+			  unsigned long len)
+{
+	/* start, start + len should not go beyond eb->len nor overflow */
+	if (unlikely(start > eb->len || start + len > eb->len ||
+		     len > eb->len)) {
+		btrfs_warn(eb->fs_info,
+"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
+			   eb->start, eb->len, start, len);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		return -EINVAL;
+	}
+	return 0;
+}
+
 void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 			unsigned long start, unsigned long len)
 {
@@ -5440,12 +5462,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	if (start + len > eb->len) {
-		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
-		     eb->start, eb->len, start, len);
-		memset(dst, 0, len);
+	if (check_eb_range(eb, start, len))
 		return;
-	}
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5554,8 +5572,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 	int ret = 0;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return -EINVAL;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5609,8 +5627,8 @@ void write_extent_buffer(struct extent_buffer *eb, const void *srcv,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5639,8 +5657,8 @@ void memzero_extent_buffer(struct extent_buffer *eb, unsigned long start,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5684,6 +5702,10 @@ void copy_extent_buffer(struct extent_buffer *dst, struct extent_buffer *src,
 	size_t start_offset = offset_in_page(dst->start);
 	unsigned long i = (start_offset + dst_offset) >> PAGE_SHIFT;
 
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(src, src_offset, len))
+		return;
+
 	WARN_ON(src->len != dst_len);
 
 	offset = offset_in_page(start_offset + dst_offset);
@@ -5872,7 +5894,6 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
 void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 			   unsigned long src_offset, unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
@@ -5880,18 +5901,9 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus src_offset %lu move len %lu dst len %lu",
-			 src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus dst_offset %lu move len %lu dst len %lu",
-			 dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 
 	while (len > 0) {
 		dst_off_in_page = offset_in_page(start_offset + dst_offset);
@@ -5917,7 +5929,6 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 			   unsigned long src_offset, unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
@@ -5927,18 +5938,9 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus src_offset %lu move len %lu len %lu",
-			  src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus dst_offset %lu move len %lu len %lu",
-			  dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 	if (dst_offset < src_offset) {
 		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
 		return;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2019-07-25  6:12 ` [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2019-07-25  6:12 ` Qu Wenruo
  2019-07-25  8:39   ` Nikolay Borisov
  2019-07-30 14:59   ` kbuild test robot
  2019-07-25  6:12 ` [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs

__btrfs_free_extent() is one of the best cases to show how optimization
could make a function hard to read.

In fact __btrfs_free_extent() is only doing two major works:
1. Reduce the refs number of an extent backref
   Either it's an inlined extent backref (inside EXTENT/METADATA item) or
   a keyed extent backref (SHARED_* item).
   We only need to locate that backref line, either reduce the number or
   remove the backref line completely.

2. Update the refs count in EXTENT/METADATA_ITEM

But in real world, we do it in a complex but somewhat efficient way.
During step 1), we will try to locate the EXTENT/METADATA_ITEM without
triggering another btrfs_search_slot() as fast path.

Only when we failed to locate that item, we will trigger another
btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
updated/deleted the backref line.

And we have a lot of restrict check on things like refs_to_drop against
extent refs and special case check for single ref extent.

All of these results:
- 7 BUG_ON()s in a single function
  Although all these BUG_ON() are doing correct check, they're super
  easy to get triggered for fuzzed images.
  It's never a good idea to piss the end user.

- Near 300 lines without much useful comments but a lot of hidden
  conditions
  I believe even the author needs several minutes to recall what the
  code is doing
  Not to mention a lot of BUG_ON() conditions needs to go back tens of
  lines to find out why.

This patch address all these problems by:
- Introduce two examples to show what __btrfs_free_extent() is doing
  One inlined backref case and one keyed case.
  Should cover most cases.

- Kill all BUG_ON()s with proper error message and optional leaf dump

- Add comment to show the overall workflow

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
[ The report triggers one BUG_ON() in __btrfs_free_extent() ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 144 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5faf057f6f37..74e4b138218e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6848,6 +6848,53 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 	return 0;
 }
 
+/*
+ * Real work happens here to drop one or more refs of @node.
+ *
+ * The work is mostly done in two parts:
+ * 1. Locate the extent refs.
+ *    It's either inlined in EXTENT/METADATA_ITEM or in keyed SHARED_* item.
+ *    Locate it then reduces the refs number or remove the ref line completely.
+ *
+ * 2. Update the refs count in EXTENT/METADATA_ITEM
+ *
+ * Due to the above two operations and possible optimizations, the function
+ * is a little hard to read, but with the following examples, the result
+ * of this function should be pretty easy to get.
+ *
+ * Example:
+ * *** Inlined backref case ***
+ * In extent tree we have:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 2 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *		extent data backref root FS_TREE objectid 257 offset 0 count 1
+ *
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 257, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 1 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *
+ * *** Keyed backref case ***
+ * In extent tree we have:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 754 gen 6 flags DATA
+ *	[...]
+ *	item 2 key (13631488 EXTENT_DATA_REF <HASH>) itemoff 3915 itemsize 28
+ *		extent data backref root FS_TREE objectid 866 offset 0 count 1
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 866, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 753 gen 6 flags DATA
+ * And that (13631488 EXTENT_DATA_REF <HASH>) get removed.
+ */
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			       struct btrfs_delayed_ref_node *node, u64 parent,
 			       u64 root_objectid, u64 owner_objectid,
@@ -6881,7 +6928,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	path->leave_spinning = 1;
 
 	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
-	BUG_ON(!is_data && refs_to_drop != 1);
+
+	if (unlikely(!is_data && refs_to_drop != 1)) {
+		btrfs_crit(info,
+"invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
+			   node->bytenr, refs_to_drop);
+		ret = -EINVAL;
+		btrfs_abort_transaction(trans, ret);
+		goto out;
+	}
 
 	if (is_data)
 		skinny_metadata = false;
@@ -6890,6 +6945,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				    parent, root_objectid, owner_objectid,
 				    owner_offset);
 	if (ret == 0) {
+		/*
+		 * Either the inline backref or the SHARED_DATA_REF/
+		 * SHARED_BLOCK_REF is found
+		 *
+		 * Here is a quick path to locate EXTENT/METADATA_ITEM.
+		 * It's possible the EXTENT/METADATA_ITEM is near current slot.
+		 */
 		extent_slot = path->slots[0];
 		while (extent_slot >= 0) {
 			btrfs_item_key_to_cpu(path->nodes[0], &key,
@@ -6906,13 +6968,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				found_extent = 1;
 				break;
 			}
+
+			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
 			if (path->slots[0] - extent_slot > 5)
 				break;
 			extent_slot--;
 		}
 
 		if (!found_extent) {
-			BUG_ON(iref);
+			if (unlikely(iref)) {
+				btrfs_crit(info,
+"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
+				goto err_dump_abort;
+			}
+			/* Must be SHARED_* item, remove the backref first */
 			ret = remove_extent_backref(trans, path, NULL,
 						    refs_to_drop,
 						    is_data, &last_ref);
@@ -6923,6 +6992,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			btrfs_release_path(path);
 			path->leave_spinning = 1;
 
+
+			/* Slow path to locate EXTENT/METADATA_ITEM */
 			key.objectid = bytenr;
 			key.type = BTRFS_EXTENT_ITEM_KEY;
 			key.offset = num_bytes;
@@ -6997,19 +7068,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
 	    key.type == BTRFS_EXTENT_ITEM_KEY) {
 		struct btrfs_tree_block_info *bi;
-		BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
+		if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
+			btrfs_crit(info,
+"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >= %lu",
+				   key.objectid, key.type, key.offset,
+				   owner_objectid, item_size,
+				   sizeof(*ei) + sizeof(*bi));
+			goto err_dump_abort;
+		}
 		bi = (struct btrfs_tree_block_info *)(ei + 1);
 		WARN_ON(owner_objectid != btrfs_tree_block_level(leaf, bi));
 	}
 
 	refs = btrfs_extent_refs(leaf, ei);
 	if (refs < refs_to_drop) {
-		btrfs_err(info,
-			  "trying to drop %d refs but we only have %Lu for bytenr %Lu",
+		btrfs_crit(info,
+		"trying to drop %d refs but we only have %Lu for bytenr %Lu",
 			  refs_to_drop, refs, bytenr);
-		ret = -EINVAL;
-		btrfs_abort_transaction(trans, ret);
-		goto out;
+		goto err_dump_abort;
 	}
 	refs -= refs_to_drop;
 
@@ -7021,7 +7097,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 		 * be updated by remove_extent_backref
 		 */
 		if (iref) {
-			BUG_ON(!found_extent);
+			if (unlikely(!found_extent)) {
+				btrfs_crit(info,
+"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found");
+				goto err_dump_abort;
+			}
 		} else {
 			btrfs_set_extent_refs(leaf, ei, refs);
 			btrfs_mark_buffer_dirty(leaf);
@@ -7036,13 +7116,36 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			}
 		}
 	} else {
+		/* In this branch refs == 1 */
 		if (found_extent) {
-			BUG_ON(is_data && refs_to_drop !=
-			       extent_data_ref_count(path, iref));
+			if (is_data && refs_to_drop !=
+			    extent_data_ref_count(path, iref)) {
+				btrfs_crit(info,
+		"invalid refs_to_drop, current refs %u refs_to_drop %u",
+					   extent_data_ref_count(path, iref),
+					   refs_to_drop);
+				goto err_dump_abort;
+			}
 			if (iref) {
-				BUG_ON(path->slots[0] != extent_slot);
+				if (path->slots[0] != extent_slot) {
+					btrfs_crit(info,
+"invalid iref, extent item key (%llu %u %llu) doesn't has wanted iref",
+						   key.objectid, key.type,
+						   key.offset);
+					goto err_dump_abort;
+				}
 			} else {
-				BUG_ON(path->slots[0] != extent_slot + 1);
+				/*
+				 * No inline ref, we must at SHARED_* item,
+				 * And it's single ref, it must be:
+				 * |	extent_slot	  ||extent_slot + 1|
+				 * [ EXTENT/METADATA_ITEM ][ SHARED_* ITEM ]
+				 */
+				if (path->slots[0] != extent_slot + 1) {
+					btrfs_crit(info,
+	"invalid SHARED_* item, previous item is not EXTENT/METADATA_ITEM");
+					goto err_dump_abort;
+				}
 				path->slots[0] = extent_slot;
 				num_to_del = 2;
 			}
@@ -7082,6 +7185,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 out:
 	btrfs_free_path(path);
 	return ret;
+err_dump_abort:
+	/*
+	 * Leaf dump can take up a lot of dmesg buffer since default nodesize
+	 * is already 16K.
+	 * So we only do full leaf dump for debug build.
+	 */
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+		btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
+			   path->slots[0], extent_slot);
+		btrfs_print_leaf(path->nodes[0]);
+	}
+
+	btrfs_abort_transaction(trans, -EUCLEAN);
+	btrfs_free_path(path);
+	return -EUCLEAN;
 }
 
 /*
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2019-07-25  6:12 ` [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
  2019-07-25  6:12 ` [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2019-07-25  6:12 ` Qu Wenruo
  2019-07-25  9:26   ` Nikolay Borisov
  2019-08-06 13:58   ` David Sterba
  2019-07-25  6:12 ` [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With crafted image, btrfs will panic at btree operations:
  kernel BUG at fs/btrfs/ctree.c:3894!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:__push_leaf_left+0x6b6/0x6e0
  Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
  RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
  RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
  RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
  R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
  FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
  Call Trace:
  ? _cond_resched+0x1a/0x50
  push_leaf_left+0x179/0x190
  btrfs_del_items+0x316/0x470
  btrfs_del_csums+0x215/0x3a0
  __btrfs_free_extent.isra.72+0x5a7/0xbe0
  __btrfs_run_delayed_refs+0x539/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace c2425e6e89b5558f ]---

[CAUSE]
The offending csum tree looks like this:
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        ...
        key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
        key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
        key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
        ...

leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
        item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
                range start 85975040 end 85983232 length 8192
        ...
leaf 29642752 items 0 free space 3995 generation 17 owner 0
                    ^ empty leaf            invalid owner ^

leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
        item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
                range start 92274688 end 95723520 length 3448832

So we have a corrupted csum tree where one tree leaf is completely
empty, causing unbalanced btree, thus leading to unexpected btree
balance error.

[FIX]
For this particular case, we handle it in two directions to catch it:
- Check if the tree block is empty through btrfs_verify_level_key()
  So that invalid tree blocks won't be read out through
  btrfs_search_slot() and its variants.

- Check 0 tree owner in tree checker
  NO tree is using 0 as its tree owner, detect it and reject at tree
  block read time.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c      | 8 ++++++++
 fs/btrfs/tree-checker.c | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deb74a8c191a..a843c21f3060 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -414,6 +414,14 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
 
 	if (!first_key)
 		return 0;
+	/* We have @first_key, so this @eb must have at least one item */
+	if (btrfs_header_nritems(eb) == 0) {
+		btrfs_err(fs_info,
+		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
+			  eb->start);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		return -EUCLEAN;
+	}
 
 	/*
 	 * For live tree block (new tree blocks in current transaction),
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 96fce4bef4e7..a4c7f7ed8490 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -888,6 +888,12 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
 				    owner);
 			return -EUCLEAN;
 		}
+		/* Unknown tree */
+		if (owner == 0) {
+			generic_err(leaf, 0,
+				"invalid owner, root 0 is not defined");
+			return -EUCLEAN;
+		}
 		return 0;
 	}
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-07-25  6:12 ` [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2019-07-25  6:12 ` Qu Wenruo
  2019-07-25 10:20   ` Nikolay Borisov
  2019-07-25  6:12 ` [PATCH v2 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
  2019-07-25  6:49 ` [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Nikolay Borisov
  5 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With crafted image, btrfs can panic at insert_inline_extent_backref():
  kernel BUG at fs/btrfs/extent-tree.c:1857!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
  Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
  RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
  FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
  Call Trace:
  ? _cond_resched+0x1a/0x50
  __btrfs_inc_extent_ref.isra.64+0x7e/0x240
  ? btrfs_merge_delayed_refs+0xa5/0x330
  __btrfs_run_delayed_refs+0x653/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 2ad8b3de903cf825 ]---

[CAUSE]
Due to extent tree corruption (still valid by itself, but bad cross ref),
we can allocate an extent which is still in extent tree.
The offending tree block of that case is from csum tree.
The newly allocated tree block is also for csum tree.

Then we will try to insert an tree block ref for the existing tree block
ref.

For btrfs tree extent item, a tree block can never be shared directly by
the same tree twice.
We have such BUG_ON() to prevent such problem, but BUG_ON() is
definitely not good enough.

[FIX]
Replace that BUG_ON() with proper error message and leaf dump for debug
build.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 74e4b138218e..19dd9148a3aa 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1856,7 +1856,22 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   num_bytes, parent, root_objectid,
 					   owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		/*
+		 * We're adding refs to an tree block we already own, this
+		 * should not happen at all.
+		 */
+		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+			btrfs_crit(trans->fs_info,
+"invalid operation, adding refs to an existing tree ref, bytenr=%llu num_bytes=%llu root_objectid=%llu",
+				   bytenr, num_bytes, root_objectid);
+			if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+				WARN_ON(1);
+				btrfs_crit(trans->fs_info,
+			"path->slots[0]=%d path->nodes[0]:", path->slots[0]);
+				btrfs_print_leaf(path->nodes[0]);
+			}
+			return -EUCLEAN;
+		}
 		update_inline_extent_backref(path, iref, refs_to_add,
 					     extent_op, NULL);
 	} else if (ret == -ENOENT) {
@@ -2073,6 +2088,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 /*
  * __btrfs_inc_extent_ref - insert backreference for a given extent
  *
+ * The work is opposite as __btrfs_free_extent().
+ * For more info about how it works or examples, refer to __btrfs_free_extent().
+ *
  * @trans:	    Handle of transaction
  *
  * @node:	    The delayed ref node used to get the bytenr/length for
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-07-25  6:12 ` [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2019-07-25  6:12 ` Qu Wenruo
  2019-07-25  6:49 ` [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Nikolay Borisov
  5 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:12 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov

[BUG]
With crafted image, btrfs can panic at btrfs_del_csums().
  kernel BUG at fs/btrfs/ctree.c:3188!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
  Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
  RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
  RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
  R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
  FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
  Call Trace:
  truncate_one_csum+0xac/0xf0
  btrfs_del_csums+0x24f/0x3a0
  __btrfs_free_extent.isra.72+0x5a7/0xbe0
  __btrfs_run_delayed_refs+0x539/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 93bf9db00e6c374e ]---

[CAUSE]
This crafted image has a very tricky key order corruption:

  checksum tree key (CSUM_TREE ROOT_ITEM 0)
  node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
          ...
          key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
          key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
          ...

  leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
          item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
                  range start 73785344 end 75497472 length 1712128
          item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
                  range start 75497472 end 75501568 length 4096
          item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
                  range start 75501568 end 77283328 length 1781760
          item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
                  range start 77283328 end 77287424 length 4096
          item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
                  range start 4120596480 end 4120903680 length 307200
  leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
          item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
                  range start 77594624 end 79306752 length 1712128
          ...

Note the item 4 key of leaf 29757440, which is obviously too large, and
even larger than the first key of the next leaf.

However it still follows the key order in that tree block, thus tree
checker is unable to detect it at read time, since tree checker can only
work inside a leaf, thus such complex corruption can't be rejected in
advance.

[FIX]
The next timing to detect such problem is at tree block merge time,
which is in push_node_left(), balance_node_right(), push_leaf_left() and
push_leaf_right().

Now we check if the key order of the right most key of the left node is
larger than the left most key of the right node.

By this we don't need to call the full tree-check, while still keeps the
key order correct as key order in each node is already checked by tree
checker thus we only need to check the above two slots.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
---
 fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5df76c17775a..af884cf3e2f4 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3219,6 +3219,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
 		fixup_low_keys(path, &disk_key, 1);
 }
 
+/*
+ * Check the cross tree block key ordering.
+ *
+ * Tree-checker only works inside one tree block, thus the following
+ * corruption can not be rejected by tree-checker:
+ * Leaf @left			| Leaf @right
+ * --------------------------------------------------------------
+ * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
+ *
+ * Key f6 in leaf @left itself is valid, but not valid when the next
+ * key in leaf @right is 7.
+ * This can only be checked at tree block merge time.
+ * And since tree checker has ensured all key order in each tree block
+ * is correct, we only need to bother the last key of @left and the first
+ * key of @right.
+ */
+static bool valid_cross_tree_key_order(struct extent_buffer *left,
+				       struct extent_buffer *right)
+{
+	struct btrfs_key left_last;
+	struct btrfs_key right_first;
+	int level = btrfs_header_level(left);
+	int nr_left = btrfs_header_nritems(left);
+	int nr_right = btrfs_header_nritems(right);
+
+	/* No key to check in one of the tree blocks */
+	if (!nr_left || !nr_right)
+		return true;
+	if (level) {
+		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_node_key_to_cpu(right, &right_first, 0);
+	} else {
+		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_item_key_to_cpu(right, &right_first, 0);
+	}
+	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+		btrfs_crit(left->fs_info,
+"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
+			   left_last.objectid, left_last.type,
+			   left_last.offset, right_first.objectid,
+			   right_first.type, right_first.offset);
+		return false;
+	}
+	return true;
+}
+
 /*
  * try to push data from one node into the next node left in the
  * tree.
@@ -3263,6 +3309,12 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 	} else
 		push_items = min(src_nritems - 8, push_items);
 
+	/* dst is the left eb, src is the middle eb */
+	if (!valid_cross_tree_key_order(dst, src)) {
+		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
 	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
@@ -3331,6 +3383,12 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
 	if (max_push < push_items)
 		push_items = max_push;
 
+	/* dst is the right eb, src is the middle eb */
+	if (!valid_cross_tree_key_order(src, dst)) {
+		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
 	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
 	BUG_ON(ret < 0);
 	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
@@ -3810,6 +3868,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (left_nritems == 0)
 		goto out_unlock;
 
+	if (!valid_cross_tree_key_order(left, right)) {
+		ret = -EUCLEAN;
+		btrfs_tree_unlock(right);
+		free_extent_buffer(right);
+		return ret;
+	}
 	if (path->slots[0] == left_nritems && !empty) {
 		/* Key greater than all keys in the leaf, right neighbor has
 		 * enough room for it and we're not emptying our leaf to delete
@@ -4048,6 +4112,10 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 		goto out;
 	}
 
+	if (!valid_cross_tree_key_order(left, right)) {
+		ret = -EUCLEAN;
+		goto out;
+	}
 	return __push_leaf_left(path, min_data_size,
 			       empty, left, free_space, right_nritems,
 			       max_slot);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-25  6:12 ` [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2019-07-25  6:44   ` Nikolay Borisov
  2019-07-25  6:58     ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Nikolay Borisov @ 2019-07-25  6:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
> Although we have start, len check for extent buffer reader/write (e.g.
> read_extent_buffer()), those checks has its limitations:
> - No overflow check
>   Values like start = 1024 len = -1024 can still pass the basic
>    (start + len) > eb->len check.
> 
> - Checks are not consistent
>   For read_extent_buffer() we only check (start + len) against eb->len.
>   While for memcmp_extent_buffer() we also check start against eb->len.
> 
> - Different error reporting mechanism
>   We use WARN() in read_extent_buffer() but BUG() in
>   memcpy_extent_buffer().
> 
> - Still modify memory if the request is obviously wrong
>   In read_extent_buffer() even we find (start + len) > eb->len, we still
>   call memset(dst, 0, len), which can eaisly cause memory access error
>   if start + len overflows.
> 
> To address above problems, this patch creates a new common function to
> check such access, check_eb_range().
> - Add overflow check
>   This function checks start, start + len against eb->len and overflow
>   check.
> 
> - Unified checks
> 
> - Unified error reports
>   Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
>   And also do btrfs_warn() message for non-debug build.
> 
> - Exit ASAP if check fails
>   No more possible memory corruption.
> 
> - Add extra comment for @start @len used in those functions
>   Even experienced developers sometimes get confused with the @start
>   @len with logical address in those functions.
>   I'm not sure what's the cause, maybe it's the extent_buffer::start
>   naming.
>   For now, just add some comment.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
> [ Inspired by above report, the report itself is already addressed ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 76 +++++++++++++++++++++++---------------------
>  1 file changed, 39 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index db337e53aab3..d44a629e0cce 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5429,6 +5429,28 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>  	return ret;
>  }
>  
> +/*
> + * Check if the [start, start + len) range is valid before reading/writing
> + * the eb.
> + * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.

With proper naming a comment like that should be redundant.

> + *
> + * Caller should not touch the dst/src memory if this function returns error.
> + */
> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
> +			  unsigned long len)
> +{
> +	/* start, start + len should not go beyond eb->len nor overflow */
> +	if (unlikely(start > eb->len || start + len > eb->len ||
> +		     len > eb->len)) {
> +		btrfs_warn(eb->fs_info,
> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> +			   eb->start, eb->len, start, len);
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
>  			unsigned long start, unsigned long len)
>  {
> @@ -5440,12 +5462,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	if (start + len > eb->len) {
> -		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
> -		     eb->start, eb->len, start, len);
> -		memset(dst, 0, len);
> +	if (check_eb_range(eb, start, len))
>  		return;
> -	}
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5554,8 +5572,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  	int ret = 0;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return -EINVAL;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5609,8 +5627,8 @@ void write_extent_buffer(struct extent_buffer *eb, const void *srcv,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5639,8 +5657,8 @@ void memzero_extent_buffer(struct extent_buffer *eb, unsigned long start,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5684,6 +5702,10 @@ void copy_extent_buffer(struct extent_buffer *dst, struct extent_buffer *src,
>  	size_t start_offset = offset_in_page(dst->start);
>  	unsigned long i = (start_offset + dst_offset) >> PAGE_SHIFT;
>  
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(src, src_offset, len))
> +		return;
> +
>  	WARN_ON(src->len != dst_len);
>  
>  	offset = offset_in_page(start_offset + dst_offset);
> @@ -5872,7 +5894,6 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
>  void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  			   unsigned long src_offset, unsigned long len)
>  {
> -	struct btrfs_fs_info *fs_info = dst->fs_info;
>  	size_t cur;
>  	size_t dst_off_in_page;
>  	size_t src_off_in_page;
> @@ -5880,18 +5901,9 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  	unsigned long dst_i;
>  	unsigned long src_i;
>  
> -	if (src_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			"memmove bogus src_offset %lu move len %lu dst len %lu",
> -			 src_offset, len, dst->len);
> -		BUG();
> -	}
> -	if (dst_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			"memmove bogus dst_offset %lu move len %lu dst len %lu",
> -			 dst_offset, len, dst->len);
> -		BUG();
> -	}
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(dst, src_offset, len))
> +		return;

I'm not sure about this. If the code expects memcpy_extent_buffer to
never fail then it will make more sense to do the range check outside of
this function. Otherwise it might silently fail and cause mayhem up the
call chain. Or just leave the BUG (I'd rather not).

>  
>  	while (len > 0) {
>  		dst_off_in_page = offset_in_page(start_offset + dst_offset);
> @@ -5917,7 +5929,6 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  			   unsigned long src_offset, unsigned long len)
>  {
> -	struct btrfs_fs_info *fs_info = dst->fs_info;
>  	size_t cur;
>  	size_t dst_off_in_page;
>  	size_t src_off_in_page;
> @@ -5927,18 +5938,9 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  	unsigned long dst_i;
>  	unsigned long src_i;
>  
> -	if (src_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			  "memmove bogus src_offset %lu move len %lu len %lu",
> -			  src_offset, len, dst->len);
> -		BUG();
> -	}
> -	if (dst_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			  "memmove bogus dst_offset %lu move len %lu len %lu",
> -			  dst_offset, len, dst->len);
> -		BUG();
> -	}
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(dst, src_offset, len))
> +		return;

DITTO as previous comment.

>  	if (dst_offset < src_offset) {
>  		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
>  		return;
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images
  2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (4 preceding siblings ...)
  2019-07-25  6:12 ` [PATCH v2 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
@ 2019-07-25  6:49 ` Nikolay Borisov
  5 siblings, 0 replies; 21+ messages in thread
From: Nikolay Borisov @ 2019-07-25  6:49 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Jungyeon Yoon



On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
> Another wave of defence enhancment, including:
> 
> - Enhanced eb accessors
>   Not really needed for the fuzzed images, as 448de471cd4c
>   ("btrfs: Check the first key and level for cached extent buffer")
>   already fixed half of the reported images.
>   Just add a final layer of safe net.
> 
>   Just to complain here, two experienced btrfs developer have got
>   confused by @start, @len in functions like read_extent_buffer() with
>   logical address.
>   The best example to solve the confusion is to check the
>   read_extent_buffer() call in btree_read_extent_buffer_pages().
> 
>   I'm not sure why this confusion happens or even get spread.
>   My guess is the extent_buffer::start naming causing the problem.
> 
>   If so, I would definitely rename extent_buffer::start to
>   extent_buffer::bytenr at any cost.
>   Hopes the new commend will address the problem for now.

it should either be bytenr or disk_bytenr or disk_addr or address.
Looking at the code base though, it seems there is already a convention
that bytenr means the byte number in the logical address space. So
indeed, bytenr should be ok.

> 
> - BUG_ON() hunt in __btrfs_free_extent()
>   Kill BUG_ON()s in __btrfs_free_extent(), replace with error reporting
>   and why it shouldn't happen.
> 
>   Also add comment on what __btrfs_free_extent() is designed to do, with
>   two dump-tree examples for newcomers.
> 
> - BUG_ON() hunt in __btrfs_inc_extent_ref()
>   Just like __btrfs_free_extent(), but less comment as
>   comment for __btrfs_free_extent() should also work for
>   __btrfs_inc_extent_ref(), and __btrfs_inc_extent_ref() has a better
>   structure than __btrfs_free_extent().
> 
> - Defence against unbalanced empty leaf
> 
> - Defence against bad key order across two tree blocks
> 
> The last two cases can't be rejected by tree-checker and they are all
> cross-eb cases.
> Thankfully we can reuse existing first_key check against unbalanced
> empty leaf, but needs extra check deep into ctree.c for tree block
> merging time check.
> 
> Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
> [ Not to mail bombarding the report, thus only RB tag in cover letter ]
> 
> Changelog:
> v2:
> - Remove duplicated error message in WARN() call.
>   Changed to WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG))
>   Also move WARN() after btrfs error message.
> 
> - Fix a comment error in __btrfs_free_extent()
>   It's not adding refs to a tree block, but adding the same refs
>   to an existing tree block ref.
>   It's impossible a btrfs tree owning the same tree block directly twice.
> 
> - Add comment for eb accessors about @start and @len
>   If anyone could tell me why such confusion between @start @len and
>   logical address is here, I will definitely solve the root cause no
>   matter how many codes need to be modified.
> 
> - Use bool to replace int where only two values are returned
>   Also rename to follow the bool type.
> 
> - Remove one unrelated change for the error handler in
>   btrfs_inc_extent_ref()
> 
> - Add Reviewed-by tag
> 
> Qu Wenruo (5):
>   btrfs: extent_io: Do extra check for extent buffer read write
>     functions
>   btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do
>     better comment
>   btrfs: Detect unbalanced tree with empty leaf before crashing btree
>     operations
>   btrfs: extent-tree: Kill the BUG_ON() in
>     insert_inline_extent_backref()
>   btrfs: ctree: Checking key orders before merged tree blocks
> 
>  fs/btrfs/ctree.c        |  68 +++++++++++++++++
>  fs/btrfs/disk-io.c      |   8 ++
>  fs/btrfs/extent-tree.c  | 164 ++++++++++++++++++++++++++++++++++++----
>  fs/btrfs/extent_io.c    |  76 ++++++++++---------
>  fs/btrfs/tree-checker.c |   6 ++
>  5 files changed, 271 insertions(+), 51 deletions(-)
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-25  6:44   ` Nikolay Borisov
@ 2019-07-25  6:58     ` Qu Wenruo
  2019-07-31 13:47       ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  6:58 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs

[snip]
>> -	if (dst_offset + len > dst->len) {
>> -		btrfs_err(fs_info,
>> -			"memmove bogus dst_offset %lu move len %lu dst len %lu",
>> -			 dst_offset, len, dst->len);
>> -		BUG();
>> -	}
>> +	if (check_eb_range(dst, dst_offset, len) ||
>> +	    check_eb_range(dst, src_offset, len))
>> +		return;
>
> I'm not sure about this. If the code expects memcpy_extent_buffer to
> never fail then it will make more sense to do the range check outside of
> this function. Otherwise it might silently fail and cause mayhem up the
> call chain. Or just leave the BUG (I'd rather not).

Yes, that's also what I'm concerned.

But at least, for that case we're not making things worse.
Furthermore, btrfs tree checker should have already rejected most
invalid trees already.
So I'm not so concerned.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-25  6:12 ` [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2019-07-25  8:39   ` Nikolay Borisov
  2019-07-25  9:31     ` Qu Wenruo
  2019-07-30 14:59   ` kbuild test robot
  1 sibling, 1 reply; 21+ messages in thread
From: Nikolay Borisov @ 2019-07-25  8:39 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
> __btrfs_free_extent() is one of the best cases to show how optimization
> could make a function hard to read.
> 
> In fact __btrfs_free_extent() is only doing two major works:
> 1. Reduce the refs number of an extent backref
>    Either it's an inlined extent backref (inside EXTENT/METADATA item) or
>    a keyed extent backref (SHARED_* item).
>    We only need to locate that backref line, either reduce the number or
>    remove the backref line completely.
> 
> 2. Update the refs count in EXTENT/METADATA_ITEM
> 
> But in real world, we do it in a complex but somewhat efficient way.
> During step 1), we will try to locate the EXTENT/METADATA_ITEM without
> triggering another btrfs_search_slot() as fast path.
> 
> Only when we failed to locate that item, we will trigger another
> btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
> updated/deleted the backref line.
> 
> And we have a lot of restrict check on things like refs_to_drop against
> extent refs and special case check for single ref extent.
> 
> All of these results:
> - 7 BUG_ON()s in a single function
>   Although all these BUG_ON() are doing correct check, they're super
>   easy to get triggered for fuzzed images.
>   It's never a good idea to piss the end user.
> 
> - Near 300 lines without much useful comments but a lot of hidden
>   conditions
>   I believe even the author needs several minutes to recall what the
>   code is doing
>   Not to mention a lot of BUG_ON() conditions needs to go back tens of
>   lines to find out why.
> 
> This patch address all these problems by:
> - Introduce two examples to show what __btrfs_free_extent() is doing
>   One inlined backref case and one keyed case.
>   Should cover most cases.
> 
> - Kill all BUG_ON()s with proper error message and optional leaf dump
> 
> - Add comment to show the overall workflow
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
> [ The report triggers one BUG_ON() in __btrfs_free_extent() ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Changes look simple enough and benign, only thing I wonder is if we
should force the system in RO mode? In any case:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-25  6:12 ` [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2019-07-25  9:26   ` Nikolay Borisov
  2019-07-25  9:34     ` Qu Wenruo
  2019-08-06 13:58   ` David Sterba
  1 sibling, 1 reply; 21+ messages in thread
From: Nikolay Borisov @ 2019-07-25  9:26 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs will panic at btree operations:
>   kernel BUG at fs/btrfs/ctree.c:3894!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:__push_leaf_left+0x6b6/0x6e0
>   Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
>   RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
>   RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
>   RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
>   R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
>   FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
>   Call Trace:
>   ? _cond_resched+0x1a/0x50
>   push_leaf_left+0x179/0x190
>   btrfs_del_items+0x316/0x470
>   btrfs_del_csums+0x215/0x3a0
>   __btrfs_free_extent.isra.72+0x5a7/0xbe0
>   __btrfs_run_delayed_refs+0x539/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace c2425e6e89b5558f ]---
> 
> [CAUSE]
> The offending csum tree looks like this:
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>         ...
>         key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
>         key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
>         key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
>         ...
> 
> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
>         item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
>                 range start 85975040 end 85983232 length 8192
>         ...
> leaf 29642752 items 0 free space 3995 generation 17 owner 0
>                     ^ empty leaf            invalid owner ^
> 
> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
>         item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
>                 range start 92274688 end 95723520 length 3448832
> 
> So we have a corrupted csum tree where one tree leaf is completely
> empty, causing unbalanced btree, thus leading to unexpected btree
> balance error.
> 
> [FIX]
> For this particular case, we handle it in two directions to catch it:
> - Check if the tree block is empty through btrfs_verify_level_key()
>   So that invalid tree blocks won't be read out through
>   btrfs_search_slot() and its variants.
> 
> - Check 0 tree owner in tree checker
>   NO tree is using 0 as its tree owner, detect it and reject at tree
>   block read time.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c      | 8 ++++++++
>  fs/btrfs/tree-checker.c | 6 ++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index deb74a8c191a..a843c21f3060 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -414,6 +414,14 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
>  
>  	if (!first_key)
>  		return 0;
> +	/* We have @first_key, so this @eb must have at least one item */
> +	if (btrfs_header_nritems(eb) == 0) {
> +		btrfs_err(fs_info,
> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> +			  eb->start);
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		return -EUCLEAN;
> +	}

Shouldn't this check be before !first_key, e.g. we always want to verify
that a particular node has at least 1 item ?

>  
>  	/*
>  	 * For live tree block (new tree blocks in current transaction),
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 96fce4bef4e7..a4c7f7ed8490 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -888,6 +888,12 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
>  				    owner);
>  			return -EUCLEAN;
>  		}
> +		/* Unknown tree */
> +		if (owner == 0) {
> +			generic_err(leaf, 0,
> +				"invalid owner, root 0 is not defined");
> +			return -EUCLEAN;
> +		}
>  		return 0;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-25  8:39   ` Nikolay Borisov
@ 2019-07-25  9:31     ` Qu Wenruo
  0 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  9:31 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2019/7/25 下午4:39, Nikolay Borisov wrote:
>
>
> On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
>> __btrfs_free_extent() is one of the best cases to show how optimization
>> could make a function hard to read.
>>
>> In fact __btrfs_free_extent() is only doing two major works:
>> 1. Reduce the refs number of an extent backref
>>    Either it's an inlined extent backref (inside EXTENT/METADATA item) or
>>    a keyed extent backref (SHARED_* item).
>>    We only need to locate that backref line, either reduce the number or
>>    remove the backref line completely.
>>
>> 2. Update the refs count in EXTENT/METADATA_ITEM
>>
>> But in real world, we do it in a complex but somewhat efficient way.
>> During step 1), we will try to locate the EXTENT/METADATA_ITEM without
>> triggering another btrfs_search_slot() as fast path.
>>
>> Only when we failed to locate that item, we will trigger another
>> btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
>> updated/deleted the backref line.
>>
>> And we have a lot of restrict check on things like refs_to_drop against
>> extent refs and special case check for single ref extent.
>>
>> All of these results:
>> - 7 BUG_ON()s in a single function
>>   Although all these BUG_ON() are doing correct check, they're super
>>   easy to get triggered for fuzzed images.
>>   It's never a good idea to piss the end user.
>>
>> - Near 300 lines without much useful comments but a lot of hidden
>>   conditions
>>   I believe even the author needs several minutes to recall what the
>>   code is doing
>>   Not to mention a lot of BUG_ON() conditions needs to go back tens of
>>   lines to find out why.
>>
>> This patch address all these problems by:
>> - Introduce two examples to show what __btrfs_free_extent() is doing
>>   One inlined backref case and one keyed case.
>>   Should cover most cases.
>>
>> - Kill all BUG_ON()s with proper error message and optional leaf dump
>>
>> - Add comment to show the overall workflow
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
>> [ The report triggers one BUG_ON() in __btrfs_free_extent() ]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Changes look simple enough and benign, only thing I wonder is if we
> should force the system in RO mode? In any case:

For the forced RO, it's a must.
Any error here already means extent tree is corrupted, we can't ignore
such serious error.

For the timing of forced RO, it doesn't really matter whether we abort
trans in this function or not.
As in __btrfs_run_delayed_refs() we will abort transaction anyway.

Thanks,
Qu

>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-25  9:26   ` Nikolay Borisov
@ 2019-07-25  9:34     ` Qu Wenruo
  0 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-07-25  9:34 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2019/7/25 下午5:26, Nikolay Borisov wrote:
>
>
> On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
>> [BUG]
>> With crafted image, btrfs will panic at btree operations:
>>   kernel BUG at fs/btrfs/ctree.c:3894!
>>   invalid opcode: 0000 [#1] SMP PTI
>>   CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>>   RIP: 0010:__push_leaf_left+0x6b6/0x6e0
>>   Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
>>   RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
>>   RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
>>   RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
>>   RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
>>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
>>   R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
>>   FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
>>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
>>   Call Trace:
>>   ? _cond_resched+0x1a/0x50
>>   push_leaf_left+0x179/0x190
>>   btrfs_del_items+0x316/0x470
>>   btrfs_del_csums+0x215/0x3a0
>>   __btrfs_free_extent.isra.72+0x5a7/0xbe0
>>   __btrfs_run_delayed_refs+0x539/0x1120
>>   btrfs_run_delayed_refs+0xdb/0x1b0
>>   btrfs_commit_transaction+0x52/0x950
>>   ? start_transaction+0x94/0x450
>>   transaction_kthread+0x163/0x190
>>   kthread+0x105/0x140
>>   ? btrfs_cleanup_transaction+0x560/0x560
>>   ? kthread_destroy_worker+0x50/0x50
>>   ret_from_fork+0x35/0x40
>>   Modules linked in:
>>   ---[ end trace c2425e6e89b5558f ]---
>>
>> [CAUSE]
>> The offending csum tree looks like this:
>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>         ...
>>         key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
>>         key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
>>         key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
>>         ...
>>
>> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
>>         item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
>>                 range start 85975040 end 85983232 length 8192
>>         ...
>> leaf 29642752 items 0 free space 3995 generation 17 owner 0
>>                     ^ empty leaf            invalid owner ^
>>
>> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
>>         item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
>>                 range start 92274688 end 95723520 length 3448832
>>
>> So we have a corrupted csum tree where one tree leaf is completely
>> empty, causing unbalanced btree, thus leading to unexpected btree
>> balance error.
>>
>> [FIX]
>> For this particular case, we handle it in two directions to catch it:
>> - Check if the tree block is empty through btrfs_verify_level_key()
>>   So that invalid tree blocks won't be read out through
>>   btrfs_search_slot() and its variants.
>>
>> - Check 0 tree owner in tree checker
>>   NO tree is using 0 as its tree owner, detect it and reject at tree
>>   block read time.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/disk-io.c      | 8 ++++++++
>>  fs/btrfs/tree-checker.c | 6 ++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index deb74a8c191a..a843c21f3060 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -414,6 +414,14 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
>>
>>  	if (!first_key)
>>  		return 0;
>> +	/* We have @first_key, so this @eb must have at least one item */
>> +	if (btrfs_header_nritems(eb) == 0) {
>> +		btrfs_err(fs_info,
>> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
>> +			  eb->start);
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +		return -EUCLEAN;
>> +	}
>
> Shouldn't this check be before !first_key, e.g. we always want to verify
> that a particular node has at least 1 item ?

Nope, it's possible that we have case like read csum tree root.

In that case, we don't have first key, and csum tree can be empty.

Thanks,
Qu
>
>>
>>  	/*
>>  	 * For live tree block (new tree blocks in current transaction),
>> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
>> index 96fce4bef4e7..a4c7f7ed8490 100644
>> --- a/fs/btrfs/tree-checker.c
>> +++ b/fs/btrfs/tree-checker.c
>> @@ -888,6 +888,12 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
>>  				    owner);
>>  			return -EUCLEAN;
>>  		}
>> +		/* Unknown tree */
>> +		if (owner == 0) {
>> +			generic_err(leaf, 0,
>> +				"invalid owner, root 0 is not defined");
>> +			return -EUCLEAN;
>> +		}
>>  		return 0;
>>  	}
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
  2019-07-25  6:12 ` [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2019-07-25 10:20   ` Nikolay Borisov
  0 siblings, 0 replies; 21+ messages in thread
From: Nikolay Borisov @ 2019-07-25 10:20 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 25.07.19 г. 9:12 ч., Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at insert_inline_extent_backref():
>   kernel BUG at fs/btrfs/extent-tree.c:1857!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
>   Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
>   RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
>   RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
>   RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
>   RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>   R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
>   FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
>   Call Trace:
>   ? _cond_resched+0x1a/0x50
>   __btrfs_inc_extent_ref.isra.64+0x7e/0x240
>   ? btrfs_merge_delayed_refs+0xa5/0x330
>   __btrfs_run_delayed_refs+0x653/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace 2ad8b3de903cf825 ]---
> 
> [CAUSE]
> Due to extent tree corruption (still valid by itself, but bad cross ref),
> we can allocate an extent which is still in extent tree.
> The offending tree block of that case is from csum tree.
> The newly allocated tree block is also for csum tree.
> 
> Then we will try to insert an tree block ref for the existing tree block
> ref.
> 
> For btrfs tree extent item, a tree block can never be shared directly by
> the same tree twice.
> We have such BUG_ON() to prevent such problem, but BUG_ON() is
> definitely not good enough.
> 
> [FIX]
> Replace that BUG_ON() with proper error message and leaf dump for debug
> build.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/extent-tree.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 74e4b138218e..19dd9148a3aa 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1856,7 +1856,22 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
>  					   num_bytes, parent, root_objectid,
>  					   owner, offset, 1);
>  	if (ret == 0) {
> -		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> +		/*
> +		 * We're adding refs to an tree block we already own, this
> +		 * should not happen at all.
> +		 */
> +		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> +			btrfs_crit(trans->fs_info,
> +"invalid operation, adding refs to an existing tree ref, bytenr=%llu num_bytes=%llu root_objectid=%llu",
> +				   bytenr, num_bytes, root_objectid);
> +			if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
> +				WARN_ON(1);
> +				btrfs_crit(trans->fs_info,
> +			"path->slots[0]=%d path->nodes[0]:", path->slots[0]);
> +				btrfs_print_leaf(path->nodes[0]);
> +			}
> +			return -EUCLEAN;
> +		}
>  		update_inline_extent_backref(path, iref, refs_to_add,
>  					     extent_op, NULL);
>  	} else if (ret == -ENOENT) {
> @@ -2073,6 +2088,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  /*
>   * __btrfs_inc_extent_ref - insert backreference for a given extent
>   *
> + * The work is opposite as __btrfs_free_extent().
> + * For more info about how it works or examples, refer to __btrfs_free_extent().
> + *
>   * @trans:	    Handle of transaction
>   *
>   * @node:	    The delayed ref node used to get the bytenr/length for
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-25  6:12 ` [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
  2019-07-25  8:39   ` Nikolay Borisov
@ 2019-07-30 14:59   ` kbuild test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2019-07-30 14:59 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: kbuild-all, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1153 bytes --]

Hi Qu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc2 next-20190730]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20190728-085508
config: i386-randconfig-a004-201930 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.2-10+deb8u1) 4.9.2
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   fs/btrfs/extent-tree.c: In function '__btrfs_free_extent':
>> fs/btrfs/extent-tree.c:4948:4: warning: format '%lu' expects argument of type 'long unsigned int', but argument 8 has type 'unsigned int' [-Wformat=]
       btrfs_crit(info,
       ^

vim +4948 fs/btrfs/extent-tree.c

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31992 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-25  6:58     ` Qu Wenruo
@ 2019-07-31 13:47       ` David Sterba
  0 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2019-07-31 13:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Nikolay Borisov, Qu Wenruo, linux-btrfs

On Thu, Jul 25, 2019 at 02:58:08PM +0800, Qu Wenruo wrote:
> [snip]
> >> -	if (dst_offset + len > dst->len) {
> >> -		btrfs_err(fs_info,
> >> -			"memmove bogus dst_offset %lu move len %lu dst len %lu",
> >> -			 dst_offset, len, dst->len);
> >> -		BUG();
> >> -	}
> >> +	if (check_eb_range(dst, dst_offset, len) ||
> >> +	    check_eb_range(dst, src_offset, len))
> >> +		return;
> >
> > I'm not sure about this. If the code expects memcpy_extent_buffer to
> > never fail then it will make more sense to do the range check outside of
> > this function. Otherwise it might silently fail and cause mayhem up the
> > call chain. Or just leave the BUG (I'd rather not).
> 
> Yes, that's also what I'm concerned.
> 
> But at least, for that case we're not making things worse.
> Furthermore, btrfs tree checker should have already rejected most
> invalid trees already.
> So I'm not so concerned.

I think this is the valid use of BUG, the check is supposed something
that's verified in advance and it must not happen inisdie the extent
buffer functions. Stress on the 'must not', so if it happens something
is seriously wrong, like a memory bitflip or accidental overwrite by
some other code and the BUG is there to stop propagating the error.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-25  6:12 ` [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
  2019-07-25  9:26   ` Nikolay Borisov
@ 2019-08-06 13:58   ` David Sterba
  2019-08-06 14:04     ` Qu Wenruo
  1 sibling, 1 reply; 21+ messages in thread
From: David Sterba @ 2019-08-06 13:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Thu, Jul 25, 2019 at 02:12:20PM +0800, Qu Wenruo wrote:
>  
>  	if (!first_key)
>  		return 0;
> +	/* We have @first_key, so this @eb must have at least one item */
> +	if (btrfs_header_nritems(eb) == 0) {
> +		btrfs_err(fs_info,
> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> +			  eb->start);
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		return -EUCLEAN;
> +	}

generic/015 complains:

generic/015             [13:51:40][ 5949.416657] run fstests generic/015 at 2019-08-06 13:51:40
[ 5949.603473] BTRFS: device fsid 6a7e1bd0-42d2-4bab-9d9e-791635466751 devid 1 transid 5 /dev/vdb
[ 5949.616969] BTRFS info (device vdb): turning on discard
[ 5949.619414] BTRFS info (device vdb): disk space caching is enabled
[ 5949.622493] BTRFS info (device vdb): has skinny extents
[ 5949.624694] BTRFS info (device vdb): flagging fs with big metadata feature
[ 5949.629003] BTRFS info (device vdb): checking UUID tree
[ 5949.802548] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
[ 5949.806259] ------------[ cut here ]------------
[ 5949.808127] WARNING: CPU: 1 PID: 27019 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 5949.811946] Modules linked in: dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet
[ 5949.816321] CPU: 1 PID: 27019 Comm: kworker/u8:10 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 5949.819269] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 5949.822202] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 5949.824231] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 5949.825348] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7
 c7 40
[ 5949.828251] RSP: 0018:ffffab1288803ab8 EFLAGS: 00010246
[ 5949.829759] RAX: 0000000000000024 RBX: ffff9d4f6ed47b10 RCX: 0000000000000001
[ 5949.830961] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d02ca
[ 5949.832107] RBP: ffff9d4f60568000 R08: 0000000000000000 R09: 0000000000000000
[ 5949.833414] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
[ 5949.834508] R13: ffffab1288803b6e R14: ffff9d4f3325e850 R15: 0000000000000000
[ 5949.835572] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
[ 5949.837190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5949.838135] CR2: 00005645c3a0c078 CR3: 0000000014011000 CR4: 00000000000006e0
[ 5949.839197] Call Trace:
[ 5949.839729]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
[ 5949.840909]  btrfs_search_slot+0x25d/0xc10 [btrfs]
[ 5949.841796]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
[ 5949.842561]  ? kmem_cache_alloc+0x1f2/0x280
[ 5949.843268]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
[ 5949.844305]  add_pending_csums+0x50/0x70 [btrfs]
[ 5949.845553]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
[ 5949.846963]  normal_work_helper+0xd1/0x540 [btrfs]
[ 5949.848258]  process_one_work+0x22d/0x580
[ 5949.849337]  worker_thread+0x50/0x3b0
[ 5949.850361]  kthread+0xfe/0x140
[ 5949.851267]  ? process_one_work+0x580/0x580
[ 5949.852405]  ? kthread_park+0x80/0x80
[ 5949.853429]  ret_from_fork+0x24/0x30
[ 5949.854422] irq event stamp: 0
[ 5949.855341] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 5949.856688] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 5949.858104] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 5949.859400] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 5949.860611] ---[ end trace f6f3adf90f4ea40f ]---
[ 5949.862745] ------------[ cut here ]------------
[ 5949.864103] BTRFS: Transaction aborted (error -117)
[ 5949.865577] WARNING: CPU: 1 PID: 27019 at fs/btrfs/inode.c:3175 btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
[ 5949.868267] Modules linked in: dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet
[ 5949.872781] CPU: 1 PID: 27019 Comm: kworker/u8:10 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 5949.875535] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 5949.878428] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 5949.880211] RIP: 0010:btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
[ 5949.881317] Code: e9 aa fc ff ff 49 8b 47 50 f0 48 0f ba a8 e8 33 00 00 02 72 17 41 83 fd fb 74 5b 44 89 ee 48 c7 c7 08 b4 48 c0 e8 22 55 c9 df <0f> 0b 44 89 e9 ba 67 0c 00 00 eb b0 88 5c 24 10 41 89 de 41
 bd fb

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-08-06 13:58   ` David Sterba
@ 2019-08-06 14:04     ` Qu Wenruo
  2019-08-06 17:47       ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2019-08-06 14:04 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5141 bytes --]



On 2019/8/6 下午9:58, David Sterba wrote:
> On Thu, Jul 25, 2019 at 02:12:20PM +0800, Qu Wenruo wrote:
>>  
>>  	if (!first_key)
>>  		return 0;
>> +	/* We have @first_key, so this @eb must have at least one item */
>> +	if (btrfs_header_nritems(eb) == 0) {
>> +		btrfs_err(fs_info,
>> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
>> +			  eb->start);
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +		return -EUCLEAN;
>> +	}
> 
> generic/015 complains:
> 
> generic/015             [13:51:40][ 5949.416657] run fstests generic/015 at 2019-08-06 13:51:40

I hit this once, but not this test case.
The same backtrace for csum tree.

Have you ever hit it again?

Thanks,
Qu

> [ 5949.603473] BTRFS: device fsid 6a7e1bd0-42d2-4bab-9d9e-791635466751 devid 1 transid 5 /dev/vdb
> [ 5949.616969] BTRFS info (device vdb): turning on discard
> [ 5949.619414] BTRFS info (device vdb): disk space caching is enabled
> [ 5949.622493] BTRFS info (device vdb): has skinny extents
> [ 5949.624694] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 5949.629003] BTRFS info (device vdb): checking UUID tree
> [ 5949.802548] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
> [ 5949.806259] ------------[ cut here ]------------
> [ 5949.808127] WARNING: CPU: 1 PID: 27019 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 5949.811946] Modules linked in: dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet
> [ 5949.816321] CPU: 1 PID: 27019 Comm: kworker/u8:10 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 5949.819269] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 5949.822202] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 5949.824231] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 5949.825348] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7
>  c7 40
> [ 5949.828251] RSP: 0018:ffffab1288803ab8 EFLAGS: 00010246
> [ 5949.829759] RAX: 0000000000000024 RBX: ffff9d4f6ed47b10 RCX: 0000000000000001
> [ 5949.830961] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d02ca
> [ 5949.832107] RBP: ffff9d4f60568000 R08: 0000000000000000 R09: 0000000000000000
> [ 5949.833414] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
> [ 5949.834508] R13: ffffab1288803b6e R14: ffff9d4f3325e850 R15: 0000000000000000
> [ 5949.835572] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
> [ 5949.837190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5949.838135] CR2: 00005645c3a0c078 CR3: 0000000014011000 CR4: 00000000000006e0
> [ 5949.839197] Call Trace:
> [ 5949.839729]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
> [ 5949.840909]  btrfs_search_slot+0x25d/0xc10 [btrfs]
> [ 5949.841796]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
> [ 5949.842561]  ? kmem_cache_alloc+0x1f2/0x280
> [ 5949.843268]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
> [ 5949.844305]  add_pending_csums+0x50/0x70 [btrfs]
> [ 5949.845553]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
> [ 5949.846963]  normal_work_helper+0xd1/0x540 [btrfs]
> [ 5949.848258]  process_one_work+0x22d/0x580
> [ 5949.849337]  worker_thread+0x50/0x3b0
> [ 5949.850361]  kthread+0xfe/0x140
> [ 5949.851267]  ? process_one_work+0x580/0x580
> [ 5949.852405]  ? kthread_park+0x80/0x80
> [ 5949.853429]  ret_from_fork+0x24/0x30
> [ 5949.854422] irq event stamp: 0
> [ 5949.855341] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [ 5949.856688] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 5949.858104] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 5949.859400] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 5949.860611] ---[ end trace f6f3adf90f4ea40f ]---
> [ 5949.862745] ------------[ cut here ]------------
> [ 5949.864103] BTRFS: Transaction aborted (error -117)
> [ 5949.865577] WARNING: CPU: 1 PID: 27019 at fs/btrfs/inode.c:3175 btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
> [ 5949.868267] Modules linked in: dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet
> [ 5949.872781] CPU: 1 PID: 27019 Comm: kworker/u8:10 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 5949.875535] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 5949.878428] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 5949.880211] RIP: 0010:btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
> [ 5949.881317] Code: e9 aa fc ff ff 49 8b 47 50 f0 48 0f ba a8 e8 33 00 00 02 72 17 41 83 fd fb 74 5b 44 89 ee 48 c7 c7 08 b4 48 c0 e8 22 55 c9 df <0f> 0b 44 89 e9 ba 67 0c 00 00 eb b0 88 5c 24 10 41 89 de 41
>  bd fb
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-08-06 14:04     ` Qu Wenruo
@ 2019-08-06 17:47       ` David Sterba
  2019-08-07  2:22         ` Qu Wenruo
  2019-08-07  6:08         ` Qu Wenruo
  0 siblings, 2 replies; 21+ messages in thread
From: David Sterba @ 2019-08-06 17:47 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Tue, Aug 06, 2019 at 10:04:51PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/8/6 下午9:58, David Sterba wrote:
> > On Thu, Jul 25, 2019 at 02:12:20PM +0800, Qu Wenruo wrote:
> >>  
> >>  	if (!first_key)
> >>  		return 0;
> >> +	/* We have @first_key, so this @eb must have at least one item */
> >> +	if (btrfs_header_nritems(eb) == 0) {
> >> +		btrfs_err(fs_info,
> >> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> >> +			  eb->start);
> >> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >> +		return -EUCLEAN;
> >> +	}
> > 
> > generic/015 complains:
> > 
> > generic/015             [13:51:40][ 5949.416657] run fstests generic/015 at 2019-08-06 13:51:40
> 
> I hit this once, but not this test case.
> The same backtrace for csum tree.
> 
> Have you ever hit it again?

Yes I found a few more occurences, the last one seems to be interesting so it's
pasted as-is.

generic/449

[21423.875017]  read_block_for_search+0x144/0x380 [btrfs]
[21423.876433]  btrfs_search_slot+0x297/0xfc0 [btrfs]
[21423.877830]  ? btrfs_update_delayed_refs_rsv+0x59/0x70 [btrfs]
[21423.880038]  btrfs_lookup_csum+0xa9/0x210 [btrfs]
[21423.881304]  btrfs_csum_file_blocks+0x205/0x800 [btrfs]
[21423.882674]  ? unpin_extent_cache+0x27/0xc0 [btrfs]
[21423.884050]  add_pending_csums+0x50/0x70 [btrfs]
[21423.885285]  btrfs_finish_ordered_io+0x403/0x7b0 [btrfs]
[21423.886781]  ? _raw_spin_unlock_bh+0x30/0x40
[21423.888164]  normal_work_helper+0xe2/0x520 [btrfs]
[21423.889521]  process_one_work+0x22f/0x5b0
[21423.890332]  worker_thread+0x50/0x3b0
[21423.891001]  ? process_one_work+0x5b0/0x5b0
[21423.892025]  kthread+0x11a/0x130

generic/511

[45857.582982]  read_block_for_search+0x144/0x380 [btrfs]
[45857.584197]  btrfs_search_slot+0x297/0xfc0 [btrfs]
[45857.585363]  ? btrfs_update_delayed_refs_rsv+0x59/0x70 [btrfs]
[45857.586758]  btrfs_lookup_csum+0xa9/0x210 [btrfs]
[45857.587919]  btrfs_csum_file_blocks+0x205/0x800 [btrfs]
[45857.589023]  ? unpin_extent_cache+0x27/0xc0 [btrfs]
[45857.590311]  add_pending_csums+0x50/0x70 [btrfs]
[45857.591482]  btrfs_finish_ordered_io+0x403/0x7b0 [btrfs]
[45857.592671]  ? _raw_spin_unlock_bh+0x30/0x40
[45857.593759]  normal_work_helper+0xe2/0x520 [btrfs]
[45857.595274]  process_one_work+0x22f/0x5b0
[45857.596372]  worker_thread+0x50/0x3b0
[45857.597221]  ? process_one_work+0x5b0/0x5b0
[45857.598438]  kthread+0x11a/0x130

generic/129

[ 7512.874839] BTRFS info (device vda): disk space caching is enabled
[ 7512.877660] BTRFS info (device vda): has skinny extents
[ 7512.951947] BTRFS: device fsid 815ae95d-a328-472d-8299-a373d67af54e devid 1 transid 5 /dev/vdb
[ 7512.969169] BTRFS info (device vdb): turning on discard
[ 7512.971138] BTRFS info (device vdb): disk space caching is enabled
[ 7512.973506] BTRFS info (device vdb): has skinny extents
[ 7512.975497] BTRFS info (device vdb): flagging fs with big metadata feature
[ 7513.005926] BTRFS info (device vdb): checking UUID tree
[ 7513.395115] ------------[ cut here ]------------
[ 7513.395120] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
[ 7513.395122] ------------[ cut here ]------------
[ 7513.395124] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
[ 7513.395125] ------------[ cut here ]------------
[ 7513.395185] WARNING: CPU: 1 PID: 17085 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.395186] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
[ 7513.395193] CPU: 1 PID: 17085 Comm: kworker/u8:4 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 7513.395194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 7513.395212] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7513.395230] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.395232] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
[ 7513.395232] RSP: 0018:ffffab1286483ab8 EFLAGS: 00010246
[ 7513.395233] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
[ 7513.395234] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d1ca8
[ 7513.395235] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
[ 7513.395235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
[ 7513.395247] R13: ffffab1286483b6e R14: ffff9d4f2267aaf0 R15: 0000000000000000
[ 7513.395251] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
[ 7513.395252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7513.395253] CR2: 00007f5d921edef4 CR3: 0000000014011000 CR4: 00000000000006e0
[ 7513.395254] Call Trace:
[ 7513.395277]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
[ 7513.395300]  btrfs_search_slot+0x25d/0xc10 [btrfs]
[ 7513.395325]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
[ 7513.395330]  ? kmem_cache_alloc+0x1f2/0x280
[ 7513.395354]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
[ 7513.395378]  add_pending_csums+0x50/0x70 [btrfs]
[ 7513.395403]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
[ 7513.395432]  normal_work_helper+0xd1/0x540 [btrfs]
[ 7513.395437]  process_one_work+0x22d/0x580
[ 7513.395440]  worker_thread+0x50/0x3b0
[ 7513.395443]  kthread+0xfe/0x140
[ 7513.395446]  ? process_one_work+0x580/0x580
[ 7513.395447]  ? kthread_park+0x80/0x80
[ 7513.395452]  ret_from_fork+0x24/0x30
[ 7513.395454] irq event stamp: 0
[ 7513.395457] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 7513.395461] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.395463] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.395465] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 7513.395466] ---[ end trace f6f3adf90f4ea411 ]---
[ 7513.395470] ------------[ cut here ]------------
[ 7513.395471] BTRFS: Transaction aborted (error -117)
[ 7513.395528] WARNING: CPU: 1 PID: 17085 at fs/btrfs/inode.c:3175 btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
[ 7513.395529] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
[ 7513.395540] CPU: 1 PID: 17085 Comm: kworker/u8:4 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 7513.395542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 7513.395570] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7513.395588] RIP: 0010:btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
[ 7513.395590] Code: e9 aa fc ff ff 49 8b 47 50 f0 48 0f ba a8 e8 33 00 00 02 72 17 41 83 fd fb 74 5b 44 89 ee 48 c7 c7 08 b4 48 c0 e8 22 55 c9 df <0f> 0b 44 89 e9 ba 67 0c 00 00 eb b0 88 5c 24 10 41 89 de 41 bd fb
[ 7513.395592] RSP: 0018:ffffab1286483d90 EFLAGS: 00010286
[ 7513.395593] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[ 7513.395594] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa00d1ca8
[ 7513.395596] RBP: ffff9d4f573d8c80 R08: 0000000000000000 R09: 0000000000000000
[ 7513.395597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d4f06120e80
[ 7513.395598] R13: 00000000ffffff8b R14: ffff9d4f03e85000 R15: ffff9d4f57763750
[ 7513.395603] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
[ 7513.395605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7513.395606] CR2: 00007f5d921edef4 CR3: 0000000014011000 CR4: 00000000000006e0
[ 7513.395607] Call Trace:
[ 7513.395638]  normal_work_helper+0xd1/0x540 [btrfs]
[ 7513.395642]  process_one_work+0x22d/0x580
[ 7513.395645]  worker_thread+0x50/0x3b0
[ 7513.395648]  kthread+0xfe/0x140
[ 7513.395651]  ? process_one_work+0x580/0x580
[ 7513.395653]  ? kthread_park+0x80/0x80
[ 7513.395656]  ret_from_fork+0x24/0x30
[ 7513.395658] irq event stamp: 0
[ 7513.395659] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 7513.395662] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.395665] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.395666] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 7513.395668] ---[ end trace f6f3adf90f4ea412 ]---
[ 7513.395671] BTRFS: error (device vdb) in btrfs_finish_ordered_io:3175: errno=-117 unknown
[ 7513.395674] BTRFS info (device vdb): forced readonly
[ 7513.396527] WARNING: CPU: 3 PID: 2709 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.399117] WARNING: CPU: 2 PID: 29478 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.400326] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
[ 7513.402828] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
[ 7513.404136] CPU: 3 PID: 2709 Comm: kworker/u8:8 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 7513.406553] CPU: 2 PID: 29478 Comm: kworker/u8:14 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
[ 7513.411174] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 7513.413441] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
[ 7513.413464] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7513.415124] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7513.416431] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.416433] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
[ 7513.417412] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
[ 7513.420742] RSP: 0000:ffffab1283b83ab8 EFLAGS: 00010246
[ 7513.421912] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
[ 7513.423177] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
[ 7513.423178] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d040b
[ 7513.424570] RSP: 0018:ffffab1287043ab8 EFLAGS: 00010246
[ 7513.426005] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
[ 7513.427876] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
[ 7513.429411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
[ 7513.429412] R13: ffffab1283b83b6e R14: ffff9d4f47b27150 R15: 0000000000000000
[ 7513.430963] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d040b
[ 7513.432092] FS:  0000000000000000(0000) GS:ffff9d4f7da00000(0000) knlGS:0000000000000000
[ 7513.433535] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
[ 7513.434145] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7513.434146] CR2: 00007f907ab328e0 CR3: 000000007d2cb000 CR4: 00000000000006e0
[ 7513.435037] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
[ 7513.435038] R13: ffffab1287043b6e R14: ffff9d4f61e75af0 R15: 0000000000000000
[ 7513.436009] Call Trace:
[ 7513.436973] FS:  0000000000000000(0000) GS:ffff9d4f7dc00000(0000) knlGS:0000000000000000
[ 7513.437892]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
[ 7513.438806] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7513.439932]  btrfs_search_slot+0x25d/0xc10 [btrfs]
[ 7513.441229] CR2: 00007f5d91fa7e40 CR3: 000000007d2cb000 CR4: 00000000000006e0
[ 7513.442323]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
[ 7513.442994] Call Trace:
[ 7513.443764]  ? kmem_cache_alloc+0x1f2/0x280
[ 7513.444346]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
[ 7513.445182]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
[ 7513.445815]  btrfs_search_slot+0x25d/0xc10 [btrfs]
[ 7513.446570]  add_pending_csums+0x50/0x70 [btrfs]
[ 7513.447126]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
[ 7513.448462]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
[ 7513.450730]  ? kmem_cache_alloc+0x1f2/0x280
[ 7513.452643]  normal_work_helper+0xd1/0x540 [btrfs]
[ 7513.454315]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
[ 7513.455379]  process_one_work+0x22d/0x580
[ 7513.456451]  add_pending_csums+0x50/0x70 [btrfs]
[ 7513.457570]  worker_thread+0x50/0x3b0
[ 7513.460066]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
[ 7513.463658]  kthread+0xfe/0x140
[ 7513.465743]  normal_work_helper+0xd1/0x540 [btrfs]
[ 7513.467701]  ? process_one_work+0x580/0x580
[ 7513.467703]  ? kthread_park+0x80/0x80
[ 7513.468978]  process_one_work+0x22d/0x580
[ 7513.470096]  ret_from_fork+0x24/0x30
[ 7513.473537]  worker_thread+0x50/0x3b0
[ 7513.474525] irq event stamp: 0
[ 7513.475631]  kthread+0xfe/0x140
[ 7513.477203] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 7513.477207] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.479011]  ? process_one_work+0x580/0x580
[ 7513.480541] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
[ 7513.480542] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 7513.482354]  ? kthread_park+0x80/0x80
[ 7513.484191] ---[ end trace f6f3adf90f4ea413 ]---
[ 7513.591444]  ret_from_fork+0x24/0x30
[ 7513.592257] irq event stamp: 13768774
[ 7513.592975] hardirqs last  enabled at (13768773): [<ffffffffa06575b9>] _raw_spin_unlock_irq+0x29/0x40
[ 7513.594386] hardirqs last disabled at (13768774): [<ffffffffa064fb7e>] __schedule+0xae/0x830
[ 7513.595732] softirqs last  enabled at (13768770): [<ffffffffa0a0035c>] __do_softirq+0x35c/0x45c
[ 7513.597472] softirqs last disabled at (13768759): [<ffffffffa006a713>] irq_exit+0xb3/0xc0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-08-06 17:47       ` David Sterba
@ 2019-08-07  2:22         ` Qu Wenruo
  2019-08-07  6:08         ` Qu Wenruo
  1 sibling, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-08-07  2:22 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 15068 bytes --]



On 2019/8/7 上午1:47, David Sterba wrote:
> On Tue, Aug 06, 2019 at 10:04:51PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/8/6 下午9:58, David Sterba wrote:
>>> On Thu, Jul 25, 2019 at 02:12:20PM +0800, Qu Wenruo wrote:
>>>>  
>>>>  	if (!first_key)
>>>>  		return 0;
>>>> +	/* We have @first_key, so this @eb must have at least one item */
>>>> +	if (btrfs_header_nritems(eb) == 0) {
>>>> +		btrfs_err(fs_info,
>>>> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
>>>> +			  eb->start);
>>>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>>>> +		return -EUCLEAN;
>>>> +	}
>>>
>>> generic/015 complains:
>>>
>>> generic/015             [13:51:40][ 5949.416657] run fstests generic/015 at 2019-08-06 13:51:40
>>
>> I hit this once, but not this test case.
>> The same backtrace for csum tree.
>>
>> Have you ever hit it again?
> 
> Yes I found a few more occurences, the last one seems to be interesting so it's
> pasted as-is.
> 

Thanks for all the traces.

Although they are all btrfs_search_slot(), I'm not sure if it's a
unexposed a race or just false alert.

Please discard that patch for now.

I'll keep digging into the case.

Thanks,
Qu

> generic/449
> 
> [21423.875017]  read_block_for_search+0x144/0x380 [btrfs]
> [21423.876433]  btrfs_search_slot+0x297/0xfc0 [btrfs]
> [21423.877830]  ? btrfs_update_delayed_refs_rsv+0x59/0x70 [btrfs]
> [21423.880038]  btrfs_lookup_csum+0xa9/0x210 [btrfs]
> [21423.881304]  btrfs_csum_file_blocks+0x205/0x800 [btrfs]
> [21423.882674]  ? unpin_extent_cache+0x27/0xc0 [btrfs]
> [21423.884050]  add_pending_csums+0x50/0x70 [btrfs]
> [21423.885285]  btrfs_finish_ordered_io+0x403/0x7b0 [btrfs]
> [21423.886781]  ? _raw_spin_unlock_bh+0x30/0x40
> [21423.888164]  normal_work_helper+0xe2/0x520 [btrfs]
> [21423.889521]  process_one_work+0x22f/0x5b0
> [21423.890332]  worker_thread+0x50/0x3b0
> [21423.891001]  ? process_one_work+0x5b0/0x5b0
> [21423.892025]  kthread+0x11a/0x130
> 
> generic/511
> 
> [45857.582982]  read_block_for_search+0x144/0x380 [btrfs]
> [45857.584197]  btrfs_search_slot+0x297/0xfc0 [btrfs]
> [45857.585363]  ? btrfs_update_delayed_refs_rsv+0x59/0x70 [btrfs]
> [45857.586758]  btrfs_lookup_csum+0xa9/0x210 [btrfs]
> [45857.587919]  btrfs_csum_file_blocks+0x205/0x800 [btrfs]
> [45857.589023]  ? unpin_extent_cache+0x27/0xc0 [btrfs]
> [45857.590311]  add_pending_csums+0x50/0x70 [btrfs]
> [45857.591482]  btrfs_finish_ordered_io+0x403/0x7b0 [btrfs]
> [45857.592671]  ? _raw_spin_unlock_bh+0x30/0x40
> [45857.593759]  normal_work_helper+0xe2/0x520 [btrfs]
> [45857.595274]  process_one_work+0x22f/0x5b0
> [45857.596372]  worker_thread+0x50/0x3b0
> [45857.597221]  ? process_one_work+0x5b0/0x5b0
> [45857.598438]  kthread+0x11a/0x130
> 
> generic/129
> 
> [ 7512.874839] BTRFS info (device vda): disk space caching is enabled
> [ 7512.877660] BTRFS info (device vda): has skinny extents
> [ 7512.951947] BTRFS: device fsid 815ae95d-a328-472d-8299-a373d67af54e devid 1 transid 5 /dev/vdb
> [ 7512.969169] BTRFS info (device vdb): turning on discard
> [ 7512.971138] BTRFS info (device vdb): disk space caching is enabled
> [ 7512.973506] BTRFS info (device vdb): has skinny extents
> [ 7512.975497] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 7513.005926] BTRFS info (device vdb): checking UUID tree
> [ 7513.395115] ------------[ cut here ]------------
> [ 7513.395120] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
> [ 7513.395122] ------------[ cut here ]------------
> [ 7513.395124] BTRFS error (device vdb): invalid tree nritems, bytenr=30736384 nritems=0 expect >0
> [ 7513.395125] ------------[ cut here ]------------
> [ 7513.395185] WARNING: CPU: 1 PID: 17085 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.395186] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
> [ 7513.395193] CPU: 1 PID: 17085 Comm: kworker/u8:4 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 7513.395194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 7513.395212] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7513.395230] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.395232] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
> [ 7513.395232] RSP: 0018:ffffab1286483ab8 EFLAGS: 00010246
> [ 7513.395233] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
> [ 7513.395234] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d1ca8
> [ 7513.395235] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
> [ 7513.395235] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
> [ 7513.395247] R13: ffffab1286483b6e R14: ffff9d4f2267aaf0 R15: 0000000000000000
> [ 7513.395251] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
> [ 7513.395252] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7513.395253] CR2: 00007f5d921edef4 CR3: 0000000014011000 CR4: 00000000000006e0
> [ 7513.395254] Call Trace:
> [ 7513.395277]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
> [ 7513.395300]  btrfs_search_slot+0x25d/0xc10 [btrfs]
> [ 7513.395325]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
> [ 7513.395330]  ? kmem_cache_alloc+0x1f2/0x280
> [ 7513.395354]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
> [ 7513.395378]  add_pending_csums+0x50/0x70 [btrfs]
> [ 7513.395403]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
> [ 7513.395432]  normal_work_helper+0xd1/0x540 [btrfs]
> [ 7513.395437]  process_one_work+0x22d/0x580
> [ 7513.395440]  worker_thread+0x50/0x3b0
> [ 7513.395443]  kthread+0xfe/0x140
> [ 7513.395446]  ? process_one_work+0x580/0x580
> [ 7513.395447]  ? kthread_park+0x80/0x80
> [ 7513.395452]  ret_from_fork+0x24/0x30
> [ 7513.395454] irq event stamp: 0
> [ 7513.395457] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [ 7513.395461] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.395463] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.395465] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 7513.395466] ---[ end trace f6f3adf90f4ea411 ]---
> [ 7513.395470] ------------[ cut here ]------------
> [ 7513.395471] BTRFS: Transaction aborted (error -117)
> [ 7513.395528] WARNING: CPU: 1 PID: 17085 at fs/btrfs/inode.c:3175 btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
> [ 7513.395529] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
> [ 7513.395540] CPU: 1 PID: 17085 Comm: kworker/u8:4 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 7513.395542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 7513.395570] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7513.395588] RIP: 0010:btrfs_finish_ordered_io+0x781/0x7f0 [btrfs]
> [ 7513.395590] Code: e9 aa fc ff ff 49 8b 47 50 f0 48 0f ba a8 e8 33 00 00 02 72 17 41 83 fd fb 74 5b 44 89 ee 48 c7 c7 08 b4 48 c0 e8 22 55 c9 df <0f> 0b 44 89 e9 ba 67 0c 00 00 eb b0 88 5c 24 10 41 89 de 41 bd fb
> [ 7513.395592] RSP: 0018:ffffab1286483d90 EFLAGS: 00010286
> [ 7513.395593] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> [ 7513.395594] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa00d1ca8
> [ 7513.395596] RBP: ffff9d4f573d8c80 R08: 0000000000000000 R09: 0000000000000000
> [ 7513.395597] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9d4f06120e80
> [ 7513.395598] R13: 00000000ffffff8b R14: ffff9d4f03e85000 R15: ffff9d4f57763750
> [ 7513.395603] FS:  0000000000000000(0000) GS:ffff9d4f7d800000(0000) knlGS:0000000000000000
> [ 7513.395605] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7513.395606] CR2: 00007f5d921edef4 CR3: 0000000014011000 CR4: 00000000000006e0
> [ 7513.395607] Call Trace:
> [ 7513.395638]  normal_work_helper+0xd1/0x540 [btrfs]
> [ 7513.395642]  process_one_work+0x22d/0x580
> [ 7513.395645]  worker_thread+0x50/0x3b0
> [ 7513.395648]  kthread+0xfe/0x140
> [ 7513.395651]  ? process_one_work+0x580/0x580
> [ 7513.395653]  ? kthread_park+0x80/0x80
> [ 7513.395656]  ret_from_fork+0x24/0x30
> [ 7513.395658] irq event stamp: 0
> [ 7513.395659] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [ 7513.395662] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.395665] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.395666] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 7513.395668] ---[ end trace f6f3adf90f4ea412 ]---
> [ 7513.395671] BTRFS: error (device vdb) in btrfs_finish_ordered_io:3175: errno=-117 unknown
> [ 7513.395674] BTRFS info (device vdb): forced readonly
> [ 7513.396527] WARNING: CPU: 3 PID: 2709 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.399117] WARNING: CPU: 2 PID: 29478 at fs/btrfs/disk-io.c:417 btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.400326] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
> [ 7513.402828] Modules linked in: dm_snapshot dm_bufio dm_log_writes dm_flakey dm_mod btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop af_packet [last unloaded: scsi_debug]
> [ 7513.404136] CPU: 3 PID: 2709 Comm: kworker/u8:8 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 7513.406553] CPU: 2 PID: 29478 Comm: kworker/u8:14 Tainted: G        W         5.3.0-rc3-next-20190806-default #5
> [ 7513.411174] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 7513.413441] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
> [ 7513.413464] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7513.415124] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7513.416431] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.416433] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
> [ 7513.417412] RIP: 0010:btrfs_verify_level_key.cold+0x1e/0x2b [btrfs]
> [ 7513.420742] RSP: 0000:ffffab1283b83ab8 EFLAGS: 00010246
> [ 7513.421912] Code: ff ff e8 9f c8 ff ff e9 4d 58 f5 ff 48 8b 13 48 c7 c6 48 9c 48 c0 48 89 ef e8 88 c8 ff ff 48 c7 c7 c0 95 48 c0 e8 c0 e9 c6 df <0f> 0b 41 be 8b ff ff ff e9 dd 5a f5 ff be e9 05 00 00 48 c7 c7 40
> [ 7513.423177] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
> [ 7513.423178] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d040b
> [ 7513.424570] RSP: 0018:ffffab1287043ab8 EFLAGS: 00010246
> [ 7513.426005] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
> [ 7513.427876] RAX: 0000000000000024 RBX: ffff9d4f06a493b0 RCX: 0000000000000001
> [ 7513.429411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
> [ 7513.429412] R13: ffffab1283b83b6e R14: ffff9d4f47b27150 R15: 0000000000000000
> [ 7513.430963] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa00d040b
> [ 7513.432092] FS:  0000000000000000(0000) GS:ffff9d4f7da00000(0000) knlGS:0000000000000000
> [ 7513.433535] RBP: ffff9d4f069d4000 R08: 0000000000000000 R09: 0000000000000000
> [ 7513.434145] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7513.434146] CR2: 00007f907ab328e0 CR3: 000000007d2cb000 CR4: 00000000000006e0
> [ 7513.435037] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000006
> [ 7513.435038] R13: ffffab1287043b6e R14: ffff9d4f61e75af0 R15: 0000000000000000
> [ 7513.436009] Call Trace:
> [ 7513.436973] FS:  0000000000000000(0000) GS:ffff9d4f7dc00000(0000) knlGS:0000000000000000
> [ 7513.437892]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
> [ 7513.438806] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7513.439932]  btrfs_search_slot+0x25d/0xc10 [btrfs]
> [ 7513.441229] CR2: 00007f5d91fa7e40 CR3: 000000007d2cb000 CR4: 00000000000006e0
> [ 7513.442323]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
> [ 7513.442994] Call Trace:
> [ 7513.443764]  ? kmem_cache_alloc+0x1f2/0x280
> [ 7513.444346]  read_block_for_search.isra.0+0x13d/0x3d0 [btrfs]
> [ 7513.445182]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
> [ 7513.445815]  btrfs_search_slot+0x25d/0xc10 [btrfs]
> [ 7513.446570]  add_pending_csums+0x50/0x70 [btrfs]
> [ 7513.447126]  btrfs_lookup_csum+0x6a/0x160 [btrfs]
> [ 7513.448462]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
> [ 7513.450730]  ? kmem_cache_alloc+0x1f2/0x280
> [ 7513.452643]  normal_work_helper+0xd1/0x540 [btrfs]
> [ 7513.454315]  btrfs_csum_file_blocks+0x198/0x6f0 [btrfs]
> [ 7513.455379]  process_one_work+0x22d/0x580
> [ 7513.456451]  add_pending_csums+0x50/0x70 [btrfs]
> [ 7513.457570]  worker_thread+0x50/0x3b0
> [ 7513.460066]  btrfs_finish_ordered_io+0x3cb/0x7f0 [btrfs]
> [ 7513.463658]  kthread+0xfe/0x140
> [ 7513.465743]  normal_work_helper+0xd1/0x540 [btrfs]
> [ 7513.467701]  ? process_one_work+0x580/0x580
> [ 7513.467703]  ? kthread_park+0x80/0x80
> [ 7513.468978]  process_one_work+0x22d/0x580
> [ 7513.470096]  ret_from_fork+0x24/0x30
> [ 7513.473537]  worker_thread+0x50/0x3b0
> [ 7513.474525] irq event stamp: 0
> [ 7513.475631]  kthread+0xfe/0x140
> [ 7513.477203] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [ 7513.477207] hardirqs last disabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.479011]  ? process_one_work+0x580/0x580
> [ 7513.480541] softirqs last  enabled at (0): [<ffffffffa005ff00>] copy_process+0x6d0/0x1ac0
> [ 7513.480542] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [ 7513.482354]  ? kthread_park+0x80/0x80
> [ 7513.484191] ---[ end trace f6f3adf90f4ea413 ]---
> [ 7513.591444]  ret_from_fork+0x24/0x30
> [ 7513.592257] irq event stamp: 13768774
> [ 7513.592975] hardirqs last  enabled at (13768773): [<ffffffffa06575b9>] _raw_spin_unlock_irq+0x29/0x40
> [ 7513.594386] hardirqs last disabled at (13768774): [<ffffffffa064fb7e>] __schedule+0xae/0x830
> [ 7513.595732] softirqs last  enabled at (13768770): [<ffffffffa0a0035c>] __do_softirq+0x35c/0x45c
> [ 7513.597472] softirqs last disabled at (13768759): [<ffffffffa006a713>] irq_exit+0xb3/0xc0
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-08-06 17:47       ` David Sterba
  2019-08-07  2:22         ` Qu Wenruo
@ 2019-08-07  6:08         ` Qu Wenruo
  1 sibling, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2019-08-07  6:08 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 3731 bytes --]



On 2019/8/7 上午1:47, David Sterba wrote:
> On Tue, Aug 06, 2019 at 10:04:51PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/8/6 下午9:58, David Sterba wrote:
>>> On Thu, Jul 25, 2019 at 02:12:20PM +0800, Qu Wenruo wrote:
>>>>  
>>>>  	if (!first_key)
>>>>  		return 0;
>>>> +	/* We have @first_key, so this @eb must have at least one item */
>>>> +	if (btrfs_header_nritems(eb) == 0) {
>>>> +		btrfs_err(fs_info,
>>>> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
>>>> +			  eb->start);
>>>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>>>> +		return -EUCLEAN;
>>>> +	}
>>>
>>> generic/015 complains:
>>>
>>> generic/015             [13:51:40][ 5949.416657] run fstests generic/015 at 2019-08-06 13:51:40
>>
>> I hit this once, but not this test case.
>> The same backtrace for csum tree.
>>
>> Have you ever hit it again?
> 
> Yes I found a few more occurences, the last one seems to be interesting so it's
> pasted as-is.
> 
> generic/449
> 
> [21423.875017]  read_block_for_search+0x144/0x380 [btrfs]
> [21423.876433]  btrfs_search_slot+0x297/0xfc0 [btrfs]
> [21423.877830]  ? btrfs_update_delayed_refs_rsv+0x59/0x70 [btrfs]
> [21423.880038]  btrfs_lookup_csum+0xa9/0x210 [btrfs]
> [21423.881304]  btrfs_csum_file_blocks+0x205/0x800 [btrfs]
> [21423.882674]  ? unpin_extent_cache+0x27/0xc0 [btrfs]
> [21423.884050]  add_pending_csums+0x50/0x70 [btrfs]
> [21423.885285]  btrfs_finish_ordered_io+0x403/0x7b0 [btrfs]
> [21423.886781]  ? _raw_spin_unlock_bh+0x30/0x40
> [21423.888164]  normal_work_helper+0xe2/0x520 [btrfs]
> [21423.889521]  process_one_work+0x22f/0x5b0
> [21423.890332]  worker_thread+0x50/0x3b0
> [21423.891001]  ? process_one_work+0x5b0/0x5b0
> [21423.892025]  kthread+0x11a/0x130

Haven't yet triggered it again, but indeed looks like a race.

I have only triggered it once on my old host, while now migrated to a
new system, it looks my new setup is sometimes too fast to trigger the
race window, sometimes even too fast to allow btrfs replace cancel to be
executed before replace finishes.

Would you please try the following diff you could trigger it more reliably?
(Which moves the nritems check after the generation check)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a843c21f3060..787ebe4af55d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -414,6 +414,15 @@ int btrfs_verify_level_key(struct extent_buffer
*eb, int level,

        if (!first_key)
                return 0;
+       /*
+        * For live tree block (new tree blocks in current transaction),
+        * we need proper lock context to avoid race, which is
impossible here.
+        * So we only checks tree blocks which is read from disk, whose
+        * generation <= fs_info->last_trans_committed.
+        */
+       if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
+               return 0;
+
        /* We have @first_key, so this @eb must have at least one item */
        if (btrfs_header_nritems(eb) == 0) {
                btrfs_err(fs_info,
@@ -423,14 +432,6 @@ int btrfs_verify_level_key(struct extent_buffer
*eb, int level,
                return -EUCLEAN;
        }

-       /*
-        * For live tree block (new tree blocks in current transaction),
-        * we need proper lock context to avoid race, which is
impossible here.
-        * So we only checks tree blocks which is read from disk, whose
-        * generation <= fs_info->last_trans_committed.
-        */
-       if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
-               return 0;
        if (found_level)
                btrfs_node_key_to_cpu(eb, &found_key, 0);
        else


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-08-07  6:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-25  6:12 [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
2019-07-25  6:12 ` [PATCH v2 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
2019-07-25  6:44   ` Nikolay Borisov
2019-07-25  6:58     ` Qu Wenruo
2019-07-31 13:47       ` David Sterba
2019-07-25  6:12 ` [PATCH v2 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
2019-07-25  8:39   ` Nikolay Borisov
2019-07-25  9:31     ` Qu Wenruo
2019-07-30 14:59   ` kbuild test robot
2019-07-25  6:12 ` [PATCH v2 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
2019-07-25  9:26   ` Nikolay Borisov
2019-07-25  9:34     ` Qu Wenruo
2019-08-06 13:58   ` David Sterba
2019-08-06 14:04     ` Qu Wenruo
2019-08-06 17:47       ` David Sterba
2019-08-07  2:22         ` Qu Wenruo
2019-08-07  6:08         ` Qu Wenruo
2019-07-25  6:12 ` [PATCH v2 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
2019-07-25 10:20   ` Nikolay Borisov
2019-07-25  6:12 ` [PATCH v2 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
2019-07-25  6:49 ` [PATCH v2 0/5] btrfs: Enhanced runtime defence against fuzzed images Nikolay Borisov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).