All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images
@ 2019-07-10  8:02 Qu Wenruo
  2019-07-10  8:02 ` [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jungyeon Yoon

Another wave of defence enhancment, including:

- Enhanced eb accessors
  Not really needed for the fuzzed images, as 448de471cd4c
  ("btrfs: Check the first key and level for cached extent buffer")
  already fixed half of the reported images.
  Just add a final layer of safe net.

- BUG_ON() hunt in __btrfs_free_extent()
  Kill BUG_ON()s in __btrfs_free_extent(), replace with error reporting
  and why it shouldn't happen.

  Also add comment on what __btrfs_free_extent() is designed to do, with
  two dump-tree examples for newcomers.

- BUG_ON() hunt in __btrfs_inc_extent_ref()
  Just like __btrfs_free_extent(), but less comment as
  comment for __btrfs_free_extent() should also work for
  __btrfs_inc_extent_ref(), and __btrfs_inc_extent_ref() has a better
  structure than __btrfs_free_extent().

- Defence against unbalanced empty leaf

- Defence against bad key order across two tree blocks

The last two cases can't be rejected by tree-checker and they are all
cross-eb cases.
Thankfully we can reuse existing first_key check against unbalanced
empty leaf, but needs extra check deep into ctree.c for tree block
merging time check.

Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
[ Not to mail bombarding the report, thus only RB tag in cover letter ]

Qu Wenruo (5):
  btrfs: extent_io: Do extra check for extent buffer read write
    functions
  btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do
    better comment
  btrfs: Detect unbalanced tree with empty leaf before crashing btree
    operations
  btrfs: extent-tree: Kill the BUG_ON() in
    insert_inline_extent_backref()
  btrfs: ctree: Checking key orders before merged tree blocks

 fs/btrfs/ctree.c        |  63 +++++++++++++++
 fs/btrfs/disk-io.c      |   9 +++
 fs/btrfs/extent-tree.c  | 168 ++++++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.c    |  79 ++++++++++---------
 fs/btrfs/tree-checker.c |   6 ++
 5 files changed, 273 insertions(+), 52 deletions(-)

-- 
2.22.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
@ 2019-07-10  8:02 ` Qu Wenruo
  2019-07-10 10:42   ` Nikolay Borisov
  2019-07-10  8:02 ` [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs

Although we have start, len check for extent buffer reader/write (e.g.
read_extent_buffer()), those checks has its limitations:
- No overflow check
  Values like start = 1024 len = -1024 can still pass the basic
   (start + len) > eb->len check.

- Checks are not consistent
  For read_extent_buffer() we only check (start + len) against eb->len.
  While for memcmp_extent_buffer() we also check start against eb->len.

- Different error reporting mechanism
  We use WARN() in read_extent_buffer() but BUG() in
  memcpy_extent_buffer().

- Still modify memory if the request is obviously wrong
  In read_extent_buffer() even we find (start + len) > eb->len, we still
  call memset(dst, 0, len), which can eaisly cause memory access error
  if start + len overflows.

To address above problems, this patch creates a new common function to
check such access, check_eb_range().
- Add overflow check
  This function checks start, start + len against eb->len and overflow
  check.

- Unified checks

- Unified error reports
  Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
  And also do btrfs_warn() message for non-debug build.

- Exit ASAP if check fails
  No more possible memory corruption.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
[ Inspired by above report, the report itself is already addressed ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent_io.c | 79 +++++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index db337e53aab3..b072ef5bceac 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5429,6 +5429,31 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	return ret;
 }
 
+/*
+ * Check if the [start, start + len) range is valid before reading/writing
+ * the eb.
+ *
+ * Caller should not touch the dst/src memory if this function returns error.
+ */
+static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
+			  unsigned long len)
+{
+	unsigned long end;
+
+	/* start, start + len should not go beyond eb->len nor overflow */
+	if (unlikely(start > eb->len || start + len > eb->len ||
+		     check_add_overflow(start, len, &end))) {
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
+"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
+		     eb->start, eb->len, start, len);
+		btrfs_warn(eb->fs_info,
+"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
+			   eb->start, eb->len, start, len);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 			unsigned long start, unsigned long len)
 {
@@ -5440,12 +5465,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	if (start + len > eb->len) {
-		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
-		     eb->start, eb->len, start, len);
-		memset(dst, 0, len);
+	if (check_eb_range(eb, start, len))
 		return;
-	}
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5554,8 +5575,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 	int ret = 0;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return -EINVAL;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5609,8 +5630,8 @@ void write_extent_buffer(struct extent_buffer *eb, const void *srcv,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5639,8 +5660,8 @@ void memzero_extent_buffer(struct extent_buffer *eb, unsigned long start,
 	size_t start_offset = offset_in_page(eb->start);
 	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start_offset + start);
 
@@ -5684,6 +5705,10 @@ void copy_extent_buffer(struct extent_buffer *dst, struct extent_buffer *src,
 	size_t start_offset = offset_in_page(dst->start);
 	unsigned long i = (start_offset + dst_offset) >> PAGE_SHIFT;
 
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(src, src_offset, len))
+		return;
+
 	WARN_ON(src->len != dst_len);
 
 	offset = offset_in_page(start_offset + dst_offset);
@@ -5872,7 +5897,6 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
 void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 			   unsigned long src_offset, unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
@@ -5880,18 +5904,9 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus src_offset %lu move len %lu dst len %lu",
-			 src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus dst_offset %lu move len %lu dst len %lu",
-			 dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 
 	while (len > 0) {
 		dst_off_in_page = offset_in_page(start_offset + dst_offset);
@@ -5917,7 +5932,6 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 			   unsigned long src_offset, unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
@@ -5927,18 +5941,9 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus src_offset %lu move len %lu len %lu",
-			  src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus dst_offset %lu move len %lu len %lu",
-			  dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 	if (dst_offset < src_offset) {
 		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
 		return;
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2019-07-10  8:02 ` [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2019-07-10  8:02 ` Qu Wenruo
  2019-07-10 10:48   ` Nikolay Borisov
  2019-07-10  8:02 ` [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs

__btrfs_free_extent() is one of the best cases to show how optimization
could make a function hard to read.

In fact __btrfs_free_extent() is only doing two major works:
1. Reduce the refs number of an extent backref
   Either it's an inlined extent backref (inside EXTENT/METADATA item) or
   a keyed extent backref (SHARED_* item).
   We only need to locate that backref line, either reduce the number or
   remove the backref line completely.

2. Update the refs count in EXTENT/METADATA_ITEM

But in real world, we do it in a complex but somewhat efficient way.
During step 1), we will try to locate the EXTENT/METADATA_ITEM without
triggering another btrfs_search_slot() as fast path.

Only when we failed to locate that item, we will trigger another
btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
updated/deleted the backref line.

And we have a lot of restrict check on things like refs_to_drop against
extent refs and special case check for single ref extent.

All of these results:
- 7 BUG_ON()s in a single function
  Although all these BUG_ON() are doing correct check, they're super
  easy to get triggered for fuzzed images.
  It's never a good idea to piss the end user.

- Near 300 lines without much useful comments but a lot of hidden
  conditions
  I believe even the author needs several minutes to recall what the
  code is doing
  Not to mention a lot of BUG_ON() conditions needs to go back tens of
  lines to find out why.

This patch address all these problems by:
- Introduce two examples to show what __btrfs_free_extent() is doing
  One inlined backref case and one keyed case.
  Should cover most cases.

- Kill all BUG_ON()s with proper error message and optional leaf dump

- Add comment to show the overall workflow

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
[ The report triggers one BUG_ON() in __btrfs_free_extent() ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 144 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5faf057f6f37..199e4eed8f2d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6848,6 +6848,53 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 	return 0;
 }
 
+/*
+ * Real work happens here to drop one or more refs of @node.
+ *
+ * The work is mostly done in two parts:
+ * 1. Locate the extent refs.
+ *    It's either inlined in EXTENT/METADATA_ITEM or in keyed SHARED_* item.
+ *    Locate it then reduces the refs number or remove the ref line completely.
+ *
+ * 2. Update the refs count in EXTENT/METADATA_ITEM
+ *
+ * Due to the above two operations and possible optimizations, the function
+ * is a little hard to read, but with the following examples, the result
+ * of this function should be pretty easy to get.
+ *
+ * Example:
+ * *** Inlined backref case ***
+ * In extent tree we have:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 2 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *		extent data backref root FS_TREE objectid 257 offset 0 count 1
+ *
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 257, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 1 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *
+ * *** Keyed backref case ***
+ * In extent tree we have:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 754 gen 6 flags DATA
+ *	[...]
+ *	item 2 key (13631488 EXTENT_DATA_REF <HASH>) itemoff 3915 itemsize 28
+ *		extent data backref root FS_TREE objectid 866 offset 0 count 1
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 866, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 753 gen 6 flags DATA
+ * And that (13631488 EXTENT_DATA_REF <HASH>) get removed.
+ */
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			       struct btrfs_delayed_ref_node *node, u64 parent,
 			       u64 root_objectid, u64 owner_objectid,
@@ -6881,7 +6928,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	path->leave_spinning = 1;
 
 	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
-	BUG_ON(!is_data && refs_to_drop != 1);
+
+	if (unlikely(!is_data && refs_to_drop != 1)) {
+		btrfs_crit(info,
+"invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
+			   node->bytenr, refs_to_drop);
+		ret = -EINVAL;
+		btrfs_abort_transaction(trans, ret);
+		goto out;
+	}
 
 	if (is_data)
 		skinny_metadata = false;
@@ -6890,6 +6945,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				    parent, root_objectid, owner_objectid,
 				    owner_offset);
 	if (ret == 0) {
+		/*
+		 * Either the inline backref or the SHARED_DATA_REF/
+		 * SHARED_BLOCK_REF is found
+		 *
+		 * Here is a quick path to locate EXTENT/METADATA_ITEM.
+		 * It's possible the EXTENT/METADATA_ITEM is near current slot.
+		 */
 		extent_slot = path->slots[0];
 		while (extent_slot >= 0) {
 			btrfs_item_key_to_cpu(path->nodes[0], &key,
@@ -6906,13 +6968,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				found_extent = 1;
 				break;
 			}
+
+			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
 			if (path->slots[0] - extent_slot > 5)
 				break;
 			extent_slot--;
 		}
 
 		if (!found_extent) {
-			BUG_ON(iref);
+			if (unlikely(iref)) {
+				btrfs_crit(info,
+"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
+				goto err_dump_abort;
+			}
+			/* Must be SHARED_* item, remove the backref first */
 			ret = remove_extent_backref(trans, path, NULL,
 						    refs_to_drop,
 						    is_data, &last_ref);
@@ -6923,6 +6992,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			btrfs_release_path(path);
 			path->leave_spinning = 1;
 
+
+			/* Slow path to locate EXTENT/METADATA_ITEM */
 			key.objectid = bytenr;
 			key.type = BTRFS_EXTENT_ITEM_KEY;
 			key.offset = num_bytes;
@@ -6997,19 +7068,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
 	    key.type == BTRFS_EXTENT_ITEM_KEY) {
 		struct btrfs_tree_block_info *bi;
-		BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
+		if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
+			btrfs_crit(info,
+"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >%lu",
+				   key.objectid, key.type, key.offset,
+				   owner_objectid, item_size,
+				   sizeof(*ei) + sizeof(*bi));
+			goto err_dump_abort;
+		}
 		bi = (struct btrfs_tree_block_info *)(ei + 1);
 		WARN_ON(owner_objectid != btrfs_tree_block_level(leaf, bi));
 	}
 
 	refs = btrfs_extent_refs(leaf, ei);
 	if (refs < refs_to_drop) {
-		btrfs_err(info,
-			  "trying to drop %d refs but we only have %Lu for bytenr %Lu",
+		btrfs_crit(info,
+		"trying to drop %d refs but we only have %Lu for bytenr %Lu",
 			  refs_to_drop, refs, bytenr);
-		ret = -EINVAL;
-		btrfs_abort_transaction(trans, ret);
-		goto out;
+		goto err_dump_abort;
 	}
 	refs -= refs_to_drop;
 
@@ -7021,7 +7097,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 		 * be updated by remove_extent_backref
 		 */
 		if (iref) {
-			BUG_ON(!found_extent);
+			if (unlikely(!found_extent)) {
+				btrfs_crit(info,
+"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found");
+				goto err_dump_abort;
+			}
 		} else {
 			btrfs_set_extent_refs(leaf, ei, refs);
 			btrfs_mark_buffer_dirty(leaf);
@@ -7036,13 +7116,36 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			}
 		}
 	} else {
+		/* In this branch refs == 1 */
 		if (found_extent) {
-			BUG_ON(is_data && refs_to_drop !=
-			       extent_data_ref_count(path, iref));
+			if (is_data && refs_to_drop !=
+			    extent_data_ref_count(path, iref)) {
+				btrfs_crit(info,
+		"invalid refs_to_drop, current refs %u refs_to_drop %u",
+					   extent_data_ref_count(path, iref),
+					   refs_to_drop);
+				goto err_dump_abort;
+			}
 			if (iref) {
-				BUG_ON(path->slots[0] != extent_slot);
+				if (path->slots[0] != extent_slot) {
+					btrfs_crit(info,
+"invalid iref, extent item key (%llu %u %llu) doesn't has wanted iref",
+						   key.objectid, key.type,
+						   key.offset);
+					goto err_dump_abort;
+				}
 			} else {
-				BUG_ON(path->slots[0] != extent_slot + 1);
+				/*
+				 * No inline ref, we must at SHARED_* item,
+				 * And it's single ref, it must be:
+				 * |	extent_slot	  ||extent_slot + 1|
+				 * [ EXTENT/METADATA_ITEM ][ SHARED_* ITEM ]
+				 */
+				if (path->slots[0] != extent_slot + 1) {
+					btrfs_crit(info,
+	"invalid SHARED_* item, previous item is not EXTENT/METADATA_ITEM");
+					goto err_dump_abort;
+				}
 				path->slots[0] = extent_slot;
 				num_to_del = 2;
 			}
@@ -7082,6 +7185,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 out:
 	btrfs_free_path(path);
 	return ret;
+err_dump_abort:
+	/*
+	 * Leaf dump can take up a lot of dmesg buffer since default nodesize
+	 * is already 16K.
+	 * So we only do full leaf dump for debug build.
+	 */
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+		btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
+			   path->slots[0], extent_slot);
+		btrfs_print_leaf(path->nodes[0]);
+	}
+
+	btrfs_abort_transaction(trans, -EUCLEAN);
+	btrfs_free_path(path);
+	return -EUCLEAN;
 }
 
 /*
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2019-07-10  8:02 ` [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
  2019-07-10  8:02 ` [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2019-07-10  8:02 ` Qu Wenruo
  2019-07-10 10:54   ` Nikolay Borisov
  2019-07-10  8:02 ` [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
  2019-07-10  8:02 ` [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
  4 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With crafted image, btrfs will panic at btree operations:
  kernel BUG at fs/btrfs/ctree.c:3894!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:__push_leaf_left+0x6b6/0x6e0
  Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
  RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
  RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
  RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
  R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
  FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
  Call Trace:
  ? _cond_resched+0x1a/0x50
  push_leaf_left+0x179/0x190
  btrfs_del_items+0x316/0x470
  btrfs_del_csums+0x215/0x3a0
  __btrfs_free_extent.isra.72+0x5a7/0xbe0
  __btrfs_run_delayed_refs+0x539/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace c2425e6e89b5558f ]---

[CAUSE]
The offending csum tree looks like this:
checksum tree key (CSUM_TREE ROOT_ITEM 0)
node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        ...
        key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
        key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
        key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
        ...

leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
        item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
                range start 85975040 end 85983232 length 8192
        ...
leaf 29642752 items 0 free space 3995 generation 17 owner 0
                    ^ empty leaf            invalid owner ^

leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
        item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
                range start 92274688 end 95723520 length 3448832

So we have a corrupted csum tree where one tree leaf is completely
empty, causing unbalanced btree, thus leading to unexpected btree
balance error.

[FIX]
For this particular case, we handle it in two directions to catch it:
- Check if the tree block is empty through btrfs_verify_level_key()
  So that invalid tree blocks won't be read out through
  btrfs_search_slot() and its variants.

- Check 0 tree owner in tree checker
  NO tree is using 0 as its tree owner, detect it and reject at tree
  block read time.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c      | 9 +++++++++
 fs/btrfs/tree-checker.c | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index deb74a8c191a..af4786ed7bae 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -414,6 +414,15 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
 
 	if (!first_key)
 		return 0;
+	/* We have @first_key, so this @eb must have at least one item */
+	if (btrfs_header_nritems(eb) == 0) {
+		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+		     KERN_ERR "BTRFS: tree nritems check failed\n");
+		btrfs_err(fs_info,
+		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
+			  eb->start);
+		return -EUCLEAN;
+	}
 
 	/*
 	 * For live tree block (new tree blocks in current transaction),
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index 96fce4bef4e7..a4c7f7ed8490 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -888,6 +888,12 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
 				    owner);
 			return -EUCLEAN;
 		}
+		/* Unknown tree */
+		if (owner == 0) {
+			generic_err(leaf, 0,
+				"invalid owner, root 0 is not defined");
+			return -EUCLEAN;
+		}
 		return 0;
 	}
 
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
  2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-07-10  8:02 ` [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2019-07-10  8:02 ` Qu Wenruo
  2019-07-10 11:12   ` Nikolay Borisov
  2019-07-10  8:02 ` [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
  4 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With crafted image, btrfs can panic at insert_inline_extent_backref():
  kernel BUG at fs/btrfs/extent-tree.c:1857!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
  Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
  RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
  FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
  Call Trace:
  ? _cond_resched+0x1a/0x50
  __btrfs_inc_extent_ref.isra.64+0x7e/0x240
  ? btrfs_merge_delayed_refs+0xa5/0x330
  __btrfs_run_delayed_refs+0x653/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 2ad8b3de903cf825 ]---

[CAUSE]
Due to extent tree corruption (still valid by itself, but bad cross ref),
we can allocate an extent which is still in extent tree.

Then we will try to insert/update inlined extent backref line for the
existing tree block, and triggering the BUG_ON() where we don't expect
to increase refs on non-subvolume tree block.

[FIX]
Replace that BUG_ON() with proper error message and leaf dump for debug
build.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 199e4eed8f2d..72868d9ac58e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1856,7 +1856,24 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   num_bytes, parent, root_objectid,
 					   owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		/*
+		 * We're adding ref to an existing tree block, only happens
+		 * when creating snapshots, not possible to other trees.
+		 */
+		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+			WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
+			"invalid operation for non-subvolume tree block");
+
+			btrfs_crit(trans->fs_info,
+"invalid operation, adding refs to a non-subvolume tree block, bytenr=%llu num_bytes=%llu root_objectid=%llu",
+				   bytenr, num_bytes, root_objectid);
+			if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+				btrfs_crit(trans->fs_info,
+			"path->slots[0]=%d path->nodes[0]:", path->slots[0]);
+				btrfs_print_leaf(path->nodes[0]);
+			}
+			return -EUCLEAN;
+		}
 		update_inline_extent_backref(path, iref, refs_to_add,
 					     extent_op, NULL);
 	} else if (ret == -ENOENT) {
@@ -2073,6 +2090,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 /*
  * __btrfs_inc_extent_ref - insert backreference for a given extent
  *
+ * The work is opposite as __btrfs_free_extent().
+ * For more info about how it works or examples, refer to __btrfs_free_extent().
+ *
  * @trans:	    Handle of transaction
  *
  * @node:	    The delayed ref node used to get the bytenr/length for
@@ -2153,9 +2173,9 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 	/* now insert the actual backref */
 	ret = insert_extent_backref(trans, path, bytenr, parent, root_objectid,
 				    owner, offset, refs_to_add);
+out:
 	if (ret)
 		btrfs_abort_transaction(trans, ret);
-out:
 	btrfs_free_path(path);
 	return ret;
 }
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (3 preceding siblings ...)
  2019-07-10  8:02 ` [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2019-07-10  8:02 ` Qu Wenruo
  2019-07-10 11:19   ` Nikolay Borisov
  4 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10  8:02 UTC (permalink / raw)
  To: linux-btrfs

[BUG]
With crafted image, btrfs can panic at btrfs_del_csums().
  kernel BUG at fs/btrfs/ctree.c:3188!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
  Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
  RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
  RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
  R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
  FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
  Call Trace:
  truncate_one_csum+0xac/0xf0
  btrfs_del_csums+0x24f/0x3a0
  __btrfs_free_extent.isra.72+0x5a7/0xbe0
  __btrfs_run_delayed_refs+0x539/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 93bf9db00e6c374e ]---

[CAUSE]
This crafted image has a very tricky key order corruption:

  checksum tree key (CSUM_TREE ROOT_ITEM 0)
  node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
          ...
          key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
          key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
          ...

  leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
          item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
                  range start 73785344 end 75497472 length 1712128
          item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
                  range start 75497472 end 75501568 length 4096
          item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
                  range start 75501568 end 77283328 length 1781760
          item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
                  range start 77283328 end 77287424 length 4096
          item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
                  range start 4120596480 end 4120903680 length 307200
  leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
          item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
                  range start 77594624 end 79306752 length 1712128
          ...

Note the item 4 key of leaf 29757440, which is obviously too large, and
even larger than the first key of the next leaf.

However it still follows the key order in that tree block, thus tree
checker is unable to detect it at read time, since tree checker can only
work inside a leaf, thus such complex corruption can't be rejected in
advance.

[FIX]
The next timing to detect such problem is at tree block merge time,
which is in push_node_left(), balance_node_right(), push_leaf_left() and
push_leaf_right().

Now we check if the key order of the right most key of the left node is
larger than the left most key of the right node.

By this we don't need to call the full tree-check, while still keeps the
key order correct as key order in each node is already checked by tree
checker thus we only need to check the above two slots.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5df76c17775a..38118e050689 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3219,6 +3219,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
 		fixup_low_keys(path, &disk_key, 1);
 }
 
+/*
+ * Check the cross tree block key ordering.
+ *
+ * Tree-checker only works inside one tree block, thus the following
+ * corruption can not be rejected by tree-checker:
+ * Leaf @left			| Leaf @right
+ * --------------------------------------------------------------
+ * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
+ *
+ * Key f6 in leaf @left itself is valid, but not valid when the next
+ * key in leaf @right is 7.
+ * This can only be checked at tree block merge time.
+ * And since tree checker has ensured all key order in each tree block
+ * is correct, we only need to bother the last key of @left and the first
+ * key of @right.
+ */
+static int check_cross_tree_key_order(struct extent_buffer *left,
+				      struct extent_buffer *right)
+{
+	struct btrfs_key left_last;
+	struct btrfs_key right_first;
+	int level = btrfs_header_level(left);
+	int nr_left = btrfs_header_nritems(left);
+	int nr_right = btrfs_header_nritems(right);
+
+	/* No key to check in one of the tree blocks */
+	if (!nr_left || !nr_right)
+		return 0;
+	if (level) {
+		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_node_key_to_cpu(right, &right_first, 0);
+	} else {
+		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_item_key_to_cpu(right, &right_first, 0);
+	}
+	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+		btrfs_crit(left->fs_info,
+"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
+			   left_last.objectid, left_last.type,
+			   left_last.offset, right_first.objectid,
+			   right_first.type, right_first.offset);
+		return -EUCLEAN;
+	}
+	return 0;
+}
+
 /*
  * try to push data from one node into the next node left in the
  * tree.
@@ -3263,6 +3309,10 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 	} else
 		push_items = min(src_nritems - 8, push_items);
 
+	/* dst is the left eb src is the middle eb */
+	ret = check_cross_tree_key_order(dst, src);
+	if (ret < 0)
+		return ret;
 	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
@@ -3331,6 +3381,10 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
 	if (max_push < push_items)
 		push_items = max_push;
 
+	/* dst is the right eb, src is the middle eb */
+	ret = check_cross_tree_key_order(src, dst);
+	if (ret < 0)
+		return ret;
 	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
 	BUG_ON(ret < 0);
 	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
@@ -3810,6 +3864,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (left_nritems == 0)
 		goto out_unlock;
 
+	ret = check_cross_tree_key_order(left, right);
+	if (ret < 0) {
+		btrfs_tree_unlock(right);
+		free_extent_buffer(right);
+		return ret;
+	}
 	if (path->slots[0] == left_nritems && !empty) {
 		/* Key greater than all keys in the leaf, right neighbor has
 		 * enough room for it and we're not emptying our leaf to delete
@@ -4048,6 +4108,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 		goto out;
 	}
 
+	ret = check_cross_tree_key_order(left, right);
+	if (ret < 0)
+		goto out;
 	return __push_leaf_left(path, min_data_size,
 			       empty, left, free_space, right_nritems,
 			       max_slot);
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-10  8:02 ` [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2019-07-10 10:42   ` Nikolay Borisov
  2019-07-10 10:58     ` WenRuo Qu
  0 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 10:42 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
> Although we have start, len check for extent buffer reader/write (e.g.
> read_extent_buffer()), those checks has its limitations:
> - No overflow check
>   Values like start = 1024 len = -1024 can still pass the basic
>    (start + len) > eb->len check.
> 
> - Checks are not consistent
>   For read_extent_buffer() we only check (start + len) against eb->len.
>   While for memcmp_extent_buffer() we also check start against eb->len.
> 
> - Different error reporting mechanism
>   We use WARN() in read_extent_buffer() but BUG() in
>   memcpy_extent_buffer().
> 
> - Still modify memory if the request is obviously wrong
>   In read_extent_buffer() even we find (start + len) > eb->len, we still
>   call memset(dst, 0, len), which can eaisly cause memory access error
>   if start + len overflows.
> 
> To address above problems, this patch creates a new common function to
> check such access, check_eb_range().
> - Add overflow check
>   This function checks start, start + len against eb->len and overflow
>   check.
> 
> - Unified checks
> 
> - Unified error reports
>   Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
>   And also do btrfs_warn() message for non-debug build.
> 
> - Exit ASAP if check fails
>   No more possible memory corruption.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
> [ Inspired by above report, the report itself is already addressed ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent_io.c | 79 +++++++++++++++++++++++---------------------
>  1 file changed, 42 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index db337e53aab3..b072ef5bceac 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5429,6 +5429,31 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
>  	return ret;
>  }
>  
> +/*
> + * Check if the [start, start + len) range is valid before reading/writing
> + * the eb.
> + *
> + * Caller should not touch the dst/src memory if this function returns error.
> + */
> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
> +			  unsigned long len)
> +{
> +	unsigned long end;
> +
> +	/* start, start + len should not go beyond eb->len nor overflow */
> +	if (unlikely(start > eb->len || start + len > eb->len ||

I think your check here is wrong, it should be start + len > start +
eb->len. start is the logical address hence it can be a lot bigger than
the size of the eb which is 16k by default.

> +		     check_add_overflow(start, len, &end))) {
> +		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> +		     eb->start, eb->len, start, len);
> +		btrfs_warn(eb->fs_info,
> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> +			   eb->start, eb->len, start, len);

If CONFIG_BTRFS_DEBUG is enabled then we will print the warning text
twice. Simply make  it:

WARN_ON(IS_ENABLED()) and leave the btrfs_Warn to always print the text.

> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
>  void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
>  			unsigned long start, unsigned long len)
>  {
> @@ -5440,12 +5465,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	if (start + len > eb->len) {
> -		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
> -		     eb->start, eb->len, start, len);
> -		memset(dst, 0, len);
> +	if (check_eb_range(eb, start, len))
>  		return;
> -	}
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5554,8 +5575,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  	int ret = 0;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return -EINVAL;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5609,8 +5630,8 @@ void write_extent_buffer(struct extent_buffer *eb, const void *srcv,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5639,8 +5660,8 @@ void memzero_extent_buffer(struct extent_buffer *eb, unsigned long start,
>  	size_t start_offset = offset_in_page(eb->start);
>  	unsigned long i = (start_offset + start) >> PAGE_SHIFT;
>  
> -	WARN_ON(start > eb->len);
> -	WARN_ON(start + len > eb->start + eb->len);
> +	if (check_eb_range(eb, start, len))
> +		return;
>  
>  	offset = offset_in_page(start_offset + start);
>  
> @@ -5684,6 +5705,10 @@ void copy_extent_buffer(struct extent_buffer *dst, struct extent_buffer *src,
>  	size_t start_offset = offset_in_page(dst->start);
>  	unsigned long i = (start_offset + dst_offset) >> PAGE_SHIFT;
>  
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(src, src_offset, len))
> +		return;
> +
>  	WARN_ON(src->len != dst_len);
>  
>  	offset = offset_in_page(start_offset + dst_offset);
> @@ -5872,7 +5897,6 @@ static void copy_pages(struct page *dst_page, struct page *src_page,
>  void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  			   unsigned long src_offset, unsigned long len)
>  {
> -	struct btrfs_fs_info *fs_info = dst->fs_info;
>  	size_t cur;
>  	size_t dst_off_in_page;
>  	size_t src_off_in_page;
> @@ -5880,18 +5904,9 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  	unsigned long dst_i;
>  	unsigned long src_i;
>  
> -	if (src_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			"memmove bogus src_offset %lu move len %lu dst len %lu",
> -			 src_offset, len, dst->len);
> -		BUG();
> -	}
> -	if (dst_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			"memmove bogus dst_offset %lu move len %lu dst len %lu",
> -			 dst_offset, len, dst->len);
> -		BUG();
> -	}
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(dst, src_offset, len))
> +		return;
>  
>  	while (len > 0) {
>  		dst_off_in_page = offset_in_page(start_offset + dst_offset);
> @@ -5917,7 +5932,6 @@ void memcpy_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  			   unsigned long src_offset, unsigned long len)
>  {
> -	struct btrfs_fs_info *fs_info = dst->fs_info;
>  	size_t cur;
>  	size_t dst_off_in_page;
>  	size_t src_off_in_page;
> @@ -5927,18 +5941,9 @@ void memmove_extent_buffer(struct extent_buffer *dst, unsigned long dst_offset,
>  	unsigned long dst_i;
>  	unsigned long src_i;
>  
> -	if (src_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			  "memmove bogus src_offset %lu move len %lu len %lu",
> -			  src_offset, len, dst->len);
> -		BUG();
> -	}
> -	if (dst_offset + len > dst->len) {
> -		btrfs_err(fs_info,
> -			  "memmove bogus dst_offset %lu move len %lu len %lu",
> -			  dst_offset, len, dst->len);
> -		BUG();
> -	}
> +	if (check_eb_range(dst, dst_offset, len) ||
> +	    check_eb_range(dst, src_offset, len))
> +		return;
>  	if (dst_offset < src_offset) {
>  		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
>  		return;
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-10  8:02 ` [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2019-07-10 10:48   ` Nikolay Borisov
  2019-07-10 11:00     ` WenRuo Qu
  0 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 10:48 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
> __btrfs_free_extent() is one of the best cases to show how optimization
> could make a function hard to read.
> 
> In fact __btrfs_free_extent() is only doing two major works:
> 1. Reduce the refs number of an extent backref
>    Either it's an inlined extent backref (inside EXTENT/METADATA item) or
>    a keyed extent backref (SHARED_* item).
>    We only need to locate that backref line, either reduce the number or
>    remove the backref line completely.
> 
> 2. Update the refs count in EXTENT/METADATA_ITEM
> 
> But in real world, we do it in a complex but somewhat efficient way.
> During step 1), we will try to locate the EXTENT/METADATA_ITEM without
> triggering another btrfs_search_slot() as fast path.
> 
> Only when we failed to locate that item, we will trigger another
> btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
> updated/deleted the backref line.
> 
> And we have a lot of restrict check on things like refs_to_drop against
> extent refs and special case check for single ref extent.
> 
> All of these results:
> - 7 BUG_ON()s in a single function
>   Although all these BUG_ON() are doing correct check, they're super
>   easy to get triggered for fuzzed images.
>   It's never a good idea to piss the end user.
> 
> - Near 300 lines without much useful comments but a lot of hidden
>   conditions
>   I believe even the author needs several minutes to recall what the
>   code is doing
>   Not to mention a lot of BUG_ON() conditions needs to go back tens of
>   lines to find out why.
> 
> This patch address all these problems by:
> - Introduce two examples to show what __btrfs_free_extent() is doing
>   One inlined backref case and one keyed case.
>   Should cover most cases.
> 
> - Kill all BUG_ON()s with proper error message and optional leaf dump
> 
> - Add comment to show the overall workflow
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
> [ The report triggers one BUG_ON() in __btrfs_free_extent() ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

<snip>

> @@ -6997,19 +7068,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  	if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
>  	    key.type == BTRFS_EXTENT_ITEM_KEY) {
>  		struct btrfs_tree_block_info *bi;
> -		BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
> +		if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
> +			btrfs_crit(info,
> +"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >%lu",
nit: stray '>'

<snip>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations
  2019-07-10  8:02 ` [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
@ 2019-07-10 10:54   ` Nikolay Borisov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 10:54 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs will panic at btree operations:
>   kernel BUG at fs/btrfs/ctree.c:3894!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1138 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:__push_leaf_left+0x6b6/0x6e0
>   Code: 00 00 48 98 48 8d 04 80 48 8d 74 80 65 e8 42 5a 04 00 48 8b bd 78 ff ff ff 8b bf 90 d0 00 00 89 7d 98 83 ef 65 e9 06 ff ff ff <0f> 0b 0f 0b 48 8b 85 78 ff ff ff 8b 90 90 d0 00 00 e9 eb fe ff ff
>   RSP: 0018:ffffc0bd4128b990 EFLAGS: 00010246
>   RAX: 0000000000000000 RBX: ffffa0a4ab8f0e38 RCX: 0000000000000000
>   RDX: ffffa0a280000000 RSI: 0000000000000000 RDI: ffffa0a4b3814000
>   RBP: ffffc0bd4128ba38 R08: 0000000000001000 R09: ffffc0bd4128b948
>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000240
>   R13: ffffa0a4b556fb60 R14: ffffa0a4ab8f0af0 R15: ffffa0a4ab8f0af0
>   FS: 0000000000000000(0000) GS:ffffa0a4b7a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f2461c80020 CR3: 000000022b32a006 CR4: 00000000000206f0
>   Call Trace:
>   ? _cond_resched+0x1a/0x50
>   push_leaf_left+0x179/0x190
>   btrfs_del_items+0x316/0x470
>   btrfs_del_csums+0x215/0x3a0
>   __btrfs_free_extent.isra.72+0x5a7/0xbe0
>   __btrfs_run_delayed_refs+0x539/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace c2425e6e89b5558f ]---
> 
> [CAUSE]
> The offending csum tree looks like this:
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>         ...
>         key (EXTENT_CSUM EXTENT_CSUM 85975040) block 29630464 gen 17
>         key (EXTENT_CSUM EXTENT_CSUM 89911296) block 29642752 gen 17 <<<
>         key (EXTENT_CSUM EXTENT_CSUM 92274688) block 29646848 gen 17
>         ...
> 
> leaf 29630464 items 6 free space 1 generation 17 owner CSUM_TREE
>         item 0 key (EXTENT_CSUM EXTENT_CSUM 85975040) itemoff 3987 itemsize 8
>                 range start 85975040 end 85983232 length 8192
>         ...
> leaf 29642752 items 0 free space 3995 generation 17 owner 0
>                     ^ empty leaf            invalid owner ^
> 
> leaf 29646848 items 1 free space 602 generation 17 owner CSUM_TREE
>         item 0 key (EXTENT_CSUM EXTENT_CSUM 92274688) itemoff 627 itemsize 3368
>                 range start 92274688 end 95723520 length 3448832
> 
> So we have a corrupted csum tree where one tree leaf is completely
> empty, causing unbalanced btree, thus leading to unexpected btree
> balance error.
> 
> [FIX]
> For this particular case, we handle it in two directions to catch it:
> - Check if the tree block is empty through btrfs_verify_level_key()
>   So that invalid tree blocks won't be read out through
>   btrfs_search_slot() and its variants.
> 
> - Check 0 tree owner in tree checker
>   NO tree is using 0 as its tree owner, detect it and reject at tree
>   block read time.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202821
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/disk-io.c      | 9 +++++++++
>  fs/btrfs/tree-checker.c | 6 ++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index deb74a8c191a..af4786ed7bae 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -414,6 +414,15 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
>  
>  	if (!first_key)
>  		return 0;
> +	/* We have @first_key, so this @eb must have at least one item */
> +	if (btrfs_header_nritems(eb) == 0) {
> +		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
> +		     KERN_ERR "BTRFS: tree nritems check failed\n");

The text is redundant with the one printed from btrfs_err, just make it
WARN_ON(IS_ENABLED())

> +		btrfs_err(fs_info,
> +		"invalid tree nritems, bytenr=%llu nritems=0 expect >0",
> +			  eb->start);
> +		return -EUCLEAN;
> +	}
>  
>  	/*
>  	 * For live tree block (new tree blocks in current transaction),
> diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
> index 96fce4bef4e7..a4c7f7ed8490 100644
> --- a/fs/btrfs/tree-checker.c
> +++ b/fs/btrfs/tree-checker.c
> @@ -888,6 +888,12 @@ static int check_leaf(struct extent_buffer *leaf, bool check_item_data)
>  				    owner);
>  			return -EUCLEAN;
>  		}
> +		/* Unknown tree */
> +		if (owner == 0) {
> +			generic_err(leaf, 0,
> +				"invalid owner, root 0 is not defined");
> +			return -EUCLEAN;
> +		}
>  		return 0;
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-10 10:42   ` Nikolay Borisov
@ 2019-07-10 10:58     ` WenRuo Qu
  2019-07-24 16:00       ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: WenRuo Qu @ 2019-07-10 10:58 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2019/7/10 下午6:42, Nikolay Borisov wrote:
> 
> 
[...]
>>  
>> +/*
>> + * Check if the [start, start + len) range is valid before reading/writing
>> + * the eb.
>> + *
>> + * Caller should not touch the dst/src memory if this function returns error.
>> + */
>> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
>> +			  unsigned long len)
>> +{
>> +	unsigned long end;
>> +
>> +	/* start, start + len should not go beyond eb->len nor overflow */
>> +	if (unlikely(start > eb->len || start + len > eb->len ||
> 
> I think your check here is wrong, it should be start + len > start +
> eb->len. start is the logical address hence it can be a lot bigger than
> the size of the eb which is 16k by default.

Definitely NO.

[start, start + len) must be in the range of [0, nodesize).
So think again.

> 
>> +		     check_add_overflow(start, len, &end))) {
>> +		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
>> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
>> +		     eb->start, eb->len, start, len);
>> +		btrfs_warn(eb->fs_info,
>> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
>> +			   eb->start, eb->len, start, len);
> 
> If CONFIG_BTRFS_DEBUG is enabled then we will print the warning text
> twice. Simply make  it:
> 
> WARN_ON(IS_ENABLED()) and leave the btrfs_Warn to always print the text.

WARN_ON() doesn't contain any text to indicate the reason of the stack dump.
Thus I still prefer to show exact the reason other than takes developer
several seconds to combine the stack with the following btrfs_warn()
message.

Anyway, it's not something you'll see even in months, thus I don't
really think it would cause any difference.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2019-07-10 10:48   ` Nikolay Borisov
@ 2019-07-10 11:00     ` WenRuo Qu
  0 siblings, 0 replies; 19+ messages in thread
From: WenRuo Qu @ 2019-07-10 11:00 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs



On 2019/7/10 下午6:48, Nikolay Borisov wrote:
> 
> 
> On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
>> __btrfs_free_extent() is one of the best cases to show how optimization
>> could make a function hard to read.
>>
>> In fact __btrfs_free_extent() is only doing two major works:
>> 1. Reduce the refs number of an extent backref
>>    Either it's an inlined extent backref (inside EXTENT/METADATA item) or
>>    a keyed extent backref (SHARED_* item).
>>    We only need to locate that backref line, either reduce the number or
>>    remove the backref line completely.
>>
>> 2. Update the refs count in EXTENT/METADATA_ITEM
>>
>> But in real world, we do it in a complex but somewhat efficient way.
>> During step 1), we will try to locate the EXTENT/METADATA_ITEM without
>> triggering another btrfs_search_slot() as fast path.
>>
>> Only when we failed to locate that item, we will trigger another
>> btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
>> updated/deleted the backref line.
>>
>> And we have a lot of restrict check on things like refs_to_drop against
>> extent refs and special case check for single ref extent.
>>
>> All of these results:
>> - 7 BUG_ON()s in a single function
>>   Although all these BUG_ON() are doing correct check, they're super
>>   easy to get triggered for fuzzed images.
>>   It's never a good idea to piss the end user.
>>
>> - Near 300 lines without much useful comments but a lot of hidden
>>   conditions
>>   I believe even the author needs several minutes to recall what the
>>   code is doing
>>   Not to mention a lot of BUG_ON() conditions needs to go back tens of
>>   lines to find out why.
>>
>> This patch address all these problems by:
>> - Introduce two examples to show what __btrfs_free_extent() is doing
>>   One inlined backref case and one keyed case.
>>   Should cover most cases.
>>
>> - Kill all BUG_ON()s with proper error message and optional leaf dump
>>
>> - Add comment to show the overall workflow
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
>> [ The report triggers one BUG_ON() in __btrfs_free_extent() ]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
> 
> <snip>
> 
>> @@ -6997,19 +7068,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>  	if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
>>  	    key.type == BTRFS_EXTENT_ITEM_KEY) {
>>  		struct btrfs_tree_block_info *bi;
>> -		BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
>> +		if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
>> +			btrfs_crit(info,
>> +"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >%lu",
> nit: stray '>'

With values filled, it would be:
"invalid extent item size for key (1048575, 168, 4096) owner 7, has 10
expect >33"

Now you'd see it's not a stray '>'.

Thanks,
Qu
> 
> <snip>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
  2019-07-10  8:02 ` [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2019-07-10 11:12   ` Nikolay Borisov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 11:12 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at insert_inline_extent_backref():
>   kernel BUG at fs/btrfs/extent-tree.c:1857!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
>   Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
>   RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
>   RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
>   RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
>   RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>   R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
>   FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
>   Call Trace:
>   ? _cond_resched+0x1a/0x50
>   __btrfs_inc_extent_ref.isra.64+0x7e/0x240
>   ? btrfs_merge_delayed_refs+0xa5/0x330
>   __btrfs_run_delayed_refs+0x653/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace 2ad8b3de903cf825 ]---
> 
> [CAUSE]
> Due to extent tree corruption (still valid by itself, but bad cross ref),
> we can allocate an extent which is still in extent tree.
> 
> Then we will try to insert/update inlined extent backref line for the
> existing tree block, and triggering the BUG_ON() where we don't expect
> to increase refs on non-subvolume tree block.
> 
> [FIX]
> Replace that BUG_ON() with proper error message and leaf dump for debug
> build.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 199e4eed8f2d..72868d9ac58e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1856,7 +1856,24 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
>  					   num_bytes, parent, root_objectid,
>  					   owner, offset, 1);
>  	if (ret == 0) {
> -		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
> +		/*
> +		 * We're adding ref to an existing tree block, only happens
> +		 * when creating snapshots, not possible to other trees.
> +		 */
> +		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
> +			WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
> +			"invalid operation for non-subvolume tree block");

Again, the text of the warn doesn't bring any value, make it plain
WARN_ON(IS_ENABLED())

> +
> +			btrfs_crit(trans->fs_info,
> +"invalid operation, adding refs to a non-subvolume tree block, bytenr=%llu num_bytes=%llu root_objectid=%llu",
> +				   bytenr, num_bytes, root_objectid);
> +			if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
> +				btrfs_crit(trans->fs_info,
> +			"path->slots[0]=%d path->nodes[0]:", path->slots[0]);
> +				btrfs_print_leaf(path->nodes[0]);
> +			}
> +			return -EUCLEAN;
> +		}
>  		update_inline_extent_backref(path, iref, refs_to_add,
>  					     extent_op, NULL);
>  	} else if (ret == -ENOENT) {
> @@ -2073,6 +2090,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  /*
>   * __btrfs_inc_extent_ref - insert backreference for a given extent
>   *
> + * The work is opposite as __btrfs_free_extent().
> + * For more info about how it works or examples, refer to __btrfs_free_extent().
> + *
>   * @trans:	    Handle of transaction
>   *
>   * @node:	    The delayed ref node used to get the bytenr/length for
> @@ -2153,9 +2173,9 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
>  	/* now insert the actual backref */
>  	ret = insert_extent_backref(trans, path, bytenr, parent, root_objectid,
>  				    owner, offset, refs_to_add);
> +out:
>  	if (ret)
>  		btrfs_abort_transaction(trans, ret);
> -out:

I think this silently fixes a bug if insert_inline_extent_backref fails.
If that's the case then it should be in a dedicated patch with an
explicit rationale

>  	btrfs_free_path(path);
>  	return ret;
>  }
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-10  8:02 ` [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
@ 2019-07-10 11:19   ` Nikolay Borisov
  2019-07-10 12:02     ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 11:19 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



On 10.07.19 г. 11:02 ч., Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at btrfs_del_csums().
>   kernel BUG at fs/btrfs/ctree.c:3188!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
>   Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
>   RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
>   RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
>   RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
>   RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
>   R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
>   R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
>   FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
>   Call Trace:
>   truncate_one_csum+0xac/0xf0
>   btrfs_del_csums+0x24f/0x3a0
>   __btrfs_free_extent.isra.72+0x5a7/0xbe0
>   __btrfs_run_delayed_refs+0x539/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace 93bf9db00e6c374e ]---
> 
> [CAUSE]
> This crafted image has a very tricky key order corruption:
> 
>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>           ...
>           key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
>           key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
>           ...
> 
>   leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
>           item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
>                   range start 73785344 end 75497472 length 1712128
>           item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
>                   range start 75497472 end 75501568 length 4096
>           item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
>                   range start 75501568 end 77283328 length 1781760
>           item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
>                   range start 77283328 end 77287424 length 4096
>           item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
>                   range start 4120596480 end 4120903680 length 307200
>   leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
>           item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
>                   range start 77594624 end 79306752 length 1712128
>           ...
> 
> Note the item 4 key of leaf 29757440, which is obviously too large, and
> even larger than the first key of the next leaf.
> 
> However it still follows the key order in that tree block, thus tree
> checker is unable to detect it at read time, since tree checker can only
> work inside a leaf, thus such complex corruption can't be rejected in
> advance.
> 
> [FIX]
> The next timing to detect such problem is at tree block merge time,
> which is in push_node_left(), balance_node_right(), push_leaf_left() and
> push_leaf_right().
> 
> Now we check if the key order of the right most key of the left node is
> larger than the left most key of the right node.
> 
> By this we don't need to call the full tree-check, while still keeps the
> key order correct as key order in each node is already checked by tree
> checker thus we only need to check the above two slots.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>, albeit see below for
some minors suggestions (feel free to ignore).

> ---
>  fs/btrfs/ctree.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 5df76c17775a..38118e050689 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3219,6 +3219,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
>  		fixup_low_keys(path, &disk_key, 1);
>  }
>  
> +/*
> + * Check the cross tree block key ordering.
> + *
> + * Tree-checker only works inside one tree block, thus the following
> + * corruption can not be rejected by tree-checker:
> + * Leaf @left			| Leaf @right
> + * --------------------------------------------------------------
> + * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
> + *
> + * Key f6 in leaf @left itself is valid, but not valid when the next
> + * key in leaf @right is 7.
> + * This can only be checked at tree block merge time.
> + * And since tree checker has ensured all key order in each tree block
> + * is correct, we only need to bother the last key of @left and the first
> + * key of @right.
> + */
> +static int check_cross_tree_key_order(struct extent_buffer *left,
> +				      struct extent_buffer *right)
> +{
> +	struct btrfs_key left_last;
> +	struct btrfs_key right_first;
> +	int level = btrfs_header_level(left);
> +	int nr_left = btrfs_header_nritems(left);
> +	int nr_right = btrfs_header_nritems(right);
> +
> +	/* No key to check in one of the tree blocks */
> +	if (!nr_left || !nr_right)
> +		return 0;
> +	if (level) {
> +		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
> +		btrfs_node_key_to_cpu(right, &right_first, 0);
> +	} else {
> +		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
> +		btrfs_item_key_to_cpu(right, &right_first, 0);
> +	}
> +	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> +		btrfs_crit(left->fs_info,
> +"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
> +			   left_last.objectid, left_last.type,
> +			   left_last.offset, right_first.objectid,
> +			   right_first.type, right_first.offset);
> +		return -EUCLEAN;
> +	}
> +	return 0;
> +}
> +

nit: I wonder if it will make it a bit easier to reason about the code
if that function is renamed to valid_cross_block_key_order and make it
return true or false, then it's callers will do if
(!valid_cross_block_key_ordered) {
return -EUCLEAN
}

I guess it won't be much different than it is now.

>  /*
>   * try to push data from one node into the next node left in the
>   * tree.
> @@ -3263,6 +3309,10 @@ static int push_node_left(struct btrfs_trans_handle *trans,
>  	} else
>  		push_items = min(src_nritems - 8, push_items);
>  
> +	/* dst is the left eb src is the middle eb */
nit: missing ',' between 'eb' and 'src'. But this is very minor.


> +	ret = check_cross_tree_key_order(dst, src);
> +	if (ret < 0)
> +		return ret;
>  	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
>  	if (ret) {
>  		btrfs_abort_transaction(trans, ret);
> @@ -3331,6 +3381,10 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
>  	if (max_push < push_items)
>  		push_items = max_push;
>  
> +	/* dst is the right eb, src is the middle eb */
> +	ret = check_cross_tree_key_order(src, dst);
> +	if (ret < 0)
> +		return ret;
>  	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
>  	BUG_ON(ret < 0);
>  	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
> @@ -3810,6 +3864,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>  	if (left_nritems == 0)
>  		goto out_unlock;
>  
> +	ret = check_cross_tree_key_order(left, right);
> +	if (ret < 0) {
> +		btrfs_tree_unlock(right);
> +		free_extent_buffer(right);
> +		return ret;
> +	}
>  	if (path->slots[0] == left_nritems && !empty) {
>  		/* Key greater than all keys in the leaf, right neighbor has
>  		 * enough room for it and we're not emptying our leaf to delete
> @@ -4048,6 +4108,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>  		goto out;
>  	}
>  
> +	ret = check_cross_tree_key_order(left, right);
> +	if (ret < 0)
> +		goto out;
>  	return __push_leaf_left(path, min_data_size,
>  			       empty, left, free_space, right_nritems,
>  			       max_slot);
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-10 11:19   ` Nikolay Borisov
@ 2019-07-10 12:02     ` Qu Wenruo
  2019-07-10 12:12       ` Nikolay Borisov
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-10 12:02 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs



On 2019/7/10 下午7:19, Nikolay Borisov wrote:
>
>
[...]
>> +static int check_cross_tree_key_order(struct extent_buffer *left,
>> +				      struct extent_buffer *right)
>> +{
>> +	struct btrfs_key left_last;
>> +	struct btrfs_key right_first;
>> +	int level = btrfs_header_level(left);
>> +	int nr_left = btrfs_header_nritems(left);
>> +	int nr_right = btrfs_header_nritems(right);
>> +
>> +	/* No key to check in one of the tree blocks */
>> +	if (!nr_left || !nr_right)
>> +		return 0;
>> +	if (level) {
>> +		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
>> +		btrfs_node_key_to_cpu(right, &right_first, 0);
>> +	} else {
>> +		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
>> +		btrfs_item_key_to_cpu(right, &right_first, 0);
>> +	}
>> +	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>> +		btrfs_crit(left->fs_info,
>> +"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
>> +			   left_last.objectid, left_last.type,
>> +			   left_last.offset, right_first.objectid,
>> +			   right_first.type, right_first.offset);
>> +		return -EUCLEAN;
>> +	}
>> +	return 0;
>> +}
>> +
>
> nit: I wonder if it will make it a bit easier to reason about the code
> if that function is renamed to valid_cross_block_key_order and make it
> return true or false, then it's callers will do if
> (!valid_cross_block_key_ordered) {
> return -EUCLEAN
> }

I'm always uncertain what's the correct schema for check function.

Sometimes we have bool check_whatever() sometimes we have bool
is_whatever(), and sometimes we have int check_whatever().

If we have a good guidance for btrfs, it will be a no-brain choice.

>
> I guess it won't be much different than it is now.
>
>>  /*
>>   * try to push data from one node into the next node left in the
>>   * tree.
>> @@ -3263,6 +3309,10 @@ static int push_node_left(struct btrfs_trans_handle *trans,
>>  	} else
>>  		push_items = min(src_nritems - 8, push_items);
>>
>> +	/* dst is the left eb src is the middle eb */
> nit: missing ',' between 'eb' and 'src'. But this is very minor.

I'd prefer the rename the parameter of push_node_left() directly in
another patch so that we won't need this comment at all.

@dst @src?! That makes no sense compared to @left and @right.

Thanks,
Qu

>
>
>> +	ret = check_cross_tree_key_order(dst, src);
>> +	if (ret < 0)
>> +		return ret;
>>  	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
>>  	if (ret) {
>>  		btrfs_abort_transaction(trans, ret);
>> @@ -3331,6 +3381,10 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
>>  	if (max_push < push_items)
>>  		push_items = max_push;
>>
>> +	/* dst is the right eb, src is the middle eb */
>> +	ret = check_cross_tree_key_order(src, dst);
>> +	if (ret < 0)
>> +		return ret;
>>  	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
>>  	BUG_ON(ret < 0);
>>  	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
>> @@ -3810,6 +3864,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>>  	if (left_nritems == 0)
>>  		goto out_unlock;
>>
>> +	ret = check_cross_tree_key_order(left, right);
>> +	if (ret < 0) {
>> +		btrfs_tree_unlock(right);
>> +		free_extent_buffer(right);
>> +		return ret;
>> +	}
>>  	if (path->slots[0] == left_nritems && !empty) {
>>  		/* Key greater than all keys in the leaf, right neighbor has
>>  		 * enough room for it and we're not emptying our leaf to delete
>> @@ -4048,6 +4108,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>>  		goto out;
>>  	}
>>
>> +	ret = check_cross_tree_key_order(left, right);
>> +	if (ret < 0)
>> +		goto out;
>>  	return __push_leaf_left(path, min_data_size,
>>  			       empty, left, free_space, right_nritems,
>>  			       max_slot);
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-10 12:02     ` Qu Wenruo
@ 2019-07-10 12:12       ` Nikolay Borisov
  2019-07-24 16:24         ` David Sterba
  0 siblings, 1 reply; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-10 12:12 UTC (permalink / raw)
  To: Qu Wenruo, WenRuo Qu, linux-btrfs



On 10.07.19 г. 15:02 ч., Qu Wenruo wrote:
> 
> 
> On 2019/7/10 下午7:19, Nikolay Borisov wrote:
>>
>>
> [...]
>>> +static int check_cross_tree_key_order(struct extent_buffer *left,
>>> +				      struct extent_buffer *right)
>>> +{
>>> +	struct btrfs_key left_last;
>>> +	struct btrfs_key right_first;
>>> +	int level = btrfs_header_level(left);
>>> +	int nr_left = btrfs_header_nritems(left);
>>> +	int nr_right = btrfs_header_nritems(right);
>>> +
>>> +	/* No key to check in one of the tree blocks */
>>> +	if (!nr_left || !nr_right)
>>> +		return 0;
>>> +	if (level) {
>>> +		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
>>> +		btrfs_node_key_to_cpu(right, &right_first, 0);
>>> +	} else {
>>> +		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
>>> +		btrfs_item_key_to_cpu(right, &right_first, 0);
>>> +	}
>>> +	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>>> +		btrfs_crit(left->fs_info,
>>> +"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
>>> +			   left_last.objectid, left_last.type,
>>> +			   left_last.offset, right_first.objectid,
>>> +			   right_first.type, right_first.offset);
>>> +		return -EUCLEAN;
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>
>> nit: I wonder if it will make it a bit easier to reason about the code
>> if that function is renamed to valid_cross_block_key_order and make it
>> return true or false, then it's callers will do if
>> (!valid_cross_block_key_ordered) {
>> return -EUCLEAN
>> }>
> I'm always uncertain what's the correct schema for check function.
> 
> Sometimes we have bool check_whatever() sometimes we have bool
> is_whatever(), and sometimes we have int check_whatever().
> 
> If we have a good guidance for btrfs, it will be a no-brain choice.

There is no such guidance in this case my logic is that this function
really returns a boolean value - 0 or -EUCLEAN to make that explicit I'd
define it as returning bool and rename it to valid_cross_block_key_order
to really emphasize the fact it's a predicated-type of function. Thus if
someone reads the they will be certain that this function return either
true or false depending on whether the input parameters make sense.
Whereas right now I will have to go and look at the implementation to
determine what are the return values because of the "int" return type.

Again, that's just me and if you deem it doesn't bring value then feel
free to ignore it.



> 
>>
>> I guess it won't be much different than it is now.
>>
>>>  /*
>>>   * try to push data from one node into the next node left in the
>>>   * tree.
>>> @@ -3263,6 +3309,10 @@ static int push_node_left(struct btrfs_trans_handle *trans,
>>>  	} else
>>>  		push_items = min(src_nritems - 8, push_items);
>>>
>>> +	/* dst is the left eb src is the middle eb */
>> nit: missing ',' between 'eb' and 'src'. But this is very minor.
> 
> I'd prefer the rename the parameter of push_node_left() directly in
> another patch so that we won't need this comment at all.
> 
> @dst @src?! That makes no sense compared to @left and @right.

I somewhat agree, however you will also need to rename the respective nr
items variables -> src_nritems/dst_nritems.


> 
> Thanks,
> Qu
> 
>>
>>
>>> +	ret = check_cross_tree_key_order(dst, src);
>>> +	if (ret < 0)
>>> +		return ret;
>>>  	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
>>>  	if (ret) {
>>>  		btrfs_abort_transaction(trans, ret);
>>> @@ -3331,6 +3381,10 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
>>>  	if (max_push < push_items)
>>>  		push_items = max_push;
>>>
>>> +	/* dst is the right eb, src is the middle eb */
>>> +	ret = check_cross_tree_key_order(src, dst);
>>> +	if (ret < 0)
>>> +		return ret;
>>>  	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
>>>  	BUG_ON(ret < 0);
>>>  	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
>>> @@ -3810,6 +3864,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
>>>  	if (left_nritems == 0)
>>>  		goto out_unlock;
>>>
>>> +	ret = check_cross_tree_key_order(left, right);
>>> +	if (ret < 0) {
>>> +		btrfs_tree_unlock(right);
>>> +		free_extent_buffer(right);
>>> +		return ret;
>>> +	}
>>>  	if (path->slots[0] == left_nritems && !empty) {
>>>  		/* Key greater than all keys in the leaf, right neighbor has
>>>  		 * enough room for it and we're not emptying our leaf to delete
>>> @@ -4048,6 +4108,9 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
>>>  		goto out;
>>>  	}
>>>
>>> +	ret = check_cross_tree_key_order(left, right);
>>> +	if (ret < 0)
>>> +		goto out;
>>>  	return __push_leaf_left(path, min_data_size,
>>>  			       empty, left, free_space, right_nritems,
>>>  			       max_slot);
>>>
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-10 10:58     ` WenRuo Qu
@ 2019-07-24 16:00       ` David Sterba
  2019-07-24 22:54         ` Qu Wenruo
  0 siblings, 1 reply; 19+ messages in thread
From: David Sterba @ 2019-07-24 16:00 UTC (permalink / raw)
  To: WenRuo Qu; +Cc: Nikolay Borisov, linux-btrfs

On Wed, Jul 10, 2019 at 10:58:40AM +0000, WenRuo Qu wrote:
> 
> 
> On 2019/7/10 下午6:42, Nikolay Borisov wrote:
> > 
> > 
> [...]
> >>  
> >> +/*
> >> + * Check if the [start, start + len) range is valid before reading/writing
> >> + * the eb.
> >> + *
> >> + * Caller should not touch the dst/src memory if this function returns error.
> >> + */
> >> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
> >> +			  unsigned long len)
> >> +{
> >> +	unsigned long end;
> >> +
> >> +	/* start, start + len should not go beyond eb->len nor overflow */
> >> +	if (unlikely(start > eb->len || start + len > eb->len ||
> > 
> > I think your check here is wrong, it should be start + len > start +
> > eb->len. start is the logical address hence it can be a lot bigger than
> > the size of the eb which is 16k by default.
> 
> Definitely NO.
> 
> [start, start + len) must be in the range of [0, nodesize).
> So think again.

'start' is the logical address, that's always larger than eb->len (16K),
Nikolay is IMO right, the check

	start > eb->len

does not make sense, neither does 'start + len > eb->len'.

Your reference to nodesize probably means that the interval
[start, start + len] is subset of [start, start + nodesize].

The test condition in your patch must explode every time the function
is called.

> >> +		     check_add_overflow(start, len, &end))) {
> >> +		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
> >> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> >> +		     eb->start, eb->len, start, len);
> >> +		btrfs_warn(eb->fs_info,
> >> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> >> +			   eb->start, eb->len, start, len);
> > 
> > If CONFIG_BTRFS_DEBUG is enabled then we will print the warning text
> > twice. Simply make  it:
> > 
> > WARN_ON(IS_ENABLED()) and leave the btrfs_Warn to always print the text.
> 
> WARN_ON() doesn't contain any text to indicate the reason of the stack dump.
> Thus I still prefer to show exact the reason other than takes developer
> several seconds to combine the stack with the following btrfs_warn()
> message.

I agree that a message next to a WARN is a good thing. Plain WARN does
not have the btrfs header (filesystem, device, ...) so we should use our
helpers and do WARN_ON(IS_ENABLED()) after that. Why after? Because with
panic-on-warn enabled the message won't be printed.

Duplicating the message or even the code does not seem like a good
practice to me.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks
  2019-07-10 12:12       ` Nikolay Borisov
@ 2019-07-24 16:24         ` David Sterba
  0 siblings, 0 replies; 19+ messages in thread
From: David Sterba @ 2019-07-24 16:24 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Qu Wenruo, WenRuo Qu, linux-btrfs

On Wed, Jul 10, 2019 at 03:12:37PM +0300, Nikolay Borisov wrote:
> On 10.07.19 г. 15:02 ч., Qu Wenruo wrote:
> > On 2019/7/10 下午7:19, Nikolay Borisov wrote:
> >> nit: I wonder if it will make it a bit easier to reason about the code
> >> if that function is renamed to valid_cross_block_key_order and make it
> >> return true or false, then it's callers will do if
> >> (!valid_cross_block_key_ordered) {
> >> return -EUCLEAN
> >> }>
> > I'm always uncertain what's the correct schema for check function.
> > 
> > Sometimes we have bool check_whatever() sometimes we have bool
> > is_whatever(), and sometimes we have int check_whatever().
> > 
> > If we have a good guidance for btrfs, it will be a no-brain choice.
> 
> There is no such guidance in this case my logic is that this function
> really returns a boolean value - 0 or -EUCLEAN to make that explicit I'd
> define it as returning bool and rename it to valid_cross_block_key_order
> to really emphasize the fact it's a predicated-type of function. Thus if
> someone reads the they will be certain that this function return either
> true or false depending on whether the input parameters make sense.

Also what
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#function-return-values-and-names
says.

Int in place of bool is in the old code, we should use bool in new code,
and convert the old code eventually. The naming of predicates prefixed
wich check_ sounds understandable to me, like going through a check
list, if it's ok continue, otherwise exit.

	if (check_this_is_fine())
		goto out_it_is_not;

Using 'is_' could be possible in some cases, and eg. for is_fstree or
is_bad_inode is ok.

> Whereas right now I will have to go and look at the implementation to
> determine what are the return values because of the "int" return type.
> 
> Again, that's just me and if you deem it doesn't bring value then feel
> free to ignore it.

I think this becomes important in a large code base that btrfs is slowly
turning into, so it's not just you. As I have to read a lot of code I
can notice patterns that are easy to understand and where checking other
files is necessary, which takes time and is distracting. New code should
follow the best practices and old code should be updated when changed.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-24 16:00       ` David Sterba
@ 2019-07-24 22:54         ` Qu Wenruo
  2019-07-25  6:39           ` Nikolay Borisov
  0 siblings, 1 reply; 19+ messages in thread
From: Qu Wenruo @ 2019-07-24 22:54 UTC (permalink / raw)
  To: dsterba, WenRuo Qu, Nikolay Borisov, linux-btrfs



On 2019/7/25 上午12:00, David Sterba wrote:
> On Wed, Jul 10, 2019 at 10:58:40AM +0000, WenRuo Qu wrote:
>>
>>
>> On 2019/7/10 下午6:42, Nikolay Borisov wrote:
>>>
>>>
>> [...]
>>>>
>>>> +/*
>>>> + * Check if the [start, start + len) range is valid before reading/writing
>>>> + * the eb.
>>>> + *
>>>> + * Caller should not touch the dst/src memory if this function returns error.
>>>> + */
>>>> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
>>>> +			  unsigned long len)
>>>> +{
>>>> +	unsigned long end;
>>>> +
>>>> +	/* start, start + len should not go beyond eb->len nor overflow */
>>>> +	if (unlikely(start > eb->len || start + len > eb->len ||
>>>
>>> I think your check here is wrong, it should be start + len > start +
>>> eb->len. start is the logical address hence it can be a lot bigger than
>>> the size of the eb which is 16k by default.
>>
>> Definitely NO.
>>
>> [start, start + len) must be in the range of [0, nodesize).
>> So think again.
>
> 'start' is the logical address, that's always larger than eb->len (16K),
> Nikolay is IMO right, the check

No. This @start is not eb->start. It's the offset inside the eb.

>
> 	start > eb->len
>
> does not make sense, neither does 'start + len > eb->len'.
>
> Your reference to nodesize probably means that the interval
> [start, start + len] is subset of [start, start + nodesize].
>
> The test condition in your patch must explode every time the function
> is called.

Nope, it doesn't explode.

>
>>>> +		     check_add_overflow(start, len, &end))) {
>>>> +		WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), KERN_ERR
>>>> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
>>>> +		     eb->start, eb->len, start, len);
>>>> +		btrfs_warn(eb->fs_info,
>>>> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
>>>> +			   eb->start, eb->len, start, len);
>>>
>>> If CONFIG_BTRFS_DEBUG is enabled then we will print the warning text
>>> twice. Simply make  it:
>>>
>>> WARN_ON(IS_ENABLED()) and leave the btrfs_Warn to always print the text.
>>
>> WARN_ON() doesn't contain any text to indicate the reason of the stack dump.
>> Thus I still prefer to show exact the reason other than takes developer
>> several seconds to combine the stack with the following btrfs_warn()
>> message.
>
> I agree that a message next to a WARN is a good thing. Plain WARN does
> not have the btrfs header (filesystem, device, ...) so we should use our
> helpers and do WARN_ON(IS_ENABLED()) after that. Why after? Because with
> panic-on-warn enabled the message won't be printed.
>
> Duplicating the message or even the code does not seem like a good
> practice to me.
>
OK, I'll just remove the duplicated error message.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions
  2019-07-24 22:54         ` Qu Wenruo
@ 2019-07-25  6:39           ` Nikolay Borisov
  0 siblings, 0 replies; 19+ messages in thread
From: Nikolay Borisov @ 2019-07-25  6:39 UTC (permalink / raw)
  To: Qu Wenruo, WenRuo Qu, dsterba, linux-btrfs



On 25.07.19 г. 1:54 ч., Qu Wenruo wrote:
> 
> 
> On 2019/7/25 上午12:00, David Sterba wrote:
>> On Wed, Jul 10, 2019 at 10:58:40AM +0000, WenRuo Qu wrote:
>>>
>>>
>>> On 2019/7/10 下午6:42, Nikolay Borisov wrote:
>>>>
>>>>
>>> [...]
>>>>>
>>>>> +/*
>>>>> + * Check if the [start, start + len) range is valid before reading/writing
>>>>> + * the eb.
>>>>> + *
>>>>> + * Caller should not touch the dst/src memory if this function returns error.
>>>>> + */
>>>>> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
>>>>> +			  unsigned long len)
>>>>> +{
>>>>> +	unsigned long end;
>>>>> +
>>>>> +	/* start, start + len should not go beyond eb->len nor overflow */
>>>>> +	if (unlikely(start > eb->len || start + len > eb->len ||
>>>>
>>>> I think your check here is wrong, it should be start + len > start +
>>>> eb->len. start is the logical address hence it can be a lot bigger than
>>>> the size of the eb which is 16k by default.
>>>
>>> Definitely NO.
>>>
>>> [start, start + len) must be in the range of [0, nodesize).
>>> So think again.
>>
>> 'start' is the logical address, that's always larger than eb->len (16K),
>> Nikolay is IMO right, the check
> 
> No. This @start is not eb->start. It's the offset inside the eb.

Given David is the 2nd person who got confused I think it's clear that
the variable inside check_eb_range is poorly named ... For the next
version rename it to something like offset or eb_offset.

<snip>

> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-07-25  6:39 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-10  8:02 [PATCH 0/5] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
2019-07-10  8:02 ` [PATCH 1/5] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
2019-07-10 10:42   ` Nikolay Borisov
2019-07-10 10:58     ` WenRuo Qu
2019-07-24 16:00       ` David Sterba
2019-07-24 22:54         ` Qu Wenruo
2019-07-25  6:39           ` Nikolay Borisov
2019-07-10  8:02 ` [PATCH 2/5] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
2019-07-10 10:48   ` Nikolay Borisov
2019-07-10 11:00     ` WenRuo Qu
2019-07-10  8:02 ` [PATCH 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations Qu Wenruo
2019-07-10 10:54   ` Nikolay Borisov
2019-07-10  8:02 ` [PATCH 4/5] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
2019-07-10 11:12   ` Nikolay Borisov
2019-07-10  8:02 ` [PATCH 5/5] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
2019-07-10 11:19   ` Nikolay Borisov
2019-07-10 12:02     ` Qu Wenruo
2019-07-10 12:12       ` Nikolay Borisov
2019-07-24 16:24         ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.