linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images
@ 2020-08-12  6:05 Qu Wenruo
  2020-08-12  6:05 ` [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12  6:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Jungyeon Yoon

This patch is revived after one year, as one internal report has hit one
BUG_ON() with real world fs, so I believe this patchset still makes sense.

- Enhanced eb accessors
  Not really needed for the fuzzed images, as 448de471cd4c
  ("btrfs: Check the first key and level for cached extent buffer")
  already fixed half of the reported images.
  Just add a final layer of safe net.

  Just to complain here, two experienced btrfs developer have got
  confused by @start, @len in functions like read_extent_buffer() with
  logical address.
  The best example to solve the confusion is to check the
  read_extent_buffer() call in btree_read_extent_buffer_pages().

  I'm not sure why this confusion happens or even get spread.
  My guess is the extent_buffer::start naming causing the problem.

  If so, I would definitely rename extent_buffer::start to
  extent_buffer::bytenr at any cost.
  Hopes the new commend will address the problem for now.

- BUG_ON() hunt in __btrfs_free_extent()
  Kill BUG_ON()s in __btrfs_free_extent(), replace with error reporting
  and why it shouldn't happen.

  Also add comment on what __btrfs_free_extent() is designed to do, with
  two dump-tree examples for newcomers.

- BUG_ON() hunt in __btrfs_inc_extent_ref()
  Just like __btrfs_free_extent(), but less comment as
  comment for __btrfs_free_extent() should also work for
  __btrfs_inc_extent_ref(), and __btrfs_inc_extent_ref() has a better
  structure than __btrfs_free_extent().

- Defence against unbalanced empty leaf

- Defence against bad key order across two tree blocks

The last two cases can't be rejected by tree-checker and they are all
cross-eb cases.
Thankfully we can reuse existing first_key check against unbalanced
empty leaf, but needs extra check deep into ctree.c for tree block
merging time check.

Reported-by: Jungyeon Yoon <jungyeon.yoon@gmail.com>
[ Not to mail bombarding the report, thus only RB tag in cover letter ]

Changelog:
v2:
- Remove duplicated error message in WARN() call.
  Changed to WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG))
  Also move WARN() after btrfs error message.

- Fix a comment error in __btrfs_free_extent()
  It's not adding refs to a tree block, but adding the same refs
  to an existing tree block ref.
  It's impossible a btrfs tree owning the same tree block directly twice.

- Add comment for eb accessors about @start and @len
  If anyone could tell me why such confusion between @start @len and
  logical address is here, I will definitely solve the root cause no
  matter how many codes need to be modified.

- Use bool to replace int where only two values are returned
  Also rename to follow the bool type.

- Remove one unrelated change for the error handler in
  btrfs_inc_extent_ref()

- Add Reviewed-by tag

v3:
- Rebased to latest misc-next branch
  All conflicts can be auto-merged.

v4:
- Remove one patch which is already merged
  A little surprised by the fact that git can't detecth such case.

- Add new reviewed-by tags from Josef

Qu Wenruo (4):
  btrfs: extent_io: Do extra check for extent buffer read write
    functions
  btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do
    better comment
  btrfs: extent-tree: Kill the BUG_ON() in
    insert_inline_extent_backref()
  btrfs: ctree: Checking key orders before merged tree blocks

 fs/btrfs/ctree.c       |  68 +++++++++++++++++
 fs/btrfs/extent-tree.c | 164 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/extent_io.c   |  76 +++++++++----------
 3 files changed, 257 insertions(+), 51 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions
  2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
@ 2020-08-12  6:05 ` Qu Wenruo
  2020-08-13 14:05   ` David Sterba
  2020-08-12  6:05 ` [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12  6:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

Although we have start, len check for extent buffer reader/write (e.g.
read_extent_buffer()), those checks has its limitations:
- No overflow check
  Values like start = 1024 len = -1024 can still pass the basic
   (start + len) > eb->len check.

- Checks are not consistent
  For read_extent_buffer() we only check (start + len) against eb->len.
  While for memcmp_extent_buffer() we also check start against eb->len.

- Different error reporting mechanism
  We use WARN() in read_extent_buffer() but BUG() in
  memcpy_extent_buffer().

- Still modify memory if the request is obviously wrong
  In read_extent_buffer() even we find (start + len) > eb->len, we still
  call memset(dst, 0, len), which can eaisly cause memory access error
  if start + len overflows.

To address above problems, this patch creates a new common function to
check such access, check_eb_range().
- Add overflow check
  This function checks start, start + len against eb->len and overflow
  check.

- Unified checks

- Unified error reports
  Will call WARN() if CONFIG_BTRFS_DEBUG is configured.
  And also do btrfs_warn() message for non-debug build.

- Exit ASAP if check fails
  No more possible memory corruption.

- Add extra comment for @start @len used in those functions
  Even experienced developers sometimes get confused with the @start
  @len with logical address in those functions.
  I'm not sure what's the cause, maybe it's the extent_buffer::start
  naming.
  For now, just add some comment.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202817
[ Inspired by above report, the report itself is already addressed ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent_io.c | 76 +++++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 37 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 617ea38e6fd7..9f583ef1e387 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5620,6 +5620,28 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
 	return ret;
 }
 
+/*
+ * Check if the [start, start + len) range is valid before reading/writing
+ * the eb.
+ * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
+ *
+ * Caller should not touch the dst/src memory if this function returns error.
+ */
+static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
+			  unsigned long len)
+{
+	/* start, start + len should not go beyond eb->len nor overflow */
+	if (unlikely(start > eb->len || start + len > eb->len ||
+		     len > eb->len)) {
+		btrfs_warn(eb->fs_info,
+"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
+			   eb->start, eb->len, start, len);
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		return -EINVAL;
+	}
+	return 0;
+}
+
 void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 			unsigned long start, unsigned long len)
 {
@@ -5630,12 +5652,8 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
 	char *dst = (char *)dstv;
 	unsigned long i = start >> PAGE_SHIFT;
 
-	if (start + len > eb->len) {
-		WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
-		     eb->start, eb->len, start, len);
-		memset(dst, 0, len);
+	if (check_eb_range(eb, start, len))
 		return;
-	}
 
 	offset = offset_in_page(start);
 
@@ -5700,8 +5718,8 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
 	unsigned long i = start >> PAGE_SHIFT;
 	int ret = 0;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return -EINVAL;
 
 	offset = offset_in_page(start);
 
@@ -5754,8 +5772,8 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
 	char *src = (char *)srcv;
 	unsigned long i = start >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start);
 
@@ -5783,8 +5801,8 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
 	char *kaddr;
 	unsigned long i = start >> PAGE_SHIFT;
 
-	WARN_ON(start > eb->len);
-	WARN_ON(start + len > eb->start + eb->len);
+	if (check_eb_range(eb, start, len))
+		return;
 
 	offset = offset_in_page(start);
 
@@ -5828,6 +5846,10 @@ void copy_extent_buffer(const struct extent_buffer *dst,
 	char *kaddr;
 	unsigned long i = dst_offset >> PAGE_SHIFT;
 
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(src, src_offset, len))
+		return;
+
 	WARN_ON(src->len != dst_len);
 
 	offset = offset_in_page(dst_offset);
@@ -6017,25 +6039,15 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
 			  unsigned long dst_offset, unsigned long src_offset,
 			  unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus src_offset %lu move len %lu dst len %lu",
-			 src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			"memmove bogus dst_offset %lu move len %lu dst len %lu",
-			 dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 
 	while (len > 0) {
 		dst_off_in_page = offset_in_page(dst_offset);
@@ -6062,7 +6074,6 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 			   unsigned long dst_offset, unsigned long src_offset,
 			   unsigned long len)
 {
-	struct btrfs_fs_info *fs_info = dst->fs_info;
 	size_t cur;
 	size_t dst_off_in_page;
 	size_t src_off_in_page;
@@ -6071,18 +6082,9 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
 	unsigned long dst_i;
 	unsigned long src_i;
 
-	if (src_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus src_offset %lu move len %lu len %lu",
-			  src_offset, len, dst->len);
-		BUG();
-	}
-	if (dst_offset + len > dst->len) {
-		btrfs_err(fs_info,
-			  "memmove bogus dst_offset %lu move len %lu len %lu",
-			  dst_offset, len, dst->len);
-		BUG();
-	}
+	if (check_eb_range(dst, dst_offset, len) ||
+	    check_eb_range(dst, src_offset, len))
+		return;
 	if (dst_offset < src_offset) {
 		memcpy_extent_buffer(dst, dst_offset, src_offset, len);
 		return;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2020-08-12  6:05 ` [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2020-08-12  6:05 ` Qu Wenruo
  2020-08-13 14:10   ` David Sterba
  2020-08-12  6:05 ` [PATCH v4 3/4] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12  6:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Josef Bacik

__btrfs_free_extent() is one of the best cases to show how optimization
could make a function hard to read.

In fact __btrfs_free_extent() is only doing two major works:
1. Reduce the refs number of an extent backref
   Either it's an inlined extent backref (inside EXTENT/METADATA item) or
   a keyed extent backref (SHARED_* item).
   We only need to locate that backref line, either reduce the number or
   remove the backref line completely.

2. Update the refs count in EXTENT/METADATA_ITEM

But in real world, we do it in a complex but somewhat efficient way.
During step 1), we will try to locate the EXTENT/METADATA_ITEM without
triggering another btrfs_search_slot() as fast path.

Only when we failed to locate that item, we will trigger another
btrfs_search_slot() to get that EXTENT/METADATA_ITEM after we
updated/deleted the backref line.

And we have a lot of restrict check on things like refs_to_drop against
extent refs and special case check for single ref extent.

All of these results:
- 7 BUG_ON()s in a single function
  Although all these BUG_ON() are doing correct check, they're super
  easy to get triggered for fuzzed images.
  It's never a good idea to piss the end user.

- Near 300 lines without much useful comments but a lot of hidden
  conditions
  I believe even the author needs several minutes to recall what the
  code is doing
  Not to mention a lot of BUG_ON() conditions needs to go back tens of
  lines to find out why.

This patch address all these problems by:
- Introduce two examples to show what __btrfs_free_extent() is doing
  One inlined backref case and one keyed case.
  Should cover most cases.

- Kill all BUG_ON()s with proper error message and optional leaf dump

- Add comment to show the overall workflow

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202819
[ The report triggers one BUG_ON() in __btrfs_free_extent() ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 144 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index fa7d83051587..8e86e3524861 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2930,6 +2930,53 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
 	return 0;
 }
 
+/*
+ * Real work happens here to drop one or more refs of @node.
+ *
+ * The work is mostly done in two parts:
+ * 1. Locate the extent refs.
+ *    It's either inlined in EXTENT/METADATA_ITEM or in keyed SHARED_* item.
+ *    Locate it then reduces the refs number or remove the ref line completely.
+ *
+ * 2. Update the refs count in EXTENT/METADATA_ITEM
+ *
+ * Due to the above two operations and possible optimizations, the function
+ * is a little hard to read, but with the following examples, the result
+ * of this function should be pretty easy to get.
+ *
+ * Example:
+ * *** Inlined backref case ***
+ * In extent tree we have:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 2 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *		extent data backref root FS_TREE objectid 257 offset 0 count 1
+ *
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 257, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ * 	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 16201 itemsize 82
+ *		refs 1 gen 6 flags DATA
+ *		extent data backref root FS_TREE objectid 258 offset 0 count 1
+ *
+ * *** Keyed backref case ***
+ * In extent tree we have:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 754 gen 6 flags DATA
+ *	[...]
+ *	item 2 key (13631488 EXTENT_DATA_REF <HASH>) itemoff 3915 itemsize 28
+ *		extent data backref root FS_TREE objectid 866 offset 0 count 1
+ * This function get called with
+ *  node->bytenr = 13631488, node->num_bytes = 1048576
+ *  root_objectid = FS_TREE, owner_objectid = 866, owner_offset = 0,
+ *  refs_to_drop = 1
+ * Then we should get some like:
+ *	item 0 key (13631488 EXTENT_ITEM 1048576) itemoff 3971 itemsize 24
+ *		refs 753 gen 6 flags DATA
+ * And that (13631488 EXTENT_DATA_REF <HASH>) get removed.
+ */
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			       struct btrfs_delayed_ref_node *node, u64 parent,
 			       u64 root_objectid, u64 owner_objectid,
@@ -2962,7 +3009,15 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	path->leave_spinning = 1;
 
 	is_data = owner_objectid >= BTRFS_FIRST_FREE_OBJECTID;
-	BUG_ON(!is_data && refs_to_drop != 1);
+
+	if (unlikely(!is_data && refs_to_drop != 1)) {
+		btrfs_crit(info,
+"invalid refs_to_drop, dropping more than 1 refs for tree block %llu refs_to_drop %u",
+			   node->bytenr, refs_to_drop);
+		ret = -EINVAL;
+		btrfs_abort_transaction(trans, ret);
+		goto out;
+	}
 
 	if (is_data)
 		skinny_metadata = false;
@@ -2971,6 +3026,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				    parent, root_objectid, owner_objectid,
 				    owner_offset);
 	if (ret == 0) {
+		/*
+		 * Either the inline backref or the SHARED_DATA_REF/
+		 * SHARED_BLOCK_REF is found
+		 *
+		 * Here is a quick path to locate EXTENT/METADATA_ITEM.
+		 * It's possible the EXTENT/METADATA_ITEM is near current slot.
+		 */
 		extent_slot = path->slots[0];
 		while (extent_slot >= 0) {
 			btrfs_item_key_to_cpu(path->nodes[0], &key,
@@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				found_extent = 1;
 				break;
 			}
+
+			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
 			if (path->slots[0] - extent_slot > 5)
 				break;
 			extent_slot--;
 		}
 
 		if (!found_extent) {
-			BUG_ON(iref);
+			if (unlikely(iref)) {
+				btrfs_crit(info,
+"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
+				goto err_dump_abort;
+			}
+			/* Must be SHARED_* item, remove the backref first */
 			ret = remove_extent_backref(trans, path, NULL,
 						    refs_to_drop,
 						    is_data, &last_ref);
@@ -3004,6 +3073,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			btrfs_release_path(path);
 			path->leave_spinning = 1;
 
+
+			/* Slow path to locate EXTENT/METADATA_ITEM */
 			key.objectid = bytenr;
 			key.type = BTRFS_EXTENT_ITEM_KEY;
 			key.offset = num_bytes;
@@ -3078,19 +3149,24 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 	if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
 	    key.type == BTRFS_EXTENT_ITEM_KEY) {
 		struct btrfs_tree_block_info *bi;
-		BUG_ON(item_size < sizeof(*ei) + sizeof(*bi));
+		if (unlikely(item_size < sizeof(*ei) + sizeof(*bi))) {
+			btrfs_crit(info,
+"invalid extent item size for key (%llu, %u, %llu) owner %llu, has %u expect >= %lu",
+				   key.objectid, key.type, key.offset,
+				   owner_objectid, item_size,
+				   sizeof(*ei) + sizeof(*bi));
+			goto err_dump_abort;
+		}
 		bi = (struct btrfs_tree_block_info *)(ei + 1);
 		WARN_ON(owner_objectid != btrfs_tree_block_level(leaf, bi));
 	}
 
 	refs = btrfs_extent_refs(leaf, ei);
 	if (refs < refs_to_drop) {
-		btrfs_err(info,
-			  "trying to drop %d refs but we only have %Lu for bytenr %Lu",
+		btrfs_crit(info,
+		"trying to drop %d refs but we only have %Lu for bytenr %Lu",
 			  refs_to_drop, refs, bytenr);
-		ret = -EINVAL;
-		btrfs_abort_transaction(trans, ret);
-		goto out;
+		goto err_dump_abort;
 	}
 	refs -= refs_to_drop;
 
@@ -3102,7 +3178,11 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 		 * be updated by remove_extent_backref
 		 */
 		if (iref) {
-			BUG_ON(!found_extent);
+			if (unlikely(!found_extent)) {
+				btrfs_crit(info,
+"invalid iref, got inlined extent ref but no EXTENT/METADATA_ITEM found");
+				goto err_dump_abort;
+			}
 		} else {
 			btrfs_set_extent_refs(leaf, ei, refs);
 			btrfs_mark_buffer_dirty(leaf);
@@ -3117,13 +3197,36 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 			}
 		}
 	} else {
+		/* In this branch refs == 1 */
 		if (found_extent) {
-			BUG_ON(is_data && refs_to_drop !=
-			       extent_data_ref_count(path, iref));
+			if (is_data && refs_to_drop !=
+			    extent_data_ref_count(path, iref)) {
+				btrfs_crit(info,
+		"invalid refs_to_drop, current refs %u refs_to_drop %u",
+					   extent_data_ref_count(path, iref),
+					   refs_to_drop);
+				goto err_dump_abort;
+			}
 			if (iref) {
-				BUG_ON(path->slots[0] != extent_slot);
+				if (path->slots[0] != extent_slot) {
+					btrfs_crit(info,
+"invalid iref, extent item key (%llu %u %llu) doesn't has wanted iref",
+						   key.objectid, key.type,
+						   key.offset);
+					goto err_dump_abort;
+				}
 			} else {
-				BUG_ON(path->slots[0] != extent_slot + 1);
+				/*
+				 * No inline ref, we must at SHARED_* item,
+				 * And it's single ref, it must be:
+				 * |	extent_slot	  ||extent_slot + 1|
+				 * [ EXTENT/METADATA_ITEM ][ SHARED_* ITEM ]
+				 */
+				if (path->slots[0] != extent_slot + 1) {
+					btrfs_crit(info,
+	"invalid SHARED_* item, previous item is not EXTENT/METADATA_ITEM");
+					goto err_dump_abort;
+				}
 				path->slots[0] = extent_slot;
 				num_to_del = 2;
 			}
@@ -3164,6 +3267,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 out:
 	btrfs_free_path(path);
 	return ret;
+err_dump_abort:
+	/*
+	 * Leaf dump can take up a lot of dmesg buffer since default nodesize
+	 * is already 16K.
+	 * So we only do full leaf dump for debug build.
+	 */
+	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+		btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
+			   path->slots[0], extent_slot);
+		btrfs_print_leaf(path->nodes[0]);
+	}
+
+	btrfs_abort_transaction(trans, -EUCLEAN);
+	btrfs_free_path(path);
+	return -EUCLEAN;
 }
 
 /*
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/4] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref()
  2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
  2020-08-12  6:05 ` [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
  2020-08-12  6:05 ` [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2020-08-12  6:05 ` Qu Wenruo
  2020-08-12  6:05 ` [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
  2020-08-12  6:52 ` [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images David Sterba
  4 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12  6:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Josef Bacik

[BUG]
With crafted image, btrfs can panic at insert_inline_extent_backref():
  kernel BUG at fs/btrfs/extent-tree.c:1857!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1117 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:insert_inline_extent_backref+0xcc/0xe0
  Code: 45 20 49 8b 7e 50 49 89 d8 4c 8b 4d 10 48 8b 55 c8 4c 89 e1 41 57 4c 89 ee 50 ff 75 18 e8 cc bf ff ff 31 c0 48 83 c4 18 eb b2 <0f> 0b e8 9d df bd ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffffac4dc1287be8 EFLAGS: 00010293
  RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000001
  RDX: 0000000000001000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffffac4dc1287c28 R08: ffffac4dc1287ab8 R09: ffffac4dc1287ac0
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8febef88a540 R14: ffff8febeaa7bc30 R15: 0000000000000000
  FS: 0000000000000000(0000) GS:ffff8febf7a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f663ace94c0 CR3: 0000000235698006 CR4: 00000000000206f0
  Call Trace:
  ? _cond_resched+0x1a/0x50
  __btrfs_inc_extent_ref.isra.64+0x7e/0x240
  ? btrfs_merge_delayed_refs+0xa5/0x330
  __btrfs_run_delayed_refs+0x653/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 2ad8b3de903cf825 ]---

[CAUSE]
Due to extent tree corruption (still valid by itself, but bad cross ref),
we can allocate an extent which is still in extent tree.
The offending tree block of that case is from csum tree.
The newly allocated tree block is also for csum tree.

Then we will try to insert an tree block ref for the existing tree block
ref.

For btrfs tree extent item, a tree block can never be shared directly by
the same tree twice.
We have such BUG_ON() to prevent such problem, but BUG_ON() is
definitely not good enough.

[FIX]
Replace that BUG_ON() with proper error message and leaf dump for debug
build.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202829
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/extent-tree.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 8e86e3524861..b664ad361bd8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1177,7 +1177,22 @@ int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
 					   num_bytes, parent, root_objectid,
 					   owner, offset, 1);
 	if (ret == 0) {
-		BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
+		/*
+		 * We're adding refs to an tree block we already own, this
+		 * should not happen at all.
+		 */
+		if (owner < BTRFS_FIRST_FREE_OBJECTID) {
+			btrfs_crit(trans->fs_info,
+"invalid operation, adding refs to an existing tree ref, bytenr=%llu num_bytes=%llu root_objectid=%llu",
+				   bytenr, num_bytes, root_objectid);
+			if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
+				WARN_ON(1);
+				btrfs_crit(trans->fs_info,
+			"path->slots[0]=%d path->nodes[0]:", path->slots[0]);
+				btrfs_print_leaf(path->nodes[0]);
+			}
+			return -EUCLEAN;
+		}
 		update_inline_extent_backref(path, iref, refs_to_add,
 					     extent_op, NULL);
 	} else if (ret == -ENOENT) {
@@ -1397,6 +1412,9 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 /*
  * __btrfs_inc_extent_ref - insert backreference for a given extent
  *
+ * The work is opposite as __btrfs_free_extent().
+ * For more info about how it works or examples, refer to __btrfs_free_extent().
+ *
  * @trans:	    Handle of transaction
  *
  * @node:	    The delayed ref node used to get the bytenr/length for
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks
  2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (2 preceding siblings ...)
  2020-08-12  6:05 ` [PATCH v4 3/4] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
@ 2020-08-12  6:05 ` Qu Wenruo
  2020-08-13 14:21   ` David Sterba
  2020-08-12  6:52 ` [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images David Sterba
  4 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-12  6:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, Josef Bacik

[BUG]
With crafted image, btrfs can panic at btrfs_del_csums().
  kernel BUG at fs/btrfs/ctree.c:3188!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
  RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
  Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
  RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
  RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
  RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
  R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
  R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
  FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
  Call Trace:
  truncate_one_csum+0xac/0xf0
  btrfs_del_csums+0x24f/0x3a0
  __btrfs_free_extent.isra.72+0x5a7/0xbe0
  __btrfs_run_delayed_refs+0x539/0x1120
  btrfs_run_delayed_refs+0xdb/0x1b0
  btrfs_commit_transaction+0x52/0x950
  ? start_transaction+0x94/0x450
  transaction_kthread+0x163/0x190
  kthread+0x105/0x140
  ? btrfs_cleanup_transaction+0x560/0x560
  ? kthread_destroy_worker+0x50/0x50
  ret_from_fork+0x35/0x40
  Modules linked in:
  ---[ end trace 93bf9db00e6c374e ]---

[CAUSE]
This crafted image has a very tricky key order corruption:

  checksum tree key (CSUM_TREE ROOT_ITEM 0)
  node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
          ...
          key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
          key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
          ...

  leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
          item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
                  range start 73785344 end 75497472 length 1712128
          item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
                  range start 75497472 end 75501568 length 4096
          item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
                  range start 75501568 end 77283328 length 1781760
          item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
                  range start 77283328 end 77287424 length 4096
          item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
                  range start 4120596480 end 4120903680 length 307200
  leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
          item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
                  range start 77594624 end 79306752 length 1712128
          ...

Note the item 4 key of leaf 29757440, which is obviously too large, and
even larger than the first key of the next leaf.

However it still follows the key order in that tree block, thus tree
checker is unable to detect it at read time, since tree checker can only
work inside a leaf, thus such complex corruption can't be rejected in
advance.

[FIX]
The next timing to detect such problem is at tree block merge time,
which is in push_node_left(), balance_node_right(), push_leaf_left() and
push_leaf_right().

Now we check if the key order of the right most key of the left node is
larger than the left most key of the right node.

By this we don't need to call the full tree-check, while still keeps the
key order correct as key order in each node is already checked by tree
checker thus we only need to check the above two slots.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 70e49d8d4f6c..497abb397ea1 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3159,6 +3159,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
 		fixup_low_keys(path, &disk_key, 1);
 }
 
+/*
+ * Check the cross tree block key ordering.
+ *
+ * Tree-checker only works inside one tree block, thus the following
+ * corruption can not be rejected by tree-checker:
+ * Leaf @left			| Leaf @right
+ * --------------------------------------------------------------
+ * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
+ *
+ * Key f6 in leaf @left itself is valid, but not valid when the next
+ * key in leaf @right is 7.
+ * This can only be checked at tree block merge time.
+ * And since tree checker has ensured all key order in each tree block
+ * is correct, we only need to bother the last key of @left and the first
+ * key of @right.
+ */
+static bool valid_cross_tree_key_order(struct extent_buffer *left,
+				       struct extent_buffer *right)
+{
+	struct btrfs_key left_last;
+	struct btrfs_key right_first;
+	int level = btrfs_header_level(left);
+	int nr_left = btrfs_header_nritems(left);
+	int nr_right = btrfs_header_nritems(right);
+
+	/* No key to check in one of the tree blocks */
+	if (!nr_left || !nr_right)
+		return true;
+	if (level) {
+		btrfs_node_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_node_key_to_cpu(right, &right_first, 0);
+	} else {
+		btrfs_item_key_to_cpu(left, &left_last, nr_left - 1);
+		btrfs_item_key_to_cpu(right, &right_first, 0);
+	}
+	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+		btrfs_crit(left->fs_info,
+"bad key order cross tree blocks, left last (%llu %u %llu) right first (%llu %u %llu",
+			   left_last.objectid, left_last.type,
+			   left_last.offset, right_first.objectid,
+			   right_first.type, right_first.offset);
+		return false;
+	}
+	return true;
+}
+
 /*
  * try to push data from one node into the next node left in the
  * tree.
@@ -3203,6 +3249,12 @@ static int push_node_left(struct btrfs_trans_handle *trans,
 	} else
 		push_items = min(src_nritems - 8, push_items);
 
+	/* dst is the left eb, src is the middle eb */
+	if (!valid_cross_tree_key_order(dst, src)) {
+		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
 	ret = tree_mod_log_eb_copy(dst, src, dst_nritems, 0, push_items);
 	if (ret) {
 		btrfs_abort_transaction(trans, ret);
@@ -3271,6 +3323,12 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
 	if (max_push < push_items)
 		push_items = max_push;
 
+	/* dst is the right eb, src is the middle eb */
+	if (!valid_cross_tree_key_order(src, dst)) {
+		ret = -EUCLEAN;
+		btrfs_abort_transaction(trans, ret);
+		return ret;
+	}
 	ret = tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
 	BUG_ON(ret < 0);
 	memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(push_items),
@@ -3747,6 +3805,12 @@ static int push_leaf_right(struct btrfs_trans_handle *trans, struct btrfs_root
 	if (left_nritems == 0)
 		goto out_unlock;
 
+	if (!valid_cross_tree_key_order(left, right)) {
+		ret = -EUCLEAN;
+		btrfs_tree_unlock(right);
+		free_extent_buffer(right);
+		return ret;
+	}
 	if (path->slots[0] == left_nritems && !empty) {
 		/* Key greater than all keys in the leaf, right neighbor has
 		 * enough room for it and we're not emptying our leaf to delete
@@ -3984,6 +4048,10 @@ static int push_leaf_left(struct btrfs_trans_handle *trans, struct btrfs_root
 		goto out;
 	}
 
+	if (!valid_cross_tree_key_order(left, right)) {
+		ret = -EUCLEAN;
+		goto out;
+	}
 	return __push_leaf_left(path, min_data_size,
 			       empty, left, free_space, right_nritems,
 			       max_slot);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images
  2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
                   ` (3 preceding siblings ...)
  2020-08-12  6:05 ` [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
@ 2020-08-12  6:52 ` David Sterba
  4 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-08-12  6:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Jungyeon Yoon

On Wed, Aug 12, 2020 at 02:05:05PM +0800, Qu Wenruo wrote:
> v4:
> - Remove one patch which is already merged
>   A little surprised by the fact that git can't detecth such case.

I've looked at it and that's a normal patch from git perspective,
there's not even a conflict in the context, you're adding a new hunk.
That it's the same one that's a few lines below is caused by
intermediate changes, but that's what happens with any other patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions
  2020-08-12  6:05 ` [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
@ 2020-08-13 14:05   ` David Sterba
  2020-08-14  0:47     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-08-13 14:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Josef Bacik

On Wed, Aug 12, 2020 at 02:05:06PM +0800, Qu Wenruo wrote:
> +/*
> + * Check if the [start, start + len) range is valid before reading/writing
> + * the eb.
> + * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
> + *
> + * Caller should not touch the dst/src memory if this function returns error.
> + */
> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
> +			  unsigned long len)
> +{
> +	/* start, start + len should not go beyond eb->len nor overflow */
> +	if (unlikely(start > eb->len || start + len > eb->len ||
> +		     len > eb->len)) {
> +		btrfs_warn(eb->fs_info,
> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> +			   eb->start, eb->len, start, len);
> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> +		return -EINVAL;
> +	}
> +	return 0;
> +}

This helper is similar to the check_setget_bounds that have some
performance impact,
https://lore.kernel.org/linux-btrfs/20200730110943.GE3703@twin.jikos.cz/
.

The extent buffer helpers are not called that often as the setget
helpers but still it could be improved to avoid the function call
penalty on the hot path.

static inline in check_eb_range(...) {
	if (unlikely(out of range))
		return report_eb_range(...)
	return 0;
}

In the original code the range check was open coded and the above will
lead to the same asm output, while keeping the C code readable.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2020-08-12  6:05 ` [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
@ 2020-08-13 14:10   ` David Sterba
  2020-08-14  0:52     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-08-13 14:10 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov, Josef Bacik

On Wed, Aug 12, 2020 at 02:05:07PM +0800, Qu Wenruo wrote:
> @@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  				found_extent = 1;
>  				break;
>  			}
> +
> +			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
>  			if (path->slots[0] - extent_slot > 5)
>  				break;
>  			extent_slot--;
>  		}
>  
>  		if (!found_extent) {
> -			BUG_ON(iref);
> +			if (unlikely(iref)) {
> +				btrfs_crit(info,
> +"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
> +				goto err_dump_abort;

This is not calling transaction abort at the place where it happens,
here and several other places too.

> @@ -3164,6 +3267,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>  out:
>  	btrfs_free_path(path);
>  	return ret;
> +err_dump_abort:
> +	/*
> +	 * Leaf dump can take up a lot of dmesg buffer since default nodesize
> +	 * is already 16K.
> +	 * So we only do full leaf dump for debug build.
> +	 */
> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
> +		btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
> +			   path->slots[0], extent_slot);
> +		btrfs_print_leaf(path->nodes[0]);

Preceded by a potentially long-running action, so this will delay the
acutal abort.

> +	}
> +
> +	btrfs_abort_transaction(trans, -EUCLEAN);
> +	btrfs_free_path(path);
> +	return -EUCLEAN;
>  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks
  2020-08-12  6:05 ` [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
@ 2020-08-13 14:21   ` David Sterba
  2020-08-14  0:54     ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-08-13 14:21 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Nikolay Borisov, Josef Bacik

On Wed, Aug 12, 2020 at 02:05:09PM +0800, Qu Wenruo wrote:
> [BUG]
> With crafted image, btrfs can panic at btrfs_del_csums().
>   kernel BUG at fs/btrfs/ctree.c:3188!
>   invalid opcode: 0000 [#1] SMP PTI
>   CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
>   RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
>   Code: b7 48 8d 7d bf 4c 89 fe 48 89 45 c8 0f b6 45 b6 88 45 c7 48 8b 45 ae 48 89 45 bf e8 ce f2 ff ff 85 c0 0f 8f 48 ff ff ff 0f 0b <0f> 0b e8 dd 8d be ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 66 66
>   RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
>   RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
>   RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
>   RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
>   R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
>   R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
>   FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
>   CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>   CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
>   Call Trace:
>   truncate_one_csum+0xac/0xf0
>   btrfs_del_csums+0x24f/0x3a0
>   __btrfs_free_extent.isra.72+0x5a7/0xbe0
>   __btrfs_run_delayed_refs+0x539/0x1120
>   btrfs_run_delayed_refs+0xdb/0x1b0
>   btrfs_commit_transaction+0x52/0x950
>   ? start_transaction+0x94/0x450
>   transaction_kthread+0x163/0x190
>   kthread+0x105/0x140
>   ? btrfs_cleanup_transaction+0x560/0x560
>   ? kthread_destroy_worker+0x50/0x50
>   ret_from_fork+0x35/0x40
>   Modules linked in:
>   ---[ end trace 93bf9db00e6c374e ]---
> 
> [CAUSE]
> This crafted image has a very tricky key order corruption:
> 
>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>           ...
>           key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
>           key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
>           ...
> 
>   leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
>           item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
>                   range start 73785344 end 75497472 length 1712128
>           item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
>                   range start 75497472 end 75501568 length 4096
>           item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
>                   range start 75501568 end 77283328 length 1781760
>           item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
>                   range start 77283328 end 77287424 length 4096
>           item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
>                   range start 4120596480 end 4120903680 length 307200
>   leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
>           item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
>                   range start 77594624 end 79306752 length 1712128
>           ...
> 
> Note the item 4 key of leaf 29757440, which is obviously too large, and
> even larger than the first key of the next leaf.
> 
> However it still follows the key order in that tree block, thus tree
> checker is unable to detect it at read time, since tree checker can only
> work inside a leaf, thus such complex corruption can't be rejected in
> advance.
> 
> [FIX]
> The next timing to detect such problem is at tree block merge time,
> which is in push_node_left(), balance_node_right(), push_leaf_left() and
> push_leaf_right().
> 
> Now we check if the key order of the right most key of the left node is
> larger than the left most key of the right node.
> 
> By this we don't need to call the full tree-check, while still keeps the
> key order correct as key order in each node is already checked by tree
> checker thus we only need to check the above two slots.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202833
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 70e49d8d4f6c..497abb397ea1 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -3159,6 +3159,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
>  		fixup_low_keys(path, &disk_key, 1);
>  }
>  
> +/*
> + * Check the cross tree block key ordering.
> + *
> + * Tree-checker only works inside one tree block, thus the following
> + * corruption can not be rejected by tree-checker:
> + * Leaf @left			| Leaf @right
> + * --------------------------------------------------------------
> + * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
> + *
> + * Key f6 in leaf @left itself is valid, but not valid when the next
> + * key in leaf @right is 7.
> + * This can only be checked at tree block merge time.
> + * And since tree checker has ensured all key order in each tree block
> + * is correct, we only need to bother the last key of @left and the first
> + * key of @right.
> + */
> +static bool valid_cross_tree_key_order(struct extent_buffer *left,
> +				       struct extent_buffer *right)

I think this naming is confusing, my first thought was that keys from
two trees were being checked, but this is for two leaves in the same
tree.

The arguments should be constified.

Elsewhere we use a check_<something> naming scheme with return value
true - problem, and 0/false - all ok. The 'valid' is the reverse and
also not following the scheme.

> +{
> +	struct btrfs_key left_last;
> +	struct btrfs_key right_first;
> +	int level = btrfs_header_level(left);
> +	int nr_left = btrfs_header_nritems(left);
> +	int nr_right = btrfs_header_nritems(right);
> +
> +	/* No key to check in one of the tree blocks */
> +	if (!nr_left || !nr_right)
> +		return true;
> +	if (level) {

You don't need a temporary variable for one-time use, btrfs_header_level
is understandable here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions
  2020-08-13 14:05   ` David Sterba
@ 2020-08-14  0:47     ` Qu Wenruo
  2020-08-14 10:29       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-14  0:47 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Josef Bacik


[-- Attachment #1.1: Type: text/plain, Size: 1743 bytes --]



On 2020/8/13 下午10:05, David Sterba wrote:
> On Wed, Aug 12, 2020 at 02:05:06PM +0800, Qu Wenruo wrote:
>> +/*
>> + * Check if the [start, start + len) range is valid before reading/writing
>> + * the eb.
>> + * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
>> + *
>> + * Caller should not touch the dst/src memory if this function returns error.
>> + */
>> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
>> +			  unsigned long len)
>> +{
>> +	/* start, start + len should not go beyond eb->len nor overflow */
>> +	if (unlikely(start > eb->len || start + len > eb->len ||
>> +		     len > eb->len)) {
>> +		btrfs_warn(eb->fs_info,
>> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
>> +			   eb->start, eb->len, start, len);
>> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>> +		return -EINVAL;
>> +	}
>> +	return 0;
>> +}
> 
> This helper is similar to the check_setget_bounds that have some
> performance impact,
> https://lore.kernel.org/linux-btrfs/20200730110943.GE3703@twin.jikos.cz/
> .
> 
> The extent buffer helpers are not called that often as the setget
> helpers but still it could be improved to avoid the function call
> penalty on the hot path.
> 
> static inline in check_eb_range(...) {
> 	if (unlikely(out of range))
> 		return report_eb_range(...)
> 	return 0;
> }

I thought we should avoid manual inline, but let the compiler to
determine if it's needed.

Or in this particular case, we're better than the compiler?

Thanks,
Qu

> 
> In the original code the range check was open coded and the above will
> lead to the same asm output, while keeping the C code readable.
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2020-08-13 14:10   ` David Sterba
@ 2020-08-14  0:52     ` Qu Wenruo
  2020-08-14  8:01       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-14  0:52 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Josef Bacik



On 2020/8/13 下午10:10, David Sterba wrote:
> On Wed, Aug 12, 2020 at 02:05:07PM +0800, Qu Wenruo wrote:
>> @@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>  				found_extent = 1;
>>  				break;
>>  			}
>> +
>> +			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
>>  			if (path->slots[0] - extent_slot > 5)
>>  				break;
>>  			extent_slot--;
>>  		}
>>
>>  		if (!found_extent) {
>> -			BUG_ON(iref);
>> +			if (unlikely(iref)) {
>> +				btrfs_crit(info,
>> +"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
>> +				goto err_dump_abort;
>
> This is not calling transaction abort at the place where it happens,
> here and several other places too.

Did you mean, we want the btrfs_abort_transaction() call not merged
under one tag, so that we can get the kernel warning with the line number?

If so, that's indeed the case, we lose the exact line number.

But we still have the unique error message to locate the problem without
much hassle (it's less obvious than the line number thought).

Thus to me, we don't lose much debug info anyway.

>
>> @@ -3164,6 +3267,21 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>  out:
>>  	btrfs_free_path(path);
>>  	return ret;
>> +err_dump_abort:
>> +	/*
>> +	 * Leaf dump can take up a lot of dmesg buffer since default nodesize
>> +	 * is already 16K.
>> +	 * So we only do full leaf dump for debug build.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) {
>> +		btrfs_crit(info, "path->slots[0]=%d extent_slot=%d",
>> +			   path->slots[0], extent_slot);
>> +		btrfs_print_leaf(path->nodes[0]);
>
> Preceded by a potentially long-running action, so this will delay the
> acutal abort.

That's why it's only for CONFIG_BTRFS_DEBUG.

And for debug kernel, that info would definitely be more helpful for us
to locate the problem, right?

Thanks,
Qu

>
>> +	}
>> +
>> +	btrfs_abort_transaction(trans, -EUCLEAN);
>> +	btrfs_free_path(path);
>> +	return -EUCLEAN;
>>  }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks
  2020-08-13 14:21   ` David Sterba
@ 2020-08-14  0:54     ` Qu Wenruo
  2020-08-14  7:58       ` David Sterba
  0 siblings, 1 reply; 16+ messages in thread
From: Qu Wenruo @ 2020-08-14  0:54 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Josef Bacik



On 2020/8/13 下午10:21, David Sterba wrote:
> On Wed, Aug 12, 2020 at 02:05:09PM +0800, Qu Wenruo wrote:
[...]
>> ---
>>  fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 68 insertions(+)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 70e49d8d4f6c..497abb397ea1 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -3159,6 +3159,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
>>  		fixup_low_keys(path, &disk_key, 1);
>>  }
>>
>> +/*
>> + * Check the cross tree block key ordering.
>> + *
>> + * Tree-checker only works inside one tree block, thus the following
>> + * corruption can not be rejected by tree-checker:
>> + * Leaf @left			| Leaf @right
>> + * --------------------------------------------------------------
>> + * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
>> + *
>> + * Key f6 in leaf @left itself is valid, but not valid when the next
>> + * key in leaf @right is 7.
>> + * This can only be checked at tree block merge time.
>> + * And since tree checker has ensured all key order in each tree block
>> + * is correct, we only need to bother the last key of @left and the first
>> + * key of @right.
>> + */
>> +static bool valid_cross_tree_key_order(struct extent_buffer *left,
>> +				       struct extent_buffer *right)
>
> I think this naming is confusing, my first thought was that keys from
> two trees were being checked, but this is for two leaves in the same
> tree.
>
> The arguments should be constified.
>
> Elsewhere we use a check_<something> naming scheme with return value
> true - problem, and 0/false - all ok. The 'valid' is the reverse and
> also not following the scheme.

Any good candidate?

My current top list candidate is, check_sibling_keys().

Thanks,
Qu

>
>> +{
>> +	struct btrfs_key left_last;
>> +	struct btrfs_key right_first;
>> +	int level = btrfs_header_level(left);
>> +	int nr_left = btrfs_header_nritems(left);
>> +	int nr_right = btrfs_header_nritems(right);
>> +
>> +	/* No key to check in one of the tree blocks */
>> +	if (!nr_left || !nr_right)
>> +		return true;
>> +	if (level) {
>
> You don't need a temporary variable for one-time use, btrfs_header_level
> is understandable here.
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks
  2020-08-14  0:54     ` Qu Wenruo
@ 2020-08-14  7:58       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-08-14  7:58 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Josef Bacik

On Fri, Aug 14, 2020 at 08:54:27AM +0800, Qu Wenruo wrote:
> On 2020/8/13 下午10:21, David Sterba wrote:
> > On Wed, Aug 12, 2020 at 02:05:09PM +0800, Qu Wenruo wrote:
> [...]
> >> ---
> >>  fs/btrfs/ctree.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 68 insertions(+)
> >>
> >> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >> index 70e49d8d4f6c..497abb397ea1 100644
> >> --- a/fs/btrfs/ctree.c
> >> +++ b/fs/btrfs/ctree.c
> >> @@ -3159,6 +3159,52 @@ void btrfs_set_item_key_safe(struct btrfs_fs_info *fs_info,
> >>  		fixup_low_keys(path, &disk_key, 1);
> >>  }
> >>
> >> +/*
> >> + * Check the cross tree block key ordering.
> >> + *
> >> + * Tree-checker only works inside one tree block, thus the following
> >> + * corruption can not be rejected by tree-checker:
> >> + * Leaf @left			| Leaf @right
> >> + * --------------------------------------------------------------
> >> + * | 1 | 2 | 3 | 4 | 5 | f6 |   | 7 | 8 |
> >> + *
> >> + * Key f6 in leaf @left itself is valid, but not valid when the next
> >> + * key in leaf @right is 7.
> >> + * This can only be checked at tree block merge time.
> >> + * And since tree checker has ensured all key order in each tree block
> >> + * is correct, we only need to bother the last key of @left and the first
> >> + * key of @right.
> >> + */
> >> +static bool valid_cross_tree_key_order(struct extent_buffer *left,
> >> +				       struct extent_buffer *right)
> >
> > I think this naming is confusing, my first thought was that keys from
> > two trees were being checked, but this is for two leaves in the same
> > tree.
> >
> > The arguments should be constified.
> >
> > Elsewhere we use a check_<something> naming scheme with return value
> > true - problem, and 0/false - all ok. The 'valid' is the reverse and
> > also not following the scheme.
> 
> Any good candidate?
> 
> My current top list candidate is, check_sibling_keys().

That sounds perfect.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2020-08-14  0:52     ` Qu Wenruo
@ 2020-08-14  8:01       ` David Sterba
  2020-08-14  8:07         ` Qu Wenruo
  0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2020-08-14  8:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Josef Bacik

On Fri, Aug 14, 2020 at 08:52:19AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/8/13 下午10:10, David Sterba wrote:
> > On Wed, Aug 12, 2020 at 02:05:07PM +0800, Qu Wenruo wrote:
> >> @@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
> >>  				found_extent = 1;
> >>  				break;
> >>  			}
> >> +
> >> +			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
> >>  			if (path->slots[0] - extent_slot > 5)
> >>  				break;
> >>  			extent_slot--;
> >>  		}
> >>
> >>  		if (!found_extent) {
> >> -			BUG_ON(iref);
> >> +			if (unlikely(iref)) {
> >> +				btrfs_crit(info,
> >> +"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
> >> +				goto err_dump_abort;
> >
> > This is not calling transaction abort at the place where it happens,
> > here and several other places too.
> 
> Did you mean, we want the btrfs_abort_transaction() call not merged
> under one tag, so that we can get the kernel warning with the line number?
> 
> If so, that's indeed the case, we lose the exact line number.
> 
> But we still have the unique error message to locate the problem without
> much hassle (it's less obvious than the line number thought).
> 
> Thus to me, we don't lose much debug info anyway.

https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_handling_and_transaction_abort

"Please keep all transaction abort exactly at the place where they happen
and do not merge them to one. This pattern should be used everwhere and
is important when debugging because we can pinpoint the line in the code
from the syslog message and do not have to guess which way it got to the
merged call."

It's very convenient to paste the file:line number from a stacktrace
report and land exactly in the spot where it failed.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment
  2020-08-14  8:01       ` David Sterba
@ 2020-08-14  8:07         ` Qu Wenruo
  0 siblings, 0 replies; 16+ messages in thread
From: Qu Wenruo @ 2020-08-14  8:07 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs, Nikolay Borisov, Josef Bacik



On 2020/8/14 下午4:01, David Sterba wrote:
> On Fri, Aug 14, 2020 at 08:52:19AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/8/13 下午10:10, David Sterba wrote:
>>> On Wed, Aug 12, 2020 at 02:05:07PM +0800, Qu Wenruo wrote:
>>>> @@ -2987,13 +3049,20 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
>>>>  				found_extent = 1;
>>>>  				break;
>>>>  			}
>>>> +
>>>> +			/* Quick path didn't find the EXTEMT/METADATA_ITEM */
>>>>  			if (path->slots[0] - extent_slot > 5)
>>>>  				break;
>>>>  			extent_slot--;
>>>>  		}
>>>>
>>>>  		if (!found_extent) {
>>>> -			BUG_ON(iref);
>>>> +			if (unlikely(iref)) {
>>>> +				btrfs_crit(info,
>>>> +"invalid iref, no EXTENT/METADATA_ITEM found but has inline extent ref");
>>>> +				goto err_dump_abort;
>>>
>>> This is not calling transaction abort at the place where it happens,
>>> here and several other places too.
>>
>> Did you mean, we want the btrfs_abort_transaction() call not merged
>> under one tag, so that we can get the kernel warning with the line number?
>>
>> If so, that's indeed the case, we lose the exact line number.
>>
>> But we still have the unique error message to locate the problem without
>> much hassle (it's less obvious than the line number thought).
>>
>> Thus to me, we don't lose much debug info anyway.
> 
> https://btrfs.wiki.kernel.org/index.php/Development_notes#Error_handling_and_transaction_abort
> 
> "Please keep all transaction abort exactly at the place where they happen
> and do not merge them to one. This pattern should be used everwhere and
> is important when debugging because we can pinpoint the line in the code
> from the syslog message and do not have to guess which way it got to the
> merged call."
> 
> It's very convenient to paste the file:line number from a stacktrace
> report and land exactly in the spot where it failed.
> 
OK, makse sense.

Will go that direction in next update.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions
  2020-08-14  0:47     ` Qu Wenruo
@ 2020-08-14 10:29       ` David Sterba
  0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2020-08-14 10:29 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Josef Bacik

On Fri, Aug 14, 2020 at 08:47:17AM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/8/13 下午10:05, David Sterba wrote:
> > On Wed, Aug 12, 2020 at 02:05:06PM +0800, Qu Wenruo wrote:
> >> +/*
> >> + * Check if the [start, start + len) range is valid before reading/writing
> >> + * the eb.
> >> + * NOTE: @start and @len are offset *INSIDE* the eb, *NOT* logical address.
> >> + *
> >> + * Caller should not touch the dst/src memory if this function returns error.
> >> + */
> >> +static int check_eb_range(const struct extent_buffer *eb, unsigned long start,
> >> +			  unsigned long len)
> >> +{
> >> +	/* start, start + len should not go beyond eb->len nor overflow */
> >> +	if (unlikely(start > eb->len || start + len > eb->len ||
> >> +		     len > eb->len)) {
> >> +		btrfs_warn(eb->fs_info,
> >> +"btrfs: bad eb rw request, eb bytenr=%llu len=%lu rw start=%lu len=%lu\n",
> >> +			   eb->start, eb->len, start, len);
> >> +		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
> >> +		return -EINVAL;
> >> +	}
> >> +	return 0;
> >> +}
> > 
> > This helper is similar to the check_setget_bounds that have some
> > performance impact,
> > https://lore.kernel.org/linux-btrfs/20200730110943.GE3703@twin.jikos.cz/
> > .
> > 
> > The extent buffer helpers are not called that often as the setget
> > helpers but still it could be improved to avoid the function call
> > penalty on the hot path.
> > 
> > static inline in check_eb_range(...) {
> > 	if (unlikely(out of range))
> > 		return report_eb_range(...)
> > 	return 0;
> > }
> 
> I thought we should avoid manual inline, but let the compiler to
> determine if it's needed.
> 
> Or in this particular case, we're better than the compiler?

In general it shouldn't be necessary to inline or partition the
functions. In the check_setget case it had a noticeable impact on
performance, so crafting the hot path manually produces a better
assembly and does not depend on the compiler optimizations. The
important part here is that this has been analyzed and measured that it
really makes a difference.

For sanity checks I think we should try to make it as fast as possible,
it's better to have them then not but we also don't want to sacrifice
performance. I haven't analyzed the asm code impact in this patch but
the pattern and flow of check_eb_range is the same.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-08-14 10:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-12  6:05 [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images Qu Wenruo
2020-08-12  6:05 ` [PATCH v4 1/4] btrfs: extent_io: Do extra check for extent buffer read write functions Qu Wenruo
2020-08-13 14:05   ` David Sterba
2020-08-14  0:47     ` Qu Wenruo
2020-08-14 10:29       ` David Sterba
2020-08-12  6:05 ` [PATCH v4 2/4] btrfs: extent-tree: Kill BUG_ON() in __btrfs_free_extent() and do better comment Qu Wenruo
2020-08-13 14:10   ` David Sterba
2020-08-14  0:52     ` Qu Wenruo
2020-08-14  8:01       ` David Sterba
2020-08-14  8:07         ` Qu Wenruo
2020-08-12  6:05 ` [PATCH v4 3/4] btrfs: extent-tree: Kill the BUG_ON() in insert_inline_extent_backref() Qu Wenruo
2020-08-12  6:05 ` [PATCH v4 4/4] btrfs: ctree: Checking key orders before merged tree blocks Qu Wenruo
2020-08-13 14:21   ` David Sterba
2020-08-14  0:54     ` Qu Wenruo
2020-08-14  7:58       ` David Sterba
2020-08-12  6:52 ` [PATCH v4 0/4] btrfs: Enhanced runtime defence against fuzzed images David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).