All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Btrfs: correctly caculate item size used when item key collision happends
@ 2018-08-14  9:05 ethanwu
  2018-08-14 23:07 ` Hans van Kranenburg
  0 siblings, 1 reply; 5+ messages in thread
From: ethanwu @ 2018-08-14  9:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: ethanwu

Item key collision is allowed for some item types, like dir item and
inode refs, but the overall item size is limited by the leafsize.

item size(ins_len) passed from btrfs_insert_empty_items to
btrfs_search_slot already contains size of btrfs_item.

When btrfs_search_slot reaches leaf, we'll see if we need to split leaf,
since the ins_len includes one struct btrfs_item, the check might
fail even though new item we try to insert could merge into the existing
one without adding new btrfs_item.
And split_leaf return -EOVERFLOW from following code:
if (extend && data_size + btrfs_item_size_nr(l, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
    return -EOVERFLOW;

In most cases, when callers receive -EOVERFLOW, they either return
this error or handle in different ways. For example, in normal dir item
insertion the userspace will get errno EOVERFLOW; in inode ref case
INODE_EXTREF is used instead if INODE_REF is full.

However, this is not the case for rename. To avoid the unrecoverable
situation in rename, btrfs_check_dir_item_collision is called in
early phase of rename. In this function, when item key collision is
detected leaf space is checked:

data_size = sizeof(*di) + name_len;
if (data_size + btrfs_item_size_nr(leaf, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))

the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
refers to existing item size.

The check condition here is not consistent with the btrfs_search_slot
when item key collision happens. We might pass check here but fail
at btrfs_search_slot.

in the rename call path
btrfs_add_link
  btrfs_insert_dir_item
    insert_with_overflow
      btrfs_insert_empty_items (btrfs item is counted)
        btrfs_search_slot

if (ins_len > 0 &&
    btrfs_leaf_free_space(fs_info, b) < ins_len) {

The ins_len here contains btrfs_item and the item data, but this
btrfs_item is already in leaf used space, two btrfs_item is counted and
we only need one when this is item key collision cases.
Therfore, rename fails, and abort transaction is triggered with
following error messages:
BTRFS: error (device loop0) in btrfs_rename:9870: errno=-75 unknown

There are two ways to fix rename issue, one is to revert the patch
878f2d2cb355 Btrfs: fix max dir item size calculation
to make the condition consistent.

The other way is to handle the leaf space check correctly when
collision happens. I prefer the second one since it correct leaf
space check in collision case. This fix needs unify the usage of ins_len
in btrfs_search_slot to contain btrfs_item anyway and
adjust all callers of btrfs_search_slot that intentionally pass ins_len
without btrfs_item size to add size of btrfs_item from now.

dir item hash collision is not easy to reproduce.
The following is a leaf sample filled with inode ref.

Before applying the patch, when item data reaches 16200
and we want to add another link with namelen 26(inode ref size 36)
It will not pass the leafspace check
10(inode ref item) + 26(name len) + 25(btrfs item) >
    leaf free space 58
and use BTRFS_INODE_EXTREF_KEY instead.
Nevertheless, The 25 bytes btrfs_item is not needed because the
newly inserted item could be merged with the existing one.

before patch:
leaf 31571968 items 1 free space 58 generation 178 owner 262
fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
    item 0 key (261 INODE_REF 257) itemoff 83 itemsize 16200
        inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
        inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
        inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
        ...
        inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450

after patch:
leaf 31899648 items 1 free space 22 generation 180 owner 262
fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
    item 0 key (263 INODE_REF 262) itemoff 47 itemsize 16236
        inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
        inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
        inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
        ...
        inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
        inode ref index 452 namelen 26 name: abcdefghijklmnopqrstuv0451

Signed-off-by: ethanwu <ethanwu@synology.com>
---
 fs/btrfs/ctree.c       | 15 +++++++++++++--
 fs/btrfs/extent-tree.c |  5 +++--
 fs/btrfs/file-item.c   |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4bc326d..3614dd9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
  * @p:		Holds all btree nodes along the search path
  * @root:	The root node of the tree
  * @key:	The key we are looking for
- * @ins_len:	Indicates purpose of search, for inserts it is 1, for
- *		deletions it's -1. 0 for plain searches
+ * @ins_len:	Indicates purpose of search, for inserts it is item size
+ *		including btrfs_item, for deletions it's -1. 0 for plain searches
  * @cow:	boolean should CoW operations be performed. Must always be 1
  *		when modifying the tree.
  *
@@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			}
 		} else {
 			p->slots[level] = slot;
+			/*
+			 * item key collision happens. In this case, if we are allow
+			 * to insert the item(for example, in dir_item case, item key
+			 * collision is allowed), it will be merged with the original
+			 * item. Only the item size grows, no new btrfs item will be
+			 * added. Since the ins_len already accounts the size btrfs_item,
+			 * this value is counted twice. Duduct this value here so the
+			 * leaf space check will be correct.
+			 */
+			if (ret == 0 && ins_len > 0)
+				ins_len -= sizeof(struct btrfs_item);
 			if (ins_len > 0 &&
 			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
 				if (write_lock_level < 1) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58..4e3aa09 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
 
 	new_size -= sizeof(*ei0);
 	ret = btrfs_search_slot(trans, root, &key, path,
-				new_size + extra_size, 1);
+				new_size + extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		return ret;
 	BUG_ON(ret); /* Corruption */
@@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
 	}
 
 again:
-	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
+	ret = btrfs_search_slot(trans, root, &key, path,
+			    extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0) {
 		err = ret;
 		goto out;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index f9dd6d1..0b6c581 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 	 */
 	btrfs_release_path(path);
 	ret = btrfs_search_slot(trans, root, &file_key, path,
-				csum_size, 1);
+				csum_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		goto fail_unlock;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Btrfs: correctly caculate item size used when item key collision happends
  2018-08-14  9:05 [PATCH] Btrfs: correctly caculate item size used when item key collision happends ethanwu
@ 2018-08-14 23:07 ` Hans van Kranenburg
  2018-08-15  8:42   ` [PATCH v2] Btrfs: correctly calculate " ethanwu
  0 siblings, 1 reply; 5+ messages in thread
From: Hans van Kranenburg @ 2018-08-14 23:07 UTC (permalink / raw)
  To: ethanwu, linux-btrfs

On 08/14/2018 11:05 AM, ethanwu wrote:
> Item key collision is allowed for some item types, like dir item and
> inode refs, but the overall item size is limited by the leafsize.
> 
> item size(ins_len) passed from btrfs_insert_empty_items to
> btrfs_search_slot already contains size of btrfs_item.
> 
> When btrfs_search_slot reaches leaf, we'll see if we need to split leaf,
> since the ins_len includes one struct btrfs_item, the check might
> fail even though new item we try to insert could merge into the existing
> one without adding new btrfs_item.
> And split_leaf return -EOVERFLOW from following code:
> if (extend && data_size + btrfs_item_size_nr(l, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>     return -EOVERFLOW;
> 
> In most cases, when callers receive -EOVERFLOW, they either return
> this error or handle in different ways. For example, in normal dir item
> insertion the userspace will get errno EOVERFLOW; in inode ref case
> INODE_EXTREF is used instead if INODE_REF is full.
> 
> However, this is not the case for rename. To avoid the unrecoverable
> situation in rename, btrfs_check_dir_item_collision is called in
> early phase of rename. In this function, when item key collision is
> detected leaf space is checked:
> 
> data_size = sizeof(*di) + name_len;
> if (data_size + btrfs_item_size_nr(leaf, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
> 
> the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
> refers to existing item size.
> 
> The check condition here is not consistent with the btrfs_search_slot
> when item key collision happens. We might pass check here but fail
> at btrfs_search_slot.
> 
> in the rename call path
> btrfs_add_link
>   btrfs_insert_dir_item
>     insert_with_overflow
>       btrfs_insert_empty_items (btrfs item is counted)
>         btrfs_search_slot
> 
> if (ins_len > 0 &&
>     btrfs_leaf_free_space(fs_info, b) < ins_len) {
> 
> The ins_len here contains btrfs_item and the item data, but this
> btrfs_item is already in leaf used space, two btrfs_item is counted and
> we only need one when this is item key collision cases.
> Therfore, rename fails, and abort transaction is triggered with
> following error messages:
> BTRFS: error (device loop0) in btrfs_rename:9870: errno=-75 unknown
> 
> There are two ways to fix rename issue, one is to revert the patch
> 878f2d2cb355 Btrfs: fix max dir item size calculation
> to make the condition consistent.
> 
> The other way is to handle the leaf space check correctly when
> collision happens. I prefer the second one since it correct leaf
> space check in collision case. This fix needs unify the usage of ins_len
> in btrfs_search_slot to contain btrfs_item anyway and
> adjust all callers of btrfs_search_slot that intentionally pass ins_len
> without btrfs_item size to add size of btrfs_item from now.
> 
> dir item hash collision is not easy to reproduce.

I used this when testing dir_item related things:

http://crypto.junod.info/2012/12/13/hash-dos-and-btrfs/

> The following is a leaf sample filled with inode ref.
> 
> Before applying the patch, when item data reaches 16200
> and we want to add another link with namelen 26(inode ref size 36)
> It will not pass the leafspace check
> 10(inode ref item) + 26(name len) + 25(btrfs item) >
>     leaf free space 58
> and use BTRFS_INODE_EXTREF_KEY instead.
> Nevertheless, The 25 bytes btrfs_item is not needed because the
> newly inserted item could be merged with the existing one.
> 
> before patch:
> leaf 31571968 items 1 free space 58 generation 178 owner 262
> fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
> chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
>     item 0 key (261 INODE_REF 257) itemoff 83 itemsize 16200
>         inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
>         inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
>         inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
>         ...
>         inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
> 
> after patch:
> leaf 31899648 items 1 free space 22 generation 180 owner 262
> fs uuid 1abc143e-54af-491f-bff8-e58e21ad26e5
> chunk uuid 688bc1b5-5407-4f2d-9986-3dc3bf3019d3
>     item 0 key (263 INODE_REF 262) itemoff 47 itemsize 16236
>         inode ref index 504 namelen 26 name: abcdefghijklmnopqrstuv0001
>         inode ref index 505 namelen 26 name: abcdefghijklmnopqrstuv0002
>         inode ref index 506 namelen 26 name: abcdefghijklmnopqrstuv0003
>         ...
>         inode ref index 953 namelen 26 name: abcdefghijklmnopqrstuv0450
>         inode ref index 452 namelen 26 name: abcdefghijklmnopqrstuv0451
> 
> Signed-off-by: ethanwu <ethanwu@synology.com>
> ---
>  fs/btrfs/ctree.c       | 15 +++++++++++++--
>  fs/btrfs/extent-tree.c |  5 +++--
>  fs/btrfs/file-item.c   |  2 +-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4bc326d..3614dd9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
>   * @p:		Holds all btree nodes along the search path
>   * @root:	The root node of the tree
>   * @key:	The key we are looking for
> - * @ins_len:	Indicates purpose of search, for inserts it is 1, for
> - *		deletions it's -1. 0 for plain searches
> + * @ins_len:	Indicates purpose of search, for inserts it is item size
> + *		including btrfs_item, for deletions it's -1. 0 for plain searches
>   * @cow:	boolean should CoW operations be performed. Must always be 1
>   *		when modifying the tree.
>   *
> @@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  			}
>  		} else {
>  			p->slots[level] = slot;
> +			/*
> +			 * item key collision happens. In this case, if we are allow
> +			 * to insert the item(for example, in dir_item case, item key
> +			 * collision is allowed), it will be merged with the original
> +			 * item. Only the item size grows, no new btrfs item will be
> +			 * added. Since the ins_len already accounts the size btrfs_item,
> +			 * this value is counted twice. Duduct this value here so the
> +			 * leaf space check will be correct.
> +			 */
> +			if (ret == 0 && ins_len > 0)
> +				ins_len -= sizeof(struct btrfs_item);
>  			if (ins_len > 0 &&
>  			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
>  				if (write_lock_level < 1) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58..4e3aa09 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
>  
>  	new_size -= sizeof(*ei0);
>  	ret = btrfs_search_slot(trans, root, &key, path,
> -				new_size + extra_size, 1);
> +				new_size + extra_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0)
>  		return ret;
>  	BUG_ON(ret); /* Corruption */
> @@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
>  	}
>  
>  again:
> -	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
> +	ret = btrfs_search_slot(trans, root, &key, path,
> +			    extra_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0) {
>  		err = ret;
>  		goto out;
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index f9dd6d1..0b6c581 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
>  	 */
>  	btrfs_release_path(path);
>  	ret = btrfs_search_slot(trans, root, &file_key, path,
> -				csum_size, 1);
> +				csum_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0)
>  		goto fail_unlock;
>  
> 


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] Btrfs: correctly calculate item size used when item key collision happends
  2018-08-14 23:07 ` Hans van Kranenburg
@ 2018-08-15  8:42   ` ethanwu
  2018-08-15 11:07     ` Nikolay Borisov
  0 siblings, 1 reply; 5+ messages in thread
From: ethanwu @ 2018-08-15  8:42 UTC (permalink / raw)
  To: linux-btrfs; +Cc: ethanwu

Item key collision is allowed for some item types, like dir item and
inode refs, but the overall item size is limited by the leafsize.

item size(ins_len) passed from btrfs_insert_empty_items to
btrfs_search_slot already contains size of btrfs_item.

When btrfs_search_slot reaches leaf, we'll see if we need to split leaf.
The check incorrectly reports that split leaf is required, because
it treats the space required by the newly inserted item as
btrfs_item + item data. But in item key collision case, only item data
is actually needed, the newly inserted item could merge into the existing
one. No new btrfs_item will be inserted.

And split_leaf return -EOVERFLOW from following code:
if (extend && data_size + btrfs_item_size_nr(l, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
    return -EOVERFLOW;

In most cases, when callers receive -EOVERFLOW, they either return
this error or handle in different ways. For example, in normal dir item
creation the userspace will get errno EOVERFLOW; in inode ref case
INODE_EXTREF is used instead.

However, this is not the case for rename. To avoid the unrecoverable
situation in rename, btrfs_check_dir_item_collision is called in
early phase of rename. In this function, when item key collision is
detected leaf space is checked:

data_size = sizeof(*di) + name_len;
if (data_size + btrfs_item_size_nr(leaf, slot) +
    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))

the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
refers to existing item size, the condition here correctly calculate
the needed size for collision case rather than the wrong case at
above.

The consequence of inconsistent condition check between
btrfs_check_dir_item_collision and btrfs_search_slot when item key
collision happens is that we might pass check here but fail
later at btrfs_search_slot. Rename fails and volume is forced readonly

[436149.586170] ------------[ cut here ]------------
[436149.586173] BTRFS: Transaction aborted (error -75)
[436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs]
[436149.586197] Modules linked in: btrfs zstd_compress xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c rpcsec_gss_krb5 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vmw_balloon crypto_simd cryptd glue_helper joydev input_leds intel_rapl_perf serio_raw vmw_vmci mac_hid sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse mptspi mptscsih mptbase scsi_transport_spi ahci vmxnet3 libahci i2c_piix4 floppy pata_acpi
[436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G      D           4.18.0-rc5+ #1
[436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs]
[436149.586238] Code: 50 f0 48 0f ba a8 10 ce 00 00 02 72 27 41 83 f8 fb 74 6f 44 89 c6 48 c7 c7 48 09 85 c0 44 89 55 80 44 89 45 98 e8 f8 5e 4c c5 <0f> 0b 44 8b 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98 ba
[436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286
[436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006
[436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0
[436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472
[436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08
[436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe
[436149.586260] FS:  00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000
[436149.586261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0
[436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[436149.586296] Call Trace:
[436149.586311]  vfs_rename+0x383/0x920
[436149.586313]  ? vfs_rename+0x383/0x920
[436149.586315]  do_renameat2+0x4ca/0x590
[436149.586317]  __x64_sys_rename+0x20/0x30
[436149.586324]  do_syscall_64+0x5a/0x120
[436149.586330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[436149.586332] RIP: 0033:0x7fa3133b1d37
[436149.586332] Code: 75 12 48 89 df e8 89 60 09 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 09 f3 c3 0f 1f 80 00 00 00 00 48 8b 15 19 f1
[436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37
[436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0
[436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0
[436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00

Thanks to Hans van Kranenburg for information about crc32 hash collision tools,
I was able to reproduce the dir item collision with following python script.
https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py
Run it under a btrfs volume will trigger the abort transaction.
It simply creates files and rename them to forged names that leads to
hash collision.

There are two ways to fix this. One is to simply revert the patch
"878f2d2cb355 Btrfs: fix max dir item size calculation"
to make the condition consistent although that patch is correct
about the size.

The other way is to handle the leaf space check correctly when
collision happens. I prefer the second one since it correct leaf
space check in collision case. This fix needs unify the usage of ins_len
in btrfs_search_slot to contain btrfs_item anyway and adjust all callers
of btrfs_search_slot that intentionally pass ins_len without btrfs_item
size to add size of btrfs_item from now.

Fixes: 878f2d2cb355 Btrfs: fix max dir item size calculation
Signed-off-by: ethanwu <ethanwu@synology.com>
---

v2: modify change log, add call trace and way to reproduce it

 fs/btrfs/ctree.c       | 15 +++++++++++++--
 fs/btrfs/extent-tree.c |  5 +++--
 fs/btrfs/file-item.c   |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 4bc326d..3614dd9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
  * @p:		Holds all btree nodes along the search path
  * @root:	The root node of the tree
  * @key:	The key we are looking for
- * @ins_len:	Indicates purpose of search, for inserts it is 1, for
- *		deletions it's -1. 0 for plain searches
+ * @ins_len:	Indicates purpose of search, for inserts it is item size
+ *		including btrfs_item, for deletions it's -1. 0 for plain searches
  * @cow:	boolean should CoW operations be performed. Must always be 1
  *		when modifying the tree.
  *
@@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 			}
 		} else {
 			p->slots[level] = slot;
+			/*
+			 * item key collision happens. In this case, if we are allow
+			 * to insert the item(for example, in dir_item case, item key
+			 * collision is allowed), it will be merged with the original
+			 * item. Only the item size grows, no new btrfs item will be
+			 * added. Since the ins_len already accounts the size btrfs_item,
+			 * this value is counted twice. Duduct this value here so the
+			 * leaf space check will be correct.
+			 */
+			if (ret == 0 && ins_len > 0)
+				ins_len -= sizeof(struct btrfs_item);
 			if (ins_len > 0 &&
 			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
 				if (write_lock_level < 1) {
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3d9fe58..4e3aa09 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
 
 	new_size -= sizeof(*ei0);
 	ret = btrfs_search_slot(trans, root, &key, path,
-				new_size + extra_size, 1);
+				new_size + extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		return ret;
 	BUG_ON(ret); /* Corruption */
@@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
 	}
 
 again:
-	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
+	ret = btrfs_search_slot(trans, root, &key, path,
+			    extra_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0) {
 		err = ret;
 		goto out;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index f9dd6d1..0b6c581 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
 	 */
 	btrfs_release_path(path);
 	ret = btrfs_search_slot(trans, root, &file_key, path,
-				csum_size, 1);
+				csum_size + sizeof(struct btrfs_item), 1);
 	if (ret < 0)
 		goto fail_unlock;
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] Btrfs: correctly calculate item size used when item key collision happends
  2018-08-15  8:42   ` [PATCH v2] Btrfs: correctly calculate " ethanwu
@ 2018-08-15 11:07     ` Nikolay Borisov
  2020-11-23 10:58       ` 吳子杰
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2018-08-15 11:07 UTC (permalink / raw)
  To: ethanwu, linux-btrfs



On 15.08.2018 11:42, ethanwu wrote:
> Item key collision is allowed for some item types, like dir item and
> inode refs, but the overall item size is limited by the leafsize.
> 
> item size(ins_len) passed from btrfs_insert_empty_items to
> btrfs_search_slot already contains size of btrfs_item.
> 
> When btrfs_search_slot reaches leaf, we'll see if we need to split leaf.
> The check incorrectly reports that split leaf is required, because
> it treats the space required by the newly inserted item as
> btrfs_item + item data. But in item key collision case, only item data
> is actually needed, the newly inserted item could merge into the existing
> one. No new btrfs_item will be inserted.
> 
> And split_leaf return -EOVERFLOW from following code:
> if (extend && data_size + btrfs_item_size_nr(l, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
>     return -EOVERFLOW;
> 
> In most cases, when callers receive -EOVERFLOW, they either return
> this error or handle in different ways. For example, in normal dir item
> creation the userspace will get errno EOVERFLOW; in inode ref case
> INODE_EXTREF is used instead.
> 
> However, this is not the case for rename. To avoid the unrecoverable
> situation in rename, btrfs_check_dir_item_collision is called in
> early phase of rename. In this function, when item key collision is
> detected leaf space is checked:
> 
> data_size = sizeof(*di) + name_len;
> if (data_size + btrfs_item_size_nr(leaf, slot) +
>     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
> 
> the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
> refers to existing item size, the condition here correctly calculate
> the needed size for collision case rather than the wrong case at
> above.
> 
> The consequence of inconsistent condition check between
> btrfs_check_dir_item_collision and btrfs_search_slot when item key
> collision happens is that we might pass check here but fail
> later at btrfs_search_slot. Rename fails and volume is forced readonly
> 
> [436149.586170] ------------[ cut here ]------------
> [436149.586173] BTRFS: Transaction aborted (error -75)
> [436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs]
> [436149.586197] Modules linked in: btrfs zstd_compress xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c rpcsec_gss_krb5 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vmw_balloon crypto_simd cryptd glue_helper joydev input_leds intel_rapl_perf serio_raw vmw_vmci mac_hid sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse mptspi mptscsih mptbase scsi_transport_spi ahci vmxnet3 libahci i2c_piix4 floppy pata_acpi
> [436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G      D           4.18.0-rc5+ #1
> [436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
> [436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs]
> [436149.586238] Code: 50 f0 48 0f ba a8 10 ce 00 00 02 72 27 41 83 f8 fb 74 6f 44 89 c6 48 c7 c7 48 09 85 c0 44 89 55 80 44 89 45 98 e8 f8 5e 4c c5 <0f> 0b 44 8b 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98 ba
> [436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286
> [436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006
> [436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0
> [436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472
> [436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08
> [436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe
> [436149.586260] FS:  00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000
> [436149.586261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0
> [436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [436149.586296] Call Trace:
> [436149.586311]  vfs_rename+0x383/0x920
> [436149.586313]  ? vfs_rename+0x383/0x920
> [436149.586315]  do_renameat2+0x4ca/0x590
> [436149.586317]  __x64_sys_rename+0x20/0x30
> [436149.586324]  do_syscall_64+0x5a/0x120
> [436149.586330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [436149.586332] RIP: 0033:0x7fa3133b1d37
> [436149.586332] Code: 75 12 48 89 df e8 89 60 09 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 09 f3 c3 0f 1f 80 00 00 00 00 48 8b 15 19 f1
> [436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
> [436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37
> [436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0
> [436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0
> [436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00
> 
> Thanks to Hans van Kranenburg for information about crc32 hash collision tools,
> I was able to reproduce the dir item collision with following python script.
> https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py
> Run it under a btrfs volume will trigger the abort transaction.
> It simply creates files and rename them to forged names that leads to
> hash collision.
> 
> There are two ways to fix this. One is to simply revert the patch
> "878f2d2cb355 Btrfs: fix max dir item size calculation"
> to make the condition consistent although that patch is correct
> about the size.
> 
> The other way is to handle the leaf space check correctly when
> collision happens. I prefer the second one since it correct leaf
> space check in collision case. This fix needs unify the usage of ins_len
> in btrfs_search_slot to contain btrfs_item anyway and adjust all callers
> of btrfs_search_slot that intentionally pass ins_len without btrfs_item
> size to add size of btrfs_item from now.
> 
> Fixes: 878f2d2cb355 Btrfs: fix max dir item size calculation
> Signed-off-by: ethanwu <ethanwu@synology.com>
> ---
> 
> v2: modify change log, add call trace and way to reproduce it
> 
>  fs/btrfs/ctree.c       | 15 +++++++++++++--
>  fs/btrfs/extent-tree.c |  5 +++--
>  fs/btrfs/file-item.c   |  2 +-
>  3 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 4bc326d..3614dd9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
>   * @p:		Holds all btree nodes along the search path
>   * @root:	The root node of the tree
>   * @key:	The key we are looking for
> - * @ins_len:	Indicates purpose of search, for inserts it is 1, for
> - *		deletions it's -1. 0 for plain searches
> + * @ins_len:	Indicates purpose of search, for inserts it is item size
> + *		including btrfs_item, for deletions it's -1. 0 for plain searches

This must be reworded - even my initial commit adding the documentation
was wrong:

* @ins_len:	Indicates purpose of search, for inserts it is a positive
number (size of item inserted), for deletions it's a negative number and
0 for plain searches, not modifying the tree.

>   * @cow:	boolean should CoW operations be performed. Must always be 1
>   *		when modifying the tree.
>   *
> @@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>  			}
>  		} else {
>  			p->slots[level] = slot;
> +			/*
> +			 * item key collision happens. In this case, if we are allow
> +			 * to insert the item(for example, in dir_item case, item key
> +			 * collision is allowed), it will be merged with the original
> +			 * item. Only the item size grows, no new btrfs item will be
> +			 * added. Since the ins_len already accounts the size btrfs_item,
> +			 * this value is counted twice. Duduct this value here so the
> +			 * leaf space check will be correct.
> +			 */
> +			if (ret == 0 && ins_len > 0)
> +				ins_len -= sizeof(struct btrfs_item);
>  			if (ins_len > 0 &&
>  			    btrfs_leaf_free_space(fs_info, b) < ins_len) {
>  				if (write_lock_level < 1) {
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3d9fe58..4e3aa09 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
>  
>  	new_size -= sizeof(*ei0);
>  	ret = btrfs_search_slot(trans, root, &key, path,
> -				new_size + extra_size, 1);
> +				new_size + extra_size + sizeof(struct btrfs_item), 1);

You are using an old kernel tree since convert_extent_item_v0 (and all
v0) code for that matter has been removed from upstream. Rebase the
patch on current misc-next

>  	if (ret < 0)
>  		return ret;
>  	BUG_ON(ret); /* Corruption */
> @@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
>  	}
>  
>  again:
> -	ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
> +	ret = btrfs_search_slot(trans, root, &key, path,
> +			    extra_size + sizeof(struct btrfs_item), 1);

So you add the size of the struct btrfs_item but in case collision
happens then the newly added code in btrfs_search_slot will negate this
and the code will act as before the patch. Is my understanding correct?

>  	if (ret < 0) {
>  		err = ret;
>  		goto out;
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index f9dd6d1..0b6c581 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
>  	 */
>  	btrfs_release_path(path);
>  	ret = btrfs_search_slot(trans, root, &file_key, path,
> -				csum_size, 1);
> +				csum_size + sizeof(struct btrfs_item), 1);
>  	if (ret < 0)
>  		goto fail_unlock;
>  
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] Btrfs: correctly calculate item size used when item key collision happends
  2018-08-15 11:07     ` Nikolay Borisov
@ 2020-11-23 10:58       ` 吳子杰
  0 siblings, 0 replies; 5+ messages in thread
From: 吳子杰 @ 2020-11-23 10:58 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: ethanwu, linux-btrfs

Nikolay Borisov <nborisov@suse.com> 於 2018年8月15日 週三 下午7:09寫道:
>
>
>
> On 15.08.2018 11:42, ethanwu wrote:
> > Item key collision is allowed for some item types, like dir item and
> > inode refs, but the overall item size is limited by the leafsize.
> >
> > item size(ins_len) passed from btrfs_insert_empty_items to
> > btrfs_search_slot already contains size of btrfs_item.
> >
> > When btrfs_search_slot reaches leaf, we'll see if we need to split leaf.
> > The check incorrectly reports that split leaf is required, because
> > it treats the space required by the newly inserted item as
> > btrfs_item + item data. But in item key collision case, only item data
> > is actually needed, the newly inserted item could merge into the existing
> > one. No new btrfs_item will be inserted.
> >
> > And split_leaf return -EOVERFLOW from following code:
> > if (extend && data_size + btrfs_item_size_nr(l, slot) +
> >     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
> >     return -EOVERFLOW;
> >
> > In most cases, when callers receive -EOVERFLOW, they either return
> > this error or handle in different ways. For example, in normal dir item
> > creation the userspace will get errno EOVERFLOW; in inode ref case
> > INODE_EXTREF is used instead.
> >
> > However, this is not the case for rename. To avoid the unrecoverable
> > situation in rename, btrfs_check_dir_item_collision is called in
> > early phase of rename. In this function, when item key collision is
> > detected leaf space is checked:
> >
> > data_size = sizeof(*di) + name_len;
> > if (data_size + btrfs_item_size_nr(leaf, slot) +
> >     sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))
> >
> > the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
> > refers to existing item size, the condition here correctly calculate
> > the needed size for collision case rather than the wrong case at
> > above.
> >
> > The consequence of inconsistent condition check between
> > btrfs_check_dir_item_collision and btrfs_search_slot when item key
> > collision happens is that we might pass check here but fail
> > later at btrfs_search_slot. Rename fails and volume is forced readonly
> >
> > [436149.586170] ------------[ cut here ]------------
> > [436149.586173] BTRFS: Transaction aborted (error -75)
> > [436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs]
> > [436149.586197] Modules linked in: btrfs zstd_compress xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c rpcsec_gss_krb5 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 vmw_balloon crypto_simd cryptd glue_helper joydev input_leds intel_rapl_perf serio_raw vmw_vmci mac_hid sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc parport_pc ppdev lp parport ip_tables x_tables autofs4 vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm psmouse mptspi mptscsih mptbase scsi_transport_spi ahci vmxnet3 libahci i2c_piix4 floppy pata_acpi
> > [436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G      D           4.18.0-rc5+ #1
> > [436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
> > [436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs]
> > [436149.586238] Code: 50 f0 48 0f ba a8 10 ce 00 00 02 72 27 41 83 f8 fb 74 6f 44 89 c6 48 c7 c7 48 09 85 c0 44 89 55 80 44 89 45 98 e8 f8 5e 4c c5 <0f> 0b 44 8b 45 98 44 8b 55 80 44 89 55 80 44 89 c1 44 89 45 98 ba
> > [436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286
> > [436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006
> > [436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0
> > [436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472
> > [436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08
> > [436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe
> > [436149.586260] FS:  00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000
> > [436149.586261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0
> > [436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [436149.586296] Call Trace:
> > [436149.586311]  vfs_rename+0x383/0x920
> > [436149.586313]  ? vfs_rename+0x383/0x920
> > [436149.586315]  do_renameat2+0x4ca/0x590
> > [436149.586317]  __x64_sys_rename+0x20/0x30
> > [436149.586324]  do_syscall_64+0x5a/0x120
> > [436149.586330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [436149.586332] RIP: 0033:0x7fa3133b1d37
> > [436149.586332] Code: 75 12 48 89 df e8 89 60 09 00 85 c0 0f 95 c0 0f b6 c0 f7 d8 5b c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 09 f3 c3 0f 1f 80 00 00 00 00 48 8b 15 19 f1
> > [436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
> > [436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37
> > [436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0
> > [436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0
> > [436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00
> >
> > Thanks to Hans van Kranenburg for information about crc32 hash collision tools,
> > I was able to reproduce the dir item collision with following python script.
> > https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py
> > Run it under a btrfs volume will trigger the abort transaction.
> > It simply creates files and rename them to forged names that leads to
> > hash collision.
> >
> > There are two ways to fix this. One is to simply revert the patch
> > "878f2d2cb355 Btrfs: fix max dir item size calculation"
> > to make the condition consistent although that patch is correct
> > about the size.
> >
> > The other way is to handle the leaf space check correctly when
> > collision happens. I prefer the second one since it correct leaf
> > space check in collision case. This fix needs unify the usage of ins_len
> > in btrfs_search_slot to contain btrfs_item anyway and adjust all callers
> > of btrfs_search_slot that intentionally pass ins_len without btrfs_item
> > size to add size of btrfs_item from now.
> >
> > Fixes: 878f2d2cb355 Btrfs: fix max dir item size calculation
> > Signed-off-by: ethanwu <ethanwu@synology.com>
> > ---
> >
> > v2: modify change log, add call trace and way to reproduce it
> >
> >  fs/btrfs/ctree.c       | 15 +++++++++++++--
> >  fs/btrfs/extent-tree.c |  5 +++--
> >  fs/btrfs/file-item.c   |  2 +-
> >  3 files changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index 4bc326d..3614dd9 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -2678,8 +2678,8 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
> >   * @p:               Holds all btree nodes along the search path
> >   * @root:    The root node of the tree
> >   * @key:     The key we are looking for
> > - * @ins_len: Indicates purpose of search, for inserts it is 1, for
> > - *           deletions it's -1. 0 for plain searches
> > + * @ins_len: Indicates purpose of search, for inserts it is item size
> > + *           including btrfs_item, for deletions it's -1. 0 for plain searches
>
> This must be reworded - even my initial commit adding the documentation
> was wrong:
>
> * @ins_len:     Indicates purpose of search, for inserts it is a positive
> number (size of item inserted), for deletions it's a negative number and
> 0 for plain searches, not modifying the tree.
>

OK I'll modify the comment as well, and thanks for providing the sample.

> >   * @cow:     boolean should CoW operations be performed. Must always be 1
> >   *           when modifying the tree.
> >   *
> > @@ -2893,6 +2893,17 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
> >                       }
> >               } else {
> >                       p->slots[level] = slot;
> > +                     /*
> > +                      * item key collision happens. In this case, if we are allow
> > +                      * to insert the item(for example, in dir_item case, item key
> > +                      * collision is allowed), it will be merged with the original
> > +                      * item. Only the item size grows, no new btrfs item will be
> > +                      * added. Since the ins_len already accounts the size btrfs_item,
> > +                      * this value is counted twice. Duduct this value here so the
> > +                      * leaf space check will be correct.
> > +                      */
> > +                     if (ret == 0 && ins_len > 0)
> > +                             ins_len -= sizeof(struct btrfs_item);
> >                       if (ins_len > 0 &&
> >                           btrfs_leaf_free_space(fs_info, b) < ins_len) {
> >                               if (write_lock_level < 1) {
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 3d9fe58..4e3aa09 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -1094,7 +1094,7 @@ static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
> >
> >       new_size -= sizeof(*ei0);
> >       ret = btrfs_search_slot(trans, root, &key, path,
> > -                             new_size + extra_size, 1);
> > +                             new_size + extra_size + sizeof(struct btrfs_item), 1);
>
> You are using an old kernel tree since convert_extent_item_v0 (and all
> v0) code for that matter has been removed from upstream. Rebase the
> patch on current misc-next
>
Sorry, I'll rebase. I didn't find misc-next, I think it's for-next right now?

> >       if (ret < 0)
> >               return ret;
> >       BUG_ON(ret); /* Corruption */
> > @@ -1644,7 +1644,8 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans,
> >       }
> >
> >  again:
> > -     ret = btrfs_search_slot(trans, root, &key, path, extra_size, 1);
> > +     ret = btrfs_search_slot(trans, root, &key, path,
> > +                         extra_size + sizeof(struct btrfs_item), 1);
>
> So you add the size of the struct btrfs_item but in case collision
> happens then the newly added code in btrfs_search_slot will negate this
> and the code will act as before the patch. Is my understanding correct?
>
Yes, checking all the places where ins_len > 0, only these two places
needs to be fix.
But I handled these two places incorrectly in this patch.

extra_size might be -1, and in this case I should not add
sizeof(struct btrfs_item);

I'll change to this

...
int ins_len
...

if (insert) {
    extra_size = btrfs_extent_inline_ref_size(want);
    ins_len = extra_size + sizeof(struct btrfs_item);
    ...
} else {
    extra_size = -1;
    ins_len = -1;
}

,,,
ret = btrfs_search_slot(trans, root, &key, path, ins_len, 1);

> >       if (ret < 0) {
> >               err = ret;
> >               goto out;
> > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> > index f9dd6d1..0b6c581 100644
> > --- a/fs/btrfs/file-item.c
> > +++ b/fs/btrfs/file-item.c
> > @@ -804,7 +804,7 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
> >        */
> >       btrfs_release_path(path);
> >       ret = btrfs_search_slot(trans, root, &file_key, path,
> > -                             csum_size, 1);
> > +                             csum_size + sizeof(struct btrfs_item), 1);
> >       if (ret < 0)
> >               goto fail_unlock;
> >
> >

This code is wrong as well.
Reaching here means we got -EFBIG from btrfs_lookup_csum earlier in
btrfs_csum_file_blocks.
This indicates that we found a checksum item that ends at an offset
matching the start of the checksum range we want to insert. Therefore,
btrfs_search_slot won't find the item with file_key we want, so
sizeof(struct btrfs_item) is not needed here.

I'll resend v3 patch set.

thanks,
ethanwu

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-23 10:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-14  9:05 [PATCH] Btrfs: correctly caculate item size used when item key collision happends ethanwu
2018-08-14 23:07 ` Hans van Kranenburg
2018-08-15  8:42   ` [PATCH v2] Btrfs: correctly calculate " ethanwu
2018-08-15 11:07     ` Nikolay Borisov
2020-11-23 10:58       ` 吳子杰

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.