All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] btrfs: Check the first key and level for cached extent buffer
@ 2019-03-12  7:45 Qu Wenruo
  2019-03-12  7:57 ` Nikolay Borisov
  2019-03-12  8:28 ` Nikolay Borisov
  0 siblings, 2 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-03-12  7:45 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Yoon Jungyeon

[BUG]
When reading a file from a fuzzed image, kernel can panic like:
  BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
  assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
  ------------[ cut here ]------------
  kernel BUG at fs/btrfs/ctree.h:3500!
  invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
  Call Trace:
   btrfs_lookup_csum+0x52/0x150 [btrfs]
   __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
   btrfs_submit_bio_hook+0x103/0x170 [btrfs]
   submit_one_bio+0x59/0x80 [btrfs]
   extent_read_full_page+0x58/0x80 [btrfs]
   generic_file_read_iter+0x2f6/0x9d0
   __vfs_read+0x14d/0x1a0
   vfs_read+0x8d/0x140
   ksys_read+0x52/0xc0
   do_syscall_64+0x60/0x210
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
  checksum tree key (CSUM_TREE ROOT_ITEM 0)
  node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
  fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
  chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
  	...
          key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19

  leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
  leaf 29761536 flags 0x1(WRITTEN) backref revision 1
  fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
  chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
          item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
                  range start 8798638964736 end 8798641262592 length 2297856

For the first time tree read, it will not pass verify_level_key() check.
But the extent buffer will still be cached.

Also there is a pitfall in read_block_for_search(), where a cached
extent buffer will not be checked for its level and first key.

There are context where we read tree block without verifying its
first key, such as scrub.

So in that case, a corrupted leaf can sneak in and screw up the kernel.

[FIX]
Export verify_level_key() as btrfs_verify_level_key() and call it in
read_block_for_search() to fill the hole.

Please note, this will cause a lot of extra error message if we have a
bad tree block in any hot tree, but it's still much better to trigger
the final safe net in key_search_validate().

Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
There is one remaining report where I can't get any kernel message
before the VM completely lost reponse.
https://bugzilla.kernel.org/show_bug.cgi?id=202763
(out-of-bound access in end_bio_extent_readpage() when mounting and operating a crafted btrfs image)

And the above case can't be fixed by this patch. Still trying to get a
good idea of what's going wrong (AKA good kernel message).

There is also another report, which doesn't provide the fuzzed image.
https://bugzilla.kernel.org/show_bug.cgi?id=202765
(NULL pointer dereference when mounting a crafted btrfs image)
---
 fs/btrfs/ctree.c   | 10 ++++++++++
 fs/btrfs/disk-io.c | 10 +++++-----
 fs/btrfs/disk-io.h |  3 +++
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5a6c39b44c84..37447a7c5b4d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2401,6 +2401,16 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
 	if (tmp) {
 		/* first we do an atomic uptodate check */
 		if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) {
+			/*
+			 * Do extra check for first_key, as it's possible the
+			 * eb is read from the context without first_key
+			 * requirement
+			 */
+			if (btrfs_verify_level_key(fs_info, tmp,
+					parent_level - 1, &first_key, gen)) {
+				free_extent_buffer(tmp);
+				return -EUCLEAN;
+			}
 			*eb_ret = tmp;
 			return 0;
 		}
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 298b34721bc0..e2a0cb362d28 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -423,9 +423,9 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
-static int verify_level_key(struct btrfs_fs_info *fs_info,
-			    struct extent_buffer *eb, int level,
-			    struct btrfs_key *first_key, u64 parent_transid)
+int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *eb, int level,
+			   struct btrfs_key *first_key, u64 parent_transid)
 {
 	int found_level;
 	struct btrfs_key found_key;
@@ -500,8 +500,8 @@ static int btree_read_extent_buffer_pages(struct btrfs_fs_info *fs_info,
 			if (verify_parent_transid(io_tree, eb,
 						   parent_transid, 0))
 				ret = -EIO;
-			else if (verify_level_key(fs_info, eb, level,
-						  first_key, parent_transid))
+			else if (btrfs_verify_level_key(fs_info, eb, level,
+						first_key, parent_transid))
 				ret = -EUCLEAN;
 			else
 				break;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 987a64bc0c66..67a9fe2d29c7 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
 struct btrfs_device;
 struct btrfs_fs_devices;
 
+int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *eb, int level,
+			   struct btrfs_key *first_key, u64 parent_transid);
 struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
 				      u64 parent_transid, int level,
 				      struct btrfs_key *first_key);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  7:45 [PATCH] btrfs: Check the first key and level for cached extent buffer Qu Wenruo
@ 2019-03-12  7:57 ` Nikolay Borisov
  2019-03-12  8:05   ` Qu Wenruo
  2019-03-12  8:11   ` Nikolay Borisov
  2019-03-12  8:28 ` Nikolay Borisov
  1 sibling, 2 replies; 9+ messages in thread
From: Nikolay Borisov @ 2019-03-12  7:57 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
> [BUG]
> When reading a file from a fuzzed image, kernel can panic like:
>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>   ------------[ cut here ]------------
>   kernel BUG at fs/btrfs/ctree.h:3500!
>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>   Call Trace:
>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>    submit_one_bio+0x59/0x80 [btrfs]
>    extent_read_full_page+0x58/0x80 [btrfs]
>    generic_file_read_iter+0x2f6/0x9d0
>    __vfs_read+0x14d/0x1a0
>    vfs_read+0x8d/0x140
>    ksys_read+0x52/0xc0
>    do_syscall_64+0x60/0x210
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [CAUSE]
> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>   	...
>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
> 
>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>                   range start 8798638964736 end 8798641262592 length 2297856
> 
> For the first time tree read, it will not pass verify_level_key() check.
> But the extent buffer will still be cached.
> 
> Also there is a pitfall in read_block_for_search(), where a cached
> extent buffer will not be checked for its level and first key.
> 
> There are context where we read tree block without verifying its
> first key, such as scrub.
> 
> So in that case, a corrupted leaf can sneak in and screw up the kernel.
> 
> [FIX]
> Export verify_level_key() as btrfs_verify_level_key() and call it in
> read_block_for_search() to fill the hole.
> 
> Please note, this will cause a lot of extra error message if we have a
> bad tree block in any hot tree, but it's still much better to trigger
> the final safe net in key_search_validate().
> 
> Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> There is one remaining report where I can't get any kernel message
> before the VM completely lost reponse.
> https://bugzilla.kernel.org/show_bug.cgi?id=202763
> (out-of-bound access in end_bio_extent_readpage() when mounting and operating a crafted btrfs image)
> 
> And the above case can't be fixed by this patch. Still trying to get a
> good idea of what's going wrong (AKA good kernel message).
> 
> There is also another report, which doesn't provide the fuzzed image.
> https://bugzilla.kernel.org/show_bug.cgi?id=202765
> (NULL pointer dereference when mounting a crafted btrfs image)
> ---
>  fs/btrfs/ctree.c   | 10 ++++++++++
>  fs/btrfs/disk-io.c | 10 +++++-----
>  fs/btrfs/disk-io.h |  3 +++
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 5a6c39b44c84..37447a7c5b4d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2401,6 +2401,16 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
>  	if (tmp) {
>  		/* first we do an atomic uptodate check */
>  		if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) {
> +			/*
> +			 * Do extra check for first_key, as it's possible the
> +			 * eb is read from the context without first_key
> +			 * requirement
> +			 */
> +			if (btrfs_verify_level_key(fs_info, tmp,
> +					parent_level - 1, &first_key, gen)) {
> +				free_extent_buffer(tmp);
> +				return -EUCLEAN;
> +			}
>  			*eb_ret = tmp;
>  			return 0;
>  		}
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 298b34721bc0..e2a0cb362d28 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -423,9 +423,9 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
>  	return ret;
>  }
>  
> -static int verify_level_key(struct btrfs_fs_info *fs_info,
> -			    struct extent_buffer *eb, int level,
> -			    struct btrfs_key *first_key, u64 parent_transid)
> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *eb, int level,
> +			   struct btrfs_key *first_key, u64 parent_transid)
>  {
>  	int found_level;
>  	struct btrfs_key found_key;
> @@ -500,8 +500,8 @@ static int btree_read_extent_buffer_pages(struct btrfs_fs_info *fs_info,
>  			if (verify_parent_transid(io_tree, eb,
>  						   parent_transid, 0))
>  				ret = -EIO;
> -			else if (verify_level_key(fs_info, eb, level,
> -						  first_key, parent_transid))
> +			else if (btrfs_verify_level_key(fs_info, eb, level,
> +						first_key, parent_transid))
>  				ret = -EUCLEAN;

Actually why is the buffer still held when we return EUCLEAN since in
read_tree_block if btree_read_extent_buffer_pages returns an error
free_extent_buffer should be called and it should delete the eb from eb
cache, no ? IMO the correct behavior should be to remove the corrupted
buffer ASAP and not rely on later validation.

>  			else
>  				break;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 987a64bc0c66..67a9fe2d29c7 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>  struct btrfs_device;
>  struct btrfs_fs_devices;
>  
> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *eb, int level,
> +			   struct btrfs_key *first_key, u64 parent_transid);
>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>  				      u64 parent_transid, int level,
>  				      struct btrfs_key *first_key);
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  7:57 ` Nikolay Borisov
@ 2019-03-12  8:05   ` Qu Wenruo
  2019-03-12  8:11   ` Nikolay Borisov
  1 sibling, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-03-12  8:05 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs; +Cc: Yoon Jungyeon



On 2019/3/12 下午3:57, Nikolay Borisov wrote:
[snip]
>>  			*eb_ret = tmp;
>>  			return 0;
>>  		}
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 298b34721bc0..e2a0cb362d28 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -423,9 +423,9 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
>>  	return ret;
>>  }
>>  
>> -static int verify_level_key(struct btrfs_fs_info *fs_info,
>> -			    struct extent_buffer *eb, int level,
>> -			    struct btrfs_key *first_key, u64 parent_transid)
>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>> +			   struct extent_buffer *eb, int level,
>> +			   struct btrfs_key *first_key, u64 parent_transid)
>>  {
>>  	int found_level;
>>  	struct btrfs_key found_key;
>> @@ -500,8 +500,8 @@ static int btree_read_extent_buffer_pages(struct btrfs_fs_info *fs_info,
>>  			if (verify_parent_transid(io_tree, eb,
>>  						   parent_transid, 0))
>>  				ret = -EIO;
>> -			else if (verify_level_key(fs_info, eb, level,
>> -						  first_key, parent_transid))
>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>> +						first_key, parent_transid))
>>  				ret = -EUCLEAN;
> 
> Actually why is the buffer still held when we return EUCLEAN since in
> read_tree_block if btree_read_extent_buffer_pages returns an error
> free_extent_buffer should be called and it should delete the eb from eb
> cache, no ?

Yes, that's the ideal case.

But there is a hole where we could read tree block without first_key
check, just as mentioned it's scrub.

In that case extent buffer will be cached anyway, so we can't skip the
check here.

Thanks,
Qu

> IMO the correct behavior should be to remove the corrupted
> buffer ASAP and not rely on later validation.
> 
>>  			else
>>  				break;
>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>> index 987a64bc0c66..67a9fe2d29c7 100644
>> --- a/fs/btrfs/disk-io.h
>> +++ b/fs/btrfs/disk-io.h
>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>  struct btrfs_device;
>>  struct btrfs_fs_devices;
>>  
>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>> +			   struct extent_buffer *eb, int level,
>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>  				      u64 parent_transid, int level,
>>  				      struct btrfs_key *first_key);
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  7:57 ` Nikolay Borisov
  2019-03-12  8:05   ` Qu Wenruo
@ 2019-03-12  8:11   ` Nikolay Borisov
  2019-03-12  8:32     ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2019-03-12  8:11 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 12.03.19 г. 9:57 ч., Nikolay Borisov wrote:
> 
> 
> On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
>> [BUG]
>> When reading a file from a fuzzed image, kernel can panic like:
>>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>>   ------------[ cut here ]------------
>>   kernel BUG at fs/btrfs/ctree.h:3500!
>>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>>   Call Trace:
>>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>>    submit_one_bio+0x59/0x80 [btrfs]
>>    extent_read_full_page+0x58/0x80 [btrfs]
>>    generic_file_read_iter+0x2f6/0x9d0
>>    __vfs_read+0x14d/0x1a0
>>    vfs_read+0x8d/0x140
>>    ksys_read+0x52/0xc0
>>    do_syscall_64+0x60/0x210
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> [CAUSE]
>> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>   	...
>>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
>>
>>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>>                   range start 8798638964736 end 8798641262592 length 2297856
>>
>> For the first time tree read, it will not pass verify_level_key() check.
>> But the extent buffer will still be cached.
>>
>> Also there is a pitfall in read_block_for_search(), where a cached
>> extent buffer will not be checked for its level and first key.
>>
>> There are context where we read tree block without verifying its
>> first key, such as scrub.
>>
>> So in that case, a corrupted leaf can sneak in and screw up the kernel.
>>
>> [FIX]
>> Export verify_level_key() as btrfs_verify_level_key() and call it in
>> read_block_for_search() to fill the hole.
>>
>> Please note, this will cause a lot of extra error message if we have a
>> bad tree block in any hot tree, but it's still much better to trigger
>> the final safe net in key_search_validate().
>>

<snip>

>>  				ret = -EIO;
>> -			else if (verify_level_key(fs_info, eb, level,
>> -						  first_key, parent_transid))
>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>> +						first_key, parent_transid))
>>  				ret = -EUCLEAN;
> 
> Actually why is the buffer still held when we return EUCLEAN since in
> read_tree_block if btree_read_extent_buffer_pages returns an error
> free_extent_buffer should be called and it should delete the eb from eb
> cache, no ? IMO the correct behavior should be to remove the corrupted
> buffer ASAP and not rely on later validation.

Actually in this case the call to free_extent_buffer in read_tree_block
won't really clean the buffer since at this point the buffer has refs =
2 (one from alloc_extent_buffer and one from being added to the tree),
however the code in free_extent_buffer won't execute the atomic_cmpxchg
to do the decrement nor will it execute the fix up right after the
spinlock if (refs==2 && EXTENT_BUFFER_STALE) which leaves only a single
call to atomic_dec_and_test in release_extent_buffer which will return
false. That's wrong.


The way to fix it is to either:
a) add a call to atomic_dec(eb->refs) so that the single call to
atomic_dec_and_test frees the eb

b) call free_extent_buffer_stale which does atomic_dec itself, I'm more
inclined to use this option.


> 
>>  			else
>>  				break;
>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>> index 987a64bc0c66..67a9fe2d29c7 100644
>> --- a/fs/btrfs/disk-io.h
>> +++ b/fs/btrfs/disk-io.h
>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>  struct btrfs_device;
>>  struct btrfs_fs_devices;
>>  
>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>> +			   struct extent_buffer *eb, int level,
>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>  				      u64 parent_transid, int level,
>>  				      struct btrfs_key *first_key);
>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  7:45 [PATCH] btrfs: Check the first key and level for cached extent buffer Qu Wenruo
  2019-03-12  7:57 ` Nikolay Borisov
@ 2019-03-12  8:28 ` Nikolay Borisov
  2019-03-12  8:34   ` Qu Wenruo
  1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2019-03-12  8:28 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
> [BUG]
> When reading a file from a fuzzed image, kernel can panic like:
>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>   ------------[ cut here ]------------
>   kernel BUG at fs/btrfs/ctree.h:3500!
>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>   Call Trace:
>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>    submit_one_bio+0x59/0x80 [btrfs]
>    extent_read_full_page+0x58/0x80 [btrfs]
>    generic_file_read_iter+0x2f6/0x9d0
>    __vfs_read+0x14d/0x1a0
>    vfs_read+0x8d/0x140
>    ksys_read+0x52/0xc0
>    do_syscall_64+0x60/0x210
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [CAUSE]
> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>   	...
>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
> 
>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>                   range start 8798638964736 end 8798641262592 length 2297856
> 
> For the first time tree read, it will not pass verify_level_key() check.
> But the extent buffer will still be cached.
> 
> Also there is a pitfall in read_block_for_search(), where a cached
> extent buffer will not be checked for its level and first key.
> 
> There are context where we read tree block without verifying its
> first key, such as scrub.
> 
> So in that case, a corrupted leaf can sneak in and screw up the kernel.
> 
> [FIX]
> Export verify_level_key() as btrfs_verify_level_key() and call it in
> read_block_for_search() to fill the hole.
> 
> Please note, this will cause a lot of extra error message if we have a
> bad tree block in any hot tree, but it's still much better to trigger
> the final safe net in key_search_validate().
> 
> Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Additionally which of those 6 issues contain the image which triggers
this problem or all of them ?

> ---
> There is one remaining report where I can't get any kernel message
> before the VM completely lost reponse.
> https://bugzilla.kernel.org/show_bug.cgi?id=202763
> (out-of-bound access in end_bio_extent_readpage() when mounting and operating a crafted btrfs image)
> 
> And the above case can't be fixed by this patch. Still trying to get a
> good idea of what's going wrong (AKA good kernel message).
> 
> There is also another report, which doesn't provide the fuzzed image.
> https://bugzilla.kernel.org/show_bug.cgi?id=202765
> (NULL pointer dereference when mounting a crafted btrfs image)
> ---
>  fs/btrfs/ctree.c   | 10 ++++++++++
>  fs/btrfs/disk-io.c | 10 +++++-----
>  fs/btrfs/disk-io.h |  3 +++
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 5a6c39b44c84..37447a7c5b4d 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2401,6 +2401,16 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
>  	if (tmp) {
>  		/* first we do an atomic uptodate check */
>  		if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) {
> +			/*
> +			 * Do extra check for first_key, as it's possible the
> +			 * eb is read from the context without first_key
> +			 * requirement
> +			 */
> +			if (btrfs_verify_level_key(fs_info, tmp,
> +					parent_level - 1, &first_key, gen)) {
> +				free_extent_buffer(tmp);
> +				return -EUCLEAN;
> +			}
>  			*eb_ret = tmp;
>  			return 0;
>  		}
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 298b34721bc0..e2a0cb362d28 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -423,9 +423,9 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
>  	return ret;
>  }
>  
> -static int verify_level_key(struct btrfs_fs_info *fs_info,
> -			    struct extent_buffer *eb, int level,
> -			    struct btrfs_key *first_key, u64 parent_transid)
> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *eb, int level,
> +			   struct btrfs_key *first_key, u64 parent_transid)
>  {
>  	int found_level;
>  	struct btrfs_key found_key;
> @@ -500,8 +500,8 @@ static int btree_read_extent_buffer_pages(struct btrfs_fs_info *fs_info,
>  			if (verify_parent_transid(io_tree, eb,
>  						   parent_transid, 0))
>  				ret = -EIO;
> -			else if (verify_level_key(fs_info, eb, level,
> -						  first_key, parent_transid))
> +			else if (btrfs_verify_level_key(fs_info, eb, level,
> +						first_key, parent_transid))
>  				ret = -EUCLEAN;
>  			else
>  				break;
> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
> index 987a64bc0c66..67a9fe2d29c7 100644
> --- a/fs/btrfs/disk-io.h
> +++ b/fs/btrfs/disk-io.h
> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>  struct btrfs_device;
>  struct btrfs_fs_devices;
>  
> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
> +			   struct extent_buffer *eb, int level,
> +			   struct btrfs_key *first_key, u64 parent_transid);
>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>  				      u64 parent_transid, int level,
>  				      struct btrfs_key *first_key);
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  8:11   ` Nikolay Borisov
@ 2019-03-12  8:32     ` Qu Wenruo
  2019-03-12  8:34       ` Nikolay Borisov
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2019-03-12  8:32 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 2019/3/12 下午4:11, Nikolay Borisov wrote:
> 
> 
> On 12.03.19 г. 9:57 ч., Nikolay Borisov wrote:
>>
>>
>> On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
>>> [BUG]
>>> When reading a file from a fuzzed image, kernel can panic like:
>>>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>>>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>>>   ------------[ cut here ]------------
>>>   kernel BUG at fs/btrfs/ctree.h:3500!
>>>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>>>   Call Trace:
>>>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>>>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>>>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>>>    submit_one_bio+0x59/0x80 [btrfs]
>>>    extent_read_full_page+0x58/0x80 [btrfs]
>>>    generic_file_read_iter+0x2f6/0x9d0
>>>    __vfs_read+0x14d/0x1a0
>>>    vfs_read+0x8d/0x140
>>>    ksys_read+0x52/0xc0
>>>    do_syscall_64+0x60/0x210
>>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> [CAUSE]
>>> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>>>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>   	...
>>>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
>>>
>>>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>>>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>>>                   range start 8798638964736 end 8798641262592 length 2297856
>>>
>>> For the first time tree read, it will not pass verify_level_key() check.
>>> But the extent buffer will still be cached.
>>>
>>> Also there is a pitfall in read_block_for_search(), where a cached
>>> extent buffer will not be checked for its level and first key.
>>>
>>> There are context where we read tree block without verifying its
>>> first key, such as scrub.
>>>
>>> So in that case, a corrupted leaf can sneak in and screw up the kernel.
>>>
>>> [FIX]
>>> Export verify_level_key() as btrfs_verify_level_key() and call it in
>>> read_block_for_search() to fill the hole.
>>>
>>> Please note, this will cause a lot of extra error message if we have a
>>> bad tree block in any hot tree, but it's still much better to trigger
>>> the final safe net in key_search_validate().
>>>
> 
> <snip>
> 
>>>  				ret = -EIO;
>>> -			else if (verify_level_key(fs_info, eb, level,
>>> -						  first_key, parent_transid))
>>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>>> +						first_key, parent_transid))
>>>  				ret = -EUCLEAN;
>>
>> Actually why is the buffer still held when we return EUCLEAN since in
>> read_tree_block if btree_read_extent_buffer_pages returns an error
>> free_extent_buffer should be called and it should delete the eb from eb
>> cache, no ? IMO the correct behavior should be to remove the corrupted
>> buffer ASAP and not rely on later validation.
> 
> Actually in this case the call to free_extent_buffer in read_tree_block
> won't really clean the buffer since at this point the buffer has refs =
> 2 (one from alloc_extent_buffer and one from being added to the tree),
> however the code in free_extent_buffer won't execute the atomic_cmpxchg
> to do the decrement nor will it execute the fix up right after the
> spinlock if (refs==2 && EXTENT_BUFFER_STALE) which leaves only a single
> call to atomic_dec_and_test in release_extent_buffer which will return
> false. That's wrong.
> 
> 
> The way to fix it is to either:
> a) add a call to atomic_dec(eb->refs) so that the single call to
> atomic_dec_and_test frees the eb
> 
> b) call free_extent_buffer_stale which does atomic_dec itself, I'm more
> inclined to use this option.

Despite the scrub case I described, there is even a more possible case
to sneak a bad eb into cache tree.

One tree block shared by two snapshots, and one of the parent has bad key.

Anyway, either method you mentioned can't solve either shared tree block
nor the scrub case.

So we still need the check, and keep the key_seach_validate() as final
safe net.

Thanks,
Qu

> 
> 
>>
>>>  			else
>>>  				break;
>>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>>> index 987a64bc0c66..67a9fe2d29c7 100644
>>> --- a/fs/btrfs/disk-io.h
>>> +++ b/fs/btrfs/disk-io.h
>>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>>  struct btrfs_device;
>>>  struct btrfs_fs_devices;
>>>  
>>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>>> +			   struct extent_buffer *eb, int level,
>>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>>  				      u64 parent_transid, int level,
>>>  				      struct btrfs_key *first_key);
>>>
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  8:28 ` Nikolay Borisov
@ 2019-03-12  8:34   ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-03-12  8:34 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 2019/3/12 下午4:28, Nikolay Borisov wrote:
> 
> 
> On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
>> [BUG]
>> When reading a file from a fuzzed image, kernel can panic like:
>>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>>   ------------[ cut here ]------------
>>   kernel BUG at fs/btrfs/ctree.h:3500!
>>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>>   Call Trace:
>>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>>    submit_one_bio+0x59/0x80 [btrfs]
>>    extent_read_full_page+0x58/0x80 [btrfs]
>>    generic_file_read_iter+0x2f6/0x9d0
>>    __vfs_read+0x14d/0x1a0
>>    vfs_read+0x8d/0x140
>>    ksys_read+0x52/0xc0
>>    do_syscall_64+0x60/0x210
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> [CAUSE]
>> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>   	...
>>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
>>
>>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>>                   range start 8798638964736 end 8798641262592 length 2297856
>>
>> For the first time tree read, it will not pass verify_level_key() check.
>> But the extent buffer will still be cached.
>>
>> Also there is a pitfall in read_block_for_search(), where a cached
>> extent buffer will not be checked for its level and first key.
>>
>> There are context where we read tree block without verifying its
>> first key, such as scrub.
>>
>> So in that case, a corrupted leaf can sneak in and screw up the kernel.
>>
>> [FIX]
>> Export verify_level_key() as btrfs_verify_level_key() and call it in
>> read_block_for_search() to fill the hole.
>>
>> Please note, this will cause a lot of extra error message if we have a
>> bad tree block in any hot tree, but it's still much better to trigger
>> the final safe net in key_search_validate().
>>
>> Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> 
> Additionally which of those 6 issues contain the image which triggers
> this problem or all of them ?

The first contains the image which triggers the exact bug described in
[BUG] section.

The remaining images triggers the same first_key-mismatch-but-cached
problem, and can be rejected by this patch.

Thanks,
Qu

> 
>> ---
>> There is one remaining report where I can't get any kernel message
>> before the VM completely lost reponse.
>> https://bugzilla.kernel.org/show_bug.cgi?id=202763
>> (out-of-bound access in end_bio_extent_readpage() when mounting and operating a crafted btrfs image)
>>
>> And the above case can't be fixed by this patch. Still trying to get a
>> good idea of what's going wrong (AKA good kernel message).
>>
>> There is also another report, which doesn't provide the fuzzed image.
>> https://bugzilla.kernel.org/show_bug.cgi?id=202765
>> (NULL pointer dereference when mounting a crafted btrfs image)
>> ---
>>  fs/btrfs/ctree.c   | 10 ++++++++++
>>  fs/btrfs/disk-io.c | 10 +++++-----
>>  fs/btrfs/disk-io.h |  3 +++
>>  3 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index 5a6c39b44c84..37447a7c5b4d 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2401,6 +2401,16 @@ read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
>>  	if (tmp) {
>>  		/* first we do an atomic uptodate check */
>>  		if (btrfs_buffer_uptodate(tmp, gen, 1) > 0) {
>> +			/*
>> +			 * Do extra check for first_key, as it's possible the
>> +			 * eb is read from the context without first_key
>> +			 * requirement
>> +			 */
>> +			if (btrfs_verify_level_key(fs_info, tmp,
>> +					parent_level - 1, &first_key, gen)) {
>> +				free_extent_buffer(tmp);
>> +				return -EUCLEAN;
>> +			}
>>  			*eb_ret = tmp;
>>  			return 0;
>>  		}
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 298b34721bc0..e2a0cb362d28 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -423,9 +423,9 @@ static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info,
>>  	return ret;
>>  }
>>  
>> -static int verify_level_key(struct btrfs_fs_info *fs_info,
>> -			    struct extent_buffer *eb, int level,
>> -			    struct btrfs_key *first_key, u64 parent_transid)
>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>> +			   struct extent_buffer *eb, int level,
>> +			   struct btrfs_key *first_key, u64 parent_transid)
>>  {
>>  	int found_level;
>>  	struct btrfs_key found_key;
>> @@ -500,8 +500,8 @@ static int btree_read_extent_buffer_pages(struct btrfs_fs_info *fs_info,
>>  			if (verify_parent_transid(io_tree, eb,
>>  						   parent_transid, 0))
>>  				ret = -EIO;
>> -			else if (verify_level_key(fs_info, eb, level,
>> -						  first_key, parent_transid))
>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>> +						first_key, parent_transid))
>>  				ret = -EUCLEAN;
>>  			else
>>  				break;
>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>> index 987a64bc0c66..67a9fe2d29c7 100644
>> --- a/fs/btrfs/disk-io.h
>> +++ b/fs/btrfs/disk-io.h
>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>  struct btrfs_device;
>>  struct btrfs_fs_devices;
>>  
>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>> +			   struct extent_buffer *eb, int level,
>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>  				      u64 parent_transid, int level,
>>  				      struct btrfs_key *first_key);
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  8:32     ` Qu Wenruo
@ 2019-03-12  8:34       ` Nikolay Borisov
  2019-03-12  8:39         ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Nikolay Borisov @ 2019-03-12  8:34 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 12.03.19 г. 10:32 ч., Qu Wenruo wrote:
> 
> 
> On 2019/3/12 下午4:11, Nikolay Borisov wrote:
>>
>>
>> On 12.03.19 г. 9:57 ч., Nikolay Borisov wrote:
>>>
>>>
>>> On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
>>>> [BUG]
>>>> When reading a file from a fuzzed image, kernel can panic like:
>>>>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>>>>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>>>>   ------------[ cut here ]------------
>>>>   kernel BUG at fs/btrfs/ctree.h:3500!
>>>>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>>>>   Call Trace:
>>>>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>>>>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>>>>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>>>>    submit_one_bio+0x59/0x80 [btrfs]
>>>>    extent_read_full_page+0x58/0x80 [btrfs]
>>>>    generic_file_read_iter+0x2f6/0x9d0
>>>>    __vfs_read+0x14d/0x1a0
>>>>    vfs_read+0x8d/0x140
>>>>    ksys_read+0x52/0xc0
>>>>    do_syscall_64+0x60/0x210
>>>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>
>>>> [CAUSE]
>>>> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>>>>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>>>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>>   	...
>>>>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
>>>>
>>>>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>>>>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>>>>                   range start 8798638964736 end 8798641262592 length 2297856
>>>>
>>>> For the first time tree read, it will not pass verify_level_key() check.
>>>> But the extent buffer will still be cached.
>>>>
>>>> Also there is a pitfall in read_block_for_search(), where a cached
>>>> extent buffer will not be checked for its level and first key.
>>>>
>>>> There are context where we read tree block without verifying its
>>>> first key, such as scrub.
>>>>
>>>> So in that case, a corrupted leaf can sneak in and screw up the kernel.
>>>>
>>>> [FIX]
>>>> Export verify_level_key() as btrfs_verify_level_key() and call it in
>>>> read_block_for_search() to fill the hole.
>>>>
>>>> Please note, this will cause a lot of extra error message if we have a
>>>> bad tree block in any hot tree, but it's still much better to trigger
>>>> the final safe net in key_search_validate().
>>>>
>>
>> <snip>
>>
>>>>  				ret = -EIO;
>>>> -			else if (verify_level_key(fs_info, eb, level,
>>>> -						  first_key, parent_transid))
>>>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>>>> +						first_key, parent_transid))
>>>>  				ret = -EUCLEAN;
>>>
>>> Actually why is the buffer still held when we return EUCLEAN since in
>>> read_tree_block if btree_read_extent_buffer_pages returns an error
>>> free_extent_buffer should be called and it should delete the eb from eb
>>> cache, no ? IMO the correct behavior should be to remove the corrupted
>>> buffer ASAP and not rely on later validation.
>>
>> Actually in this case the call to free_extent_buffer in read_tree_block
>> won't really clean the buffer since at this point the buffer has refs =
>> 2 (one from alloc_extent_buffer and one from being added to the tree),
>> however the code in free_extent_buffer won't execute the atomic_cmpxchg
>> to do the decrement nor will it execute the fix up right after the
>> spinlock if (refs==2 && EXTENT_BUFFER_STALE) which leaves only a single
>> call to atomic_dec_and_test in release_extent_buffer which will return
>> false. That's wrong.
>>
>>
>> The way to fix it is to either:
>> a) add a call to atomic_dec(eb->refs) so that the single call to
>> atomic_dec_and_test frees the eb
>>
>> b) call free_extent_buffer_stale which does atomic_dec itself, I'm more
>> inclined to use this option.
> 
> Despite the scrub case I described, there is even a more possible case
> to sneak a bad eb into cache tree.
> 
> One tree block shared by two snapshots, and one of the parent has bad key.
> 
> Anyway, either method you mentioned can't solve either shared tree block
> nor the scrub case.
> 
> So we still need the check, and keep the key_seach_validate() as final
> safe net.

Still, there seems to be a bug in the way failed eb's are handled during
normal read. Also your commit log doesn't describe how those ebs can
sneak in. Please describe the call chains in v2

> 
> Thanks,
> Qu
> 
>>
>>
>>>
>>>>  			else
>>>>  				break;
>>>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>>>> index 987a64bc0c66..67a9fe2d29c7 100644
>>>> --- a/fs/btrfs/disk-io.h
>>>> +++ b/fs/btrfs/disk-io.h
>>>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>>>  struct btrfs_device;
>>>>  struct btrfs_fs_devices;
>>>>  
>>>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>>>> +			   struct extent_buffer *eb, int level,
>>>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>>>  				      u64 parent_transid, int level,
>>>>  				      struct btrfs_key *first_key);
>>>>
>>>
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] btrfs: Check the first key and level for cached extent buffer
  2019-03-12  8:34       ` Nikolay Borisov
@ 2019-03-12  8:39         ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2019-03-12  8:39 UTC (permalink / raw)
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs; +Cc: Yoon Jungyeon



On 2019/3/12 下午4:34, Nikolay Borisov wrote:
> 
> 
> On 12.03.19 г. 10:32 ч., Qu Wenruo wrote:
>>
>>
>> On 2019/3/12 下午4:11, Nikolay Borisov wrote:
>>>
>>>
>>> On 12.03.19 г. 9:57 ч., Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 12.03.19 г. 9:45 ч., Qu Wenruo wrote:
>>>>> [BUG]
>>>>> When reading a file from a fuzzed image, kernel can panic like:
>>>>>   BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
>>>>>   assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
>>>>>   ------------[ cut here ]------------
>>>>>   kernel BUG at fs/btrfs/ctree.h:3500!
>>>>>   invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>>>   RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
>>>>>   Call Trace:
>>>>>    btrfs_lookup_csum+0x52/0x150 [btrfs]
>>>>>    __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
>>>>>    btrfs_submit_bio_hook+0x103/0x170 [btrfs]
>>>>>    submit_one_bio+0x59/0x80 [btrfs]
>>>>>    extent_read_full_page+0x58/0x80 [btrfs]
>>>>>    generic_file_read_iter+0x2f6/0x9d0
>>>>>    __vfs_read+0x14d/0x1a0
>>>>>    vfs_read+0x8d/0x140
>>>>>    ksys_read+0x52/0xc0
>>>>>    do_syscall_64+0x60/0x210
>>>>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>
>>>>> [CAUSE]
>>>>> The fuzzed image has a corrupted leaf whose first key doesn't match with its parent:
>>>>>   checksum tree key (CSUM_TREE ROOT_ITEM 0)
>>>>>   node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
>>>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>>>   	...
>>>>>           key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
>>>>>
>>>>>   leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
>>>>>   leaf 29761536 flags 0x1(WRITTEN) backref revision 1
>>>>>   fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
>>>>>   chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
>>>>>           item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
>>>>>                   range start 8798638964736 end 8798641262592 length 2297856
>>>>>
>>>>> For the first time tree read, it will not pass verify_level_key() check.
>>>>> But the extent buffer will still be cached.
>>>>>
>>>>> Also there is a pitfall in read_block_for_search(), where a cached
>>>>> extent buffer will not be checked for its level and first key.
>>>>>
>>>>> There are context where we read tree block without verifying its
>>>>> first key, such as scrub.
>>>>>
>>>>> So in that case, a corrupted leaf can sneak in and screw up the kernel.
>>>>>
>>>>> [FIX]
>>>>> Export verify_level_key() as btrfs_verify_level_key() and call it in
>>>>> read_block_for_search() to fill the hole.
>>>>>
>>>>> Please note, this will cause a lot of extra error message if we have a
>>>>> bad tree block in any hot tree, but it's still much better to trigger
>>>>> the final safe net in key_search_validate().
>>>>>
>>>
>>> <snip>
>>>
>>>>>  				ret = -EIO;
>>>>> -			else if (verify_level_key(fs_info, eb, level,
>>>>> -						  first_key, parent_transid))
>>>>> +			else if (btrfs_verify_level_key(fs_info, eb, level,
>>>>> +						first_key, parent_transid))
>>>>>  				ret = -EUCLEAN;
>>>>
>>>> Actually why is the buffer still held when we return EUCLEAN since in
>>>> read_tree_block if btree_read_extent_buffer_pages returns an error
>>>> free_extent_buffer should be called and it should delete the eb from eb
>>>> cache, no ? IMO the correct behavior should be to remove the corrupted
>>>> buffer ASAP and not rely on later validation.
>>>
>>> Actually in this case the call to free_extent_buffer in read_tree_block
>>> won't really clean the buffer since at this point the buffer has refs =
>>> 2 (one from alloc_extent_buffer and one from being added to the tree),
>>> however the code in free_extent_buffer won't execute the atomic_cmpxchg
>>> to do the decrement nor will it execute the fix up right after the
>>> spinlock if (refs==2 && EXTENT_BUFFER_STALE) which leaves only a single
>>> call to atomic_dec_and_test in release_extent_buffer which will return
>>> false. That's wrong.
>>>
>>>
>>> The way to fix it is to either:
>>> a) add a call to atomic_dec(eb->refs) so that the single call to
>>> atomic_dec_and_test frees the eb
>>>
>>> b) call free_extent_buffer_stale which does atomic_dec itself, I'm more
>>> inclined to use this option.
>>
>> Despite the scrub case I described, there is even a more possible case
>> to sneak a bad eb into cache tree.
>>
>> One tree block shared by two snapshots, and one of the parent has bad key.
>>
>> Anyway, either method you mentioned can't solve either shared tree block
>> nor the scrub case.
>>
>> So we still need the check, and keep the key_seach_validate() as final
>> safe net.
> 
> Still, there seems to be a bug in the way failed eb's are handled during
> normal read. Also your commit log doesn't describe how those ebs can
> sneak in. Please describe the call chains in v2

Sure, I'll add that part and describe the reason why we need to do the
check here.

In fact after I send out the btrfs-progs patch to discard bad tree
blocks, I tried the same way in kernel, but as you mentioned, it's
different in kernel and needs extra care to handle.

Thanks,
Qu

> 
>>
>> Thanks,
>> Qu
>>
>>>
>>>
>>>>
>>>>>  			else
>>>>>  				break;
>>>>> diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
>>>>> index 987a64bc0c66..67a9fe2d29c7 100644
>>>>> --- a/fs/btrfs/disk-io.h
>>>>> +++ b/fs/btrfs/disk-io.h
>>>>> @@ -39,6 +39,9 @@ static inline u64 btrfs_sb_offset(int mirror)
>>>>>  struct btrfs_device;
>>>>>  struct btrfs_fs_devices;
>>>>>  
>>>>> +int btrfs_verify_level_key(struct btrfs_fs_info *fs_info,
>>>>> +			   struct extent_buffer *eb, int level,
>>>>> +			   struct btrfs_key *first_key, u64 parent_transid);
>>>>>  struct extent_buffer *read_tree_block(struct btrfs_fs_info *fs_info, u64 bytenr,
>>>>>  				      u64 parent_transid, int level,
>>>>>  				      struct btrfs_key *first_key);
>>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-03-12  8:40 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-12  7:45 [PATCH] btrfs: Check the first key and level for cached extent buffer Qu Wenruo
2019-03-12  7:57 ` Nikolay Borisov
2019-03-12  8:05   ` Qu Wenruo
2019-03-12  8:11   ` Nikolay Borisov
2019-03-12  8:32     ` Qu Wenruo
2019-03-12  8:34       ` Nikolay Borisov
2019-03-12  8:39         ` Qu Wenruo
2019-03-12  8:28 ` Nikolay Borisov
2019-03-12  8:34   ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.