Linux-BTRFS Archive on lore.kernel.org
* [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
@ 2019-09-13  1:51 Qu Wenruo
  2019-09-13  1:51 ` [PATCH 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls Qu Wenruo
  2019-09-13 12:57 ` [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Nikolay Borisov
  0 siblings, 2 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-09-13  1:51 UTC
  To: linux-btrfs; +Cc: Josef Bacik

[BUG]
In the following case with qgroup enabled, if an error happens after
we have reserved delalloc space, the error handling path can leak
qgroup data space:

From btrfs_truncate_block() in inode.c:

	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
					   block_start, blocksize);
	if (ret)
		goto out;

again:
	page = find_or_create_page(mapping, index, mask);
	if (!page) {
		btrfs_delalloc_release_space(inode, data_reserved,
					     block_start, blocksize, true);
		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
		ret = -ENOMEM;
		goto out;
	}

[CAUSE]
In the above case, btrfs_delalloc_reserve_space() will call
btrfs_qgroup_reserve_data() and mark the io_tree range with the
EXTENT_QGROUP_RESERVED flag.

In the error handling path, btrfs_delalloc_release_space() calls
btrfs_qgroup_free_data(), which should clear the EXTENT_QGROUP_RESERVED
flag and reduce the reserved data space according to the cleared range.

However due to a completion bug, btrfs_qgroup_free_data() will clear
EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, rather
than the correct BTRFS_I(inode)->io_tree.
Since io_failure_tree is never marked with that flag,
btrfs_qgroup_free_data() will not free any data reserved space at all,
causing a leakage.

All of such error handling cases can only be triggered some errors not
from qgroup, so regular EDQUOT error won't trigger the bug.
Normally we need error injection to trigger such a bug.
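
As a rough illustration of the cause described above, the userspace
sketch below models the two trees as per-block flag arrays. Every toy_*
name is hypothetical and none of this is kernel code; the sketch only
assumes, as the text says, that freeing gives back exactly the bytes
whose flag was actually cleared in the tree it was pointed at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_BLOCKS 16
#define TOY_BLOCKSIZE 4096u

/* Hypothetical stand-in for an extent io tree: one flag per block. */
struct toy_tree {
	bool qgroup_reserved[TOY_NR_BLOCKS];
};

/* Hypothetical stand-in for a btrfs inode with both trees. */
struct toy_inode {
	struct toy_tree io_tree;
	struct toy_tree io_failure_tree;
	uint64_t reserved_bytes;	/* qgroup data reservation counter */
};

/* Set the flag on blocks [first, first + nr); return bytes newly flagged. */
static uint64_t toy_set_bits(struct toy_tree *tree, size_t first, size_t nr)
{
	uint64_t changed = 0;

	for (size_t i = first; i < first + nr && i < TOY_NR_BLOCKS; i++) {
		if (!tree->qgroup_reserved[i]) {
			tree->qgroup_reserved[i] = true;
			changed += TOY_BLOCKSIZE;
		}
	}
	return changed;
}

/* Clear the flag on the same range; return bytes whose flag was really set. */
static uint64_t toy_clear_bits(struct toy_tree *tree, size_t first, size_t nr)
{
	uint64_t changed = 0;

	for (size_t i = first; i < first + nr && i < TOY_NR_BLOCKS; i++) {
		if (tree->qgroup_reserved[i]) {
			tree->qgroup_reserved[i] = false;
			changed += TOY_BLOCKSIZE;
		}
	}
	return changed;
}

/* Reserve: flag the range in io_tree and account the newly flagged bytes. */
static void toy_reserve(struct toy_inode *inode, size_t first, size_t nr)
{
	inode->reserved_bytes += toy_set_bits(&inode->io_tree, first, nr);
}

/* Free: only bytes whose flag was cleared in @tree are given back. */
static uint64_t toy_free(struct toy_inode *inode, struct toy_tree *tree,
			 size_t first, size_t nr)
{
	uint64_t freed = toy_clear_bits(tree, first, nr);

	assert(inode->reserved_bytes >= freed);
	inode->reserved_bytes -= freed;
	return freed;
}

/* Reserve one block, free it through the chosen tree, return what is left. */
static uint64_t toy_leaked_bytes(bool use_failure_tree)
{
	struct toy_inode inode = { 0 };

	toy_reserve(&inode, 0, 1);
	toy_free(&inode, use_failure_tree ? &inode.io_failure_tree
					  : &inode.io_tree, 0, 1);
	return inode.reserved_bytes;
}
```

In this model toy_leaked_bytes(true) leaves one block (4096 bytes)
accounted as reserved, while toy_leaked_bytes(false) leaves nothing
behind, which mirrors the one-line change in the diff.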

[FIX]
Fix the wrong target io_tree.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 2891b57b9e1e..64bdc3e3652d 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3492,7 +3492,7 @@ static int qgroup_free_reserved_data(struct inode *inode,
 		 * EXTENT_QGROUP_RESERVED, we won't double free.
 		 * So not need to rush.
 		 */
-		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_failure_tree,
+		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
 				free_start, free_start + free_len - 1,
 				EXTENT_QGROUP_RESERVED, &changeset);
 		if (ret < 0)
-- 
2.23.0



* [PATCH 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
  2019-09-13  1:51 [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Qu Wenruo
@ 2019-09-13  1:51 ` Qu Wenruo
  2019-09-13 13:24   ` Nikolay Borisov
  2019-09-13 12:57 ` [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Nikolay Borisov
  1 sibling, 1 reply; 5+ messages in thread
From: Qu Wenruo @ 2019-09-13  1:51 UTC
  To: linux-btrfs

[BUG]
The following script can cause a btrfs qgroup data space leak:

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  btrfs subv create $mnt/subv
  btrfs quota en $mnt
  btrfs quota rescan -w $mnt
  btrfs qgroup limit 128m $mnt/subv

  for (( i = 0; i < 3; i++)); do
          # Create 3 64M holes for later fallocate to fail
          truncate -s 192m $mnt/subv/file
          xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
          xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
          sync

          # it's supposed to fail, and each failure will leak at least 64M
          # data space
          xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
          rm $mnt/subv/file
          sync
  done

  # Shouldn't fail after we removed the file
  xfs_io -f -c "falloc 0 64m" $mnt/subv/file

[CAUSE]
Btrfs qgroup data reserve code allows multiple reserve happen on a
single extent_changeset:

The only usage is in btrfs_fallocate():
	struct extent_changeset *data_reserved = NULL;
	btrfs_qgroup_reserve_data(inode, &data_reserved,
				  range_start, range_len);
	...
	btrfs_qgroup_reserve_data(inode, &data_reserved,
				  new_range_start, new_range_len);
	extent_changeset_free(data_reserved);

However in btrfs_qgroup_reserve_data(), if one of the call failed, it
will cleanup all reserved space.
The cleanup itself is OK, but it only cleans up all
EXTENT_QGROUP_RESERVED flag, forget to release the reserved bytes.

So if multiple btrfs_qgroup_reserve_data() get called, and the last one
failed, then previously reserved data space will get leaked.

And due to the fact that EXTENT_QGROUP_RESERVED flag is cleaned
correctly, btrfs_qgroup_check_reserved_leak() won't catch the leakage.
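
The sequence described above can be sketched in userspace as follows.
This is only a toy model under stated assumptions: the toy_* names are
hypothetical, the io_tree flags and the extent_changeset are merged into
one structure, sizes are counted in abstract units instead of bytes, and
the limit check is a plain comparison. The free_on_error switch stands
in for the behaviour with and without the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_UNITS 128

/* Hypothetical merge of the io_tree flags and the extent_changeset. */
struct toy_changeset {
	bool flagged[TOY_NR_UNITS];
	uint64_t bytes_changed;		/* units recorded by all calls so far */
};

/* Hypothetical qgroup with a limit and a data reservation counter. */
struct toy_qgroup {
	uint64_t limit;
	uint64_t reserved;
};

static int toy_reserve_data(struct toy_qgroup *qg, struct toy_changeset *cs,
			    size_t first, size_t nr, bool free_on_error)
{
	/* Units already reserved by earlier calls on this changeset. */
	uint64_t orig_reserved = cs->bytes_changed;
	uint64_t to_reserve;

	for (size_t i = first; i < first + nr && i < TOY_NR_UNITS; i++) {
		if (!cs->flagged[i]) {
			cs->flagged[i] = true;
			cs->bytes_changed++;
		}
	}
	to_reserve = cs->bytes_changed - orig_reserved;

	if (qg->reserved + to_reserve <= qg->limit) {
		qg->reserved += to_reserve;
		return 0;
	}

	/* Cleanup: drop *all* flags recorded in the changeset. */
	memset(cs, 0, sizeof(*cs));
	if (free_on_error) {
		/* With the fix: also give back what earlier calls reserved. */
		assert(qg->reserved >= orig_reserved);
		qg->reserved -= orig_reserved;
	}
	return -1;	/* stands in for an error such as -EDQUOT */
}

/* Two reserve calls on one changeset; the second exceeds the limit. */
static uint64_t toy_reserved_after_failure(bool free_on_error)
{
	struct toy_qgroup qg = { .limit = 100, .reserved = 0 };
	struct toy_changeset cs = { 0 };
	int ret;

	ret = toy_reserve_data(&qg, &cs, 0, 64, free_on_error);
	assert(ret == 0);
	ret = toy_reserve_data(&qg, &cs, 64, 64, free_on_error);
	assert(ret != 0);
	return qg.reserved;
}
```

Without the extra free, toy_reserved_after_failure(false) still has 64
units accounted although every flag is gone, so a flag-based leak check
has nothing left to find. With it, toy_reserved_after_failure(true)
ends at 0.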

[FIX]
Also free previously reserved data bytes when btrfs_qgroup_reserve_data
fails.

Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/qgroup.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 64bdc3e3652d..59f6a9981087 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
 	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
 		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
 				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
+	/* Also free data bytes of already reserved one */
+	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
+				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
 	extent_changeset_release(reserved);
 	return ret;
 }
-- 
2.23.0



* Re: [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
  2019-09-13  1:51 [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Qu Wenruo
  2019-09-13  1:51 ` [PATCH 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls Qu Wenruo
@ 2019-09-13 12:57 ` Nikolay Borisov
  2019-09-13 13:02   ` Qu Wenruo
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolay Borisov @ 2019-09-13 12:57 UTC
  To: Qu Wenruo, linux-btrfs; +Cc: Josef Bacik



On 13.09.19 at 4:51, Qu Wenruo wrote:
> [BUG]
> In the following case with qgroup enabled, if an error happens after
> we have reserved delalloc space, the error handling path can leak
> qgroup data space:
> 
> From btrfs_truncate_block() in inode.c:
> 
> 	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
> 					   block_start, blocksize);
> 	if (ret)
> 		goto out;
> 
> again:
> 	page = find_or_create_page(mapping, index, mask);
> 	if (!page) {
> 		btrfs_delalloc_release_space(inode, data_reserved,
> 					     block_start, blocksize, true);
> 		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
> 		ret = -ENOMEM;
> 		goto out;
> 	}
> 
> [CAUSE]
> In the above case, btrfs_delalloc_reserve_space() will call
> btrfs_qgroup_reserve_data() and mark the io_tree range with the
> EXTENT_QGROUP_RESERVED flag.
> 
> In the error handling path, btrfs_delalloc_release_space() calls
> btrfs_qgroup_free_data(), which should clear the EXTENT_QGROUP_RESERVED
> flag and reduce the reserved data space according to the cleared range.
> 
> However due to a completion bug, btrfs_qgroup_free_data() will clear
> EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, rather
> than the correct BTRFS_I(inode)->io_tree.

This is a bit confusing because the error is actually in
qgroup_free_reserved_data, which is called from
__btrfs_qgroup_release_data. But in the latter function there is also a
call to clear_record_extent_bits with the correct tree. Just fix the
function name by using qgroup_free_reserved_data.

> Since io_failure_tree is never marked with that flag,
> btrfs_qgroup_free_data() will not free any data reserved space at all,
> causing a leakage.
> 
> All of such error handling cases can only be triggered some errors not

I take it you meant:

This error handling can only be triggered by errors outside of qgroup,
e.g. EDQUOT can't trigger the bug?

The first part of the sentence is hard to parse.

> from qgroup, so regular EDQUOT error won't trigger the bug.
> Normally we need error injection to trigger such a bug.
> 
> [FIX]
> Fix the wrong target io_tree.
> 
> Reported-by: Josef Bacik <josef@toxicpanda.com>
> Fixes: bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/qgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 2891b57b9e1e..64bdc3e3652d 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3492,7 +3492,7 @@ static int qgroup_free_reserved_data(struct inode *inode,
>  		 * EXTENT_QGROUP_RESERVED, we won't double free.
>  		 * So not need to rush.
>  		 */
> -		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_failure_tree,
> +		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
>  				free_start, free_start + free_len - 1,
>  				EXTENT_QGROUP_RESERVED, &changeset);
>  		if (ret < 0)
> 


* Re: [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
  2019-09-13 12:57 ` [PATCH 1/2] btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space Nikolay Borisov
@ 2019-09-13 13:02   ` Qu Wenruo
  0 siblings, 0 replies; 5+ messages in thread
From: Qu Wenruo @ 2019-09-13 13:02 UTC
  To: Nikolay Borisov, Qu Wenruo, linux-btrfs; +Cc: Josef Bacik



On 2019/9/13 8:57 PM, Nikolay Borisov wrote:
>
>
> On 13.09.19 at 4:51, Qu Wenruo wrote:
>> [BUG]
>> In the following case with qgroup enabled, if an error happens after
>> we have reserved delalloc space, the error handling path can leak
>> qgroup data space:
>>
>> From btrfs_truncate_block() in inode.c:
>>
>> 	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
>> 					   block_start, blocksize);
>> 	if (ret)
>> 		goto out;
>>
>> again:
>> 	page = find_or_create_page(mapping, index, mask);
>> 	if (!page) {
>> 		btrfs_delalloc_release_space(inode, data_reserved,
>> 					     block_start, blocksize, true);
>> 		btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true);
>> 		ret = -ENOMEM;
>> 		goto out;
>> 	}
>>
>> [CAUSE]
>> In the above case, btrfs_delalloc_reserve_space() will call
>> btrfs_qgroup_reserve_data() and mark the io_tree range with the
>> EXTENT_QGROUP_RESERVED flag.
>>
>> In the error handling path, btrfs_delalloc_release_space() calls
>> btrfs_qgroup_free_data(), which should clear the EXTENT_QGROUP_RESERVED
>> flag and reduce the reserved data space according to the cleared range.
>>
>> However due to a completion bug, btrfs_qgroup_free_data() will clear
>> EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, rather
>> than the correct BTRFS_I(inode)->io_tree.
>
> This is a bit confusing because the error is actually in
> qgroup_free_reserved_data, which is called from
> __btrfs_qgroup_release_data. But in the latter function there is also a
> call to clear_record_extent_bits with the correct tree. Just fix the
> function name by using qgroup_free_reserved_data.

Right, I skipped some callers here, as the call chain depends not only
on btrfs_qgroup_free_data() but also on its parameters.
E.g. we only go to qgroup_free_reserved_data() when reserved is non-NULL.

>
>> Since io_failure_tree is never marked with that flag,
>> btrfs_qgroup_free_data() will not free any data reserved space at all,
>> causing a leakage.
>>
>> All of such error handling cases can only be triggered some errors not
>
> I take it you meant:
>
> This error handling can only be triggered by errors outside of qgroup,
> e.g. EDQUOT can't trigger the bug?

Right.

I'll change it to something like "such leakage can only be triggered by
errors outside of qgroup."

Thanks,
Qu

>
> The first part of the sentence is hard to parse.
>
>> from qgroup, so regular EDQUOT error won't trigger the bug.
>> Normally we need error injection to trigger such a bug.
>>
>> [FIX]
>> Fix the wrong target io_tree.
>>
>> Reported-by: Josef Bacik <josef@toxicpanda.com>
>> Fixes: bc42bda22345 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/qgroup.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>> index 2891b57b9e1e..64bdc3e3652d 100644
>> --- a/fs/btrfs/qgroup.c
>> +++ b/fs/btrfs/qgroup.c
>> @@ -3492,7 +3492,7 @@ static int qgroup_free_reserved_data(struct inode *inode,
>>  		 * EXTENT_QGROUP_RESERVED, we won't double free.
>>  		 * So not need to rush.
>>  		 */
>> -		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_failure_tree,
>> +		ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree,
>>  				free_start, free_start + free_len - 1,
>>  				EXTENT_QGROUP_RESERVED, &changeset);
>>  		if (ret < 0)
>>


* Re: [PATCH 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
  2019-09-13  1:51 ` [PATCH 2/2] btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls Qu Wenruo
@ 2019-09-13 13:24   ` Nikolay Borisov
  0 siblings, 0 replies; 5+ messages in thread
From: Nikolay Borisov @ 2019-09-13 13:24 UTC
  To: Qu Wenruo, linux-btrfs



On 13.09.19 at 4:51, Qu Wenruo wrote:
> [BUG]
> The following script can cause a btrfs qgroup data space leak:
> 
>   mkfs.btrfs -f $dev
>   mount $dev -o nospace_cache $mnt
> 
>   btrfs subv create $mnt/subv
>   btrfs quota en $mnt
>   btrfs quota rescan -w $mnt
>   btrfs qgroup limit 128m $mnt/subv
> 
>   for (( i = 0; i < 3; i++)); do
>           # Create 3 64M holes for later fallocate to fail
>           truncate -s 192m $mnt/subv/file
>           xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null
>           xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null
>           sync
> 
>           # it's supposed to fail, and each failure will leak at least 64M
>           # data space
>           xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null
>           rm $mnt/subv/file
>           sync
>   done
> 
>   # Shouldn't fail after we removed the file
>   xfs_io -f -c "falloc 0 64m" $mnt/subv/file
> 
> [CAUSE]
> Btrfs qgroup data reserve code allows multiple reserve happen on a
                                                  ^
                                                 reservations to happen
> single extent_changeset:
> 
> The only usage is in btrfs_fallocate():
> 	struct extent_changeset *data_reserved = NULL;
> 	btrfs_qgroup_reserve_data(inode, &data_reserved,
> 				  range_start, range_len);
> 	...
> 	btrfs_qgroup_reserve_data(inode, &data_reserved,
> 				  new_range_start, new_range_len);
> 	extent_changeset_free(data_reserved);

I take it you refer to the while() loop in btrfs_fallocate. The code
above is really just a _VERY_ condensed version. extent_changeset_free
is at the end of the function. Instead of putting random lines of code
just explicitly state it, something along the lines of:

"The only such pattern is in btrfs_fallocate in the main while loop in
that function".

> 
> However in btrfs_qgroup_reserve_data(), if one of the call failed, it
> will cleanup all reserved space.
> The cleanup itself is OK, but it only cleans up all
> EXTENT_QGROUP_RESERVED flag, forget to release the reserved bytes.
> 
> So if multiple btrfs_qgroup_reserve_data() get called, and the last one
> failed, then previously reserved data space will get leaked.
> 
> And due to the fact that EXTENT_QGROUP_RESERVED flag is cleaned
> correctly, btrfs_qgroup_check_reserved_leak() won't catch the leakage.

How about rephrasing the above 3 paragraphs along the lines of:

"btrfs_qgroup_reserve_data's error handling has a bug in that on error
it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED flag and
doesn't free the reserved bytes. This behavior has a two-fold effect:

 1. Clearing EXTENT_QGROUP_RESERVED ranges prevents
    btrfs_qgroup_check_reserved_leak from catching the leakage.
 2. The previously reserved data bytes are leaked.


The bug manifests when N calls to btrfs_qgroup_reserve_data are made and
the last one fails, leaking space allocated in the previous ones.
"


> 
> [FIX]
> Also free previously reserved data bytes when btrfs_qgroup_reserve_data
> fails.
> 
> Fixes: 524725537023 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/qgroup.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index 64bdc3e3652d..59f6a9981087 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -3448,6 +3448,9 @@ int btrfs_qgroup_reserve_data(struct inode *inode,
>  	while ((unode = ulist_next(&reserved->range_changed, &uiter)))
>  		clear_extent_bit(&BTRFS_I(inode)->io_tree, unode->val,
>  				 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL);
> +	/* Also free data bytes of already reserved one */
> +	btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid,
> +				  orig_reserved, BTRFS_QGROUP_RSV_DATA);
>  	extent_changeset_release(reserved);
>  	return ret;
>  }
> 
