All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent()
@ 2021-03-03 10:41 Qu Wenruo
  2021-03-03 10:41 ` [PATCH v2 2/2] btrfs: fix qgroup data rsv leak caused by falloc failure Qu Wenruo
  2021-03-08 16:44 ` [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-03-03 10:41 UTC (permalink / raw)
  To: linux-btrfs

There is a piece of weird code in insert_prealloc_file_extent(), which
looks like:

	ret = btrfs_qgroup_release_data(inode, file_offset, len);
	if (ret < 0)
		return ERR_PTR(ret);
	if (trans) {
		ret = insert_reserved_file_extent(trans, inode,
						  file_offset, &stack_fi,
						  true, ret);
	...
	}
	extent_info.is_new_extent = true;
	extent_info.qgroup_reserved = ret;
	...

Note how the variable @ret is abused here, and if anyone is adding code
just after btrfs_qgroup_release_data() call, it's super easy to
overwrite the @ret and cause tons of qgroup related bugs.

Fix such abuse by introducing new variable @qgroup_released, so that we
won't reuse the existing variable @ret.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- None
---
 fs/btrfs/inode.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4f2f1e932751..4e9717c29451 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9873,6 +9873,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	struct btrfs_path *path;
 	u64 start = ins->objectid;
 	u64 len = ins->offset;
+	int qgroup_released;
 	int ret;
 
 	memset(&stack_fi, 0, sizeof(stack_fi));
@@ -9885,14 +9886,14 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE);
 	/* Encryption and other encoding is reserved and all 0 */
 
-	ret = btrfs_qgroup_release_data(inode, file_offset, len);
-	if (ret < 0)
-		return ERR_PTR(ret);
+	qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len);
+	if (qgroup_released < 0)
+		return ERR_PTR(qgroup_released);
 
 	if (trans) {
 		ret = insert_reserved_file_extent(trans, inode,
 						  file_offset, &stack_fi,
-						  true, ret);
+						  true, qgroup_released);
 		if (ret)
 			return ERR_PTR(ret);
 		return trans;
@@ -9905,7 +9906,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	extent_info.file_offset = file_offset;
 	extent_info.extent_buf = (char *)&stack_fi;
 	extent_info.is_new_extent = true;
-	extent_info.qgroup_reserved = ret;
+	extent_info.qgroup_reserved = qgroup_released;
 	extent_info.insertions = 0;
 
 	path = btrfs_alloc_path();
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v2 2/2] btrfs: fix qgroup data rsv leak caused by falloc failure
  2021-03-03 10:41 [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() Qu Wenruo
@ 2021-03-03 10:41 ` Qu Wenruo
  2021-03-08 16:44 ` [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Qu Wenruo @ 2021-03-03 10:41 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Nikolay Borisov, David Sterba

[BUG]
When running fsstress with only falloc workload, and a very low qgroup
limit set, we can get qgroup data rsv leak at unmount time.

 BTRFS warning (device dm-0): qgroup 0/5 has unreleased space, type 0 rsv 20480
 BTRFS error (device dm-0): qgroup reserved space leaked

The minimal reproducer looks like:

  #!/bin/bash
  dev=/dev/test/test
  mnt="/mnt/btrfs"
  fsstress=~/xfstests-dev/ltp/fsstress
  runtime=8

  workload()
  {
          umount $dev &> /dev/null
          umount $mnt &> /dev/null
          mkfs.btrfs -f $dev > /dev/null
          mount $dev $mnt

          btrfs quota en $mnt
          btrfs quota rescan -w $mnt
          btrfs qgroup limit 16m 0/5 $mnt

          $fsstress -w -z -f creat=10 -f fallocate=10 -p 2 -n 100 \
  		-d $mnt -v > /tmp/fsstress

          umount $mnt
          if dmesg | grep leak ; then
		echo "!!! FAILED !!!"
  		exit 1
          fi
  }

  for (( i=0; i < $runtime; i++)); do
          echo "=== $i/$runtime==="
          workload
  done

Normally it would fail before round 4.

[CAUSE]
In function insert_prealloc_file_extent(), we first call
btrfs_qgroup_release_data() to know how many bytes are reserved for
qgroup data rsv.

Then use that @qgroup_released number to continue our work.

But after we call btrfs_qgroup_release_data(), we should either queue
@qgroup_released to delayed ref or free them manually in error path.

Unfortunately, we lack the error handling to free the released bytes,
leaking qgroup data rsv.

All the error handling function outside won't help at all, as we have
released the range, meaning in inode io tree, the EXTENT_QGROUP_RESERVED
bit is already cleared, thus all btrfs_qgroup_free_data() call won't
free any data rsv.

[FIX]
Add free_qgroup tag to manually free the released qgroup data rsv.

Fixes: 9729f10a608f ("btrfs: inode: move qgroup reserved space release to the callers of insert_reserved_file_extent()")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changlog:
v2:
- Add a missing error branch for insert_reserved_file_extent()
---
 fs/btrfs/inode.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4e9717c29451..4d26e2a41021 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9895,7 +9895,7 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 						  file_offset, &stack_fi,
 						  true, qgroup_released);
 		if (ret)
-			return ERR_PTR(ret);
+			goto free_qgroup;
 		return trans;
 	}
 
@@ -9910,17 +9910,30 @@ static struct btrfs_trans_handle *insert_prealloc_file_extent(
 	extent_info.insertions = 0;
 
 	path = btrfs_alloc_path();
-	if (!path)
-		return ERR_PTR(-ENOMEM);
+	if (!path) {
+		ret = -ENOMEM;
+		goto free_qgroup;
+	}
 
 	ret = btrfs_replace_file_extents(&inode->vfs_inode, path, file_offset,
 				     file_offset + len - 1, &extent_info,
 				     &trans);
 	btrfs_free_path(path);
 	if (ret)
-		return ERR_PTR(ret);
-
+		goto free_qgroup;
 	return trans;
+free_qgroup:
+	/*
+	 * We have released qgroup data range at the beginning of the function,
+	 * and normally @qgroup_released bytes will be freed when committing
+	 * transaction.
+	 * But if we error out early, we have to free what we have released
+	 * or we leak qgroup data rsv.
+	 */
+	btrfs_qgroup_free_refroot(inode->root->fs_info,
+			inode->root->root_key.objectid, qgroup_released,
+			BTRFS_QGROUP_RSV_DATA);
+	return ERR_PTR(ret);
 }
 
 static int __btrfs_prealloc_file_range(struct inode *inode, int mode,
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent()
  2021-03-03 10:41 [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() Qu Wenruo
  2021-03-03 10:41 ` [PATCH v2 2/2] btrfs: fix qgroup data rsv leak caused by falloc failure Qu Wenruo
@ 2021-03-08 16:44 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2021-03-08 16:44 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Mar 03, 2021 at 06:41:51PM +0800, Qu Wenruo wrote:
> There is a piece of weird code in insert_prealloc_file_extent(), which
> looks like:
> 
> 	ret = btrfs_qgroup_release_data(inode, file_offset, len);
> 	if (ret < 0)
> 		return ERR_PTR(ret);
> 	if (trans) {
> 		ret = insert_reserved_file_extent(trans, inode,
> 						  file_offset, &stack_fi,
> 						  true, ret);
> 	...
> 	}
> 	extent_info.is_new_extent = true;
> 	extent_info.qgroup_reserved = ret;
> 	...
> 
> Note how the variable @ret is abused here, and if anyone is adding code
> just after btrfs_qgroup_release_data() call, it's super easy to
> overwrite the @ret and cause tons of qgroup related bugs.
> 
> Fix such abuse by introducing new variable @qgroup_released, so that we
> won't reuse the existing variable @ret.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-03-08 16:47 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-03 10:41 [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() Qu Wenruo
2021-03-03 10:41 ` [PATCH v2 2/2] btrfs: fix qgroup data rsv leak caused by falloc failure Qu Wenruo
2021-03-08 16:44 ` [PATCH v2 1/2] btrfs: use a dedicated variable for qgroup released data rsv for insert_prealloc_file_extent() David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.