* [PATCH 2/2] btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations
2020-07-24 6:46 [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode Qu Wenruo
@ 2020-07-24 6:46 ` Qu Wenruo
2020-08-24 18:08 ` Josef Bacik
2020-08-24 18:02 ` [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode Josef Bacik
2020-08-27 11:32 ` David Sterba
2 siblings, 1 reply; 6+ messages in thread
From: Qu Wenruo @ 2020-07-24 6:46 UTC (permalink / raw)
To: linux-btrfs
[BUG]
When quota is enabled for TEST_DEV, generic/013 sometimes fails like this:
generic/013 14s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//generic/013.dmesg)
And with the following metadata leak:
BTRFS warning (device dm-3): qgroup 0/1370 has unreleased space, type 2 rsv 49152
------------[ cut here ]------------
WARNING: CPU: 2 PID: 47912 at fs/btrfs/disk-io.c:4078 close_ctree+0x1dc/0x323 [btrfs]
Call Trace:
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
---[ end trace a6cfd45ba80e4e06 ]---
BTRFS error (device dm-3): qgroup reserved space leaked
BTRFS info (device dm-3): disk space caching is enabled
BTRFS info (device dm-3): has skinny extents
[CAUSE]
The qgroup preallocated meta rsv operations of that offending root are:
btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
btrfs_delayed_inode_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=131072
btrfs_subvolume_reserve_metadata: rsv_meta_prealloc root=1370 num_bytes=49152
btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
btrfs_delayed_inode_release_metadata: convert_meta_prealloc root=1370 num_bytes=-131072
It's pretty obvious that, we reserve qgroup meta rsv in
btrfs_subvolume_reserve_metadata(), but doesn't have corresponding
release/convert calls in btrfs_subvolume_release_metadata().
This leads to the leakage.
[FIX]
To fix this bug, we should follow what we're doing in
btrfs_delalloc_reserve_metadata(), where we reserve qgroup space, and
add it to block_rsv->qgroup_rsv_reserved.
And free the qgroup reserved metadata space when releasing the
block_rsv.
To do this, we need to change the btrfs_subvolume_release_metadata() to
accept btrfs_root, and record the qgroup_to_release number, and call
btrfs_qgroup_convert_reserved_meta() for it.
Fixes: 733e03a0b26a ("btrfs: qgroup: Split meta rsv type into meta_prealloc and meta_pertrans")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/ctree.h | 2 +-
fs/btrfs/inode.c | 2 +-
fs/btrfs/ioctl.c | 6 +++---
fs/btrfs/root-tree.c | 13 +++++++++++--
4 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9c7e466f27a9..e1db56ff8f6f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2619,7 +2619,7 @@ enum btrfs_flush_state {
int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
struct btrfs_block_rsv *rsv,
int nitems, bool use_global_rsv);
-void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
+void btrfs_subvolume_release_metadata(struct btrfs_root *root,
struct btrfs_block_rsv *rsv);
void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 num_bytes);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 611b3412fbfd..e9def7cdf662 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4047,7 +4047,7 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
err = ret;
inode->i_flags |= S_DEAD;
out_release:
- btrfs_subvolume_release_metadata(fs_info, &block_rsv);
+ btrfs_subvolume_release_metadata(root, &block_rsv);
out_up_write:
up_write(&fs_info->subvol_sem);
if (err) {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index bd3511c5ca81..976aeddca86c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -618,7 +618,7 @@ static noinline int create_subvol(struct inode *dir,
trans = btrfs_start_transaction(root, 0);
if (IS_ERR(trans)) {
ret = PTR_ERR(trans);
- btrfs_subvolume_release_metadata(fs_info, &block_rsv);
+ btrfs_subvolume_release_metadata(root, &block_rsv);
goto fail_free;
}
trans->block_rsv = &block_rsv;
@@ -742,7 +742,7 @@ static noinline int create_subvol(struct inode *dir,
kfree(root_item);
trans->block_rsv = NULL;
trans->bytes_reserved = 0;
- btrfs_subvolume_release_metadata(fs_info, &block_rsv);
+ btrfs_subvolume_release_metadata(root, &block_rsv);
err = btrfs_commit_transaction(trans);
if (err && !ret)
@@ -856,7 +856,7 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
if (ret && pending_snapshot->snap)
pending_snapshot->snap->anon_dev = 0;
btrfs_put_root(pending_snapshot->snap);
- btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv);
+ btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
free_pending:
if (pending_snapshot->anon_dev)
free_anon_bdev(pending_snapshot->anon_dev);
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index c89697486366..702dc5441f03 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -512,11 +512,20 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
if (ret && qgroup_num_bytes)
btrfs_qgroup_free_meta_prealloc(root, qgroup_num_bytes);
+ if (!ret) {
+ spin_lock(&rsv->lock);
+ rsv->qgroup_rsv_reserved += qgroup_num_bytes;
+ spin_unlock(&rsv->lock);
+ }
return ret;
}
-void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
+void btrfs_subvolume_release_metadata(struct btrfs_root *root,
struct btrfs_block_rsv *rsv)
{
- btrfs_block_rsv_release(fs_info, rsv, (u64)-1, NULL);
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ u64 qgroup_to_release;
+
+ btrfs_block_rsv_release(fs_info, rsv, (u64)-1, &qgroup_to_release);
+ btrfs_qgroup_convert_reserved_meta(root, qgroup_to_release);
}
--
2.27.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode
2020-07-24 6:46 [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode Qu Wenruo
2020-07-24 6:46 ` [PATCH 2/2] btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations Qu Wenruo
@ 2020-08-24 18:02 ` Josef Bacik
2020-08-27 11:32 ` David Sterba
2 siblings, 0 replies; 6+ messages in thread
From: Josef Bacik @ 2020-08-24 18:02 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 7/24/20 2:46 AM, Qu Wenruo wrote:
> For delayed inode facility, qgroup metadata is reserved for it, and
> later freed.
>
> However we're freeing more bytes than we reserved.
> In btrfs_delayed_inode_reserve_metadata():
>
> num_bytes = btrfs_calc_metadata_size(fs_info, 1);
> ...
> ret = btrfs_qgroup_reserve_meta_prealloc(root,
> fs_info->nodesize, true);
> ...
> if (!ret) {
> node->bytes_reserved = num_bytes;
>
> But in btrfs_delayed_inode_release_metadata():
>
> if (qgroup_free)
> btrfs_qgroup_free_meta_prealloc(node->root,
> node->bytes_reserved);
> else
> btrfs_qgroup_convert_reserved_meta(node->root,
> node->bytes_reserved);
>
> This means, we're always releasing more qgroup metadata rsv than we have
> reserved.
>
> This won't trigger selftest warning, as btrfs qgroup metadata rsv has
> extra protection against cases like quota enabled half-way.
>
> But we still need to fix this problem any way.
>
> This patch will use the same num_bytes for qgroup metadata rsv so we
> could handle it correctly.
>
> Fixes: f218ea6c4792 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode
2020-07-24 6:46 [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode Qu Wenruo
2020-07-24 6:46 ` [PATCH 2/2] btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations Qu Wenruo
2020-08-24 18:02 ` [PATCH 1/2] btrfs: qgroup: fix wrong qgroup metadata reserve for delayed inode Josef Bacik
@ 2020-08-27 11:32 ` David Sterba
2 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2020-08-27 11:32 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Fri, Jul 24, 2020 at 02:46:09PM +0800, Qu Wenruo wrote:
> For delayed inode facility, qgroup metadata is reserved for it, and
> later freed.
>
> However we're freeing more bytes than we reserved.
> In btrfs_delayed_inode_reserve_metadata():
>
> num_bytes = btrfs_calc_metadata_size(fs_info, 1);
> ...
> ret = btrfs_qgroup_reserve_meta_prealloc(root,
> fs_info->nodesize, true);
> ...
> if (!ret) {
> node->bytes_reserved = num_bytes;
>
> But in btrfs_delayed_inode_release_metadata():
>
> if (qgroup_free)
> btrfs_qgroup_free_meta_prealloc(node->root,
> node->bytes_reserved);
> else
> btrfs_qgroup_convert_reserved_meta(node->root,
> node->bytes_reserved);
>
> This means, we're always releasing more qgroup metadata rsv than we have
> reserved.
>
> This won't trigger selftest warning, as btrfs qgroup metadata rsv has
> extra protection against cases like quota enabled half-way.
>
> But we still need to fix this problem any way.
>
> This patch will use the same num_bytes for qgroup metadata rsv so we
> could handle it correctly.
>
> Fixes: f218ea6c4792 ("btrfs: delayed-inode: Remove wrong qgroup meta reservation calls")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Added to misc-next.
Patch 2/2 should go to stable (4.19+) but it does not apply even to 5.8
due to some other intermediate changes. I have tagged it for 4.19 but it
will expectedly fail to apply. A backport would be needed then.
^ permalink raw reply [flat|nested] 6+ messages in thread